Weeding out Duplicate files in Hardy Heron 8.04
Patton Echols
p.echols at comcast.net
Sun Jan 17 22:51:36 UTC 2010
On 01/17/2010 01:55 PM, Loïc Grenié wrote:
> 2010/1/17 Patton Echols <p.echols at comcast.net>:
>
>> On 01/15/2010 06:47 AM, Loïc Grenié wrote:
>>
>>> 2010/1/15 Rafiq Hajat <ipi.malawi at gmail.com>:
>>>
>>>
>>>> Hi,
>>>> I've been transferring all my music (MP3, OGG, FLAC, M4A, WMA) into my
>>>> laptop and have noticed numerous duplications of the same song. The
>>>> directory size is now 16GB and it's a nightmare to collate and
>>>> categorise. Any ideas on how to get rid of the duplications?
>>>>
>>> If the files are the same you can use md5sum
>>>
>>> Loic
>>>
>> I can see how you would use md5sum to see if the files are the same, but
>> how would you use it to remove duplicates?
>>
>
> If your files are in a directory /home/dir/mm, you can try the
> following:
>
> find /home/dir/mm -type f -print0 | xargs -0 md5sum > /tmp/files
>
> (all of it has to be typed on a single line) and then
>
> sort /tmp/files | uniq -w 32 -D > /tmp/dupes
>
> (all of it has to be typed on a single line). /tmp/dupes now holds
> the list of duplicate files, each prepended with its md5 sum. You
> can look at the list using
>
> less /tmp/dupes
>
> (type q to quit). If the duplicate list is not too big, you can
> remove all but one of each "duplicate" by hand (that way you can
> also visually check that these are the names of real duplicates).
> If you want something more automatic you can try the following:
>
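> # /tmp/dupes is sorted by checksum, so identical files are adjacent.
> sum=z
> while read -r md name
> do
>   if [ "$md" = "$sum" ]; then
>     rm -- "$name"    # same checksum as the previous line: delete this copy
>   else
>     sum="$md"        # new checksum: keep this file and remember its sum
>   fi
> done < /tmp/dupes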
>
> Warning: this erases files automatically, so it may be dangerous.
>
> Hope this helps,
>
> Loïc
>
>
Ok, as I suspected, md5sum is one tool in the process (which depends on
several other tools). Interesting method.
I found a tutorial on fdupes, which has a similar workflow, here:
http://dosnlinux.wordpress.com/2007/02/18/fdupes-tutorial/
Thanks for the reply. It was educational.
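
For anyone who wants to see what the automatic loop would delete before
running it, here is a minimal dry-run sketch of the same pipeline. It only
prints the files that would be removed and deletes nothing. The function
name list_dupes is made up for illustration, LC_ALL=C is added only to make
the sort order predictable, and the sketch assumes GNU md5sum and uniq and
file names without newlines or backslashes.

```shell
#!/bin/sh
# Dry run: print the duplicates the loop above would delete.
# list_dupes is a hypothetical helper name, not part of any tool.
list_dupes() {
    dir=$1
    # Hash every regular file; -print0 / -0 keep names with spaces intact.
    find "$dir" -type f -print0 | xargs -0 md5sum |
        LC_ALL=C sort |      # identical checksums end up on adjacent lines
        uniq -w 32 -D |      # keep only lines whose first 32 chars repeat
        {
            sum=z
            while read -r md name; do
                if [ "$md" = "$sum" ]; then
                    printf '%s\n' "$name"   # later copy: would be removed
                else
                    sum=$md                 # first copy of this sum: kept
                fi
            done
        }
}
```

Calling it as list_dupes /home/dir/mm prints one path per line. Within each
group of identical files, the name that sorts first is the one that would
be kept.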