Weeding out Duplicate files in Hardy Heron 8.04

Patton Echols p.echols at comcast.net
Sun Jan 17 22:51:36 UTC 2010


On 01/17/2010 01:55 PM, Loïc Grenié wrote:
> 2010/1/17 Patton Echols <p.echols at comcast.net>:
>   
>> On 01/15/2010 06:47 AM, Loïc Grenié wrote:
>>     
>>> 2010/1/15 Rafiq Hajat <ipi.malawi at gmail.com>:
>>>
>>>       
>>>> Hi,
>>>> I've been transferring all my music (MP3, OGG, FLAC, M4A, WMA) into my
>>>> laptop and have noticed numerous duplications of the same song. The
>>>> directory size is now 16GB and it's a nightmare to collate and
>>>> categorise. Any ideas on how to get rid of the duplications?
>>>>         
>>>      If the files are the same you can use md5sum
>>>
>>>         Loic
>>>       
>> I can see how you would use md5sum to see if the files are the same, but
>> how would you use it to remove duplicates?
>>     
>
>    If your files are in a directory /home/dir/mm, you can try the
>  following:
>
> find /home/dir/mm -type f -print0 | xargs -0 md5sum > /tmp/files
>
>   (this must all be typed on a single line) and then
>
> sort /tmp/files | uniq -w 32 -D > /tmp/dupes
>
>  (this must all be typed on a single line). /tmp/dupes now holds
>  the list of duplicate files, each prefixed with its MD5 sum. You
>  can look at the list using
>
> less /tmp/dupes
>
>  (type q to quit). If the duplicate list is not too big, you can
>  remove all but one of each "duplicate" by hand (that way you can
>  also check visually whether the names belong to real duplicates).
>  If you want something more automatic you can try the following:
>
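> # /tmp/dupes is sorted, so identical sums sit on adjacent lines
> sum=z; while read -r md name;
> do
> if [ "$md" = "$sum" ]; then
> # same sum as the previous line: this is an extra copy
> rm -- "$name"
> else
> # first file with this sum: keep it and remember the sum
> sum="$md"
> fi
> done < /tmp/dupes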
>
>  Warning: this erases files automatically, so it may be dangerous.
>
>     Hope this helps,
>
>          Loïc
>
>   

Ok, as I suspected, md5sum is one tool in the process (which depends on
several other tools). Interesting method.
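
For anyone who wants to try the quoted pipeline without deleting
anything, here is a minimal dry-run sketch. It prints the files the
removal loop would erase and leaves everything in place. The function
name list_extra_copies is made up for illustration; the sketch assumes
GNU find, xargs, md5sum, sort and uniq (as shipped with Ubuntu) and
file names that contain no newlines or backslashes.

```shell
#!/bin/sh
# Dry run of the md5sum workflow: list the extra copies, delete nothing.
# list_extra_copies is a hypothetical helper name, not an existing tool.
list_extra_copies() {
    dir=$1
    # Hash every regular file, sort so equal sums become adjacent, and
    # keep only lines whose first 32 characters (the MD5 sum) repeat.
    find "$dir" -type f -print0 | xargs -0 -r md5sum | sort | uniq -w 32 -D |
    {
        prev=
        while read -r md name; do
            if [ "$md" = "$prev" ]; then
                # Same sum as the previous line: an extra copy.
                printf '%s\n' "$name"
            else
                # First file of a group: this is the one that is kept.
                prev=$md
            fi
        done
    }
}
```

Running "list_extra_copies /home/dir/mm" prints one path per line. Once
the list looks right, the same paths are the ones the rm loop above
would remove.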

I found a tutorial on fdupes here:
http://dosnlinux.wordpress.com/2007/02/18/fdupes-tutorial/

that has a similar workflow. 

Thanks for the reply.  It was educational.



