Weeding out Duplicate files in Hardy Heron 8.04

Loïc Grenié loic.grenie at gmail.com
Sun Jan 17 21:55:21 UTC 2010


2010/1/17 Patton Echols <p.echols at comcast.net>:
> On 01/15/2010 06:47 AM, Loïc Grenié wrote:
>> 2010/1/15 Rafiq Hajat <ipi.malawi at gmail.com>:
>>
>>> Hi,
>>> I've been transferring all my music (MP3, OGG, FLAC, M4A, WMA) into my
>>> laptop and have noticed numerous duplications of the same song. The
>>> directory size is now 16GB and it's a nightmare to collate and
>>> categorise. Any ideas on how to get rid of the duplications?
>>
>>      If the files are the same you can use md5sum
>>
>>         Loic
>
> I can see how you would use md5sum to see if the files are the same, but
> how would you use it to remove duplicates?

   If your files are in a directory /home/dir/mm, you can try the
 following:

find /home/dir/mm -type f -print0 | xargs -0 md5sum > /tmp/files

  (typed as a single line), and then

sort /tmp/files | uniq -w 32 -D > /tmp/dupes

 (again typed as a single line). /tmp/dupes now lists the
 duplicate files, each prepended with its md5 sum. You
 can look at the list using

less /tmp/dupes

 (type q to quit). If the list of duplicates is not too big, you can
 remove all but one of each set of duplicates by hand (that way you
 can also check visually that the names really look like duplicates).
 If you want something more automatic, you can try the following:

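sum=
while IFS= read -r line; do
  md=${line%% *}      # first field: the md5 sum
  name=${line#*  }    # everything after the two-space separator
  if [ "$md" = "$sum" ]; then
    rm -- "$name"     # same sum as the previous line: remove this copy
  else
    sum="$md"         # first file with this sum: keep it
  fi
done < /tmp/dupes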

 Warning: this erases files automatically, so it may be dangerous.
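
 Given that warning, a dry run may be worth doing first. The sketch
 below repeats the same steps in a scratch directory and only records
 which files the loop would remove, without calling rm. The file names
 and the to_remove list are made up for the illustration, and the use
 of read -r with explicit splitting is my own precaution, so treat it
 as a sketch rather than a tested tool. It assumes GNU md5sum and uniq.

```shell
# Scratch directory with hypothetical test files; two have identical content.
tmp=$(mktemp -d)
mkdir "$tmp/mm"
printf 'song A\n' > "$tmp/mm/a.mp3"
printf 'song A\n' > "$tmp/mm/copy of a.mp3"
printf 'song B\n' > "$tmp/mm/b.ogg"

# Same two steps as above, writing into the scratch directory.
find "$tmp/mm" -type f -print0 | xargs -0 md5sum > "$tmp/files"
sort "$tmp/files" | uniq -w 32 -D > "$tmp/dupes"

# Same loop, but the name is appended to a list instead of being removed.
sum=
: > "$tmp/to_remove"
while IFS= read -r line; do
  md=${line%% *}      # first field: the md5 sum
  name=${line#*  }    # everything after the two-space separator
  if [ "$md" = "$sum" ]; then
    printf '%s\n' "$name" >> "$tmp/to_remove"
  else
    sum="$md"
  fi
done < "$tmp/dupes"

# Show what would have been deleted.
cat "$tmp/to_remove"
```

 Once the list looks right, the same loop with rm in place of the
 printf line does the actual removal.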

    Hope this helps,

         Loïc



