Weeding out Duplicate files in Hardy Heron 8.04

Rafiq Hajat ipi.malawi at gmail.com
Wed Jan 20 14:18:27 UTC 2010


   If your files are in a directory /home/dir/mm, you can try the
 following:

find /home/dir/mm -type f -print0 | xargs -0 md5sum > /tmp/files

  (all of it has to be typed on a single line) and then

sort /tmp/files | uniq -w 32 -D > /tmp/dupes

 (all of it has to be typed on a single line). In /tmp/dupes you
 now have the list of duplicate files, each prefixed with its MD5
 sum. You can look at the list using

less /tmp/dupes

 (type q to quit). If the duplicate list is not too big, you can
 remove all but one of each "duplicate" by hand (that way you can
 also visually check whether these are names of real duplicates).
 If you want something more automatic, you can try the following:

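# Lines in /tmp/dupes are grouped by checksum: keep the first file of each group.
sum=z
while read -r md name    # -r: do not treat backslashes in names specially
do
    if [ "$md" = "$sum" ]; then
        rm -- "$name"    # same checksum as the previous line: a duplicate
    else
        sum="$md"        # new checksum: keep this file
    fi
done < /tmp/dupes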

 Warning: this automatically erases files, so it may be dangerous.
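
 Before deleting anything, it may be worth doing a dry run first.
 The following is only a sketch (dry_run_dupes is a made-up helper
 name, not a standard command): it reads a list in the same format
 as /tmp/dupes and prints which file of each group the loop above
 would keep and which it would remove, without touching any file.

```shell
# Dry run: report what the removal loop would do, without deleting anything.
# dry_run_dupes is a hypothetical helper; $1 is a file of "md5sum  filename"
# lines already grouped by checksum, such as /tmp/dupes.
dry_run_dupes() {
    local prev="" md name
    while read -r md name; do
        if [ "$md" = "$prev" ]; then
            # Same checksum as the previous line: this one would be deleted.
            printf 'would remove: %s\n' "$name"
        else
            # First file seen with this checksum: this one would be kept.
            printf 'would keep:   %s\n' "$name"
            prev=$md
        fi
    done < "$1"
}
```

 After defining the function in your shell, you could run
 "dry_run_dupes /tmp/dupes | less" and read through the output
 before running the loop that actually calls rm.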

    Hope this helps,

         Loïc

Wow....this is pretty far out! I'm gonna give it a shot and see how it
turns out.

Thanks Loic

Rafiq Hajat
Executive Director
The Institute for Policy Interaction (IPI)
P. O. Box E14,
Post Dot Net - Blantyre
Malawi
Mobile: +265 999 968800
www.ipimalawi.org 





