command/script for finding *non*duplicate files?

Nils Kassube kassube at gmx.net
Mon Feb 8 17:22:59 UTC 2016


Adam Funk wrote:
> On 2016-02-08, Karl Auer wrote:
> > On Mon, 2016-02-08 at 10:14 +0000, Adam Funk wrote:
> >> I'm looking for a
> >> way to find files in one directory (& its subdirectories) that are
> >> *not* duplicated in another one.

> > So 1) make a list of all the files in both directories
> >    2) remove all the duplicates
> >    3) the remainder are the non-duplicates
> > 
> > ls -c1 /that/directory/path > t1.txt
> > ls -c1 /this/directory/path > t2.txt
> > cat t1.txt t2.txt | sort | uniq -u

> ...but that won't work because the files may have been renamed.

Then you could try an approach with md5sum. Something like this:

# Checksum every file under directory1 (and its subdirectories).
find directory1 -type f -exec md5sum {} \; >md5sums.txt

# For each file under directory2, print its name unless its checksum
# already appears in the list from directory1.
find directory2 -type f | while IFS= read -r n; do
  m=$(md5sum "$n")
  m="${m%% *}"
  grep -q "$m" md5sums.txt || echo "$n"
done

This way you find the files under directory2 (including its 
subdirectories) whose content does not occur anywhere under directory1 
(including its subdirectories), no matter what the files are called.
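If the trees are large, running grep once per file gets slow. A variant 
of the same idea (just a sketch, assuming GNU coreutils and file names 
without embedded newlines; sums1.txt and sums2.txt are placeholder 
names) builds both checksum lists up front and lets awk do the lookup:

find directory1 -type f -exec md5sum {} + >sums1.txt
find directory2 -type f -exec md5sum {} + >sums2.txt
# Print the names from sums2.txt whose checksum never occurs in sums1.txt.
awk 'NR==FNR { seen[$1]; next } !($1 in seen) { sub(/^[^ ]*  /, ""); print }' sums1.txt sums2.txt

The first awk block reads sums1.txt and remembers its checksums; for 
every line of sums2.txt with an unseen checksum, the hash prefix is 
stripped and the remaining file name is printed.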


Nils




