command/script for finding *non*duplicate files?
Nils Kassube
kassube at gmx.net
Mon Feb 8 17:22:59 UTC 2016
Adam Funk wrote:
> On 2016-02-08, Karl Auer wrote:
> > On Mon, 2016-02-08 at 10:14 +0000, Adam Funk wrote:
> >> I'm looking for a
> >> way to find files in one directory (& its subdirectories) that are
> >> *not* duplicated in another one.
> > So 1) make a list of all the files in both directories
> > 2) remove all the duplicates
> > 3) the remainder are the non-duplicates
> >
> > ls -1 /that/directory/path > t1.txt
> > ls -1 /this/directory/path > t2.txt
> > cat t1.txt t2.txt | sort | uniq -u
> ...but that won't work because the files may have been renamed.
Then you could try an approach with md5sum. Something like this:
find directory1 -type f -exec md5sum {} \; > md5sums.txt
find directory2 -type f | while IFS= read -r n; do
    m=$(md5sum "$n")    # output is "hash  filename"
    m="${m%% *}"        # keep only the hash
    grep -q "$m" md5sums.txt || echo "$n"
done
This way you find the files in directory2 (including subdirectories)
whose content doesn't appear anywhere in directory1 (including
subdirectories) - the comparison is by checksum, so renamed files are
still recognised as duplicates.
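A variant of the same idea, for what it's worth: instead of grepping
md5sums.txt once per file, sort both hash lists and let join(1) do the
comparison in one pass. The demo directories and file names below are
made up for illustration; it assumes GNU coreutils and file names
without spaces or newlines.

```shell
# Set up a tiny demo tree (hypothetical names, just for illustration).
mkdir -p demo/dir1 demo/dir2/sub
echo "alpha" > demo/dir1/a.txt
echo "alpha" > demo/dir2/renamed.txt     # same content as a.txt, new name
echo "beta"  > demo/dir2/sub/unique.txt  # content only present in dir2

# Hashes of everything in dir1, one hash per line, sorted.
find demo/dir1 -type f -exec md5sum {} + | cut -d' ' -f1 | sort -u > sums1.txt
# "hash  path" for everything in dir2, sorted on the hash.
find demo/dir2 -type f -exec md5sum {} + | sort > sums2.txt

# Lines of sums2.txt whose hash has no match in sums1.txt,
# i.e. files whose content exists only in dir2.
join -v 2 sums1.txt sums2.txt
```

With the demo data above this prints only the unique.txt line; the
renamed duplicate is suppressed because its hash matches one in dir1.
This should scale better than the grep loop when the directories hold
many files, since each list is read once.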
Nils
More information about the ubuntu-users mailing list