Find missing files but with same file name

rikona rikona at sonic.net
Sat May 21 21:58:41 UTC 2016


Thursday, May 19, 2016, 2:48:53 AM, Joel wrote:

> rikona wrote:
>> I lost some photo files, possibly doing backups because they have the
>> same file name as a different photo. In this case pc050049.jpg is NOT
>> the same pix as pc050049.jpg, for example. Pictures that I know I had
>> [I have a print :-) ] at an event are missing.
>> 
>> I have many archive disks, CDs and DVDs with perhaps a hundred or so
>> large dir trees of archived pix.
>> 
>> What I'd like to do is find archived pix that may have the SAME file
>> name as a current pix, but are DIFFERENT pix. Given MANY archives, is
>> there an efficient way to do this in Ubuntu? About 20,000 current pix
>> to process.
>> 
>> And, I'd also like to see ones with different names that are in the
>> archives but not in the current pix file trees.
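
For the second request (pictures that exist only in the archives), comparing by content rather than by name seems the safer route, since names clearly cannot be trusted here. A minimal sketch, assuming each file has already been reduced to a content digest such as MD5; the function name and the example paths are hypothetical:

```python
def archive_only(current, archive):
    """Return archive paths whose content matches no file in the current tree.

    Both arguments are dicts mapping file path -> content digest (hex string).
    """
    known = set(current.values())  # every digest present in the current tree
    return sorted(path for path, digest in archive.items() if digest not in known)


# Hypothetical example data: one archived file is a copy, one is not.
current = {"/photos/a.jpg": "aaa111", "/photos/b.jpg": "bbb222"}
archive = {"/mnt/dvd1/a_copy.jpg": "aaa111", "/mnt/dvd1/lost.jpg": "ccc333"}
print(archive_only(current, archive))
```

Because only digests are compared, a renamed copy is not reported as missing, while a different picture is reported even if it shares a name with a current one.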

> For this type of job, I would typically write a perl script, but you
> could use any scripting language that has hash variables
> (dictionaries).

I seem to need a script now and then, and have considered learning a
bit of python. :-) It seems to have md5 and sha1, so could work. Any
reason not to use it?
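
For reference, Python's standard `hashlib` module does provide both md5 and sha1. A minimal sketch of hashing one file, reading it in chunks so a large image never has to sit in memory all at once; the function name and chunk size are my own choices, not anything from the thread:

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a file's contents, read in 1 MiB chunks."""
    h = hashlib.new(algorithm)  # "md5" and "sha1" are both accepted names
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Either algorithm should be adequate for spotting accidental differences between photos; neither is suitable where someone might deliberately craft colliding files.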

> It would be most straightforward assuming your system has RAM to
> keep the entire data structure in memory, and if you could have all
> pictures mounted on your system at once.

I could transfer all the archives to subdirs on a large USB drive.
Don't know about the data structures, though; the total number of
files is quite large.

> For each name, such as pc050049.jpg you want to store 
> the path, and the size and probably md5 hash of the file
> contents. Here is an approximate YAML representation.

> pc050049.jpg:
>  - path: /home/rikona/photos/2012-5-16
>    size: 25683934
>    md5: adfec80203fedcab
>  - path: /mnt/backup1/dvd12/photos/2003-4-8
>    size: 18403456
>    md5: 8938475deadbeef8383894
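
A structure like that could be built in Python roughly as follows. This is only a sketch under a few assumptions: all archives are reachable as ordinary directories (for example copied to subdirs on one USB drive), file names are compared exactly as stored, and `build_index` is a name I made up:

```python
import hashlib
import os
from collections import defaultdict


def build_index(roots):
    """Map each file name to a list of {path, size, md5} records."""
    index = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                with open(full, "rb") as f:
                    # Reads the whole file at once; acceptable for photos
                    md5 = hashlib.md5(f.read()).hexdigest()
                index[name].append(
                    {"path": dirpath, "size": os.path.getsize(full), "md5": md5}
                )
    return index
```

Each record is just a directory string, an integer and a short hex string, so the index itself should stay small compared with the pictures; the slow part is likely to be reading every file once to hash it.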

> Then you iterate through the data structure, looking for
> same named files of differing content.
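
That iteration step might look something like this in Python. It is a sketch that assumes the index is a dict from file name to a list of records, each carrying a path, size and md5; the sample data reuses the placeholder values from the example above, plus a hypothetical `same.jpg` entry for contrast:

```python
def find_conflicts(index):
    """Return {name: records} for names that occur with more than one md5."""
    conflicts = {}
    for name, records in index.items():
        distinct = {r["md5"] for r in records}  # unique contents seen under this name
        if len(distinct) > 1:
            conflicts[name] = records
    return conflicts


sample = {
    "pc050049.jpg": [
        {"path": "/home/rikona/photos/2012-5-16", "size": 25683934,
         "md5": "adfec80203fedcab"},
        {"path": "/mnt/backup1/dvd12/photos/2003-4-8", "size": 18403456,
         "md5": "8938475deadbeef8383894"},
    ],
    # Hypothetical entry: same name, same content, so not a conflict
    "same.jpg": [
        {"path": "/home/rikona/photos/x", "size": 10, "md5": "0123abcd"},
        {"path": "/mnt/backup1/dvd12/photos/x", "size": 10, "md5": "0123abcd"},
    ],
}

for name, records in find_conflicts(sample).items():
    print(name)
    for r in records:
        print("   ", r["path"], r["size"], r["md5"])
```

Only `pc050049.jpg` is reported here. A retouched copy would also show up as a conflict, since any change to the file changes its hash, so the output is a list of candidates to inspect rather than a final answer.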

Agreed - the process looks good. As I think this through a bit more it
does get a bit more complicated - some have been 'retouched' or
otherwise altered. I guess in the end I'll have to visually confirm
what I find.

> That's an analysis of the process. It's a good first project
> for learning to code in a language of your choice.

> Have fun,

Learning is fun, if I can find the time. :-))

Thanks for the reply,

-- 

 rikona






