Find missing files but with same file name
rikona
rikona at sonic.net
Sat May 21 21:58:41 UTC 2016
Thursday, May 19, 2016, 2:48:53 AM, Joel wrote:
> rikona wrote:
>> I lost some photo files, possibly while doing backups, because they
>> have the same file name as a different photo. In this case pc050049.jpg
>> is NOT the same pix as pc050049.jpg, for example. Pictures that I know
>> I had [I have a print :-) ] at an event are missing.
>>
>> I have many archive disks, CDs and DVDs with perhaps a hundred or so
>> large dir trees of archived pix.
>>
>> What I'd like to do is find archived pix that may have the SAME file
>> name as a current pix, but are DIFFERENT pix. Given MANY archives, is
>> there an efficient way to do this in Ubuntu? About 20,000 current pix
>> to process.
>>
>> And, I'd also like to see ones with different names that are in the
>> archives but not in the current pix file trees.
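For the second request (files that are in the archives but not in the current trees), one possible approach is to compare by content rather than by name. The following is only a rough sketch under that assumption; the function names `md5_of`, `hashes_under` and `archive_only` are made up for illustration and do not come from this thread.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hashes_under(root):
    """Map each content hash to the list of paths below root that have it."""
    found = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            found.setdefault(md5_of(full), []).append(full)
    return found


def archive_only(current_root, archive_root):
    """Return archive paths whose content appears nowhere under current_root."""
    current = hashes_under(current_root)
    missing = []
    for digest, paths in hashes_under(archive_root).items():
        if digest not in current:
            missing.extend(paths)
    return sorted(missing)
```

Because the comparison is on content, a renamed copy does not count as missing, while a retouched or re-saved copy does, so the resulting list would still need a look by eye.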
> For this type of job, I would typically write a perl script, but you
> could use any scripting language that has hash variables
> (dictionaries).
I seem to need a script now and then, and have considered learning a
bit of python. :-) It seems to have md5 and sha1, so could work. Any
reason not to use it?
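For reference, Python's standard `hashlib` module does provide both md5 and sha1. A minimal sketch of hashing one file follows; the helper name `file_digest` and the 1 MiB chunk size are arbitrary choices for illustration, not anything suggested in this thread.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a file without loading it all into memory."""
    h = hashlib.new(algorithm)  # accepts names such as "md5" or "sha1"
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Reading in chunks keeps memory use flat even for large image files; either algorithm should be adequate for spotting accidental differences between photos.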
> It would be most straightforward assuming your system has RAM to
> keep the entire data structure in memory, and if you could have all
> pictures mounted on your system at once.
I could transfer all the archives to subdirs on a large USB drive.
Don't know about the data structures, though; the total number of
files is quite large.
> For each name, such as pc050049.jpg, you want to store
> the path, the size, and probably an md5 hash of the file
> contents. Here is an approximate YAML representation.
> pc050049.jpg:
>   - /home/rikona/photos/2012-5-16
>     size: 25683934
>     md5: adfec80203fedcab
>   - /mnt/backup1/dvd12/photos/2003-4-8
>     size: 18403456
>     md5: 8938475deadbeef8383894
> Then you iterate through the data structure, looking for
> same named files of differing content.
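The data structure and the iteration described above could be sketched in Python roughly as follows. This is only one possible reading of the idea, assuming all archives are reachable as directories at the same time; the names `build_index` and `name_collisions` and the record keys are invented for this example.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_index(roots):
    """Map file name -> list of records (directory, size, md5), one per copy."""
    index = defaultdict(list)
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                full = os.path.join(dirpath, name)
                index[name].append({
                    "dir": dirpath,
                    "size": os.path.getsize(full),
                    "md5": md5_of(full),
                })
    return index


def name_collisions(index):
    """Yield (name, records) for names that cover more than one distinct content."""
    for name, records in sorted(index.items()):
        if len({(r["size"], r["md5"]) for r in records}) > 1:
            yield name, records


if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python3 script.py DIR [DIR ...]
    for name, records in name_collisions(build_index(sys.argv[1:])):
        print(name)
        for r in records:
            print("   ", r["dir"], r["size"], r["md5"])
```

Each record is small (a path, an integer and a short hex string), so even a few hundred thousand entries should fit in RAM on a typical desktop; the slow part is likely to be reading every file once to hash it.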
Agreed - the process looks good. As I think this through a bit more it
does get a bit more complicated - some have been 'retouched' or
otherwise altered. I guess in the end I'll have to visually confirm
what I find.
> That's an analysis of the process. It's a good first project
> for learning to code in a language of your choice.
> Have fun,
Learning is fun, if I can find the time. :-))
Thanks for the reply,
--
rikona