[Bug 502224] [NEW] findimagedupes should be parallelizable

gwern gwern0 at gmail.com
Sat Jan 2 03:09:30 UTC 2010


Public bug reported:

Binary package hint: findimagedupes

An excellent feature for findimagedupes would be hashing/analyzing
multiple images at once, in parallel. Each image can be analyzed
independently, and the file I/O makes up a minuscule amount of the
runtime - the problem is embarrassingly parallel. Near-linear speedups
in the number of cores should be achievable.
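
To illustrate the shape of the idea, here is a minimal Python sketch. It
is not findimagedupes code: fingerprint() and fingerprint_all() are
hypothetical names, and SHA-256 over the file bytes merely stands in for
the real perceptual fingerprint so that the sketch runs without an
imaging library. The point is only that each file is processed
independently by a pool of workers and the results are collected in one
place.

```python
import hashlib
from concurrent.futures import ProcessPoolExecutor


def fingerprint(path):
    """Stand-in for per-image analysis: hash the file's bytes.

    A real tool would compute a perceptual fingerprint here; SHA-256 is
    used only so the sketch runs without an imaging library.
    """
    with open(path, "rb") as f:
        return path, hashlib.sha256(f.read()).hexdigest()


def fingerprint_all(paths, workers=4, executor_cls=ProcessPoolExecutor):
    """Fingerprint every path with a pool of workers.

    Each file is handled independently, so the workers share no state;
    only the calling process assembles the {path: fingerprint} result.
    """
    with executor_cls(max_workers=workers) as pool:
        return dict(pool.map(fingerprint, paths))
```

When run as a script with the default process pool, the call to
fingerprint_all() should sit under an `if __name__ == "__main__":` guard
so that worker processes can import the module safely.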

And the benefits are real: on large collections, the runtime can be many
minutes or hours. I have 4 cores which are generally not doing much; why
can't they all be used to cut the runtime by half or more?

I looked into running 4 findimagedupes processes concurrently and then
using --merge to bring together their results, but this is deeply hacky
and I worry about race conditions and data consistency in the final
fingerprint database; parallelism is something the application should be
handling internally.
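
One way an internal implementation could sidestep the consistency worry
is to give each worker a disjoint share of the files and let a single
process do all the merging and writing. The following Python sketch
assumes that design; shard() and merge_fingerprints() are hypothetical
helpers, not part of findimagedupes, and they say nothing about how
--merge itself behaves.

```python
def shard(paths, n):
    """Split paths into n disjoint round-robin shards."""
    return [paths[i::n] for i in range(n)]


def merge_fingerprints(partials):
    """Merge per-worker {path: fingerprint} dicts in a single process.

    Raises ValueError if two workers report different fingerprints for
    the same path, which would indicate overlapping or inconsistent
    shards.
    """
    merged = {}
    for partial in partials:
        for path, fp in partial.items():
            if path in merged and merged[path] != fp:
                raise ValueError(f"conflicting fingerprints for {path}")
            merged[path] = fp
    return merged
```

Because only one process ever touches the merged result, there is no
concurrent write to the database to race on; the workers only hand back
their partial results.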

** Affects: findimagedupes (Ubuntu)
     Importance: Undecided
         Status: New

-- 
findimagedupes should be parallelizable
https://bugs.launchpad.net/bugs/502224
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.

-- 
ubuntu-bugs mailing list
ubuntu-bugs at lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs



