Managing a photo library with bzr
John Arbash Meinel
john at arbash-meinel.com
Mon Jul 28 19:06:16 BST 2008
Sébastien Barthélemy wrote:
| Hello everybody,
|
| I'm in the process of writing a bunch of python scripts to manage my
| photo library.
|
| I would like it to be versioned, and thus I'm wondering if bzr would
| be suited for that.
| To give you an idea, I currently have 6100 pictures, "weighing" 7.2 GB.
So, there are a few questions about why you want it to be versioned,
etc. Are you modifying the pictures, such that you want to be able to
get back to an older version of them?
Is this *really* better than a real backup system?
For starters, bzr is tuned around working as a version control system
for source code (mainly text files). It would probably work for what
you need, but you would probably have to work around bits here and there
because of how we've tuned the system.
For example, we generally keep at least 1 full copy of the contents of a
file in memory while doing operations. (We try to limit it to 3 copies,
but stuff like merge needs 3 copies to work from. And there are
certainly "bugs" where we might hold more than that.)
Our storage layer generally doesn't do very well optimizing the size of
binary files. (Though to be fair, if you are using PNG or JPG files as
your source, those aren't going to do very well with just about
anything, because they are already compressed.)
You probably won't be able to commit all 7GB of files in one pass. You
probably could commit a few at a time, until all 7GB was versioned. I
know some versions of bzr would also have problems copying this between
repositories, as it would tend to buffer the stream before sending. We
are trying to remove those code paths, but I would be surprised if we
buffered less than 1 text content at a time.
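As a rough illustration of committing a few files at a time, here is a minimal Python sketch. It assumes the current directory is already a bzr working tree and that the `bzr` command is on the PATH. The helper names and the 200 MB batch limit are made up for the example, not anything bzr itself provides or recommends.

```python
import os
import subprocess


def batches_by_size(paths, max_bytes, size_of=os.path.getsize):
    """Yield lists of paths whose combined size does not exceed max_bytes.

    A single file larger than max_bytes still gets a batch of its own.
    """
    batch, total = [], 0
    for path in paths:
        size = size_of(path)
        # Close the current batch before it would grow past the limit.
        if batch and total + size > max_bytes:
            yield batch
            batch, total = [], 0
        batch.append(path)
        total += size
    if batch:
        yield batch


def commit_in_batches(paths, max_bytes=200 * 1024 * 1024):
    """Add and commit the given files one batch at a time.

    Assumption: run from inside an existing bzr working tree.
    The batch limit is an arbitrary illustrative value.
    """
    for number, batch in enumerate(batches_by_size(paths, max_bytes), 1):
        subprocess.check_call(["bzr", "add"] + batch)
        subprocess.check_call(
            ["bzr", "commit", "-m", "Import photo batch %d" % number])
```

Whether a given batch size actually fits in memory depends on the bzr version and the machine, so the limit would need tuning by trial.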
|
| So what do you think about it? Could bzr handle that much data on a
| regular computer? Let's say regular=mine ;) : 1 GB RAM and a core duo
| processor.
|
| Another thing I would like to do is to store somewhere the md5sum of
| each (version of each) picture in order to ease duplicate detection.
| Are there versioned properties à la svn in bzr?
Revisions can hold arbitrary metadata, but not really like you have in svn.
However, we already store the sha1 sum of every version of every file,
so you could just hook into the inventory logic if all you want
is to find files with matching hashes.
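To make the idea of hash-based duplicate detection concrete, here is a small Python sketch. Note the assumption: it does not read bzr's inventory at all. It recomputes SHA-1 sums directly from the files on disk with `hashlib`, which gives the same kind of digest but ignores history. The function names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-1 of a file, reading it in chunks to save memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Map each SHA-1 that occurs more than once to its sorted file paths."""
    by_hash = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into bzr's own control directory.
        if ".bzr" in dirnames:
            dirnames.remove(".bzr")
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_hash[sha1_of_file(path)].append(path)
    return {digest: sorted(paths)
            for digest, paths in by_hash.items() if len(paths) > 1}
```

This only finds byte-identical files in the current tree. Two pictures that differ only in their embedded metadata would hash differently.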
|
| There will also be some need to handle/fix the EXIF (and similar) metadata
| of the pictures at some time. Do you think it would be wise to
| implement this as a bzr merger plugin?
Well, if you could export it to a file, we would merge it for you.
Otherwise, yes, you would need a custom merge algorithm.
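One way to picture the "export it to a file" option is a plain-text sidecar next to each picture, with one field per line so that an ordinary line-based text merge can handle it. The sketch below is only an illustration under assumptions: the `.meta` suffix, the `key: value` layout, and the function names are invented here, and actually extracting the metadata from the image is left to whatever tool you already use.

```python
def sidecar_path(image_path):
    """Return the (hypothetical) sidecar file name for an image."""
    return image_path + ".meta"


def write_sidecar(image_path, metadata):
    """Write metadata as sorted 'key: value' lines, one field per line.

    Sorting keeps the file stable, so unrelated edits touch different
    lines and can be merged as text. Values must not contain newlines.
    """
    lines = ["%s: %s\n" % (key, metadata[key]) for key in sorted(metadata)]
    with open(sidecar_path(image_path), "w", encoding="utf-8") as handle:
        handle.writelines(lines)


def read_sidecar(image_path):
    """Read the sidecar back into a dict of strings."""
    metadata = {}
    with open(sidecar_path(image_path), encoding="utf-8") as handle:
        for line in handle:
            key, _, value = line.rstrip("\n").partition(": ")
            metadata[key] = value
    return metadata
```

The sidecar files would then be versioned alongside (or instead of) the pictures themselves.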
|
| Finally, it would be great to have the ability to check out a "low disk
| space" version of the library, with only the metadata and low
| resolution pictures, for instance. While it seems quite out of the
| scope of bzr, maybe someone has a clean solution for this too.
This could be done with a layering approach, so that you actually have 2
branches/repositories: one with the full versions, and one with the
thumbnails.
|
| That's it, feel free to criticize if the idea sounds bad to you.
|
| Cheers
|
I don't think bzr is especially well suited to the task. If you have
some development ability in you (as you are at least writing python
scripts to manage it), we would be open to patches which make things
work better for you. (Subject to the standard constraints: they are of
good quality, and don't reduce test coverage or code clarity.)
John
=:->