large files and storage formats

Chad Dombrova chadrik at gmail.com
Fri Jul 9 04:37:28 BST 2010


hi all,
i've got some questions regarding bzr and large binary files.

first of all, i've read about bzr's long-standing issues with large files (
https://bugs.launchpad.net/bzr/+bug/109114).  while fixing this issue would
be a worthy and noble cause, i have a fairly specific use case, and based on
a lot of recent experience i know there's a *very* high probability that
once this issue is fixed i'll run into other roadblocks with the current
storage format.

what interests me about bazaar is what the docs tout as its flexible
architecture: that it "is cleanly layered to support multiple file formats".
that got me thinking: could i implement a more git-like loose object
storage format in bazaar?

for those who aren't familiar with git's loose object model, it works
something like this:  blobs represent data, trees represent the location of
data, a commit records a snapshot of a tree plus its parent commits, and
every object, regardless of type, is stored as a separate loose file in the
store.

this is great for working with large files for 2 reasons:
1) files can be moved/renamed without generating duplicate data in the
object store: it's just a new tree object
2) it does not use delta compression, which is not time or size efficient on
large binary files. blobs are compressed using zlib and the compression
strength is configurable
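
to make the two points above concrete, here is a minimal python sketch of a
git-style loose blob store. it is an illustration only, not git's or dulwich's
actual code: the function names (write_loose_blob, read_loose_blob,
count_objects) and the level parameter are made up for this example. the
on-disk layout follows git's loose object convention: sha-1 over a
"blob <size>\0" header plus the content, zlib-compressed as a whole, stored
under a two-character fan-out directory.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_blob(store_dir, data, level=6):
    """Store data as a git-style loose blob; return its hex object id.

    `level` is a hypothetical stand-in for a configurable zlib strength.
    """
    # git hashes a short header plus the raw content: b"blob <size>\0<content>"
    payload = b"blob %d\x00" % len(data) + data
    object_id = hashlib.sha1(payload).hexdigest()

    # loose objects live at <store>/<first 2 hex chars>/<remaining 38 chars>
    obj_dir = os.path.join(store_dir, object_id[:2])
    obj_path = os.path.join(obj_dir, object_id[2:])
    if not os.path.exists(obj_path):
        os.makedirs(obj_dir, exist_ok=True)
        with open(obj_path, "wb") as f:
            # whole-object zlib compression, no delta against other objects
            f.write(zlib.compress(payload, level))
    return object_id


def read_loose_blob(store_dir, object_id):
    """Read a loose blob back and return its content without the header."""
    path = os.path.join(store_dir, object_id[:2], object_id[2:])
    with open(path, "rb") as f:
        payload = zlib.decompress(f.read())
    # the header ends at the first NUL byte; everything after it is content
    _header, _sep, data = payload.partition(b"\x00")
    return data


def count_objects(store_dir):
    """Count the object files currently in the store."""
    return sum(len(files) for _, _, files in os.walk(store_dir))


if __name__ == "__main__":
    store = tempfile.mkdtemp()
    data = b"\x00\x01\x02\x03" * 4096  # stand-in for a large binary file

    # a "tree" here is just a path -> object id mapping
    tree_before = {"big.bin": write_loose_blob(store, data)}
    # a rename re-adds the same bytes under a new path
    tree_after = {"renamed.bin": write_loose_blob(store, data, level=1)}

    print(tree_before, tree_after, count_objects(store))
```

storing the same bytes under a new path adds no new file to the store; only
the path-to-id mapping (the tree) changes, which is point 1 above. point 2
shows up in write_loose_blob: each object is compressed on its own with zlib,
and the strength is just an argument.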

why don't i just use git?  i abhor the way that it is designed. i need a vcs
that is user friendly, doubles as its own api, is easily extended, and is
preferably written in python (with support for pure-python hooks).  so far
bazaar seems to fit these requirements quite well.

so, i'd like some honest opinions:
- is bazaar really so well layered that new storage formats can be added
without the need to rewrite higher level code?
- how difficult a task is this, approximately, in man hours?  (keep in
mind, git's object model has already been implemented in python (
http://samba.org/~jelmer/dulwich/), so i'm mostly concerned with the time it
would take to interface this with bazaar in all the right places.)

i've looked through the docs and i can't find any information on how to get
started on writing a new storage format (which i take as a sign that it is
probably very difficult).  assuming that this goal is not laughably lofty,
and that there are not other better alternatives, i'd love some guidance on
how this might be pulled off.

thanks,
chad