About single user setup for lightweights

Avery Pennarun apenwarr at gmail.com
Fri Mar 19 17:14:11 GMT 2010


On Fri, Mar 19, 2010 at 5:53 AM, Martin Geisler <mg at lazybytes.net> wrote:
> Avery Pennarun <apenwarr at gmail.com> writes:
>> git sucks at handling large binary files (>50 megs or so) unless you
>> have boatloads of RAM. If your binary files are moderately sized (a
>> few megs) then it'll probably be reasonably efficient. I don't know
>> about hg and bzr for memory usage.
>
> Mercurial also uses lots of RAM, way more than I had hoped. I did some
> tests with this recently:
>
>  http://markmail.org/message/uxqtmmnkyimxse5b
>
> They show a factor 3-6 blowup when working with a 256 MB file.
>
> We don't really recommend storing such large files in Mercurial. Instead
> we recommend storing the files outside of the tree, e.g., on a server
> with a huge disk. The bfiles extension can do this:
>
>  http://mercurial.selenic.com/wiki/BfilesExtension

You might find my "bup" program entertaining:

  http://github.com/apenwarr/bup/

It happens to use the git file format, but the hashsplitting algorithm
would work with any repo, and the code is written mostly in Python.
Because it breaks large files into chunks, it tends to avoid the
memory growth problems (at the cost of somewhat worse compression and
deltas).  At least you can then actually store large files in your
repository.
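
If you're curious how the splitting works, here's a rough Python
sketch of the idea.  This is not bup's actual code; the window size
and split mask are made-up values for illustration:

    # Toy sketch of content-based chunking in the spirit of bup's
    # hashsplit.  WINDOW and SPLIT_MASK are illustrative, not the
    # values bup really uses.

    WINDOW = 64          # bytes in the rolling window (assumed)
    SPLIT_MASK = 0x1fff  # split when the low 13 bits are all ones

    def hashsplit(data):
        """Split `data` into chunks wherever a rolling checksum over
        the last WINDOW bytes matches the split pattern, so chunk
        boundaries depend on content, not on file offsets."""
        chunks = []
        start = 0
        rollsum = 0
        window = bytearray(WINDOW)
        for i, byte in enumerate(data):
            # Slide the window: subtract the byte falling out,
            # add the byte coming in.
            rollsum += byte - window[i % WINDOW]
            window[i % WINDOW] = byte
            if rollsum & SPLIT_MASK == SPLIT_MASK:
                chunks.append(data[start:i + 1])
                start = i + 1
                rollsum = 0
                window = bytearray(WINDOW)
        if start < len(data):
            chunks.append(data[start:])
        return chunks

The point is that inserting or deleting a byte near the start of a
big file only changes the chunks around the edit; everything after
the next split point stays byte-identical, so most chunks dedupe
against what's already stored, and no object ever has to be held in
RAM whole.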

bup is intended for use as a full-system backup tool, but it would be
interesting to take the same techniques and use them to solve the
general case of large files in git/hg.

Have fun,

Avery


