2GB limit

Martin Geisler mg at aragost.com
Sun Oct 3 12:20:58 BST 2010


Maritza Mendez <martitzam at gmail.com> writes:

> On Sat, Oct 2, 2010 at 5:24 PM, Martin Geisler <mg at aragost.com> wrote:
>
>> Maritza Mendez <martitzam at gmail.com> writes:
>>
>> > 2. Does anyone know if any other dvcs system has solved the VM
>> > problem? If so we might put our "big file" projects in git or
>> > Mercurial until bzr can handle them.
>>
>> Mercurial is also not designed for working with very large files
>> since it loads them into memory when merging and when computing
>> diffs. People in #mercurial tell me that Git has the same limitation.
>>
>> However, we have several extensions that people use to tackle this
>> problem. The extensions all use the same basic idea: let Mercurial
>> track a small file that has a reference to the big file.
>>
>> When you check out a particular revision with 'hg update', the
>> extension notices that you are checking out a certain version of the
>> small file. It then follows the reference to the big file and writes
>> that into your working copy instead of the small file.
>>
>> The big files are stored on an HTTP server or a shared network drive
>> or similar -- the idea being that you will set up a central server
>> that has enough disk space to keep all versions of the big files
>> around. The clients only download the one version of the big files
>> they need.
>>
>> Here are links to two such extensions which are used in production:
>>
>>  http://mercurial.selenic.com/wiki/BfilesExtension
>>  http://mercurial.selenic.com/wiki/SnapExtension
>
> Thanks. I skimmed the documentation at the links you sent. My
> philosophy is that committing binaries should be a rare use-case and
> merging binaries should be a non-existent use-case. So something like
> bfiles could work for me. It sounds like the bfile server starts with
> an initial copy of the bfile and maintains a labeled sequence of
> deltas and maybe occasionally stores a full copy to trade storage for
> speed.

No, the server holds full versions of the files. What you describe there
is essentially how our normal "revlog" format works.
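
To make the reference-file idea concrete, here is a minimal sketch in
Python. It is not how bfiles or snap are actually implemented: the
function names, the use of SHA-1 as the reference, and the flat store
directory are all assumptions made for this illustration. The store
keeps one full copy per version, and the small reference file is the
only thing that would be tracked in the repository.

```python
import hashlib
import os
import shutil

CHUNK = 1024 * 1024  # read 1 MiB at a time so the big file is never fully in memory


def file_sha1(path):
    """Hash a file block by block and return the hex digest."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()


def store_big_file(big_path, store_dir, ref_path):
    """Copy the full file into the store and write a small reference file.

    Hypothetical helper: store_dir stands in for the central server or
    shared drive, and ref_path is the small file the repository tracks.
    """
    digest = file_sha1(big_path)
    os.makedirs(store_dir, exist_ok=True)
    target = os.path.join(store_dir, digest)
    if not os.path.exists(target):  # identical content is stored only once
        shutil.copyfile(big_path, target)
    with open(ref_path, "w") as ref:
        ref.write(digest + "\n")
    return digest


def fetch_big_file(ref_path, store_dir, out_path):
    """Follow the reference and write the full version to out_path."""
    with open(ref_path) as ref:
        digest = ref.read().strip()
    shutil.copyfile(os.path.join(store_dir, digest), out_path)
    return digest
```

Each new version of the big file adds another full copy to the store,
while the repository history only grows by the size of the reference.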

> It sounds like the deltas are being computed client-side and passed to
> the server. Is that right? If so, then there must already be a
> bfile-diff-engine on the client. And since the client may have a
> 32-bit VM space, I'm guessing that the diff works in segments.

That would be the ideal way to do it: make Mercurial compute diffs in a
streaming fashion where it only ever loads a small segment of the file
into memory.
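
As a rough illustration of what "streaming" means here, the Python
sketch below compares two files in fixed-size segments and reports which
segments differ, holding at most one segment of each file in memory.
This is not Mercurial's diff algorithm; the function names and the
default segment size are made up for the example. Because it compares at
fixed offsets, an insertion near the start of a file makes every later
segment look changed, so a real streaming diff would need something
smarter.

```python
from itertools import zip_longest


def read_chunks(path, chunk_size):
    """Yield successive chunk_size-byte blocks of a file."""
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                return
            yield block


def changed_chunks(old_path, new_path, chunk_size=64 * 1024):
    """Return the indices of fixed-size chunks that differ between two files.

    Only one chunk per file is held in memory at a time. If one file is
    longer, its extra chunks are reported as changed.
    """
    pairs = zip_longest(read_chunks(old_path, chunk_size),
                        read_chunks(new_path, chunk_size))
    return [index for index, (old, new) in enumerate(pairs) if old != new]
```

Memory use depends on chunk_size rather than on the file size, which is
the property that matters on a client with a 32-bit address space.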

> So it seems like the local problem was solved already. The real
> benefit of bfiles seems to be that the bulky history of binary files
> is confined to the server and does not gum up the network and all the
> clients. Do I have that right?

That is the key advantage of both extensions: they give you a hybrid
between centralized and distributed revision control. Centralized
revision control is good at keeping track of huge files since it's only
the central server that must carry the burden of storing all revisions.

-- 
Martin Geisler

aragost Trifork
Professional Mercurial support
http://aragost.com/mercurial/