Out of Memory: a bridge too far

Chris Hecker checker at d6.com
Sun Nov 13 07:25:29 UTC 2011


Oh, I don't mean it's not a bunch of work; I just mean that doing it right
versus doing it hackily (like hg's bfiles, in my opinion) is not going
to be that much extra work relative to how long it will take to test it
and get it stable, and how long we'll have to live with the results.

Chris


On 2011/11/12 22:48, Andrew Bennetts wrote:
> Chris Hecker wrote:
>>
>>>> What I am actually looking for (I believe) is something along the
>>>> lines of the new largefile support in Mercurial.
>>> Yes, we are interested in both helping you write it, and in getting
>>> it merged into core.
>>
>> My fear about doing a hacky "external largefile solution" like in hg is
>> that it will be "good enough" and relieve any pressure to solve the real
>> problem, but it's really quite a crappy solution to the actual problem.
>>  Solving it the right way seems like it would be only a little more work
>> (once you take into account testing and everything over the lifetime),
>> yet it would set bzr up to be a real "dvcs 2.0" project, leaving the
>> hacks behind.
>>
>> Of course, I ranted about this before on this list, and since I don't
>> have time to do it myself, I guess it's just a bunch of hot air right now.
> 
> I think you underestimate the effort involved.  If it were only a
> “little more work” to do this “the right way”, we would have done it by
> now.  There have been threads delving into the details in the past, but
> I'll recap the basic points I remember.
> 
> In principle, it's not too hard to modify bzr to store large files as N
> moderate-sized chunks rather than as one really huge record.
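>
> Just to make that concrete, here's a rough sketch of reading a file as
> moderate-sized chunks instead of one giant string (plain Python, not
> real bzr code; the names and the 10MB figure are just placeholders):
>
> CHUNK_SIZE = 10 * 1024 * 1024  # 10MB, say
>
> def iter_chunks(path, chunk_size=CHUNK_SIZE):
>     """Yield the contents of 'path' as moderate-sized chunks, so no
>     single string ever holds the whole file."""
>     with open(path, 'rb') as f:
>         while True:
>             chunk = f.read(chunk_size)
>             if not chunk:
>                 break
>             yield chunk
>
> # A store built on this would record each chunk separately and keep a
> # small manifest listing the chunk keys in order, instead of one huge
> # record holding the entire file.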
> 
> But there are significant practical issues:
> 
> 1. it's a format change.  We don't do those lightly.
> 
> 2. it does involve considerable (but entirely feasible) work to make
>    sure the internals of bzr always deal with chunks or streams for
>    large files, never just a string of the whole file (see the sketch
>    below).  There has been constant, quiet progress on this in basically
>    every release for ages, but it does mean fixing up a *lot* of code
>    paths, so it's not there yet.
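>
> To illustrate the kind of change point 2 implies, compare these two
> styles (the store object and its methods are invented for the example,
> not bzr's actual API):
>
> import hashlib
>
> # Whole-string style: peak memory grows with the size of the file.
> def checksum_file_whole(store, file_key):
>     data = store.get_bytes(file_key)          # one giant string
>     return hashlib.sha1(data).hexdigest()
>
> # Streaming style: peak memory is bounded by the chunk size.
> def checksum_file_streaming(store, file_key):
>     sha = hashlib.sha1()
>     for chunk in store.iter_bytes(file_key):  # moderate-sized chunks
>         sha.update(chunk)
>     return sha.hexdigest()
>
> Every code path in bzr that currently looks like the first version has
> to be converted to look like the second, which is why it adds up to a
> lot of work even though no single conversion is hard.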
> 
> If I were to try to make a tasteful workaround in current formats, I'd
> probably look into a different approach to the ones suggested on this
> thread so far: a view plugin (or something in that style) to
> transparently notice when a new revision adds a large file and break it
> into a series of 10MB (say) chunks and commit those instead.  The
> plugin would then of course recombine those chunks when extracting that
> file.
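>
> The split/recombine part of such a plugin could be roughly this simple
> (the chunk naming follows the 'largefile-chunk-*' pattern mentioned
> below; hooking it into bzr's commit and checkout machinery is the part
> I'm hand-waving over):
>
> import shutil
>
> CHUNK_SIZE = 10 * 1024 * 1024  # 10MB
>
> def split_large_file(path, chunk_size=CHUNK_SIZE):
>     """Write path-chunk-00000, path-chunk-00001, ... next to 'path'
>     and return the list of chunk filenames to commit in its place."""
>     chunk_names = []
>     with open(path, 'rb') as src:
>         index = 0
>         while True:
>             data = src.read(chunk_size)
>             if not data:
>                 break
>             chunk_name = '%s-chunk-%05d' % (path, index)
>             with open(chunk_name, 'wb') as out:
>                 out.write(data)
>             chunk_names.append(chunk_name)
>             index += 1
>     return chunk_names
>
> def join_chunks(chunk_names, dest_path):
>     """Recombine the chunks when extracting the file from a revision."""
>     with open(dest_path, 'wb') as out:
>         for chunk_name in sorted(chunk_names):
>             with open(chunk_name, 'rb') as src:
>                 shutil.copyfileobj(src, out)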
> 
> My thinking here is that:
> 
>  - this has a fairly clear upgrade path to a future format change that
>    works along the same lines;
>  - clients without the plugin can still make use of the branch and
>    revision involved (and with some effort, even reconstruct the large
>    file manually if they have to: 'cat largefile-chunk-* > largefile' or
>    similar);
>  - and, of course, this greatly mitigates the memory consumption caused
>    by bzr's current implementation, which IIRC needs roughly 2-3x the
>    memory of the largest file in the tree.
> 
> Even this I don't think I'd call “easy”: I bet there are some fiddly
> corner cases, and also IIRC the existing view hooks assume a 1-to-1
> relationship between transformed and untransformed files.
> 
> -Andrew.
> 
> 


