Getting started with a content filter
Chris Hecker
checker at d6.com
Thu Jul 28 03:02:07 UTC 2011
> each problem is. If you want a totally smooth "works just like small
> files" experience, I think this is the way to go.
This would be super awesome and the right path for the future, if you
think it's a viable way to go. However, how would you solve the
"partial history" support that's needed to make this work?
In other words, there are (at least) three problems with large binary files:
1. Memory. This is the easiest one to fix, since it's just an
optimization problem: iterate to find the problem spots, use streaming
interfaces, separate packs, etc., like you say.
2. Can't have the entire history in every branch, because it gets too
big too fast on clients; most of the large-binary history needs to
stay on a server. Is there a way to specify a partial history like
this now? Are stacked branches something that could help here? It'd
have to be per-file, though, not global to the entire branch. In
other words, I want all the code revisions local (or most of them),
but only the past two large-binary revisions, or whatever (a sketch
of a per-file policy is below, after #3). The code would need to
delete old history locally as new history is checked in (assuming
it's been successfully pushed to the server, of course).
3. Need some kind of locking protocol so two artists don't edit an
unmergeable file at the same time. I know this is heresy for a DVCS,
but it's going to have to get solved somehow if people are going to
use bzr for these media projects that need the large binary files. I
think this isn't that big of a deal for this use-case, because #2 is
going to require a server be accessible often anyway (a minimal lock
client is sketched below). Not sure what to do when both artists are
offline, but at least a warning would be something.
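To make #2 concrete, here's roughly the per-file policy I'm
imagining. The file name, the keys, and the glob syntax are all made
up; nothing like this exists in bzr today:

    # .bzrmeta/history-policy (hypothetical): how many revisions of
    # each path to keep locally; older ones live only on the server.
    *.psd       keep-revisions=2
    *.png       keep-revisions=2
    sounds/*    keep-revisions=1
    *           keep-revisions=all   # code/text keeps full history

The point is just that retention is per-path, not per-branch.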
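And for #3, the sort of minimal lock client I mean, in Python since
that's what bzr plugins are written in. The lock service, its URL,
and both endpoints are hypothetical; bzr has nothing like this today:

    # Advisory-lock client for unmergeable files.  Assumes a tiny
    # HTTP service on the server with two made-up endpoints:
    #   POST   /locks/<path>  -> 2xx if granted, 409 if already held
    #   DELETE /locks/<path>  -> releases the lock
    import urllib2

    LOCK_SERVER = 'http://bzr-server.example.com/locks/'  # made up

    def acquire_lock(path, owner):
        req = urllib2.Request(LOCK_SERVER + path, data=owner)  # POST
        try:
            urllib2.urlopen(req)
            return True
        except urllib2.HTTPError as e:
            if e.code == 409:        # somebody else holds the lock
                return False
            raise

    def release_lock(path):
        req = urllib2.Request(LOCK_SERVER + path)
        req.get_method = lambda: 'DELETE'  # urllib2 non-GET/POST trick
        urllib2.urlopen(req)

An offline artist can't reach the service at all, which is exactly
the warn-or-fail case above.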
Chris
On 2011/07/27 18:15, Martin Pool wrote:
> On 28 July 2011 02:44, Anteru <newsgroups at catchall.shelter13.net> wrote:
>> Hi,
>>
>>> You can look up per-branch configuration options if you want to let
>>> users control what is stored or where. But based on the previous
>>> discussion it seemed to me that you'd probably want to have a file in
>>> .bzrmeta that gives some additional files that ought to be checked
>>> out, and the places to put them?
>> the main problem is how to distinguish between files with the same path.
>> I.e. the user might do:
>>
>> bzr add-large foo.png && bzr ci   # mark foo.png as large
>> bzr rm foo.png && bzr ci
>> bzr add foo.png && bzr ci         # foo.png is a small file now, and
>>                                   # shouldn't be tracked as large any more
>
> bzr adds files with a unique id (file_id), so you can distinguish
> these cases. However, I think it's perhaps jumping ahead a bit to
> assume this is the way that users want to deal with very large files.
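(A quick illustration of the file_id point, assuming "bzr file-id" (a
hidden command) behaves the way I remember; the ids shown here are
invented:

    $ bzr file-id foo.png
    foo.png-20110728022110-abcdef123456-1
    $ bzr rm foo.png && bzr ci -m remove
    $ bzr add foo.png && bzr ci -m re-add
    $ bzr file-id foo.png
    foo.png-20110728031500-fedcba654321-1   # new id: same path, but
                                            # a different file to bzr
)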
>
> I think storing files outside of the repository is essentially a bit
> of a hack and if we're going to do that, perhaps we should just do the
> simplest thing that would possibly help. To me, that's something like
> just having a post-wt-update hook that reads a file with a list of
> URLs and paths, and downloads the relevant files.
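Something like this is what I picture for that hook body; the
manifest name (.bzrmeta/external-files), its "URL relpath" line
format, and the post-wt-update hook itself are all proposals here,
not existing bzr API:

    # Body for the proposed post-wt-update hook: read a hypothetical
    # .bzrmeta/external-files manifest of "URL relpath" lines and
    # download each file into the working tree.
    import os
    import urllib2

    def fetch_external_files(tree_root):
        manifest = os.path.join(tree_root, '.bzrmeta', 'external-files')
        if not os.path.exists(manifest):
            return
        for line in open(manifest):
            line = line.strip()
            if not line or line.startswith('#'):
                continue
            url, relpath = line.split(None, 1)
            dest = os.path.join(tree_root, relpath)
            if not os.path.isdir(os.path.dirname(dest)):
                os.makedirs(os.path.dirname(dest))
            src = urllib2.urlopen(url)
            with open(dest, 'wb') as out:   # stream in chunks rather
                while True:                 # than slurping the file
                    chunk = src.read(64 * 1024)
                    if not chunk:
                        break
                    out.write(chunk)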
>
> There's another possible path, which is to tune the in-repository code
> to handle very large files efficiently: probably putting them into
> their own groupcompress packs, not compressing them against anything
> else, being careful to use streaming interfaces, never doing text
> diffs on them. It's not impossible and the foundations are there but
> it will take some work; probably mostly just iterating to see where
> each problem is. If you want a totally smooth "works just like small
> files" experience, I think this is the way to go.
>
> m
>
>