Binary file storage similar to Kiln

Chris Hecker checker at d6.com
Sat Jul 23 19:29:31 UTC 2011


I think large binary support is one of the biggest problems preventing 
adoption of DVCS in industry, at least in my industry (video games), so 
I'm glad there is some work in the area.  I didn't know about Kiln until 
your mail, and after reading about kbfiles it seems like they've made 
some progress on the problem, but it's a bummer it appears to be tightly 
integrated with their commercial product (and it's hg).

I have some notes I've been meaning to post about "requirements" for a 
large binaries feature to be really usable and valuable, but there's 
been a fair amount of talk about it on various lists and wikis and 
whatnot, so you should check those out.

I think the main problem is that as long as the core folks don't think 
it's an important problem to solve and keep relegating it to plugins, 
we're never going to get something that "just works" in a really 
robust way.  This could be a great place for bzr to innovate and 
differentiate, but most open source projects (and a fair percentage of 
commercial projects, to be fair) don't have a lot of binary files, so 
there's not much pressure to solve the problem.  The entire game 
industry would switch overnight if this and the nested subtree thing 
were solved well, though.

It seems like the Right Way to do this is to make it a fundamental 
feature that a branch doesn't have to hold every revision, and can 
refer to other repositories for the missing parts of the history when 
they're requested.  Large binary support then falls out of that 
relatively easily: a large binary file is marked as storing only the 
latest revision, or better yet the latest n revisions, where n is 
settable by users when they branch, or at any time.  The key thing is 
that the large binaries are still stored in full in the real 
repository on the server, where disk space is effectively unlimited, 
just not on every client (unless the client wants a full branch).  
Storing the large binaries in another system via a plugin is always 
going to be a hack compared to doing it right in the repo.
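
To make that concrete, here's a toy sketch of the kind of per-file 
policy I'm imagining a partial branch would carry around.  To be 
clear, this is not bzr code, and all the names (ShallowFilePolicy, 
keep_revisions, upstream_url) are made up for illustration:

class ShallowFilePolicy:
    """A file whose branch keeps only the newest `keep_revisions`
    blobs locally; older blobs live only in the upstream repository."""

    def __init__(self, path, keep_revisions=1, upstream_url=None):
        self.path = path
        self.keep_revisions = keep_revisions  # the user-settable n
        self.upstream_url = upstream_url      # the "real" full repo

    def blob_available_locally(self, age):
        # age 0 is the latest revision, 1 is the one before it, etc.
        return age < self.keep_revisions

policy = ShallowFilePolicy("textures/hero.psd", keep_revisions=3,
                           upstream_url="bzr+ssh://server/project")
for age in range(5):
    if policy.blob_available_locally(age):
        print("revision -%d: local" % age)
    else:
        print("revision -%d: fetch from %s" % (age, policy.upstream_url))

The point is just that n and the upstream location are branch 
metadata, so a client can answer "do I have this blob, or do I need 
to ask the server for it?" without any external system getting 
involved.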

That said, if you do a stop-gap hack as a plugin that works well enough, 
I'll definitely help you test it!  :)  Anything is better than running a 
parallel svn tree for binaries.
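
For what it's worth, the local hash-addressed store you describe below 
sounds like the right first step, and it should be pretty simple.  
Here's a rough sketch of what I mean; nothing in it is bzr API, and 
the .bzr-bigfiles directory, the .meta layout, and the function names 
are all just assumptions for illustration:

import hashlib
import os
import shutil

STORE_DIR = ".bzr-bigfiles"  # hypothetical local blob store

def hash_file(path):
    """Return the SHA-1 of a file's contents, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def add_big(path):
    """Copy path into the store under its hash and write path.meta."""
    digest = hash_file(path)
    if not os.path.isdir(STORE_DIR):
        os.makedirs(STORE_DIR)
    shutil.copy2(path, os.path.join(STORE_DIR, digest))
    with open(path + ".meta", "w") as meta:
        meta.write("%s %s\n" % (digest, path))
    return digest

def checkout_big(meta_path):
    """Recreate the real file next to its .meta from the local store."""
    with open(meta_path) as meta:
        digest, path = meta.read().split(None, 1)
    shutil.copy2(os.path.join(STORE_DIR, digest), path.strip())

The interesting (hard) part is exactly what you said: getting bzr to 
call something like add_big and checkout_big at the right times, and 
getting the blobs moved over the wire and stored server-side.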

Thanks,
Chris



On 2011/07/23 11:16, Anteru wrote:
> Hi,
>
>> or it could even go into a single file eg.
>> .bzrmeta/bigfiles
>> which contains lines with the path and the hash of each big file.
> all right, thanks for the keywords link, that definitely looks like a
> starting point. So how do you suggest starting? By having some meta
> files storing the hash, which are tracked "as usual" by bzr, and then
> getting hold of the binary file associated with each one using content
> filters?
>
> I'm new to Bazaar, and I haven't written a plugin for it yet, so here's
> some guesswork: Wouldn't this break when the file is actually changed?
> I.e. let's assume I want a work-flow like this ...
>
> bzr add-big foo.bin
> // foo.bin.meta gets added to the repository
> bzr ci
> // foo.bin.meta is committed
> // foo.bin gets transferred using some method to the server
>
> // other machine
> bzr pull
> // foo.bin.meta is checked out
> // content filter recognizes the file and fetches foo.bin
> touch foo.bin
> bzr status
> // will show foo.bin.meta as untouched, right?
> bzr ci
> // won't update foo.bin.meta, will it?
>
> My plan is initially to store the large files locally on disk and just
> get the hooking part working (i.e. no server-side communication). I
> think it's enough if every binary file gets hashed and stored under
> its hash; delta compression is not important at first and could
> possibly be added later on anyway. That should be the easy part. I
> assume the hard part is to get the transport layer to transfer the
> files and get them stored server-side.
>
> Cheers,
>    Anteru
>
>
>


