bzr struggling with large trees
John A Meinel
john at arbash-meinel.com
Wed Oct 19 23:58:39 BST 2005
Rob Holland wrote:
> On Sun, 2005-10-16 at 17:17 -0500, John Arbash Meinel wrote:
>
>
>>Well, I can say from the start, that the part that is killing you is the
>> parsing of the inventory XML. Do you have cElementTree installed? That
>>is probably the biggest performance boost for what you are seeing.
>
>
> I do have cElementTree installed :)
>
>
>>As far as a long-term solution, I'm not really sure. Because right now
>>we store the entire inventory in a single file. So if you add a file,
>>the entire inventory needs to be read, modified, and then written out again.
>
>
> And it takes 8 seconds to read in, add one line and write it out?
> Scary :/
>
I realize that nobody else seemed to answer you. Have you had any more
progress with this?
I can think of a few alternatives.
One thing that might be very intriguing would be to use an sqlite
database. Access would be very fast, and commits could be as well.
I have some code to convert things into that database (available from
http://bzr.arbash-meinel.com/plugins/revstore2sql/)
It doesn't do any sort of branch work to make it actually usable; it was
just a prototype of a potential database schema (trying to keep it
reasonably small, though indexes cost quite a bit).
The structure of bzr should make it relatively easy to have various
back-end storage models, so that while we might recommend the standard
weave files + XML revisions and inventories, we could also state
something like "but if you need to scale to 80k files, you can use this
other model".
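As a rough illustration of what "various back-end storage models" could
mean, here is a minimal Python sketch. Everything in it is an assumption
made for illustration: InventoryStore, XmlInventoryStore and add_files are
hypothetical names, not bzr's actual API, and the XML layout is not bzr's
real inventory format. It only shows a caller written against a small
interface, plus a whole-file back end whose add has to read and rewrite
the entire inventory.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from typing import List, Optional, Protocol


class InventoryStore(Protocol):
    """Hypothetical storage interface; not bzr's real API."""

    def add_entry(self, file_id: str, path: str) -> None: ...

    def paths(self) -> List[Optional[str]]: ...


class XmlInventoryStore:
    """Whole-file model: every add re-reads and rewrites the inventory."""

    def __init__(self, filename: str) -> None:
        self.filename = filename
        if not os.path.exists(filename):
            # Start with an empty <inventory/> document.
            ET.ElementTree(ET.Element("inventory")).write(filename)

    def add_entry(self, file_id: str, path: str) -> None:
        tree = ET.parse(self.filename)   # parse cost grows with tree size
        ET.SubElement(tree.getroot(), "entry", file_id=file_id, path=path)
        tree.write(self.filename)        # the whole file is written again

    def paths(self) -> List[Optional[str]]:
        root = ET.parse(self.filename).getroot()
        return [entry.get("path") for entry in root.iter("entry")]


def add_files(store: InventoryStore, names: List[str]) -> None:
    """Caller code depends only on the interface, not on the back end."""
    for name in names:
        store.add_entry("id-" + name, name)
```

A different back end would only need to provide the same two methods, so
the choice of storage model could be made per branch.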
I certainly haven't convinced myself of it. And it suffers from the fact
that you now have 1 giant file, which cannot be incrementally updated.
(So you would also have to run a smart server to handle that branch;
otherwise you download the entire binary database each time.)
But it might be something. I know in my testing, extracting an inventory
was lightning fast. And possibly adding a single file could be done such
that it didn't have to read the entire inventory each time. (Though it
might, in order to verify there wasn't some sort of conflict).
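A minimal sketch of the single-file add, using Python's standard sqlite3
module. This is not the revstore2sql schema; the table layout and the
names open_inventory, add_file and list_paths are assumptions made up for
illustration. The idea is that the database's own UNIQUE index can do the
conflict check, so an add touches one row instead of the whole inventory.

```python
import sqlite3
from typing import List


def open_inventory(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical one-table inventory database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS inventory ("
        " file_id TEXT PRIMARY KEY,"
        " path TEXT NOT NULL UNIQUE)"
    )
    return conn


def add_file(conn: sqlite3.Connection, file_id: str, path: str) -> bool:
    """Insert one row; return False if the id or path already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO inventory (file_id, path) VALUES (?, ?)",
                (file_id, path),
            )
        return True
    except sqlite3.IntegrityError:
        # The PRIMARY KEY / UNIQUE constraints detected the conflict.
        return False


def list_paths(conn: sqlite3.Connection) -> List[str]:
    """Return every path in the inventory, sorted."""
    rows = conn.execute("SELECT path FROM inventory ORDER BY path")
    return [row[0] for row in rows]
```

For example, add_file(conn, "id-1", "src/a.c") succeeds on an empty
inventory, while a second add of the same path under a different id is
rejected by the index without loading the other rows into Python.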
>
>>How does this compare with other trees that you have used? 79k files
>>seems like a lot, and I certainly think tla/baz would have done very
>>poorly too. bk would do a lot better, since it requires "bk edit". But
>>have you tried git?
>
>
> I haven't tried this tree in any other VCS other than the one we're
> importing from (CVS). I'm keen to try and help get bzr working before
> looking elsewhere :)
>
I'm also concerned about what will happen when you get 1k revisions into
bzr, because right now you can watch access to the inventory.weave file
slow down as bzr.dev reaches 2k total revisions (about 50k lines in
inventory.weave). Note that this count includes merge revisions.
But we are already discussing an alternative weave, which might help
with that: something that could be downloaded and updated in an
incremental fashion, rather than all at once.
This problem was what made me investigate an SQL backend, but it is
probably applicable to both. (Actually, I investigated SQL back with
tla/baz; I just updated things for the newer [vastly different] codebase.)
John
=:->