compressed weaves, and revision.weave
John A Meinel
john at arbash-meinel.com
Fri Oct 28 17:03:37 BST 2005
Michael Ellerman wrote:
> On Fri, 28 Oct 2005 14:32, John A Meinel wrote:
>> Michael Ellerman wrote:
>>> Numbers for format 7 compressed:
>>>
>>> concordia ~/src/work/ckexec$ time $cbzr st
>>> real 6m40.589s <-------- WOW
>>> user 0m23.455s
>>> sys 0m4.864s
>> The first time you run "status" or some of the other commands, bzr
>> updates the .bzr/stat-cache file.
>> This file indexes files based on their path, and keeps their sha1 hash,
>> linked to their stat result.
>> Which means the first time you run "bzr status", it has to compute the
>> sha1 hash of *every* file.
>
> Yeah I figured it was the stat/hash-cache. It'd be nice if upgrade did that
> for you, it could even serve as a test (of sorts) that the upgrade succeeded.
>
Actually, right now, the last thing upgrade does is delete the
hash-cache. In the past the hash-cache format would change between
versions, which caused spurious "bad line in hash-cache" messages.
But I think it would be reasonable to have "bzr upgrade" finish its
work and then effectively run a "bzr status" at the end, to rebuild the
hash-cache -- though preferably one attached to a progress bar.
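The stat/hash-cache described above can be sketched roughly as follows. This is a minimal illustration of the idea, not bzr's actual hash-cache code; the class and function names here are hypothetical:

```python
import hashlib
import os


def sha1_of(path):
    """Compute the sha1 of a file's contents."""
    h = hashlib.sha1()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            h.update(chunk)
    return h.hexdigest()


class StatCache:
    """Map path -> (sha1, stat fingerprint); re-hash a file only when
    its stat result changes, so repeat runs avoid reading file contents."""

    def __init__(self):
        self._cache = {}  # path -> (sha1, fingerprint)

    def get_sha1(self, path):
        st = os.stat(path)
        fingerprint = (st.st_size, int(st.st_mtime), int(st.st_ctime))
        cached = self._cache.get(path)
        if cached is not None and cached[1] == fingerprint:
            return cached[0]          # cache hit: no file read needed
        digest = sha1_of(path)        # cache miss: hash the whole file
        self._cache[path] = (digest, fingerprint)
        return digest
```

On the first run the cache is empty, so every file gets hashed (the 6m40s case); afterwards, unchanged files are answered from the stat fingerprint alone.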
>> 8< snip
>
>> One obvious place that we are doing extra work is in status.py
>> list_paths(). It looks like we have 2 generators (new.iter_conflicts() and
>> new.unknowns()).
>> Probably both of these go through the entire inventory, looking for a
>> file which is either conflicted or unknown.
>> On an inventory as large as yours, that is an extra 1.5s.
>> This could probably be avoided with a combined call that returns each
>> file's status (conflicted or unknown), which can then be collected into
>> a list and printed later.
>
> Yeah I was looking at that too. I was thinking they should be folded into
> compare_trees, so we have only one loop.
>
> It'd also be nice if compare_trees were a generator, so we could start
> printing results almost immediately (handy when running "bzr st | less"),
> but that makes sorting impossible.
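The combined single-pass call discussed above could look something like this. It is only a sketch of the shape of the API; the predicate arguments stand in for whatever the real tree objects provide, and are not bzr's actual methods:

```python
def iter_status(tree_paths, is_conflicted, is_unknown):
    """One walk over the working tree: classify each path as it is
    seen, yielding results immediately instead of making two full
    passes (one for conflicts, one for unknowns)."""
    for path in tree_paths:
        if is_conflicted(path):
            yield (path, 'conflicted')
        elif is_unknown(path):
            yield (path, 'unknown')


# Streaming output appears as soon as each result is found, which is
# what makes "bzr st | less" feel responsive.  Sorted output, on the
# other hand, forces materializing the whole generator first:
#   for path, status in sorted(iter_status(...)):
#       print(status, path)
```

This captures the trade-off in the thread: a generator gives incremental output, but sorting requires consuming it entirely before printing anything.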
>
>> So I'm guessing that long term, format 7 would stay a little slower than
>> format 6, but it shouldn't get worse.
>
> That could be good. They say disk is cheap, but it's not _that_ cheap,
> especially for laptops. So if the compressed format is only ~10-20% slower it
> might be a reasonable trade off, at least for trees that you don't interact
> with much.
Actually, my real thought is that format 7 compressed branches would be
good for published branches, because right now you have to download the
complete inventory.weave plus the complete weave for every text that was
modified. Cutting that down would help a lot.
I think we need a better way to speed up weave operations rather than
just avoiding extra lines.
Then again, with Martin's work on "knits", I'm not sure how everything
will change. I think the idea is to create a main file with an index,
where the main file contains changes compressed one by one.
So the result is not quite as compressed overall, but being able to grab
changes one at a time is a big benefit.
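That knit-like layout, as I understand it, can be sketched as a toy store: one data blob holding individually compressed records, plus an index of offsets. This is an illustrative mock-up of the idea, not Martin's actual knit code:

```python
import zlib


class HunkStore:
    """Toy knit-style store: each record is compressed on its own and
    appended to a single data blob; a separate index maps key ->
    (offset, length), so any one record can be fetched and decompressed
    without touching the rest of the file."""

    def __init__(self):
        self._data = bytearray()
        self._index = {}  # key -> (offset, length)

    def add(self, key, text):
        compressed = zlib.compress(text)
        offset = len(self._data)
        self._data.extend(compressed)
        self._index[key] = (offset, len(compressed))

    def get(self, key):
        offset, length = self._index[key]
        return zlib.decompress(bytes(self._data[offset:offset + length]))
```

Compared to compressing the whole weave as one stream, per-record compression loses some ratio but lets a client fetch exactly the hunks it needs, which is the "grab them one by one" benefit mentioned above.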
John
=:->
>
> cheers
>