brisbane:CHKMap.iteritems() tweaks

Ian Clatworthy ian.clatworthy at internode.on.net
Tue Mar 24 02:24:52 GMT 2009


John Arbash Meinel wrote:
> 
>> 1) Change 'bzr pack' so that it creates a separate set of groups for
>> items that are available in the newest XX (say 100) revisions. Or
>> possibly group everything into 100 rev chunks.
> 
> This was easy to implement for CHK streams. And it changes "time bzr ls
> -r-1" from 4.4s down to 1.6s. (I implemented it as splitting at 10
> revisions.) And the time without the patch is 2.2s, up from 1.6s. So the
> patch makes a bigger difference when we aren't swamped with extract time.

Nice.
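For readers following along, the grouping John describes can be sketched roughly like this. This is a minimal illustration only -- `split_into_groups` and the hard-coded group size are hypothetical stand-ins, not the actual brisbane-core code:

```python
from itertools import islice

def split_into_groups(revision_ids, group_size=10):
    """Cut an ordered sequence of revision ids into fixed-size groups.

    Hypothetical helper illustrating the idea: compression groups are
    cut every `group_size` revisions, so with a newest-first ordering
    the most recent revisions land in their own small groups and can
    be extracted without decompressing the whole history.
    """
    it = iter(revision_ids)
    while True:
        group = list(islice(it, group_size))
        if not group:
            break
        yield group

# Newest-first ordering keeps recent history in the first, small groups.
revs = ['rev-%d' % i for i in range(25)]
groups = list(split_into_groups(revs, group_size=10))
# 25 revisions split at 10-revision boundaries -> groups of 10, 10 and 5
```

The trade-off is exactly the one measured above: slightly worse compression (groups are cut before the delta chains get long), in exchange for cheap access to the newest revisions.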

> The total size on disk after packing is barely noticeable:
>  125666
>  125981
> 
> I guess that is 300KiB. But out of 125MiB, that is only 0.2%.

Well worth it IMO.

> Also, it seems that the breakdown now is 0.93s spent in
> 'get_build_details'. I suppose in one sense it is nice to be down to the
> point where index performance is the bottleneck, as it means we've
> gotten the other things faster. It just happens that chk indexes are
> super wide and have no sense of locality, so the page cache gets
> thrashed. I see 960 btree pages getting read, and there are only
> 3100 total. So extracting just one full inventory is causing us to read
> 1/3rd of the .cix... :(

Interesting. It's certainly an important tuning milestone when index
performance becomes our top concern.
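The thrashing effect is easy to reproduce in miniature. Because CHK keys are content hashes, lookups hit essentially random index pages, so an LRU page cache much smaller than the index gets little benefit. A toy simulation (the `LRUPageCache` class and the page counts are illustrative, not bzrlib's actual cache):

```python
import random
from collections import OrderedDict

class LRUPageCache:
    """Toy LRU cache counting hits and misses for simulated page reads."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, page_id):
        if page_id in self.pages:
            # Cache hit: mark the page as most recently used.
            self.pages.move_to_end(page_id)
            self.hits += 1
        else:
            # Cache miss: fetch the page, evicting the oldest if full.
            self.misses += 1
            self.pages[page_id] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)

random.seed(42)
cache = LRUPageCache(capacity=100)
# Hash-ordered CHK keys mean page accesses look uniformly random
# across the whole index rather than clustered, so the vast majority
# of the 960 reads miss a 100-page cache spread over 3100 pages.
for _ in range(960):
    cache.read(random.randrange(3100))
```

With uniformly random access the expected hit rate is roughly capacity/total pages, about 3% here, which is why ordering or grouping the index pages by locality matters so much more than cache size.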

BTW, I'm in the process of converting a full Python branch to the 1.9 and
gc-chk255-big formats with the aim of running a full benchmark. I'll
hopefully kick that off later today. If you have low-risk stuff you're
happy to land in brisbane-core, please go ahead and land it so that I'm
testing your latest code.

Ian C.



More information about the bazaar mailing list