[Merge] lp:~abentley/bzr/fix_get_mtime into lp:~bzr/bzr/trunk

Robert Collins robertc at robertcollins.net
Tue Sep 29 05:55:11 BST 2009


On Tue, 2009-09-29 at 00:27 -0400, Aaron Bentley wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Moving to list...
> 
> Robert Collins wrote:
> > On Tue, 2009-09-29 at 03:25 +0000, Aaron Bentley wrote:
> >>> Creating a path<->id map is expensive on large trees;
> >> I don't understand why there needs to be such a map.  Can we not
> >> retrieve only the data we care about?  That would scale with the
> >> amount of data we're actually showing, which I think is an acceptable
> >> scaling factor.
> > The map is needed because dirstate is laid out internally to match the
> > IO pattern of 'status'
> 
> It's hard to know how often the input tree will be dirstate-backed, but
> I believe it will be in the cases qbzr hit.  I assume CHK-backed trees
> can do id2path without constructing a map?

CHKInventory has a hash table to do id2path. The id is hashed, and the
hash becomes the key into the CHKMap used to look up the entry. The
entry's parents are then walked recursively to the root to resolve the
full path. This scales well even in very large trees, with the main
factor being the depth of the path.
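
As a rough illustration of the parent walk described above (not the actual
CHKInventory code), here is a minimal sketch. The `entries` dict and the
`id2path` helper are hypothetical stand-ins: a plain dict plays the role of
the hashed CHKMap lookup, and a loop replaces the recursion.

```python
# Hypothetical stand-in for the hashed lookup: a dict keyed by file id.
# Each entry stores only its own name and the id of its parent directory.
entries = {
    "root-id": ("", None),
    "src-id": ("src", "root-id"),
    "pkg-id": ("pkg", "src-id"),
    "mod-id": ("mod.py", "pkg-id"),
}


def id2path(entries, file_id):
    """Resolve a full path by following parent ids up to the root."""
    parts = []
    while file_id is not None:
        name, parent_id = entries[file_id]  # one keyed lookup per level
        if parent_id is not None:           # the root contributes no name
            parts.append(name)
        file_id = parent_id
    return "/".join(reversed(parts))


print(id2path(entries, "mod-id"))
```

The number of lookups equals the depth of the entry, not the size of the
tree, which is the scaling property mentioned above.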

> >, and thats very heavily biased to paths, to match
> > disk layout and locality of reference. 
> > So we can answer 'path2id' _very_ fast, but we cannot answer 'id2path'
> > without scanning the whole dirstate to find out if 'id' exists, and thus
> > where it is.
> 
> That's an unfortunate weakness considering a lot of our code was written
> to use file-ids as the primary identifier.

It is; however, at the time we were designing dirstate we had a bunch of
tradeoffs to choose between, and were looking at hg for inspiration. It
turns out that we perhaps should have stored both maps persistently, or
done something even more clever.

Nevertheless, we've gotten pretty good performance thus far. Avoiding
id2path maps is about future wins: most code paths trigger the map
generation at the moment, so the main thing is to be careful when
adding APIs that will _require_ an id2path map on WorkingTree.
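
To make the tradeoff concrete, here is a toy sketch of a path-ordered
layout. It only loosely imitates dirstate: the `rows` list and the helper
names are hypothetical, and the real format is more involved. It shows why
path2id can use a binary search while id2path needs either a full scan or a
map built in one pass over every row.

```python
import bisect

# Hypothetical path-sorted rows of (path, file_id), ordered by path only.
rows = sorted([
    ("README", "readme-id"),
    ("src", "src-id"),
    ("src/mod.py", "mod-id"),
])
paths = [path for path, _ in rows]


def path2id(path):
    """Binary search is possible because the rows are ordered by path."""
    i = bisect.bisect_left(paths, path)
    if i < len(rows) and rows[i][0] == path:
        return rows[i][1]
    return None


def id2path_scan(file_id):
    """Nothing is ordered by id, so every row may have to be examined."""
    for path, row_id in rows:
        if row_id == file_id:
            return path
    return None


def build_id2path_map():
    """The reverse map: a full pass over the rows, costly on large trees."""
    return {row_id: path for path, row_id in rows}


print(path2id("src/mod.py"), id2path_scan("mod-id"))
```

An API that can be answered from `path2id` alone never pays for the full
pass; one that needs `build_id2path_map` always does.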

-Rob