'unshelve is slow'

Aaron Bentley aaron at aaronbentley.com
Wed Jun 30 04:53:53 BST 2010


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 06/28/2010 02:48 PM, John Arbash Meinel wrote:
> We just had a comment about unshelve seeming to be very slow (taking 15s
> or so). I did some analysis on it, and was wondering if it fits what you
> know about unshelve (since you did 95+% of all the implementing and
> follow-up improvements.)

Well, IME PreviewTrees are slow, but not intolerably slow.

> Basically, it looks like unshelve and PreviewTree.iter_changes() are
> having to extract the entire Inventory, and that is slow (5.2s to
> compute iter_entries_by_dir() and 6.8s to compute _changes_from_entries()).

I did my best to avoid directly touching the inventory.  I hope that
someone will fix InterTree.iter_changes so that it doesn't use the
inventory, either.

> It also seems that PreviewTree.iter_changes() could make use of the
> delta it generated as part of being serialized.

PreviewTrees are based on TreeTransforms, not serialized data.  The
TreeTransforms have exactly the same content as the serialized data
(it's a serialized TreeTransform), so there's no advantage to using the
serialized form.  PreviewTree.iter_changes *does* provide a fast path
that makes use of the TreeTransform.  So the answer can be "no, the
serialized delta is irrelevant" or "yes, we can use the serialized delta
(by deserializing it into a TreeTransform), and we already are".

I believe that unshelve doesn't use the fast path when it uses the
PreviewTree as an input to a merge, and that may be what you're seeing
in the profile.
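To make the fast-path/fallback distinction concrete, here is a toy model in Python. Every name in it (Tree, Transform, PreviewTree, full_iter_changes) is a hypothetical simplification, not bzrlib's actual API. It assumes the fast path applies only when the preview tree is compared against the transform's own base tree; any other source tree, such as a merge input, drops to a full scan of both trees.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional, Tuple

# Hypothetical, simplified model; not the bzrlib API.
Entry = Tuple[str, str]  # (path, content)


@dataclass
class Tree:
    entries: Dict[str, Entry]  # file_id -> (path, content)

    def iter_entries_by_dir(self) -> Iterator[Tuple[str, Entry]]:
        # Materializes every entry (in path order): the expensive step on a big tree.
        for file_id in sorted(self.entries, key=lambda f: self.entries[f][0]):
            yield file_id, self.entries[file_id]


@dataclass
class Transform:
    base: Tree
    # file_id -> new (path, content), or None for a deletion
    changes: Dict[str, Optional[Entry]] = field(default_factory=dict)


def full_iter_changes(source: Tree, target: Tree):
    """Generic comparison: builds a dict of every entry in both trees."""
    old = dict(source.iter_entries_by_dir())
    new = dict(target.iter_entries_by_dir())
    for file_id in sorted(old.keys() | new.keys()):
        if old.get(file_id) != new.get(file_id):
            yield file_id, old.get(file_id), new.get(file_id)


class PreviewTree(Tree):
    def __init__(self, transform: Transform):
        # The preview's contents are the base tree with the transform applied.
        merged = dict(transform.base.entries)
        for file_id, entry in transform.changes.items():
            if entry is None:
                merged.pop(file_id, None)
            else:
                merged[file_id] = entry
        super().__init__(merged)
        self._transform = transform

    def iter_changes(self, from_tree: Tree):
        if from_tree is self._transform.base:
            # Fast path: only file_ids the transform touched can differ.
            for file_id in sorted(self._transform.changes):
                old = from_tree.entries.get(file_id)
                new = self.entries.get(file_id)
                if old != new:
                    yield file_id, old, new
        else:
            # Any other source tree falls back to the full scan.
            yield from full_iter_changes(from_tree, self)
```

In this model both paths produce the same changes, but the fast path visits only the file_ids recorded in the transform, while the fallback walks every entry of both trees. That is the kind of cost difference the message suggests may show up when the PreviewTree is used as a merge input.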

It may be that we can do more with the TreeTransform data in the
non-fast-path case, but it's a pretty subtle problem, and can easily go
wrong.

> Do you have any thoughts on it?

I tried to write something that was a well-behaved client of the
existing Tree API.  Optimizing the Tree API so that it doesn't load the
inventory would raise all boats.  Providing a way to get inventory
entries without calling iter_entries_by_dir([file_id]).next()[1] would
be another win.
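A toy illustration of the entry-lookup idiom above, written in Python 3 (so `.next()` becomes `next(...)`). ToyTree, entry_via_iterator and get_entry are hypothetical names, not bzrlib's API. Here iter_entries_by_dir yields (path, entry) pairs, so `[1]` picks out the entry, and it walks every entry even when asked for one file_id; get_entry is the kind of direct accessor the message wishes for.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple


class ToyTree:
    """Hypothetical tree; entries map file_id -> (path, entry)."""

    def __init__(self, entries: Dict[str, Tuple[str, str]]):
        self._entries = entries

    def iter_entries_by_dir(
        self, specific_file_ids: Optional[Iterable[str]] = None
    ) -> Iterator[Tuple[str, str]]:
        wanted = None if specific_file_ids is None else set(specific_file_ids)
        # Walks every entry in path order, even when only one file_id is wanted.
        for file_id, (path, entry) in sorted(
            self._entries.items(), key=lambda item: item[1][0]
        ):
            if wanted is None or file_id in wanted:
                yield path, entry

    def get_entry(self, file_id: str) -> str:
        # Hypothetical direct accessor: a single lookup, no walk.
        return self._entries[file_id][1]


def entry_via_iterator(tree: ToyTree, file_id: str) -> str:
    # Python 3 spelling of iter_entries_by_dir([file_id]).next()[1]
    return next(tree.iter_entries_by_dir([file_id]))[1]
```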

I'm doubtful that directly optimizing PreviewTree.iter_changes will be
fruitful, but I'd be happy to be proved wrong.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkwqv9EACgkQ0F+nu1YWqI0oNwCdGa7pOc9Cx7YnIpwmhehJX57Q
HO4An2fbTswqeWX2VO5+NPYyhsmWX56c
=Tvk5
-----END PGP SIGNATURE-----