svn-import performance analysis
Jelmer Vernooij
jelmer at samba.org
Wed Dec 3 04:44:37 GMT 2008
On Tue, 2008-12-02 at 18:03 -0600, John Arbash Meinel wrote:
> Jelmer Vernooij wrote:
> > I've done some work analysing the critical points in importing from
> > Subversion using bzr-svn 0.5.
> >
> > The main culprits appear to be:
> >
> > * Inventory.copy() (33%)
>
> This is, unfortunately, a deep-copy rather than having CoW semantics.
> One thing that the split-inventory code changes is making Inventories a
> bit more immutable (because CHKInventory is going to be more explicitly
> CoW).
After Robert's explanation about inventory deltas on IRC, I've now
rewritten bits of the fetch code to use inventory deltas. It turns out I
no longer need the copy() at all.
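To make the difference concrete, here is a minimal sketch of building a child
inventory from a delta instead of deep-copying the parent. It is a toy model,
not bzrlib code: the dict-based inventory and the child_from_delta() helper
are made up for illustration. Only the (old_path, new_path, file_id, entry)
tuple layout follows the delta shown later in this mail.

```python
def child_from_delta(parent, delta):
    """Build a child inventory from a parent and a delta (toy model).

    `parent` maps path -> (file_id, entry). Each delta item is an
    (old_path, new_path, file_id, entry) tuple; None means "absent".
    This helper is hypothetical and is not part of bzrlib.
    """
    # Shallow copy: unchanged entries are shared with the parent
    # instead of being duplicated as a deep copy would do.
    child = dict(parent)
    # Drop all old paths first so that renames which swap two paths
    # cannot overwrite each other.
    for old_path, _new_path, _file_id, _entry in delta:
        if old_path is not None:
            del child[old_path]
    for _old_path, new_path, file_id, entry in delta:
        if new_path is not None:
            child[new_path] = (file_id, entry)
    return child


parent = {
    "": ("root-id", "directory"),
    "README": ("readme-id", "file"),
}
delta = [
    ("README", "README.txt", "readme-id", "file"),  # rename
    (None, "NEWS", "news-id", "file"),              # addition
]
child = child_from_delta(parent, delta)
print(sorted(child))
```

The work done here is proportional to the size of the delta plus one shallow
dict copy, and the parent inventory is left untouched.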
> > * Repository.add_revision() (49%) (most of which is spent serialising
> > the inventory)
> >
> > While fetching a revision delta, bzr-svn makes a copy of the inventory
> > of the parent revision and applies the changes it sees to that. It needs
> > the previous inventory as well to figure out renames and the like in
> > roundtripped revisions.
> So it needs the parent revision, and the grandparent? Or it needs the
> parent revision both for a delta and for comparison?
>
> I'm curious if you hold on to the newly generated inventory when you go
> on to the next child, or whether you pull it out again.
I do hold on to it.
> > I'm at a bit of a loss as to how I can optimize this further. Does
> > anybody have any ideas? Also, I would expect CHK inventories to be of
> > help here - is that correct?
> CHK inventories will be much more capable of "partial update without
> full serialization/deserialization", as it is designed in from the
> beginning.
>
> One hack I did a long time ago was to have each InventoryEntry remember
> its serialized form (along with some other info like what serializer
> created it, etc.) and then have it throw away that value when it was
> modified. Though it really needs the calling code to be careful about
> modifications, because InventoryEntries are just plain ol' data
> structures, without a way to know if they have been made "dirty".
>
> So... CHK should be a big help here, and without doing some dirty hacks
> in bzrlib, I'm not sure if there is much else to be done.
Cool, I'm looking forward to them landing then :-) At least the code in
bzr-svn is ready for that now.
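For reference, the caching hack John describes could look roughly like the
sketch below. The CachingEntry class is hypothetical and is not bzrlib's
InventoryEntry. It uses __setattr__ to detect modification, which is exactly
the hook the plain data structures lack. Changes made inside a mutable
attribute value would still go unnoticed.

```python
class CachingEntry:
    """Toy entry that remembers its serialized form until it is modified.

    Hypothetical illustration only; not bzrlib's InventoryEntry.
    """

    def __init__(self, file_id, name, revision):
        self._cached = None
        self.serialize_count = 0  # how often real serialization ran
        self.file_id = file_id
        self.name = name
        self.revision = revision

    def __setattr__(self, key, value):
        super().__setattr__(key, value)
        # Any change to a data attribute makes the cached text stale.
        if key not in ("_cached", "serialize_count"):
            super().__setattr__("_cached", None)

    def serialize(self):
        if self._cached is None:
            self.serialize_count += 1
            self._cached = "|".join((self.file_id, self.name, self.revision))
        return self._cached


entry = CachingEntry("readme-id", "README", "rev-1")
entry.serialize()
entry.serialize()            # served from the cache
entry.name = "README.txt"    # invalidates the cache
entry.serialize()            # serialized again
print(entry.serialize_count)
```

Unmodified entries are serialized once no matter how many revisions reuse
them, which is the saving the hack is after.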
When using brisbane-core with 1.9-rich-root, the time spent seems to have
shifted significantly. Only 17% of the time is now spent in
Repository.add_inventory_delta(). Instead, more time is now spent in:
* Knit.add_lines() (30.11%)
* Knit.get_record_stream() (18.78%)
* Repository.get_inventory() (5.22%)
Unfortunately I can't use CHKInventories at the moment to see if it gets
that 17% down even further, since it errors out on me:
Traceback (most recent call last):
  File "/home/jelmer/bzr/bzr/bzr.dev/bzrlib/commands.py", line 893, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/home/jelmer/bzr/bzr/bzr.dev/bzrlib/commands.py", line 839, in run_bzr
    ret = run(*run_argv)
  File "/home/jelmer/bzr/bzr/bzr.dev/bzrlib/commands.py", line 539, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/home/jelmer/bzr/bzr/bzr.dev/bzrlib/commands.py", line 853, in ignore_pipe
    result = func(*args, **kwargs)
  File "/home/jelmer/.bazaar/plugins/svn/__init__.py", line 280, in run
    to_revnum=to_revnum)
  File "/home/jelmer/bzr/bzr-svn/0.5/.plugins/svn/convert.py", line 249, in convert_repository
    inter.fetch(needed=revmetas)
  File "/home/jelmer/.bazaar/dev-plugins/svn/fetch.py", line 925, in fetch
    self._fetch_revisions(needed, pb, use_replay=use_replay)
  File "/home/jelmer/.bazaar/dev-plugins/svn/fetch.py", line 864, in _fetch_revisions
    self._fetch_revision_switch(editor, revmeta, parent_revmeta)
  File "/home/jelmer/.bazaar/dev-plugins/svn/fetch.py", line 822, in _fetch_revision_switch
    report_inventory_contents(reporter, parent_revnum, start_empty)
  File "/home/jelmer/.bazaar/dev-plugins/svn/fetch.py", line 673, in report_inventory_contents
    reporter.finish()
  File "/home/jelmer/.bazaar/dev-plugins/svn/fetch.py", line 193, in close
    self._close()
  File "/home/jelmer/.bazaar/dev-plugins/svn/fetch.py", line 331, in _close
    self.editor._finish_commit()
  File "/home/jelmer/.bazaar/dev-plugins/svn/fetch.py", line 502, in _finish_commit
    [r for r in rev.parent_ids if self.target.has_revision(r)])
  File "/home/jelmer/bzr/bzr/bzr.dev/bzrlib/repofmt/pack_repo.py", line 2134, in add_inventory_delta
    basis_revision_id, delta, new_revision_id, parents)
  File "/home/jelmer/bzr/bzr/bzr.dev/bzrlib/repository.py", line 692, in add_inventory_delta
    return self.add_inventory(new_revision_id, basis_inv, parents)
  File "/home/jelmer/bzr/bzr/bzr.dev/bzrlib/repository.py", line 647, in add_inventory
    return self._add_inventory_checked(revision_id, inv, parents)
  File "/home/jelmer/bzr/bzr/bzr.dev/bzrlib/repofmt/pack_repo.py", line 2108, in _add_inventory_checked
    parent_id_basename_index=serializer.parent_id_basename_index)
  File "/home/jelmer/bzr/bzr/bzr.dev/bzrlib/inventory.py", line 1525, in from_inventory
    result.id_to_entry.apply_delta(file_id_delta)
  File "/home/jelmer/bzr/bzr/bzr.dev/bzrlib/chk_map.py", line 78, in apply_delta
    self.map(new, value)
  File "/home/jelmer/bzr/bzr/bzr.dev/bzrlib/chk_map.py", line 349, in map
    self._root_node.add_node(split, node)
  File "/home/jelmer/bzr/bzr/bzr.dev/bzrlib/chk_map.py", line 608, in add_node
    assert len(prefix) == len(self._prefix) + 1
AssertionError
bzr 1.10dev on python 2.5.2 (linux2)
arguments: ['/home/jelmer/bzr/bzr/bzr.dev/bzr', 'svn-import', 'svn://svn.gnome.org/svn/vala']
encoding: 'UTF-8', fsenc: 'UTF-8', lang: 'en_US.UTF-8'
plugins:
  cia /usr/lib/python2.5/site-packages/bzrlib/plugins/cia [1.0dev]
  cvsps_import /usr/lib/python2.5/site-packages/bzrlib/plugins/cvsps_import [unknown]
  dbus /usr/lib/python2.5/site-packages/bzrlib/plugins/dbus [unknown]
  email /usr/lib/python2.5/site-packages/bzrlib/plugins/email [unknown]
  gtk /usr/lib/python2.5/site-packages/bzrlib/plugins/gtk [0.96.0.dev.1]
  launchpad /home/jelmer/bzr/bzr/bzr.dev/bzrlib/plugins/launchpad [unknown]
  loom /usr/lib/python2.5/site-packages/bzrlib/plugins/loom [1.4dev]
  pqm /usr/lib/python2.5/site-packages/bzrlib/plugins/pqm [1.3]
  rebase /usr/lib/python2.5/site-packages/bzrlib/plugins/rebase [0.4.2]
  search /usr/lib/python2.5/site-packages/bzrlib/plugins/search [1.7dev]
  stats /usr/lib/python2.5/site-packages/bzrlib/plugins/stats [unknown]
  svn /home/jelmer/.bazaar/dev-plugins/svn [0.5dev]
  upload /usr/lib/python2.5/site-packages/bzrlib/plugins/upload [0.1]
*** Bazaar has encountered an internal error.
Please report a bug at https://bugs.launchpad.net/bzr/+filebug
including this traceback, and a description of what you
were doing when the error occurred.
**** entering debugger
> /home/jelmer/bzr/bzr/bzr.dev/bzrlib/chk_map.py(608)add_node()
-> assert len(prefix) == len(self._prefix) + 1
(Pdb) print prefix
1@637f28bd-e311-0410-add6-e2755f2ae1d4:trunk%2F
(Pdb) print self._prefix
1@637f28bd-e311-0410-add6-e2755f2ae1d4:trunk%2F
fwiw, this happens when adding the very first inventory delta. I'm
specifying this to set the root file id:
[(None, '', '1@637f28bd-e311-0410-add6-e2755f2ae1d4:trunk%2F',
  InventoryDirectory('1@637f28bd-e311-0410-add6-e2755f2ae1d4:trunk%2F',
                     '', parent_id=None,
                     revision='svn-v4:637f28bd-e311-0410-add6-e2755f2ae1d4:trunk:1'))]
Thanks!
Cheers,
Jelmer
--
Jelmer Vernooij <jelmer at samba.org> - http://samba.org/~jelmer/
Jabber: jelmer at jabber.fsfe.org