svn-import performance analysis

Jelmer Vernooij jelmer at samba.org
Wed Dec 3 04:44:37 GMT 2008


On Tue, 2008-12-02 at 18:03 -0600, John Arbash Meinel wrote:
> Jelmer Vernooij wrote:
> > I've done some work analysing the critical points in importing from
> > Subversion using bzr-svn 0.5. 
> > 
> > The main culprits appear to be:
> > 
> >  * Inventory.copy() (33%)
> 
> This is, unfortunately, a deep-copy rather than having CoW semantics.
> One thing that the split-inventory code changes is making Inventories a
> bit more immutable (because CHKInventory is going to be more explicitly
> CoW).
After Robert's explanation about inventory deltas on IRC, I've now
rewritten bits of the fetch code to use inventory deltas. It turns out I
don't really need the copy() anymore now.
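
As a rough illustration of why a delta avoids the copy, here is a minimal sketch. It uses a plain dict as a stand-in for an inventory; `apply_by_copy` and `build_delta` are hypothetical helpers, not bzrlib or bzr-svn functions. Only the tuple layout (old_path, new_path, file_id, new_entry) follows the delta shown later in this message.

```python
import copy


def apply_by_copy(parent_inv, changes):
    """Copy-based approach: deep-copy the whole parent, then mutate the copy."""
    new_inv = copy.deepcopy(parent_inv)
    for file_id, entry in changes.items():
        if entry is None:
            new_inv.pop(file_id, None)   # entry removed in this revision
        else:
            new_inv[file_id] = entry     # entry added or changed
    return new_inv


def build_delta(parent_inv, changes):
    """Delta-based approach: describe only the changed entries as
    (old_path, new_path, file_id, new_entry) tuples."""
    delta = []
    for file_id, entry in changes.items():
        old = parent_inv.get(file_id)
        old_path = old["path"] if old is not None else None
        new_path = entry["path"] if entry is not None else None
        delta.append((old_path, new_path, file_id, entry))
    return delta


# Hypothetical toy data: file_id -> entry.
parent = {
    "root-id": {"path": "", "kind": "directory"},
    "a-id": {"path": "a.txt", "kind": "file"},
}
changes = {
    "a-id": {"path": "b.txt", "kind": "file"},  # rename a.txt -> b.txt
    "c-id": {"path": "c.txt", "kind": "file"},  # newly added file
}

delta = build_delta(parent, changes)
```

The work done by `build_delta` grows with the number of changed entries, whereas `apply_by_copy` always touches every entry in the parent, however small the change.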

> >  * Repository.add_revision() (49%) (most of which is spent serialising
> > the inventory)
> > 
> > While fetching a revision delta, bzr-svn makes a copy of the inventory
> > of the parent revision and applies the changes it sees to that. It needs
> > the previous inventory as well to figure out renames and the like in
> > roundtripped revisions.
> So it needs the parent revision, and the grandparent? Or it needs the
> parent revision both for a delta and for comparison?
> 
> I'm curious if you hold on to the newly generated inventory when you go
> on to the next child, or whether you pull it out again.
I do hold on to it.

> > I'm at a bit of a loss as to how I can optimize this further. Does
> > anybody have any ideas? Also, I would expect CHK inventories to be of
> > help here - is that correct?
> CHK inventories will be much more capable of "partial update without
> full serialization/deserialization", as it is designed in from the
> beginning.
> 
> One hack I did a long time ago was to have each InventoryEntry remember
> its serialized form (along with some other info like what serializer
> created it, etc.) and then have it throw away that value when it was
> modified. Though it really needs the calling code to be careful about
> modifications, because InventoryEntries are just plain ol' data
> structures, without a way to know if they have been made "dirty".
> 
> So... CHK should be a big help here, and without doing some dirty hacks
> in bzrlib, I'm not sure if there is much else to be done.
Cool, I'm looking forward to them landing then :-) At least the code in
bzr-svn is ready for that now.
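
For reference, the caching hack John describes could look roughly like the sketch below. `CachedEntry` is a hypothetical class, not bzrlib's InventoryEntry, and the tab-separated format is made up for the example. It tracks modifications through `__setattr__`, which is exactly the sort of dirty tracking that plain data structures lack, so treat it as an assumption about how the idea might be made safe.

```python
class CachedEntry:
    """Hypothetical entry that remembers its serialized form until modified."""

    _FIELDS = ("file_id", "name", "parent_id", "revision")

    def __init__(self, file_id, name, parent_id, revision):
        self._cached = None        # serialized text, or None when stale
        self.serialize_calls = 0   # counts real serializations, for the demo
        self.file_id = file_id
        self.name = name
        self.parent_id = parent_id
        self.revision = revision

    def __setattr__(self, attr, value):
        # Any write to a data field throws away the cached text.
        if attr in self._FIELDS:
            object.__setattr__(self, "_cached", None)
        object.__setattr__(self, attr, value)

    def serialize(self):
        # Re-serialize only when no valid cached text is available.
        if self._cached is None:
            self.serialize_calls += 1
            self._cached = "\t".join(str(getattr(self, f)) for f in self._FIELDS)
        return self._cached


entry = CachedEntry("a-id", "a.txt", "root-id", "rev-1")
first = entry.serialize()
second = entry.serialize()   # served from the cache
entry.name = "b.txt"         # invalidates the cache
third = entry.serialize()    # serialized again
```

This only notices direct attribute assignment. A field holding a mutable object that is changed in place would leave a stale cache, which is the "calling code must be careful" caveat above.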

When using brisbane-core with 1.9-rich-root, the time spent seems to have
shifted significantly. Only 17% of the time is now spent in
Repository.add_inventory_delta(). Instead, more time is now spent in:

 * Knit.add_lines() (30.11%)
 * Knit.get_record_stream() (18.78%)
 * Repository.get_inventory() (5.22%)

Unfortunately I can't use CHKInventories at the moment to see if it gets
that 17% down even further, since it errors out on me:

Traceback (most recent call last):
  File "/home/jelmer/bzr/bzr/bzr.dev/bzrlib/commands.py", line 893, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/home/jelmer/bzr/bzr/bzr.dev/bzrlib/commands.py", line 839, in run_bzr
    ret = run(*run_argv)
  File "/home/jelmer/bzr/bzr/bzr.dev/bzrlib/commands.py", line 539, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/home/jelmer/bzr/bzr/bzr.dev/bzrlib/commands.py", line 853, in ignore_pipe
    result = func(*args, **kwargs)
  File "/home/jelmer/.bazaar/plugins/svn/__init__.py", line 280, in run
    to_revnum=to_revnum)
  File "/home/jelmer/bzr/bzr-svn/0.5/.plugins/svn/convert.py", line 249, in convert_repository
    inter.fetch(needed=revmetas)
  File "/home/jelmer/.bazaar/dev-plugins/svn/fetch.py", line 925, in fetch
    self._fetch_revisions(needed, pb, use_replay=use_replay)
  File "/home/jelmer/.bazaar/dev-plugins/svn/fetch.py", line 864, in _fetch_revisions
    self._fetch_revision_switch(editor, revmeta, parent_revmeta)
  File "/home/jelmer/.bazaar/dev-plugins/svn/fetch.py", line 822, in _fetch_revision_switch
    report_inventory_contents(reporter, parent_revnum, start_empty)
  File "/home/jelmer/.bazaar/dev-plugins/svn/fetch.py", line 673, in report_inventory_contents
    reporter.finish()
  File "/home/jelmer/.bazaar/dev-plugins/svn/fetch.py", line 193, in close
    self._close()
  File "/home/jelmer/.bazaar/dev-plugins/svn/fetch.py", line 331, in _close
    self.editor._finish_commit()
  File "/home/jelmer/.bazaar/dev-plugins/svn/fetch.py", line 502, in _finish_commit
    [r for r in rev.parent_ids if self.target.has_revision(r)])
  File "/home/jelmer/bzr/bzr/bzr.dev/bzrlib/repofmt/pack_repo.py", line 2134, in add_inventory_delta
    basis_revision_id, delta, new_revision_id, parents)
  File "/home/jelmer/bzr/bzr/bzr.dev/bzrlib/repository.py", line 692, in add_inventory_delta
    return self.add_inventory(new_revision_id, basis_inv, parents)
  File "/home/jelmer/bzr/bzr/bzr.dev/bzrlib/repository.py", line 647, in add_inventory
    return self._add_inventory_checked(revision_id, inv, parents)
  File "/home/jelmer/bzr/bzr/bzr.dev/bzrlib/repofmt/pack_repo.py", line 2108, in _add_inventory_checked
    parent_id_basename_index=serializer.parent_id_basename_index)
  File "/home/jelmer/bzr/bzr/bzr.dev/bzrlib/inventory.py", line 1525, in from_inventory
    result.id_to_entry.apply_delta(file_id_delta)
  File "/home/jelmer/bzr/bzr/bzr.dev/bzrlib/chk_map.py", line 78, in apply_delta
    self.map(new, value)
  File "/home/jelmer/bzr/bzr/bzr.dev/bzrlib/chk_map.py", line 349, in map
    self._root_node.add_node(split, node)
  File "/home/jelmer/bzr/bzr/bzr.dev/bzrlib/chk_map.py", line 608, in add_node
    assert len(prefix) == len(self._prefix) + 1
AssertionError

bzr 1.10dev on python 2.5.2 (linux2)
arguments: ['/home/jelmer/bzr/bzr/bzr.dev/bzr', 'svn-import', 'svn://svn.gnome.org/svn/vala']
encoding: 'UTF-8', fsenc: 'UTF-8', lang: 'en_US.UTF-8'
plugins:
  cia                  /usr/lib/python2.5/site-packages/bzrlib/plugins/cia [1.0dev]
  cvsps_import         /usr/lib/python2.5/site-packages/bzrlib/plugins/cvsps_import [unknown]
  dbus                 /usr/lib/python2.5/site-packages/bzrlib/plugins/dbus [unknown]
  email                /usr/lib/python2.5/site-packages/bzrlib/plugins/email [unknown]
  gtk                  /usr/lib/python2.5/site-packages/bzrlib/plugins/gtk [0.96.0.dev.1]
  launchpad            /home/jelmer/bzr/bzr/bzr.dev/bzrlib/plugins/launchpad [unknown]
  loom                 /usr/lib/python2.5/site-packages/bzrlib/plugins/loom [1.4dev]
  pqm                  /usr/lib/python2.5/site-packages/bzrlib/plugins/pqm [1.3]
  rebase               /usr/lib/python2.5/site-packages/bzrlib/plugins/rebase [0.4.2]
  search               /usr/lib/python2.5/site-packages/bzrlib/plugins/search [1.7dev]
  stats                /usr/lib/python2.5/site-packages/bzrlib/plugins/stats [unknown]
  svn                  /home/jelmer/.bazaar/dev-plugins/svn [0.5dev]
  upload               /usr/lib/python2.5/site-packages/bzrlib/plugins/upload [0.1]
*** Bazaar has encountered an internal error.
    Please report a bug at https://bugs.launchpad.net/bzr/+filebug
    including this traceback, and a description of what you
    were doing when the error occurred.
**** entering debugger
> /home/jelmer/bzr/bzr/bzr.dev/bzrlib/chk_map.py(608)add_node()
-> assert len(prefix) == len(self._prefix) + 1

(Pdb) print prefix
1@637f28bd-e311-0410-add6-e2755f2ae1d4:trunk%2F
(Pdb) print self._prefix
1@637f28bd-e311-0410-add6-e2755f2ae1d4:trunk%2F

fwiw, this happens when adding the very first inventory delta. I'm
specifying this to set the root file id:

[(None, '', '1@637f28bd-e311-0410-add6-e2755f2ae1d4:trunk%2F',
InventoryDirectory('1@637f28bd-e311-0410-add6-e2755f2ae1d4:trunk%2F',
'', parent_id=None,
revision='svn-v4:637f28bd-e311-0410-add6-e2755f2ae1d4:trunk:1'))]
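
To spell out how I read that single delta entry, here is a small sketch. It assumes the tuple layout is (old_path, new_path, file_id, new_entry), with an old_path of None meaning the entry is absent from the basis and an empty new_path meaning the tree root. `describe_delta_entry` is a hypothetical helper for illustration, not a bzrlib function.

```python
def describe_delta_entry(item):
    """Classify one (old_path, new_path, file_id, new_entry) delta tuple.

    Returns (action, is_root), where action is one of
    'add', 'remove', 'rename' or 'modify'.
    """
    old_path, new_path, file_id, entry = item
    if old_path is None and new_path is None:
        raise ValueError("delta entry has neither an old nor a new path")
    if old_path is None:
        action = "add"       # not present in the basis inventory
    elif new_path is None:
        action = "remove"    # not present in the new inventory
    elif old_path != new_path:
        action = "rename"
    else:
        action = "modify"    # same path, changed entry
    # An empty path denotes the root directory of the tree.
    is_root = (new_path == "") or (new_path is None and old_path == "")
    return action, is_root


# Same shape as the delta above; the file id and entry are placeholders.
root_add = (None, "", "root-id", object())
result = describe_delta_entry(root_add)
```

Under that reading, the delta above is an "add" of the root directory against an empty basis.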

Thanks!

Cheers,

Jelmer

-- 
Jelmer Vernooij <jelmer at samba.org> - http://samba.org/~jelmer/
Jabber: jelmer at jabber.fsfe.org



