[1.4 REGRESSION] Fetch brutally slow for packs <=> knits.

John Arbash Meinel john at arbash-meinel.com
Mon Apr 14 18:59:40 BST 2008



John Arbash Meinel wrote:
| I'm trying to debug some oddities that people are experiencing with
| 1.3.1 and 1.4rc1.
|
| They ran into a couple of things...
|
| 1) "bzr merge" with 1.4rc1 was, for some reason, glacially slow.
| Specifically, they killed it after about 1hr of some random data
| downloading. After trying to get it to work with 1.4rc1 a couple of
| times (killing it, breaking the lock, trying again), they downgraded
| to 1.3.1 and got it to work.
|
| 2) Their central repository seems to be in knit format, though most
| of their local ones are in --pack-0.92 format. I'm wondering if
| Robert's change to the get_transaction() stuff is just brutally
| punishing people who use knit format repositories anywhere. Because
| the Knit objects are no longer being cached, just doing a simple
| "iter_parents" sort of call has to download the full .kndx file,
| parse it, throw it away, and repeat for each revision they are
| looking at.

So I think I managed to debug why things are so much slower. Basically, it comes
down to:

  GenericRepoFetcher._revids_to_fetch()
  calls
    to_repository.search_missing_revision_ids()
  calls
    InterRepository.search_missing_revision_ids()
  calls
    InterRepository._walk_to_common_revisions([revision_id])
  calls
    target.has_revisions(next_revs)
  which does
    for rev_id in rev_ids:
      KnitRepository._revision_store.has_revision_id(rev_id, transaction)
  which calls
    KnitRevisionStore.get_revision_file(transaction).has_version(revision_id)

However, because transactions are no longer caching any objects, this means that
we re-open the revisions.kndx file for *every* revision_id which is not in common.

So if you have to push 100 revisions, we open revisions.kndx 100 times. If you
are pushing 900, you open and parse it 900 times. AKA brutally slow.
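The cost is easy to see in a toy model. This is illustrative Python, not
bzrlib code: the class below just counts how many times the index gets parsed,
standing in for the download-and-parse of revisions.kndx.

```python
# Toy model of the regression: every uncached membership check re-parses the
# whole index, while a cached (or batched) check parses it exactly once.

class KndxIndex(object):
    """Stands in for a .kndx file; parsing costs time proportional to size."""

    def __init__(self, versions):
        self._raw = list(versions)
        self.parse_count = 0

    def parse(self):
        self.parse_count += 1
        return set(self._raw)


def has_version_uncached(index, revision_id):
    # What 1.4rc1 effectively does: no transaction cache, so every call
    # re-opens and re-parses the index from scratch.
    return revision_id in index.parse()


def has_versions_cached(index, revision_ids):
    # What caching (or a batch API) buys: one parse for the whole query.
    versions = index.parse()
    return [rev_id in versions for rev_id in revision_ids]


index = KndxIndex('rev-%d' % i for i in range(900))
for i in range(900):
    has_version_uncached(index, 'rev-%d' % i)
print(index.parse_count)  # -> 900 parses: the "brutally slow" case

index2 = KndxIndex('rev-%d' % i for i in range(900))
has_versions_cached(index2, ['rev-%d' % i for i in range(900)])
print(index2.parse_count)  # -> 1 parse
```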

I think this is a *very* serious regression as it makes inter-repository work
pretty much unusable.

So the "simple" fixes I can think of are:

1) Have KnitRevisionStore.get_revision_file() cache the object itself (since
   transactions won't anymore). This is bad because transaction was the only
   thing that actually figured out the lifetime of caching.
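A minimal sketch of option (1), with illustrative names only:
`_open_revisions_knit()` and `invalidate_cache()` below are stand-ins, not the
real bzrlib API. It also shows the downside stated above: nothing scopes the
cache's lifetime anymore, so it has to be invalidated by hand or it goes stale.

```python
# Hedged sketch (not bzrlib code) of caching the knit object on the store
# itself instead of on the transaction.

class KnitRevisionStoreSketch(object):
    def __init__(self):
        self._cached_revisions_knit = None

    def get_revision_file(self, transaction):
        # Ignore the transaction for caching purposes and memoize on the
        # store, so repeated calls reuse one parsed revisions.kndx.
        if self._cached_revisions_knit is None:
            self._cached_revisions_knit = self._open_revisions_knit()
        return self._cached_revisions_knit

    def invalidate_cache(self):
        # Must be called on unlock/write; previously the transaction's
        # lifetime handled this for us.
        self._cached_revisions_knit = None

    def _open_revisions_knit(self):
        # Stand-in for the expensive download-and-parse of revisions.kndx.
        return object()
```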

2) Reinstate at least some knit object caching. Perhaps only for Inventory and
   Revision objects. Though I have the feeling we would then run into the same
   problem of Knit text objects having their .kndx read 100 times during the
   fetch. The only real advantage here is that they are less likely to have as
   many revisions to fetch.

3) Put big "THIS IS BROKEN, UPGRADE TO THE SAME FORMAT REPOSITORIES" warnings
   in our generic fetching code.

Even if you took out the for loop, we still generally only know about a small
handful of revision ids we are interested in at a time. (Because if I commit 100
linear revisions, you only know about the direct parent, and its parent, etc.)
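A toy walk over a linear history makes the point (illustrative Python, not the
real _walk_to_common_revisions): starting from the tip, each step of the search
only exposes one new parent, so even a perfectly batched has_revisions() would
see a frontier of size 1 at every step.

```python
# 100 linear commits rev-1..rev-100 on top of a base rev-0.
parents = {'rev-%d' % i: ['rev-%d' % (i - 1)] for i in range(1, 101)}
parents['rev-0'] = []

frontier = {'rev-100'}
steps = 0
while frontier:
    steps += 1
    next_frontier = set()
    for rev in frontier:
        # In the real search, this is where has_revisions() gets asked
        # about the frontier -- which is only one revision wide here.
        next_frontier.update(parents[rev])
    frontier = next_frontier
print(steps)  # -> 101 steps, each with a single-revision frontier
```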

There are some other possibilities I guess:

4) Change the generic fetching code to break the API abstraction and call
target._get_revision_vf() directly, then use that object for the
"target.has_revision_id()" checks instead of going through the regular APIs.

5) Slightly better than (4), implement a Knit<=>Pack fetcher which does the same
thing. But at least you know that KnitRepo and KnitPackRepo will have the function.

Anyway, there are a few options, but I honestly consider the 1.4rc1 release to
be "broken" if it performs this badly fetching between knits <=> packs. (Note
that several groups at Canonical decided they had to downgrade to 1.3.1 just
to get their work done.)

John
=:->


