[MERGE, 1.4] Re: [1.4 REGRESSION] Fetch brutally slow for packs <=>knits.

Mon Apr 14 23:42:26 BST 2008

On Mon, 2008-04-14 at 20:59 +0300, John Arbash Meinel wrote:
> ~ calls
> ~   for rev_id in rev_ids:
> ~     KnitRepository._revision_store.has_revision_id(rev_id,
> transaction)
> ~ calls
> ~
> KnitRevisionStore.get_revision_file(transaction).has_version(revision_id)
> 
> However, because transactions are no longer caching any objects, this
> means that
> we re-open the revisions.kndx file for *every* revision_id which is
> not in common.
> 
> So if you have to push 100 revisions, we open revisions.kndx 100
> times. If you
> are pushing 900, you open it and parse it 900 times. AKA Brutally
> slow.
> 
> I think this is a *very* serious regression as it makes
> inter-repository work
> pretty much unusable.
> 
> So the "simple" fixes I can think of are:
> 
> 1) have KnitRevisionStore.get_revision_file() cache the object itself
> (since
> ~   transactions won't anymore). This is bad because transaction was
> the only
> ~   think that actually figured out the lifetime of caching.
>
> 2) Reinstate at least some knit object caching. Perhaps only for
> Inventory and
> ~   Revision objects. Though I have the feeling we would then run into
> the same
> ~   problem about Knit text objects having then .kndx read 100 times
> during the
> ~   fetch. The only real advantage here is that they are less likely
> to have as
> ~   many revisions to fetch.
> 
> 3) Put big "THIS IS BROKEN, UPGRADE TO THE SAME FORMAT REPOSITORIES"
> warnings in
> ~   our generic fetching code.
> 
> Even if you took out the for loop, we still generally only know about
> a small
> handful of revision ids we are interested in at a time. (Because if I
> commit 100
> linear revisions, you only know about the direct parent, and its
> parent, etc.)
> 
> There are some other possibilities I guess:
> 
> 4) Change the generic fetching code to break the api abstraction and
> call
> target._get_revision_vf() directly. And then use that object for
> "target.has_revision_id()" instead of going through the regular apis.
>
> 5) Slightly better than (4), implement a Knit<=>Pack fetcher which
> does the same
> thing. But at least you know that KnitRepo and KnitPackRepo will have
> the function.

6) the attached bundle uses the repository.get_graph() api rather than
repository.has_missing; this uses the Graph interface that for knits has
the revision vf as an instance variable.

Attached.

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: regression.patch
Type: text/x-patch
Size: 4602 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080415/88d6b9eb/attachment-0001.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080415/88d6b9eb/attachment-0001.pgp