[MERGE][REGRESSION][1.6] fetching knits => packs

John Arbash Meinel john at arbash-meinel.com
Mon Aug 18 22:35:19 BST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Wouter van Heyst wrote:
> On Mon, Aug 18, 2008 at 02:17:50PM -0500, John Arbash Meinel wrote:
>> John Arbash Meinel wrote:
>>> https://bugs.edge.launchpad.net/bzr/+bug/256757
>>>
>>> This bug is an example of a regression in fetching between knit repositories
>>> and pack repositories. It is a combination of the api friction for
>>> 'get_record_stream' and Robert's recent change to the fetch ordering (being
>>> determined by the target repository.)
>>>
>>> Specifically, the bug starts with a knit repository that is 70MB on disk (35MB
>>> --apparent). And it ends up 170MB after copying into a pack repository.
>>> Further, it uses 700+MB of memory for the copy.
>> So, this was introduced by Robert's change:
>>  3584 Canonical.com Patch Queue Manager	2008-07-29 [merge]
>>       (robertc) Give repository objects more control over the generic fetch
>>       	process. (Robert Collins)
>>
>> We haven't noticed the problem in general fetching, because it is rare that
>> you fetch a lot of texts and when you already have the left-hand parent in
>> your repository, it just re-does the delta.
>>
>>
>>> I haven't sorted out all of the details, but I believe the issue is in using
>>> 'unsorted' as the fetch order.
>> It is that we are using 'unsorted' and that we have the wrong actual value for
>> 'include_delta_closure'. The variable talks about 'fetch uses deltas' but
>> setting include_delta_closure=True actually says that fetch has to use full-texts.
>>
>> Attached is a patch which seems to fix both the memory consumption and the
>> size of the target repository issues.
>>
>> The attached patch fixes the logic snafu in all the places I've found. I'd
>> like to have a test for it, but I'm not really sure how to go about it. Would
>> it be creating a knit repo and a pack repo, commit in the knit, and fetch
>> across and ensure that the target objects are still deltas?
> 
> The logic reversions seem logical enough with your explanation, but I
> can't confidently claim to be able to back it up from a reading of the
> code. 
> 
> For example, I don't understand why your patch inverts logic on
> _fetch_just_revision_texts but not _fetch_revision_texts . I assume it
> has to do with the latter calling the former, but that's all I got. 

Because I missed that one. I don't think I saw it in the diff, possibly
because they were right next to eachother. They both should have "not" set. It
doesn't actually matter for signatures and revisions because they aren't ever
delta'd, but the *correct* value is 'not'.

John
=:->


> 
> Reading the pack code to see what get_record_stream actually does with
> include_delta_closure needs some more time. I agree the docstring
> suggests that your logic inversion is correct.
> 
> Wouter van Heyst
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIqesXJdeBCYSNAAMRAi7YAKC8/JRHr2+wpUyMPA+f4zYsTICKtgCgvZsV
mJT62r9z/2IT4e8JhC2Rz1M=
=ukr0
-----END PGP SIGNATURE-----



More information about the bazaar mailing list