[BUG] (trivial) duplicate text in "bzr pull --help"

John Arbash Meinel john at arbash-meinel.com
Wed Jun 28 22:59:04 BST 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Aaron Bentley wrote:
> John Arbash Meinel wrote:
>>> Looking closely at the code, it seems like we always pull all of the
>>> file texts in all Fetchers. The Knit fetcher only is intelligent about
>>> fetching revisions.knit and revision-sigs.knit.
> 
> I don't understand.  If it were intelligent about fixing revisions.knit,
> it wouldn't be reading one at a time.

Well, it's calling to_weave.join() for inventories, and for the Knit
fetcher, it is calling to_signature_file.join(), and to_revision.join().

And VersionedFile.join() is trying to grab the optimal
InterVersionedFile.join() which should be InterKnit.join()

Assuming it is (which is probably okay), then both inventories.knit and
revisions.knit should be calling InterKnit.join() which is what Robert
was saying.
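
To make the "grab the optimal InterVersionedFile" step concrete, here is
a rough sketch of that kind of dispatch. These are toy stand-ins that
only borrow the names from this thread; the registry list, the
is_compatible() check, and the 'kind' attribute are assumptions for
illustration, not the actual bzrlib code.

```python
class InterVersionedFile:
    """Toy stand-in: generic source -> target copier (assumed slow path)."""

    _optimisers = []  # hypothetical registry of specialised subclasses

    def __init__(self, source, target):
        self.source = source
        self.target = target

    @classmethod
    def is_compatible(cls, source, target):
        # The generic implementation claims to work for any pair.
        return True

    @staticmethod
    def get(source, target):
        # Prefer the first registered optimiser that accepts this pair,
        # otherwise fall back to the generic implementation.
        for optimiser in InterVersionedFile._optimisers:
            if optimiser.is_compatible(source, target):
                return optimiser(source, target)
        return InterVersionedFile(source, target)

    def join(self):
        return "generic"


class InterKnit(InterVersionedFile):
    """Toy stand-in: only usable when both ends are knits."""

    @classmethod
    def is_compatible(cls, source, target):
        return source.kind == "knit" and target.kind == "knit"

    def join(self):
        return "knit-optimised"


InterVersionedFile._optimisers.append(InterKnit)


class ToyVersionedFile:
    """Toy file object; 'kind' is an assumption made for this sketch."""

    def __init__(self, kind):
        self.kind = kind

    def join(self, other):
        # Mirrors to_weave.join(from_weave): self is the target.
        return InterVersionedFile.get(other, self).join()


knit_to_knit = ToyVersionedFile("knit").join(ToyVersionedFile("knit"))
weave_to_knit = ToyVersionedFile("knit").join(ToyVersionedFile("weave"))
```

With this shape, a knit-to-knit join picks the specialised path and
anything else falls back to the generic one.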

And it seems to be calling 'read_records_iter_raw()', which is using
transport.readv().


I have seen what you are mentioning about get requests being issued one
after the next. But I thought that was for reading into a Knit when
reading *from* a Weave, which uses GenericRepoFetcher (I believe).

Which is the one that should be using 'get_revisions()' rather than
calling 'get_revision()' inside a loop.
But it needs a get_signature_texts() as well, since it is doing both
within the same loop.
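
Roughly what I have in mind, as a sketch against a toy in-memory
repository. ToyRepository and both fetch functions are made up for
illustration. get_signature_texts() is the hypothetical multi-call, and
having get_signature_text() return None for unsigned revisions is an
assumption of this sketch, not a statement about the real API.

```python
class ToyRepository:
    """Toy in-memory stand-in for a repository (not the real class)."""

    def __init__(self, revisions, signatures):
        self._revisions = revisions    # revision_id -> revision text
        self._signatures = signatures  # revision_id -> signature text
        self.calls = 0                 # API calls, a proxy for round trips

    def get_revision(self, revision_id):
        self.calls += 1
        return self._revisions[revision_id]

    def get_revisions(self, revision_ids):
        self.calls += 1
        return [self._revisions[r] for r in revision_ids]

    def get_signature_text(self, revision_id):
        # Assumption: None means "this revision is not signed".
        self.calls += 1
        return self._signatures.get(revision_id)

    def get_signature_texts(self, revision_ids):
        # Hypothetical multi-call, one request for all signatures.
        self.calls += 1
        return [self._signatures.get(r) for r in revision_ids]


def fetch_one_by_one(repo, revision_ids):
    # Current shape: two calls per revision inside the loop.
    result = []
    for revision_id in revision_ids:
        revision = repo.get_revision(revision_id)
        signature = repo.get_signature_text(revision_id)
        result.append((revision, signature))
    return result


def fetch_batched(repo, revision_ids):
    # Proposed shape: two calls in total, however many revisions.
    revisions = repo.get_revisions(revision_ids)
    signatures = repo.get_signature_texts(revision_ids)
    return list(zip(revisions, signatures))


ids = ["r1", "r2", "r3"]
texts = {"r1": "rev 1", "r2": "rev 2", "r3": "rev 3"}
sigs = {"r2": "sig 2"}

slow_repo = ToyRepository(texts, sigs)
fast_repo = ToyRepository(texts, sigs)
slow = fetch_one_by_one(slow_repo, ids)
fast = fetch_batched(fast_repo, ids)
```

Both variants return the same (revision, signature) pairs; the loop
makes two calls per revision while the batched one makes two in total.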

> 
>>> The GenericRepoFetcher uses a loop over the revision_ids and fetches
>>> them one by one. So probably that could be updated to call
>>> get_revisions() instead of get_revision() many times.
>>>
>>> There is no multi-call (yet) for get_signature_texts(). But that could
>>> easily be implemented.
>>>
>>> I'm pretty sure that get_revisions() is a Repository API, that is
>>> implemented in all cases (it can just loop over get_revision() if that
>>> is all that is available).
> 
> Yes, I did make sure there was a fallback.
> 
>>>>>>> I think annotations should be considered a cache, and possibly invalid -
> 
>>>>> I think that would have some nasty side effects.  We would always have
>>>>> to reannotate before performing a knit merge, because we can't expect
>>>>> good results if the annotations are invalid.
> 
>>> I think the cached annotations can be considered 'as good as can be
>>> done'. Such that we realize re-diffing might return a different output
>>> (especially when changing diff algorithms).
>>> But at the time of creation/upgrading it was the best we could do.
> 
> I think it would be possible for the combination of two sets of
> annotations to produce an annotation in which two different lines were
> both attributed to a single original line.
> 
> So I think combining differently-annotated knits could indeed produce an
> invalid annotation.
> 
> Aaron

I can see your point. And it matters more if we start doing something
like line identity based merges.

I'm not sure if it is truly possible to mess up, though.
Because for any given revision, you would only pick 1 out of the 2
possible annotations.

I don't think we have true line-identity yet, though.

I'm trying to construct a failure case, and I seem to always poke holes
in it.

Specifically, I can see that one branch pulls over revisions in the top
of a file, and the other branch uses ones on the bottom. And they both
diverge from a common 'middle' line.
But when you finally merge the other branch, it will get resolved at
merge time.

I can see that on my branch I'm using Patience, and on yours you are
using difflib. So if we both pull from someone who was using weaves, we
would get different annotations for some lines in the same revision.
Then if I pull your revision based on the common revision...
But knits only annotate new lines, and delete old ones. So if you have a
new line, it would be replacing the old annotation.
I suppose it could matter if your new rev is a fulltext, but then we
have decided to pick your annotation for *all* of the lines.

So because knits either annotate only the changed lines or annotate
*all* lines, I think we are okay.
If a knit would contain the hunk:

b foo
a bar
b baz

Where line 'bar' was not changed by revision 'b', it was just copied
along for simplicity. Then in your branch you might have annotated

a bar
z bar

and I would have annotated

y bar
a bar


And then if you change it to:
b foo
a bar
b baz
z bar

*and* your knit hunk records a single 'bab' hunk, not 2 'b' hunks.
If I pull your hunk into my repo, I would end up with something like:

b foo
a bar
b baz
a bar

Where both 'a bar' lines claim to be the same line from earlier.
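
The same scenario as a small sketch, treating an annotated line as an
(origin, text) pair and a hunk as "replace lines [start, end) with these
annotated lines". That representation and the helper names are my own
simplification for illustration, not the real knit format.

```python
from collections import Counter


def apply_hunk(annotated_lines, start, end, replacement):
    """Replace lines [start, end) with the hunk's annotated lines."""
    return annotated_lines[:start] + replacement + annotated_lines[end:]


def repeated_identities(annotated_lines):
    """Return (origin, text) pairs that occur more than once."""
    counts = Counter(annotated_lines)
    return [line for line, n in counts.items() if n > 1]


# The same text, annotated differently in the two repositories.
yours = [("a", "bar"), ("z", "bar")]
mine = [("y", "bar"), ("a", "bar")]

# Your change recorded as one 'bab' hunk replacing the first line,
# carrying the untouched 'a bar' line along with its annotation.
hunk = (0, 1, [("b", "foo"), ("a", "bar"), ("b", "baz")])

your_result = apply_hunk(yours, *hunk)
my_result = apply_hunk(mine, *hunk)

your_repeats = repeated_identities(your_result)
my_repeats = repeated_identities(my_result)
```

In your repository the result has no repeated identity, but in mine the
('a', 'bar') pair shows up twice.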

I think KnitMerge would handle this, and Weaves wouldn't let it happen.
But edge-merge would be confused because the same line identity is
repeated.

Real edge-merge as written in Codeville wouldn't let it happen because
it keeps everything in a Weave where lines must have a position in the
weave. So you can't just pull hunks, you have to merge things into your
internal structure each time. Which breaks append-only stuff.
And certainly breaks the ability to just pull across other people's
annotations.

I think the performance benefit of not having to re-diff outweighs the
correctness gained by regenerating the diff each time. But I know you've
been interested in playing with edge-merge, so you may have stricter
requirements on the correctness of annotations.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEovuoJdeBCYSNAAMRAhRUAJ0YnPhWYmj2qPxni0ZpPo+XvKoFegCgmJwJ
38C7iZf25gEyYH8jueOJuuE=
=AjKv
-----END PGP SIGNATURE-----



