[RFC] Joining of annotated and plain knits

Robert Collins robertc at robertcollins.net
Mon Sep 24 20:51:33 BST 2007


Hi Ian, some comments below...

On Mon, 2007-09-24 at 23:34 +1000, Ian Clatworthy wrote:
> 
> +
> +
> +def _anno_to_plain_converter(lines, version_id, options):
> +    """Convert annotated lines to plain lines."""
> +    if 'fulltext' in options:
> +        return [line.split(' ', 1)[1] for line in lines]
> +    else:
> +        # line delta - leave 1st line alone
> +        return [lines[0]] + [line.split(' ', 1)[1] for line in
> lines[1:]]

This is a bit naughty - the KnitFactory class currently has the parsing
and serialisation code, layer wise it would be better to do something
like
'if needs_conversion:
  content = self.source.factory.parse_(fulltext|delta)
  size, bytes = self.target.factory.lower_(fulltext|delta)
'

> +def _plain_to_anno_converter(lines, version_id, options):
> +    """Convert plain lines to annotated lines."""
> +    if 'fulltext' in options:
> +        return ['%s %s' % (version_id,line) for line in lines]
> +    else:
> +        # line delta - leave 1st line alone
> +        return [lines[0]] + ['%s %s' % (version_id,line) for line in
> lines[1:]]

This case fails because you are claiming to know the annotations, but
this is bogus - you don't. (As well as being ugly because you are
copying the deserialisation logic again)

Consider a fulltext that is not the initial commit - it will have
versions different than its version in the annotations. (And if your
tests don't pick this bug up, you need an additional test :))
.
And for deltas, consider a merge case:
version:parents    lines
D:[B, C]          [C\n, B\n]
C:[A]             [C\n, C\n]
B:[A]             [B\n, B\n]
A:[]              [A\n, A\n]

The annotations for D should be: [C, B], but the delta will be:
0, 1, C\n
(that is, replace from line 0, for 1 lines, with C\n). This is because
the current deltas only follow one parent. So the correct annotation is
[C, B] in D, but your code when transcribing D from plain to knits will
look at the delta and put D in there - giving annotations of [D, B].
What you need to know is what lines listed in the delta from C to D
where not introduced by D, but rather by something in the merge graph C.
This is what _merge_annotations in KnitVersionedFile._add does (for full
texts and deltas).

So, the simplest thing when converting from plain to annotated, the
'right thing' to do is probably to reconstruct the full text always,
then to call self.target.add_lines_with_ghosts - you can pass in the
convert parent records to speed this up (the last element of the return
value of add_lines_with_ghosts can be put in a dict and passed back in).

A more complex solution would be to take advantage of the deltas in the
chain of the merged parents to reduce the amount of data processing; but
honestly, lets get this robust and visibly correct, and see how long it
takes, before stressing.

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070925/ac08706e/attachment.pgp 


More information about the bazaar mailing list