[RFC] KnitIndex Parsing

John Arbash Meinel john at arbash-meinel.com
Mon Jul 2 22:39:53 BST 2007



I'm trying to respond to the issues that Alexander raised about how we
deal with invalid or semi-valid data in a .kndx file.

The primary difficulty is that we want to tolerate some inconsistencies
without treating the index as corrupt, but fail on anything beyond that.

Specifically, any line that was not written out completely is ignored,
because it probably just means that we were interrupted in the middle of
transmission, which is something we want to recover from gracefully.

However, what should we do in the case of actual bad data? Is it okay
for bad data on one line to affect the surrounding data?

This is an issue because the current format *starts* a new record with
'\n' and ends it with ' :', but all of the current parsers assume that a
given record starts *after* '\n' and is terminated by '\n'.

So as the parser goes through the file, it searches for '\n', splits on
that, and then splits each line up further. But that means that when a
writer appends new data without starting it with '\n', the new data is
joined onto the previous line, and the previous record has been altered.
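
To make the failure mode concrete, here is a minimal sketch in Python. It
is not the bzrlib parser: the record layout (space-separated fields ending
in ' :') is a simplified assumption, and parse_by_newline, good_writer and
bad_writer are hypothetical names used only for illustration.

```python
def parse_by_newline(data):
    """Split on newlines; keep only lines whose last field is ':'."""
    records = []
    for line in data.split('\n'):
        fields = line.split(' ')
        if len(fields) < 2 or fields[-1] != ':':
            continue  # empty or incompletely written line: ignore it
        records.append(fields[:-1])
    return records


def good_writer(fields):
    # Starts the record with a newline, ends it with ' :'.
    return '\n' + ' '.join(fields) + ' :'


def bad_writer(fields):
    # Hypothetical buggy writer that forgets the leading newline.
    return ' '.join(fields) + ' :'


rev_a = ['rev-a', 'fulltext', '0', '100']
rev_b = ['rev-b', 'line-delta', '100', '20', '0']

# Two well-formed appends give two separate records.
clean = parse_by_newline(good_writer(rev_a) + good_writer(rev_b))

# An append cut off part-way is skipped; rev-a is untouched.
truncated = parse_by_newline(good_writer(rev_a) + '\nrev-b line-de')

# An append without the leading newline is glued onto rev-a's line,
# so the earlier record is the one that gets damaged.
merged = parse_by_newline(good_writer(rev_a) + bad_writer(rev_b))
```

In this sketch the truncated write is harmless, but the write without a
leading '\n' turns two records into one, with rev-b's fields appended to
rev-a's.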

I was considering changing how the parser works, so that it would
actually search for ' :'. There are some issues with that, though.

It is possible to have ':' at the beginning of a revision id. It might
be weird, but it has never been forbidden, and the current parsers would
be fine with it. We *want* to allow ':' in revision ids, since all of
the converters use it as a namespace prefix (Arch-v1:, svn...:, etc). In
fact, we have explicitly declared that names *ending* in ':' are
reserved Bazaar revision ids (for 'null:', and 'current:').
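
A small Python sketch of the ambiguity, under loud assumptions: the
record layout is simplified, split_on_terminator is a hypothetical
helper, and I am assuming purely for illustration that a revision id can
show up as a bare space-separated field (for example as a parent
reference). Whether that can happen in a real .kndx depends on how ids
are written out.

```python
def split_on_terminator(data):
    """Treat every ' :' as the end of a record."""
    chunks = data.split(' :')
    # Whatever follows the last ' :' is an unfinished record (or empty).
    return [chunk.lstrip('\n').split(' ') for chunk in chunks[:-1]]


# A namespace-style id has ':' in the middle, never right after a space,
# so it does not look like a terminator.
namespaced = split_on_terminator('\nArch-v1:patch-1 fulltext 0 100 :')

# An id that *begins* with ':' is indistinguishable from the terminator
# once it appears after a space.
odd = split_on_terminator('\nrev-b line-delta 100 20 :odd-parent :')
```

Here the namespaced id parses as one record, while the single record
containing ':odd-parent' is split into two.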

I'm hesitant to make the parser super robust, as it is one of the core
loops that we currently encounter. But I'm thinking we should:

1) Come up with some sort of decision, so that I can get my pyrex
patch merged.
2) Keep this sort of thing in mind for future formats, so that we can
make existing data a bit more stable against future writers.

John
=:->



