[MERGE] Read .kndx files in pyrex (updated)

Mon Jul 2 19:40:15 BST 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Arbash Meinel wrote:
> Alexander Belchenko wrote:
>> John Arbash Meinel ?8H5B:
>>> Attached is a revised version which should address all of your comments.
>> John, I try your patch on win32.
>> Unfortunately, after I compile pyx to C-extension I have 1 failed test:
> 
>> C:\Temp\bzr+pyrex>python bzr --no-plugins selftest test_knit
>>        bzr: C:/Temp/bzr+pyrex/bzr
>>     bzrlib: C:\Temp\bzr+pyrex\bzrlib
> 
>> FAIL: #28 test_knit.BasicKnitTests.test_knit_format
>>     not equal:
>> a = ['revid', 'revid2']
>> b = ['revid']
> 
> ...

So I tracked down why it is failing. And arguably it is because the test is bad
on Windows. Specifically the test has a file with:

"bzr knit index 8\n"
"\nrev1 ... :"
"\nrev2 ... :"

So all of those lines start with '\n' and end with ' :'

And then it does:

indexfile = open('test.kndx', 'at')
indexfile.write('\nrevid3 ... .phwoar:demo ')

There are a couple things that this does:

1) Test that writing out an incomplete line fails. Even though there is a ':'
and a trailing space, there is no final ' :'.
2) On win32 it actually is writing '\r\nrevid3...'

Which looks like:
"bzr knit index 8\n"
"\nrev1 ... :"
"\nrev2 ... :\r"
"\nrev3 ... .phwoar:demo "

So in some ways, I would like writing extra data to an existing line to not
corrupt the previous lines. This succeeds with the current code path because we
use:

pieces = line.split()
if pieces[-1] != ':':
  raise

Which will ignore any consecutive whitespace. Which means that:

"\nrev2 ...:\r\t\t \r \t"
"\nrev3 ..."

Will still parse. However, if there are any other characters, like:

"\nrev2 ...:r"
"\nev3 ..."

It will break.

So there is a least a small concern that opening the file in 'at' mode is
invalid. We always write to the files in 'ab' mode, and read from them in 'rb'
mode.

Second, should we change the parser? So that we look for " :" as the delimiter,
rather than just breaking on newlines.

That has the nice property that once a line is correctly written, any future
writes cannot "break" it. It is probably a bit slower to do

end = line.search(' :')
if end < 0:
  raise
pieces = line[:end].split()

I would also consider changing the line to be:

pieces = line.split(' ')
assert pieces[-1] == '\n' # we might just ignore any trailing data
assert pieces[-2] == ':'
...

But note, that will still fail for "\nrevid...:foo" because the ':foo' will be
the final chunk, and not just ':'.

So the 'simple' fix is to just update the test so it doesn't write '\r'. The
complex fix means changing both parsers, since the first one is only succeeding
by accident.

I do want this to get in, so I'm going to create the easy fix, and then look at
doing the hard one. Attached is the patch which makes the test pass. It
honestly wasn't the right place for the test anyway. Tests for corrupt knits
should be done at a lower level direct KnitIndex test. Especially since that
means they will be tested against both implementations.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGiUaOJdeBCYSNAAMRApS8AJ99ko3TnIVJTcOa/90WxQdTKSxN8gCfW1g5
d3xXt28Q09TBUB4lTu6qj+o=
=t7gQ
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: knit_index_pyrex.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20070702/72442dff/attachment-0001.diff