[MERGE] Streaming-friendly container APIs
John Arbash Meinel
john at arbash-meinel.com
Wed Oct 24 21:24:23 BST 2007
John Arbash Meinel has voted tweak.
Status is now: Conditionally approved
Comment:
I'm a little surprised to see 0 direct tests for ContainerSerialiser
+ def _consume_until_byte(self, byte):
+ """Take all bytes up to the given out of the buffer, and return
it.
+
+ If the specified byte is not found in the buffer, the buffer is
+ unchanged and this returns None instead.
+ """
+ newline_pos = self._buffer.find('\n')
+ if newline_pos != -1:
+ line = self._buffer[:newline_pos]
+ self._buffer = self._buffer[newline_pos+1:]
+ return line
+ else:
+ return None
^- This seems to always assume 'byte' is '\n'. (Sounds like you need a
test here)
...
+ # XXX: This causes a memory copy of bytes in size, but is
usually
+ # faster than two write calls (12 vs 13 seconds to output a gig
of
+ # 1k records.) - results may differ on significantly larger
records
+ # like .iso's but as they should be rare in any case and thus
not
+ # likely to be the common case. The biggest issue is causing
extreme
+ # memory pressure in that case. One possibly improvement here
is to
+ # check the size of the content before deciding to join here vs
call
+ # write twice.
+ return ''.join(byte_sections)
You could change the api to return an iterator of bytes, so that you
could do:
if len(bytes) > XXX:
return (''.join(byte_sections), bytes)
else:
byte_sections.append(bytes)
return (''.join(byte_sections),)
And then the caller would do:
for section in self._serializer.bytes_record(...):
self.write_func(section)
I'm not sure that this is performance critical code, bit it is something
to think about.
I feel like you did fairly well a testing the bytes on the wire. I
wonder about marking these tests somehow so that it is clear that they
should not be changed as long as we want to maintain protocol
compatibility.
+ def test_readline(self):
+ """Test using readline() as ContainerReader does.
+
+ This is always within a readv hunk, never across it.
+ """
+ transport = self.get_transport()
+ transport.put_bytes('sample', '0\n2\n4\n')
+ f = pack.ReadVFile(transport.readv('sample', [(0,2), (2,4)]))
+ results = []
+ results.append(f.readline())
+ results.append(f.readline())
+ results.append(f.readline())
+ self.assertEqual(['0\n', '2\n', '4\n'], results)
^- It seems a little odd to read all of the file with your readv. What
if instead you did:
transport.put_bytes('sample', '0\n2\n4\n6\n')
f = pack.ReadVFile(transport.readv('sample', [(0, 2), (4, 4)]))
results = []
results.append(f.readline())
results.append(f.readline())
results.append(f.readline())
self.assertEqual(['0\n', '4\n', '6\n'], results)
I'm not sure why you are adding tests for ReadVFile when you aren't
implementing ReadVFile. Though the tests themselves seem useful.
For details, see:
http://bundlebuggy.aaronbentley.com/request/%3C20071022070646.GU16660%40steerpike.home.puzzling.org%3E
More information about the bazaar
mailing list