[MERGE] Streaming-friendly container APIs

John Arbash Meinel john at arbash-meinel.com
Wed Oct 24 21:24:23 BST 2007


John Arbash Meinel has voted tweak.
Status is now: Conditionally approved
Comment:
I'm a little surprised to see 0 direct tests for ContainerSerialiser

+    def _consume_until_byte(self, byte):
+        """Take all bytes up to the given out of the buffer, and return 
it.
+
+        If the specified byte is not found in the buffer, the buffer is
+        unchanged and this returns None instead.
+        """
+        newline_pos = self._buffer.find('\n')
+        if newline_pos != -1:
+            line = self._buffer[:newline_pos]
+            self._buffer = self._buffer[newline_pos+1:]
+            return line
+        else:
+            return None

^- This seems to always assume 'byte' is '\n'. (Sounds like you need a 
test here)


...

+        # XXX: This causes a memory copy of bytes in size, but is 
usually
+        # faster than two write calls (12 vs 13 seconds to output a gig 
of
+        # 1k records.) - results may differ on significantly larger 
records
+        # like .iso's but as they should be rare in any case and thus 
not
+        # likely to be the common case. The biggest issue is causing 
extreme
+        # memory pressure in that case. One possibly improvement here 
is to
+        # check the size of the content before deciding to join here vs 
call
+        # write twice.
+        return ''.join(byte_sections)

You could change the api to return an iterator of bytes, so that you 
could do:

if len(bytes) > XXX:
   return (''.join(byte_sections), bytes)
else:
   byte_sections.append(bytes)
   return (''.join(byte_sections),)

And then the caller would do:
   for section in self._serializer.bytes_record(...):
       self.write_func(section)

I'm not sure that this is performance critical code, bit it is something
to think about.


I feel like you did fairly well a testing the bytes on the wire. I
wonder about marking these tests somehow so that it is clear that they
should not be changed as long as we want to maintain protocol
compatibility.


+    def test_readline(self):
+        """Test using readline() as ContainerReader does.
+
+        This is always within a readv hunk, never across it.
+        """
+        transport = self.get_transport()
+        transport.put_bytes('sample', '0\n2\n4\n')
+        f = pack.ReadVFile(transport.readv('sample', [(0,2), (2,4)]))
+        results = []
+        results.append(f.readline())
+        results.append(f.readline())
+        results.append(f.readline())
+        self.assertEqual(['0\n', '2\n', '4\n'], results)

^- It seems a little odd to read all of the file with your readv. What
if instead you did:

transport.put_bytes('sample', '0\n2\n4\n6\n')
f = pack.ReadVFile(transport.readv('sample', [(0, 2), (4, 4)]))
results = []
results.append(f.readline())
results.append(f.readline())
results.append(f.readline())
self.assertEqual(['0\n', '4\n', '6\n'], results)

I'm not sure why you are adding tests for ReadVFile when you aren't
implementing ReadVFile. Though the tests themselves seem useful.



For details, see: 
http://bundlebuggy.aaronbentley.com/request/%3C20071022070646.GU16660%40steerpike.home.puzzling.org%3E



More information about the bazaar mailing list