[MERGE] Streaming-friendly container APIs
Andrew Bennetts
andrew at canonical.com
Thu Oct 25 15:31:25 BST 2007
John Arbash Meinel wrote:
> John Arbash Meinel has voted tweak.
> Status is now: Conditionally approved
> Comment:
> I'm a little surprised to see 0 direct tests for ContainerSerialiser
It emerged out of tidying other code, rather than something I wrote from
scratch. But it probably deserves direct tests. I'll add some and resubmit.
> + def _consume_until_byte(self, byte):
> + """Take all bytes up to the given out of the buffer, and return
> it.
> +
> + If the specified byte is not found in the buffer, the buffer is
> + unchanged and this returns None instead.
> + """
> + newline_pos = self._buffer.find('\n')
> + if newline_pos != -1:
> + line = self._buffer[:newline_pos]
> + self._buffer = self._buffer[newline_pos+1:]
> + return line
> + else:
> + return None
>
> ^- This seems to always assume 'byte' is '\n'. (Sounds like you need a
> test here)
Ah drat. I'd actually fixed that already, but hadn't committed it. I don't
need a separate _consume_until_byte method, just _consume_line. Anyway, fixed.
> ...
>
> + # XXX: This causes a memory copy of bytes in size, but is
> usually
> + # faster than two write calls (12 vs 13 seconds to output a gig
> of
> + # 1k records.) - results may differ on significantly larger
> records
> + # like .iso's but as they should be rare in any case and thus
> not
> + # likely to be the common case. The biggest issue is causing
> extreme
> + # memory pressure in that case. One possibly improvement here
> is to
> + # check the size of the content before deciding to join here vs
> call
> + # write twice.
> + return ''.join(byte_sections)
>
> You could change the api to return an iterator of bytes, so that you
> could do:
>
> if len(bytes) > XXX:
> return (''.join(byte_sections), bytes)
> else:
> byte_sections.append(bytes)
> return (''.join(byte_sections),)
>
> And then the caller would do:
> for section in self._serializer.bytes_record(...):
> self.write_func(section)
>
> I'm not sure that this is performance critical code, bit it is something
> to think about.
It is performance critical in that it affects the speed the speed we can emit
bundles.
That isn't a new comment (or new code), just slightly refactored code, so I'm
inclined not to worry much about this in this patch.
> I feel like you did fairly well a testing the bytes on the wire. I
> wonder about marking these tests somehow so that it is clear that they
> should not be changed as long as we want to maintain protocol
> compatibility.
Good point. I'll update the docstrings for the test cases at least to make that
clearer.
> + def test_readline(self):
> + """Test using readline() as ContainerReader does.
> +
> + This is always within a readv hunk, never across it.
> + """
> + transport = self.get_transport()
> + transport.put_bytes('sample', '0\n2\n4\n')
> + f = pack.ReadVFile(transport.readv('sample', [(0,2), (2,4)]))
> + results = []
> + results.append(f.readline())
> + results.append(f.readline())
> + results.append(f.readline())
> + self.assertEqual(['0\n', '2\n', '4\n'], results)
>
> ^- It seems a little odd to read all of the file with your readv. What
> if instead you did:
>
> transport.put_bytes('sample', '0\n2\n4\n6\n')
> f = pack.ReadVFile(transport.readv('sample', [(0, 2), (4, 4)]))
> results = []
> results.append(f.readline())
> results.append(f.readline())
> results.append(f.readline())
> self.assertEqual(['0\n', '4\n', '6\n'], results)
>
> I'm not sure why you are adding tests for ReadVFile when you aren't
> implementing ReadVFile. Though the tests themselves seem useful.
That is odd. I'm sure I didn't touch this, and this branch is a simple branch
off bzr.dev with no merges. Ah, it appears I must have accidentally copied and
pasted the existing tests. I've removed the duplicated readv tests. So, you
ignore that bit.
Thanks for the review!
-Andrew.
More information about the bazaar
mailing list