[MERGE] Container format basic implementation

Andrew Bennetts andrew at canonical.com
Fri Jun 15 17:02:39 BST 2007


Aaron Bentley wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Aaron Bentley wrote:
> > Or alternatively, if ContainerReader demanded a file-like object, then
> > you wouldn't need to implement read_line, either.  Seek would also be a
> > great thing to support, for skipping undesired records.  This option
> > really makes the most sense to me.

Yeah, we definitely want to support seeking over undesired records, if the
source supports that — files can seek, but sockets can't.  The best we can do
with sockets is minimise processing of bytes we don't care about.  Of course,
with containers streamed over sockets we shouldn't normally want to seek, as we
shouldn't be sending uninteresting records if we can help it.

> I've discovered that for processing a 56K bundle, 95% of its time is
> spent in BaseReader._readline.  It is called 946 times, resulting in 29,
> 914 calls to BaseReader.reader_func.

That's interesting, although not surprising, as I haven't yet tried to profile
or optimise this code.  It's probably time I wrote my first addition to our
benchmark suite.  I do want this layer to be as fast as possible.

> So I recommend fixing BaseReader._read_line, preferably by passing in a
> file-like object.

My tentative plan is to change it to be a push-based API, like
bzrlib.smart.protocol, rather than the current pull-based one.  So rather than
handing the constructor a source to read from, I'll put an accept_bytes method
on the Reader, and perhaps also next_read_size (which gives the source a hint
about the minimum number of bytes expected).  It makes sense to me to try to
re-use an approach that's already worked.

This is a particularly convenient for sockets and pipes, which can just keep
feeding bytes in as soon as they arrive.  A file source can just read in e.g. 4k
chunks and feed those in — I'll write a helper to do this.

I'm not sure what the best way for the reader to inform the source that it
should seek is in this proposed scheme.  Perhaps just a return value from
accept_bytes...

Thank you very much for this feedback, it's very helpful!

-Andrew.




More information about the bazaar mailing list