[RFC] Repository.get_file_texts API and planning for it

John Arbash Meinel john at arbash-meinel.com
Wed Aug 15 16:38:16 BST 2007


Aaron Bentley wrote:
> John Arbash Meinel wrote:
>>> Iterables of bytes is a very convenient one.  Text lines is nice only
>>> when working with text.  File objects have high API demands, but even
>>> strings are iterables of bytes.
>> You've said this in the past, and while I agree it is convenient, it has
>> some odd performance characteristics. Specifically (edited for clarity):
> 
> I've no doubt it does.
> 
>> So while I agree that an iterable of bytes is a convenient and very
>> adaptable API, we really don't want to be passing a plain string to
>> that API.
> 
> It is very convenient for test cases, though.
> 
> So we can
> 1. recommend wrapping strings in lists (or maybe tuples-- how fast are
> they?)

For these purposes, I would say that lists are equivalent to tuples:

  [many]    538us
  (many)    544us
  [single]  135us
  (single)  126us

I'm guessing the difference is mostly noise, considering that using
StringIO was 10x slower than [single], and a plain string was about
100x slower.
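
For anyone who wants to reproduce this kind of comparison, here is a rough
sketch. It is not the benchmark behind the numbers above: the content size,
the in-memory sink, and the helper name write_all are all assumptions made
for illustration, and it uses Python 3 str objects, so absolute timings
will differ.

```python
import io
import timeit


def write_all(chunks):
    """Feed an iterable of str chunks to writelines() and return what was written."""
    sink = io.StringIO()
    sink.writelines(chunks)
    return sink.getvalue()


# Stand-in for file content; the size is an arbitrary assumption.
text = "a line of text\n" * 1000
lines = text.splitlines(True)   # same content as many small chunks
lines_tuple = tuple(lines)

variants = {
    "[many]": lambda: write_all(lines),
    "(many)": lambda: write_all(lines_tuple),
    "[single]": lambda: write_all([text]),
    "(single)": lambda: write_all((text,)),
    "StringIO": lambda: write_all(io.StringIO(text)),  # iterates line by line
    "plain str": lambda: write_all(text),              # iterates char by char
}

runs = 20
for name, fn in variants.items():
    per_call_us = timeit.timeit(fn, number=runs) / runs * 1e6
    print(f"{name:<10} {per_call_us:10.1f} us per call")
```

Every variant writes identical output; only the number of chunks that
writelines() has to step through changes.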

> 2. automatically wrap strings in lists/tuples as an optimization.
> 
> But I should say that passing in a single string to create_file suggests
> that you're not being memory-efficient, because you must have read a
> whole file into memory.

Well, often we already have. I could certainly see an intermediate
generator or custom class that reads the file incrementally.
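
A minimal sketch of what such a generator could look like, assuming a
file-like source with a read() method. The name iter_chunks and the 64 KiB
chunk size are hypothetical, not anything that exists in bzrlib.

```python
import io


def iter_chunks(fileobj, chunk_size=65536):
    """Yield successive chunks read from fileobj until it is exhausted."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Usage: stream from one file-like object to another without ever
# holding the whole content as a single string.
source = io.BytesIO(b"x" * 200000)
sink = io.BytesIO()
sink.writelines(iter_chunks(source))
```

This keeps the number of chunks small (so the per-item overhead stays low)
while bounding memory use to one chunk at a time.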

> 
> So there are other reasons than writelines performance to avoid this in
> real code.
> 
> Aaron

Sure. Mostly I wanted to point out a standard API that takes "an
iterator of bytes", and show why passing it the right kind of iterator
is important.

Also interesting from my results: if you have a plain string and an
"iterator of bytes" interface, don't wrap it in StringIO() and don't
pass the plain string itself. Just wrap the string in a list (or tuple).
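
As a sketch of the "automatically wrap strings" option, a helper along
these lines would do it. The name as_chunks is hypothetical, and the code
assumes Python 3, where iterating a bytes object yields ints, so a bare
bytes value would not even be accepted by writelines() on a binary file.

```python
import io


def as_chunks(content):
    """Wrap a single str/bytes value in a list; pass other iterables through."""
    if isinstance(content, (str, bytes)):
        return [content]
    return content


# Usage: the caller may hand over either one string or an iterable of them.
sink = io.BytesIO()
sink.writelines(as_chunks(b"whole file content\n"))
sink.writelines(as_chunks([b"line 1\n", b"line 2\n"]))
```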

John
=:->