[BUG] bundle cannot handle binary files

John Arbash Meinel john at arbash-meinel.com
Sat Jul 8 16:51:21 BST 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Alexander Belchenko wrote:
> Alexander Belchenko wrote:

...

> I propose the following fix, which avoids the .splitlines() method and
> uses the fast alternative StringIO.readlines():
> 
> === modified file 'bzrlib/bundle/bundle_data.py'
> --- bzrlib/bundle/bundle_data.py    2006-06-22 18:24:25 +0000
> +++ bzrlib/bundle/bundle_data.py    2006-07-08 13:05:08 +0000
> @@ -723,4 +723,5 @@
>      from bzrlib.iterablefile import IterableFile
>      if file_patch == "":
>          return IterableFile(())
> -    return IterableFile(iter_patched(original, file_patch.splitlines(True)))
> +    return IterableFile(iter_patched(original,
> +                                     StringIO(file_patch).readlines()))
> 
> 
> This file already uses the StringIO class from the cStringIO module.
> After making this change I can merge my binary.patch successfully.
> 
> -- 
> Alexander

I think it is reasonable. I did a little bit of testing, and with
shorter strings 'x.splitlines(True)' is faster. With a file of about
20 lines, I got:

7.24 usec	x.splitlines(True)
9.26 usec	cStringIO.StringIO(x).readlines()
188 usec	StringIO.StringIO(x).readlines()

Obviously we don't want to use the pure-Python StringIO.StringIO :)

Anyway, then I tried it with 'bzrlib/branch.py', which is ~1400 lines
long. I was surprised to see this:

326 usec	x.splitlines(True)
297 usec	cStringIO.StringIO(x).readlines()
7.07 msec	StringIO.StringIO(x).readlines()

So for larger strings, it is actually slightly faster to use
cStringIO.StringIO(x).readlines().

The other interesting thing is that Aaron's code just wants a line
iterator; it doesn't need a list/tuple/whatever. So you can actually
just do:

return IterableFile(iter_patched(original, StringIO(file_patch)))

However, that is probably slightly slower:

466 usec	[y for y in cStringIO.StringIO(x)]
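
A rough way to reproduce this kind of comparison on a current Python is
sketched below. It assumes Python 3, where io.BytesIO stands in for
cStringIO.StringIO (the cStringIO module does not exist there). The helper
name time_line_splitters and the synthetic inputs are made up for
illustration, and the absolute numbers will differ from the ones above.

```python
import io
import timeit


def time_line_splitters(data, number=1000):
    """Return average seconds per call for three ways of splitting `data`.

    `data` is a bytes object. This helper is hypothetical and is not part
    of bzrlib; io.BytesIO is used as a stand-in for cStringIO.StringIO.
    """
    candidates = {
        "splitlines": lambda: data.splitlines(True),
        "readlines": lambda: io.BytesIO(data).readlines(),
        "iterate": lambda: [line for line in io.BytesIO(data)],
    }
    return {
        name: timeit.timeit(func, number=number) / number
        for name, func in candidates.items()
    }


# Synthetic stand-ins for a ~20-line file and a ~1400-line file.
short_text = b"some line of text\n" * 20
long_text = b"some line of text\n" * 1400

if __name__ == "__main__":
    for label, text in (("short", short_text), ("long", long_text)):
        for name, seconds in time_line_splitters(text).items():
            print("%-5s %-10s %8.2f usec" % (label, name, seconds * 1e6))
```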

Anyway, +1 to change it to StringIO(x).readlines(), but I think we need
a test case that shows bundles working with files with a '\r' in them.
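
As a starting point, the difference such a test would need to pin down can
be sketched like this. It is not a bzrlib bundle test, only an illustration
of the line-splitting behaviour, written for Python 3 with io.BytesIO
standing in for cStringIO.StringIO. The function names are hypothetical.

```python
import io


def lines_via_splitlines(data):
    # splitlines() treats a bare '\r' as a line boundary.
    return data.splitlines(True)


def lines_via_readlines(data):
    # A file-like object in binary mode only breaks lines on '\n'.
    return io.BytesIO(data).readlines()


def test_carriage_return_is_not_a_line_break():
    data = b"binary\rchunk\nnext line\n"
    # readlines() keeps the '\r' inside the first line.
    assert lines_via_readlines(data) == [b"binary\rchunk\n", b"next line\n"]
    # splitlines(True) cuts the first line in two at the '\r'.
    assert lines_via_splitlines(data) == [
        b"binary\r",
        b"chunk\n",
        b"next line\n",
    ]
    # Either way, joining the pieces gives back the original bytes.
    assert b"".join(lines_via_readlines(data)) == data


if __name__ == "__main__":
    test_carriage_return_is_not_a_line_break()
```

A real test would still have to build a bundle from a file containing '\r'
and apply it, since what matters is that the patch lines line up with the
lines of the original.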

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEr9R5JdeBCYSNAAMRAkr4AJ46qLDFFe+fT25MZ23X35pVKEeZEgCgymOY
RkYnH0UCUFT6wVYH+3c6fv8=
=3N3/
-----END PGP SIGNATURE-----



