[RFC] Reworking 'commit' internals to work in texts rather than 'lines'

John Arbash Meinel john at arbash-meinel.com
Mon Apr 27 18:52:22 BST 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

So I was poking around with decreasing our peak memory consumption
during commit. (part of https://bugs.launchpad.net/bzr/+bug/109114 )

Anyway, I tracked down that at present during commit (*without* a delta)
we end up holding approximately 3-4 copies of the file text in memory.
Also, for my test file, the python object overhead is huge (average line
size is 21 bytes, which is less than the 24-byte PyString header).
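
To make the per-line overhead concrete, here is a rough sketch that
compares one buffer against a list of line objects. The line count and
content are made up, and it runs on Python 3, where the per-object
header differs from the 24-byte PyString figure above, so only the
direction of the result carries over, not the exact numbers.

```python
import sys

# Assumption: 100,000 lines of 21 bytes each (20 characters plus a
# newline), chosen to match the average line size mentioned above.
line = b"x" * 20 + b"\n"
num_lines = 100_000

text = line * num_lines                    # one contiguous buffer
lines = text.splitlines(keepends=True)     # one bytes object per line


def lines_memory(line_list):
    """Size of the list itself plus every per-line object it holds."""
    return sys.getsizeof(line_list) + sum(sys.getsizeof(l) for l in line_list)


single_bytes = sys.getsizeof(text)
split_bytes = lines_memory(lines)

print("single buffer: %d bytes" % single_bytes)
print("list of lines: %d bytes" % split_bytes)
print("ratio: %.2f" % (split_bytes / single_bytes))
```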

I prototyped an alternate code path here:
  https://code.edge.launchpad.net/~jameinel/bzr/1.15-peak-memory-109114

Note that it doesn't pass the test suite, it may eat your children, etc.

Mostly I just wanted to prototype something that worked with a single
text buffer, rather than lines. I was successful at avoiding copying the
data, and since I use a single string buffer, it ends up both *faster*
and much lower memory for 'initial commit'.

As a benchmark, the existing 'bzr commit' used approx 500MB and took
6.0s; my prototype peaked at 140MiB and took 4.6s.

The main change is to avoid "KnitVersionedFile.add_lines()" and instead
go for something along the lines of "KVF.add_text()" which takes a
single string.
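
As a rough illustration of the two entry points, the toy class below
is purely hypothetical. "ToyTextStore" and its method signatures are
invented for this sketch and are not bzrlib's actual API. It only
shows that a text-based call can hash and store one buffer directly,
while a line-based call has to join the lines first.

```python
import hashlib


class ToyTextStore:
    """Hypothetical stand-in for a versioned-file store (not bzrlib)."""

    def __init__(self):
        self._texts = {}

    def add_lines(self, key, lines):
        # Line-based entry point: the caller already holds a list of
        # line objects, and joining them makes a further copy.
        return self.add_text(key, b"".join(lines))

    def add_text(self, key, text):
        # Text-based entry point: one buffer, hashed and stored as-is.
        self._texts[key] = text
        return hashlib.sha1(text).hexdigest(), len(text)


store = ToyTextStore()
content = b"first line\nsecond line\n"

from_text = store.add_text("file-id-1", content)
from_lines = store.add_lines("file-id-2", content.splitlines(keepends=True))

print(from_text)
print(from_lines)
```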

This means that the "record_iter_changes" code can do "file.read()"
rather than "file.readlines()", etc. It also means that we can compress
the text without iterating over 5M lines (and the associated 5M*3
function calls to compress the bytes, compute the crc, etc.)
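
A minimal sketch of the difference in call counts, using zlib from the
standard library as a stand-in. The sample data is invented, and only
two per-line calls are modelled here (crc and compress), not the
exact set of calls commit makes.

```python
import io
import zlib

# Assumption: a small in-memory "file" of 1000 short lines.
data = b"".join(b"line %06d of the file\n" % i for i in range(1000))

# Line-based path: one crc32 call and one compress call per line.
crc = 0
compressor = zlib.compressobj()
chunks = []
calls = 0
for line in io.BytesIO(data).readlines():
    crc = zlib.crc32(line, crc)
    chunks.append(compressor.compress(line))
    calls += 2
chunks.append(compressor.flush())
per_line_crc = crc
per_line_compressed = b"".join(chunks)

# Text-based path: read once, then one call each for crc and compress.
text = io.BytesIO(data).read()
whole_crc = zlib.crc32(text)
whole_compressed = zlib.compress(text)

print("per-line calls:", calls)
print("whole-text calls:", 2)
print("crc matches:", per_line_crc == whole_crc)
```

Both paths produce the same checksum and decompress to the same bytes;
the text-based path just gets there with a constant number of calls.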

I also think this will be a big win for 'dev6' repositories, because in
that case there *is* no delta generated at commit time. So getting the
fulltext makes it all that much faster to shove it down into the
repository, and get on with your life. (The main downside is that we
won't even get cross-file deltas, because we end up with 1 group per
file rather than 1 group per commit.)
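
The cost of 1 group per file can be sketched with plain zlib. This is
only an analogy: zlib is not the compressor 'dev6' uses, and the two
"files" below are invented. It shows that two similar texts compress
better together than separately, which is the kind of cross-file
saving that per-file groups give up.

```python
import random
import zlib

# Assumption: two nearly identical files of 200 random 21-byte lines.
rng = random.Random(0)
base_lines = [("%020x\n" % rng.getrandbits(80)).encode("ascii")
              for _ in range(200)]
file_a = b"".join(base_lines)

changed_lines = list(base_lines)
changed_lines[100] = b"a changed line here\n"
file_b = b"".join(changed_lines)

# One group per file: each text is compressed on its own.
per_file = len(zlib.compress(file_a)) + len(zlib.compress(file_b))

# One group per commit: both texts share a single compression stream,
# so the second file can refer back to the first.
per_commit = len(zlib.compress(file_a + file_b))

print("1 group per file:  ", per_file, "bytes")
print("1 group per commit:", per_commit, "bytes")
```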

I just wanted to get some response on it, before I went to the effort of
actually implementing it.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkn18NYACgkQJdeBCYSNAAORzACfZpqqYbXvUgFY+oX0R7XyMF9t
q6YAmwdp1BsDFLrf5S1Zd2biel4hGMeq
=9v9V
-----END PGP SIGNATURE-----


