[MERGE] Faster code for pushing from packs back into knits

Aaron Bentley aaron.bentley at utoronto.ca
Tue Nov 20 17:49:45 GMT 2007



John Arbash Meinel wrote:
> As I mentioned to Robert earlier, the code we have for packs => knits
> rebuilds the complete history of every text that was modified. Very
> painful. The attached code changes it to add the texts one by one.
> It isn't 100% optimal, because it extracts full texts one by one. However,
> I'm trying to be more sensitive towards memory consumption. (I think a lot
> of our fetch() code buffers far too much in memory.)

bb:comment

This is very much a tradeoff.  Now that we have an LRU cache, we ought
to be able to make text extraction less memory-intensive, at the cost
of occasionally needing to regenerate old texts.

Have you looked at the performance impact on non-pack Knits or between
packs?  It looks like this patch might reduce performance significantly
there.
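To illustrate the tradeoff being described, here is a minimal sketch of
an LRU text cache that regenerates evicted texts on demand.  The class
name, cache size, and `regenerate` callback are assumptions for the
example, not bzrlib's actual cache interface:

```python
from collections import OrderedDict

class TextLRUCache:
    """Cache extracted fulltexts, evicting the least recently used.

    Evicted texts must be regenerated from the store on the next
    request, trading occasional recomputation for bounded memory.
    """

    def __init__(self, max_entries=50):
        self._cache = OrderedDict()
        self._max_entries = max_entries

    def get(self, version_id, regenerate):
        """Return the cached text, or rebuild it via regenerate()."""
        if version_id in self._cache:
            self._cache.move_to_end(version_id)  # mark as recently used
            return self._cache[version_id]
        text = regenerate(version_id)
        self._cache[version_id] = text
        if len(self._cache) > self._max_entries:
            self._cache.popitem(last=False)  # drop least recently used
        return text
```

The point is that the cache bounds peak memory at `max_entries` texts,
while a pure buffer-everything approach grows with history size.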

>     def _text_by_text_join(self, pb, msg, version_ids, ignore_missing=False):

I don't find this name very intuitive.  I think _copy_texts would be
much better.

> +        # TODO: jam 20071116 It would be nice to have a streaming interface to
> +        #       get multiple texts from a source. The source could be smarter
> +        #       about how it handled intermediate stages.
> +        # TODO: jam 20071116 Consider using 'get_line_list' instead of lots of
> +        #       calls to get_lines()

These two TODOs look contradictory at first glance.  Could they be unified?

Also, using mpdiffs (VF.make_mpdiffs, VF.install_mpdiffs) may be more
efficient than fulltexts, because in the single-parent case, you don't
have to do sequence matching.
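As a rough illustration of that copy loop, here is a sketch built on
stub stores.  The `make_mpdiffs`/`install_mpdiffs` names follow the
ones mentioned above, but the stub classes, the record shape, and the
`copy_texts` helper are assumptions for the example, not bzrlib's real
VersionedFile interface:

```python
class StubVersionedFile:
    """Stand-in for a versioned-file store; only the calls used below."""

    def __init__(self):
        self._texts = {}
        self._parents = {}

    def add_lines(self, version_id, parents, lines):
        self._texts[version_id] = lines
        self._parents[version_id] = parents

    def get_parents(self, version_id):
        return self._parents[version_id]

    def make_mpdiffs(self, version_ids):
        # A real store would emit multi-parent diffs here; the stub
        # passes fulltext lines through to stay self-contained.
        return [self._texts[v] for v in version_ids]

    def install_mpdiffs(self, records):
        # records: (version_id, parents, diff) tuples, parents first.
        for version_id, parents, diff in records:
            self.add_lines(version_id, parents, diff)


def copy_texts(source, target, version_ids):
    """Copy version_ids from source to target as one batch of mpdiffs."""
    diffs = source.make_mpdiffs(version_ids)
    records = [(v, source.get_parents(v), d)
               for v, d in zip(version_ids, diffs)]
    target.install_mpdiffs(records)
```

The batch shape is what matters: one `make_mpdiffs` call per group of
versions, rather than a fulltext extraction per version.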

Aaron
