[MERGE] 22% Faster logs by optimizing get_texts

Sat Jun 17 22:39:44 BST 2006

You're right.  Currently, the assert's broken, too!  I'll fix that.

> I'm starting to wonder if we are hurting ourselves by working with
> everything as lines rather than working on them as string blobs.

I'm not sure whether that would have much effect.  It is rather annoying
converting content objects to lines to strings, but I think things like
gzip decoding are more costly.

> I know Weaves were very line based, but Knits don't have to be as line
> based. I suppose difflib is also line based.

Not so much line based as sequence based.  I think difflib's pretty
flexible about whether the entries in the sequence should be lines or
strings or anything.
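
To illustrate the point, here's a minimal sketch using the standard library's difflib.SequenceMatcher.  The sample data is made up; the only real requirement is that the sequence items are hashable.

```python
import difflib

# Made-up sample data: the same edit expressed three ways.
old_lines = ["a\n", "b\n", "c\n"]
new_lines = ["a\n", "x\n", "c\n"]

# Sequences of lines.
line_ops = difflib.SequenceMatcher(None, old_lines, new_lines).get_opcodes()

# Plain strings, treated as sequences of characters.
char_ops = difflib.SequenceMatcher(None, "abc", "axc").get_opcodes()

# Any other hashable items, e.g. tuples.
tuple_ops = difflib.SequenceMatcher(
    None, [(1, "a"), (2, "b")], [(1, "a"), (3, "c")]
).get_opcodes()

print(line_ops)
print(char_ops)
print(tuple_ops)
```

The opcodes come out the same shape in each case, since the matcher only compares items for equality.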

> And while PatienceDiff is line based, that could be an implementation
> detail, rather than being a public api.

Right.  With Patience, lines are a convenient unit of meaning, but not
necessarily the only one.

> +    def get_text(self, version_id):
> +        """See VersionedFile.get_text"""
> +        return self.get_texts([version_id])[0]
> +
> +    def get_texts(self, version_ids):
> +        return [''.join(l) for l in self.get_line_list(version_ids)]
> +
> 
> 
> ...
> 
> This is something I just read on some Python tutorial. There is a
> compiled module 'operator'.
> And you can do:
> import operator
> needed_records.sort(key=operator.itemgetter(1))
> Since it is a compiled C function, it should be faster than a lambda.

Interesting.  The difference is pretty marginal in read_records_iter().
The whole thing is taking ~.5275, but switching to itemgetter changes
the inline time from .0295 to .0257.

My initial reaction is that it's not as clear as a lambda.  I guess
that's a pretty cruel thing to say :-).  But I don't have strong
feelings either way.
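
For anyone who wants to compare the two forms themselves, here's a rough sketch using timeit.  The records are a made-up stand-in for needed_records (I'm assuming tuples with the sort key at index 1), so the timings won't match the numbers above.

```python
import operator
import timeit

# Hypothetical stand-in for needed_records: (version_id, position, size).
# 7919 is coprime to 1000, so the positions are a shuffled 0..999.
needed_records = [("rev-%d" % i, (i * 7919) % 1000, 10) for i in range(1000)]

# Both keys produce the same ordering.
by_lambda = sorted(needed_records, key=lambda x: x[1])
by_itemgetter = sorted(needed_records, key=operator.itemgetter(1))


def time_sort(key):
    """Best-of-5 time for 100 sorts of the sample records with this key."""
    return min(
        timeit.repeat(
            lambda: sorted(needed_records, key=key), number=100, repeat=5
        )
    )


lambda_time = time_sort(lambda x: x[1])
itemgetter_time = time_sort(operator.itemgetter(1))
print("lambda:     %.4f" % lambda_time)
print("itemgetter: %.4f" % itemgetter_time)
```

The results depend on the machine and the Python version, so treat them as indicative only.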

> 
>          if len(needed_records):
> +            needed_records.sort(key=lambda x:x[1])
>              # We take it that the transport optimizes the fetching as

> In general, I think it looks good. A relatively small change, just to
> request groups instead of one at a time.

It may look small, but it was a lot of work!
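
The shape of the change, very roughly, looks like the toy sketch below.  ToyKnit, _read_records and the record layout are all made up for illustration and are not the real knit code; only the get_text/get_texts bodies mirror the patch.

```python
class ToyKnit(object):
    """Hypothetical stand-in for a knit, to show grouped requests."""

    def __init__(self, records):
        # records maps version_id -> (position, list_of_lines).
        self._records = records
        self.read_calls = 0

    def _read_records(self, needed):
        # One read for the whole group, visited in position order.
        self.read_calls += 1
        needed = sorted(needed, key=lambda x: x[1])
        return dict((vid, self._records[vid][1]) for vid, pos in needed)

    def get_line_list(self, version_ids):
        needed = [(vid, self._records[vid][0]) for vid in version_ids]
        lines = self._read_records(needed)
        # Return results in the order the caller asked for them.
        return [lines[vid] for vid in version_ids]

    def get_texts(self, version_ids):
        return [''.join(l) for l in self.get_line_list(version_ids)]

    def get_text(self, version_id):
        # The single-version case is just a group of one.
        return self.get_texts([version_id])[0]


knit = ToyKnit({"a": (20, ["x\n", "y\n"]), "b": (0, ["z\n"])})
texts = knit.get_texts(["a", "b"])
print(texts, knit.read_calls)
```

Asking for both versions at once costs a single read in this toy, where two get_text calls would cost two.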

Aaron