[MERGE] 22% Faster logs by optimizing get_texts
Aaron Bentley
aaron.bentley at utoronto.ca
Sat Jun 17 22:39:44 BST 2006
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
John Arbash Meinel wrote:
> components = {}
> if basis_versions:
> + assert True, "I am broken"
> + basis = self.basis_knit
>
> Shouldn't the above be 'assert False, "I am broken"'?
You're right. Currently, the assert's broken, too! I'll fix that.
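For illustration, a minimal sketch of why the assert in the patch can never fire. The helper name `check` is hypothetical and is not part of the patch; it only shows how `assert True` and `assert False` behave.

```python
def check(flag_value):
    """Return the AssertionError message, or None if the assert passed.

    Hypothetical helper for illustration only.
    """
    try:
        assert flag_value, "I am broken"
    except AssertionError as exc:
        return str(exc)
    return None


# 'assert True, ...' is a no-op, so the "broken" code path runs silently.
passed = check(True)

# 'assert False, ...' raises whenever the line is reached
# (unless Python runs with -O, which strips asserts entirely).
failed = check(False)
```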
> I'm starting to wonder if we are hurting ourselves by working with
> everything as lines rather than working on them as string blobs.
I'm not sure whether that would have much effect. It is rather annoying
converting content objects to lines to strings, but I think things like
gzip decoding are more costly.
> I know Weaves were very line based, but Knits don't have to be as line
> based. I suppose difflib is also line based.
Not so much line based as sequence based. I think difflib's pretty
flexible about whether the entries in the sequence should be lines or
strings or anything.
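A small sketch of that flexibility, using the standard library's `difflib.SequenceMatcher`. The sample data is made up; the same matcher is fed a list of lines and then a single string treated as a sequence of characters.

```python
import difflib

# Made-up sample texts, as lists of lines.
old_lines = ["a\n", "b\n", "c\n"]
new_lines = ["a\n", "x\n", "c\n"]

# Sequence elements are whole lines.
line_matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
line_tags = [op[0] for op in line_matcher.get_opcodes()]

# Sequence elements are single characters of one string blob.
old_text = "".join(old_lines)
new_text = "".join(new_lines)
char_matcher = difflib.SequenceMatcher(None, old_text, new_text)
char_tags = [op[0] for op in char_matcher.get_opcodes()]
```

Any sequence of hashable items should work the same way; only the granularity of the reported changes differs.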
> And while PatienceDiff is line based, that could be an implementation
> detail, rather than being a public api.
Right. With Patience, lines are a convenient unit of meaning, but not
necessarily the only one.
> + def get_text(self, version_id):
> + """See VersionedFile.get_text"""
> + return self.get_texts([version_id])[0]
> +
> + def get_texts(self, version_ids):
> + return [''.join(l) for l in self.get_line_list(version_ids)]
> +
>
>
> ...
>
> This is something I just read on some Python tutorial. There is a
> compiled module 'operator'.
> And you can do:
> import operator
> needed_records.sort(key=operator.itemgetter(1))
> Since it is a compiled C function, it should be faster than a lambda.
Interesting. The difference is pretty marginal in read_records_iter().
The whole thing is taking ~.5275, but switching to itemgetter can change
the inline time from .0295 to .0257.
My initial reaction is that it's not as clear as a lambda. I guess
that's a pretty cruel thing to say :-). But I don't have strong
feelings either way.
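For comparison, a sketch of the two spellings side by side. The record layout here (a version id paired with a file offset) is an assumption made for the example, not the actual structure of `needed_records` in the patch.

```python
import operator

# Hypothetical (version_id, offset) pairs standing in for needed_records.
needed_records = [("rev-c", 300), ("rev-a", 100), ("rev-b", 200)]

# Sort by the second field using a lambda.
by_lambda = sorted(needed_records, key=lambda x: x[1])

# Same ordering using the C-implemented operator.itemgetter.
by_itemgetter = sorted(needed_records, key=operator.itemgetter(1))
```

Both produce the same order, so the choice comes down to readability against a small constant-factor speedup.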
>
> if len(needed_records):
> + needed_records.sort(key=lambda x:x[1])
> # We take it that the transport optimizes the fetching as
> In general, I think it looks good. A relatively small change, just to
> request groups instead of one at a time.
It may look small, but it was a lot of work!
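The grouped-request shape of the quoted get_text/get_texts change can be sketched roughly as follows. `BatchedTexts` and its in-memory dict are hypothetical stand-ins, not bzrlib's knit code; only the delegation pattern is taken from the patch.

```python
class BatchedTexts:
    """Hypothetical stand-in for a versioned file; not the real Knit class."""

    def __init__(self, lines_by_version):
        self._lines = lines_by_version
        self.batch_calls = 0  # counts how many grouped requests were made

    def get_line_list(self, version_ids):
        # One grouped request for all versions, instead of one per version.
        self.batch_calls += 1
        return [self._lines[v] for v in version_ids]

    def get_texts(self, version_ids):
        return ["".join(lines) for lines in self.get_line_list(version_ids)]

    def get_text(self, version_id):
        # The single-item call is a thin wrapper over the batch call.
        return self.get_texts([version_id])[0]


store = BatchedTexts({"v1": ["a\n", "b\n"], "v2": ["c\n"]})
texts = store.get_texts(["v1", "v2"])
single = store.get_text("v2")
```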
Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFElHag0F+nu1YWqI0RAgtJAJ0dNvYAaUQF/3wijzHahenKtyxaTgCfQwAX
1y+oR0ogJ9k8pdY7pn8MWj4=
=1ssM
-----END PGP SIGNATURE-----