[MERGE] Use a regex for fileids_altered_by_revision_ids

John Arbash Meinel john at arbash-meinel.com
Wed Dec 6 20:57:43 GMT 2006


The attached patch has a few changes to the
fileids_altered_by_revision_ids algorithm.

1) Instead of doing str.find() twice, use a regex to extract both
entries. One very interesting thing I found is that match.group() is
fairly expensive to call. So it really is better to call
  a, b = match.group('a', 'b')
rather than
  a = match.group('a')
  b = match.group('b')

I don't know exactly *why*, but in my lsprof timings it was adding about
800 ms (partially because we have to call it so many times).
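As a rough illustration of point 1, here is a minimal sketch of pulling both
values out of one inventory line with a single regex search and a single
match.group() call. The pattern, the group names and the extract_ids() helper
are assumptions made up for this example; the regex in the actual patch may
look different.

```python
import re

# Hypothetical pattern: assumes file_id="..." appears before revision="..."
# on the same line. The real pattern in the patch may differ.
_LINE_RE = re.compile(
    r'file_id="(?P<file_id>[^"]+)".*revision="(?P<revision_id>[^"]+)"')


def extract_ids(line):
    """Return (file_id, revision_id) still XML-escaped, or None."""
    match = _LINE_RE.search(line)
    if match is None:
        return None
    # One call returning a tuple, instead of one group() call per name.
    return match.group('file_id', 'revision_id')
```

Calling extract_ids() on a line without both attributes returns None, so the
caller can skip it cheaply.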

2) Cache the unescaped xml => revision_id and xml => file_id strings.
--lsprof said that _unescape_xml() was about 10% of the total processing
time. And for all of bzr.dev's inventory, it is about 400,000 calls, but
we only have a few hundred file ids, and a few thousand revision ids. So
it seems like a good case for caching. Under --lsprof it changes the
time from 4.8s down to 2.0s. Though I think this is one of the cases
where lsprof is overly favoring the new form. But I still think it is
better than not doing it.

3) Only decode the file_id if the revision_id matches. This is sort of
obvious, but for merge revisions, there are frequently lots of file
ids listed that aren't actually modified in this revision. With this
change, we don't need to spend any time in _unescape_xml for those
entries.
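A sketch of the ordering described in point 3, under the same assumptions as
before: the regex is a made-up stand-in, xml.sax.saxutils.unescape() replaces
_unescape_xml(), and altered_file_ids() is a hypothetical name rather than the
real method.

```python
import re
from xml.sax.saxutils import unescape

# Hypothetical pattern; the real one in the patch may differ.
_LINE_RE = re.compile(
    r'file_id="(?P<file_id>[^"]+)".*revision="(?P<revision_id>[^"]+)"')


def altered_file_ids(lines, wanted_revision_ids):
    """Map file_id -> set of revision ids, limited to wanted_revision_ids."""
    result = {}
    for line in lines:
        match = _LINE_RE.search(line)
        if match is None:
            continue
        raw_file_id, raw_revision_id = match.group('file_id', 'revision_id')
        revision_id = unescape(raw_revision_id)
        if revision_id not in wanted_revision_ids:
            # Not one of ours: skip before touching the file id at all.
            continue
        result.setdefault(unescape(raw_file_id), set()).add(revision_id)
    return result
```

The file id is only unescaped after the revision check passes, so lines that
merely carry along unmodified entries cost one regex search and one lookup.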

These changes bring the total time spent in
fileids_altered_by_revision_ids down to 7.85s for my complete
inventory.knit. So it is another 250ms or so.

If people want to do some of this patch, but not all of it, I'm happy
enough to split it up.

John
=:->
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fileids_altered.patch
Type: text/x-patch
Size: 3345 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20061206/78b49411/attachment.bin 