[RFC] optimizing bzr-grep

John Arbash Meinel john at arbash-meinel.com
Thu Mar 18 13:38:18 GMT 2010


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Parth Malwankar wrote:
> On Wed, Mar 17, 2010 at 2:29 AM, John Arbash Meinel
> <john at arbash-meinel.com> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Parth Malwankar wrote:
>>> Hello,
>>>
>>> I am working on optimizing bzr-grep searching specific revs[1].
>>> I managed to get the time down from ~33s to ~23s for specific
>>> rev search (e.g. -r last:1, not revision range). To get this down
>>> further my experiments show that the majority of the time is now
>>> spent in:
>>>     file_text = tree.get_file_text(id)
>>>
>>  tree.iter_files_bytes()
>>
>> This was designed as a way to favor extraction speed. Specifically, it
> 
> Cheers for iter_files_bytes. The grep time for a specific rev is now
> down to 8.5s for the emacs tree (from 33s initially and 23s earlier today).
> It's ~4s for the working copy. This also includes the optimization for
> -F/--fixed-string from earlier today.
> 
> [emacs-bzr]% time bzr grep -r last:10 ffo  > /dev/null
> bzr grep -r last:10 ffo > /dev/null  7.60s user 0.92s system 99% cpu 8.576 total
> [emacs-bzr]% time bzr grep -r last:10..last:9 ffo  > /dev/null
> bzr grep -r last:10..last:9 ffo > /dev/null  20.57s user 1.53s system 94% cpu 23.318 total
> [emacs-bzr]%
> 
> I plan to work some more to see if it can be optimized further. Range
> grep is still a little slow.

One thing to check for "Range grep" is whether the text in -9 is
identical to the text in -10 (completely unchanged). If it is, are you
grepping it 2x, or just once?

You can tell by looking at tree.inventory[file_id].revision. I'm not
sure if we expose that anywhere else, but
 for path, ie in tree.iter_entries_by_dir(): ie.revision
also works.

Anyway, you want the set of unique (file_id, revision_id) keys.
Note that I don't know how you are presenting this to the user. (How do
you tell them that 'ffo' is present in myfoo.txt in both revision 9 and
10, but it is a different text, or it is the same text, or ...)
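
As a rough sketch of the de-duplication idea, something like the
following could collect the unique (file_id, revision) keys across the
trees in a range, so each distinct text is grepped only once. It does
not use bzrlib: Entry and FakeTree are hypothetical stand-ins that only
mimic iter_entries_by_dir() yielding (path, entry) pairs, and the revno
attribute and unique_text_keys() name are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for an inventory entry (file_id, kind, revision).
Entry = namedtuple("Entry", "file_id kind revision")


class FakeTree(object):
    """Hypothetical minimal tree exposing only iter_entries_by_dir()."""

    def __init__(self, revno, entries):
        self.revno = revno        # assumed attribute, for reporting only
        self._entries = entries   # list of (path, Entry) pairs

    def iter_entries_by_dir(self):
        return iter(self._entries)


def unique_text_keys(trees):
    """Map each (file_id, revision) key to the (revno, path) pairs sharing it."""
    seen = {}
    for tree in trees:
        for path, ie in tree.iter_entries_by_dir():
            if ie.kind != "file":
                continue  # directories and symlinks have no text to grep
            key = (ie.file_id, ie.revision)
            seen.setdefault(key, []).append((tree.revno, path))
    return seen


# a.txt is unchanged between the two trees; b.txt was modified.
root = ("", Entry("root-id", "directory", "rev-1"))
tree9 = FakeTree(9, [root,
                     ("a.txt", Entry("a-id", "file", "rev-5")),
                     ("b.txt", Entry("b-id", "file", "rev-9"))])
tree10 = FakeTree(10, [root,
                       ("a.txt", Entry("a-id", "file", "rev-5")),
                       ("b.txt", Entry("b-id", "file", "rev-10"))])

keys = unique_text_keys([tree9, tree10])
for key in sorted(keys):
    print(key, keys[key])
```

With this layout there are three keys for four file entries, so a.txt
would be searched once and any match could be reported against both
revisions. How to present that to the user is still the open question
above.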

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkuiLMoACgkQJdeBCYSNAANjuACfTVKJ5QaqhkbOeTelXLixllyd
cTcAn23jDDXZijZ5Pl3VBrIxzwAEBW+w
=bnMt
-----END PGP SIGNATURE-----



