[RFC] optimizing bzr-grep

Parth Malwankar parth.malwankar at gmail.com
Tue Mar 16 14:27:44 GMT 2010


Hello,

I am working on optimizing bzr-grep when searching specific revisions [1].
I managed to get the time down from ~33s to ~23s for a specific
rev search (e.g. -r last:1, not a revision range). My experiments
show that the majority of the remaining time is now spent in:
    file_text = tree.get_file_text(id)

So, if grep takes ~23s, merely commenting out the above line
brings the time down to ~1.5s.

[emacs-bzr]% time bzr grep -r last:10 ffo > /dev/null
bzr grep -r last:10 ffo > /dev/null  19.19s user 3.77s system 99% cpu 23.054 total
[emacs-bzr]% time bzr grep -r last:10 ffo > /dev/null
bzr grep -r last:10 ffo > /dev/null  1.07s user 0.20s system 89% cpu 1.421 total

Is there anything I can do to speed up getting the full text of
a revision?

Another optimization comes to mind. bzr-grep checks the
first 1024 bytes of the file text before either rejecting the file
as binary or accepting it as text and continuing further. However,
to do this check I still read the whole file using get_file_text.
I see this as an issue, especially for large binary files.

Is there an API that would allow me to get just a 1 KB chunk,
e.g. chunk = tree.get_file_text(id, size=1024)?
This way, if the tree has many binary files, they won't be read
fully into memory before being rejected, saving space
and time.
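
To illustrate the idea, here is a minimal sketch of the check done on a
file-like object instead of the full text. It does not use any bzrlib API:
io.BytesIO stands in for whatever stream the tree might provide, the names
looks_binary and grep_stream are hypothetical, and the NUL-byte test is an
assumption about how "binary" is detected.

```python
import io

BINARY_CHECK_SIZE = 1024  # number of leading bytes inspected


def looks_binary(fileobj, size=BINARY_CHECK_SIZE):
    """Return True if the first `size` bytes contain a NUL byte.

    Assumption: a NUL byte marks a file as binary; the actual
    bzr-grep heuristic may differ.
    """
    head = fileobj.read(size)  # reads at most `size` bytes
    return b"\x00" in head


def grep_stream(fileobj, pattern):
    """Hypothetical helper: yield (line_number, line) for matching lines.

    Binary files are rejected after reading only the leading chunk.
    Requires a seekable stream, since text files are rewound.
    """
    if looks_binary(fileobj):
        return
    fileobj.seek(0)
    for number, line in enumerate(fileobj, 1):
        if pattern in line:
            yield number, line.rstrip(b"\n")
```

With something like this, a large binary file costs one 1024-byte read
rather than a full read into memory; only files that pass the check are
read to the end.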

I would appreciate any suggestions or comments.

Regards,
Parth

[1] https://bugs.launchpad.net/bzr-grep/+bug/539429


