[RFC] optimizing bzr-grep

Wed Mar 17 02:21:57 GMT 2010

On Wed, Mar 17, 2010 at 5:29 AM, Martin Pool <mbp at canonical.com> wrote:
> On 17 March 2010 01:27, Parth Malwankar <parth.malwankar at gmail.com> wrote:

>> Is there anything I can do to speed up getting the full text of
>> a revision?
>

> Well, if by commenting this line out you're grepping a 0-byte string
> it wouldn't be surprising if it's fast :-)
>

So I took four measurements with a hot cache on the emacs tree:

grep last:1 : ~26.4s
last:1 but return just after get_file_text (no grepping) : ~22.5s
return just before get_file_text (no grepping) : ~1.1s
grep working copy: ~4.3s

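The numbers seem consistent: the grepping itself accounts for roughly
4s of the last:1 run (26.4s - 22.5s), which is about what the working
copy grep takes. Nearly all of the remaining time (~21s) is spent in
get_file_text.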
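
For reference, the breakdown implied by the four measurements can be
worked out as below. This is only arithmetic on the numbers quoted
above; the dictionary keys are made-up labels, not bzr-grep names.

```python
# Timings in seconds, copied from the measurements above.
# The key names are illustrative labels only.
timings = {
    "grep_last_1": 26.4,           # full grep of -r last:1
    "after_get_file_text": 22.5,   # return right after get_file_text
    "before_get_file_text": 1.1,   # return right before get_file_text
    "grep_working_copy": 4.3,      # grep of the working copy
}

# Time attributable to fetching file texts from the repository.
text_fetch = round(
    timings["after_get_file_text"] - timings["before_get_file_text"], 1)

# Time attributable to the actual pattern matching.
grep_only = round(
    timings["grep_last_1"] - timings["after_get_file_text"], 1)

# Share of the full run spent fetching texts.
fetch_share = text_fetch / timings["grep_last_1"]

print("text fetch: %.1fs" % text_fetch)
print("grep only:  %.1fs" % grep_only)
print("fetch share: %.0f%%" % (fetch_share * 100))
```

So the matching cost is close to the working copy grep, and the text
retrieval is the part worth optimizing.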

> You should make sure you're holding a read lock on the whole
> repository for the whole time, so that things can be cached. -Drelock
> may help.
>
> Using log+file://.... for the repository may indicate inefficient IO.
>
> Using iter_file_bytes may be faster, or even better iter_files_bytes
> will let the repository choose a more efficient order.  This will also
> let you check for binaries inline with grepping.
>
> It may be faster to grep the whole thing as a string before splitting
> it into lines.
>
> Use --lsprof.
>
> Compare the time to grep a revision to the time to export it.
>
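
A minimal sketch of the "grep the whole thing as a string before
splitting it into lines" suggestion, in plain Python with no bzrlib
calls. The function name grep_text is hypothetical and this is not the
bzr-grep code; it assumes the file text is already in memory as one
string and that lines are separated by "\n".

```python
import re


def grep_text(pattern, text):
    """Return (line_number, line) pairs for lines of text matching pattern.

    Hypothetical helper: searches the whole text once and only splits
    it into lines when that first search finds something.
    """
    # MULTILINE makes ^ and $ behave per line in the whole-text search,
    # so the cheap pre-check agrees with the per-line search below.
    regex = re.compile(pattern, re.MULTILINE)

    # Cheap rejection: a single search over the entire file text.
    if regex.search(text) is None:
        return []

    # Only texts with at least one hit pay for the split and the
    # per-line search needed to report line numbers.
    return [(number, line)
            for number, line in enumerate(text.split("\n"), 1)
            if regex.search(line)]


sample = "alpha\nbeta\ngamma beta\n"
matches = grep_text("beta", sample)
```

Since most files in a tree usually contain no match at all, the single
whole-text search would let them skip the split entirely. Whether that
is a win in practice is something --lsprof would have to confirm.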

'bzr export' is surprisingly fast compared to the grep implementation.
It takes just ~7.8s to export the entire tree (-r last:10). I will look
into that to see how it's doing this.

Thanks for all the pointers. I will experiment with them.

Regards,
Parth