[RFC] optimizing bzr-grep

Fri Mar 19 04:45:50 GMT 2010

On Thu, Mar 18, 2010 at 7:08 PM, John Arbash Meinel
<john at arbash-meinel.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Parth Malwankar wrote:
>> On Wed, Mar 17, 2010 at 2:29 AM, John Arbash Meinel
>> <john at arbash-meinel.com> wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Parth Malwankar wrote:
>>>> Hello,
>>>>

>>
>> I plan to work some more to see if it can be optimized further. Range
>> grep is still a little slow.
>
> One thing to check for "Range grep" is that the text in -9 may be
> identical to the text in -10. (completely unchanged). Are you grepping
> it 2x, or just once?
>

That's a very good point. Right now we just grep twice. It might be
much faster if we can skip the second grep when the text is unchanged.

> You can tell if you look at tree.inventory[file_id].revision, I'm not
> sure if we expose that anywhere else.
>  for path, ie in tree.iter_entries_by_dir(): ie.revision
> also works.
>

I will explore this approach for speedup further.
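
A rough sketch of what I have in mind for finding the unique
(file_id, revision) keys across a range. This is not bzrlib code: Entry is a
stand-in for an inventory entry, each "tree" is modelled as a plain list of
(path, entry) pairs in the shape iter_entries_by_dir() yields, and the ids
and revision strings are made up for illustration.

```python
from collections import namedtuple

# Stand-in for an inventory entry; only the two attributes used here.
Entry = namedtuple("Entry", ["file_id", "revision"])


def unique_text_keys(trees):
    """Group (revno, path) pairs by their (file_id, revision) key.

    `trees` is a list of (revno, entries) pairs, where `entries` is an
    iterable of (path, entry) pairs. Two revnos that land under the same
    key share an identical text, so that text needs grepping only once.
    """
    seen = {}
    for revno, entries in trees:
        for path, ie in entries:
            seen.setdefault((ie.file_id, ie.revision), []).append((revno, path))
    return seen


# Hypothetical data: INSTALL is unchanged between 85 and 86,
# test_grep.py was modified in 86.
trees = [
    (86, [("INSTALL", Entry("install-id", "rev-80")),
          ("test_grep.py", Entry("test-id", "rev-86"))]),
    (85, [("INSTALL", Entry("install-id", "rev-80")),
          ("test_grep.py", Entry("test-id", "rev-85"))]),
]
keys = unique_text_keys(trees)
```

With this data there are three distinct texts for four (revno, file) pairs,
so one of the four greps could be skipped.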

> Anyway, you want the set of file_id,revision_id keys that are unique.
> Note that I don't know how you are presenting this to the user. (How do
> you tell them that 'ffo' is present in myfoo.txt in both revision 9 and
> 10, but it is a different text, or it is the same text, or ...)
>

Right now the format is filename~rev:line_num:text with line_num being
optional.

[grep]% bzr grep note -r last:3..last:2
test_grep.py~86:        # note: set --verbose/-v flag to get the skip message.
test_grep.py~86:        # note: set --verbose/-v flag to get the skip message.
INSTALL~86:Also, note that the plugin should be placed as 'grep' and NOT
test_grep.py~85:        # note: set --verbose/-v flag to get the skip message.
test_grep.py~85:        # note: set --verbose/-v flag to get the skip message.
INSTALL~85:Also, note that the plugin should be placed as 'grep' and NOT
[grep]%

So if tree.inventory[file_id].revision is the same between two iterations,
we can skip the text extraction and search, and just print the previous results.
Maybe the grep results for the last revision can be saved in a list; this
shouldn't be too much memory overhead. Then we would just need to pick out the
lines that start with the filename, change the revno between ~ and :, and
print the same lines.
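
Roughly, the reprint step could look like the sketch below. The function
name relabel and its arguments are hypothetical, not part of the plugin. It
matches on "filename~old_revno:" rather than on the filename alone, so that
INSTALL does not also pick up lines for a file like INSTALL.txt.

```python
def relabel(cached_lines, filename, old_revno, new_revno):
    """Reuse cached grep output for `filename` under a different revno.

    Lines are expected in the filename~rev:text or
    filename~rev:line_num:text format. Lines for other files are ignored.
    """
    old_prefix = "%s~%s:" % (filename, old_revno)
    new_prefix = "%s~%s:" % (filename, new_revno)
    out = []
    for line in cached_lines:
        if line.startswith(old_prefix):
            # Keep everything after the revno untouched (line_num and text).
            out.append(new_prefix + line[len(old_prefix):])
    return out


# Cached results from grepping revno 86.
cached = [
    "test_grep.py~86:        # note: set --verbose/-v flag to get the skip message.",
    "INSTALL~86:Also, note that the plugin should be placed as 'grep' and NOT",
]
reused = relabel(cached, "INSTALL", 86, 85)
```

Here reused holds the single INSTALL line relabelled as INSTALL~85, without
extracting or searching the text again.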

Regards,
Parth