fulltext searching of bzr repositories

Robert Collins robertc at robertcollins.net
Mon Jun 9 14:08:22 BST 2008


https://edge.launchpad.net/bzr-search

Over the weekend, I felt this was an interesting topic to have a stab at.

The result is a plugin which creates a search index. Currently it only
indexes the revision commit messages, but it's pretty straightforward to
index additional components: all that is needed is logic to generate a
posting list from them, and a Hit instance to provide reporting when the
results are found.
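
To illustrate the idea of a posting list plus a Hit, here is a minimal
sketch in Python. It is not the plugin's actual code: the names
RevisionHit, terms_of, build_index and search are hypothetical, the
revision ids are made up, and the index is a plain in-memory dict rather
than the on-disk format. Each term maps to the set of revision ids whose
commit message contains it, and a hit object knows how to report itself.

```python
import re
from collections import defaultdict


class RevisionHit:
    """Hypothetical stand-in for a Hit: reports one matching revision."""

    def __init__(self, revision_id, summary):
        self.revision_id = revision_id
        self.summary = summary

    def describe(self):
        return "Revision id '%s'. Summary: '%s'" % (
            self.revision_id, self.summary)


def terms_of(text):
    """Split text into a set of lowercase alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(messages):
    """Map each term to its posting list (a set of revision ids).

    `messages` is an assumed dict of revision id -> commit message.
    """
    postings = defaultdict(set)
    for revision_id, message in messages.items():
        for term in terms_of(message):
            postings[term].add(revision_id)
    return postings


def search(postings, messages, query):
    """Return hits for revisions whose message contains every query term."""
    query_terms = terms_of(query)
    if not query_terms:
        return []
    matching = set.intersection(
        *(postings.get(term, set()) for term in query_terms))
    return [RevisionHit(rev, messages[rev].splitlines()[0])
            for rev in sorted(matching)]


# Made-up sample data, for illustration only.
messages = {
    "rev-1": "Fix workaround for bug in http ra backend.",
    "rev-2": "Add tests for http backend.",
}
postings = build_index(messages)
for hit in search(postings, messages, "workaround"):
    print(hit.describe())
```

Indexing another component would, under these assumptions, mean writing
another function like build_index for that component and another hit
class that describes its matches.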

There are currently two caveats. First, the disk format is not finalised,
so users will need to remove the indices and recreate them as I tweak it
further. Second, and the reason the disk format is not finalised: when
a posting list is more than 2K long, the bzrlib index bisection logic
fails to parse the output index. This means that running it on bzr
itself fails :(. But it works fine on bzr-svn, for instance:

plugins/svn/trunk$ time bzr index .

real    0m1.518s
user    0m1.140s
sys     0m0.100s
plugins/svn/trunk$ time bzr search workaround
Revision id 'jelmer at samba.org-20080511215646-kxxs86xvurf96nuq'. Summary:
'Fix workaround for bug in http ra backend.'

real    0m0.411s
user    0m0.324s
sys     0m0.080s

Cheers,
Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.