bug search and source package names
Clint Byrum
clint at ubuntu.com
Wed Feb 2 17:38:12 UTC 2011
On Wed, 2011-02-02 at 19:09 +1300, Robert Collins wrote:
> Hi, as you may have noticed, bug search in Launchpad is not as fast as
> you might wish :). One of the contributing factors to the search
> performance is that we do a substring match for package names.
>
Could you cheat a little and add some tokens to the text that is
searched?
I presume right now the bug search uses postgres's text searching to
scan the descriptions of the bugs. So something like
WHERE searchable_description @@ 'ibche' OR package_name LIKE '%ibche%'
So the question is, could you add this on save:
UPDATE foo SET searchable_description = concat (description,' ___packagename:',package_name)
Then you can build it as an | query in the text search?
WHERE searchable_description @@ to_tsquery('ibche | ___packagename:ibche')
Not sure how the ranking would be affected.
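To make the token idea concrete, here is a minimal Python sketch, not Launchpad code. The helper names are hypothetical, and the whitespace tokenizer is only a stand-in for what postgres's text search parser would really do. It appends the package-name token on save and then evaluates the OR query against the resulting tokens.

```python
# Marker prefix taken from the message; everything else is illustrative.
PACKAGE_PREFIX = "___packagename:"


def build_searchable_description(description, package_name):
    """Append a synthetic package-name token to the text that gets indexed."""
    return f"{description} {PACKAGE_PREFIX}{package_name}"


def tokenize(text):
    """Lower-case and split on whitespace (assumed stand-in for a real parser)."""
    return set(text.lower().split())


def matches(searchable_description, term):
    """OR query: the term as a plain word, or as a package-name token."""
    tokens = tokenize(searchable_description)
    term = term.lower()
    return term in tokens or (PACKAGE_PREFIX + term) in tokens


doc = build_searchable_description("Crash when typing Chinese input", "libchewing")
print(matches(doc, "libchewing"))  # full package name, found via the token
print(matches(doc, "ibche"))       # fragment of the package name
```

In this sketch the lookup is whole-token, so the full package name is found through the added token but a fragment such as ibche is not. Whether a real text-search index could be made to cover fragments too would need checking separately.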
> We're looking at schema changes and additional short-term solutions
> (long term we're moving to a dedicated search engine such as Lucene),
> but I'm wondering - how important is this substring matching?
I think it's an intuitive and powerful feature. Typing mysql and
matching on php5-mysql is actually really, really important.
> What do I mean by this? If you type 'ibche' into the bug search for
> Ubuntu it will time out. But if it didn't timeout it would find all
> bugs on libchewing (because ibche is a substring on libchewing).
>
> If Ubuntu as a whole is open to this being removed temporarily(*) then
> we can drop some representative queries down from 7 seconds to 380ms
> with relatively little effort. We may be able to achieve this sort of
> result with more significant effort - but if it's not actually a
> valuable feature, it's much more efficient to disable it for a while.
>
If nothing else, allowing the user to turn it off in advanced search to
make things faster might be helpful.
Another option might be to simply disable it for words shorter than 5
chars. I'm assuming that pgsql's string match uses the shortcut of
checking for the pattern's last letter strlen() chars in, then jumping
ahead that many chars, checking again, and so on, so short patterns are
the expensive ones. Plus you definitely don't care if lib matches.
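The length cutoff could look something like the following Python sketch. The function name and the fallback to an exact match for short terms are assumptions made for illustration. The threshold of 5 comes from the suggestion above.

```python
# Terms shorter than this skip substring matching (threshold from the message).
MIN_SUBSTRING_LENGTH = 5


def package_matches(term, package_name, min_length=MIN_SUBSTRING_LENGTH):
    """Return True if the search term should match the package name.

    Short terms only match the whole package name (an assumed fallback);
    longer terms may match anywhere inside it.
    """
    term = term.lower()
    package_name = package_name.lower()
    if len(term) < min_length:
        return term == package_name
    return term in package_name


print(package_matches("mysql", "php5-mysql"))  # 5 chars, substring match allowed
print(package_matches("lib", "libchewing"))    # too short, exact match only
```

With this cutoff, mysql still finds php5-mysql, while lib no longer drags in every library package.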
> -Rob
>
> (*): a year or maybe two. We hope to get to an overhaul of our search
> engine late 2011, and I'm positive we could reestablish this then if
> desirable.
>
Count me in as interested in helping with this. I've rolled out SOLR
before, and if the amount of single-concern document data is under 500GB
it's a piece of cake. If we have more than that, well... Lucandra or
SOLandra await us. :)