<div dir="ltr">On Mon, Nov 13, 2017 at 11:50 PM, Julian Andres Klode <span dir="ltr"><<a href="mailto:jak@debian.org" target="_blank">jak@debian.org</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">(forwarding this to ubuntu-devel-discuss and Zygmunt)<br>
<span><br>
On Mon, Nov 13, 2017 at 10:33:39PM -0800, Shawn Landden wrote:<br>
> Package: command-not-found<br>
> Severity: wishlist<br>
><br>
> I re-wrote command-not-found to get rid of the Python dependency, and<br>
> to reduce the database size, so as to reduce memory usage.<br>
><br>
> <a href="https://github.com/shawnl/command-not-found" rel="noreferrer" target="_blank">https://github.com/shawnl/comm<wbr>and-not-found</a><br>
><br>
> I was preparing to upload it to mentors as command-not-found-ng<br>
<br>
</span>I also rewrote it years ago, but using the same database format,<br>
just in C. It was a lot faster. I don't understand the memory usage<br>
bit - it should not matter how large the database is, it's memory<br>
mapped, and not read into memory, as such memory usage should be<br>
roughly constant.<br>
<br></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Questions/Comments for your approach:<br>
<br>
* Did you test your format on a slow HDD with caches dropped? It<br>
must not be slower than the Python one (that one is way too slow<br>
already) - I did, it seems to be faster (0.4 vs 0.68 seconds)<br>
- I believe the database-based C rewrite was even much faster,<br>
though.<br></blockquote><div>Yes. Since disk I/O accounts for essentially all of the run time, I think it's best to keep the file small, so that it has a better chance of staying in the page cache. <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
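</blockquote><div>To illustrate why lookups in a memory-mapped file only touch a few pages, here is a minimal Python sketch of a binary search over a sorted text file. The file layout (one <code>command package</code> pair per line, sorted bytewise by command) and the name <code>lookup</code> are assumptions made for the example, not the actual database format of either implementation.<br></div>
<pre>
```python
import mmap

def lookup(path, key):
    """Binary-search a sorted, non-empty file of b"key value\\n" lines.

    Hypothetical layout: the real database format may differ.
    Returns the value as bytes, or None if the key is absent.
    """
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        lo, hi = 0, len(m)          # lo always points at a line start
        while lo < hi:
            mid = (lo + hi) // 2
            # Find the start of the line that contains offset mid.
            nl = m.rfind(b"\n", lo, mid)
            start = lo if nl == -1 else nl + 1
            end = m.find(b"\n", start)
            if end == -1:           # last line without trailing newline
                end = len(m)
            k, _, v = m[start:end].partition(b" ")
            if k == key:
                return v
            if k < key:
                lo = end + 1        # continue in the lines after this one
            else:
                hi = start          # continue in the lines before this one
        return None
```
</pre>
<div>Each probe reads one line, so only the pages holding the probed lines need to be in memory; a smaller file simply means fewer distinct pages to fault in from a cold cache.<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">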
* update-command-not-found should use apt-get indextargets<br></blockquote><div>fixed <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
* You don't store components, hence you cannot tell people to enable<br>
component. That's a very important use case for Ubuntu, where<br>
not all components are enabled by default, but the database is<br>
shipped in the package.<br>
<br>
You could just append /<component> to each package name I think,<br>
and strip it away when displaying.<br></blockquote><div>fixed <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
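</blockquote><div>A minimal sketch of the suffix idea suggested above, in Python for brevity. The helper names <code>encode_entry</code> and <code>split_entry</code> are hypothetical, and this is not necessarily how the C rewrite stores components.<br></div>
<pre>
```python
def encode_entry(package, component):
    """Store the component as a suffix of the package name."""
    return f"{package}/{component}"

def split_entry(entry):
    """Split a stored entry back into (package, component) for display.

    Entries without a suffix yield a component of None. Debian package
    names cannot contain '/', so splitting on the last '/' is unambiguous.
    """
    package, sep, component = entry.rpartition("/")
    if not sep:
        return entry, None
    return package, component
```
</pre>
<div>With this, a stored value such as <code>thunderbird/main</code> can be shown as package 'thunderbird' while the component is kept for the "enable this component" hint.<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">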
* You should use getopt_long() to parse command-line options, and<br>
support -h, --help :)<br></blockquote><div>fixed <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
* pts_lbsearch belongs into usr/lib/..., not usr/share/...<br></blockquote><div>the separate binary is gone <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
* You don't implement a closest matches function:<br>
<br>
$ command-not-found thunderbrd<br>
No command 'thunderbrd' found, did you mean:<br>
Command 'thunderbird' from package 'thunderbird' (main)<br>
thunderbrd: command not found<br>
$ ./command-not-found thunderbrd<br>
thunderbrd: command not found<br>
<br>
This one is really important. People do make typos or misremember<br>
command names, so the tool needs to be able to deal with that<br>
<br>
Should be easy to implement though, although you might have to<br>
search multiple times - once for each alternative. All you need is<br>
<br>
def similar_words(word):<br>
&nbsp;&nbsp;&nbsp;&nbsp;""" return a set with spelling1 distance alternative spellings<br>
<br>
&nbsp;&nbsp;&nbsp;&nbsp;based on <a href="http://norvig.com/spell-correct.html" rel="noreferrer" target="_blank">http://norvig.com/spell-correc<wbr>t.html</a>"""<br>
&nbsp;&nbsp;&nbsp;&nbsp;alphabet = 'abcdefghijklmnopqrstuvwxyz-_'<br>
&nbsp;&nbsp;&nbsp;&nbsp;s = [(word[:i], word[i:]) for i in range(len(word) + 1)]<br>
&nbsp;&nbsp;&nbsp;&nbsp;deletes = [a + b[1:] for a, b in s if b]<br>
&nbsp;&nbsp;&nbsp;&nbsp;transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b)>1]<br>
&nbsp;&nbsp;&nbsp;&nbsp;replaces = [a + c + b[1:] for a, b in s for c in alphabet if b]<br>
&nbsp;&nbsp;&nbsp;&nbsp;inserts = [a + c + b for a, b in s for c in alphabet]<br>
&nbsp;&nbsp;&nbsp;&nbsp;return set(deletes + transposes + replaces + inserts)<br>
<br>
And search for what that returns. And you don't need to search for those<br>
at all if you have a direct match.<br>
<br></blockquote><div>Fixed, and I believe the output is now bit-for-bit identical. <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
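</blockquote><div>For reference, a small Python sketch of the lookup order described above: try the exact name first, and only fall back to the edit-distance-1 alternatives when there is no direct match. <code>similar_words</code> is the quoted function with its indentation restored; <code>suggest</code> and the plain dict standing in for the database are hypothetical and only illustrate the idea.<br></div>
<pre>
```python
def similar_words(word):
    """Return the set of spellings at edit distance 1 from word."""
    alphabet = 'abcdefghijklmnopqrstuvwxyz-_'
    s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in s if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in s for c in alphabet if b]
    inserts = [a + c + b for a, b in s for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def suggest(command, database):
    """Return sorted (command, package) pairs for a command name.

    database is a dict mapping command name to package name; it stands
    in for the real on-disk database (an assumption of this sketch).
    """
    if command in database:
        # Direct match: no need to search the alternatives at all.
        return [(command, database[command])]
    return sorted((w, database[w])
                  for w in similar_words(command) if w in database)
```
</pre>
<div>Each alternative costs one extra lookup, which is why the direct match is checked first.<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">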
* It needs to be translated - also very important.<br></blockquote><div>I made a .pot file and reused the translations from the Python version, but I can't get my program to look for translations at all (as checked with strace). I have read the gettext manual and still do not know what I am doing wrong. <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
* You need to Conflict with command-not-found and not Break AFAIUI<br>
<br></blockquote><div>fixed <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
* You should not depend on grep, sed, coreutils, they are Essential.<br>
<br></blockquote><div>fixed, now it uses ruby as my shell was hacky. <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
* You do have to Depend on apt-file, as that configures apt to download<br>
the Contents files<br>
<br></blockquote><div>fixed <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
* You should not have identifiers starting with _ in the program, these<br>
are reserved for the C implementation (like _cleanup_free_).<br>
<br></blockquote><div>fixed <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Yes, and these are basically the same reasons my C prototype is<br>
not in the archive. Also, I did not put a lot of work into it, as<br>
I was waiting for PackageKit to take that over, but that was not<br>
done yet.<br>
<br>
I think it's a worthwhile approach, and I can see it replacing<br>
command-not-found if those tiny issues have been fixed. Then you<br>
could also avoid the -ng moniker, and just take over the main<br>
package (if Zygmunt does not mind), which also avoids a month<br>
long NEW process :)<br>
<span class="m_2069971577309996656HOEnZb"><font color="#888888"><br>
--<br>
Debian Developer - <a href="http://deb.li/jak" rel="noreferrer" target="_blank">deb.li/jak</a> | <a href="http://jak-linux.org" rel="noreferrer" target="_blank">jak-linux.org</a> - free software dev<br>
Ubuntu Core Developer de, en speaker<br>
</font></span></blockquote></div><br></div></div>