IRC log analysis for everyone!
Andrew Sayers
andrew-bugs.launchpad.net at pileofstuff.org
Mon Jul 27 11:29:48 UTC 2009
This idea really interested me, and got me thinking about whether we
could make a program to just let ordinary users search for
previously-answered questions. It wouldn't provide the same quality of
answers as the "20 questions" approach we talked about before, but could
at least do a bit of good.
Over the weekend, I had a go at writing a proof-of-concept based on
Phil's general approach. The result is attached, and it should show
what I'm thinking, even if it's not my greatest ever technical work.
To run the program, execute "./runme.py" and enter a start date. The
program will download and compile everything between the start date and
the current date, then let you search for words used in questions.
Incidentally, Phil, here are some thoughts I had while programming that
might interest you:
"how" and "when" don't seem to be very good indicators of questions -
they're mostly "I don't know how to do that" or "this happens when that
happens". People who do use "how" and "when" in a question generally
end their statements with a "?" anyway, so I've removed both words from
the list of question indicators.
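A rough sketch of the kind of check described above, not the code from the attachment. The word list and the rule that an indicator word must come first are assumptions made for illustration; the only points taken from the text are that a trailing "?" counts and that "how" and "when" are left out.

```python
import re

# Hypothetical indicator words; the attached program's real list is not shown.
# "how" and "when" are deliberately absent, as described above.
QUESTION_WORDS = {"why", "what", "where", "anyone", "anybody"}


def looks_like_question(message: str) -> bool:
    """Guess whether a single IRC message is a question."""
    text = message.strip().lower()
    # A trailing "?" is treated as sufficient on its own.
    if text.endswith("?"):
        return True
    # Otherwise fall back on the first word (an assumed rule).
    words = re.findall(r"[a-z']+", text)
    return bool(words) and words[0] in QUESTION_WORDS
```

With this sketch, "how do I mount a USB drive?" still counts because of the "?", while "I don't know how to do that" does not.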
A fairly good indicator of a question is when someone hasn't talked for
the past 24 hours, then makes a statement that is not of the form "name:
message", "!command | name", etc. A better indicator would be the first
statement after joining a channel, but sadly the logs don't include that
information.
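The 24-hour rule might look something like the sketch below. It assumes the log has already been parsed into (timestamp, nick, message) tuples in chronological order, and it treats a nick with no earlier message in the loaded logs as having been silent; both are assumptions. The two patterns only cover the "name: message" and "!command | name" forms named above, not whatever else "etc." includes.

```python
import re
from datetime import timedelta

# "name: message" - a single token followed by a colon and a space.
ADDRESSED = re.compile(r"^[^\s:]+:\s")
# "!command | name" - anything that starts with a bot command.
BOT_COMMAND = re.compile(r"^!\S+")
SILENCE = timedelta(hours=24)


def find_candidates(lines):
    """Return messages that open a conversation after 24 hours of silence.

    lines: iterable of (timestamp, nick, message), oldest first.
    """
    last_spoke = {}  # nick -> timestamp of that nick's previous message
    candidates = []
    for when, nick, message in lines:
        previous = last_spoke.get(nick)
        # Assumption: a nick never seen before counts as silent.
        quiet = previous is None or when - previous >= SILENCE
        text = message.strip()
        if quiet and not ADDRESSED.match(text) and not BOT_COMMAND.match(text):
            candidates.append((when, nick, message))
        last_spoke[nick] = when
    return candidates
```

Replies and bot commands still update the speaker's last-seen time, so someone who has just answered another user is not picked up as a new asker a minute later.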
I've made the program keep the downloaded IRC logs in a folder, so that
people won't hammer the servers if they want to reset their database a lot.
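The folder of downloaded logs amounts to a simple on-disk cache. A minimal sketch of that idea follows; the function names, the cache layout, and the injectable fetch argument are all hypothetical, and the attached program may well do this differently.

```python
from pathlib import Path
from urllib.request import urlopen


def fetch_url(url):
    """Download a URL and return its raw bytes."""
    with urlopen(url) as response:
        return response.read()


def cached_log(url, cache_dir, name, fetch=fetch_url):
    """Return the log at url, downloading it only if it is not cached.

    cache_dir and name decide where the copy lives on disk; fetch can be
    swapped out, for example to try this without touching the network.
    """
    path = Path(cache_dir) / name
    if path.exists():
        # Already downloaded once: read the local copy, skip the server.
        return path.read_bytes()
    data = fetch(url)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return data
```

Rebuilding the database then only re-reads files from the folder, and the server is contacted again only for days that have not been fetched yet.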
- Andrew
-------------- next part --------------
A non-text attachment was scrubbed...
Name: uirclog-andrew.tar.gz
Type: application/x-gzip
Size: 5656 bytes
Desc: not available
URL: <https://lists.ubuntu.com/archives/ubuntu-doc/attachments/20090727/aa39afa7/attachment.bin>