IRC log analysis for everyone!

Andrew Sayers andrew-bugs.launchpad.net at pileofstuff.org
Mon Jul 27 11:29:48 UTC 2009


This idea really interested me, and got me thinking about whether we 
could make a program to just let ordinary users search for 
previously-answered questions.  It wouldn't provide the same quality of 
answers as the "20 questions" approach we talked about before, but could 
at least do a bit of good.

Over the weekend, I had a go at writing a proof-of-concept based on 
Phil's general approach.  The result is attached, and it should show 
what I'm thinking, even if it's not my greatest ever technical work.

To try the program, run "./runme.py" and enter the start date.  The
program will download and compile everything between the start date and 
the current date, then let you search for words used in questions.

Incidentally Phil, here are some thoughts I had while programming that 
might interest you:

"how" and "when" don't seem to be very good indicators of questions -
they mostly turn up in statements like "I don't know how to do that" or
"this happens when that happens".  People who do use "how" and "when" in
a question generally end it with a "?" anyway, so I've dropped both
words as indicators.
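
As a rough illustration of that point (a sketch only, not the code in
the attached tarball; the function name is made up for this example),
relying on the trailing "?" alone could look like this:

```python
def looks_like_question(message: str) -> bool:
    """Guess whether an IRC message is a question.

    Only the trailing "?" is checked. Words such as "how" and "when"
    are deliberately ignored, since they show up in plenty of ordinary
    statements too.
    """
    return message.rstrip().endswith("?")


# A statement containing "how" is not flagged, a real question is.
print(looks_like_question("I don't know how to do that"))
print(looks_like_question("how do I mount a USB drive?"))
```

A check like this misses questions typed without a "?", so it is best
seen as one signal among several rather than a complete test.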

A fairly good indicator of a question is when someone hasn't talked for 
the past 24 hours, then makes a statement that is not of the form "name: 
message", "!command | name", etc.  A better indicator would be the first 
statement after joining a channel, but sadly the logs don't include that 
information.
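
The 24-hour rule can be sketched roughly as follows.  This is only an
illustration under some assumptions: the log is taken to be already
parsed into (time, nick, message) tuples in chronological order, and
the two regular expressions are guesses at what "name: message" and
"!command | name" look like.  None of these names come from the
attached program.

```python
import re
from datetime import datetime, timedelta

# Assumed shapes of messages that are replies or bot triggers.
ADDRESSED = re.compile(r"^\S+:\s")    # "name: message"
BOT_COMMAND = re.compile(r"^!\S+")    # "!command | name"
QUIET_PERIOD = timedelta(hours=24)


def candidate_questions(lines):
    """Yield (time, nick, message) for statements that may be questions.

    `lines` is an iterable of (datetime, nick, message) tuples in
    chronological order (an assumed layout).  A nick that has not been
    seen before is treated as having been quiet, so the very start of
    the log will produce extra candidates.
    """
    last_spoke = {}
    for when, nick, message in lines:
        previous = last_spoke.get(nick)
        last_spoke[nick] = when
        was_quiet = previous is None or when - previous >= QUIET_PERIOD
        if not was_quiet:
            continue
        if ADDRESSED.match(message) or BOT_COMMAND.match(message):
            continue
        yield when, nick, message


log = [
    (datetime(2009, 7, 25, 10, 0), "alice", "my wifi stopped working"),
    (datetime(2009, 7, 25, 10, 1), "bob", "alice: which card do you have"),
    (datetime(2009, 7, 25, 10, 2), "alice", "an intel one"),
    (datetime(2009, 7, 26, 11, 0), "alice", "sound is broken now"),
]
for when, nick, message in candidate_questions(log):
    print(when, nick, message)
```

Here bob's reply is skipped because it is addressed to alice, and
alice's second line is skipped because she spoke two minutes earlier.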

I've downloaded the IRC logs to a folder, so that people won't hammer 
the servers if they want to reset their database a lot.
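
For anyone curious what "download once, reuse afterwards" might look
like, here is a minimal sketch.  It is not the code from the tarball:
the function name, its parameters and the example URL are all made up
for illustration, and it is written for Python 3.

```python
import os
import urllib.request


def fetch_cached(url, cache_dir, filename, download=None):
    """Return the bytes at `url`, hitting the network only on a cache miss.

    `download` is a hypothetical hook (a callable taking a URL and
    returning bytes) so that the network call can be swapped out.
    """
    path = os.path.join(cache_dir, filename)
    if os.path.exists(path):
        # Already downloaded on an earlier run: read the local copy.
        with open(path, "rb") as cached:
            return cached.read()
    if download is None:
        def download(u):
            with urllib.request.urlopen(u) as response:
                return response.read()
    data = download(url)
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "wb") as cached:
        cached.write(data)
    return data
```

Resetting the database then only means re-reading the files in the
cache folder; deleting that folder forces a fresh download.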

	- Andrew
-------------- next part --------------
A non-text attachment was scrubbed...
Name: uirclog-andrew.tar.gz
Type: application/x-gzip
Size: 5656 bytes
Desc: not available
URL: <https://lists.ubuntu.com/archives/ubuntu-doc/attachments/20090727/aa39afa7/attachment.bin>

