Hi. I know this isn't a very innovative idea, but how about you just reset it every half month? (Just kidding, sort of)<br><br>I'm not a super expert with Ubuntu, but, just an idea, do you think the problem has something to do with cron? I don't know much about cron, so I can't say for sure. It's probably not that, because no one else has suggested it yet.<br>
<br>David<br><br><div class="gmail_quote">
On Sun, Jul 26, 2009 at 9:38 AM, Hal Burgiss <span dir="ltr"><<a href="mailto:hal@burgiss.net" target="_blank">hal@burgiss.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
I have an issue with an 8.04 server that stops responding about once a<br>
month. It doesn't "crash", really; it just stops responding.<br>
<br>
Testing open ports:<br>
<br>
$ nmap <a href="http://example.com" target="_blank">example.com</a><br>
<br>
Starting nmap 3.70 ( <a href="http://www.insecure.org/nmap/" target="_blank">http://www.insecure.org/nmap/</a> ) at 2009-07-26 08:33 EDT<br>
Interesting ports on <a href="http://example.com" target="_blank">example.com</a>:<br>
(The 1655 ports scanned but not shown below are in state: closed)<br>
PORT STATE SERVICE<br>
22/tcp open ssh<br>
25/tcp open smtp<br>
80/tcp open http<br>
443/tcp open https<br>
3306/tcp open mysql<br>
<br>
Looks good. The problem is that none of those will fully establish a<br>
connection. An attempt to connect via ssh:<br>
<br>
$ tcpdump -v host <a href="http://example.com" target="_blank">example.com</a><br>
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes<br>
<br>
08:35:07.529666 IP (tos 0x0, ttl 64, id 63108, offset 0, flags [DF], proto 6,<br>
length: 60) example2.com.48625 > example.com.ssh: S<br>
[tcp sum ok] 365499356:365499356(0) win 5840 <mss 1460,sackOK,timestamp<br>
3810846040 0,nop,wscale 2><br>
<br>
08:35:07.530225 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto 6,<br>
length: 60) example.com.ssh > example2.com.48625: S<br>
[tcp sum ok] 2913998847:2913998847(0) ack 365499357 win 5792 <mss<br>
1460,sackOK,timestamp 143947824 3810846040,nop,wscale 6><br>
<br>
08:35:07.530281 IP (tos 0x0, ttl 64, id 63110, offset 0, flags [DF], proto 6,<br>
length: 52) example2.com.48625 > example.com.ssh: .<br>
[tcp sum ok] ack 1 win 1460 <nop,nop,timestamp 3810846041 143947824><br>
<br>
But it dies right there. No further response at all. Consistently. Ever. Until<br>
the reset button is hit. Then it runs flawlessly for a month or so.<br>
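The capture above shows the three-way handshake completing (SYN, SYN/ACK, ACK) and then silence; a port scan only needs the SYN/ACK to report a port as "open", so nmap cannot see this failure. When checking remotely, a probe that connects and then waits for the SSH identification line (per RFC 4253, an SSH server sends a line starting with `SSH-` once the connection is established) can tell "handshake done but daemon silent" apart from "refused" or "unreachable". A minimal Python sketch; the function name, status labels and 10-second default timeout are hypothetical choices, not anything from the original thread.<br>
<br>
```python
import socket


def probe_ssh_banner(host, port=22, timeout=10.0):
    """Connect to host:port and wait for the SSH identification line.

    Returns (status, detail), where status is one of:
      "refused"         - nothing listening (the SYN was answered with RST)
      "connect-timeout" - the TCP handshake itself never completed
      "no-banner"       - handshake completed, but nothing arrived in time
      "closed"          - peer closed the connection without sending anything
      "ok"              - got a line starting with SSH- (detail is that line)
      "unexpected"      - some other data arrived (detail is the raw bytes)
      "error"           - any other socket error (detail is the message)
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused", None
    except socket.timeout:
        return "connect-timeout", None
    except OSError as exc:  # e.g. host or network unreachable
        return "error", str(exc)

    with sock:
        try:
            # create_connection() leaves the timeout set on the socket,
            # so this recv() gives up after `timeout` seconds.
            data = sock.recv(256)
        except socket.timeout:
            # The three-way handshake finished, but the daemon never spoke.
            return "no-banner", None
        except OSError as exc:
            return "error", str(exc)

    if not data:
        return "closed", None
    if data.startswith(b"SSH-"):
        return "ok", data.splitlines()[0].decode("ascii", "replace")
    return "unexpected", data
```

Run from a second machine, for example `probe_ssh_banner("example.com")`; on a box in the state described above it would be expected to return `("no-banner", None)` rather than `"refused"`, which points at the userspace side (sshd, memory, scheduling) rather than the network.<br>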
<br>
Typically what I find if I dig through log files is that the system clock seems to<br>
get weird. Here is an example from just before the system went belly up:<br>
<br>
<br>
65.55.110.76 - - [26/Jul/2009:06:51:08 -0400] "GET<br>
/academic-programs/teacher-education/ba-elementary-p-5 HTTP/1.1" 200<br>
<br>
65.55.110.76 - - [26/Jul/2009:06:51:08 -0400] "GET<br>
/academic-programs/teacher-education/ba-elementary-p-5 HTTP/1.1" 200<br>
<br>
123.149.115.33 - - [26/Jul/2009:06:41:34 -0400] "GET<br>
/academic-programs/teacher-education/ HTTP/1.1" 404 -<br>
<br>
123.149.115.33 - - [26/Jul/2009:06:41:34 -0400] "GET<br>
/academic-programs/teacher-education/ HTTP/1.1" 404 - "-" "-"<br>
<br>
74.6.22.182 - - [26/Jul/2009:07:45:07 -0400] "GET<br>
/alumni_development/endowingCampaign.html HTTP/1.0" 404 20<br>
<br>
74.6.22.182 - - [26/Jul/2009:07:45:07 -0400] "GET<br>
/alumni_development/endowingCampaign.html HTTP/1.0" 404 20 "-" "Mozil<br>
<br>
65.55.210.87 - - [26/Jul/2009:06:58:03 -0400] "GET<br>
/future-students/grad/why-mc<br>
HTTP/1.1" 200 20<br>
<br>
65.55.210.87 - - [26/Jul/2009:06:58:03 -0400] "GET<br>
/future-students/grad/why-mc<br>
HTTP/1.1" 200 20 "-" "msnbot/1.1 (+http<br>
<br>
74.6.22.182 - - [26/Jul/2009:07:45:08 -0400] "GET<br>
/calendar/athletics/2009-07-02<br>
HTTP/1.0" 404 20<br>
<br>
74.6.22.182 - - [26/Jul/2009:07:45:08 -0400] "GET<br>
/calendar/athletics/2009-07-02<br>
HTTP/1.0" 404 20 "-" "Mozilla/5.0 (com<br>
<br>
123.149.115.33 - - [26/Jul/2009:06:41:32 -0400] "GET<br>
/academic-programs/academic-calendar/ HTTP/1.1" 404 -<br>
<br>
123.149.115.33 - - [26/Jul/2009:06:41:32 -0400] "GET<br>
/academic-programs/academic-calendar/ HTTP/1.1" 404 - "-" "-"<br>
<br>
This is a pretty active site. The correct time was 6:41.<br>
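Pulling the timestamps out of an excerpt like the one above makes the jumps easier to see: 06:51, 06:41, 07:45, 06:58, 07:45, 06:41. Apache stamps each line with the time the request was received but writes it when the request finishes, so small backward steps are normal on a busy site; jumps of many minutes are not. A minimal Python sketch that parses the `[26/Jul/2009:06:51:08 -0400]` field of the common/combined log format and reports backward jumps larger than a tolerance; the function name and the 60-second default are arbitrary choices for illustration.<br>
<br>
```python
import re
from datetime import datetime, timedelta

# Matches the bracketed timestamp of Apache's common/combined log format,
# e.g. [26/Jul/2009:06:51:08 -0400].
TS_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def backward_jumps(lines, tolerance=timedelta(seconds=60)):
    """Yield (line_no, previous_ts, ts) wherever a line's timestamp is more
    than `tolerance` earlier than the timestamp of the line before it."""
    prev = None
    for line_no, line in enumerate(lines, start=1):
        m = TS_RE.search(line)
        if not m:
            continue  # wrapped continuation lines carry no timestamp
        ts = datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z")
        if prev is not None and prev - ts > tolerance:
            yield line_no, prev, ts
        prev = ts
```

It accepts any iterable of log lines, such as an open file; on the excerpt above it would flag the 06:41:34, 06:58:03 and 06:41:32 entries.<br>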
<br>
Typically there is nothing interesting in syslog, but this time there were<br>
a bunch of oom-killer actions against apache processes at 7:45. That timestamp is wrong<br>
and comes after the weirdness started, so I don't know whether to trust it, or<br>
whether it's an effect or a cause of another problem.<br>
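On the question of whether the 7:45 oom-killer entries can be trusted: if the kernel prefixes its messages with a `[seconds.microseconds]` stamp, that number counts seconds since boot rather than wall-clock time, so it gives an ordering of kernel events that does not depend on the system clock having been set correctly. A minimal Python sketch that pulls OOM-related kernel lines out of syslog along with both times; the line layout shown in the comment and the marker strings (`invoked oom-killer`, `Out of memory`, `Killed process`) are assumptions to check against the server's own /var/log/syslog.<br>
<br>
```python
import re

# Assumed syslog layout (verify against the real /var/log/syslog):
#   Jul 26 07:45:01 host kernel: [2591234.567890] apache2 invoked oom-killer: ...
# Group 1: syslog wall-clock time; group 2: optional seconds-since-boot stamp;
# group 3: the kernel message itself.
KERNEL_RE = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: (?:\[\s*(\d+\.\d+)\] )?(.*)$"
)

# Assumed marker strings; adjust to the wording your kernel actually logs.
OOM_MARKERS = ("invoked oom-killer", "Out of memory", "Killed process")


def oom_events(lines):
    """Return (wall_clock, uptime_or_None, message) for each OOM-related kernel line."""
    events = []
    for line in lines:
        m = KERNEL_RE.match(line.rstrip("\n"))
        if not m:
            continue  # not a kernel line in the assumed layout
        wall, uptime, msg = m.groups()
        if any(marker in msg for marker in OOM_MARKERS):
            events.append((wall, float(uptime) if uptime else None, msg))
    return events
```

Comparing the uptime column of these events with the uptime of kernel lines logged before the clock went wrong would show whether the OOM kills really happened after the time jump or only appear to.<br>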
<br>
This server is headless in a datacenter, so I am limited with what I can do<br>
remotely (especially if I can't connect).<br>
<br>
Any ideas how to hunt this down?<br>
<br>
--<br>
Hal<br>
<font color="#888888"><br>
--<br>
ubuntu-users mailing list<br>
<a href="mailto:ubuntu-users@lists.ubuntu.com" target="_blank">ubuntu-users@lists.ubuntu.com</a><br>
Modify settings or unsubscribe at: <a href="https://lists.ubuntu.com/mailman/listinfo/ubuntu-users" target="_blank">https://lists.ubuntu.com/mailman/listinfo/ubuntu-users</a><br>
</font></blockquote></div><br><br clear="all"><br>-- <br>David McNally<br><a href="mailto:david3333333@gmail.com" target="_blank">david3333333@gmail.com</a><br>apt-get moo<br>