Server stops responding

David McNally david3333333 at gmail.com
Wed Jul 29 14:06:00 UTC 2009


Hi. I know this isn't a very innovative idea, but how about you just reset
it every half month? (Just kidding, sort of)

I'm not a super expert with Ubuntu, but - just an idea - do you think the
problem might have something to do with cron? I don't know much about cron,
so I can't say for sure. It's probably not that, because no one else has
suggested it yet.
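
Separately, on the out-of-order timestamps in the quoted access log: here is
a rough Python sketch for finding the places where a log jumps backwards in
time, which might help pin down when the clock first goes wrong. It assumes
the bracketed Apache timestamp format shown in the quoted lines and an
English/C locale for the month names. The function name and the 60-second
tolerance are my own inventions, not anything from Apache.

```python
import re
from datetime import datetime

# Matches the bracketed timestamp in Apache access log lines,
# e.g. [26/Jul/2009:06:51:08 -0400]
STAMP = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def backward_jumps(lines, tolerance_seconds=60):
    """Yield (previous, current) timestamp pairs wherever the log goes
    back in time by more than tolerance_seconds (a hypothetical default)."""
    previous = None
    for line in lines:
        match = STAMP.search(line)
        if not match:
            continue  # skip lines without a recognisable timestamp
        current = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        if previous is not None:
            if (previous - current).total_seconds() > tolerance_seconds:
                yield previous, current
        previous = current
```

To use it, open the access log (on Ubuntu I'd guess
/var/log/apache2/access.log, but that path is an assumption), pass the file
object to backward_jumps(), and print each pair it yields. A busy server
writes entries slightly out of order anyway, which is why there is a
tolerance.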

David

On Sun, Jul 26, 2009 at 9:38 AM, Hal Burgiss <hal at burgiss.net> wrote:

> I have an issue with an 8.04 server that, about once a month, stops
> responding. It doesn't "crash", really, it just stops responding.
>
> Testing open ports:
>
> $ nmap example.com
>
> Starting nmap 3.70 ( http://www.insecure.org/nmap/ ) at 2009-07-26 08:33 EDT
> Interesting ports on example.com:
> (The 1655 ports scanned but not shown below are in state: closed)
> PORT     STATE SERVICE
> 22/tcp   open  ssh
> 25/tcp   open  smtp
> 80/tcp   open  http
> 443/tcp  open  https
> 3306/tcp open  mysql
>
> Looks good. The problem is that none of those will fully establish a
> connection. An attempt to connect via ssh:
>
> $ tcpdump -v host example.com
> tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
>
> 08:35:07.529666 IP (tos 0x0, ttl  64, id 63108, offset 0, flags [DF], proto 6,
> length: 60) example2.com.48625 > example.com.ssh: S
> [tcp sum ok] 365499356:365499356(0) win 5840 <mss 1460,sackOK,timestamp 3810846040 0,nop,wscale 2>
>
> 08:35:07.530225 IP (tos 0x0, ttl  64, id 0, offset 0, flags [DF], proto 6,
> length: 60) example.com.ssh > example2.com.48625: S
> [tcp sum ok] 2913998847:2913998847(0) ack 365499357 win 5792 <mss 1460,sackOK,timestamp 143947824 3810846040,nop,wscale 6>
>
> 08:35:07.530281 IP (tos 0x0, ttl  64, id 63110, offset 0, flags [DF], proto 6,
> length: 52) example2.com.48625 > example.com.ssh: .
> [tcp sum ok] ack 1 win 1460 <nop,nop,timestamp 3810846041 143947824>
>
> But it dies right there. No further response at all. Consistently. Ever.
> Until the reset button is hit. Then it runs flawlessly for a month or so.
>
> Typically what I find if I dig through log files is that the system clock
> seems to get weird. Example just prior to the system going belly up:
>
>
> 65.55.110.76 - - [26/Jul/2009:06:51:08 -0400] "GET /academic-programs/teacher-education/ba-elementary-p-5 HTTP/1.1" 200
>
> 65.55.110.76 - - [26/Jul/2009:06:51:08 -0400] "GET /academic-programs/teacher-education/ba-elementary-p-5 HTTP/1.1" 200
>
> 123.149.115.33 - - [26/Jul/2009:06:41:34 -0400] "GET /academic-programs/teacher-education/ HTTP/1.1" 404 -
>
> 123.149.115.33 - - [26/Jul/2009:06:41:34 -0400] "GET /academic-programs/teacher-education/ HTTP/1.1" 404 - "-" "-"
>
> 74.6.22.182 - - [26/Jul/2009:07:45:07 -0400] "GET /alumni_development/endowingCampaign.html HTTP/1.0" 404 20
>
> 74.6.22.182 - - [26/Jul/2009:07:45:07 -0400] "GET /alumni_development/endowingCampaign.html HTTP/1.0" 404 20 "-" "Mozil
>
> 65.55.210.87 - - [26/Jul/2009:06:58:03 -0400] "GET /future-students/grad/why-mc HTTP/1.1" 200 20
>
> 65.55.210.87 - - [26/Jul/2009:06:58:03 -0400] "GET /future-students/grad/why-mc HTTP/1.1" 200 20 "-" "msnbot/1.1 (+http
>
> 74.6.22.182 - - [26/Jul/2009:07:45:08 -0400] "GET /calendar/athletics/2009-07-02 HTTP/1.0" 404 20
>
> 74.6.22.182 - - [26/Jul/2009:07:45:08 -0400] "GET /calendar/athletics/2009-07-02 HTTP/1.0" 404 20 "-" "Mozilla/5.0 (com
>
> 123.149.115.33 - - [26/Jul/2009:06:41:32 -0400] "GET /academic-programs/academic-calendar/ HTTP/1.1" 404 -
>
> 123.149.115.33 - - [26/Jul/2009:06:41:32 -0400] "GET /academic-programs/academic-calendar/ HTTP/1.1" 404 - "-" "-"
>
> This is a pretty active site. The correct time was 6:41.
>
> Typically there is not anything interesting in syslog, but this time there
> was a bunch of oom-killer actions against apache processes at 7:45. The
> time is wrong and after the weirdness started, so I don't know whether to
> trust this, or whether it's an effect or a cause of another problem.
>
> This server is headless in a datacenter, so I am limited with what I can do
> remotely (especially if I can't connect).
>
> Any ideas how to hunt this down?
>
> --
> Hal
>
> --
> ubuntu-users mailing list
> ubuntu-users at lists.ubuntu.com
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/ubuntu-users
>
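
One more thought on the tcpdump trace quoted above: the three-way handshake
(S, S+ack, ack) does complete, so the kernel is still answering, but sshd
never sends its banner. A small probe run from another machine could record
exactly when that starts happening. This is only a sketch: the function name
and timeout are made up, and it only makes sense for services that speak
first after connect, such as ssh or smtp (http waits for the client).

```python
import socket


def probe_banner(host, port, timeout=5.0):
    """Return (connected, banner).

    connected is True if the TCP handshake completed; banner holds any
    bytes the service sent unprompted, or b"" if it stayed silent.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, unreachable, or the handshake itself timed out.
        return False, b""
    with sock:
        sock.settimeout(timeout)
        try:
            return True, sock.recv(256)
        except OSError:
            # Timed out (or was reset) while waiting for the service to speak.
            return True, b""
```

A result of (True, b"") against port 22 would match the symptom above:
connection accepted, nothing behind it. Running something like this every
minute from a second box and logging the result with that box's own clock
would give a timestamp that doesn't depend on the broken server's time.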



-- 
David McNally
david3333333 at gmail.com
apt-get moo