Unable to start new processes
Chris MacDonald
chris at fourthandvine.com
Tue Aug 24 03:31:04 UTC 2010
Hello all,
You'll have to bear with me here; these are remote machines I don't
have physical access to, but a co-worker does, so I'll be relaying most
requests for info or commands to be run through him when I can't
access them remotely.
I have fully updated Ubuntu 10.04 installed on 12 machines, all
Intel D945GCLF2s with 1GB of RAM and Micron eUSB 4GB flash drives
formatted ext3. Since installing, I periodically run into
problems where the machines stop allowing me to SSH in.
From my machine, the problem manifests as an inability to get much
data back from the remote machine. For instance, when I SSH in
(ssh -v) it opens a connection and attempts to negotiate a session (I
get a response from the remote machine), but then the remote end
promptly closes the connection before I'm prompted for a password.
Likewise for the running instance of Tomcat: I connect to the HTTP
port and it accepts my connection, but closes it before I get
anything back. I can ping the remote machine and it shows ports as
open; I just can't seem to get any data.
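
As a minimal sketch of how this symptom could be checked from the
client side, assuming Python is available on the local machine: the
function name probe_port and its status strings are hypothetical, not
part of any existing tool. It connects, optionally sends a payload
(e.g. an HTTP request for the Tomcat port), and reports whether the
peer sent data or closed the connection first.

```python
import socket


def probe_port(host, port, payload=b"", timeout=5.0):
    """Connect to host:port and report what the peer does first.

    Returns (status, data), where status is one of:
      "data"       - the peer sent something (e.g. an SSH banner)
      "closed"     - connection accepted, then closed or reset with no data
      "timeout"    - connection accepted, nothing received within timeout
      "no_connect" - the connection could not be established at all
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return ("no_connect", b"")
    with sock:
        try:
            if payload:
                sock.sendall(payload)
            data = sock.recv(256)
        except socket.timeout:
            return ("timeout", b"")
        except OSError:
            # For example, the peer reset the connection.
            return ("closed", b"")
    return ("data", data) if data else ("closed", b"")
```

For the SSH port no payload is needed, since the server speaks first;
for the HTTP port something like
probe_port(host, 8080, b"HEAD / HTTP/1.0\r\n\r\n") would be used, where
8080 is only an assumed port number.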
From what I understand of the remote side of things, you can no longer
get any useful information from the machine. Every command typed at
the console returns immediately to a new bash prompt, so even
issuing the 'reboot' command does nothing, as if the machine can no
longer start any new processes. This is making troubleshooting
*extremely* difficult, as I can't figure out a way to get anything
from the machine while it's on. From what I can tell, though,
processes that are already running remain running. For instance, if I
get someone to power-cycle the machine, I can log in and initiate a
VPN connection back to my local machine, and that connection remains
active even after I can no longer SSH into the remote machine. As
another example, SNMP traps (containing the system uptime and sensor
data from a serial-connected device) continue to be sent periodically
from the remote machine to my desktop.
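
Since already-running processes survive, one way to capture the moment
things break might be a long-running logger started at boot. The
following is only a sketch under assumptions: try_spawn and log_once
are hypothetical names, it is Unix-only (it uses os.fork), and it
assumes the log file's filesystem is still writable when the failure
occurs, which may not be true on these machines.

```python
import errno
import os
import time


def try_spawn():
    """Fork a child that exits immediately; return (ok, detail)."""
    try:
        pid = os.fork()
    except OSError as exc:
        # Record the symbolic errno (e.g. EAGAIN, ENOMEM) for later analysis.
        return (False, errno.errorcode.get(exc.errno, str(exc.errno)))
    if pid == 0:
        os._exit(0)  # child: exit at once without running cleanup handlers
    _, status = os.waitpid(pid, 0)
    return (status == 0, "exit status %d" % status)


def log_once(path):
    """Append one timestamped line with the fork result and 1-minute load."""
    ok, detail = try_spawn()
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    line = "%s fork_ok=%s detail=%s load1=%.2f\n" % (
        stamp, ok, detail, os.getloadavg()[0])
    with open(path, "a") as fh:
        fh.write(line)
    return ok
```

Calling log_once in a loop with a sleep between iterations would leave
a record of whether fork itself starts failing, and with which error,
before the machine stops responding.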
The problem occurs intermittently. I can't seem to establish a
pattern to it, but it happens to all of them eventually. If I don't
get someone to power-cycle the downed units, within two or three
days the majority (if not all) are unresponsive. I really have no idea
what the problem is here, nor how I might go about troubleshooting it,
so I'm open to suggestions as to how to proceed.
Thanks in advance,
Chris