server crash after karmic upgrade
Patton Echols
p.echols at comcast.net
Sun Mar 21 22:52:29 UTC 2010
On 03/21/2010 03:26 AM, CLIFFORD ILKAY wrote:
> On 03/21/2010 04:53 AM, Patton Echols wrote:
>
>> On 03/19/2010 02:12 AM, Patton Echols wrote:
>>
>>> Yesterday I upgraded my home server to Karmic.
>>> The upgrade seems to have completed correctly. However, when I came
>>> home today, the server was off line, could not be pinged, no samba,
>>> no ssh, no intranet pages. The only way to restart was from the power
>>> switch. After restart, everything worked fine. There were some
>>> upgrades I applied when I ssh'ed in to the box, all seemed well.
>>> About an hour later, it was off line again.
>>>
>>> At this point, I don't even know the right questions to ask or what
>>> log files to check. Any suggestions? Tips for where to find more info?
>>>
>>>
>> Not even a guess where to start looking?
>>
>
> I'm running Fedora 12 on my desktop with the open source nVidia driver.
> The two latest kernel updates have both been broken for me so I'm using
> the kernel prior to those upgrades. While the symptoms weren't exactly
> the same as yours, they were strange. The system would boot fine. I
> could start KDE. Anywhere from minutes to hours later, I would lose
> control over the keyboard. The numlock indicator would stay on
> regardless, toggling capslock had no effect, and none of the sys request
> tricks worked. All I could do was ssh into the box from my notebook and
> init 6. The kernel update was four days ago. The system has been
> running continuously since then after rebooting with the older kernel.
>
> Assuming you didn't purge the older kernels, you might want to try
> booting using an older kernel and see how it goes. If that doesn't work,
> I'd be suspicious of your hardware. We had a server that exhibited
> unpredictable shutdowns and it turned out to be bad capacitors on the
> motherboard. A machine I had once exhibited similar problems. It turned
> out to be a dying hard disk drive. It just so happened that sectors on
> which parts of the OS were stored were defective and that would cause
> random shutdowns. Replacing the disk drive fixed the problem. I've also
> seen bad RAM causing all sorts of weird problems. You can try running
> memtest86 and something like DFT (Drive Fitness Test). Your hard disk
> drive manufacturer might have a utility on their web site.
>
> Such failures are very difficult to troubleshoot because you usually
> won't find anything useful in the logs. Sometimes all you can do is
> troubleshoot by the process of elimination. For instance, if booting
> from an older kernel doesn't help, you could move the hard disk to
> another machine and see if it misbehaves the same way. If it doesn't,
> you know it's something other than software or the hard disk. If it
> does, at least you've narrowed it down to either a hard disk problem or
> software, which is progress. Good luck.
>
Thanks for the insights. If it is a hardware issue, then it is a really
bad coincidence that it happened just after a distribution upgrade. The
physical disk location of the kernel or another part of the OS could
explain it, but otherwise I do not believe in coincidence!
This server lives in a closet without a display. Since I was unable to
do anything with it for more than five minutes, I shut it down with the
"seven second press" until I had time to get back to it. Then I put a
monitor on it so I could determine whether it was just the network card
going off line or the entire system being unresponsive.
So I booted into the current kernel (2.6.31-20-generic-pae), did some log
searches, etc. Back to the desktop, no problems. Now, if nothing
breaks, I am left with the unsettling discomfort that something may be
wrong, but no way to find out what! <Sigh>
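One way to narrow down a log search like this is to look at whatever was
written just before each boot, since those are the last messages from before
a hang. The following is only a rough Python sketch under a few assumptions:
that the kernel log lives at /var/log/kern.log, and that the kernel writes a
"Linux version ..." line at the start of every boot. The function names are
made up for illustration.

```python
import re

# Assumption: each boot starts with a kernel line containing "Linux version ",
# so that line can serve as a boot marker in kern.log or syslog.
BOOT_MARKER = re.compile(r"kernel: .*Linux version ")


def lines_before_each_boot(lines, context=5):
    """For each boot marker, return the `context` lines logged just before it."""
    results = []
    for i, line in enumerate(lines):
        if BOOT_MARKER.search(line):
            start = max(0, i - context)
            # Slice ends at i, so the boot marker itself is excluded.
            results.append(lines[start:i])
    return results


def report(path="/var/log/kern.log", context=5):
    """Print the last lines before each boot found in the file at `path`.

    The default path is an assumption; adjust it to wherever the logs live.
    """
    with open(path, errors="replace") as f:
        log = f.read().splitlines()
    for n, tail in enumerate(lines_before_each_boot(log, context), 1):
        print(f"--- last lines before boot #{n} ---")
        for line in tail:
            print(line)
```

Calling report() as root prints a short block per boot. If the last lines
before a hard reset look completely ordinary, that at least suggests the box
froze without managing to log anything, which points back toward the
hardware or kernel suspects above.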
If it breaks again I may try a "shutdown -F" to look into the disk issue
as the next step.
Thanks again.
-- PE
More information about the ubuntu-users mailing list