server crash after karmic upgrade

Patton Echols p.echols at comcast.net
Sun Mar 21 22:52:29 UTC 2010


On 03/21/2010 03:26 AM, CLIFFORD ILKAY wrote:
> On 03/21/2010 04:53 AM, Patton Echols wrote:
>   
>> On 03/19/2010 02:12 AM, Patton Echols wrote:
>>     
>>> Yesterday I upgraded my home server to Karmic.
>>> The upgrade seems to have completed correctly.  However, when I came
>>> home today, the server was off line: it could not be pinged, no samba,
>>> no ssh, no intranet pages.  The only way to restart was from the power
>>> switch.  After restart, everything worked fine.  There were some
>>> upgrades I applied when I ssh'ed into the box; all seemed well.
>>> About an hour later, it was off line again.
>>>
>>> At this point, I don't even know the right questions to ask or what
>>> log files to check.  Any suggestions?  Tips for where to find more info?
>>>
>>>       
>> Not even a guess where to start looking?
>>     
>
> I'm running Fedora 12 on my desktop with the open source nVidia driver. 
> The two latest kernel updates have both been broken for me so I'm using 
> the kernel prior to those upgrades. While the symptoms weren't exactly 
> the same as yours, they were strange. The system would boot fine. I 
> could start KDE. Anywhere from minutes to hours later, I would lose 
> control over the keyboard. The numlock indicator would stay on 
> regardless, toggling capslock had no effect, and none of the sys request 
> tricks worked. All I could do was ssh into the box from my notebook and
> run init 6. The kernel update was four days ago. The system has been
> running continuously since then after rebooting with the older kernel.
>
> Assuming you didn't purge the older kernels, you might want to try 
> booting using an older kernel and see how it goes. If that doesn't work, 
> I'd be suspicious of your hardware. We had a server that exhibited 
> unpredictable shutdowns and it turned out to be bad capacitors on the 
> motherboard. A machine I had once exhibited similar problems. It turned 
> out to be a dying hard disk drive. It just so happened that sectors on 
> which parts of the OS were stored were defective and that would cause 
> random shutdowns. Replacing the disk drive fixed the problem. I've also 
> seen bad RAM causing all sorts of weird problems. You can try running 
> memtest86 and something like DFT (Drive Fitness Test). Your hard disk 
> drive manufacturer might have a utility on their web site.
>
> Such failures are very difficult to troubleshoot because you usually 
> won't find anything useful in the logs. Sometimes all you can do is 
> troubleshoot by the process of elimination. For instance, if booting 
> from an older kernel doesn't help, you could move the hard disk to 
> another machine and see if it misbehaves the same way. If it doesn't, 
> you know it's something other than software or the hard disk. If it 
> does, at least you've narrowed it down to either a hard disk problem or 
> software, which is progress. Good luck.
>   

Thanks for the insights.  If it is a hardware issue, then it is a really 
bad coincidence that it happened just after a distribution upgrade.  The 
physical disk location of the kernel or some other part of the OS could 
explain it, but otherwise I do not believe in coincidence! 

This server lives in a closet without a display.  Since I was unable to 
do anything with it for more than five minutes, I shut it down with the 
"seven second press" until I had time to get back to it.  Then I put a 
monitor on it so I could determine whether it was just the network card 
going off line or the entire system being unresponsive.

So I booted into the current kernel (2.6.31-20-generic-pae), did some log 
searches, etc.  Back to the desktop, no problems.  Now, if nothing 
breaks, I am left with the unsettling discomfort that something may be 
wrong, but no way to find out what! <Sigh>
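For the log searches, one way to see whether anything at all was recorded just before each hang is to pull out the last few lines logged before every boot. This is a minimal sketch, assuming each boot shows up as a kernel line containing "Linux version" (as it typically does in /var/log/kern.log); the helper name lines_before_boots is hypothetical. After a hard lockup the window may well show nothing unusual, which fits the point above that such failures rarely leave anything useful in the logs.

```python
from collections import deque


def lines_before_boots(lines, n=5, marker="Linux version"):
    """Return, for each boot marker found, the n log lines that preceded it.

    Assumption: every (re)boot is identified by a kernel line containing
    `marker`. Adjust the marker to whatever your own logs actually show.
    """
    recent = deque(maxlen=n)  # rolling window holding the last n lines seen
    results = []
    for line in lines:
        line = line.rstrip("\n")
        if marker in line:
            # Snapshot what was logged just before this boot, then start
            # a fresh window so the next snapshot covers only this boot.
            results.append(list(recent))
            recent.clear()
        else:
            recent.append(line)
    return results
```

For example, `with open("/var/log/kern.log", errors="replace") as f: print(lines_before_boots(f, n=10))`. Rotated files such as kern.log.1 may need the same treatment if the crashes happened a few days back.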

If it breaks again I may try a "shutdown -F" to look into the disk issue 
as the next step.

Thanks again.

-- PE




More information about the ubuntu-users mailing list