Machine check exception, but what kind?

Kevin O'Gorman kogorman at gmail.com
Fri Apr 21 16:31:48 UTC 2017


I've been having trouble with two of my personal computers.  One is from
System76 and their great support staff suggested I load package mcelog to
monitor for machine check exceptions (MCE).  Sounded good to me, so I did
it on all my Ubuntu machines (I have 4 if you count laptops).

Lo and behold, one of the other machines glitched last night.  Not the
System76 one, but a home-brew I built myself (with a little help from my
friends).  It's got a medium-fast Core i7 on an ASUS board.  It was a
familiar occurrence:
- It had rebooted on its own, and when I woke up it was asking me to log in.
- On logging in I saw two popup dialogs that said there was an error
detected by a system program (but absolutely no other information about it)
and wanted permission to report it.  Even when I gave that permission, I
did not get a copy or any further information about what happened.
- /var/log/syslog showed the reboot sequence, but nothing particularly
helpful about the cause.

Pretty frustrating, but because I had installed mcelog, I also got this:
- /var/log/mcelog contained this:
mcelog: failed to prefill DIMM database from DMI data
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 4
MISC 7fbc6369a0eb ADDR 7fbc6369a0eb
TIME 1492751851 Thu Apr 20 22:17:31 2017
MCG status:
MCi status:
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: Internal Timer error
Running trigger `unknown-error-trigger'
STATUS be00000000800400 MCGSTATUS 0
MCGCAP c09 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 60
Hardware event. This is not a software error.
MCE 1
CPU 3 BANK 3
MISC 7fbc6369a0eb ADDR 7fbc6369a0eb
TIME 1492751851 Thu Apr 20 22:17:31 2017
MCG status:
MCi status:
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: Internal Timer error
Running trigger `unknown-error-trigger'
STATUS be00000000800400 MCGSTATUS 0
MCGCAP c09 APICID 6 SOCKETID 0
CPUID Vendor Intel Family 6 Model 60

So it looks like a hardware error.  It even says so, or at least "Hardware
event. This is not a software error."

The thing is, the rest of this log is almost entirely opaque to me.  I do
understand the timestamp and "Vendor Intel" but that's about it.  I'm
wondering what actually happened, and if there's anyone on this list who
can explain.  In particular, does that first line, the one containing
"DIMM", suggest that there was a RAM-related problem?
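
One part of the log that can be checked by hand is the STATUS word.  The
sketch below is only an illustration, assuming the IA32_MCi_STATUS bit
layout given in Intel's Software Developer's Manual (flag bits 63 down to
57, MCA error code in bits 15:0, model-specific code in bits 31:16).  The
function name decode_mci_status and the short flag labels are made up for
this example and are not part of mcelog.

```python
# Flag bits of IA32_MCi_STATUS, assumed from the Intel SDM layout.
FLAG_BITS = {
    63: "VAL",    # status register contents are valid
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting enabled
    59: "MISCV",  # MCi_MISC register valid
    58: "ADDRV",  # MCi_ADDR register valid
    57: "PCC",    # processor context corrupt
}


def decode_mci_status(status):
    """Split a 64-bit MCi_STATUS value into flags and error-code fields."""
    flags = [
        name
        for bit, name in sorted(FLAG_BITS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,              # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF, # bits 31:16
    }


# STATUS value copied from the log above (same for both records).
decoded = decode_mci_status(0xBE00000000800400)
print(decoded["flags"])
print(hex(decoded["mca_error_code"]), hex(decoded["model_specific_code"]))
```

Under those assumptions the flags that come out are the same ones mcelog
already spelled out (uncorrected, enabled, MISC and ADDR valid, processor
context corrupt), and the low 16 bits are the code that the manual lists
as an internal timer error.  So the STATUS line adds nothing new beyond
the text lines, and none of its fields refers to a DIMM.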

I also wanted to alert anyone else who might be having trouble diagnosing a
recurring problem.  This package is in the regular repository, but is not
installed by default.  I think that's a shame.

-- 
Kevin O'Gorman
#define QUESTION ((bb) || (!bb))   /* Shakespeare */

Please consider the environment before printing this email.