Machine check exception, but what kind?

J.Witvliet at mindef.nl
Sun Apr 23 09:51:14 UTC 2017


Run a memory check for at least 24 hours.
Hardware errors are *not* binary, simply present or absent. ESD-related problems in particular (you did take precautions, didn't you?) can take a long time to manifest, and then show up only once in a while.
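For what it's worth, the STATUS value in the quoted log can also be decoded by hand. Below is a minimal Python sketch, assuming the architectural IA32_MCi_STATUS bit layout described in the machine-check chapter of Intel's Software Developer's Manual, Vol. 3. The names STATUS_FLAGS, SIMPLE_MCA_CODES and decode_status are my own, and only a handful of the simple error codes are listed. Model-specific bits and compound codes (memory, cache, bus) are not decoded.

```python
from __future__ import annotations

# Architectural flag bits in the upper part of IA32_MCi_STATUS
# (assumed layout, per the Intel SDM Vol. 3 machine-check chapter).
STATUS_FLAGS = {
    63: "VAL (register valid)",
    62: "OVER (error overflow)",
    61: "UC (uncorrected error)",
    60: "EN (error enabled)",
    59: "MISCV (MCi_MISC register valid)",
    58: "ADDRV (MCi_ADDR register valid)",
    57: "PCC (processor context corrupt)",
}

# Only a few of the "simple" MCA error codes (bits 15:0) are listed here;
# compound codes (memory, cache, bus, ...) are not decoded by this sketch.
SIMPLE_MCA_CODES = {
    0x0000: "No error",
    0x0001: "Unclassified",
    0x0400: "Internal timer error",
}


def decode_status(status: int) -> tuple[list[str], str]:
    """Return the set architectural flags and a description of the MCA code."""
    flags = [
        name
        for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    mca_code = status & 0xFFFF  # bits 15:0 hold the architectural error code
    desc = SIMPLE_MCA_CODES.get(mca_code, f"MCA code {mca_code:#06x} (not in this table)")
    return flags, desc


if __name__ == "__main__":
    # STATUS value copied from both MCE records in the quoted mcelog output.
    flags, desc = decode_status(0xBE00000000800400)
    for flag in flags:
        print(flag)
    print("MCA error code:", desc)
```

Run against be00000000800400, it prints VAL, UC, EN, MISCV, ADDRV and PCC, followed by "MCA error code: Internal timer error". These are the same flags mcelog reported for both events quoted below.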

Sent from my iPhone

> On 22 Apr 2017 at 00:25, Joel Rees <joel.rees at gmail.com> wrote:
>
>> On Sat, Apr 22, 2017 at 1:31 AM, Kevin O'Gorman <kogorman at gmail.com> wrote:
>> I've been having trouble with two of my personal computers.  One is from
>> System76 and their great support staff suggested I load package mcelog to
>> monitor for machine check exceptions (MCE).  Sounded good to me, so I did it
>> on all my Ubuntu machines (I have 4 if you count laptops).
>
> I assume you have been reading
>
>    https://www.mcelog.org/
>
> I'm seeing a lot of useful information there. Maybe I'll try it out.
>
> If you haven't read the manpage and the FAQ, ...
>
>> Lo and behold, one of the other machines glitched last night.  Not the
>> System76 one, but a home-brew I built myself (with a little help from my
>> friends).  It's got a medium-fast Core i7 on an ASUS board. It was a
>> familiar occurrence:
>> - It had rebooted on its own and when I woke up it was asking me to log in
>> - On logging in I saw two popup dialogs that said there was an error
>> detected by a system program (but absolutely no other information about it)
>> and wanted permission to report it.  Even when I gave that permission, I did
>> not get a copy or any further information about what happened.
>
> Did you read the page on triggers? (Mentioned also in the FAQ.)
>
>> - /var/log/syslog showed the reboot sequence, but nothing particularly
>> helpful about the cause.
>>
>> Pretty frustrating, but because I had installed mcelog, I also got this:
>> - /var/log/mcelog contained this:
>> mcelog: failed to prefill DIMM database from DMI data
>
> I saw something about that in the FAQ.
>
>> Hardware event. This is not a software error.
>> MCE 0
>> CPU 0 BANK 4
>> MISC 7fbc6369a0eb ADDR 7fbc6369a0eb
>> TIME 1492751851 Thu Apr 20 22:17:31 2017
>> MCG status:
>> MCi status:
>> Uncorrected error
>> Error enabled
>> MCi_MISC register valid
>> MCi_ADDR register valid
>> Processor context corrupt
>> MCA: Internal Timer error
>> Running trigger `unknown-error-trigger'
>> STATUS be00000000800400 MCGSTATUS 0
>> MCGCAP c09 APICID 0 SOCKETID 0
>> CPUID Vendor Intel Family 6 Model 60
>
>> Hardware event. This is not a software error.
>> MCE 1
>> CPU 3 BANK 3
>> MISC 7fbc6369a0eb ADDR 7fbc6369a0eb
>> TIME 1492751851 Thu Apr 20 22:17:31 2017
>> MCG status:
>> MCi status:
>> Uncorrected error
>> Error enabled
>> MCi_MISC register valid
>> MCi_ADDR register valid
>> Processor context corrupt
>> MCA: Internal Timer error
>> Running trigger `unknown-error-trigger'
>> STATUS be00000000800400 MCGSTATUS 0
>> MCGCAP c09 APICID 6 SOCKETID 0
>> CPUID Vendor Intel Family 6 Model 60
>>
>> So it looks like a hardware error.  It even says so, or at least "Hardware
>> event. This is not a software error."
>
> Two, in fact.
>
>> Thing is the rest of this log is almost entirely opaque to me.  I do
>> understand the timestamp and "Vendor Intel" but that's about it.  I'm
>> wondering what actually happened, and if there's anyone on this list that
>> can explain.  In particular, does that first line, containing "DIMM", suggest
>> that there was a RAM memory-related problem?
>
> It does, but did you check the glossary?
>
>> I also wanted to alert anyone else who might be having trouble diagnosing a
>> recurring problem.  This package is in the regular repository, but is not
>> installed by default.  I think that's a shame.
>
> You might want to look up EDAC. I see it mentioned in the FAQ.
>
>
>> --
>> Kevin O'Gorman
>> #define QUESTION ((bb) || (!bb))   /* Shakespeare */
>>
>> Please consider the environment before printing this email.
>>
>
> Happy hunting.
>
> --
> Joel Rees
>
> I'm imagining I'm a novelist:
> http://joel-rees-economics.blogspot.com/2017/01/soc500-00-00-toc.html
> More of my delusions:
> http://reiisi.blogspot.jp/p/novels-i-am-writing.html
>
> --
> ubuntu-users mailing list
> ubuntu-users at lists.ubuntu.com
> Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-users

This message may contain information that is not intended for you. If you are not the addressee or if this message was sent to you by mistake, you are requested to inform the sender and delete the message. The State accepts no liability for damage of any kind resulting from the risks inherent in the electronic transmission of messages.


