LKML Archive on lore.kernel.org
From: Frank van Maarseveen <frankvm@frankvm.com>
To: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Machine check exception with a kernel dependency
Date: Fri, 15 Feb 2008 15:50:07 +0100
Message-ID: <20080215145007.GA18341@janus>
In-Reply-To: <20080215132241.23823d43@core>

On Fri, Feb 15, 2008 at 01:22:41PM +0000, Alan Cox wrote:
> On Wed, 13 Feb 2008 17:25:28 +0100
> Frank van Maarseveen <frankvm@frankvm.com> wrote:
> 
> > On at least two Dell optiplex 755 systems with a Core 2 Duo I get
> > 
> > Feb 13 15:14:01 inari CPU 1: Machine Check Exception: 0000000000000004 
> > Feb 13 15:14:01 inari CPU 0: Machine Check Exception: 0000000000000005 
> > Feb 13 15:14:01 inari Bank 0: b200004000000800 
> > Feb 13 15:14:01 inari Bank 5: b200221024080400 
> > 
> > 2.6.22.10 shows the problem, 2.6.24.2 ditto but I'm unable to reproduce
> > it with 2.6.24-rc8. BIOS upgrade didn't help. Removing all PCI[e] cards
> > didn't help either.
> 
> If you run the MCE numbers through a decoder what do you get back ?

I have had some trouble decoding these convincingly.
mcelog --core2 --ascii reports "MCG status:RIPV MCIP" for
0000000000000005 and "MCG status:MCIP" for 0000000000000004.
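
For reference, the two MCG status values can be checked by hand. The
following is a minimal sketch, not mcelog's own code: it assumes only
the three architectural IA32_MCG_STATUS flag bits (RIPV = bit 0,
EIPV = bit 1, MCIP = bit 2), and the name decode_mcg_status is made up
for illustration.

```python
# Architectural IA32_MCG_STATUS flag bits (bit position -> name).
MCG_STATUS_BITS = {0: "RIPV", 1: "EIPV", 2: "MCIP"}


def decode_mcg_status(value: int) -> list:
    """Return the names of the flags set in an MCG_STATUS value."""
    return [name for bit, name in sorted(MCG_STATUS_BITS.items())
            if (value >> bit) & 1]


if __name__ == "__main__":
    # The two values reported in the kernel log.
    for raw in ("0000000000000005", "0000000000000004"):
        print(raw, " ".join(decode_mcg_status(int(raw, 16))))
```

This agrees with what mcelog prints: 5 has RIPV and MCIP set, 4 has
only MCIP.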

I've collected the distinct "Bank" output lines, with how often each
occurred:

count  line
------------------------------
   26  Bank 0: b200004000000800
   10  Bank 5: b200121014040400
    8  Bank 5: b200121020080400
    4  Bank 5: b200221010040400
    4  Bank 5: b200221024080400

but mcelog expects each record on a single line, in the format

	CPU %u: Machine Check Exception: %16Lx Bank %d: %016Lx

(netconsole split them across lines), so I assembled these by hand:

CPU 1: Machine Check Exception: 0000000000000004 Bank 0: b200004000000800
CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200121014040400
CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200121020080400
CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200221010040400
CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200221024080400
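
The joining step could also be scripted. This is only a sketch under
an assumption: it pairs the Nth "CPU" line with the Nth "Bank" line in
log order, which reproduces the pairing used above but is a guess
about which bank belongs to which CPU when output from two CPUs is
interleaved. The function name join_mce_lines and both regular
expressions are made up for illustration.

```python
import re
from collections import deque

# Patterns for the two halves of a split MCE record.
CPU_RE = re.compile(r"CPU (\d+): Machine Check Exception: ([0-9a-fA-F]{1,16})")
BANK_RE = re.compile(r"Bank (\d+): ([0-9a-fA-F]{16})")


def join_mce_lines(log_lines):
    """Join split MCE records into single lines.

    Assumption: CPU and Bank lines pair up first-in, first-out.
    """
    pending = deque()  # CPU halves still waiting for a Bank half
    joined = []
    for line in log_lines:
        cpu = CPU_RE.search(line)
        if cpu:
            pending.append((int(cpu.group(1)), cpu.group(2)))
            continue
        bank = BANK_RE.search(line)
        if bank and pending:
            cpu_nr, mcg = pending.popleft()
            joined.append(
                f"CPU {cpu_nr}: Machine Check Exception: {mcg} "
                f"Bank {int(bank.group(1))}: {bank.group(2)}"
            )
    return joined


if __name__ == "__main__":
    log = [
        "Feb 13 15:14:01 inari CPU 1: Machine Check Exception: 0000000000000004 ",
        "Feb 13 15:14:01 inari CPU 0: Machine Check Exception: 0000000000000005 ",
        "Feb 13 15:14:01 inari Bank 0: b200004000000800 ",
        "Feb 13 15:14:01 inari Bank 5: b200221024080400 ",
    ]
    for record in join_mce_lines(log):
        print(record)
```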

result:

CPU 1: Machine Check Exception: 0000000000000004 Bank 0: b200004000000800
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 BANK 0 MCG status:MCIP 
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Level-0 Originated-request Generic Memory-access Request-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
timeout BINIT (ROB timeout)
STATUS b200004000000800 MCGSTATUS 4

CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200121014040400
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 5 MCG status:RIPV MCIP 
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200121014040400 MCGSTATUS 5

CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200121020080400
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 5 MCG status:RIPV MCIP 
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200121020080400 MCGSTATUS 5

CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200221010040400
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 5 MCG status:RIPV MCIP 
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200221010040400 MCGSTATUS 5

CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200221024080400
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 5 MCG status:RIPV MCIP 
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200221024080400 MCGSTATUS 5
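
As a cross-check of the output above, the architectural part of each
bank status word can be decoded independently. This is a minimal
sketch, not mcelog's implementation: it assumes the standard
IA32_MCi_STATUS layout (flag bits 63..57, model-specific error code in
bits 31:16, MCA error code in bits 15:0) and recognises only the two
MCA code classes seen here. The names decode_mci_status and
classify_mca_code are made up for illustration, and the model-specific
fields are not interpreted.

```python
# Architectural IA32_MCi_STATUS flag bits, highest first.
MCI_STATUS_FLAGS = [
    (63, "VAL"),    # register contents valid
    (62, "OVER"),   # error overflow
    (61, "UC"),     # uncorrected error
    (60, "EN"),     # error reporting enabled
    (59, "MISCV"),  # MCi_MISC valid
    (58, "ADDRV"),  # MCi_ADDR valid
    (57, "PCC"),    # processor context corrupt
]


def classify_mca_code(code: int) -> str:
    """Classify a 16-bit MCA error code (only two classes handled)."""
    if code == 0x0400:
        return "internal timer"
    if (code & 0xF800) == 0x0800:  # pattern 0000 1PPT RRRR IILL
        return "bus/interconnect"
    return "other"


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into flags and error codes."""
    mca_code = status & 0xFFFF
    return {
        "flags": [name for bit, name in MCI_STATUS_FLAGS
                  if (status >> bit) & 1],
        "mca_code": mca_code,
        "mca_class": classify_mca_code(mca_code),
        "model_specific_code": (status >> 16) & 0xFFFF,
    }


if __name__ == "__main__":
    for raw in ("b200004000000800", "b200121014040400",
                "b200121020080400", "b200221010040400",
                "b200221024080400"):
        d = decode_mci_status(int(raw, 16))
        print(raw, " ".join(d["flags"]), d["mca_class"],
              f"mca={d['mca_code']:04x}",
              f"model={d['model_specific_code']:04x}")
```

All five values carry the same flag set (VAL UC EN PCC), consistent
with "Uncorrected error", "Error enabled" and "Processor context
corrupt" in the mcelog output; they differ only in the error codes.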


The problem also exists on an entirely different Xeon system with 4 cores:

cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           X3210  @ 2.13GHz
stepping        : 11


-- 
Frank

