LKML Archive on lore.kernel.org
From: Frank van Maarseveen <frankvm@frankvm.com>
To: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Machine check exception with a kernel dependency
Date: Fri, 15 Feb 2008 15:50:07 +0100
Message-ID: <20080215145007.GA18341@janus>
In-Reply-To: <20080215132241.23823d43@core>
On Fri, Feb 15, 2008 at 01:22:41PM +0000, Alan Cox wrote:
> On Wed, 13 Feb 2008 17:25:28 +0100
> Frank van Maarseveen <frankvm@frankvm.com> wrote:
>
> > On at least two Dell optiplex 755 systems with a Core 2 Duo I get
> >
> > Feb 13 15:14:01 inari CPU 1: Machine Check Exception: 0000000000000004
> > Feb 13 15:14:01 inari CPU 0: Machine Check Exception: 0000000000000005
> > Feb 13 15:14:01 inari Bank 0: b200004000000800
> > Feb 13 15:14:01 inari Bank 5: b200221024080400
> >
> > 2.6.22.10 shows the problem, 2.6.24.2 ditto but I'm unable to reproduce
> > it with 2.6.24-rc8. BIOS upgrade didn't help. Removing all PCI[e] cards
> > didn't help either.
>
> If you run the MCE numbers through a decoder what do you get back ?

I have trouble decoding these convincingly. mcelog --core2
--ascii reports "MCG status:RIPV MCIP" for 0000000000000005 and "MCG
status:MCIP" for 0000000000000004.
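
For reference, the two MCG status values can also be decoded by hand. A minimal Python sketch, assuming the architectural IA32_MCG_STATUS layout (bit 0 RIPV, bit 1 EIPV, bit 2 MCIP); the function name is only illustrative and this is not mcelog's own code:

```python
# Architectural IA32_MCG_STATUS flag bits (assumed layout, see lead-in).
MCG_STATUS_BITS = (
    (0, "RIPV"),  # restart IP valid
    (1, "EIPV"),  # error IP valid
    (2, "MCIP"),  # machine check in progress
)


def decode_mcg_status(value):
    """Return the names of the flags set in an MCG status value."""
    return [name for bit, name in MCG_STATUS_BITS if value & (1 << bit)]


# The two values reported in the log.
for raw in ("0000000000000005", "0000000000000004"):
    print(raw, " ".join(decode_mcg_status(int(raw, 16))))
```

Under that assumption the sketch agrees with mcelog: 5 has RIPV and MCIP set, 4 has only MCIP.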
I've collected several Bank # output lines:

  count  text
  -----  ----------------------------
     26  Bank 0: b200004000000800
     10  Bank 5: b200121014040400
      8  Bank 5: b200121020080400
      4  Bank 5: b200221010040400
      4  Bank 5: b200221024080400

but mcelog expects lines of the format

  CPU %u: Machine Check Exception: %16Lx Bank %d: %016Lx

(netconsole split them across lines), so I assembled these by hand:

CPU 1: Machine Check Exception: 0000000000000004 Bank 0: b200004000000800
CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200121014040400
CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200121020080400
CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200221010040400
CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200221024080400
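
Rejoining the lines can also be scripted. A minimal Python sketch, assuming the CPU/bank pairing is supplied by hand as above (the broken log does not say which Bank line belongs to which CPU line, so the script does not try to infer it); `mcelog_line` is a hypothetical helper, not part of mcelog:

```python
def mcelog_line(cpu, mcg_status, bank, bank_status):
    """Format one record as a single line in the layout mcelog expects."""
    return (
        f"CPU {cpu}: Machine Check Exception: {mcg_status:016x} "
        f"Bank {bank}: {bank_status:016x}"
    )


# Hand-chosen pairings: (cpu, MCG status, bank number, bank status).
records = [
    (1, 0x4, 0, 0xB200004000000800),
    (0, 0x5, 5, 0xB200121014040400),
    (0, 0x5, 5, 0xB200121020080400),
    (0, 0x5, 5, 0xB200221010040400),
    (0, 0x5, 5, 0xB200221024080400),
]

for rec in records:
    print(mcelog_line(*rec))
```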
result:
CPU 1: Machine Check Exception: 0000000000000004 Bank 0: b200004000000800
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 BANK 0 MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Level-0 Originated-request Generic Memory-access Request-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
timeout BINIT (ROB timeout)
STATUS b200004000000800 MCGSTATUS 4
CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200121014040400
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 5 MCG status:RIPV MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200121014040400 MCGSTATUS 5
CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200121020080400
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 5 MCG status:RIPV MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200121020080400 MCGSTATUS 5
CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200221010040400
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 5 MCG status:RIPV MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200221010040400 MCGSTATUS 5
CPU 0: Machine Check Exception: 0000000000000005 Bank 5: b200221024080400
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 5 MCG status:RIPV MCIP
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200221024080400 MCGSTATUS 5
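
The flag lines in the mcelog output can be cross-checked against the raw STATUS values. A minimal Python sketch, assuming the architectural IA32_MCi_STATUS layout (bit 63 VAL, 62 OVER, 61 UC, 60 EN, 59 MISCV, 58 ADDRV, 57 PCC, bits 15:0 the MCA error code); it does not attempt mcelog's Core 2 model-specific decoding, and the names are illustrative:

```python
# Architectural IA32_MCi_STATUS flag bits (assumed layout, see lead-in).
MCI_STATUS_FLAGS = (
    (63, "VAL"),    # register contents valid
    (62, "OVER"),   # error overflow
    (61, "UC"),     # uncorrected error
    (60, "EN"),     # error enabled
    (59, "MISCV"),  # MISC register valid
    (58, "ADDRV"),  # ADDR register valid
    (57, "PCC"),    # processor context corrupt
)


def decode_mci_status(status):
    """Split a bank status value into flag names and the MCA error code."""
    flags = [name for bit, name in MCI_STATUS_FLAGS if status & (1 << bit)]
    mca_code = status & 0xFFFF  # bits 15:0
    return flags, mca_code


# The five distinct bank status values from the log.
for raw in (
    "b200004000000800",
    "b200121014040400",
    "b200121020080400",
    "b200221010040400",
    "b200221024080400",
):
    flags, code = decode_mci_status(int(raw, 16))
    print(raw, " ".join(flags), f"mca={code:#06x}")
```

Under that layout all five values have VAL, UC, EN and PCC set, consistent with the "Uncorrected error", "Error enabled" and "Processor context corrupt" lines; the low 16 bits are 0x0800 for bank 0 and 0x0400 for bank 5.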
The problem also exists on an entirely different Xeon system with 4 cores:
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU X3210 @ 2.13GHz
stepping : 11
--
Frank