From: (Eric W. Biederman)
To: Ingo Molnar <>
Cc: "H. Peter Anvin" <>, Vivek Goyal <>,
	Neil Horman <>,,,,
Subject: Re: [PATCH], issue EOI to APIC prior to calling crash_kexec in die_nmi path
Date: Wed, 06 Feb 2008 18:30:21 -0700
Message-ID: <>
In-Reply-To: <> (Ingo Molnar's message of "Thu, 7 Feb 2008 01:39:18 +0100")

Ingo Molnar <> writes:

> * Eric W. Biederman <> wrote:
>> Looking at the patch the local_irq_enable() is totally bogus.  As soon 
>> was we hit machine_crash_shutdown the first thing we do is disable 
>> irqs.
> yeah.
>> I'm wondering if someone was using the switch cpus on crash patch that 
>> was floating around.  That would require the ipis to work.
>> I don't know if nmi_exit makes sense.  There are enough layers of 
>> abstraction in that piece of code I can't quickly spot the part that 
>> is banging the hardware.
>> The location of nmi_exit in the patch is clearly wrong.  crash_kexec 
>> is a noop if we don't have a crash kernel loaded (and if we are not 
>> the first cpu into it), so if we don't execute the crash code 
>> something weird may happen.  Further the code is just more 
>> maintainable if that kind of code lives in machine_crash_shutdown.
> nmi_exit() has no hw effects - it's just our own bookeeping.

Alright so that should have no hardware effect then.

> the hw knows that we finished the NMI when we do an iret. Perhaps that's 
> the bug or side-effect that made the difference: via enabling irqs we 
> get an irq entry, and that does an iret and clears the NMI nested state 
> - allowing the kexec context to proceed? I suspect kexec() will do an 
> iret eventually (at minimum in the booted up kernel's context) - all 
> NMIs are blocked up to that point and maybe the APIC doesnt really like 
> being frobbed in that state? In any case, the local_irq_enable() is just 
> wrong - it's the worst thing a crashing kernel can do. Perhaps doing an 
> intentional iret with a prepared stack-let that just restores to 
> still-irqs-off state and jumps to the next instruction could 'exit' the 
> NMI context without really having to exit it in the kernel code flow?

Well, I think this is slightly on the wrong track.  The original patch
description said we get as far as purgatory.  Purgatory is the bit of C
code from /sbin/kexec that runs just before our second kernel.  It sets up
arguments and verifies that the target kernel has a valid sha256sum.  If
purgatory detects data corruption it spins, to prevent a corrupt
recovery kernel from doing something nasty.
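
As an illustration of the verify-then-spin behaviour described above, here
is a minimal user-space sketch. It is not the purgatory source: the struct
and function names are hypothetical, and a short FNV-1a hash stands in for
sha256 purely to keep the example small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in digest (64-bit FNV-1a). ASSUMPTION: the real purgatory checks a
 * sha256 sum; a short hash is used here only to keep the sketch readable. */
uint64_t digest(const unsigned char *buf, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 0x100000001b3ULL;            /* FNV prime */
    }
    return h;
}

/* Hypothetical descriptor for the loaded recovery kernel. */
struct crash_image {
    const unsigned char *data;
    size_t len;
    uint64_t expected;   /* digest recorded when the image was loaded */
};

/* True if the image still matches the digest recorded at load time. */
bool image_is_intact(const struct crash_image *img)
{
    assert(img != NULL && img->data != NULL);
    return digest(img->data, img->len) == img->expected;
}

/* Returns only when the image is intact. On a mismatch it spins forever,
 * so a corrupted recovery kernel is never started. */
void verify_or_spin(const struct crash_image *img)
{
    if (!image_is_intact(img))
        for (;;)
            ;   /* deliberate hang */
}
```

The point of the sketch is that a hang at this stage means the crash path
itself already ran to completion: control reached the check, and the check
refused to continue.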

It appears that the primary crash_kexec path is working fine.

The original description speculated that we had non-stopped cpus that
were telling the hardware to shut off.

I don't see what the hang is.  However the goal apparently is to make
the kexec on panic path more robust so that we can take crash dumps in
more strange cases.

We can get NMIs from the nmi watchdogs, so it is possible this happens
on legitimate hardware.  So there is a chance this is deterministic and
that we can get enough information to debug and fix the original
problem.
If part of the problem is getting to crash_kexec, my inclination is to
move the call to crash_kexec up as early as possible in die_nmi, as
we may simply be hanging in printk or something stupid like that.
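
The ordering suggested above can be sketched with a small user-space model.
Everything named model_* is hypothetical, the "first cpu wins" check is a
plain flag where the kernel serializes atomically, and the trace string only
records the order of steps; none of this is the kernel's die_nmi code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool crash_image_loaded;   /* was a crash kernel loaded? */
static bool crash_in_progress;    /* has some cpu already entered the crash path? */
char trace[64];                   /* order of the steps taken, for inspection */

/* Append one step name to the trace without overrunning the buffer. */
static void step(const char *name)
{
    size_t used = strlen(trace);
    snprintf(trace + used, sizeof trace - used, "%s;", name);
}

/* Reset the model; loaded says whether a crash kernel is present. */
void model_reset(bool loaded)
{
    crash_image_loaded = loaded;
    crash_in_progress = false;
    trace[0] = '\0';
}

/* Model of crash_kexec: a no-op unless a crash kernel is loaded and this is
 * the first caller. The real function does not return on success; here we
 * return true instead. */
bool model_crash_kexec(void)
{
    if (!crash_image_loaded || crash_in_progress)
        return false;
    crash_in_progress = true;
    step("kexec");
    return true;
}

/* Model of die_nmi with the crash_kexec call moved to the very top, before
 * anything that might hang (printk, notifier chains). */
void model_die_nmi(void)
{
    if (model_crash_kexec())
        return;
    step("printk");
    step("notify_die");
}
```

With a crash kernel loaded, the first caller goes straight to the kexec step
and never touches the printk or notify_die steps. A later caller, or any
caller when no crash kernel is loaded, falls through to them.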

It is weird that only the 32bit die_nmi path calls bust_spinlocks.

I'm not really happy with the secondary cpus taking the whole notify_die
path, as that is more general purpose infrastructure that might go
bad.  However it doesn't appear broken, and it should not be critical
to the crash dump process.

