LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: l.genoni@oltrelinux.com
To: linux-kernel@vger.kernel.org
Subject: Re: System crash after "No irq handler for vector" linux 2.6.19
Date: Tue, 23 Jan 2007 11:19:10 +0100 (CET)	[thread overview]
Message-ID: <Pine.LNX.4.64.0701231118290.22876@Phoenix.oltrelinux.com> (raw)

reproduced.
it took more or less one hour to reproduce it.  I could reproduce it olny
running also irqbalance 0.55 and commenting out the sleep 1. The message 
in
syslog is the same and then, after a few seconds I think, KABOM! system 
crash
and reboot.

I tested also a similar system that has 4 dual core CPU Opteron 2600MHZ. 
On
this system (linux sees 8 CPU, but it is the same kernel, same gcc, same
config, same glibc, same active services) I could not reproduce it even
running irqbalance 0.55 in almost 1 hour. Maybe I could reproduce it 
waiting
for more time, but my users need to do their work, so I could not have a
longer test window. So on 16 CPU I had the crash, on 8 CPU I had no crash.

I need to give back the system to the users, so if you need other tests,
please, tell me as soon.

thanx

Luigi Genoni

On Monday 22 January 2007 18:14, Eric W. Biederman wrote:
> "Luigi Genoni" <luigi.genoni@pirelli.com> writes:
> > (e-mail resent because not delivered using my other e-mail account)
> >
> > Hi,
> > this night a linux server 8 dual core CPU Optern 2600Mhz crashed just
> > after giving this message
> >
> > Jan 22 04:48:28 frey kernel: do_IRQ: 1.98 No irq handler for vector
>
> Ok.  This indicates that the hardware is doing something we didn't 
expect.
> We don't know which irq the hardware was trying to deliver when it
> sent vector 0x98 to cpu 1.
>
> > I have no other logs, and I eventually lost the OOPS since I have no 
net
> > console setled up.
>
> If you had an oops it may have meant the above message was a secondary
> symptom.  Groan.  If it stayed up long enough to give an OOPS then
> there is a chance the above message appearing only once had nothing
> to do with the actual crash.
>
> How long had the system been up?
>
> > As I said sistem is running linux 2.6.19 compiled with gcc 4.1.1 for 
AMD
> > Opteron (attached see .config), no kernel preemption excepted the BKL
> > preemption. glibc 2.4.
> >
> > System has 16 GB RAM and 8 dual core Opteron 2600Mhz.
> >
> > I am running irqbalance 0.55.
> >
> > any hints on what has happened?
>
> Three guesses.
>
> - A race triggered by irq migration (but I would expect more people to 
be
> yelling). The code path where that message comes from is new in 2.6.19 
so
> it may not have had all of the bugs found yet :(
> - A weird hardware or BIOS setup.
> - A secondary symptom triggered by some other bug.
>
> If this winds up being reproducible we should be able to track it down.
> If not this may end up in the files of crap something bad happened that
> we don't understand.
>
> The one condition I know how to test for (if you are willing) is an
> irq migration race.  Simply by triggering irq migration much more often,
> and thus increasing our chances of hitting a problem.
>
> Stopping irqbalance and running something like:
> for irq in 0 24 28 29 44 45 60 68 ; do
>       while :; do
>               for mask in 1 2 4 8 10 20 40 80 100 200 400 800 1000 2000 
4000 8000 ; do
>                       echo mask > /proc/irq/$irq/smp_affinity
>                       sleep 1
>               done
>       done &
> done
>
> Should force every irq to migrate once a second, and removing the sleep 
1
> is even harsher, although we max at one irq migration by irq received.
>
> If some variation of the above loop does not trigger the do_IRQ ??? No 
irq
> handler for vector message chances are it isn't a race in irq migration.
>
> If we can rule out the race scenario it will at least put us in the 
right
> direction for guessing what went wrong with your box.
>
> Eric

-- 


             reply	other threads:[~2007-01-23 10:19 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-01-23 10:19 l.genoni [this message]
  -- strict thread matches above, loose matches on Subject: below --
2007-02-01 13:33 Chris Rankin
2007-01-23 10:35 l.genoni
     [not found] <200701221116.13154.luigi.genoni@pirelli.com>
2007-01-22 17:14 ` Eric W. Biederman
     [not found]   ` <200701231051.32945.luigi.genoni@pirelli.com>
2007-01-23 12:18     ` Eric W. Biederman
     [not found]       ` <Pine.LNX.4.64.0701232052330.32111@baldios.it.pirelli.com>
2007-01-31  8:39         ` Eric W. Biederman
     [not found]           ` <200701311549.22512.luigi.genoni@pirelli.com>
2007-02-01  5:59             ` Eric W. Biederman
2007-02-01  7:20             ` Eric W. Biederman
     [not found]               ` <200702021848.55921.luigi.genoni@pirelli.com>
2007-02-02 18:02                 ` Eric W. Biederman
     [not found]                   ` <200702021905.39922.luigi.genoni@pirelli.com>
2007-02-02 18:32                     ` Eric W. Biederman
2007-02-03  0:40                     ` Eric W. Biederman
2007-01-22 10:06 l.genoni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.0701231118290.22876@Phoenix.oltrelinux.com \
    --to=l.genoni@oltrelinux.com \
    --cc=linux-kernel@vger.kernel.org \
    --subject='Re: System crash after "No irq handler for vector" linux 2.6.19' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).