LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: "Dr. Ernst Molitor" <molitor@uni-bonn.de>
To: Andrew Morton <akpm@osdl.org>
Cc: mingo@redhat.com, linux-kernel@vger.kernel.org,
	Christophe Saout <christophe@saout.de>,
	molitor@uni-bonn.de, em@cfce.de
Subject: Re: PROBLEM: Linux-2.6.6 with dm-crypt hangs on SMP boxes
Date: Mon, 24 May 2004 22:33:51 +0200	[thread overview]
Message-ID: <1085430830.7365.24.camel@felicia> (raw)
In-Reply-To: <20040521185640.6bf88bdb.akpm@osdl.org>

Dear Andrew Morton, 

On Fri, 2004-05-21 at 18:56 -0700, Andrew Morton wrote:
> "Dr. Ernst Molitor" <molitor@cfce.de> wrote:
> >
> > [1.] Linux-2.6.6 caused full halts on two SMP boxes.
> > [2.] I've been using Linux-2.4.20 with   cryptoloop/cryptoapi for 156
> > days uptime; on two boxes, I have installed 2.6.6-rc3-bk5 (one box) and
> > 2.6.6-bk5 (the other one), with dm-crypt on the partitions created with
> > cryptoloop/cryptoapi. Both boxes ran like a charm, but both of them
> > repeatedly came to a halt (no screen, no network connectivity, no
> > reaction to keyboard or mouse activity: Need for hard reset) repeatedly.
> > [3.] dm-crypt, loop device (maybe other things).
> > In kern.log on the box with 2.6.6-bk5, I found the line
> >   Incorrect TSC synchronization on an SMP system (see dmesg).
> > with the 2.6.6 kernels, with 2.4.20, the message was
> >  checking TSC synchronization across CPUs: passed.
> > [4.] 2.6.6, 2.6.6-bk5
> 
> Are the machines using highmem? (What is in /proc/meminfo?)
> 
> Please add `nmi_watchdog=1' to the kernel boot command line and reboot.

thank you for your hint, I followed it meticulously, and yes, the line
starting with "NMI:" in /proc/interrupts showed increasing counts.
> 
> After booting, do:
> 
> 	echo 1 > /proc/sys/kernel/sysrq
> 
> After a machine hangs up, see if there is an NMI watchdog message on the
> console.  If not, try typing ALT-sysrq-P.  If this generates a trace, type
> it again until you capture the trace from the other CPU as well.  We'd need
> to see both those traces.  A digital camera helps...

I'm very sorry, I have non seen any watchdog messages;  ALT-sysrq-P
worked before a problem occurred, only, -  it failed to work once the
box hanged up. I had to switch off DPMI power off in the BIOS just to be
sure about this, since once the box freezes, it just does not react on
any action other than a hard reset or power off cycle.

I had the box compile gcc-3.4.0 several times to keep it busy - it took
between 2 and about 30 minutes to hang up under this task. Hanging up
did not depend on dm-crypt - it came about even when dm-crypt was not
used at all. 

By chance, I happened to read a message by Jeronimo Wanhoff on the
mailing list of the BoLUG (LUG of Bonn, Germany); he noted that his
nvidia nforce2 board hanged with local APIC support in use, and
suggested to use nolapic as a boot parameter to disable local APIC use
since this cause his box to stop hanging up.

Following his advice, I rebooted with the "nolapic" boot option; the box
has been up and running 2.6.6-bk8 for 6 hours now, and I did try and put
some load on it (recompilation of gcc-3.4.0 on an encrypted file system,
using dm-crypt/twofish). 

Please drop me a line of e-mail if I can be of any assistance in
pinpointing the problem. Could it be a hardware failure with the local
APIC which does not matter under 2.4.x?   I'd report back if the system
-  which seems to be stable now - should falter again.

Thank you very much for your patience and help.

Best wishes and regards, 

Ernst

  reply	other threads:[~2004-05-24 20:34 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-05-20  8:59 Dr. Ernst Molitor
2004-05-22  1:56 ` Andrew Morton
2004-05-24 20:33   ` Dr. Ernst Molitor [this message]
2004-05-26 11:41     ` Dr. Ernst Molitor

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1085430830.7365.24.camel@felicia \
    --to=molitor@uni-bonn.de \
    --cc=akpm@osdl.org \
    --cc=christophe@saout.de \
    --cc=em@cfce.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --subject='Re: PROBLEM: Linux-2.6.6 with dm-crypt hangs on SMP boxes' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).