From: Randy Dunlap <>
To: Brice Figureau <>
Subject: Re: Strange freeze on 2.6.22 (deadlock?)
Date: Mon, 7 Jan 2008 09:20:48 -0800
In-Reply-To: <1199720934.11173.49.camel@localhost.localdomain>

On Mon, 07 Jan 2008 16:48:54 +0100 Brice Figureau wrote:

> Hi,
> I'm seeing a strange complete server freeze/lock-up on a bi-Xeon HT
> amd64 server running standard Debian 2.6.22 (and before that vanilla
> 2.6.19.x and 2.6.20.x, which exhibited the same issue).
> I'm only reporting it now, since I could get a full sysrq-t only this
> morning.
> The symptoms are that every 5 to 7 days, the server (which acts as an MX
> along with a few low-traffic websites) locks up. The IPMI watchdog is
> unable to reboot the server (and doesn't even trigger, since there is no
> evidence in the esmlog), yet the machine is still pingable. I can't ssh
> to it. I can enter my login and password on a serial console, but no
> shell is started.
> Pressing sysrq-t produced the trace hosted here:
> It happened one time when I was connected to the server through ssh and
> I could see that the load started to increase well above 100. It was
> then impossible to launch new processes from the command line (and I had
> to reboot manually).
> It happened also last week, and the server was stuck for about 6 hours.
> When I started investigating what was wrong, it slowly came back to life
> (with an avg 1-min load of more than 1500, and tons of cron processes
> running in parallel).
> I'm not really familiar with kernel development so I can't really find
> the issue in the aforementioned trace output.
> What I think is that for some reason there is a race/deadlock that
> eventually prevents new processes from actually starting (which in turn
> produces the high load).
> What seems suspect in the aforementioned trace is:
>  *) many processes' stack traces end in __mod_timer+0xc3/0xd3,
> which seems to be this line from kernel/timer.c:
> 415	timer->expires = expires;
> 416	internal_add_timer(base, timer);
> -->	spin_unlock_irqrestore(&base->lock, flags);
> 419	return ret;
> 420  }
>  *) many processes' stack traces end in __mutex_lock_slowpath and/or zone_statistics

There are also lots of processes in D state (uninterruptible sleep,
usually waiting for I/O to complete), and jbd, the ext3 journaling
layer, shows up in their stack traces.
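
While the machine still accepts new processes (for example while the
load is climbing, before the full lock-up), the D-state tasks can be
listed without waiting for a sysrq-t dump. The following is only a
minimal sketch, assuming the usual Linux /proc/<pid>/stat layout, in
which the state letter is the first field after the parenthesised
command name. The helper names task_state and d_state_pids are
hypothetical and not part of any existing tool.

```python
import glob


def task_state(stat_line):
    """Return the one-letter state from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the last ')' rather than on whitespace.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def d_state_pids(proc_root="/proc"):
    """List PIDs currently in uninterruptible sleep (state 'D')."""
    pids = []
    for path in glob.glob(proc_root + "/[0-9]*/stat"):
        try:
            with open(path, errors="replace") as f:
                line = f.read()
        except OSError:
            continue  # the task exited while we were scanning
        if task_state(line) == "D":
            pids.append(int(line.split(" ", 1)[0]))
    return pids
```

A steadily growing list from d_state_pids() over a few minutes would
be consistent with tasks piling up behind blocked I/O, although it
does not by itself say what they are blocked on.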

How are the ext3 filesystems mounted, i.e. which data= mode?
data=journal (the heaviest-duty mode) has at least one known
deadlock.  If you are using data=journal, you could try
switching to data=ordered...
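
One way to answer the question above is to read the mount options from
/proc/mounts. The sketch below assumes the standard whitespace-separated
format (device, mount point, filesystem type, options, ...). The
function name ext3_data_modes is hypothetical. Whether a given kernel
lists data= at all when the default mode is in use is an assumption
that should be checked against /etc/fstab or the kernel log, so a
missing option is reported as None rather than guessed.

```python
def ext3_data_modes(mounts_text):
    """Map each ext3 mount point to its data= mode, or None if not listed."""
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expect at least: device, mount point, fs type, options
        if len(fields) < 4 or fields[2] != "ext3":
            continue
        mode = None
        for opt in fields[3].split(","):
            if opt.startswith("data="):
                mode = opt[len("data="):]
        modes[fields[1]] = mode
    return modes
```

Passing the contents of /proc/mounts to ext3_data_modes() gives one
entry per ext3 mount. Any mount point that maps to "journal" is a
candidate for the switch to data=ordered suggested above.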

> Anyway, I will soon reboot to a 2.6.23.x to see if that symptom
> persists.
> More information (config, server specs) is available on request.
> I'm not subscribed to the list, so please CC: me on any answer.
> Many thanks,

