LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Jens Axboe <jens.axboe@oracle.com>
To: linux@horizon.com
Cc: linux-ide@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: 2.6.20.3 AMD64 oops in CFQ code
Date: Thu, 22 Mar 2007 19:41:57 +0100	[thread overview]
Message-ID: <20070322184155.GY19922@kernel.dk> (raw)
In-Reply-To: <20070322123821.7843.qmail@science.horizon.com>

On Thu, Mar 22 2007, linux@horizon.com wrote:
> This is a uniprocessor AMD64 system running software RAID-5 and RAID-10
> over multiple PCIe SiI3132 SATA controllers.  The hardware has been very
> stable for a long time, but has been acting up of late since I upgraded
> to 2.6.20.3.  ECC memory should preclude the possibility of bit-flip
> errors.
> 
> Kernel 2.6.20.3 + linuxpps patches (confined to drivers/serial, and not
> actually in use as I stole the serial port for a console).
> 
> It takes half a day to reproduce the problem, so bisecting would be painful.
> 
> BackupPC_dump mostly writes to a large (1.7 TB) ext3 RAID5 partition.
> 
> 
> Here are two oopes, a few minutes (16:31, to be precise) apart.
> Unusually, it oopsed twice *without* locking up the system..  Usually,
> I see this followed by an error from drivers/input/keyboard/atkbd.c:
>                         printk(KERN_WARNING "atkbd.c: Spurious %s on %s. "
>                                "Some program might be trying access hardware directly.\n",
> emitted at 1 Hz with the keyboard LEDs flashing and the system
> unresponsive to keyboard or pings.
> (I think it was spurious ACK on serio/input0, but my memory may be faulty.)
> 
> 
> If anyone has any suggestions, they'd be gratefully received.
> 
> 
> Unable to handle kernel NULL pointer dereference at 0000000000000098 RIP: 
>  [<ffffffff8031504a>] cfq_dispatch_insert+0x18/0x68
> PGD 777e9067 PUD 78774067 PMD 0 
> Oops: 0000 [1] 
> CPU 0 
> Modules linked in: ecb
> Pid: 2837, comm: BackupPC_dump Not tainted 2.6.20.3-g691f5333 #40
> RIP: 0010:[<ffffffff8031504a>]  [<ffffffff8031504a>] cfq_dispatch_insert+0x18/0x68
> RSP: 0018:ffff8100770bbaf8  EFLAGS: 00010092
> RAX: ffff81007fb36c80 RBX: 0000000000000000 RCX: 0000000000000001
> RDX: 000000010003e4e7 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: ffff81007fb37a00 R08: 00000000ffffffff R09: ffff81005d390298
> R10: ffff81007fcb4f80 R11: ffff81007fcb4f80 R12: ffff81007facd280
> R13: 0000000000000004 R14: 0000000000000001 R15: 0000000000000000
> FS:  00002b322d120d30(0000) GS:ffffffff805de000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000098 CR3: 000000007bcf0000 CR4: 00000000000006e0
> Process BackupPC_dump (pid: 2837, threadinfo ffff8100770ba000, task ffff81007fc5d8e0)
> Stack:  0000000000000000 ffff8100770f39f0 0000000000000000 0000000000000004
>  0000000000000001 ffffffff80315253 ffffffff803b2607 ffff81005da2bc40
>  ffff81007fac3800 ffff81007facd280 ffff81007facd280 ffff81005d390298
> Call Trace:
>  [<ffffffff80315253>] cfq_dispatch_requests+0x152/0x512
>  [<ffffffff803b2607>] scsi_done+0x0/0x18
>  [<ffffffff8030d9f1>] elv_next_request+0x137/0x147
>  [<ffffffff803b7ce0>] scsi_request_fn+0x6a/0x33a
>  [<ffffffff8024d407>] generic_unplug_device+0xa/0xe
>  [<ffffffff80407ced>] unplug_slaves+0x5b/0x94
>  [<ffffffff80223d65>] sync_page+0x0/0x40
>  [<ffffffff80223d9b>] sync_page+0x36/0x40
>  [<ffffffff80256d45>] __wait_on_bit_lock+0x36/0x65
>  [<ffffffff80237496>] __lock_page+0x5e/0x64
>  [<ffffffff8028061d>] wake_bit_function+0x0/0x23
>  [<ffffffff802074de>] find_get_page+0xe/0x2d
>  [<ffffffff8020b38e>] do_generic_mapping_read+0x1c2/0x40d
>  [<ffffffff8020bd80>] file_read_actor+0x0/0x118
>  [<ffffffff8021422e>] generic_file_aio_read+0x15c/0x19e
>  [<ffffffff8020bafa>] do_sync_read+0xc9/0x10c
>  [<ffffffff80210342>] may_open+0x5b/0x1c6
>  [<ffffffff802805ef>] autoremove_wake_function+0x0/0x2e
>  [<ffffffff8020a857>] vfs_read+0xaa/0x152
>  [<ffffffff8020faf3>] sys_read+0x45/0x6e
>  [<ffffffff8025041e>] system_call+0x7e/0x83

3 (I think) seperate instances of this, each involving raid5. Is your
array degraded or fully operational?


-- 
Jens Axboe


  reply	other threads:[~2007-03-22 18:45 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-03-22 12:38 linux
2007-03-22 18:41 ` Jens Axboe [this message]
2007-03-22 18:54   ` linux
2007-03-22 19:00     ` Jens Axboe
2007-03-22 23:59       ` Neil Brown
2007-03-23  0:31         ` Dan Williams
2007-03-23  0:33           ` Dan Williams
2007-03-23  0:44           ` Neil Brown
2007-03-23 17:46             ` linux
2007-04-03  5:49               ` Tejun Heo
2007-04-03 13:03                 ` linux
2007-04-03 13:11                   ` Tejun Heo
2007-04-04 23:22                 ` Bill Davidsen
2007-04-05  4:13                   ` Lee Revell
2007-04-05  4:29                     ` Tejun Heo
2007-03-22 18:43 ` Aristeu Sergio Rozanski Filho

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070322184155.GY19922@kernel.dk \
    --to=jens.axboe@oracle.com \
    --cc=linux-ide@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@horizon.com \
    --subject='Re: 2.6.20.3 AMD64 oops in CFQ code' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).