LKML Archive on lore.kernel.org
From: "Dan Williams" <dan.j.williams@intel.com>
To: "Jens Axboe" <jens.axboe@oracle.com>
Cc: "Frank Seidel" <linux@f-seidel.de>,
linux-kernel@vger.kernel.org, NeilBrown <neilb@suse.de>
Subject: Re: PROBLEM: null pointer dereference in cfq_dispatch_requests (2.6.21-rc2 and 2.6.20)
Date: Thu, 1 Mar 2007 12:50:06 -0700
Message-ID: <e9c3a7c20703011150h68d99cedx433112afbd8dc88b@mail.gmail.com>
In-Reply-To: <20070301123057.GO23985@kernel.dk>

On 3/1/07, Jens Axboe <jens.axboe@oracle.com> wrote:
> On Thu, Mar 01 2007, Frank Seidel wrote:
> > On Wednesday, 28 February 2007 19:02, Dan Williams wrote:
> > > I can reliably reproduce a null pointer dereference on 2.6.20 and
> > > 2.6.21-rc2. I will keep digging to find the kernel version where
> > > this last worked, but wanted to see if there were any immediate
> > > experiments I should try.
> > > ...
> > > Kernel 2.6.21-rc2 on an i686
> > > ...
> > > [ 431.709022] BUG: unable to handle kernel NULL pointer dereference at virtual address 0000005c
> > > [ 431.717993] printing eip:
> > > ...
> > > [ 431.825386] EIP is at cfq_dispatch_insert+0xb/0x53
> > > ...
> > > [ 431.887396] [<c01e1fc9>] cfq_dispatch_requests+0x138/0x3f0
> > Hi,
> > unfortunately I don't yet have enough knowledge of cfq and the
> > kernel's internals...
> > but looking at cfq_dispatch_insert+0xb, it seems the struct request
> > pointer passed as the second parameter (by cfq_dispatch_requests) was
> > NULL, and dereferencing it in the RQ_CFQQ macro leads to this oops.
> >
> > The "break"-out patch below for __cfq_dispatch_requests might at
> > least be a possible workaround, but it could also be completely wrong.
> > Perhaps someone smarter can pick this up and provide a real fix.
> >
> > Have fun,
> > Frank
> > ---
> >
> > block/cfq-iosched.c | 3 ++-
> > 1 files changed, 2 insertions(+), 1 deletion(-)
> >
> > Index: linux-2.6/block/cfq-iosched.c
> > ===================================================================
> > --- linux-2.6.orig/block/cfq-iosched.c
> > +++ linux-2.6/block/cfq-iosched.c
> > @@ -962,7 +962,8 @@ __cfq_dispatch_requests(struct cfq_data
> >  		 * follow expired path, else get first next available
> >  		 */
> >  		if ((rq = cfq_check_fifo(cfqq)) == NULL)
> > -			rq = cfqq->next_rq;
> > +			if ((rq = cfqq->next_rq) == NULL)
> > +				break;
> >
> >  		/*
> >  		 * finally, insert request into driver dispatch list
>
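The fault address is at least consistent with that reading: a read of a struct field through a NULL pointer faults at the field's offset from address zero, which is why the oops reports a small address such as 0000005c rather than 0. A minimal sketch of that arithmetic follows. The struct layout and the names fake_request, elv_private, field_address and null_deref_fault_address are made up for illustration; this is not the real struct request or the real RQ_CFQQ definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct request; the real layout differs. */
struct fake_request {
	unsigned char other_fields[0x5c]; /* filler so the next field sits at 0x5c */
	uint32_t elv_private;             /* made-up field an accessor macro might read */
};

/* Address that is touched when reading a field at `offset` through `base`. */
static uintptr_t field_address(uintptr_t base, size_t offset)
{
	return base + offset;
}

/* Address that would fault if elv_private were read through a NULL pointer. */
static uintptr_t null_deref_fault_address(void)
{
	return field_address((uintptr_t)0,
			     offsetof(struct fake_request, elv_private));
}
```

With this assumed layout, null_deref_fault_address() evaluates to 0x5c, matching the shape of the address in the oops.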
> That is not the right fix. A little further up in this function, a
> check (well, a BUG_ON()) is done for a non-empty sort list. So we know
> at this point that we have requests pending for this queue. When that
> is the case, ->next_rq must always be kept up to date and non-NULL.
> The oops at least tells us this; it should not be papered over. The
> real fix is finding out _where_ this is no longer being updated.
>
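To make sure I understand the invariant being described (a non-empty sort list implies a non-NULL ->next_rq), here is a small userspace model of it. This is only a sketch under my own assumptions: model_queue, model_rq and the helper functions are hypothetical names, not the cfq data structures, and the buggy removal path is invented purely to show how the invariant could break.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_rq {
	int sector;
};

/* Toy queue: a count of sorted requests plus a cached "next" pointer. */
struct model_queue {
	size_t nr_sorted;         /* number of requests on the sort list */
	struct model_rq *next_rq; /* cached next request to dispatch */
};

/* The invariant: either the queue is empty or next_rq is set. */
static bool next_rq_invariant_holds(const struct model_queue *q)
{
	return q->nr_sorted == 0 || q->next_rq != NULL;
}

/* Adding a request keeps the invariant by filling an empty cache. */
static void model_add(struct model_queue *q, struct model_rq *rq)
{
	q->nr_sorted++;
	if (q->next_rq == NULL)
		q->next_rq = rq;
}

/*
 * Deliberately buggy removal: it clears the cached pointer without
 * choosing a successor, so the queue can be non-empty with next_rq NULL.
 */
static void model_remove_buggy(struct model_queue *q, struct model_rq *rq)
{
	q->nr_sorted--;
	if (q->next_rq == rq)
		q->next_rq = NULL;
}
```

A check like next_rq_invariant_holds() placed after each update path would point at the path that forgets to refresh the cached pointer, rather than at the later dereference.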
> I'm puzzled why this is hitting Dan, but no one else has reported
> anything. Dan, did 2.6.19 work for you?
>
I am puzzled as well. I do not think many people run raid6 arrays with
two failed disks, so this may be an under-tested path; a non-degraded
array runs fine.

I fired up a 2.6.19 kernel, and tiobench ran past the point (in terms
of time) where it had failed on .20 and .21-rc2. However, I noticed
that things were running much slower because the CPU optimizations had
fallen back from Core2 to Pentium-Pro, which affects the raid6 p+q
calculation speed, among other things. So I need to re-baseline the
failure against a more common config before I can say whether it is
actually gone in 2.6.19.

I should have time to try these tests next week.
> --
> Jens Axboe
Regards,
Dan