LKML Archive on lore.kernel.org
* AIC7xxx panic
@ 2001-10-07 10:37 Jim Crilly
  2001-10-07 10:48 ` Rob Turk
  2001-10-07 11:28 ` Jim Crilly
  0 siblings, 2 replies; 9+ messages in thread
From: Jim Crilly @ 2001-10-07 10:37 UTC (permalink / raw)
  To: linux-kernel

I got a reproducible panic while running dbench simulating 25+ clients,
the new aic7xxx driver panics with "Too few segs for dma mapping.
Increase AHC_NSEG". The partition in question is FAT32 and on a
different disk than /, I'm not using HIGHMEM. I am using XFS and the
preempt patches, but I don't think they're related to the panic.

The odd thing is that if I run dbench in the same manner on my / partition,
which is on a different disk on the same controller, it goes fine. It
seems, to my untrained eye anyway, to be a bad interaction between the
vfat driver and the aic7xxx driver.

I'm using the old aic7xxx driver right now and it's fine, has anyone
else seen anything like this?

Jim
-- 
Help protect your rights on-line.
Join the Electronic Frontiers Foundation today: http://www.eff.org/join
-----------------------------------------------------------------------
Security: Antonyms: See Microsoft
-----------------------------------------------------------------------
"We are coming after you. God may have mercy on you, but we won't,"
declared Sen. John McCain, R-Arizona.



* Re: AIC7xxx panic
  2001-10-07 10:37 AIC7xxx panic Jim Crilly
@ 2001-10-07 10:48 ` Rob Turk
  2001-10-07 11:28 ` Jim Crilly
  1 sibling, 0 replies; 9+ messages in thread
From: Rob Turk @ 2001-10-07 10:48 UTC (permalink / raw)
  To: linux-kernel

"Jim Crilly" <noth@noth.is.eleet.ca> wrote in message
news:cistron.1002451051.3718.20.camel@warblade...
> I got a reproducible panic while running dbench simulating 25+ clients,
> the new aic7xxx driver panics with "Too few segs for dma mapping.
> Increase AHC_NSEG". The partition in question is FAT32 and on a
> different disk than /, I'm not using HIGHMEM. I am using XFS and the
> preempt patches, but I don't think they're related to the panic.
>
> The odd thing is that if I run dbench in the same manner on my / partition,
> which is on a different disk on the same controller, it goes fine. It
> seems, to my untrained eye anyway, to be a bad interaction between the
> vfat driver and the aic7xxx driver.
>
> I'm using the old aic7xxx driver right now and it's fine, has anyone
> else seen anything like this?
>
> Jim

Since this seems to fail on just one disk, it might have to do with one of the
disk characteristics, like command queue depth. Did you enable Tagged Command
Queueing, and if so, can you try playing around with the maximum depth?

Rob






* Re: AIC7xxx panic
  2001-10-07 10:37 AIC7xxx panic Jim Crilly
  2001-10-07 10:48 ` Rob Turk
@ 2001-10-07 11:28 ` Jim Crilly
  2001-10-07 12:21   ` David M. Grimes
  1 sibling, 1 reply; 9+ messages in thread
From: Jim Crilly @ 2001-10-07 11:28 UTC (permalink / raw)
  To: Rob Turk; +Cc: linux-kernel

Both disks on the controller are Seagate Cheetahs, the one being worked
during the panic is a ST39204LW, the other disk is a ST318451LW.

I did have TCQ enabled and I left it at the default of 255, I'll try a
lower value tomorrow, since it's so late.

Jim


* Re: AIC7xxx panic
  2001-10-07 11:28 ` Jim Crilly
@ 2001-10-07 12:21   ` David M. Grimes
  2001-10-07 14:48     ` Gérard Roudier
  0 siblings, 1 reply; 9+ messages in thread
From: David M. Grimes @ 2001-10-07 12:21 UTC (permalink / raw)
  To: Jim Crilly; +Cc: Rob Turk, linux-kernel

On Sun, Oct 07, 2001 at 07:28:57AM -0400, Jim Crilly wrote:
> Both disks on the controller are Seagate Cheetahs, the one being worked
> during the panic is a ST39204LW, the other disk is a ST318451LW.

I've seen this on a 2-disk system (both Seagate ST150176LW) on a
VA-Systems onboard AIC 7xxx.  I enabled TCQ, and noticed the default
depth increased sometime around 2.4.10, not exactly sure when (it used
to be 8, now much higher).  I've seen it on both disks.

In drivers/scsi/aic7xxx/aic7xxx_osm.h is the #define for NSEG, and I
changed it from 128 to 512, and it stopped the problem.  Question is,
why was the TCQ depth increased, and should NSEG have been upped with
it?

> 
> I did have TCQ enabled and I left it at the default of 255, I'll try a
> lower value tomorrow, since it's so late.

This also fixed my problem, I left NSEG at 128 and lowered the TCQ depth
back to 8.  This worked fine as well.

I'll be interested to see what the eventual outcome of this is, so I can
apply the "right" fix!

Anyhow, thought you might want another datapoint.

  Thanks,

  Dave


* Re: AIC7xxx panic
  2001-10-07 12:21   ` David M. Grimes
@ 2001-10-07 14:48     ` Gérard Roudier
  2001-10-08  2:31       ` Jim Crilly
  2001-10-09  2:21       ` Justin T. Gibbs
  0 siblings, 2 replies; 9+ messages in thread
From: Gérard Roudier @ 2001-10-07 14:48 UTC (permalink / raw)
  To: David M. Grimes; +Cc: Jim Crilly, Rob Turk, linux-kernel



On Sun, 7 Oct 2001, David M. Grimes wrote:

> On Sun, Oct 07, 2001 at 07:28:57AM -0400, Jim Crilly wrote:
> > Both disks on the controller are Seagate Cheetahs, the one being worked
> > during the panic is a ST39204LW, the other disk is a ST318451LW.
>
> I've seen this on a 2-disk system (both Seagate ST150176LW) on a
> VA-Systems onboard AIC 7xxx.  I enabled TCQ, and noticed the default
> depth increased sometime around 2.4.10, not exactly sure when (it used
> to be 8, now much higher).  I've seen it on both disks.
>
> In drivers/scsi/aic7xxx/aic7xxx_osm.h is the #define for NSEG, and I
> changed it from 128 to 512, and it stopped the problem.  Question is,
> why was the TCQ depth increased, and should NSEG have been upped with
> it?

The default TCQ depth was 8 in Doug Ledford's aic7xxx driver but 253
in Justin Gibbs' aic7xxx driver. From the driver developers' point of
view, the TCQ depth hasn't changed. :-)

The maximum number of DMA segments and the TCQ depth are totally unrelated
items. That your guessed workaround helps may just indicate that their
interaction triggers some software bug. Larger TCQ depths put more pressure
on memory and disk I/O, so more memory is locked for pending I/O and memory
fragmentation becomes more likely.
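
To make the two effects in the paragraph above concrete, here is a minimal
sketch in Python (a simulation, not driver code). It assumes 4 KiB pages,
takes the 128-segment limit and the queue depths of 8 and 253 from this
thread, and uses hypothetical helper names (count_segments,
worst_case_locked_bytes). It counts how many DMA segments a buffer needs
when physically adjacent pages are merged, and gives an upper bound on the
memory pinned by queued commands if every command used its full segment
budget with one page per segment.

```python
PAGE_SIZE = 4096      # assumed page size in bytes
EXAMPLE_NSEG = 128    # the AHC_NSEG value quoted in this thread


def count_segments(page_addrs):
    """Count DMA segments for a buffer given as physical page addresses.

    Physically adjacent pages are merged into one segment; every gap
    between consecutive pages starts a new segment.
    """
    segments = 0
    prev = None
    for addr in page_addrs:
        if prev is None or addr != prev + PAGE_SIZE:
            segments += 1
        prev = addr
    return segments


def worst_case_locked_bytes(queue_depth, max_segments):
    """Upper bound on pinned memory: every queued command uses all of its
    segments, and each segment is a single page (an assumption)."""
    return queue_depth * max_segments * PAGE_SIZE


# The same 1 MiB buffer (256 pages), once contiguous, once on every other page.
contiguous = [0x100000 + i * PAGE_SIZE for i in range(256)]
fragmented = [0x100000 + i * 2 * PAGE_SIZE for i in range(256)]

for name, pages in (("contiguous", contiguous), ("fragmented", fragmented)):
    segs = count_segments(pages)
    verdict = "fits" if segs <= EXAMPLE_NSEG else "exceeds the limit"
    print(f"{name}: {segs} segments, {verdict}")

for depth in (8, 253):
    mib = worst_case_locked_bytes(depth, EXAMPLE_NSEG) / (1024 * 1024)
    print(f"depth {depth}: up to {mib:.1f} MiB pinned per device")
```

Under these assumptions the contiguous buffer needs 1 segment and the
fragmented one 256, while the pinned-memory bound grows from 4 MiB at depth
8 to 126.5 MiB at depth 253. Real block and SCSI layers apply further rules
(maximum segment size, boundary constraints), so the numbers only
illustrate the trend.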

> > I did have TCQ enabled and I left it at the default of 255, I'll try a
> > lower value tomorrow, since it's so late.
>
> This also fixed my problem, I left NSEG at 128 and lowered the TCQ depth
> back to 8.  This worked fine as well.
>
> I'll be interested to see what the eventual outcome of this is, so I can
> apply the "right" fix!

The right fix might well not be in the driver code. Btw, I do not plan
to look into the problem, as my guess is that Justin is already studying
it. I just wanted to suggest also looking into the upper layers rather
than focusing only on the low-level driver.

  Gérard.


* Re: AIC7xxx panic
  2001-10-07 14:48     ` Gérard Roudier
@ 2001-10-08  2:31       ` Jim Crilly
  2001-10-09  0:51         ` David M. Grimes
  2001-10-09  2:21       ` Justin T. Gibbs
  1 sibling, 1 reply; 9+ messages in thread
From: Jim Crilly @ 2001-10-08  2:31 UTC (permalink / raw)
  To: linux-kernel

I changed AHC_NSEG from 128 to 512 and as expected the panic went away,
but does this mean the default should be higher in the kernel or is
there a real bug here? The main reason I wonder is because it ran fine
on disk 0 but panic'd on disk 1.


* Re: AIC7xxx panic
  2001-10-08  2:31       ` Jim Crilly
@ 2001-10-09  0:51         ` David M. Grimes
  2001-10-09 11:47           ` Alan Cox
  0 siblings, 1 reply; 9+ messages in thread
From: David M. Grimes @ 2001-10-09  0:51 UTC (permalink / raw)
  To: Jim Crilly; +Cc: linux-kernel

On Sun, Oct 07, 2001 at 10:31:48PM -0400, Jim Crilly wrote:
> I changed AHC_NSEG from 128 to 512 and as expected the panic went away,
> but does this mean the default should be higher in the kernel or is
> there a real bug here? The main reason I wonder is because it ran fine
> on disk 0 but panic'd on disk 1.

Perhaps this is related (from 2.4.10-acX thread later on l-k):

--------
From linux-kernel-owner@vger.kernel.org Mon Oct  8 18:34:32 2001
Subject: Re: linux-2.4.10-acX
To: mfedyk@matchmail.com (Mike Fedyk)
From: Alan Cox <alan@lxorguk.ukuu.org.uk>

> > -   Elevator flow control
>
> Where can I find more information on this?

Read the ll_rw_blk diff. Basically it tries to avoid too many locked
buffers clogging up memory and killing the box. I'm not totally sure it's
the right approach.
--------

Were there recent changes in ll_rw_blk which are being addressed by
"Elevator flow control"?  As suggested earlier in this thread, the cause
might be a few layers up, and this seemed relevant.

Can anyone confirm or shed any additional light on this?

  Dave



* Re: AIC7xxx panic 
  2001-10-07 14:48     ` Gérard Roudier
  2001-10-08  2:31       ` Jim Crilly
@ 2001-10-09  2:21       ` Justin T. Gibbs
  1 sibling, 0 replies; 9+ messages in thread
From: Justin T. Gibbs @ 2001-10-09  2:21 UTC (permalink / raw)
  To: Gérard Roudier; +Cc: David M. Grimes, Jim Crilly, Rob Turk, linux-kernel

>The right fix might well not be in the driver code. Btw, I do not plan
>to look into the problem, as my guess is that Justin is already studying
>it. I just wanted to suggest also looking into the upper layers rather
>than focusing only on the low-level driver.

I can't really speak to what is an acceptable number of segments
for Linux (I just copied what the old driver did), but the aic7xxx
driver does export its current limit to upper layers and that limit
should be honored.
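
As an illustration of what honoring an exported limit means, the following
Python sketch (a hypothetical model, not the Linux block layer or the
aic7xxx code) splits a scatter/gather list so that no request handed to the
driver exceeds the advertised segment count. The names split_request and
ADVERTISED_LIMIT are invented for the example; 128 is the AHC_NSEG value
mentioned earlier in the thread.

```python
def split_request(segments, max_segments):
    """Split a scatter/gather list into requests of at most max_segments
    entries each, preserving the original order."""
    if max_segments < 1:
        raise ValueError("max_segments must be at least 1")
    return [segments[i:i + max_segments]
            for i in range(0, len(segments), max_segments)]


ADVERTISED_LIMIT = 128  # limit the low-level driver is assumed to export

# 300 non-adjacent (address, length) segments, so none of them can be merged.
segments = [(0x100000 + i * 8192, 4096) for i in range(300)]

requests = split_request(segments, ADVERTISED_LIMIT)
print([len(r) for r in requests])  # prints [128, 128, 44]
```

In this model the panic discussed in the thread corresponds to a single
request arriving with more entries than ADVERTISED_LIMIT, which would point
at the layer building the request rather than at the value of the limit.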

--
Justin


* Re: AIC7xxx panic
  2001-10-09  0:51         ` David M. Grimes
@ 2001-10-09 11:47           ` Alan Cox
  0 siblings, 0 replies; 9+ messages in thread
From: Alan Cox @ 2001-10-09 11:47 UTC (permalink / raw)
  To: David M. Grimes; +Cc: Jim Crilly, linux-kernel

> Were there recent changes in ll_rw_blk which are being addressed by
> "Elevator flow control"?  As suggested earlier in this thread, the cause
> might be a few layers up, and this seemed relevant.

Unrelated, I suspect. All it means is that in some cases -ac will have fewer
segments queued before blocking. The max sectors per I/O and max segments
per I/O are controlled by the drivers.
