LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* 2.6.20-rc4-mm1 md problem
@ 2007-01-12 13:33 Michal Piotrowski
  2007-01-12 15:52 ` Rafael J. Wysocki
  2007-01-15  7:17 ` Ingo Molnar
  0 siblings, 2 replies; 12+ messages in thread
From: Michal Piotrowski @ 2007-01-12 13:33 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, Neil Brown, LKML

My system hangs on this
http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/bug2.jpg
http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/mm-config

Debug plan:
- revert md-* patches
- binary search

Does someone have a better idea?

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.20-rc4-mm1 md problem
  2007-01-12 13:33 2.6.20-rc4-mm1 md problem Michal Piotrowski
@ 2007-01-12 15:52 ` Rafael J. Wysocki
  2007-01-12 17:40   ` Michal Piotrowski
  2007-01-15  7:17 ` Ingo Molnar
  1 sibling, 1 reply; 12+ messages in thread
From: Rafael J. Wysocki @ 2007-01-12 15:52 UTC (permalink / raw)
  To: Michal Piotrowski; +Cc: Andrew Morton, Ingo Molnar, Neil Brown, LKML

On Friday, 12 January 2007 14:33, Michal Piotrowski wrote:
> My system hangs on this
> http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/bug2.jpg
> http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/mm-config
> 
> Debug plan:
> - revert md-* patches
> - binary search
> 
> Does someone have a better idea?

Revert git-block.patch and related stuff?

Greetings,
Rafael


-- 
If you don't have the time to read,
you don't have the time or the tools to write.
		- Stephen King

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.20-rc4-mm1 md problem
  2007-01-12 15:52 ` Rafael J. Wysocki
@ 2007-01-12 17:40   ` Michal Piotrowski
  2007-01-12 17:58     ` Michal Piotrowski
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Piotrowski @ 2007-01-12 17:40 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andrew Morton, Ingo Molnar, Neil Brown, LKML, Jens Axboe

On 12/01/07, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Friday, 12 January 2007 14:33, Michal Piotrowski wrote:
> > My system hangs on this
> > http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/bug2.jpg
> > http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/mm-config
> >
> > Debug plan:
> > - revert md-* patches
> > - binary search
> >
> > Does someone have a better idea?
>
> Revert git-block.patch and related stuff?

Indeed.

GOOD
#
##git-sym2.patch
#git-scsi-target.patch
#git-scsi-target-fixup.patch
#
git-block.patch
git-block-fixup.patch
BAD

git-block.patch it's huge patch.

diffstat git-block.patch
[..]
drivers/md/bitmap.c             |    1
 drivers/md/dm-emc.c             |    2
 drivers/md/dm-table.c           |   14 -
 drivers/md/dm.c                 |   18 -
 drivers/md/dm.h                 |    1
 drivers/md/linear.c             |   14 -
 drivers/md/md.c                 |    3
 drivers/md/multipath.c          |   32 --
 drivers/md/raid0.c              |   17 -
 drivers/md/raid1.c              |   70 -----
 drivers/md/raid10.c             |   73 ------
 drivers/md/raid5.c              |   60 ----
[..]

I'll do a binary search through those files.

>
> Greetings,
> Rafael

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.20-rc4-mm1 md problem
  2007-01-12 17:40   ` Michal Piotrowski
@ 2007-01-12 17:58     ` Michal Piotrowski
  2007-01-14 21:59       ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Piotrowski @ 2007-01-12 17:58 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andrew Morton, Ingo Molnar, Neil Brown, LKML, Jens Axboe, Jens Axboe

On 12/01/07, Michal Piotrowski <michal.k.k.piotrowski@gmail.com> wrote:
> On 12/01/07, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Friday, 12 January 2007 14:33, Michal Piotrowski wrote:
> > > My system hangs on this
> > > http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/bug2.jpg
> > > http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/mm-config
> > >
> > > Debug plan:
> > > - revert md-* patches
> > > - binary search
> > >
> > > Does someone have a better idea?
> >
> > Revert git-block.patch and related stuff?
>
> Indeed.
>
> GOOD
> #
> ##git-sym2.patch
> #git-scsi-target.patch
> #git-scsi-target-fixup.patch
> #
> git-block.patch
> git-block-fixup.patch
> BAD
>
> git-block.patch it's huge patch.
>
> diffstat git-block.patch
> [..]
> drivers/md/bitmap.c             |    1
>  drivers/md/dm-emc.c             |    2
>  drivers/md/dm-table.c           |   14 -
>  drivers/md/dm.c                 |   18 -
>  drivers/md/dm.h                 |    1
>  drivers/md/linear.c             |   14 -
>  drivers/md/md.c                 |    3
>  drivers/md/multipath.c          |   32 --
>  drivers/md/raid0.c              |   17 -
>  drivers/md/raid1.c              |   70 -----
>  drivers/md/raid10.c             |   73 ------
>  drivers/md/raid5.c              |   60 ----
> [..]
>
> I'll do a binary search through those files.
>

That wasn't a good idea.

/mnt/md0/devel/linux-mm/drivers/md/raid1.c: In function 'unplug_slaves':
/mnt/md0/devel/linux-mm/drivers/md/raid1.c:556: error:
'request_queue_t' has no member named 'unplug_fn'

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.20-rc4-mm1 md problem
  2007-01-12 17:58     ` Michal Piotrowski
@ 2007-01-14 21:59       ` Jens Axboe
  0 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2007-01-14 21:59 UTC (permalink / raw)
  To: Michal Piotrowski
  Cc: Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Neil Brown, LKML

On Fri, Jan 12 2007, Michal Piotrowski wrote:
> On 12/01/07, Michal Piotrowski <michal.k.k.piotrowski@gmail.com> wrote:
> >On 12/01/07, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >> On Friday, 12 January 2007 14:33, Michal Piotrowski wrote:
> >> > My system hangs on this
> >> > 
> >http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/bug2.jpg
> >> > 
> >http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/mm-config
> >> >
> >> > Debug plan:
> >> > - revert md-* patches
> >> > - binary search
> >> >
> >> > Does someone have a better idea?
> >>
> >> Revert git-block.patch and related stuff?
> >
> >Indeed.
> >
> >GOOD
> >#
> >##git-sym2.patch
> >#git-scsi-target.patch
> >#git-scsi-target-fixup.patch
> >#
> >git-block.patch
> >git-block-fixup.patch
> >BAD
> >
> >git-block.patch it's huge patch.
> >
> >diffstat git-block.patch
> >[..]
> >drivers/md/bitmap.c             |    1
> > drivers/md/dm-emc.c             |    2
> > drivers/md/dm-table.c           |   14 -
> > drivers/md/dm.c                 |   18 -
> > drivers/md/dm.h                 |    1
> > drivers/md/linear.c             |   14 -
> > drivers/md/md.c                 |    3
> > drivers/md/multipath.c          |   32 --
> > drivers/md/raid0.c              |   17 -
> > drivers/md/raid1.c              |   70 -----
> > drivers/md/raid10.c             |   73 ------
> > drivers/md/raid5.c              |   60 ----
> >[..]
> >
> >I'll do a binary search through those files.
> >
> 
> That wasn't a good idea.
> 
> /mnt/md0/devel/linux-mm/drivers/md/raid1.c: In function 'unplug_slaves':
> /mnt/md0/devel/linux-mm/drivers/md/raid1.c:556: error:
> 'request_queue_t' has no member named 'unplug_fn'

You can't go throught he individual files and expect that to compile,
yet alone work. Fortunately I think I already fixed this last week,
unfortunately I think I forgot to push the #plug branch for #for-akpm so
that Andrew automatically picks it up...

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.20-rc4-mm1 md problem
  2007-01-12 13:33 2.6.20-rc4-mm1 md problem Michal Piotrowski
  2007-01-12 15:52 ` Rafael J. Wysocki
@ 2007-01-15  7:17 ` Ingo Molnar
  2007-01-15  8:08   ` Thomas Gleixner
  1 sibling, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2007-01-15  7:17 UTC (permalink / raw)
  To: Michal Piotrowski
  Cc: Andrew Morton, Neil Brown, LKML, Thomas Gleixner, Jens Axboe


* Michal Piotrowski <michal.k.k.piotrowski@gmail.com> wrote:

> My system hangs on this
> http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/bug2.jpg
> http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/mm-config
> 
> Debug plan:
> - revert md-* patches
> - binary search
> 
> Does someone have a better idea?

Thomas saw something similar yesterday and he the partial results that 
git.block (between rc2-mm1 and rc4-mm1) breaks certain disk drivers or 
filesystems drivers. For me it worked fine, so it must be only on some 
combinations. The changes to ll_rw_block.c look quite extensive.

	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 2.6.20-rc4-mm1 md problem
  2007-01-15  7:17 ` Ingo Molnar
@ 2007-01-15  8:08   ` Thomas Gleixner
  2007-01-15 18:03     ` [patch-mm] Workaround for RAID breakage Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2007-01-15  8:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Michal Piotrowski, Andrew Morton, Neil Brown, LKML, Jens Axboe

On Mon, 2007-01-15 at 08:17 +0100, Ingo Molnar wrote:
> > Debug plan:
> > - revert md-* patches
> > - binary search
> > 
> > Does someone have a better idea?
> 
> Thomas saw something similar yesterday and he the partial results that 
> git.block (between rc2-mm1 and rc4-mm1) breaks certain disk drivers or 
> filesystems drivers. For me it worked fine, so it must be only on some 
> combinations. The changes to ll_rw_block.c look quite extensive.

Yes. Jens Axboe confirmed yesterday that the plug changes broke RAID.

	tglx
 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [patch-mm] Workaround for RAID breakage
  2007-01-15  8:08   ` Thomas Gleixner
@ 2007-01-15 18:03     ` Thomas Gleixner
  2007-01-15 20:17       ` Michal Piotrowski
  2007-01-16  0:41       ` Jens Axboe
  0 siblings, 2 replies; 12+ messages in thread
From: Thomas Gleixner @ 2007-01-15 18:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Michal Piotrowski, Andrew Morton, Neil Brown, LKML, Jens Axboe

On Mon, 2007-01-15 at 09:08 +0100, Thomas Gleixner wrote:
> > Thomas saw something similar yesterday and he the partial results that 
> > git.block (between rc2-mm1 and rc4-mm1) breaks certain disk drivers or 
> > filesystems drivers. For me it worked fine, so it must be only on some 
> > combinations. The changes to ll_rw_block.c look quite extensive.
> 
> Yes. Jens Axboe confirmed yesterday that the plug changes broke RAID.

I tracked this down and found two problems:

- The new plug/unplug code does not check for underruns. That allows the
plug count (ioc->plugged) to become negative. This gets triggered from
various places. 

AFAICS this is intentional to avoid checks all over the place, but the
underflow check is missing. All we need to do is make sure, that in case
of ioc->plugged == 0 we return early and bug, if there is either a queue
plugged in or the plugged_list is not empty.

Jens ?

- The raid1 code has no bitmap set in remount r/w. So the
pending_bio_list gets not processed for quite a time. The workaround is
to kick mddev->thread, so the list is processed. Not sure about that.

Neil ?

At least it boots and behaves normal.

	tglx


Index: linux-2.6.20-rc4-mm1/block/ll_rw_blk.c
===================================================================
--- linux-2.6.20-rc4-mm1.orig/block/ll_rw_blk.c
+++ linux-2.6.20-rc4-mm1/block/ll_rw_blk.c
@@ -3757,6 +3757,12 @@ void blk_unplug_current(void)
 	if (!ioc)
 		return;
 
+	if (!ioc->plugged) {
+		BUG_ON(!list_empty(&ioc->plugged_list));
+		BUG_ON(ioc->plugged_queue);
+		return;
+	}
+
 	ioc->plugged--;
 	if (ioc->plugged)
 		return;
Index: linux-2.6.20-rc4-mm1/drivers/md/raid1.c
===================================================================
--- linux-2.6.20-rc4-mm1.orig/drivers/md/raid1.c
+++ linux-2.6.20-rc4-mm1/drivers/md/raid1.c
@@ -897,7 +897,7 @@ static int make_request(request_queue_t 
 
 	spin_unlock_irqrestore(&conf->device_lock, flags);
 
-	if (do_sync)
+	if (do_sync || !bitmap)
 		md_wakeup_thread(mddev->thread);
 #if 0
 	while ((bio = bio_list_pop(&bl)) != NULL)




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch-mm] Workaround for RAID breakage
  2007-01-15 18:03     ` [patch-mm] Workaround for RAID breakage Thomas Gleixner
@ 2007-01-15 20:17       ` Michal Piotrowski
  2007-01-16  0:41       ` Jens Axboe
  1 sibling, 0 replies; 12+ messages in thread
From: Michal Piotrowski @ 2007-01-15 20:17 UTC (permalink / raw)
  To: tglx; +Cc: Ingo Molnar, Andrew Morton, Neil Brown, LKML, Jens Axboe

On 15/01/07, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Mon, 2007-01-15 at 09:08 +0100, Thomas Gleixner wrote:
> > > Thomas saw something similar yesterday and he the partial results that
> > > git.block (between rc2-mm1 and rc4-mm1) breaks certain disk drivers or
> > > filesystems drivers. For me it worked fine, so it must be only on some
> > > combinations. The changes to ll_rw_block.c look quite extensive.
> >
> > Yes. Jens Axboe confirmed yesterday that the plug changes broke RAID.
>
> I tracked this down and found two problems:
>
> - The new plug/unplug code does not check for underruns. That allows the
> plug count (ioc->plugged) to become negative. This gets triggered from
> various places.
>
> AFAICS this is intentional to avoid checks all over the place, but the
> underflow check is missing. All we need to do is make sure, that in case
> of ioc->plugged == 0 we return early and bug, if there is either a queue
> plugged in or the plugged_list is not empty.
>
> Jens ?
>
> - The raid1 code has no bitmap set in remount r/w. So the
> pending_bio_list gets not processed for quite a time. The workaround is
> to kick mddev->thread, so the list is processed. Not sure about that.
>
> Neil ?
>
> At least it boots and behaves normal.

Yes, it works fine now. Thanks!

>         tglx

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch-mm] Workaround for RAID breakage
  2007-01-15 18:03     ` [patch-mm] Workaround for RAID breakage Thomas Gleixner
  2007-01-15 20:17       ` Michal Piotrowski
@ 2007-01-16  0:41       ` Jens Axboe
  2007-01-16  8:27         ` Thomas Gleixner
  1 sibling, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2007-01-16  0:41 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Michal Piotrowski, Andrew Morton, Neil Brown, LKML

On Mon, Jan 15 2007, Thomas Gleixner wrote:
> On Mon, 2007-01-15 at 09:08 +0100, Thomas Gleixner wrote:
> > > Thomas saw something similar yesterday and he the partial results that 
> > > git.block (between rc2-mm1 and rc4-mm1) breaks certain disk drivers or 
> > > filesystems drivers. For me it worked fine, so it must be only on some 
> > > combinations. The changes to ll_rw_block.c look quite extensive.
> > 
> > Yes. Jens Axboe confirmed yesterday that the plug changes broke RAID.
> 
> I tracked this down and found two problems:
> 
> - The new plug/unplug code does not check for underruns. That allows the
> plug count (ioc->plugged) to become negative. This gets triggered from
> various places. 
>
> AFAICS this is intentional to avoid checks all over the place, but the
> underflow check is missing. All we need to do is make sure, that in case
> of ioc->plugged == 0 we return early and bug, if there is either a queue
> plugged in or the plugged_list is not empty.
> 
> Jens ?

It should not go negative, that would be a bug elsewhere. So it's
interesting if it does, we should definitely put a WARN_ON() check in
there for that.

> - The raid1 code has no bitmap set in remount r/w. So the
> pending_bio_list gets not processed for quite a time. The workaround is
> to kick mddev->thread, so the list is processed. Not sure about that.
> 
> Neil ?

Super, thanks for that Thomas! I'll merge it in the plug branch.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch-mm] Workaround for RAID breakage
  2007-01-16  0:41       ` Jens Axboe
@ 2007-01-16  8:27         ` Thomas Gleixner
  2007-01-16 10:07           ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2007-01-16  8:27 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ingo Molnar, Michal Piotrowski, Andrew Morton, Neil Brown, LKML

On Tue, 2007-01-16 at 11:41 +1100, Jens Axboe wrote:
> > AFAICS this is intentional to avoid checks all over the place, but the
> > underflow check is missing. All we need to do is make sure, that in case
> > of ioc->plugged == 0 we return early and bug, if there is either a queue
> > plugged in or the plugged_list is not empty.
> > 
> > Jens ?
> 
> It should not go negative, that would be a bug elsewhere. So it's
> interesting if it does, we should definitely put a WARN_ON() check in
> there for that.

Well. All offenders come via queue_sync_plugs(). queue_sync_plugs()
calls blk_unplug_current().

One path which triggers is blk_sync_queue(). This happens e.g. in the
cleanup of the floppy check. There are other call pathes too.

The other is raid md_super_write(). It is not plugged and calls with the
barrier bit set, which executes the unlikely path in __make_request():

        if (unlikely(bio_barrier(bio))) {
                queue_sync_plugs(q);
                spin_lock_irq(q->queue_lock);
                goto get_rq;
        }

So you either need checks all over the place to avoid calling
blk_unplug_current(), or you prevent the unplug below 0 like I did. The
BUG_ON()s I added should catch any real invalid callers. But it's up to
you.

	tglx



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [patch-mm] Workaround for RAID breakage
  2007-01-16  8:27         ` Thomas Gleixner
@ 2007-01-16 10:07           ` Jens Axboe
  0 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2007-01-16 10:07 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Michal Piotrowski, Andrew Morton, Neil Brown, LKML

On Tue, Jan 16 2007, Thomas Gleixner wrote:
> On Tue, 2007-01-16 at 11:41 +1100, Jens Axboe wrote:
> > > AFAICS this is intentional to avoid checks all over the place, but the
> > > underflow check is missing. All we need to do is make sure, that in case
> > > of ioc->plugged == 0 we return early and bug, if there is either a queue
> > > plugged in or the plugged_list is not empty.
> > > 
> > > Jens ?
> > 
> > It should not go negative, that would be a bug elsewhere. So it's
> > interesting if it does, we should definitely put a WARN_ON() check in
> > there for that.
> 
> Well. All offenders come via queue_sync_plugs(). queue_sync_plugs()
> calls blk_unplug_current().

Ah, again a problem because the #plug branch wasn't pushed to #for-akpm
in due time. That problem was fixed already :/

> One path which triggers is blk_sync_queue(). This happens e.g. in the
> cleanup of the floppy check. There are other call pathes too.
> 
> The other is raid md_super_write(). It is not plugged and calls with the
> barrier bit set, which executes the unlikely path in __make_request():
> 
>         if (unlikely(bio_barrier(bio))) {
>                 queue_sync_plugs(q);
>                 spin_lock_irq(q->queue_lock);
>                 goto get_rq;
>         }
> 
> So you either need checks all over the place to avoid calling
> blk_unplug_current(), or you prevent the unplug below 0 like I did. The
> BUG_ON()s I added should catch any real invalid callers. But it's up to
> you.

blk_replug_current_nested() should do it. I'll make sure the branch is
updated for next -mm.

The BUG_ON()'s are indeed correct, they should just be moved further
down the function.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2007-01-16 10:08 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-01-12 13:33 2.6.20-rc4-mm1 md problem Michal Piotrowski
2007-01-12 15:52 ` Rafael J. Wysocki
2007-01-12 17:40   ` Michal Piotrowski
2007-01-12 17:58     ` Michal Piotrowski
2007-01-14 21:59       ` Jens Axboe
2007-01-15  7:17 ` Ingo Molnar
2007-01-15  8:08   ` Thomas Gleixner
2007-01-15 18:03     ` [patch-mm] Workaround for RAID breakage Thomas Gleixner
2007-01-15 20:17       ` Michal Piotrowski
2007-01-16  0:41       ` Jens Axboe
2007-01-16  8:27         ` Thomas Gleixner
2007-01-16 10:07           ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).