LKML Archive on lore.kernel.org help / color / mirror / Atom feed
* 2.6.20-rc4-mm1 md problem @ 2007-01-12 13:33 Michal Piotrowski 2007-01-12 15:52 ` Rafael J. Wysocki 2007-01-15 7:17 ` Ingo Molnar 0 siblings, 2 replies; 12+ messages in thread From: Michal Piotrowski @ 2007-01-12 13:33 UTC (permalink / raw) To: Andrew Morton; +Cc: Ingo Molnar, Neil Brown, LKML My system hangs on this http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/bug2.jpg http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/mm-config Debug plan: - revert md-* patches - binary search Does someone have a better idea? Regards, Michal -- Michal K. K. Piotrowski LTG - Linux Testers Group (http://www.stardust.webpages.pl/ltg/) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.20-rc4-mm1 md problem 2007-01-12 13:33 2.6.20-rc4-mm1 md problem Michal Piotrowski @ 2007-01-12 15:52 ` Rafael J. Wysocki 2007-01-12 17:40 ` Michal Piotrowski 2007-01-15 7:17 ` Ingo Molnar 1 sibling, 1 reply; 12+ messages in thread From: Rafael J. Wysocki @ 2007-01-12 15:52 UTC (permalink / raw) To: Michal Piotrowski; +Cc: Andrew Morton, Ingo Molnar, Neil Brown, LKML On Friday, 12 January 2007 14:33, Michal Piotrowski wrote: > My system hangs on this > http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/bug2.jpg > http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/mm-config > > Debug plan: > - revert md-* patches > - binary search > > Does someone have a better idea? Revert git-block.patch and related stuff? Greetings, Rafael -- If you don't have the time to read, you don't have the time or the tools to write. - Stephen King ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.20-rc4-mm1 md problem 2007-01-12 15:52 ` Rafael J. Wysocki @ 2007-01-12 17:40 ` Michal Piotrowski 2007-01-12 17:58 ` Michal Piotrowski 0 siblings, 1 reply; 12+ messages in thread From: Michal Piotrowski @ 2007-01-12 17:40 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Andrew Morton, Ingo Molnar, Neil Brown, LKML, Jens Axboe On 12/01/07, Rafael J. Wysocki <rjw@sisk.pl> wrote: > On Friday, 12 January 2007 14:33, Michal Piotrowski wrote: > > My system hangs on this > > http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/bug2.jpg > > http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/mm-config > > > > Debug plan: > > - revert md-* patches > > - binary search > > > > Does someone have a better idea? > > Revert git-block.patch and related stuff? Indeed. GOOD # ##git-sym2.patch #git-scsi-target.patch #git-scsi-target-fixup.patch # git-block.patch git-block-fixup.patch BAD git-block.patch it's huge patch. diffstat git-block.patch [..] drivers/md/bitmap.c | 1 drivers/md/dm-emc.c | 2 drivers/md/dm-table.c | 14 - drivers/md/dm.c | 18 - drivers/md/dm.h | 1 drivers/md/linear.c | 14 - drivers/md/md.c | 3 drivers/md/multipath.c | 32 -- drivers/md/raid0.c | 17 - drivers/md/raid1.c | 70 ----- drivers/md/raid10.c | 73 ------ drivers/md/raid5.c | 60 ---- [..] I'll do a binary search through those files. > > Greetings, > Rafael Regards, Michal -- Michal K. K. Piotrowski LTG - Linux Testers Group (http://www.stardust.webpages.pl/ltg/) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.20-rc4-mm1 md problem 2007-01-12 17:40 ` Michal Piotrowski @ 2007-01-12 17:58 ` Michal Piotrowski 2007-01-14 21:59 ` Jens Axboe 0 siblings, 1 reply; 12+ messages in thread From: Michal Piotrowski @ 2007-01-12 17:58 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Andrew Morton, Ingo Molnar, Neil Brown, LKML, Jens Axboe, Jens Axboe On 12/01/07, Michal Piotrowski <michal.k.k.piotrowski@gmail.com> wrote: > On 12/01/07, Rafael J. Wysocki <rjw@sisk.pl> wrote: > > On Friday, 12 January 2007 14:33, Michal Piotrowski wrote: > > > My system hangs on this > > > http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/bug2.jpg > > > http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/mm-config > > > > > > Debug plan: > > > - revert md-* patches > > > - binary search > > > > > > Does someone have a better idea? > > > > Revert git-block.patch and related stuff? > > Indeed. > > GOOD > # > ##git-sym2.patch > #git-scsi-target.patch > #git-scsi-target-fixup.patch > # > git-block.patch > git-block-fixup.patch > BAD > > git-block.patch it's huge patch. > > diffstat git-block.patch > [..] > drivers/md/bitmap.c | 1 > drivers/md/dm-emc.c | 2 > drivers/md/dm-table.c | 14 - > drivers/md/dm.c | 18 - > drivers/md/dm.h | 1 > drivers/md/linear.c | 14 - > drivers/md/md.c | 3 > drivers/md/multipath.c | 32 -- > drivers/md/raid0.c | 17 - > drivers/md/raid1.c | 70 ----- > drivers/md/raid10.c | 73 ------ > drivers/md/raid5.c | 60 ---- > [..] > > I'll do a binary search through those files. > That wasn't a good idea. /mnt/md0/devel/linux-mm/drivers/md/raid1.c: In function 'unplug_slaves': /mnt/md0/devel/linux-mm/drivers/md/raid1.c:556: error: 'request_queue_t' has no member named 'unplug_fn' Regards, Michal -- Michal K. K. Piotrowski LTG - Linux Testers Group (http://www.stardust.webpages.pl/ltg/) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.20-rc4-mm1 md problem 2007-01-12 17:58 ` Michal Piotrowski @ 2007-01-14 21:59 ` Jens Axboe 0 siblings, 0 replies; 12+ messages in thread From: Jens Axboe @ 2007-01-14 21:59 UTC (permalink / raw) To: Michal Piotrowski Cc: Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Neil Brown, LKML On Fri, Jan 12 2007, Michal Piotrowski wrote: > On 12/01/07, Michal Piotrowski <michal.k.k.piotrowski@gmail.com> wrote: > >On 12/01/07, Rafael J. Wysocki <rjw@sisk.pl> wrote: > >> On Friday, 12 January 2007 14:33, Michal Piotrowski wrote: > >> > My system hangs on this > >> > > >http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/bug2.jpg > >> > > >http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/mm-config > >> > > >> > Debug plan: > >> > - revert md-* patches > >> > - binary search > >> > > >> > Does someone have a better idea? > >> > >> Revert git-block.patch and related stuff? > > > >Indeed. > > > >GOOD > ># > >##git-sym2.patch > >#git-scsi-target.patch > >#git-scsi-target-fixup.patch > ># > >git-block.patch > >git-block-fixup.patch > >BAD > > > >git-block.patch it's huge patch. > > > >diffstat git-block.patch > >[..] > >drivers/md/bitmap.c | 1 > > drivers/md/dm-emc.c | 2 > > drivers/md/dm-table.c | 14 - > > drivers/md/dm.c | 18 - > > drivers/md/dm.h | 1 > > drivers/md/linear.c | 14 - > > drivers/md/md.c | 3 > > drivers/md/multipath.c | 32 -- > > drivers/md/raid0.c | 17 - > > drivers/md/raid1.c | 70 ----- > > drivers/md/raid10.c | 73 ------ > > drivers/md/raid5.c | 60 ---- > >[..] > > > >I'll do a binary search through those files. > > > > That wasn't a good idea. > > /mnt/md0/devel/linux-mm/drivers/md/raid1.c: In function 'unplug_slaves': > /mnt/md0/devel/linux-mm/drivers/md/raid1.c:556: error: > 'request_queue_t' has no member named 'unplug_fn' You can't go throught he individual files and expect that to compile, yet alone work. Fortunately I think I already fixed this last week, unfortunately I think I forgot to push the #plug branch for #for-akpm so that Andrew automatically picks it up... -- Jens Axboe ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.20-rc4-mm1 md problem 2007-01-12 13:33 2.6.20-rc4-mm1 md problem Michal Piotrowski 2007-01-12 15:52 ` Rafael J. Wysocki @ 2007-01-15 7:17 ` Ingo Molnar 2007-01-15 8:08 ` Thomas Gleixner 1 sibling, 1 reply; 12+ messages in thread From: Ingo Molnar @ 2007-01-15 7:17 UTC (permalink / raw) To: Michal Piotrowski Cc: Andrew Morton, Neil Brown, LKML, Thomas Gleixner, Jens Axboe * Michal Piotrowski <michal.k.k.piotrowski@gmail.com> wrote: > My system hangs on this > http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/bug2.jpg > http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/mm-config > > Debug plan: > - revert md-* patches > - binary search > > Does someone have a better idea? Thomas saw something similar yesterday and he the partial results that git.block (between rc2-mm1 and rc4-mm1) breaks certain disk drivers or filesystems drivers. For me it worked fine, so it must be only on some combinations. The changes to ll_rw_block.c look quite extensive. Ingo ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.20-rc4-mm1 md problem 2007-01-15 7:17 ` Ingo Molnar @ 2007-01-15 8:08 ` Thomas Gleixner 2007-01-15 18:03 ` [patch-mm] Workaround for RAID breakage Thomas Gleixner 0 siblings, 1 reply; 12+ messages in thread From: Thomas Gleixner @ 2007-01-15 8:08 UTC (permalink / raw) To: Ingo Molnar Cc: Michal Piotrowski, Andrew Morton, Neil Brown, LKML, Jens Axboe On Mon, 2007-01-15 at 08:17 +0100, Ingo Molnar wrote: > > Debug plan: > > - revert md-* patches > > - binary search > > > > Does someone have a better idea? > > Thomas saw something similar yesterday and he the partial results that > git.block (between rc2-mm1 and rc4-mm1) breaks certain disk drivers or > filesystems drivers. For me it worked fine, so it must be only on some > combinations. The changes to ll_rw_block.c look quite extensive. Yes. Jens Axboe confirmed yesterday that the plug changes broke RAID. tglx ^ permalink raw reply [flat|nested] 12+ messages in thread
* [patch-mm] Workaround for RAID breakage 2007-01-15 8:08 ` Thomas Gleixner @ 2007-01-15 18:03 ` Thomas Gleixner 2007-01-15 20:17 ` Michal Piotrowski 2007-01-16 0:41 ` Jens Axboe 0 siblings, 2 replies; 12+ messages in thread From: Thomas Gleixner @ 2007-01-15 18:03 UTC (permalink / raw) To: Ingo Molnar Cc: Michal Piotrowski, Andrew Morton, Neil Brown, LKML, Jens Axboe On Mon, 2007-01-15 at 09:08 +0100, Thomas Gleixner wrote: > > Thomas saw something similar yesterday and he the partial results that > > git.block (between rc2-mm1 and rc4-mm1) breaks certain disk drivers or > > filesystems drivers. For me it worked fine, so it must be only on some > > combinations. The changes to ll_rw_block.c look quite extensive. > > Yes. Jens Axboe confirmed yesterday that the plug changes broke RAID. I tracked this down and found two problems: - The new plug/unplug code does not check for underruns. That allows the plug count (ioc->plugged) to become negative. This gets triggered from various places. AFAICS this is intentional to avoid checks all over the place, but the underflow check is missing. All we need to do is make sure, that in case of ioc->plugged == 0 we return early and bug, if there is either a queue plugged in or the plugged_list is not empty. Jens ? - The raid1 code has no bitmap set in remount r/w. So the pending_bio_list gets not processed for quite a time. The workaround is to kick mddev->thread, so the list is processed. Not sure about that. Neil ? At least it boots and behaves normal. tglx Index: linux-2.6.20-rc4-mm1/block/ll_rw_blk.c =================================================================== --- linux-2.6.20-rc4-mm1.orig/block/ll_rw_blk.c +++ linux-2.6.20-rc4-mm1/block/ll_rw_blk.c @@ -3757,6 +3757,12 @@ void blk_unplug_current(void) if (!ioc) return; + if (!ioc->plugged) { + BUG_ON(!list_empty(&ioc->plugged_list)); + BUG_ON(ioc->plugged_queue); + return; + } + ioc->plugged--; if (ioc->plugged) return; Index: linux-2.6.20-rc4-mm1/drivers/md/raid1.c =================================================================== --- linux-2.6.20-rc4-mm1.orig/drivers/md/raid1.c +++ linux-2.6.20-rc4-mm1/drivers/md/raid1.c @@ -897,7 +897,7 @@ static int make_request(request_queue_t spin_unlock_irqrestore(&conf->device_lock, flags); - if (do_sync) + if (do_sync || !bitmap) md_wakeup_thread(mddev->thread); #if 0 while ((bio = bio_list_pop(&bl)) != NULL) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch-mm] Workaround for RAID breakage 2007-01-15 18:03 ` [patch-mm] Workaround for RAID breakage Thomas Gleixner @ 2007-01-15 20:17 ` Michal Piotrowski 2007-01-16 0:41 ` Jens Axboe 1 sibling, 0 replies; 12+ messages in thread From: Michal Piotrowski @ 2007-01-15 20:17 UTC (permalink / raw) To: tglx; +Cc: Ingo Molnar, Andrew Morton, Neil Brown, LKML, Jens Axboe On 15/01/07, Thomas Gleixner <tglx@linutronix.de> wrote: > On Mon, 2007-01-15 at 09:08 +0100, Thomas Gleixner wrote: > > > Thomas saw something similar yesterday and he the partial results that > > > git.block (between rc2-mm1 and rc4-mm1) breaks certain disk drivers or > > > filesystems drivers. For me it worked fine, so it must be only on some > > > combinations. The changes to ll_rw_block.c look quite extensive. > > > > Yes. Jens Axboe confirmed yesterday that the plug changes broke RAID. > > I tracked this down and found two problems: > > - The new plug/unplug code does not check for underruns. That allows the > plug count (ioc->plugged) to become negative. This gets triggered from > various places. > > AFAICS this is intentional to avoid checks all over the place, but the > underflow check is missing. All we need to do is make sure, that in case > of ioc->plugged == 0 we return early and bug, if there is either a queue > plugged in or the plugged_list is not empty. > > Jens ? > > - The raid1 code has no bitmap set in remount r/w. So the > pending_bio_list gets not processed for quite a time. The workaround is > to kick mddev->thread, so the list is processed. Not sure about that. > > Neil ? > > At least it boots and behaves normal. Yes, it works fine now. Thanks! > tglx Regards, Michal -- Michal K. K. Piotrowski LTG - Linux Testers Group (http://www.stardust.webpages.pl/ltg/) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch-mm] Workaround for RAID breakage 2007-01-15 18:03 ` [patch-mm] Workaround for RAID breakage Thomas Gleixner 2007-01-15 20:17 ` Michal Piotrowski @ 2007-01-16 0:41 ` Jens Axboe 2007-01-16 8:27 ` Thomas Gleixner 1 sibling, 1 reply; 12+ messages in thread From: Jens Axboe @ 2007-01-16 0:41 UTC (permalink / raw) To: Thomas Gleixner Cc: Ingo Molnar, Michal Piotrowski, Andrew Morton, Neil Brown, LKML On Mon, Jan 15 2007, Thomas Gleixner wrote: > On Mon, 2007-01-15 at 09:08 +0100, Thomas Gleixner wrote: > > > Thomas saw something similar yesterday and he the partial results that > > > git.block (between rc2-mm1 and rc4-mm1) breaks certain disk drivers or > > > filesystems drivers. For me it worked fine, so it must be only on some > > > combinations. The changes to ll_rw_block.c look quite extensive. > > > > Yes. Jens Axboe confirmed yesterday that the plug changes broke RAID. > > I tracked this down and found two problems: > > - The new plug/unplug code does not check for underruns. That allows the > plug count (ioc->plugged) to become negative. This gets triggered from > various places. > > AFAICS this is intentional to avoid checks all over the place, but the > underflow check is missing. All we need to do is make sure, that in case > of ioc->plugged == 0 we return early and bug, if there is either a queue > plugged in or the plugged_list is not empty. > > Jens ? It should not go negative, that would be a bug elsewhere. So it's interesting if it does, we should definitely put a WARN_ON() check in there for that. > - The raid1 code has no bitmap set in remount r/w. So the > pending_bio_list gets not processed for quite a time. The workaround is > to kick mddev->thread, so the list is processed. Not sure about that. > > Neil ? Super, thanks for that Thomas! I'll merge it in the plug branch. -- Jens Axboe ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch-mm] Workaround for RAID breakage 2007-01-16 0:41 ` Jens Axboe @ 2007-01-16 8:27 ` Thomas Gleixner 2007-01-16 10:07 ` Jens Axboe 0 siblings, 1 reply; 12+ messages in thread From: Thomas Gleixner @ 2007-01-16 8:27 UTC (permalink / raw) To: Jens Axboe Cc: Ingo Molnar, Michal Piotrowski, Andrew Morton, Neil Brown, LKML On Tue, 2007-01-16 at 11:41 +1100, Jens Axboe wrote: > > AFAICS this is intentional to avoid checks all over the place, but the > > underflow check is missing. All we need to do is make sure, that in case > > of ioc->plugged == 0 we return early and bug, if there is either a queue > > plugged in or the plugged_list is not empty. > > > > Jens ? > > It should not go negative, that would be a bug elsewhere. So it's > interesting if it does, we should definitely put a WARN_ON() check in > there for that. Well. All offenders come via queue_sync_plugs(). queue_sync_plugs() calls blk_unplug_current(). One path which triggers is blk_sync_queue(). This happens e.g. in the cleanup of the floppy check. There are other call pathes too. The other is raid md_super_write(). It is not plugged and calls with the barrier bit set, which executes the unlikely path in __make_request(): if (unlikely(bio_barrier(bio))) { queue_sync_plugs(q); spin_lock_irq(q->queue_lock); goto get_rq; } So you either need checks all over the place to avoid calling blk_unplug_current(), or you prevent the unplug below 0 like I did. The BUG_ON()s I added should catch any real invalid callers. But it's up to you. tglx ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch-mm] Workaround for RAID breakage 2007-01-16 8:27 ` Thomas Gleixner @ 2007-01-16 10:07 ` Jens Axboe 0 siblings, 0 replies; 12+ messages in thread From: Jens Axboe @ 2007-01-16 10:07 UTC (permalink / raw) To: Thomas Gleixner Cc: Ingo Molnar, Michal Piotrowski, Andrew Morton, Neil Brown, LKML On Tue, Jan 16 2007, Thomas Gleixner wrote: > On Tue, 2007-01-16 at 11:41 +1100, Jens Axboe wrote: > > > AFAICS this is intentional to avoid checks all over the place, but the > > > underflow check is missing. All we need to do is make sure, that in case > > > of ioc->plugged == 0 we return early and bug, if there is either a queue > > > plugged in or the plugged_list is not empty. > > > > > > Jens ? > > > > It should not go negative, that would be a bug elsewhere. So it's > > interesting if it does, we should definitely put a WARN_ON() check in > > there for that. > > Well. All offenders come via queue_sync_plugs(). queue_sync_plugs() > calls blk_unplug_current(). Ah, again a problem because the #plug branch wasn't pushed to #for-akpm in due time. That problem was fixed already :/ > One path which triggers is blk_sync_queue(). This happens e.g. in the > cleanup of the floppy check. There are other call pathes too. > > The other is raid md_super_write(). It is not plugged and calls with the > barrier bit set, which executes the unlikely path in __make_request(): > > if (unlikely(bio_barrier(bio))) { > queue_sync_plugs(q); > spin_lock_irq(q->queue_lock); > goto get_rq; > } > > So you either need checks all over the place to avoid calling > blk_unplug_current(), or you prevent the unplug below 0 like I did. The > BUG_ON()s I added should catch any real invalid callers. But it's up to > you. blk_replug_current_nested() should do it. I'll make sure the branch is updated for next -mm. The BUG_ON()'s are indeed correct, they should just be moved further down the function. -- Jens Axboe ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2007-01-16 10:08 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2007-01-12 13:33 2.6.20-rc4-mm1 md problem Michal Piotrowski 2007-01-12 15:52 ` Rafael J. Wysocki 2007-01-12 17:40 ` Michal Piotrowski 2007-01-12 17:58 ` Michal Piotrowski 2007-01-14 21:59 ` Jens Axboe 2007-01-15 7:17 ` Ingo Molnar 2007-01-15 8:08 ` Thomas Gleixner 2007-01-15 18:03 ` [patch-mm] Workaround for RAID breakage Thomas Gleixner 2007-01-15 20:17 ` Michal Piotrowski 2007-01-16 0:41 ` Jens Axboe 2007-01-16 8:27 ` Thomas Gleixner 2007-01-16 10:07 ` Jens Axboe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).