LKML Archive on lore.kernel.org help / color / mirror / Atom feed
* [BUG?] GDTH driver not working after upgrade to 2.6.24 @ 2008-01-30 19:47 Sven Köhler 2008-01-31 10:08 ` Boaz Harrosh 0 siblings, 1 reply; 31+ messages in thread From: Sven Köhler @ 2008-01-30 19:47 UTC (permalink / raw) To: linux-kernel [-- Attachment #1: Type: text/plain, Size: 1805 bytes --] Hi, so i have upgraded a system to kernel 2.6.24. After that, it failed to boot with the usual message telling, that the rootfs on device /dev/sda1 cannot be mounted (a raid1 run by the controller below). With 2.6.23.12, everything is working fine. # lspci -v: 03:01.0 RAID bus controller: Intel Corporation RAID Controller Subsystem: Intel Corporation Unknown device 01db Flags: bus master, 66MHz, slow devsel, latency 64, IRQ 17 Memory at ddffc000 (32-bit, prefetchable) [size=16K] [virtual] Expansion ROM at deef0000 [disabled] [size=32K] Capabilities: [80] Power Management version 2 # GDT-related dmesg output (2.6.23.12): GDT-HA: Storage RAID Controller Driver. Version: 3.05 ACPI: PCI Interrupt 0000:03:01.0[A] -> GSI 24 (level, low) -> IRQ 17 GDT-HA: Found 1 PCI Storage RAID Controllers Configuring GDT-PCI HA at 3/1 IRQ 17 GDT-HA 0: Name: SRCU42L scsi0 : SRCU42L scsi 0:0:0:0: Direct-Access Intel Host Drive #00 PQ: 0 ANSI: 2 scsi 0:2:6:0: Processor ESG-SHV SCA HSBP M29 1.06 PQ: 0 ANSI: 2 sd 0:0:0:0: [sda] 143299800 512-byte hardware sectors (73369 MB) sd 0:0:0:0: [sda] Assuming Write Enabled sd 0:0:0:0: [sda] Assuming drive cache: write through sd 0:0:0:0: [sda] 143299800 512-byte hardware sectors (73369 MB) sd 0:0:0:0: [sda] Assuming Write Enabled sd 0:0:0:0: [sda] Assuming drive cache: write through sda: sda1 sda2 < sda5 > sd 0:0:0:0: [sda] Attached SCSI disk # cat /boot/config-2.6.24 |grep GDT CONFIG_SCSI_GDTH=y Any ideas? http://www.kernel.org/diff/diffview.cgi?file=%2Fpub%2Flinux%2Fkernel%2Fv2.6%2Fpatch-2.6.24.bz2 show huge drivers/scsi/gdth* related changes. Can't test at the moment. System went production. Regards, Sven [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 252 bytes --] ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUG?] GDTH driver not working after upgrade to 2.6.24 2008-01-30 19:47 [BUG?] GDTH driver not working after upgrade to 2.6.24 Sven Köhler @ 2008-01-31 10:08 ` Boaz Harrosh 2008-01-31 11:07 ` Sven Köhler 2008-02-12 17:30 ` [BUGFIXES 0/2] gdth: fix 2.6.24 driver breakage Boaz Harrosh 0 siblings, 2 replies; 31+ messages in thread From: Boaz Harrosh @ 2008-01-31 10:08 UTC (permalink / raw) To: Sven Köhler, linux-scsi; +Cc: linux-kernel On Wed, Jan 30 2008 at 21:47 +0200, Sven Köhler <skoehler@upb.de> wrote: > Hi, > > so i have upgraded a system to kernel 2.6.24. After that, it failed to > boot with the usual message telling, that the rootfs on device /dev/sda1 > cannot be mounted (a raid1 run by the controller below). > > With 2.6.23.12, everything is working fine. > > # lspci -v: > > 03:01.0 RAID bus controller: Intel Corporation RAID Controller > Subsystem: Intel Corporation Unknown device 01db > Flags: bus master, 66MHz, slow devsel, latency 64, IRQ 17 > Memory at ddffc000 (32-bit, prefetchable) [size=16K] > [virtual] Expansion ROM at deef0000 [disabled] [size=32K] > Capabilities: [80] Power Management version 2 > > # GDT-related dmesg output (2.6.23.12): > > GDT-HA: Storage RAID Controller Driver. Version: 3.05 > ACPI: PCI Interrupt 0000:03:01.0[A] -> GSI 24 (level, low) -> IRQ 17 > GDT-HA: Found 1 PCI Storage RAID Controllers > Configuring GDT-PCI HA at 3/1 IRQ 17 > GDT-HA 0: Name: SRCU42L > scsi0 : SRCU42L > scsi 0:0:0:0: Direct-Access Intel Host Drive #00 PQ: 0 ANSI: 2 > scsi 0:2:6:0: Processor ESG-SHV SCA HSBP M29 1.06 PQ: 0 ANSI: 2 > sd 0:0:0:0: [sda] 143299800 512-byte hardware sectors (73369 MB) > sd 0:0:0:0: [sda] Assuming Write Enabled > sd 0:0:0:0: [sda] Assuming drive cache: write through > sd 0:0:0:0: [sda] 143299800 512-byte hardware sectors (73369 MB) > sd 0:0:0:0: [sda] Assuming Write Enabled > sd 0:0:0:0: [sda] Assuming drive cache: write through > sda: sda1 sda2 < sda5 > > sd 0:0:0:0: [sda] Attached SCSI disk > > # cat /boot/config-2.6.24 |grep GDT > > CONFIG_SCSI_GDTH=y > > > > > Any ideas? > > http://www.kernel.org/diff/diffview.cgi?file=%2Fpub%2Flinux%2Fkernel%2Fv2.6%2Fpatch-2.6.24.bz2 > show huge drivers/scsi/gdth* related changes. > > Can't test at the moment. System went production. > > > Regards, > Sven > Hi Sven! CCing to the scsi mailing list. Yes the gdth driver passed an open hart surgery in kernel 2.6.24. The bad thing about it is that all three of the Coders that did that did not have any HW to work on. One of them is me. We did cry for tester for a long time but no one came forward. Could you test patches for us? first thing would be to enable debug output patch below. If you absolutely need a 2.6.24 kernel, + gdth in a production system you could checkout the 2.6.23 driver and compile. The old driver will work the same in 2.6.24. It will not however even compile in 2.6.25-rcx. If any one wants to send me a card that uses the gdth driver, I will be very happy to debug this card, and return it once I'm done. Boaz --- git-diff --stat -p drivers/scsi/gdth.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c index c825239..eca72c4 100644 --- a/drivers/scsi/gdth.c +++ b/drivers/scsi/gdth.c @@ -188,6 +188,7 @@ static int __gdth_queuecommand(gdth_ha_str *ha, struct scsi_cmnd *scp, struct gdth_cmndinfo *cmndinfo); static void gdth_scsi_done(struct scsi_cmnd *scp); +#define DEBUG_GDTH 1 #ifdef DEBUG_GDTH static unchar DebugState = DEBUG_GDTH; ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUG?] GDTH driver not working after upgrade to 2.6.24 2008-01-31 10:08 ` Boaz Harrosh @ 2008-01-31 11:07 ` Sven Köhler 2008-01-31 12:35 ` Boaz Harrosh 2008-02-12 17:30 ` [BUGFIXES 0/2] gdth: fix 2.6.24 driver breakage Boaz Harrosh 1 sibling, 1 reply; 31+ messages in thread From: Sven Köhler @ 2008-01-31 11:07 UTC (permalink / raw) To: Boaz Harrosh; +Cc: linux-scsi, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1580 bytes --] > Yes the gdth driver passed an open hart surgery in kernel 2.6.24. The bad thing > about it is that all three of the Coders that did that did not have any HW to work > on. One of them is me. We did cry for tester for a long time but no one came forward. All i'd like to ask is WHY!? WHY such a big open heart surgery? OK, you have your reasons. > Could you test patches for us? first thing would be to enable debug output patch below. The machine is still production. It will be replaced by a 64bit system with an AACRAID card some time in the future. Then, i could maybe test patches on the old machine. But unfortunatly, it's not my machine. It's just administrated by me. And i don't know, what's exactly is the future of it. > If you absolutely need a 2.6.24 kernel, + gdth in a production system you could > checkout the 2.6.23 driver and compile. The old driver will work the same in 2.6.24. > It will not however even compile in 2.6.25-rcx. I see. Thanks for the hint. Would be an alternative. Actually, i don't need 2.6.24 - but if something security related is fixed, then i'm not able to move on to 2.6.24. And i'm using Gentoo. There's not really a kernel maintained by the distribution. > If any one wants to send me a card that uses the gdth driver, I will be very happy > to debug this card, and return it once I'm done. I don't think, it's a seperate card, that i could send you. It's some 19" rack server with such a card on-board, i think. So it don't see much opportunity for me to test patched and stuff :-( [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 252 bytes --] ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUG?] GDTH driver not working after upgrade to 2.6.24 2008-01-31 11:07 ` Sven Köhler @ 2008-01-31 12:35 ` Boaz Harrosh 2008-01-31 16:39 ` Jan Engelhardt 0 siblings, 1 reply; 31+ messages in thread From: Boaz Harrosh @ 2008-01-31 12:35 UTC (permalink / raw) To: Sven Köhler, James Bottomley, Matthew Wilcox Cc: linux-scsi, linux-kernel On Thu, Jan 31 2008 at 13:07 +0200, Sven Köhler <skoehler@upb.de> wrote: >> Yes the gdth driver passed an open hart surgery in kernel 2.6.24. The bad thing >> about it is that all three of the Coders that did that did not have any HW to work >> on. One of them is me. We did cry for tester for a long time but no one came forward. > > All i'd like to ask is WHY!? > WHY such a big open heart surgery? > OK, you have your reasons. > >> Could you test patches for us? first thing would be to enable debug output patch below. > > The machine is still production. It will be replaced by a 64bit system > with an AACRAID card some time in the future. Then, i could maybe test > patches on the old machine. But unfortunatly, it's not my machine. It's > just administrated by me. And i don't know, what's exactly is the future > of it. > >> If you absolutely need a 2.6.24 kernel, + gdth in a production system you could >> checkout the 2.6.23 driver and compile. The old driver will work the same in 2.6.24. >> It will not however even compile in 2.6.25-rcx. > > I see. Thanks for the hint. Would be an alternative. Actually, i don't > need 2.6.24 - but if something security related is fixed, then i'm not > able to move on to 2.6.24. And i'm using Gentoo. There's not really a > kernel maintained by the distribution. > >> If any one wants to send me a card that uses the gdth driver, I will be very happy >> to debug this card, and return it once I'm done. > > I don't think, it's a seperate card, that i could send you. It's some > 19" rack server with such a card on-board, i think. > > > So it don't see much opportunity for me to test patched and stuff :-( > Thanks, Perhaps someone else then. Anyone with gdth HW that can test patches? Your lspci said: "Intel Corporation RAID Controller" Matthew is there a gdth card lying around in an Intel lab near you? James do we need to mark gdth BROKEN for 2.6.24 and higher? Boaz ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUG?] GDTH driver not working after upgrade to 2.6.24 2008-01-31 12:35 ` Boaz Harrosh @ 2008-01-31 16:39 ` Jan Engelhardt 2008-01-31 16:52 ` Boaz Harrosh 0 siblings, 1 reply; 31+ messages in thread From: Jan Engelhardt @ 2008-01-31 16:39 UTC (permalink / raw) To: Boaz Harrosh Cc: Sven Köhler, James Bottomley, Matthew Wilcox, linux-scsi, linux-kernel Hi, On Jan 31 2008 14:35, Boaz Harrosh wrote: > >Thanks, Perhaps someone else then. >Anyone with gdth HW that can test patches? Is bisecting down the existing chain and finding the bad commit sufficient? (I also take new patches.) >Your lspci said: "Intel Corporation RAID Controller" Matthew >is there a gdth card lying around in an Intel lab near you? > >James do we need to mark gdth BROKEN for 2.6.24 and higher? I say revert it for the time being. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUG?] GDTH driver not working after upgrade to 2.6.24 2008-01-31 16:39 ` Jan Engelhardt @ 2008-01-31 16:52 ` Boaz Harrosh 0 siblings, 0 replies; 31+ messages in thread From: Boaz Harrosh @ 2008-01-31 16:52 UTC (permalink / raw) To: Jan Engelhardt Cc: Sven Köhler, James Bottomley, Matthew Wilcox, linux-scsi, linux-kernel On Thu, Jan 31 2008 at 18:39 +0200, Jan Engelhardt <jengelh@computergmbh.de> wrote: > Hi, > > On Jan 31 2008 14:35, Boaz Harrosh wrote: >> Thanks, Perhaps someone else then. >> Anyone with gdth HW that can test patches? > > Is bisecting down the existing chain and finding the bad commit > sufficient? (I also take new patches.) Most certainly. If you are willing, please... I'm looking for someone responsive. first - enable debug prints (see the original post) and send me the prints. second - bisection could be grate yes. third - accepting patches and testing could be grate, thanks. > >> Your lspci said: "Intel Corporation RAID Controller" Matthew >> is there a gdth card lying around in an Intel lab near you? >> >> James do we need to mark gdth BROKEN for 2.6.24 and higher? > > I say revert it for the time being. It could be reverted for 2.6.24.x maintenance releases but for 2.6.25-xxx it cannot as it will not compile, and the fix to that is what you see in code. Boaz ^ permalink raw reply [flat|nested] 31+ messages in thread
* [BUGFIXES 0/2] gdth: fix 2.6.24 driver breakage 2008-01-31 10:08 ` Boaz Harrosh 2008-01-31 11:07 ` Sven Köhler @ 2008-02-12 17:30 ` Boaz Harrosh 2008-02-12 17:35 ` [BUGFIX 1/2] gdth: scan for scsi devices Boaz Harrosh 2008-02-12 17:40 ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash Boaz Harrosh 1 sibling, 2 replies; 31+ messages in thread From: Boaz Harrosh @ 2008-02-12 17:30 UTC (permalink / raw) To: Sven Köhler, linux-scsi, James Bottomley, Christoph Hellwig, Jeff Garzik Cc: linux-kernel, Joerg Dorchain, Jon Chelton, Stefan Priebe - allied internet ag On Thu, Jan 31 2008 at 12:08 +0200, Boaz Harrosh <bharrosh@panasas.com> wrote: > On Wed, Jan 30 2008 at 21:47 +0200, Sven Köhler <skoehler@upb.de> wrote: >> Hi, >> >> so i have upgraded a system to kernel 2.6.24. After that, it failed to >> boot with the usual message telling, that the rootfs on device /dev/sda1 >> cannot be mounted (a raid1 run by the controller below). >> >> With 2.6.23.12, everything is working fine. >> >> # lspci -v: >> >> 03:01.0 RAID bus controller: Intel Corporation RAID Controller >> Subsystem: Intel Corporation Unknown device 01db >> Flags: bus master, 66MHz, slow devsel, latency 64, IRQ 17 >> Memory at ddffc000 (32-bit, prefetchable) [size=16K] >> [virtual] Expansion ROM at deef0000 [disabled] [size=32K] >> Capabilities: [80] Power Management version 2 >> >> # GDT-related dmesg output (2.6.23.12): >> >> GDT-HA: Storage RAID Controller Driver. Version: 3.05 >> ACPI: PCI Interrupt 0000:03:01.0[A] -> GSI 24 (level, low) -> IRQ 17 >> GDT-HA: Found 1 PCI Storage RAID Controllers >> Configuring GDT-PCI HA at 3/1 IRQ 17 >> GDT-HA 0: Name: SRCU42L >> scsi0 : SRCU42L >> scsi 0:0:0:0: Direct-Access Intel Host Drive #00 PQ: 0 ANSI: 2 >> scsi 0:2:6:0: Processor ESG-SHV SCA HSBP M29 1.06 PQ: 0 ANSI: 2 >> sd 0:0:0:0: [sda] 143299800 512-byte hardware sectors (73369 MB) >> sd 0:0:0:0: [sda] Assuming Write Enabled >> sd 0:0:0:0: [sda] Assuming drive cache: write through >> sd 0:0:0:0: [sda] 143299800 512-byte hardware sectors (73369 MB) >> sd 0:0:0:0: [sda] Assuming Write Enabled >> sd 0:0:0:0: [sda] Assuming drive cache: write through >> sda: sda1 sda2 < sda5 > >> sd 0:0:0:0: [sda] Attached SCSI disk >> >> # cat /boot/config-2.6.24 |grep GDT >> >> CONFIG_SCSI_GDTH=y >> >> >> >> >> Any ideas? >> >> http://www.kernel.org/diff/diffview.cgi?file=%2Fpub%2Flinux%2Fkernel%2Fv2.6%2Fpatch-2.6.24.bz2 >> show huge drivers/scsi/gdth* related changes. >> >> Can't test at the moment. System went production. >> >> >> Regards, >> Sven >> > > Hi Sven! <snip> With the kind help of: Joerg Dorchain: <joerg@dorchain.net> Jon Chelton <jchelton@ffpglobal.com> Stefan Priebe <s.priebe@allied-internet.ag> Which let me take up their machines their effort and their time I hope I'm able to fix the gdth driver for the 2.6.24 kernel and forward. Actually it was a simple miss by Christoph, but with my inexperience it took me a bisection and a while to get it. Both Joerg, and Stefan where able to boot with these patches and work on their machine. Jon Chelton's disks array should also work, we're testing. Submitted 2 patches. They should also be included after some testing into the 2.6.24.x stable releases. (Will be posted after some more testing) [PATCH 1/2] gdth: scan for scsi devices simple but must fatal. [PATCH 2/2] gdth: bugfix for the Timer at exit crash James please inspect and comment on this patch. It was not yet tested by the original bug submitter. In the original gdth series Christoph has forgotten to add the call to scsi_scan_host(). Jeff alternative patches did do the scan. After everything was probed, the code would loop on all found cards and scan. However I like to individually scan at each probe, because I think this way it is more ready for the hot-plug API where the discovery is done outside of the driver, and the probe is called on single host at a time. Is that right? please comment. Test away. Meany thanks to Joerg, Jon && Stefan, Cheers. Boaz ^ permalink raw reply [flat|nested] 31+ messages in thread
* [BUGFIX 1/2] gdth: scan for scsi devices 2008-02-12 17:30 ` [BUGFIXES 0/2] gdth: fix 2.6.24 driver breakage Boaz Harrosh @ 2008-02-12 17:35 ` Boaz Harrosh 2008-02-12 18:05 ` Christoph Hellwig 2008-02-12 17:40 ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash Boaz Harrosh 1 sibling, 1 reply; 31+ messages in thread From: Boaz Harrosh @ 2008-02-12 17:35 UTC (permalink / raw) To: Sven Köhler, linux-scsi, James Bottomley, Christoph Hellwig, Jeff Garzik Cc: linux-scsi, James Bottomley, linux-kernel, Joerg Dorchain, Jon Chelton, Stefan Priebe - allied internet ag The patch: "gdth: switch to modern scsi host registration" missed one simple fact when moving a way from scsi_module.c. That is to call scsi_scan_host() on the probed host. With this the gdth driver from 2.6.24 is again able to see drives and boot. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Tested-by: Joerg Dorchain: <joerg@dorchain.net> Tested-by: Stefan Priebe <s.priebe@allied-internet.ag> Tested-by: Jon Chelton <jchelton@ffpglobal.com> --- drivers/scsi/gdth.c | 9 +++++++++ 1 files changed, 9 insertions(+), 0 deletions(-) diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c index b253b8c..8eb78be 100644 --- a/drivers/scsi/gdth.c +++ b/drivers/scsi/gdth.c @@ -4838,6 +4838,9 @@ static int __init gdth_isa_probe_one(ulong32 isa_bios) if (error) goto out_free_coal_stat; list_add_tail(&ha->list, &gdth_instances); + + scsi_scan_host(shp); + return 0; out_free_coal_stat: @@ -4965,6 +4968,9 @@ static int __init gdth_eisa_probe_one(ushort eisa_slot) if (error) goto out_free_coal_stat; list_add_tail(&ha->list, &gdth_instances); + + scsi_scan_host(shp); + return 0; out_free_ccb_phys: @@ -5102,6 +5108,9 @@ static int __init gdth_pci_probe_one(gdth_pci_str *pcistr, int ctr) if (error) goto out_free_coal_stat; list_add_tail(&ha->list, &gdth_instances); + + scsi_scan_host(shp); + return 0; out_free_coal_stat: -- 1.5.3.3 ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUGFIX 1/2] gdth: scan for scsi devices 2008-02-12 17:35 ` [BUGFIX 1/2] gdth: scan for scsi devices Boaz Harrosh @ 2008-02-12 18:05 ` Christoph Hellwig 0 siblings, 0 replies; 31+ messages in thread From: Christoph Hellwig @ 2008-02-12 18:05 UTC (permalink / raw) To: Boaz Harrosh Cc: Sven K?hler, linux-scsi, James Bottomley, Christoph Hellwig, Jeff Garzik, linux-scsi, James Bottomley, linux-kernel, Joerg Dorchain, Jon Chelton, Stefan Priebe - allied internet ag On Tue, Feb 12, 2008 at 07:35:22PM +0200, Boaz Harrosh wrote: > > The patch: "gdth: switch to modern scsi host registration" > > missed one simple fact when moving a way from scsi_module.c. > That is to call scsi_scan_host() on the probed host. > With this the gdth driver from 2.6.24 is again able to > see drives and boot. Doh, someone please hand me a brown paper bag. My first series of patches had this but it got dropped when I rebased it over various janitor cleanups. The patch looks obviously correct, thanks. ^ permalink raw reply [flat|nested] 31+ messages in thread
* [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash 2008-02-12 17:30 ` [BUGFIXES 0/2] gdth: fix 2.6.24 driver breakage Boaz Harrosh 2008-02-12 17:35 ` [BUGFIX 1/2] gdth: scan for scsi devices Boaz Harrosh @ 2008-02-12 17:40 ` Boaz Harrosh 2008-02-13 7:06 ` Stefan Priebe - allied internet ag ` (2 more replies) 1 sibling, 3 replies; 31+ messages in thread From: Boaz Harrosh @ 2008-02-12 17:40 UTC (permalink / raw) To: Sven Köhler, Christoph Hellwig, Jeff Garzik Cc: linux-scsi, James Bottomley, linux-kernel, Joerg Dorchain, Jon Chelton, Stefan Priebe - allied internet ag gdth _exit would first remove all cards then stop the timer and would not sync with the timer function. This caused a crash in gdth_timer() when module was unloaded. del_timer_sync the timer before we delete the cards. NOT YET TESTED Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> --- drivers/scsi/gdth.c | 15 ++++++++------- 1 files changed, 8 insertions(+), 7 deletions(-) diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c index 8eb78be..103280e 100644 --- a/drivers/scsi/gdth.c +++ b/drivers/scsi/gdth.c @@ -3793,6 +3793,8 @@ static void gdth_timeout(ulong data) gdth_ha_str *ha; ulong flags; + BUG_ON(list_empty(&gdth_instances)); + ha = list_first_entry(&gdth_instances, gdth_ha_str, list); spin_lock_irqsave(&ha->smp_lock, flags); @@ -5146,8 +5148,6 @@ static void gdth_remove_one(gdth_ha_str *ha) ha->sdev = NULL; } - gdth_flush(ha); - if (shp->irq) free_irq(shp->irq,ha); @@ -5245,14 +5245,15 @@ static void __exit gdth_exit(void) { gdth_ha_str *ha; - list_for_each_entry(ha, &gdth_instances, list) - gdth_remove_one(ha); + unregister_chrdev(major,"gdth"); + unregister_reboot_notifier(&gdth_notifier); #ifdef GDTH_STATISTICS - del_timer(&gdth_timer); + del_timer_sync(&gdth_timer); #endif - unregister_chrdev(major,"gdth"); - unregister_reboot_notifier(&gdth_notifier); + + list_for_each_entry(ha, &gdth_instances, list) + gdth_remove_one(ha); } module_init(gdth_init); -- 1.5.3.3 ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash 2008-02-12 17:40 ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash Boaz Harrosh @ 2008-02-13 7:06 ` Stefan Priebe - allied internet ag 2008-02-13 9:03 ` Boaz Harrosh 2008-02-13 10:48 ` Boaz Harrosh 2008-02-13 15:44 ` James Bottomley 2 siblings, 1 reply; 31+ messages in thread From: Stefan Priebe - allied internet ag @ 2008-02-13 7:06 UTC (permalink / raw) To: Boaz Harrosh Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi, James Bottomley, linux-kernel, Joerg Dorchain, Jon Chelton Hello! I've tested this patch now - and it works fine. Now rmmod, halt and reboot also works. Stefan Priebe Boaz Harrosh schrieb: > gdth _exit would first remove all cards then stop the timer > and would not sync with the timer function. This caused a crash > in gdth_timer() when module was unloaded. > > del_timer_sync the timer before we delete the cards. > > NOT YET TESTED > > Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> > --- > drivers/scsi/gdth.c | 15 ++++++++------- > 1 files changed, 8 insertions(+), 7 deletions(-) > > diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c > index 8eb78be..103280e 100644 > --- a/drivers/scsi/gdth.c > +++ b/drivers/scsi/gdth.c > @@ -3793,6 +3793,8 @@ static void gdth_timeout(ulong data) > gdth_ha_str *ha; > ulong flags; > > + BUG_ON(list_empty(&gdth_instances)); > + > ha = list_first_entry(&gdth_instances, gdth_ha_str, list); > spin_lock_irqsave(&ha->smp_lock, flags); > > @@ -5146,8 +5148,6 @@ static void gdth_remove_one(gdth_ha_str *ha) > ha->sdev = NULL; > } > > - gdth_flush(ha); > - > if (shp->irq) > free_irq(shp->irq,ha); > > @@ -5245,14 +5245,15 @@ static void __exit gdth_exit(void) > { > gdth_ha_str *ha; > > - list_for_each_entry(ha, &gdth_instances, list) > - gdth_remove_one(ha); > + unregister_chrdev(major,"gdth"); > + unregister_reboot_notifier(&gdth_notifier); > > #ifdef GDTH_STATISTICS > - del_timer(&gdth_timer); > + del_timer_sync(&gdth_timer); > #endif > - unregister_chrdev(major,"gdth"); > - unregister_reboot_notifier(&gdth_notifier); > + > + list_for_each_entry(ha, &gdth_instances, list) > + gdth_remove_one(ha); > } > > module_init(gdth_init); ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash 2008-02-13 7:06 ` Stefan Priebe - allied internet ag @ 2008-02-13 9:03 ` Boaz Harrosh 2008-02-13 19:38 ` Jan Engelhardt 0 siblings, 1 reply; 31+ messages in thread From: Boaz Harrosh @ 2008-02-13 9:03 UTC (permalink / raw) To: Stefan Priebe - allied internet ag Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi, James Bottomley, linux-kernel, Joerg Dorchain, Jon Chelton On Wed, Feb 13 2008 at 9:06 +0200, Stefan Priebe - allied internet ag <s.priebe@allied-internet.ag> wrote: > Hello! > > I've tested this patch now - and it works fine. Now rmmod, halt and > reboot also works. > > Stefan Priebe > This is grate news Stefan. Thank you very much for all your time and effort, with out we could not have fixed all this. Boaz ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash 2008-02-13 9:03 ` Boaz Harrosh @ 2008-02-13 19:38 ` Jan Engelhardt 2008-02-14 15:58 ` Boaz Harrosh 0 siblings, 1 reply; 31+ messages in thread From: Jan Engelhardt @ 2008-02-13 19:38 UTC (permalink / raw) To: Boaz Harrosh Cc: Stefan Priebe - allied internet ag, Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi, James Bottomley, linux-kernel, Joerg Dorchain, Jon Chelton On Feb 13 2008 11:03, Boaz Harrosh wrote: >> >> I've tested this patch now - and it works fine. Now rmmod, halt and >> reboot also works. >> >> Stefan Priebe >> >This is grate news Stefan. Thank you very much for all your time >and effort, with out we could not have fixed all this. Do you have a git tree with the latest pieces? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash 2008-02-13 19:38 ` Jan Engelhardt @ 2008-02-14 15:58 ` Boaz Harrosh 0 siblings, 0 replies; 31+ messages in thread From: Boaz Harrosh @ 2008-02-14 15:58 UTC (permalink / raw) To: Jan Engelhardt Cc: Stefan Priebe - allied internet ag, Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi, James Bottomley, linux-kernel, Joerg Dorchain, Jon Chelton On Wed, Feb 13 2008 at 21:38 +0200, Jan Engelhardt <jengelh@computergmbh.de> wrote: > On Feb 13 2008 11:03, Boaz Harrosh wrote: >>> I've tested this patch now - and it works fine. Now rmmod, halt and >>> reboot also works. >>> >>> Stefan Priebe >>> >> This is grate news Stefan. Thank you very much for all your time >> and effort, with out we could not have fixed all this. > > Do you have a git tree with the latest pieces? No, scsi-misc I guess ;) I could put it here: git://git.bhalevy.com/open-osd gdth branch give me an hours Boaz ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash 2008-02-12 17:40 ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash Boaz Harrosh 2008-02-13 7:06 ` Stefan Priebe - allied internet ag @ 2008-02-13 10:48 ` Boaz Harrosh 2008-02-13 15:44 ` James Bottomley 2 siblings, 0 replies; 31+ messages in thread From: Boaz Harrosh @ 2008-02-13 10:48 UTC (permalink / raw) To: Sven Köhler, Christoph Hellwig, Jeff Garzik Cc: linux-scsi, James Bottomley, linux-kernel, Joerg Dorchain, Jon Chelton, Stefan Priebe - allied internet ag, Dave Milter On Tue, Feb 12 2008 at 19:40 +0200, Boaz Harrosh <bharrosh@panasas.com> wrote: > gdth _exit would first remove all cards then stop the timer > and would not sync with the timer function. This caused a crash > in gdth_timer() when module was unloaded. > > del_timer_sync the timer before we delete the cards. > > NOT YET TESTED > > Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Tested-by: Stefan Priebe <s.priebe@allied-internet.ag> > --- > drivers/scsi/gdth.c | 15 ++++++++------- > 1 files changed, 8 insertions(+), 7 deletions(-) > > diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c > index 8eb78be..103280e 100644 > --- a/drivers/scsi/gdth.c > +++ b/drivers/scsi/gdth.c > @@ -3793,6 +3793,8 @@ static void gdth_timeout(ulong data) > gdth_ha_str *ha; > ulong flags; > > + BUG_ON(list_empty(&gdth_instances)); > + > ha = list_first_entry(&gdth_instances, gdth_ha_str, list); > spin_lock_irqsave(&ha->smp_lock, flags); > > @@ -5146,8 +5148,6 @@ static void gdth_remove_one(gdth_ha_str *ha) > ha->sdev = NULL; > } > > - gdth_flush(ha); > - > if (shp->irq) > free_irq(shp->irq,ha); > > @@ -5245,14 +5245,15 @@ static void __exit gdth_exit(void) > { > gdth_ha_str *ha; > > - list_for_each_entry(ha, &gdth_instances, list) > - gdth_remove_one(ha); > + unregister_chrdev(major,"gdth"); > + unregister_reboot_notifier(&gdth_notifier); > > #ifdef GDTH_STATISTICS > - del_timer(&gdth_timer); > + del_timer_sync(&gdth_timer); > #endif > - unregister_chrdev(major,"gdth"); > - unregister_reboot_notifier(&gdth_notifier); > + > + list_for_each_entry(ha, &gdth_instances, list) > + gdth_remove_one(ha); > } > > module_init(gdth_init); James please put this patch in rc-fixes also. It has now been tested by few people, and it solves a reproducible problem in the unloading of the driver. It was not yet confirmed by Andrew's reporter with the: + if (list_empty(&gdth_instances)) + return; at gdth_timer() In -mm tree. In my patch I have converted the if() to a BUG_ON because now it should not happen. But I figure it is not worse then what there is now, which is nothing. With your recommendation I will push both patches to the stable branches People have emailed me requesting it. Thanks Boaz ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash 2008-02-12 17:40 ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash Boaz Harrosh 2008-02-13 7:06 ` Stefan Priebe - allied internet ag 2008-02-13 10:48 ` Boaz Harrosh @ 2008-02-13 15:44 ` James Bottomley 2008-02-13 15:54 ` Boaz Harrosh 2 siblings, 1 reply; 31+ messages in thread From: James Bottomley @ 2008-02-13 15:44 UTC (permalink / raw) To: Boaz Harrosh Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi, linux-kernel, Joerg Dorchain, Jon Chelton, Stefan Priebe - allied internet ag On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote: > - gdth_flush(ha); > - This piece doesn't look right. gdth_flush() forces the internal cache to disk backing. If you remove it, you're taking the chance that the machine will be powered off without a writeback which can cause data corruption. James ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash 2008-02-13 15:44 ` James Bottomley @ 2008-02-13 15:54 ` Boaz Harrosh 2008-02-13 16:33 ` Boaz Harrosh 0 siblings, 1 reply; 31+ messages in thread From: Boaz Harrosh @ 2008-02-13 15:54 UTC (permalink / raw) To: James Bottomley Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi, linux-kernel, Joerg Dorchain, Jon Chelton, Stefan Priebe - allied internet ag On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote: >> - gdth_flush(ha); >> - > > This piece doesn't look right. gdth_flush() forces the internal cache > to disk backing. If you remove it, you're taking the chance that the > machine will be powered off without a writeback which can cause data > corruption. > > James > Yes. I have more problems reported, with exit, and am just sending one more patch that puts this back in. Which was tested. So I will resend this one plus one new one. Boaz ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash 2008-02-13 15:54 ` Boaz Harrosh @ 2008-02-13 16:33 ` Boaz Harrosh 2008-02-13 16:35 ` [PATCH ver2] gdth: bugfix for the at-exit problems Boaz Harrosh 2008-02-13 16:45 ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash James Bottomley 0 siblings, 2 replies; 31+ messages in thread From: Boaz Harrosh @ 2008-02-13 16:33 UTC (permalink / raw) To: James Bottomley Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi, linux-kernel, Joerg Dorchain, Jon Chelton, Stefan Priebe - allied internet ag On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh <bharrosh@panasas.com> wrote: > On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote: >> On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote: >>> - gdth_flush(ha); >>> - >> This piece doesn't look right. gdth_flush() forces the internal cache >> to disk backing. If you remove it, you're taking the chance that the >> machine will be powered off without a writeback which can cause data >> corruption. >> >> James >> > Yes. > I have more problems reported, with exit, and am just sending one more patch that puts > this back in. Which was tested. > > So I will resend this one plus one new one. > > Boaz > The gdth driver would do a register_reboot_notifier(&gdth_notifier); to a gdth_halt() function, which would then redo half of what gdth_exit does, and wrongly so, and crash. Are we guaranteed in todays kernel that modules .exit function be called on an halt or reboot? If so then there is no need for duplications and the gdth_halt() should go. Submitted a patch that replaces the previous one I submitted with a deeper fix. [PATCH] gdth: bugfix for the at-exit problems If you ask me this all gdth_flush() is a crackup. sd and scsi-ml are doing scsi FLUSH commands when ever is needed. The controller as no business caching data in memory longer then what is stated in standard. Raid controller or no raid controller. Virtual or not virtual device. Data on Plate means data on plate. What if there is a power outage? what the driver can do then? Boaz ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH ver2] gdth: bugfix for the at-exit problems 2008-02-13 16:33 ` Boaz Harrosh @ 2008-02-13 16:35 ` Boaz Harrosh 2008-02-13 16:45 ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash James Bottomley 1 sibling, 0 replies; 31+ messages in thread From: Boaz Harrosh @ 2008-02-13 16:35 UTC (permalink / raw) To: James Bottomley Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi, linux-kernel, Joerg Dorchain, Jon Chelton, Stefan Priebe - allied internet ag gdth_exit would first remove all cards then stop the timer and would not sync with the timer function. This caused a crash in gdth_timer() when module was unloaded. So del_timer_sync the timer before we delete the cards. also a reboot notifier function was registered but is not needed anymore. And would crash. So remove gdth_halt and the reboot notifier registration. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> --- drivers/scsi/gdth.c | 64 +++++++------------------------------------------- 1 files changed, 9 insertions(+), 55 deletions(-) diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c index 8eb78be..8469fe6 100644 --- a/drivers/scsi/gdth.c +++ b/drivers/scsi/gdth.c @@ -183,7 +183,6 @@ static int gdth_ioctl(struct inode *inode, struct file *filep, unsigned int cmd, unsigned long arg); static void gdth_flush(gdth_ha_str *ha); -static int gdth_halt(struct notifier_block *nb, ulong event, void *buf); static int gdth_queuecommand(Scsi_Cmnd *scp,void (*done)(Scsi_Cmnd *)); static int __gdth_queuecommand(gdth_ha_str *ha, struct scsi_cmnd *scp, struct gdth_cmndinfo *cmndinfo); @@ -418,12 +417,6 @@ static inline void gdth_set_sglist(struct scsi_cmnd *cmd, #include "gdth_proc.h" #include "gdth_proc.c" -/* notifier block to get a notify on system shutdown/halt/reboot */ -static struct notifier_block gdth_notifier = { - gdth_halt, NULL, 0 -}; -static int notifier_disabled = 0; - static gdth_ha_str *gdth_find_ha(int hanum) { gdth_ha_str *ha; @@ -3793,6 +3786,8 @@ static void gdth_timeout(ulong data) gdth_ha_str *ha; ulong flags; + BUG_ON(list_empty(&gdth_instances)); + ha = list_first_entry(&gdth_instances, gdth_ha_str, list); spin_lock_irqsave(&ha->smp_lock, flags); @@ -4668,45 +4663,6 @@ static void gdth_flush(gdth_ha_str *ha) } } -/* shutdown routine */ -static int gdth_halt(struct notifier_block *nb, ulong event, void *buf) -{ - gdth_ha_str *ha; -#ifndef __alpha__ - gdth_cmd_str gdtcmd; - char cmnd[MAX_COMMAND_SIZE]; -#endif - - if (notifier_disabled) - return NOTIFY_OK; - - TRACE2(("gdth_halt() event %d\n",(int)event)); - if (event != SYS_RESTART && event != SYS_HALT && event != SYS_POWER_OFF) - return NOTIFY_DONE; - - notifier_disabled = 1; - printk("GDT-HA: Flushing all host drives .. "); - list_for_each_entry(ha, &gdth_instances, list) { - gdth_flush(ha); - -#ifndef __alpha__ - /* controller reset */ - memset(cmnd, 0xff, MAX_COMMAND_SIZE); - gdtcmd.BoardNode = LOCALBOARD; - gdtcmd.Service = CACHESERVICE; - gdtcmd.OpCode = GDT_RESET; - TRACE2(("gdth_halt(): reset controller %d\n", ha->hanum)); - gdth_execute(ha->shost, &gdtcmd, cmnd, 10, NULL); -#endif - } - printk("Done.\n"); - -#ifdef GDTH_STATISTICS - del_timer(&gdth_timer); -#endif - return NOTIFY_OK; -} - /* configure lun */ static int gdth_slave_configure(struct scsi_device *sdev) { @@ -5141,13 +5097,13 @@ static void gdth_remove_one(gdth_ha_str *ha) scsi_remove_host(shp); + gdth_flush(ha); + if (ha->sdev) { scsi_free_host_dev(ha->sdev); ha->sdev = NULL; } - gdth_flush(ha); - if (shp->irq) free_irq(shp->irq,ha); @@ -5235,8 +5191,6 @@ static int __init gdth_init(void) add_timer(&gdth_timer); #endif major = register_chrdev(0,"gdth", &gdth_fops); - notifier_disabled = 0; - register_reboot_notifier(&gdth_notifier); gdth_polling = FALSE; return 0; } @@ -5245,14 +5199,14 @@ static void __exit gdth_exit(void) { gdth_ha_str *ha; - list_for_each_entry(ha, &gdth_instances, list) - gdth_remove_one(ha); + unregister_chrdev(major,"gdth"); #ifdef GDTH_STATISTICS - del_timer(&gdth_timer); + del_timer_sync(&gdth_timer); #endif - unregister_chrdev(major,"gdth"); - unregister_reboot_notifier(&gdth_notifier); + + list_for_each_entry(ha, &gdth_instances, list) + gdth_remove_one(ha); } module_init(gdth_init); -- 1.5.3.3 ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash 2008-02-13 16:33 ` Boaz Harrosh 2008-02-13 16:35 ` [PATCH ver2] gdth: bugfix for the at-exit problems Boaz Harrosh @ 2008-02-13 16:45 ` James Bottomley 2008-02-13 16:50 ` Boaz Harrosh 1 sibling, 1 reply; 31+ messages in thread From: James Bottomley @ 2008-02-13 16:45 UTC (permalink / raw) To: Boaz Harrosh Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi, linux-kernel, Joerg Dorchain, Jon Chelton, Stefan Priebe - allied internet ag On Wed, 2008-02-13 at 18:33 +0200, Boaz Harrosh wrote: > On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh <bharrosh@panasas.com> wrote: > > On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > >> On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote: > >>> - gdth_flush(ha); > >>> - > >> This piece doesn't look right. gdth_flush() forces the internal cache > >> to disk backing. If you remove it, you're taking the chance that the > >> machine will be powered off without a writeback which can cause data > >> corruption. > >> > >> James > >> > > Yes. > > I have more problems reported, with exit, and am just sending one more patch that puts > > this back in. Which was tested. > > > > So I will resend this one plus one new one. > > > > Boaz > > > > The gdth driver would do a register_reboot_notifier(&gdth_notifier); > to a gdth_halt() function, which would then redo half of what gdth_exit > does, and wrongly so, and crash. > > Are we guaranteed in todays kernel that modules .exit function be called > on an halt or reboot? If so then there is no need for duplications and > the gdth_halt() should go. No. The __exit section is actually discardable if you promise never to remove the module. James ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash 2008-02-13 16:45 ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash James Bottomley @ 2008-02-13 16:50 ` Boaz Harrosh 2008-02-13 17:03 ` James Bottomley 0 siblings, 1 reply; 31+ messages in thread From: Boaz Harrosh @ 2008-02-13 16:50 UTC (permalink / raw) To: James Bottomley Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi, linux-kernel, Joerg Dorchain, Jon Chelton, Stefan Priebe - allied internet ag On Wed, Feb 13 2008 at 18:45 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > On Wed, 2008-02-13 at 18:33 +0200, Boaz Harrosh wrote: >> On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh <bharrosh@panasas.com> wrote: >>> On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote: >>>> On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote: >>>>> - gdth_flush(ha); >>>>> - >>>> This piece doesn't look right. gdth_flush() forces the internal cache >>>> to disk backing. If you remove it, you're taking the chance that the >>>> machine will be powered off without a writeback which can cause data >>>> corruption. >>>> >>>> James >>>> >>> Yes. >>> I have more problems reported, with exit, and am just sending one more patch that puts >>> this back in. Which was tested. >>> >>> So I will resend this one plus one new one. >>> >>> Boaz >>> >> The gdth driver would do a register_reboot_notifier(&gdth_notifier); >> to a gdth_halt() function, which would then redo half of what gdth_exit >> does, and wrongly so, and crash. >> >> Are we guaranteed in todays kernel that modules .exit function be called >> on an halt or reboot? If so then there is no need for duplications and >> the gdth_halt() should go. > > No. The __exit section is actually discardable if you promise never to > remove the module. > I don't understand please explain. What does a driver need to do if it needs a consistent shutdown retine? module or built in? unload or shutdown? > James > > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash 2008-02-13 16:50 ` Boaz Harrosh @ 2008-02-13 17:03 ` James Bottomley 2008-02-13 17:12 ` Boaz Harrosh ` (2 more replies) 0 siblings, 3 replies; 31+ messages in thread From: James Bottomley @ 2008-02-13 17:03 UTC (permalink / raw) To: Boaz Harrosh Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi, linux-kernel, Joerg Dorchain, Jon Chelton, Stefan Priebe - allied internet ag On Wed, 2008-02-13 at 18:50 +0200, Boaz Harrosh wrote: > On Wed, Feb 13 2008 at 18:45 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > > On Wed, 2008-02-13 at 18:33 +0200, Boaz Harrosh wrote: > >> On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh <bharrosh@panasas.com> wrote: > >>> On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > >>>> On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote: > >>>>> - gdth_flush(ha); > >>>>> - > >>>> This piece doesn't look right. gdth_flush() forces the internal cache > >>>> to disk backing. If you remove it, you're taking the chance that the > >>>> machine will be powered off without a writeback which can cause data > >>>> corruption. > >>>> > >>>> James > >>>> > >>> Yes. > >>> I have more problems reported, with exit, and am just sending one more patch that puts > >>> this back in. Which was tested. > >>> > >>> So I will resend this one plus one new one. > >>> > >>> Boaz > >>> > >> The gdth driver would do a register_reboot_notifier(&gdth_notifier); > >> to a gdth_halt() function, which would then redo half of what gdth_exit > >> does, and wrongly so, and crash. > >> > >> Are we guaranteed in todays kernel that modules .exit function be called > >> on an halt or reboot? If so then there is no need for duplications and > >> the gdth_halt() should go. > > > > No. The __exit section is actually discardable if you promise never to > > remove the module. > > > I don't understand please explain. > What does a driver need to do if it needs a consistent shutdown retine? > module or built in? unload or shutdown? It needs to register a reboot notifier, which gdth does. However, the notifier is only called on reboot, so it also needs to clean up correctly on module exit as well. The alternative for GDTH would be to process the SCSI SYNCHRONIZE CACHE command. That's done by a shutdown notifier from sd, so the correct thing would always get done; however it does mean the driver has to be in a condition to process the last sync cache command. For the quick fix, just keep the current infrastructure and put back the gdth_flush() command where it can be effective. James ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash 2008-02-13 17:03 ` James Bottomley @ 2008-02-13 17:12 ` Boaz Harrosh 2008-02-13 17:36 ` James Bottomley 2008-02-13 17:18 ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash Boaz Harrosh 2008-02-14 6:51 ` Christoph Hellwig 2 siblings, 1 reply; 31+ messages in thread From: Boaz Harrosh @ 2008-02-13 17:12 UTC (permalink / raw) To: James Bottomley Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi, linux-kernel, Joerg Dorchain, Jon Chelton, Stefan Priebe - allied internet ag On Wed, Feb 13 2008 at 19:03 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > On Wed, 2008-02-13 at 18:50 +0200, Boaz Harrosh wrote: >> On Wed, Feb 13 2008 at 18:45 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote: >>> On Wed, 2008-02-13 at 18:33 +0200, Boaz Harrosh wrote: >>>> On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh <bharrosh@panasas.com> wrote: >>>>> On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote: >>>>>> On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote: >>>>>>> - gdth_flush(ha); >>>>>>> - >>>>>> This piece doesn't look right. gdth_flush() forces the internal cache >>>>>> to disk backing. If you remove it, you're taking the chance that the >>>>>> machine will be powered off without a writeback which can cause data >>>>>> corruption. >>>>>> >>>>>> James >>>>>> >>>>> Yes. >>>>> I have more problems reported, with exit, and am just sending one more patch that puts >>>>> this back in. Which was tested. >>>>> >>>>> So I will resend this one plus one new one. >>>>> >>>>> Boaz >>>>> >>>> The gdth driver would do a register_reboot_notifier(&gdth_notifier); >>>> to a gdth_halt() function, which would then redo half of what gdth_exit >>>> does, and wrongly so, and crash. >>>> >>>> Are we guaranteed in todays kernel that modules .exit function be called >>>> on an halt or reboot? If so then there is no need for duplications and >>>> the gdth_halt() should go. >>> No. The __exit section is actually discardable if you promise never to >>> remove the module. >>> >> I don't understand please explain. >> What does a driver need to do if it needs a consistent shutdown retine? >> module or built in? unload or shutdown? > > It needs to register a reboot notifier, which gdth does. > > However, the notifier is only called on reboot, so it also needs to > clean up correctly on module exit as well. > > The alternative for GDTH would be to process the SCSI SYNCHRONIZE CACHE > command. That's done by a shutdown notifier from sd, so the correct > thing would always get done; however it does mean the driver has to be > in a condition to process the last sync cache command. > > For the quick fix, just keep the current infrastructure and put back the > gdth_flush() command where it can be effective. > > James > > Totally untested. --- From: Boaz Harrosh <bharrosh@panasas.com> Subject: [PATCH] gdth: bugfix for the at-exit problems gdth_exit would first remove all cards then stop the timer and would not sync with the timer function. This caused a crash in gdth_timer() when module was unloaded. So del_timer_sync the timer before we delete the cards. also the reboot notifier function would crash. So unify the exit and halt functions with a gdth_shutdown() that's called by both. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> --- drivers/scsi/gdth.c | 99 ++++++++++++++++++++------------------------------ 1 files changed, 40 insertions(+), 59 deletions(-) diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c index 8eb78be..7bb9b45 100644 --- a/drivers/scsi/gdth.c +++ b/drivers/scsi/gdth.c @@ -183,7 +183,6 @@ static int gdth_ioctl(struct inode *inode, struct file *filep, unsigned int cmd, unsigned long arg); static void gdth_flush(gdth_ha_str *ha); -static int gdth_halt(struct notifier_block *nb, ulong event, void *buf); static int gdth_queuecommand(Scsi_Cmnd *scp,void (*done)(Scsi_Cmnd *)); static int __gdth_queuecommand(gdth_ha_str *ha, struct scsi_cmnd *scp, struct gdth_cmndinfo *cmndinfo); @@ -418,12 +417,6 @@ static inline void gdth_set_sglist(struct scsi_cmnd *cmd, #include "gdth_proc.h" #include "gdth_proc.c" -/* notifier block to get a notify on system shutdown/halt/reboot */ -static struct notifier_block gdth_notifier = { - gdth_halt, NULL, 0 -}; -static int notifier_disabled = 0; - static gdth_ha_str *gdth_find_ha(int hanum) { gdth_ha_str *ha; @@ -3793,6 +3786,8 @@ static void gdth_timeout(ulong data) gdth_ha_str *ha; ulong flags; + BUG_ON(list_empty(&gdth_instances)); + ha = list_first_entry(&gdth_instances, gdth_ha_str, list); spin_lock_irqsave(&ha->smp_lock, flags); @@ -4668,45 +4663,6 @@ static void gdth_flush(gdth_ha_str *ha) } } -/* shutdown routine */ -static int gdth_halt(struct notifier_block *nb, ulong event, void *buf) -{ - gdth_ha_str *ha; -#ifndef __alpha__ - gdth_cmd_str gdtcmd; - char cmnd[MAX_COMMAND_SIZE]; -#endif - - if (notifier_disabled) - return NOTIFY_OK; - - TRACE2(("gdth_halt() event %d\n",(int)event)); - if (event != SYS_RESTART && event != SYS_HALT && event != SYS_POWER_OFF) - return NOTIFY_DONE; - - notifier_disabled = 1; - printk("GDT-HA: Flushing all host drives .. "); - list_for_each_entry(ha, &gdth_instances, list) { - gdth_flush(ha); - -#ifndef __alpha__ - /* controller reset */ - memset(cmnd, 0xff, MAX_COMMAND_SIZE); - gdtcmd.BoardNode = LOCALBOARD; - gdtcmd.Service = CACHESERVICE; - gdtcmd.OpCode = GDT_RESET; - TRACE2(("gdth_halt(): reset controller %d\n", ha->hanum)); - gdth_execute(ha->shost, &gdtcmd, cmnd, 10, NULL); -#endif - } - printk("Done.\n"); - -#ifdef GDTH_STATISTICS - del_timer(&gdth_timer); -#endif - return NOTIFY_OK; -} - /* configure lun */ static int gdth_slave_configure(struct scsi_device *sdev) { @@ -5141,13 +5097,13 @@ static void gdth_remove_one(gdth_ha_str *ha) scsi_remove_host(shp); + gdth_flush(ha); + if (ha->sdev) { scsi_free_host_dev(ha->sdev); ha->sdev = NULL; } - gdth_flush(ha); - if (shp->irq) free_irq(shp->irq,ha); @@ -5173,6 +5129,40 @@ static void gdth_remove_one(gdth_ha_str *ha) scsi_host_put(shp); } +static void gdth_shutdown(void); +static int gdth_halt(struct notifier_block *nb, ulong event, void *buf) +{ + TRACE2(("gdth_halt() event %d\n",(int)event)); + if (event != SYS_RESTART && event != SYS_HALT && event != SYS_POWER_OFF) + return NOTIFY_DONE; + + gdth_shutdown(); + return NOTIFY_OK; +} + +static struct notifier_block gdth_notifier = { + gdth_halt, NULL, 0 +}; + +bool gdth_shutdown_done; +static void gdth_shutdown() +{ + gdth_ha_str *ha; + if (gdth_shutdown_done) + return; + + gdth_shutdown_done = true; + unregister_chrdev(major,"gdth"); + unregister_reboot_notifier(&gdth_notifier); + +#ifdef GDTH_STATISTICS + del_timer_sync(&gdth_timer); +#endif + + list_for_each_entry(ha, &gdth_instances, list) + gdth_remove_one(ha); +} + static int __init gdth_init(void) { if (disable) { @@ -5185,6 +5175,7 @@ static int __init gdth_init(void) GDTH_VERSION_STR); /* initializations */ + gdth_shutdown_done = false; gdth_polling = TRUE; gdth_clear_events(); @@ -5235,7 +5226,6 @@ static int __init gdth_init(void) add_timer(&gdth_timer); #endif major = register_chrdev(0,"gdth", &gdth_fops); - notifier_disabled = 0; register_reboot_notifier(&gdth_notifier); gdth_polling = FALSE; return 0; @@ -5243,16 +5233,7 @@ static int __init gdth_init(void) static void __exit gdth_exit(void) { - gdth_ha_str *ha; - - list_for_each_entry(ha, &gdth_instances, list) - gdth_remove_one(ha); - -#ifdef GDTH_STATISTICS - del_timer(&gdth_timer); -#endif - unregister_chrdev(major,"gdth"); - unregister_reboot_notifier(&gdth_notifier); + gdth_shutdown(); } module_init(gdth_init); -- 1.5.3.3 ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash 2008-02-13 17:12 ` Boaz Harrosh @ 2008-02-13 17:36 ` James Bottomley 2008-02-14 10:49 ` Boaz Harrosh 0 siblings, 1 reply; 31+ messages in thread From: James Bottomley @ 2008-02-13 17:36 UTC (permalink / raw) To: Boaz Harrosh Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi, linux-kernel, Joerg Dorchain, Jon Chelton, Stefan Priebe - allied internet ag On Wed, 2008-02-13 at 19:12 +0200, Boaz Harrosh wrote: > On Wed, Feb 13 2008 at 19:03 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > > On Wed, 2008-02-13 at 18:50 +0200, Boaz Harrosh wrote: > >> On Wed, Feb 13 2008 at 18:45 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > >>> On Wed, 2008-02-13 at 18:33 +0200, Boaz Harrosh wrote: > >>>> On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh <bharrosh@panasas.com> wrote: > >>>>> On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > >>>>>> On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote: > >>>>>>> - gdth_flush(ha); > >>>>>>> - > >>>>>> This piece doesn't look right. gdth_flush() forces the internal cache > >>>>>> to disk backing. If you remove it, you're taking the chance that the > >>>>>> machine will be powered off without a writeback which can cause data > >>>>>> corruption. > >>>>>> > >>>>>> James > >>>>>> > >>>>> Yes. > >>>>> I have more problems reported, with exit, and am just sending one more patch that puts > >>>>> this back in. Which was tested. > >>>>> > >>>>> So I will resend this one plus one new one. > >>>>> > >>>>> Boaz > >>>>> > >>>> The gdth driver would do a register_reboot_notifier(&gdth_notifier); > >>>> to a gdth_halt() function, which would then redo half of what gdth_exit > >>>> does, and wrongly so, and crash. > >>>> > >>>> Are we guaranteed in todays kernel that modules .exit function be called > >>>> on an halt or reboot? If so then there is no need for duplications and > >>>> the gdth_halt() should go. > >>> No. The __exit section is actually discardable if you promise never to > >>> remove the module. > >>> > >> I don't understand please explain. > >> What does a driver need to do if it needs a consistent shutdown retine? > >> module or built in? unload or shutdown? > > > > It needs to register a reboot notifier, which gdth does. > > > > However, the notifier is only called on reboot, so it also needs to > > clean up correctly on module exit as well. > > > > The alternative for GDTH would be to process the SCSI SYNCHRONIZE CACHE > > command. That's done by a shutdown notifier from sd, so the correct > > thing would always get done; however it does mean the driver has to be > > in a condition to process the last sync cache command. > > > > For the quick fix, just keep the current infrastructure and put back the > > gdth_flush() command where it can be effective. > > > > James > > > > > Totally untested. > > --- > From: Boaz Harrosh <bharrosh@panasas.com> > Subject: [PATCH] gdth: bugfix for the at-exit problems > > gdth_exit would first remove all cards then stop the timer > and would not sync with the timer function. This caused a crash > in gdth_timer() when module was unloaded. > So del_timer_sync the timer before we delete the cards. > > also the reboot notifier function would crash. So unify > the exit and halt functions with a gdth_shutdown() that's > called by both. > > Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> > --- > drivers/scsi/gdth.c | 99 ++++++++++++++++++++------------------------------ > 1 files changed, 40 insertions(+), 59 deletions(-) > > diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c > index 8eb78be..7bb9b45 100644 > --- a/drivers/scsi/gdth.c > +++ b/drivers/scsi/gdth.c > @@ -183,7 +183,6 @@ static int gdth_ioctl(struct inode *inode, struct file *filep, > unsigned int cmd, unsigned long arg); > > static void gdth_flush(gdth_ha_str *ha); > -static int gdth_halt(struct notifier_block *nb, ulong event, void *buf); > static int gdth_queuecommand(Scsi_Cmnd *scp,void (*done)(Scsi_Cmnd *)); > static int __gdth_queuecommand(gdth_ha_str *ha, struct scsi_cmnd *scp, > struct gdth_cmndinfo *cmndinfo); > @@ -418,12 +417,6 @@ static inline void gdth_set_sglist(struct scsi_cmnd *cmd, > #include "gdth_proc.h" > #include "gdth_proc.c" > > -/* notifier block to get a notify on system shutdown/halt/reboot */ > -static struct notifier_block gdth_notifier = { > - gdth_halt, NULL, 0 > -}; > -static int notifier_disabled = 0; > - > static gdth_ha_str *gdth_find_ha(int hanum) > { > gdth_ha_str *ha; > @@ -3793,6 +3786,8 @@ static void gdth_timeout(ulong data) > gdth_ha_str *ha; > ulong flags; > > + BUG_ON(list_empty(&gdth_instances)); > + > ha = list_first_entry(&gdth_instances, gdth_ha_str, list); > spin_lock_irqsave(&ha->smp_lock, flags); > > @@ -4668,45 +4663,6 @@ static void gdth_flush(gdth_ha_str *ha) > } > } > > -/* shutdown routine */ > -static int gdth_halt(struct notifier_block *nb, ulong event, void *buf) > -{ > - gdth_ha_str *ha; > -#ifndef __alpha__ > - gdth_cmd_str gdtcmd; > - char cmnd[MAX_COMMAND_SIZE]; > -#endif > - > - if (notifier_disabled) > - return NOTIFY_OK; > - > - TRACE2(("gdth_halt() event %d\n",(int)event)); > - if (event != SYS_RESTART && event != SYS_HALT && event != SYS_POWER_OFF) > - return NOTIFY_DONE; > - > - notifier_disabled = 1; > - printk("GDT-HA: Flushing all host drives .. "); > - list_for_each_entry(ha, &gdth_instances, list) { > - gdth_flush(ha); > - > -#ifndef __alpha__ > - /* controller reset */ > - memset(cmnd, 0xff, MAX_COMMAND_SIZE); > - gdtcmd.BoardNode = LOCALBOARD; > - gdtcmd.Service = CACHESERVICE; > - gdtcmd.OpCode = GDT_RESET; > - TRACE2(("gdth_halt(): reset controller %d\n", ha->hanum)); > - gdth_execute(ha->shost, &gdtcmd, cmnd, 10, NULL); > -#endif > - } > - printk("Done.\n"); > - > -#ifdef GDTH_STATISTICS > - del_timer(&gdth_timer); > -#endif > - return NOTIFY_OK; > -} > - > /* configure lun */ > static int gdth_slave_configure(struct scsi_device *sdev) > { > @@ -5141,13 +5097,13 @@ static void gdth_remove_one(gdth_ha_str *ha) > > scsi_remove_host(shp); > > + gdth_flush(ha); > + > if (ha->sdev) { > scsi_free_host_dev(ha->sdev); > ha->sdev = NULL; > } > > - gdth_flush(ha); > - > if (shp->irq) > free_irq(shp->irq,ha); > > @@ -5173,6 +5129,40 @@ static void gdth_remove_one(gdth_ha_str *ha) > scsi_host_put(shp); > } > > +static void gdth_shutdown(void); > +static int gdth_halt(struct notifier_block *nb, ulong event, void *buf) > +{ > + TRACE2(("gdth_halt() event %d\n",(int)event)); > + if (event != SYS_RESTART && event != SYS_HALT && event != SYS_POWER_OFF) > + return NOTIFY_DONE; > + > + gdth_shutdown(); > + return NOTIFY_OK; > +} > + > +static struct notifier_block gdth_notifier = { > + gdth_halt, NULL, 0 > +}; > + > +bool gdth_shutdown_done; Static police alert! Just make it static and move it into gdth_shutdown() > +static void gdth_shutdown() > +{ > + gdth_ha_str *ha; > + if (gdth_shutdown_done) > + return; > + > + gdth_shutdown_done = true; > + unregister_chrdev(major,"gdth"); > + unregister_reboot_notifier(&gdth_notifier); I'm not sure you can do this, aren't reboot notifiers called with the rwsem held? In which case the unregister which also takes the rwsem will hang the system. James ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash 2008-02-13 17:36 ` James Bottomley @ 2008-02-14 10:49 ` Boaz Harrosh 2008-02-14 11:58 ` [PATCH] gdth: bugfix for the at-exit problems Boaz Harrosh 0 siblings, 1 reply; 31+ messages in thread From: Boaz Harrosh @ 2008-02-14 10:49 UTC (permalink / raw) To: James Bottomley Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi, linux-kernel, Joerg Dorchain, Jon Chelton, Stefan Priebe - allied internet ag On Wed, Feb 13 2008 at 19:36 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote: <snip> >> --- >> From: Boaz Harrosh <bharrosh@panasas.com> >> Subject: [PATCH] gdth: bugfix for the at-exit problems >> >> gdth_exit would first remove all cards then stop the timer >> and would not sync with the timer function. This caused a crash >> in gdth_timer() when module was unloaded. >> So del_timer_sync the timer before we delete the cards. >> >> also the reboot notifier function would crash. So unify >> the exit and halt functions with a gdth_shutdown() that's >> called by both. >> >> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> >> --- <snip> >> +static struct notifier_block gdth_notifier = { >> + gdth_halt, NULL, 0 >> +}; >> + >> +bool gdth_shutdown_done; > right forgot the static. But I use it in gdth_init(), so it must be external. Unless you promise me that gdth_init() will never ever be called after a call to shutdown. Any way the hot-plug patch changes all that. This is only for 2.6.24 bugfixs. > Static police alert! Just make it static and move it into > gdth_shutdown() > >> +static void gdth_shutdown() >> +{ >> + gdth_ha_str *ha; >> + if (gdth_shutdown_done) >> + return; >> + >> + gdth_shutdown_done = true; >> + unregister_chrdev(major,"gdth"); >> + unregister_reboot_notifier(&gdth_notifier); > > I'm not sure you can do this, aren't reboot notifiers called with the > rwsem held? In which case the unregister which also takes the rwsem > will hang the system. > humm, can't remove a notifier from within the notifier. Thanks James for the catch, it's what happens when you don't test your own patches. I have moved unregister_reboot_notifier to gdth_exit. > James > Will send a new version for review. Please note that this is a bugfix patch on top of 2.6.24. It is not needed for Jeff's hot-plug path. There will be one more bugfix patch for a crash at the user-mode ioctl code. Boaz ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH] gdth: bugfix for the at-exit problems 2008-02-14 10:49 ` Boaz Harrosh @ 2008-02-14 11:58 ` Boaz Harrosh 2008-02-14 16:10 ` James Bottomley 0 siblings, 1 reply; 31+ messages in thread From: Boaz Harrosh @ 2008-02-14 11:58 UTC (permalink / raw) To: James Bottomley Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi, linux-kernel, Joerg Dorchain, Jon Chelton, Stefan Priebe - allied internet ag This is a bugfix for the 2.6.24.x stable releases. gdth_exit would first remove all cards then stop the timer and would not sync with the timer function. This caused a crash in gdth_timer() when module was unloaded. So del_timer_sync the timer before we delete the cards. also the reboot notifier function would crash. So unify the exit and halt functions with a gdth_shutdown() that's called by both. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> --- drivers/scsi/gdth.c | 90 ++++++++++++++++++++------------------------------ 1 files changed, 36 insertions(+), 54 deletions(-) diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c index 8eb78be..3828b23 100644 --- a/drivers/scsi/gdth.c +++ b/drivers/scsi/gdth.c @@ -183,7 +183,6 @@ static int gdth_ioctl(struct inode *inode, struct file *filep, unsigned int cmd, unsigned long arg); static void gdth_flush(gdth_ha_str *ha); -static int gdth_halt(struct notifier_block *nb, ulong event, void *buf); static int gdth_queuecommand(Scsi_Cmnd *scp,void (*done)(Scsi_Cmnd *)); static int __gdth_queuecommand(gdth_ha_str *ha, struct scsi_cmnd *scp, struct gdth_cmndinfo *cmndinfo); @@ -418,12 +417,6 @@ static inline void gdth_set_sglist(struct scsi_cmnd *cmd, #include "gdth_proc.h" #include "gdth_proc.c" -/* notifier block to get a notify on system shutdown/halt/reboot */ -static struct notifier_block gdth_notifier = { - gdth_halt, NULL, 0 -}; -static int notifier_disabled = 0; - static gdth_ha_str *gdth_find_ha(int hanum) { gdth_ha_str *ha; @@ -3793,6 +3786,8 @@ static void gdth_timeout(ulong data) gdth_ha_str *ha; ulong flags; + BUG_ON(list_empty(&gdth_instances)); + ha = list_first_entry(&gdth_instances, gdth_ha_str, list); spin_lock_irqsave(&ha->smp_lock, flags); @@ -4668,45 +4663,6 @@ static void gdth_flush(gdth_ha_str *ha) } } -/* shutdown routine */ -static int gdth_halt(struct notifier_block *nb, ulong event, void *buf) -{ - gdth_ha_str *ha; -#ifndef __alpha__ - gdth_cmd_str gdtcmd; - char cmnd[MAX_COMMAND_SIZE]; -#endif - - if (notifier_disabled) - return NOTIFY_OK; - - TRACE2(("gdth_halt() event %d\n",(int)event)); - if (event != SYS_RESTART && event != SYS_HALT && event != SYS_POWER_OFF) - return NOTIFY_DONE; - - notifier_disabled = 1; - printk("GDT-HA: Flushing all host drives .. "); - list_for_each_entry(ha, &gdth_instances, list) { - gdth_flush(ha); - -#ifndef __alpha__ - /* controller reset */ - memset(cmnd, 0xff, MAX_COMMAND_SIZE); - gdtcmd.BoardNode = LOCALBOARD; - gdtcmd.Service = CACHESERVICE; - gdtcmd.OpCode = GDT_RESET; - TRACE2(("gdth_halt(): reset controller %d\n", ha->hanum)); - gdth_execute(ha->shost, &gdtcmd, cmnd, 10, NULL); -#endif - } - printk("Done.\n"); - -#ifdef GDTH_STATISTICS - del_timer(&gdth_timer); -#endif - return NOTIFY_OK; -} - /* configure lun */ static int gdth_slave_configure(struct scsi_device *sdev) { @@ -5141,13 +5097,13 @@ static void gdth_remove_one(gdth_ha_str *ha) scsi_remove_host(shp); + gdth_flush(ha); + if (ha->sdev) { scsi_free_host_dev(ha->sdev); ha->sdev = NULL; } - gdth_flush(ha); - if (shp->irq) free_irq(shp->irq,ha); @@ -5173,6 +5129,22 @@ static void gdth_remove_one(gdth_ha_str *ha) scsi_host_put(shp); } +static void gdth_shutdown(void); +static int gdth_halt(struct notifier_block *nb, ulong event, void *buf) +{ + TRACE2(("gdth_halt() event %d\n", (int)event)); + if (event != SYS_RESTART && event != SYS_HALT && event != SYS_POWER_OFF) + return NOTIFY_DONE; + + gdth_shutdown(); + return NOTIFY_OK; +} + +static struct notifier_block gdth_notifier = { + gdth_halt, NULL, 0 +}; +static bool gdth_shutdown_done; + static int __init gdth_init(void) { if (disable) { @@ -5185,6 +5157,7 @@ static int __init gdth_init(void) GDTH_VERSION_STR); /* initializations */ + gdth_shutdown_done = false; gdth_polling = TRUE; gdth_clear_events(); @@ -5235,23 +5208,32 @@ static int __init gdth_init(void) add_timer(&gdth_timer); #endif major = register_chrdev(0,"gdth", &gdth_fops); - notifier_disabled = 0; register_reboot_notifier(&gdth_notifier); gdth_polling = FALSE; return 0; } -static void __exit gdth_exit(void) +static void gdth_shutdown() { gdth_ha_str *ha; - list_for_each_entry(ha, &gdth_instances, list) - gdth_remove_one(ha); + if (gdth_shutdown_done) + return; + + gdth_shutdown_done = true; + unregister_chrdev(major, "gdth"); #ifdef GDTH_STATISTICS - del_timer(&gdth_timer); + del_timer_sync(&gdth_timer); #endif - unregister_chrdev(major,"gdth"); + + list_for_each_entry(ha, &gdth_instances, list) + gdth_remove_one(ha); +} + +static void __exit gdth_exit(void) +{ + gdth_shutdown(); unregister_reboot_notifier(&gdth_notifier); } -- 1.5.3.3 ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] gdth: bugfix for the at-exit problems 2008-02-14 11:58 ` [PATCH] gdth: bugfix for the at-exit problems Boaz Harrosh @ 2008-02-14 16:10 ` James Bottomley 2008-02-14 16:18 ` Boaz Harrosh 0 siblings, 1 reply; 31+ messages in thread From: James Bottomley @ 2008-02-14 16:10 UTC (permalink / raw) To: Boaz Harrosh Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi, linux-kernel, Joerg Dorchain, Jon Chelton, Stefan Priebe - allied internet ag On Thu, 2008-02-14 at 13:58 +0200, Boaz Harrosh wrote: > This is a bugfix for the 2.6.24.x stable releases. > > gdth_exit would first remove all cards then stop the timer > and would not sync with the timer function. This caused a crash > in gdth_timer() when module was unloaded. > So del_timer_sync the timer before we delete the cards. > > also the reboot notifier function would crash. So unify > the exit and halt functions with a gdth_shutdown() that's > called by both. The patch looks fine now, thanks. Can we actually get a tester just to make sure there's nothing I missed. James ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] gdth: bugfix for the at-exit problems 2008-02-14 16:10 ` James Bottomley @ 2008-02-14 16:18 ` Boaz Harrosh 0 siblings, 0 replies; 31+ messages in thread From: Boaz Harrosh @ 2008-02-14 16:18 UTC (permalink / raw) To: James Bottomley Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi, linux-kernel, Joerg Dorchain, Jon Chelton, Stefan Priebe - allied internet ag On Thu, Feb 14 2008 at 18:10 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > On Thu, 2008-02-14 at 13:58 +0200, Boaz Harrosh wrote: >> This is a bugfix for the 2.6.24.x stable releases. >> >> gdth_exit would first remove all cards then stop the timer >> and would not sync with the timer function. This caused a crash >> in gdth_timer() when module was unloaded. >> So del_timer_sync the timer before we delete the cards. >> >> also the reboot notifier function would crash. So unify >> the exit and halt functions with a gdth_shutdown() that's >> called by both. > > The patch looks fine now, thanks. Can we actually get a tester just to > make sure there's nothing I missed. > > James > > Yes, and the tester reported, a breakage. We are on it. Apparently, you cannot do a full deallocation of resources at reboot notifier, nor would you want to I guess. But you can do the flush. The exit call is never called on a reboot and the card access is valid to the end. Please comment? So I pretty much reverted that patch, but did leave some cleanups. Also we found the other problems reported with user-mode tools and cat /proc/sys/gdth/0 so 2 patches on the way above reverted. Give us a few ours to test every thing. Thanks Boaz ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash 2008-02-13 17:03 ` James Bottomley 2008-02-13 17:12 ` Boaz Harrosh @ 2008-02-13 17:18 ` Boaz Harrosh 2008-02-13 17:33 ` James Bottomley 2008-02-14 6:51 ` Christoph Hellwig 2 siblings, 1 reply; 31+ messages in thread From: Boaz Harrosh @ 2008-02-13 17:18 UTC (permalink / raw) To: James Bottomley Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi, linux-kernel, Joerg Dorchain, Jon Chelton, Stefan Priebe - allied internet ag On Wed, Feb 13 2008 at 19:03 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > On Wed, 2008-02-13 at 18:50 +0200, Boaz Harrosh wrote: >> On Wed, Feb 13 2008 at 18:45 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote: >>> On Wed, 2008-02-13 at 18:33 +0200, Boaz Harrosh wrote: >>>> On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh <bharrosh@panasas.com> wrote: >>>>> On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote: >>>>>> On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote: >>>>>>> - gdth_flush(ha); >>>>>>> - >>>>>> This piece doesn't look right. gdth_flush() forces the internal cache >>>>>> to disk backing. If you remove it, you're taking the chance that the >>>>>> machine will be powered off without a writeback which can cause data >>>>>> corruption. >>>>>> >>>>>> James >>>>>> >>>>> Yes. >>>>> I have more problems reported, with exit, and am just sending one more patch that puts >>>>> this back in. Which was tested. >>>>> >>>>> So I will resend this one plus one new one. >>>>> >>>>> Boaz >>>>> >>>> The gdth driver would do a register_reboot_notifier(&gdth_notifier); >>>> to a gdth_halt() function, which would then redo half of what gdth_exit >>>> does, and wrongly so, and crash. >>>> >>>> Are we guaranteed in todays kernel that modules .exit function be called >>>> on an halt or reboot? If so then there is no need for duplications and >>>> the gdth_halt() should go. >>> No. The __exit section is actually discardable if you promise never to >>> remove the module. >>> >> I don't understand please explain. >> What does a driver need to do if it needs a consistent shutdown retine? >> module or built in? unload or shutdown? > > It needs to register a reboot notifier, which gdth does. > > However, the notifier is only called on reboot, so it also needs to > clean up correctly on module exit as well. > > The alternative for GDTH would be to process the SCSI SYNCHRONIZE CACHE Why would we think that the controller does not support this command is it not in the mandatory section of the standard? > command. That's done by a shutdown notifier from sd, so the correct > thing would always get done; however it does mean the driver has to be > in a condition to process the last sync cache command. Why would it not be ready? what do other drivers do? The drivers is ready until the very last module's .exit. Is that good enough? > > For the quick fix, just keep the current infrastructure and put back the > gdth_flush() command where it can be effective. > Just did. But if needed I would prefer to emulate the SCSI SYNCHRONIZE CACHE command and not that boot notifier thing. Please advise. > James > > Boaz ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash 2008-02-13 17:18 ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash Boaz Harrosh @ 2008-02-13 17:33 ` James Bottomley 0 siblings, 0 replies; 31+ messages in thread From: James Bottomley @ 2008-02-13 17:33 UTC (permalink / raw) To: Boaz Harrosh Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi, linux-kernel, Joerg Dorchain, Jon Chelton, Stefan Priebe - allied internet ag On Wed, 2008-02-13 at 19:18 +0200, Boaz Harrosh wrote: > On Wed, Feb 13 2008 at 19:03 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > > It needs to register a reboot notifier, which gdth does. > > > > However, the notifier is only called on reboot, so it also needs to > > clean up correctly on module exit as well. > > > > The alternative for GDTH would be to process the SCSI SYNCHRONIZE CACHE > > Why would we think that the controller does not support this command > is it not in the mandatory section of the standard? Um, because the controller isn't a SCSI device. It's an emulated device which means the SCSI comands are processed in the driver. It does look like the driver<->HBA communication is some sort of translated dialect of SCSI. > > command. That's done by a shutdown notifier from sd, so the correct > > thing would always get done; however it does mean the driver has to be > > in a condition to process the last sync cache command. > > Why would it not be ready? what do other drivers do? > The drivers is ready until the very last module's .exit. Is that good > enough? shutdown is called as part of device removal and module unload ... usually from scsi_remove_host(). So you can't tear down command processing before that point. > > > > For the quick fix, just keep the current infrastructure and put back the > > gdth_flush() command where it can be effective. > > > > Just did. But if needed I would prefer to emulate the SCSI SYNCHRONIZE CACHE > command and not that boot notifier thing. Please advise. I think such a change, though desirable, would be too large to count as a bug fix. James ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash 2008-02-13 17:03 ` James Bottomley 2008-02-13 17:12 ` Boaz Harrosh 2008-02-13 17:18 ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash Boaz Harrosh @ 2008-02-14 6:51 ` Christoph Hellwig 2 siblings, 0 replies; 31+ messages in thread From: Christoph Hellwig @ 2008-02-14 6:51 UTC (permalink / raw) To: James Bottomley Cc: Boaz Harrosh, Sven K?hler, Christoph Hellwig, Jeff Garzik, linux-scsi, linux-kernel, Joerg Dorchain, Jon Chelton, Stefan Priebe - allied internet ag On Wed, Feb 13, 2008 at 11:03:45AM -0600, James Bottomley wrote: > > I don't understand please explain. > > What does a driver need to do if it needs a consistent shutdown retine? > > module or built in? unload or shutdown? > > It needs to register a reboot notifier, which gdth does. Well, for crappy legacy driver that's the way, but it's not really recommended. As soon as a driver uses the proper driver models, e.g. gdth for pci using Jeff's pci hotplug patches it can just implement the ->shutdown method that is called before shutdown/kexec and can do the right thing. ^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2008-02-14 16:20 UTC | newest] Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2008-01-30 19:47 [BUG?] GDTH driver not working after upgrade to 2.6.24 Sven Köhler 2008-01-31 10:08 ` Boaz Harrosh 2008-01-31 11:07 ` Sven Köhler 2008-01-31 12:35 ` Boaz Harrosh 2008-01-31 16:39 ` Jan Engelhardt 2008-01-31 16:52 ` Boaz Harrosh 2008-02-12 17:30 ` [BUGFIXES 0/2] gdth: fix 2.6.24 driver breakage Boaz Harrosh 2008-02-12 17:35 ` [BUGFIX 1/2] gdth: scan for scsi devices Boaz Harrosh 2008-02-12 18:05 ` Christoph Hellwig 2008-02-12 17:40 ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash Boaz Harrosh 2008-02-13 7:06 ` Stefan Priebe - allied internet ag 2008-02-13 9:03 ` Boaz Harrosh 2008-02-13 19:38 ` Jan Engelhardt 2008-02-14 15:58 ` Boaz Harrosh 2008-02-13 10:48 ` Boaz Harrosh 2008-02-13 15:44 ` James Bottomley 2008-02-13 15:54 ` Boaz Harrosh 2008-02-13 16:33 ` Boaz Harrosh 2008-02-13 16:35 ` [PATCH ver2] gdth: bugfix for the at-exit problems Boaz Harrosh 2008-02-13 16:45 ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash James Bottomley 2008-02-13 16:50 ` Boaz Harrosh 2008-02-13 17:03 ` James Bottomley 2008-02-13 17:12 ` Boaz Harrosh 2008-02-13 17:36 ` James Bottomley 2008-02-14 10:49 ` Boaz Harrosh 2008-02-14 11:58 ` [PATCH] gdth: bugfix for the at-exit problems Boaz Harrosh 2008-02-14 16:10 ` James Bottomley 2008-02-14 16:18 ` Boaz Harrosh 2008-02-13 17:18 ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash Boaz Harrosh 2008-02-13 17:33 ` James Bottomley 2008-02-14 6:51 ` Christoph Hellwig
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).