LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [BUG?] GDTH driver not working after upgrade to 2.6.24
@ 2008-01-30 19:47 Sven Köhler
  2008-01-31 10:08 ` Boaz Harrosh
  0 siblings, 1 reply; 31+ messages in thread
From: Sven Köhler @ 2008-01-30 19:47 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1805 bytes --]

Hi,

so i have upgraded a system to kernel 2.6.24. After that, it failed to 
boot with the usual message telling, that the rootfs on device /dev/sda1 
  cannot be mounted (a raid1 run by the controller below).

With 2.6.23.12, everything is working fine.

# lspci -v:

03:01.0 RAID bus controller: Intel Corporation RAID Controller
         Subsystem: Intel Corporation Unknown device 01db
         Flags: bus master, 66MHz, slow devsel, latency 64, IRQ 17
         Memory at ddffc000 (32-bit, prefetchable) [size=16K]
         [virtual] Expansion ROM at deef0000 [disabled] [size=32K]
         Capabilities: [80] Power Management version 2

# GDT-related dmesg output (2.6.23.12):

GDT-HA: Storage RAID Controller Driver. Version: 3.05
ACPI: PCI Interrupt 0000:03:01.0[A] -> GSI 24 (level, low) -> IRQ 17
GDT-HA: Found 1 PCI Storage RAID Controllers
Configuring GDT-PCI HA at 3/1 IRQ 17
GDT-HA 0: Name: SRCU42L
scsi0 : SRCU42L
scsi 0:0:0:0: Direct-Access     Intel    Host Drive  #00       PQ: 0 ANSI: 2
scsi 0:2:6:0: Processor         ESG-SHV  SCA HSBP M29     1.06 PQ: 0 ANSI: 2
sd 0:0:0:0: [sda] 143299800 512-byte hardware sectors (73369 MB)
sd 0:0:0:0: [sda] Assuming Write Enabled
sd 0:0:0:0: [sda] Assuming drive cache: write through
sd 0:0:0:0: [sda] 143299800 512-byte hardware sectors (73369 MB)
sd 0:0:0:0: [sda] Assuming Write Enabled
sd 0:0:0:0: [sda] Assuming drive cache: write through
  sda: sda1 sda2 < sda5 >
sd 0:0:0:0: [sda] Attached SCSI disk

# cat /boot/config-2.6.24 |grep GDT

CONFIG_SCSI_GDTH=y




Any ideas?

http://www.kernel.org/diff/diffview.cgi?file=%2Fpub%2Flinux%2Fkernel%2Fv2.6%2Fpatch-2.6.24.bz2 
show huge drivers/scsi/gdth* related changes.

Can't test at the moment. System went production.


Regards,
   Sven


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUG?] GDTH driver not working after upgrade to 2.6.24
  2008-01-30 19:47 [BUG?] GDTH driver not working after upgrade to 2.6.24 Sven Köhler
@ 2008-01-31 10:08 ` Boaz Harrosh
  2008-01-31 11:07   ` Sven Köhler
  2008-02-12 17:30   ` [BUGFIXES 0/2] gdth: fix 2.6.24 driver breakage Boaz Harrosh
  0 siblings, 2 replies; 31+ messages in thread
From: Boaz Harrosh @ 2008-01-31 10:08 UTC (permalink / raw)
  To: Sven Köhler, linux-scsi; +Cc: linux-kernel

On Wed, Jan 30 2008 at 21:47 +0200, Sven Köhler <skoehler@upb.de> wrote:
> Hi,
> 
> so i have upgraded a system to kernel 2.6.24. After that, it failed to 
> boot with the usual message telling, that the rootfs on device /dev/sda1 
>   cannot be mounted (a raid1 run by the controller below).
> 
> With 2.6.23.12, everything is working fine.
> 
> # lspci -v:
> 
> 03:01.0 RAID bus controller: Intel Corporation RAID Controller
>          Subsystem: Intel Corporation Unknown device 01db
>          Flags: bus master, 66MHz, slow devsel, latency 64, IRQ 17
>          Memory at ddffc000 (32-bit, prefetchable) [size=16K]
>          [virtual] Expansion ROM at deef0000 [disabled] [size=32K]
>          Capabilities: [80] Power Management version 2
> 
> # GDT-related dmesg output (2.6.23.12):
> 
> GDT-HA: Storage RAID Controller Driver. Version: 3.05
> ACPI: PCI Interrupt 0000:03:01.0[A] -> GSI 24 (level, low) -> IRQ 17
> GDT-HA: Found 1 PCI Storage RAID Controllers
> Configuring GDT-PCI HA at 3/1 IRQ 17
> GDT-HA 0: Name: SRCU42L
> scsi0 : SRCU42L
> scsi 0:0:0:0: Direct-Access     Intel    Host Drive  #00       PQ: 0 ANSI: 2
> scsi 0:2:6:0: Processor         ESG-SHV  SCA HSBP M29     1.06 PQ: 0 ANSI: 2
> sd 0:0:0:0: [sda] 143299800 512-byte hardware sectors (73369 MB)
> sd 0:0:0:0: [sda] Assuming Write Enabled
> sd 0:0:0:0: [sda] Assuming drive cache: write through
> sd 0:0:0:0: [sda] 143299800 512-byte hardware sectors (73369 MB)
> sd 0:0:0:0: [sda] Assuming Write Enabled
> sd 0:0:0:0: [sda] Assuming drive cache: write through
>   sda: sda1 sda2 < sda5 >
> sd 0:0:0:0: [sda] Attached SCSI disk
> 
> # cat /boot/config-2.6.24 |grep GDT
> 
> CONFIG_SCSI_GDTH=y
> 
> 
> 
> 
> Any ideas?
> 
> http://www.kernel.org/diff/diffview.cgi?file=%2Fpub%2Flinux%2Fkernel%2Fv2.6%2Fpatch-2.6.24.bz2 
> show huge drivers/scsi/gdth* related changes.
> 
> Can't test at the moment. System went production.
> 
> 
> Regards,
>    Sven
> 

Hi Sven!

CCing to the scsi mailing list. 

Yes the gdth driver passed an open hart surgery in kernel 2.6.24. The bad thing
about it is that all three of the Coders that did that did not have any HW to work
on. One of them is me. We did cry for tester for a long time but no one came forward.

Could you test patches for us? first thing would be to enable debug output patch below.

If you absolutely need a 2.6.24 kernel, + gdth in a production system you could
checkout the 2.6.23 driver and compile. The old driver will work the same in 2.6.24.
It will not however even compile in 2.6.25-rcx.

If any one wants to send me a card that uses the gdth driver, I will be very happy
to debug this card, and return it once I'm done.

Boaz

---
git-diff --stat -p
 drivers/scsi/gdth.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c
index c825239..eca72c4 100644
--- a/drivers/scsi/gdth.c
+++ b/drivers/scsi/gdth.c
@@ -188,6 +188,7 @@ static int __gdth_queuecommand(gdth_ha_str *ha, struct scsi_cmnd *scp,
 				struct gdth_cmndinfo *cmndinfo);
 static void gdth_scsi_done(struct scsi_cmnd *scp);
 
+#define DEBUG_GDTH 1
 #ifdef DEBUG_GDTH
 static unchar   DebugState = DEBUG_GDTH;
 




 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUG?] GDTH driver not working after upgrade to 2.6.24
  2008-01-31 10:08 ` Boaz Harrosh
@ 2008-01-31 11:07   ` Sven Köhler
  2008-01-31 12:35     ` Boaz Harrosh
  2008-02-12 17:30   ` [BUGFIXES 0/2] gdth: fix 2.6.24 driver breakage Boaz Harrosh
  1 sibling, 1 reply; 31+ messages in thread
From: Sven Köhler @ 2008-01-31 11:07 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: linux-scsi, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1580 bytes --]

> Yes the gdth driver passed an open hart surgery in kernel 2.6.24. The bad thing
> about it is that all three of the Coders that did that did not have any HW to work
> on. One of them is me. We did cry for tester for a long time but no one came forward.

All i'd like to ask is WHY!?
WHY such a big open heart surgery?
OK, you have your reasons.

> Could you test patches for us? first thing would be to enable debug output patch below.

The machine is still production. It will be replaced by a 64bit system 
with an AACRAID card some time in the future. Then, i could maybe test 
patches on the old machine. But unfortunatly, it's not my machine. It's 
just administrated by me. And i don't know, what's exactly is the future 
of it.

> If you absolutely need a 2.6.24 kernel, + gdth in a production system you could
> checkout the 2.6.23 driver and compile. The old driver will work the same in 2.6.24.
> It will not however even compile in 2.6.25-rcx.

I see. Thanks for the hint. Would be an alternative. Actually, i don't 
need 2.6.24 - but if something security related is fixed, then i'm not 
able to move on to 2.6.24. And i'm using Gentoo. There's not really a 
kernel maintained by the distribution.

> If any one wants to send me a card that uses the gdth driver, I will be very happy
> to debug this card, and return it once I'm done.

I don't think, it's a seperate card, that i could send you. It's some 
19" rack server with such a card on-board, i think.


So it don't see much opportunity for me to test patched and stuff :-(


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUG?] GDTH driver not working after upgrade to 2.6.24
  2008-01-31 11:07   ` Sven Köhler
@ 2008-01-31 12:35     ` Boaz Harrosh
  2008-01-31 16:39       ` Jan Engelhardt
  0 siblings, 1 reply; 31+ messages in thread
From: Boaz Harrosh @ 2008-01-31 12:35 UTC (permalink / raw)
  To: Sven Köhler, James Bottomley, Matthew Wilcox
  Cc: linux-scsi, linux-kernel

On Thu, Jan 31 2008 at 13:07 +0200, Sven Köhler <skoehler@upb.de> wrote:
>> Yes the gdth driver passed an open hart surgery in kernel 2.6.24. The bad thing
>> about it is that all three of the Coders that did that did not have any HW to work
>> on. One of them is me. We did cry for tester for a long time but no one came forward.
> 
> All i'd like to ask is WHY!?
> WHY such a big open heart surgery?
> OK, you have your reasons.
> 
>> Could you test patches for us? first thing would be to enable debug output patch below.
> 
> The machine is still production. It will be replaced by a 64bit system 
> with an AACRAID card some time in the future. Then, i could maybe test 
> patches on the old machine. But unfortunatly, it's not my machine. It's 
> just administrated by me. And i don't know, what's exactly is the future 
> of it.
> 
>> If you absolutely need a 2.6.24 kernel, + gdth in a production system you could
>> checkout the 2.6.23 driver and compile. The old driver will work the same in 2.6.24.
>> It will not however even compile in 2.6.25-rcx.
> 
> I see. Thanks for the hint. Would be an alternative. Actually, i don't 
> need 2.6.24 - but if something security related is fixed, then i'm not 
> able to move on to 2.6.24. And i'm using Gentoo. There's not really a 
> kernel maintained by the distribution.
> 
>> If any one wants to send me a card that uses the gdth driver, I will be very happy
>> to debug this card, and return it once I'm done.
> 
> I don't think, it's a seperate card, that i could send you. It's some 
> 19" rack server with such a card on-board, i think.
> 
> 
> So it don't see much opportunity for me to test patched and stuff :-(
> 

Thanks, Perhaps someone else then. 
Anyone with gdth HW that can test patches?
Your lspci said: "Intel Corporation RAID Controller" Matthew
is there a gdth card lying around in an Intel lab near you?

James do we need to mark gdth BROKEN for 2.6.24 and higher?

Boaz

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUG?] GDTH driver not working after upgrade to 2.6.24
  2008-01-31 12:35     ` Boaz Harrosh
@ 2008-01-31 16:39       ` Jan Engelhardt
  2008-01-31 16:52         ` Boaz Harrosh
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Engelhardt @ 2008-01-31 16:39 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Sven Köhler, James Bottomley, Matthew Wilcox, linux-scsi,
	linux-kernel

Hi,

On Jan 31 2008 14:35, Boaz Harrosh wrote:
>
>Thanks, Perhaps someone else then. 
>Anyone with gdth HW that can test patches?

Is bisecting down the existing chain and finding the bad commit 
sufficient? (I also take new patches.)

>Your lspci said: "Intel Corporation RAID Controller" Matthew
>is there a gdth card lying around in an Intel lab near you?
>
>James do we need to mark gdth BROKEN for 2.6.24 and higher?

I say revert it for the time being.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUG?] GDTH driver not working after upgrade to 2.6.24
  2008-01-31 16:39       ` Jan Engelhardt
@ 2008-01-31 16:52         ` Boaz Harrosh
  0 siblings, 0 replies; 31+ messages in thread
From: Boaz Harrosh @ 2008-01-31 16:52 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Sven Köhler, James Bottomley, Matthew Wilcox, linux-scsi,
	linux-kernel

On Thu, Jan 31 2008 at 18:39 +0200, Jan Engelhardt <jengelh@computergmbh.de> wrote:
> Hi,
> 
> On Jan 31 2008 14:35, Boaz Harrosh wrote:
>> Thanks, Perhaps someone else then. 
>> Anyone with gdth HW that can test patches?
> 
> Is bisecting down the existing chain and finding the bad commit 
> sufficient? (I also take new patches.)
Most certainly. If you are willing, please... I'm looking for someone
responsive.

first -  enable debug prints (see the original post)
         and send me the prints.
second - bisection could be grate yes.
third - accepting patches and testing could be grate, thanks.

> 
>> Your lspci said: "Intel Corporation RAID Controller" Matthew
>> is there a gdth card lying around in an Intel lab near you?
>>
>> James do we need to mark gdth BROKEN for 2.6.24 and higher?
> 
> I say revert it for the time being.

It could be reverted for 2.6.24.x  maintenance releases but for 2.6.25-xxx
it cannot as it will not compile, and the fix to that is what you see in code.

Boaz

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [BUGFIXES 0/2] gdth: fix 2.6.24 driver breakage 
  2008-01-31 10:08 ` Boaz Harrosh
  2008-01-31 11:07   ` Sven Köhler
@ 2008-02-12 17:30   ` Boaz Harrosh
  2008-02-12 17:35     ` [BUGFIX 1/2] gdth: scan for scsi devices Boaz Harrosh
  2008-02-12 17:40     ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash Boaz Harrosh
  1 sibling, 2 replies; 31+ messages in thread
From: Boaz Harrosh @ 2008-02-12 17:30 UTC (permalink / raw)
  To: Sven Köhler, linux-scsi, James Bottomley, Christoph Hellwig,
	Jeff Garzik
  Cc: linux-kernel, Joerg Dorchain, Jon Chelton,
	Stefan Priebe - allied internet ag

On Thu, Jan 31 2008 at 12:08 +0200, Boaz Harrosh <bharrosh@panasas.com> wrote:
> On Wed, Jan 30 2008 at 21:47 +0200, Sven Köhler <skoehler@upb.de> wrote:
>> Hi,
>>
>> so i have upgraded a system to kernel 2.6.24. After that, it failed to 
>> boot with the usual message telling, that the rootfs on device /dev/sda1 
>>   cannot be mounted (a raid1 run by the controller below).
>>
>> With 2.6.23.12, everything is working fine.
>>
>> # lspci -v:
>>
>> 03:01.0 RAID bus controller: Intel Corporation RAID Controller
>>          Subsystem: Intel Corporation Unknown device 01db
>>          Flags: bus master, 66MHz, slow devsel, latency 64, IRQ 17
>>          Memory at ddffc000 (32-bit, prefetchable) [size=16K]
>>          [virtual] Expansion ROM at deef0000 [disabled] [size=32K]
>>          Capabilities: [80] Power Management version 2
>>
>> # GDT-related dmesg output (2.6.23.12):
>>
>> GDT-HA: Storage RAID Controller Driver. Version: 3.05
>> ACPI: PCI Interrupt 0000:03:01.0[A] -> GSI 24 (level, low) -> IRQ 17
>> GDT-HA: Found 1 PCI Storage RAID Controllers
>> Configuring GDT-PCI HA at 3/1 IRQ 17
>> GDT-HA 0: Name: SRCU42L
>> scsi0 : SRCU42L
>> scsi 0:0:0:0: Direct-Access     Intel    Host Drive  #00       PQ: 0 ANSI: 2
>> scsi 0:2:6:0: Processor         ESG-SHV  SCA HSBP M29     1.06 PQ: 0 ANSI: 2
>> sd 0:0:0:0: [sda] 143299800 512-byte hardware sectors (73369 MB)
>> sd 0:0:0:0: [sda] Assuming Write Enabled
>> sd 0:0:0:0: [sda] Assuming drive cache: write through
>> sd 0:0:0:0: [sda] 143299800 512-byte hardware sectors (73369 MB)
>> sd 0:0:0:0: [sda] Assuming Write Enabled
>> sd 0:0:0:0: [sda] Assuming drive cache: write through
>>   sda: sda1 sda2 < sda5 >
>> sd 0:0:0:0: [sda] Attached SCSI disk
>>
>> # cat /boot/config-2.6.24 |grep GDT
>>
>> CONFIG_SCSI_GDTH=y
>>
>>
>>
>>
>> Any ideas?
>>
>> http://www.kernel.org/diff/diffview.cgi?file=%2Fpub%2Flinux%2Fkernel%2Fv2.6%2Fpatch-2.6.24.bz2 
>> show huge drivers/scsi/gdth* related changes.
>>
>> Can't test at the moment. System went production.
>>
>>
>> Regards,
>>    Sven
>>
> 
> Hi Sven!
<snip>

With the kind help of:
    Joerg Dorchain: <joerg@dorchain.net>
    Jon Chelton <jchelton@ffpglobal.com>
    Stefan Priebe <s.priebe@allied-internet.ag>

Which let me take up their machines their effort and their time
I hope I'm able to fix the gdth driver for the 2.6.24 kernel and forward.
Actually it was a simple miss by Christoph, but with my inexperience
it took me a bisection and a while to get it.

Both Joerg, and Stefan where able to boot with these patches and work
on their machine. Jon Chelton's disks array should also work, we're testing.

Submitted 2 patches. They should also be included after some testing
into the 2.6.24.x stable releases. (Will be posted after some more testing)

[PATCH 1/2] gdth: scan for scsi devices
 simple but must fatal.

[PATCH 2/2] gdth: bugfix for the Timer at exit crash
  James please inspect and comment on this patch. It was not yet tested
  by the original bug submitter.

In the original gdth series Christoph has forgotten to add the call to
scsi_scan_host(). Jeff alternative patches did do the scan. After everything
was probed, the code would loop on all found cards and scan. However I like
to individually scan at each probe, because I think this way it is more ready
for the hot-plug API where the discovery is done outside of the driver, and the
probe is called on single host at a time. Is that right? please comment.

Test away.

Meany thanks to Joerg, Jon && Stefan, Cheers.

Boaz


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [BUGFIX 1/2] gdth: scan for scsi devices
  2008-02-12 17:30   ` [BUGFIXES 0/2] gdth: fix 2.6.24 driver breakage Boaz Harrosh
@ 2008-02-12 17:35     ` Boaz Harrosh
  2008-02-12 18:05       ` Christoph Hellwig
  2008-02-12 17:40     ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash Boaz Harrosh
  1 sibling, 1 reply; 31+ messages in thread
From: Boaz Harrosh @ 2008-02-12 17:35 UTC (permalink / raw)
  To: Sven Köhler, linux-scsi, James Bottomley, Christoph Hellwig,
	Jeff Garzik
  Cc: linux-scsi, James Bottomley, linux-kernel, Joerg Dorchain,
	Jon Chelton, Stefan Priebe - allied internet ag


The patch: "gdth: switch to modern scsi host registration"

missed one simple fact when moving a way from scsi_module.c.
That is to call scsi_scan_host() on the probed host.
With this the gdth driver from 2.6.24 is again able to
see drives and boot.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Tested-by: Joerg Dorchain: <joerg@dorchain.net>
Tested-by: Stefan Priebe <s.priebe@allied-internet.ag>
Tested-by: Jon Chelton <jchelton@ffpglobal.com>
---
 drivers/scsi/gdth.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c
index b253b8c..8eb78be 100644
--- a/drivers/scsi/gdth.c
+++ b/drivers/scsi/gdth.c
@@ -4838,6 +4838,9 @@ static int __init gdth_isa_probe_one(ulong32 isa_bios)
 	if (error)
 		goto out_free_coal_stat;
 	list_add_tail(&ha->list, &gdth_instances);
+
+	scsi_scan_host(shp);
+
 	return 0;
 
  out_free_coal_stat:
@@ -4965,6 +4968,9 @@ static int __init gdth_eisa_probe_one(ushort eisa_slot)
 	if (error)
 		goto out_free_coal_stat;
 	list_add_tail(&ha->list, &gdth_instances);
+
+	scsi_scan_host(shp);
+
 	return 0;
 
  out_free_ccb_phys:
@@ -5102,6 +5108,9 @@ static int __init gdth_pci_probe_one(gdth_pci_str *pcistr, int ctr)
 	if (error)
 		goto out_free_coal_stat;
 	list_add_tail(&ha->list, &gdth_instances);
+
+	scsi_scan_host(shp);
+
 	return 0;
 
  out_free_coal_stat:
-- 
1.5.3.3



^ permalink raw reply	[flat|nested] 31+ messages in thread

* [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
  2008-02-12 17:30   ` [BUGFIXES 0/2] gdth: fix 2.6.24 driver breakage Boaz Harrosh
  2008-02-12 17:35     ` [BUGFIX 1/2] gdth: scan for scsi devices Boaz Harrosh
@ 2008-02-12 17:40     ` Boaz Harrosh
  2008-02-13  7:06       ` Stefan Priebe - allied internet ag
                         ` (2 more replies)
  1 sibling, 3 replies; 31+ messages in thread
From: Boaz Harrosh @ 2008-02-12 17:40 UTC (permalink / raw)
  To: Sven Köhler, Christoph Hellwig, Jeff Garzik
  Cc: linux-scsi, James Bottomley, linux-kernel, Joerg Dorchain,
	Jon Chelton, Stefan Priebe - allied internet ag


gdth _exit would first remove all cards then stop the timer
and would not sync with the timer function. This caused a crash
in gdth_timer() when module was unloaded.

del_timer_sync the timer before we delete the cards.

NOT YET TESTED

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 drivers/scsi/gdth.c |   15 ++++++++-------
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c
index 8eb78be..103280e 100644
--- a/drivers/scsi/gdth.c
+++ b/drivers/scsi/gdth.c
@@ -3793,6 +3793,8 @@ static void gdth_timeout(ulong data)
     gdth_ha_str *ha;
     ulong flags;
 
+    BUG_ON(list_empty(&gdth_instances));
+
     ha = list_first_entry(&gdth_instances, gdth_ha_str, list);
     spin_lock_irqsave(&ha->smp_lock, flags);
 
@@ -5146,8 +5148,6 @@ static void gdth_remove_one(gdth_ha_str *ha)
 		ha->sdev = NULL;
 	}
 
-	gdth_flush(ha);
-
 	if (shp->irq)
 		free_irq(shp->irq,ha);
 
@@ -5245,14 +5245,15 @@ static void __exit gdth_exit(void)
 {
 	gdth_ha_str *ha;
 
-	list_for_each_entry(ha, &gdth_instances, list)
-		gdth_remove_one(ha);
+	unregister_chrdev(major,"gdth");
+	unregister_reboot_notifier(&gdth_notifier);
 
 #ifdef GDTH_STATISTICS
-	del_timer(&gdth_timer);
+	del_timer_sync(&gdth_timer);
 #endif
-	unregister_chrdev(major,"gdth");
-	unregister_reboot_notifier(&gdth_notifier);
+
+	list_for_each_entry(ha, &gdth_instances, list)
+		gdth_remove_one(ha);
 }
 
 module_init(gdth_init);
-- 
1.5.3.3



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX 1/2] gdth: scan for scsi devices
  2008-02-12 17:35     ` [BUGFIX 1/2] gdth: scan for scsi devices Boaz Harrosh
@ 2008-02-12 18:05       ` Christoph Hellwig
  0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2008-02-12 18:05 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Sven K?hler, linux-scsi, James Bottomley, Christoph Hellwig,
	Jeff Garzik, linux-scsi, James Bottomley, linux-kernel,
	Joerg Dorchain, Jon Chelton, Stefan Priebe - allied internet ag

On Tue, Feb 12, 2008 at 07:35:22PM +0200, Boaz Harrosh wrote:
> 
> The patch: "gdth: switch to modern scsi host registration"
> 
> missed one simple fact when moving a way from scsi_module.c.
> That is to call scsi_scan_host() on the probed host.
> With this the gdth driver from 2.6.24 is again able to
> see drives and boot.

Doh, someone please hand me a brown paper bag.  My first series
of patches had this but it got dropped when I rebased it over various
janitor cleanups.

The patch looks obviously correct, thanks.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
  2008-02-12 17:40     ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash Boaz Harrosh
@ 2008-02-13  7:06       ` Stefan Priebe - allied internet ag
  2008-02-13  9:03         ` Boaz Harrosh
  2008-02-13 10:48       ` Boaz Harrosh
  2008-02-13 15:44       ` James Bottomley
  2 siblings, 1 reply; 31+ messages in thread
From: Stefan Priebe - allied internet ag @ 2008-02-13  7:06 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi,
	James Bottomley, linux-kernel, Joerg Dorchain, Jon Chelton

Hello!

I've tested this patch now - and it works fine. Now rmmod, halt and 
reboot also works.

Stefan Priebe


Boaz Harrosh schrieb:
> gdth _exit would first remove all cards then stop the timer
> and would not sync with the timer function. This caused a crash
> in gdth_timer() when module was unloaded.
> 
> del_timer_sync the timer before we delete the cards.
> 
> NOT YET TESTED
> 
> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
> ---
>  drivers/scsi/gdth.c |   15 ++++++++-------
>  1 files changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c
> index 8eb78be..103280e 100644
> --- a/drivers/scsi/gdth.c
> +++ b/drivers/scsi/gdth.c
> @@ -3793,6 +3793,8 @@ static void gdth_timeout(ulong data)
>      gdth_ha_str *ha;
>      ulong flags;
>  
> +    BUG_ON(list_empty(&gdth_instances));
> +
>      ha = list_first_entry(&gdth_instances, gdth_ha_str, list);
>      spin_lock_irqsave(&ha->smp_lock, flags);
>  
> @@ -5146,8 +5148,6 @@ static void gdth_remove_one(gdth_ha_str *ha)
>  		ha->sdev = NULL;
>  	}
>  
> -	gdth_flush(ha);
> -
>  	if (shp->irq)
>  		free_irq(shp->irq,ha);
>  
> @@ -5245,14 +5245,15 @@ static void __exit gdth_exit(void)
>  {
>  	gdth_ha_str *ha;
>  
> -	list_for_each_entry(ha, &gdth_instances, list)
> -		gdth_remove_one(ha);
> +	unregister_chrdev(major,"gdth");
> +	unregister_reboot_notifier(&gdth_notifier);
>  
>  #ifdef GDTH_STATISTICS
> -	del_timer(&gdth_timer);
> +	del_timer_sync(&gdth_timer);
>  #endif
> -	unregister_chrdev(major,"gdth");
> -	unregister_reboot_notifier(&gdth_notifier);
> +
> +	list_for_each_entry(ha, &gdth_instances, list)
> +		gdth_remove_one(ha);
>  }
>  
>  module_init(gdth_init);

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
  2008-02-13  7:06       ` Stefan Priebe - allied internet ag
@ 2008-02-13  9:03         ` Boaz Harrosh
  2008-02-13 19:38           ` Jan Engelhardt
  0 siblings, 1 reply; 31+ messages in thread
From: Boaz Harrosh @ 2008-02-13  9:03 UTC (permalink / raw)
  To: Stefan Priebe - allied internet ag
  Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi,
	James Bottomley, linux-kernel, Joerg Dorchain, Jon Chelton

On Wed, Feb 13 2008 at 9:06 +0200, Stefan Priebe - allied internet ag <s.priebe@allied-internet.ag> wrote:
> Hello!
> 
> I've tested this patch now - and it works fine. Now rmmod, halt and 
> reboot also works.
> 
> Stefan Priebe
> 
This is grate news Stefan. Thank you very much for all your time
and effort, with out we could not have fixed all this.

Boaz

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
  2008-02-12 17:40     ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash Boaz Harrosh
  2008-02-13  7:06       ` Stefan Priebe - allied internet ag
@ 2008-02-13 10:48       ` Boaz Harrosh
  2008-02-13 15:44       ` James Bottomley
  2 siblings, 0 replies; 31+ messages in thread
From: Boaz Harrosh @ 2008-02-13 10:48 UTC (permalink / raw)
  To: Sven Köhler, Christoph Hellwig, Jeff Garzik
  Cc: linux-scsi, James Bottomley, linux-kernel, Joerg Dorchain,
	Jon Chelton, Stefan Priebe - allied internet ag, Dave Milter

On Tue, Feb 12 2008 at 19:40 +0200, Boaz Harrosh <bharrosh@panasas.com> wrote:
> gdth _exit would first remove all cards then stop the timer
> and would not sync with the timer function. This caused a crash
> in gdth_timer() when module was unloaded.
> 
> del_timer_sync the timer before we delete the cards.
> 
> NOT YET TESTED
> 
> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>

Tested-by: Stefan Priebe <s.priebe@allied-internet.ag>

> ---
>  drivers/scsi/gdth.c |   15 ++++++++-------
>  1 files changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c
> index 8eb78be..103280e 100644
> --- a/drivers/scsi/gdth.c
> +++ b/drivers/scsi/gdth.c
> @@ -3793,6 +3793,8 @@ static void gdth_timeout(ulong data)
>      gdth_ha_str *ha;
>      ulong flags;
>  
> +    BUG_ON(list_empty(&gdth_instances));
> +
>      ha = list_first_entry(&gdth_instances, gdth_ha_str, list);
>      spin_lock_irqsave(&ha->smp_lock, flags);
>  
> @@ -5146,8 +5148,6 @@ static void gdth_remove_one(gdth_ha_str *ha)
>  		ha->sdev = NULL;
>  	}
>  
> -	gdth_flush(ha);
> -
>  	if (shp->irq)
>  		free_irq(shp->irq,ha);
>  
> @@ -5245,14 +5245,15 @@ static void __exit gdth_exit(void)
>  {
>  	gdth_ha_str *ha;
>  
> -	list_for_each_entry(ha, &gdth_instances, list)
> -		gdth_remove_one(ha);
> +	unregister_chrdev(major,"gdth");
> +	unregister_reboot_notifier(&gdth_notifier);
>  
>  #ifdef GDTH_STATISTICS
> -	del_timer(&gdth_timer);
> +	del_timer_sync(&gdth_timer);
>  #endif
> -	unregister_chrdev(major,"gdth");
> -	unregister_reboot_notifier(&gdth_notifier);
> +
> +	list_for_each_entry(ha, &gdth_instances, list)
> +		gdth_remove_one(ha);
>  }
>  
>  module_init(gdth_init);

James please put this patch in rc-fixes also. It has now been tested
by few people, and it solves a reproducible problem in the unloading
of the driver.

It was not yet confirmed by Andrew's reporter with the:
	+    if (list_empty(&gdth_instances))
	+	return;

at gdth_timer() In -mm tree. In my patch I have converted the if() to a 
BUG_ON because now it should not happen. But I figure it is not worse then 
what there is now, which is nothing.

With your recommendation I will push both patches to the stable branches
People have emailed me requesting it.

Thanks
Boaz
 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
  2008-02-12 17:40     ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash Boaz Harrosh
  2008-02-13  7:06       ` Stefan Priebe - allied internet ag
  2008-02-13 10:48       ` Boaz Harrosh
@ 2008-02-13 15:44       ` James Bottomley
  2008-02-13 15:54         ` Boaz Harrosh
  2 siblings, 1 reply; 31+ messages in thread
From: James Bottomley @ 2008-02-13 15:44 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi,
	linux-kernel, Joerg Dorchain, Jon Chelton,
	Stefan Priebe - allied internet ag

On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote:
> -	gdth_flush(ha);
> -

This piece doesn't look right.  gdth_flush() forces the internal cache
to disk backing.  If you remove it, you're taking the chance that the
machine will be powered off without a writeback which can cause data
corruption.

James



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
  2008-02-13 15:44       ` James Bottomley
@ 2008-02-13 15:54         ` Boaz Harrosh
  2008-02-13 16:33           ` Boaz Harrosh
  0 siblings, 1 reply; 31+ messages in thread
From: Boaz Harrosh @ 2008-02-13 15:54 UTC (permalink / raw)
  To: James Bottomley
  Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi,
	linux-kernel, Joerg Dorchain, Jon Chelton,
	Stefan Priebe - allied internet ag

On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote:
>> -	gdth_flush(ha);
>> -
> 
> This piece doesn't look right.  gdth_flush() forces the internal cache
> to disk backing.  If you remove it, you're taking the chance that the
> machine will be powered off without a writeback which can cause data
> corruption.
> 
> James
> 
Yes. 
I have more problems reported, with exit, and am just sending one more patch that puts
this back in. Which was tested.

So I will resend this one plus one new one.

Boaz

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
  2008-02-13 15:54         ` Boaz Harrosh
@ 2008-02-13 16:33           ` Boaz Harrosh
  2008-02-13 16:35             ` [PATCH ver2] gdth: bugfix for the at-exit problems Boaz Harrosh
  2008-02-13 16:45             ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash James Bottomley
  0 siblings, 2 replies; 31+ messages in thread
From: Boaz Harrosh @ 2008-02-13 16:33 UTC (permalink / raw)
  To: James Bottomley
  Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi,
	linux-kernel, Joerg Dorchain, Jon Chelton,
	Stefan Priebe - allied internet ag

On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh <bharrosh@panasas.com> wrote:
> On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>> On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote:
>>> -	gdth_flush(ha);
>>> -
>> This piece doesn't look right.  gdth_flush() forces the internal cache
>> to disk backing.  If you remove it, you're taking the chance that the
>> machine will be powered off without a writeback which can cause data
>> corruption.
>>
>> James
>>
> Yes. 
> I have more problems reported, with exit, and am just sending one more patch that puts
> this back in. Which was tested.
> 
> So I will resend this one plus one new one.
> 
> Boaz
> 

The gdth driver would do a register_reboot_notifier(&gdth_notifier);
to a gdth_halt() function, which would then redo half of what gdth_exit
does, and wrongly so, and crash.  

Are we guaranteed in todays kernel that modules .exit function be called
on an halt or reboot? If so then there is no need for duplications and
the gdth_halt() should go.

Submitted a patch that replaces the previous one I submitted with a deeper
fix. 
	[PATCH] gdth: bugfix for the at-exit problems

If you ask me this all gdth_flush() is a crackup. sd and scsi-ml are doing 
scsi FLUSH commands when ever is needed. The controller as no business caching
data in memory longer then what is stated in standard. Raid controller or no raid
controller. Virtual or not virtual device. Data on Plate means data on plate.
What if there is a power outage? what the driver can do then?

Boaz

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH ver2] gdth: bugfix for the at-exit problems
  2008-02-13 16:33           ` Boaz Harrosh
@ 2008-02-13 16:35             ` Boaz Harrosh
  2008-02-13 16:45             ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash James Bottomley
  1 sibling, 0 replies; 31+ messages in thread
From: Boaz Harrosh @ 2008-02-13 16:35 UTC (permalink / raw)
  To: James Bottomley
  Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi,
	linux-kernel, Joerg Dorchain, Jon Chelton,
	Stefan Priebe - allied internet ag


gdth_exit would first remove all cards then stop the timer
and would not sync with the timer function. This caused a crash
in gdth_timer() when module was unloaded.
So del_timer_sync the timer before we delete the cards.

also a reboot notifier function was registered but is
not needed anymore. And would crash. So remove gdth_halt
and the reboot notifier registration.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 drivers/scsi/gdth.c |   64 +++++++-------------------------------------------
 1 files changed, 9 insertions(+), 55 deletions(-)

diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c
index 8eb78be..8469fe6 100644
--- a/drivers/scsi/gdth.c
+++ b/drivers/scsi/gdth.c
@@ -183,7 +183,6 @@ static int gdth_ioctl(struct inode *inode, struct file *filep,
                       unsigned int cmd, unsigned long arg);
 
 static void gdth_flush(gdth_ha_str *ha);
-static int gdth_halt(struct notifier_block *nb, ulong event, void *buf);
 static int gdth_queuecommand(Scsi_Cmnd *scp,void (*done)(Scsi_Cmnd *));
 static int __gdth_queuecommand(gdth_ha_str *ha, struct scsi_cmnd *scp,
 				struct gdth_cmndinfo *cmndinfo);
@@ -418,12 +417,6 @@ static inline void gdth_set_sglist(struct scsi_cmnd *cmd,
 #include "gdth_proc.h"
 #include "gdth_proc.c"
 
-/* notifier block to get a notify on system shutdown/halt/reboot */
-static struct notifier_block gdth_notifier = {
-    gdth_halt, NULL, 0
-};
-static int notifier_disabled = 0;
-
 static gdth_ha_str *gdth_find_ha(int hanum)
 {
 	gdth_ha_str *ha;
@@ -3793,6 +3786,8 @@ static void gdth_timeout(ulong data)
     gdth_ha_str *ha;
     ulong flags;
 
+    BUG_ON(list_empty(&gdth_instances));
+
     ha = list_first_entry(&gdth_instances, gdth_ha_str, list);
     spin_lock_irqsave(&ha->smp_lock, flags);
 
@@ -4668,45 +4663,6 @@ static void gdth_flush(gdth_ha_str *ha)
     }
 }
 
-/* shutdown routine */
-static int gdth_halt(struct notifier_block *nb, ulong event, void *buf)
-{
-    gdth_ha_str *ha;
-#ifndef __alpha__
-    gdth_cmd_str    gdtcmd;
-    char            cmnd[MAX_COMMAND_SIZE];   
-#endif
-
-    if (notifier_disabled)
-        return NOTIFY_OK;
-
-    TRACE2(("gdth_halt() event %d\n",(int)event));
-    if (event != SYS_RESTART && event != SYS_HALT && event != SYS_POWER_OFF)
-        return NOTIFY_DONE;
-
-    notifier_disabled = 1;
-    printk("GDT-HA: Flushing all host drives .. ");
-    list_for_each_entry(ha, &gdth_instances, list) {
-        gdth_flush(ha);
-
-#ifndef __alpha__
-        /* controller reset */
-        memset(cmnd, 0xff, MAX_COMMAND_SIZE);
-        gdtcmd.BoardNode = LOCALBOARD;
-        gdtcmd.Service = CACHESERVICE;
-        gdtcmd.OpCode = GDT_RESET;
-        TRACE2(("gdth_halt(): reset controller %d\n", ha->hanum));
-        gdth_execute(ha->shost, &gdtcmd, cmnd, 10, NULL);
-#endif
-    }
-    printk("Done.\n");
-
-#ifdef GDTH_STATISTICS
-    del_timer(&gdth_timer);
-#endif
-    return NOTIFY_OK;
-}
-
 /* configure lun */
 static int gdth_slave_configure(struct scsi_device *sdev)
 {
@@ -5141,13 +5097,13 @@ static void gdth_remove_one(gdth_ha_str *ha)
 
 	scsi_remove_host(shp);
 
+	gdth_flush(ha);
+
 	if (ha->sdev) {
 		scsi_free_host_dev(ha->sdev);
 		ha->sdev = NULL;
 	}
 
-	gdth_flush(ha);
-
 	if (shp->irq)
 		free_irq(shp->irq,ha);
 
@@ -5235,8 +5191,6 @@ static int __init gdth_init(void)
 	add_timer(&gdth_timer);
 #endif
 	major = register_chrdev(0,"gdth", &gdth_fops);
-	notifier_disabled = 0;
-	register_reboot_notifier(&gdth_notifier);
 	gdth_polling = FALSE;
 	return 0;
 }
@@ -5245,14 +5199,14 @@ static void __exit gdth_exit(void)
 {
 	gdth_ha_str *ha;
 
-	list_for_each_entry(ha, &gdth_instances, list)
-		gdth_remove_one(ha);
+	unregister_chrdev(major,"gdth");
 
 #ifdef GDTH_STATISTICS
-	del_timer(&gdth_timer);
+	del_timer_sync(&gdth_timer);
 #endif
-	unregister_chrdev(major,"gdth");
-	unregister_reboot_notifier(&gdth_notifier);
+
+	list_for_each_entry(ha, &gdth_instances, list)
+		gdth_remove_one(ha);
 }
 
 module_init(gdth_init);
-- 
1.5.3.3



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
  2008-02-13 16:33           ` Boaz Harrosh
  2008-02-13 16:35             ` [PATCH ver2] gdth: bugfix for the at-exit problems Boaz Harrosh
@ 2008-02-13 16:45             ` James Bottomley
  2008-02-13 16:50               ` Boaz Harrosh
  1 sibling, 1 reply; 31+ messages in thread
From: James Bottomley @ 2008-02-13 16:45 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi,
	linux-kernel, Joerg Dorchain, Jon Chelton,
	Stefan Priebe - allied internet ag


On Wed, 2008-02-13 at 18:33 +0200, Boaz Harrosh wrote:
> On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh <bharrosh@panasas.com> wrote:
> > On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> >> On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote:
> >>> -	gdth_flush(ha);
> >>> -
> >> This piece doesn't look right.  gdth_flush() forces the internal cache
> >> to disk backing.  If you remove it, you're taking the chance that the
> >> machine will be powered off without a writeback which can cause data
> >> corruption.
> >>
> >> James
> >>
> > Yes. 
> > I have more problems reported, with exit, and am just sending one more patch that puts
> > this back in. Which was tested.
> > 
> > So I will resend this one plus one new one.
> > 
> > Boaz
> > 
> 
> The gdth driver would do a register_reboot_notifier(&gdth_notifier);
> to a gdth_halt() function, which would then redo half of what gdth_exit
> does, and wrongly so, and crash.  
> 
> Are we guaranteed in todays kernel that modules .exit function be called
> on an halt or reboot? If so then there is no need for duplications and
> the gdth_halt() should go.

No.  The __exit section is actually discardable if you promise never to
remove the module.

James



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
  2008-02-13 16:45             ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash James Bottomley
@ 2008-02-13 16:50               ` Boaz Harrosh
  2008-02-13 17:03                 ` James Bottomley
  0 siblings, 1 reply; 31+ messages in thread
From: Boaz Harrosh @ 2008-02-13 16:50 UTC (permalink / raw)
  To: James Bottomley
  Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi,
	linux-kernel, Joerg Dorchain, Jon Chelton,
	Stefan Priebe - allied internet ag

On Wed, Feb 13 2008 at 18:45 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> On Wed, 2008-02-13 at 18:33 +0200, Boaz Harrosh wrote:
>> On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh <bharrosh@panasas.com> wrote:
>>> On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>>>> On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote:
>>>>> -	gdth_flush(ha);
>>>>> -
>>>> This piece doesn't look right.  gdth_flush() forces the internal cache
>>>> to disk backing.  If you remove it, you're taking the chance that the
>>>> machine will be powered off without a writeback which can cause data
>>>> corruption.
>>>>
>>>> James
>>>>
>>> Yes. 
>>> I have more problems reported, with exit, and am just sending one more patch that puts
>>> this back in. Which was tested.
>>>
>>> So I will resend this one plus one new one.
>>>
>>> Boaz
>>>
>> The gdth driver would do a register_reboot_notifier(&gdth_notifier);
>> to a gdth_halt() function, which would then redo half of what gdth_exit
>> does, and wrongly so, and crash.  
>>
>> Are we guaranteed in todays kernel that modules .exit function be called
>> on an halt or reboot? If so then there is no need for duplications and
>> the gdth_halt() should go.
> 
> No.  The __exit section is actually discardable if you promise never to
> remove the module.
> 
I don't understand please explain. 
What does a driver need to do if it needs a consistent shutdown retine?
module or built in? unload or shutdown?


> James
> 
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
  2008-02-13 16:50               ` Boaz Harrosh
@ 2008-02-13 17:03                 ` James Bottomley
  2008-02-13 17:12                   ` Boaz Harrosh
                                     ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: James Bottomley @ 2008-02-13 17:03 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi,
	linux-kernel, Joerg Dorchain, Jon Chelton,
	Stefan Priebe - allied internet ag

On Wed, 2008-02-13 at 18:50 +0200, Boaz Harrosh wrote:
> On Wed, Feb 13 2008 at 18:45 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> > On Wed, 2008-02-13 at 18:33 +0200, Boaz Harrosh wrote:
> >> On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh <bharrosh@panasas.com> wrote:
> >>> On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> >>>> On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote:
> >>>>> -	gdth_flush(ha);
> >>>>> -
> >>>> This piece doesn't look right.  gdth_flush() forces the internal cache
> >>>> to disk backing.  If you remove it, you're taking the chance that the
> >>>> machine will be powered off without a writeback which can cause data
> >>>> corruption.
> >>>>
> >>>> James
> >>>>
> >>> Yes. 
> >>> I have more problems reported, with exit, and am just sending one more patch that puts
> >>> this back in. Which was tested.
> >>>
> >>> So I will resend this one plus one new one.
> >>>
> >>> Boaz
> >>>
> >> The gdth driver would do a register_reboot_notifier(&gdth_notifier);
> >> to a gdth_halt() function, which would then redo half of what gdth_exit
> >> does, and wrongly so, and crash.  
> >>
> >> Are we guaranteed in todays kernel that modules .exit function be called
> >> on an halt or reboot? If so then there is no need for duplications and
> >> the gdth_halt() should go.
> > 
> > No.  The __exit section is actually discardable if you promise never to
> > remove the module.
> > 
> I don't understand please explain. 
> What does a driver need to do if it needs a consistent shutdown retine?
> module or built in? unload or shutdown?

It needs to register a reboot notifier, which gdth does.

However, the notifier is only called on reboot, so it also needs to
clean up correctly on module exit as well.

The alternative for GDTH would be to process the SCSI SYNCHRONIZE CACHE
command.  That's done by a shutdown notifier from sd, so the correct
thing would always get done; however it does mean the driver has to be
in a condition to process the last sync cache command.

For the quick fix, just keep the current infrastructure and put back the
gdth_flush() command where it can be effective.

James



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
  2008-02-13 17:03                 ` James Bottomley
@ 2008-02-13 17:12                   ` Boaz Harrosh
  2008-02-13 17:36                     ` James Bottomley
  2008-02-13 17:18                   ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash Boaz Harrosh
  2008-02-14  6:51                   ` Christoph Hellwig
  2 siblings, 1 reply; 31+ messages in thread
From: Boaz Harrosh @ 2008-02-13 17:12 UTC (permalink / raw)
  To: James Bottomley
  Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi,
	linux-kernel, Joerg Dorchain, Jon Chelton,
	Stefan Priebe - allied internet ag

On Wed, Feb 13 2008 at 19:03 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> On Wed, 2008-02-13 at 18:50 +0200, Boaz Harrosh wrote:
>> On Wed, Feb 13 2008 at 18:45 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>>> On Wed, 2008-02-13 at 18:33 +0200, Boaz Harrosh wrote:
>>>> On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh <bharrosh@panasas.com> wrote:
>>>>> On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>>>>>> On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote:
>>>>>>> -	gdth_flush(ha);
>>>>>>> -
>>>>>> This piece doesn't look right.  gdth_flush() forces the internal cache
>>>>>> to disk backing.  If you remove it, you're taking the chance that the
>>>>>> machine will be powered off without a writeback which can cause data
>>>>>> corruption.
>>>>>>
>>>>>> James
>>>>>>
>>>>> Yes. 
>>>>> I have more problems reported, with exit, and am just sending one more patch that puts
>>>>> this back in. Which was tested.
>>>>>
>>>>> So I will resend this one plus one new one.
>>>>>
>>>>> Boaz
>>>>>
>>>> The gdth driver would do a register_reboot_notifier(&gdth_notifier);
>>>> to a gdth_halt() function, which would then redo half of what gdth_exit
>>>> does, and wrongly so, and crash.  
>>>>
>>>> Are we guaranteed in todays kernel that modules .exit function be called
>>>> on an halt or reboot? If so then there is no need for duplications and
>>>> the gdth_halt() should go.
>>> No.  The __exit section is actually discardable if you promise never to
>>> remove the module.
>>>
>> I don't understand please explain. 
>> What does a driver need to do if it needs a consistent shutdown retine?
>> module or built in? unload or shutdown?
> 
> It needs to register a reboot notifier, which gdth does.
> 
> However, the notifier is only called on reboot, so it also needs to
> clean up correctly on module exit as well.
> 
> The alternative for GDTH would be to process the SCSI SYNCHRONIZE CACHE
> command.  That's done by a shutdown notifier from sd, so the correct
> thing would always get done; however it does mean the driver has to be
> in a condition to process the last sync cache command.
> 
> For the quick fix, just keep the current infrastructure and put back the
> gdth_flush() command where it can be effective.
> 
> James
> 
> 
Totally untested.

---
From: Boaz Harrosh <bharrosh@panasas.com>
Subject: [PATCH] gdth: bugfix for the at-exit problems

gdth_exit would first remove all cards then stop the timer
and would not sync with the timer function. This caused a crash
in gdth_timer() when module was unloaded.
So del_timer_sync the timer before we delete the cards.

also the reboot notifier function would crash. So unify
the exit and halt functions with a gdth_shutdown() that's
called by both.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 drivers/scsi/gdth.c |   99 ++++++++++++++++++++------------------------------
 1 files changed, 40 insertions(+), 59 deletions(-)

diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c
index 8eb78be..7bb9b45 100644
--- a/drivers/scsi/gdth.c
+++ b/drivers/scsi/gdth.c
@@ -183,7 +183,6 @@ static int gdth_ioctl(struct inode *inode, struct file *filep,
                       unsigned int cmd, unsigned long arg);
 
 static void gdth_flush(gdth_ha_str *ha);
-static int gdth_halt(struct notifier_block *nb, ulong event, void *buf);
 static int gdth_queuecommand(Scsi_Cmnd *scp,void (*done)(Scsi_Cmnd *));
 static int __gdth_queuecommand(gdth_ha_str *ha, struct scsi_cmnd *scp,
 				struct gdth_cmndinfo *cmndinfo);
@@ -418,12 +417,6 @@ static inline void gdth_set_sglist(struct scsi_cmnd *cmd,
 #include "gdth_proc.h"
 #include "gdth_proc.c"
 
-/* notifier block to get a notify on system shutdown/halt/reboot */
-static struct notifier_block gdth_notifier = {
-    gdth_halt, NULL, 0
-};
-static int notifier_disabled = 0;
-
 static gdth_ha_str *gdth_find_ha(int hanum)
 {
 	gdth_ha_str *ha;
@@ -3793,6 +3786,8 @@ static void gdth_timeout(ulong data)
     gdth_ha_str *ha;
     ulong flags;
 
+    BUG_ON(list_empty(&gdth_instances));
+
     ha = list_first_entry(&gdth_instances, gdth_ha_str, list);
     spin_lock_irqsave(&ha->smp_lock, flags);
 
@@ -4668,45 +4663,6 @@ static void gdth_flush(gdth_ha_str *ha)
     }
 }
 
-/* shutdown routine */
-static int gdth_halt(struct notifier_block *nb, ulong event, void *buf)
-{
-    gdth_ha_str *ha;
-#ifndef __alpha__
-    gdth_cmd_str    gdtcmd;
-    char            cmnd[MAX_COMMAND_SIZE];   
-#endif
-
-    if (notifier_disabled)
-        return NOTIFY_OK;
-
-    TRACE2(("gdth_halt() event %d\n",(int)event));
-    if (event != SYS_RESTART && event != SYS_HALT && event != SYS_POWER_OFF)
-        return NOTIFY_DONE;
-
-    notifier_disabled = 1;
-    printk("GDT-HA: Flushing all host drives .. ");
-    list_for_each_entry(ha, &gdth_instances, list) {
-        gdth_flush(ha);
-
-#ifndef __alpha__
-        /* controller reset */
-        memset(cmnd, 0xff, MAX_COMMAND_SIZE);
-        gdtcmd.BoardNode = LOCALBOARD;
-        gdtcmd.Service = CACHESERVICE;
-        gdtcmd.OpCode = GDT_RESET;
-        TRACE2(("gdth_halt(): reset controller %d\n", ha->hanum));
-        gdth_execute(ha->shost, &gdtcmd, cmnd, 10, NULL);
-#endif
-    }
-    printk("Done.\n");
-
-#ifdef GDTH_STATISTICS
-    del_timer(&gdth_timer);
-#endif
-    return NOTIFY_OK;
-}
-
 /* configure lun */
 static int gdth_slave_configure(struct scsi_device *sdev)
 {
@@ -5141,13 +5097,13 @@ static void gdth_remove_one(gdth_ha_str *ha)
 
 	scsi_remove_host(shp);
 
+	gdth_flush(ha);
+
 	if (ha->sdev) {
 		scsi_free_host_dev(ha->sdev);
 		ha->sdev = NULL;
 	}
 
-	gdth_flush(ha);
-
 	if (shp->irq)
 		free_irq(shp->irq,ha);
 
@@ -5173,6 +5129,40 @@ static void gdth_remove_one(gdth_ha_str *ha)
 	scsi_host_put(shp);
 }
 
+static void gdth_shutdown(void);
+static int gdth_halt(struct notifier_block *nb, ulong event, void *buf)
+{
+	TRACE2(("gdth_halt() event %d\n",(int)event));
+	if (event != SYS_RESTART && event != SYS_HALT && event != SYS_POWER_OFF)
+		return NOTIFY_DONE;
+
+	gdth_shutdown();
+	return NOTIFY_OK;
+}
+
+static struct notifier_block gdth_notifier = {
+    gdth_halt, NULL, 0
+};
+
+bool gdth_shutdown_done;
+static void gdth_shutdown()
+{
+	gdth_ha_str *ha;
+	if (gdth_shutdown_done)
+		return;
+
+	gdth_shutdown_done = true;
+	unregister_chrdev(major,"gdth");
+	unregister_reboot_notifier(&gdth_notifier);
+
+#ifdef GDTH_STATISTICS
+	del_timer_sync(&gdth_timer);
+#endif
+
+	list_for_each_entry(ha, &gdth_instances, list)
+		gdth_remove_one(ha);
+}
+
 static int __init gdth_init(void)
 {
 	if (disable) {
@@ -5185,6 +5175,7 @@ static int __init gdth_init(void)
 	       GDTH_VERSION_STR);
 
 	/* initializations */
+	gdth_shutdown_done = false;
 	gdth_polling = TRUE;
 	gdth_clear_events();
 
@@ -5235,7 +5226,6 @@ static int __init gdth_init(void)
 	add_timer(&gdth_timer);
 #endif
 	major = register_chrdev(0,"gdth", &gdth_fops);
-	notifier_disabled = 0;
 	register_reboot_notifier(&gdth_notifier);
 	gdth_polling = FALSE;
 	return 0;
@@ -5243,16 +5233,7 @@ static int __init gdth_init(void)
 
 static void __exit gdth_exit(void)
 {
-	gdth_ha_str *ha;
-
-	list_for_each_entry(ha, &gdth_instances, list)
-		gdth_remove_one(ha);
-
-#ifdef GDTH_STATISTICS
-	del_timer(&gdth_timer);
-#endif
-	unregister_chrdev(major,"gdth");
-	unregister_reboot_notifier(&gdth_notifier);
+	gdth_shutdown();
 }
 
 module_init(gdth_init);
-- 
1.5.3.3



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
  2008-02-13 17:03                 ` James Bottomley
  2008-02-13 17:12                   ` Boaz Harrosh
@ 2008-02-13 17:18                   ` Boaz Harrosh
  2008-02-13 17:33                     ` James Bottomley
  2008-02-14  6:51                   ` Christoph Hellwig
  2 siblings, 1 reply; 31+ messages in thread
From: Boaz Harrosh @ 2008-02-13 17:18 UTC (permalink / raw)
  To: James Bottomley
  Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi,
	linux-kernel, Joerg Dorchain, Jon Chelton,
	Stefan Priebe - allied internet ag

On Wed, Feb 13 2008 at 19:03 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> On Wed, 2008-02-13 at 18:50 +0200, Boaz Harrosh wrote:
>> On Wed, Feb 13 2008 at 18:45 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>>> On Wed, 2008-02-13 at 18:33 +0200, Boaz Harrosh wrote:
>>>> On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh <bharrosh@panasas.com> wrote:
>>>>> On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>>>>>> On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote:
>>>>>>> -	gdth_flush(ha);
>>>>>>> -
>>>>>> This piece doesn't look right.  gdth_flush() forces the internal cache
>>>>>> to disk backing.  If you remove it, you're taking the chance that the
>>>>>> machine will be powered off without a writeback which can cause data
>>>>>> corruption.
>>>>>>
>>>>>> James
>>>>>>
>>>>> Yes. 
>>>>> I have more problems reported, with exit, and am just sending one more patch that puts
>>>>> this back in. Which was tested.
>>>>>
>>>>> So I will resend this one plus one new one.
>>>>>
>>>>> Boaz
>>>>>
>>>> The gdth driver would do a register_reboot_notifier(&gdth_notifier);
>>>> to a gdth_halt() function, which would then redo half of what gdth_exit
>>>> does, and wrongly so, and crash.  
>>>>
>>>> Are we guaranteed in todays kernel that modules .exit function be called
>>>> on an halt or reboot? If so then there is no need for duplications and
>>>> the gdth_halt() should go.
>>> No.  The __exit section is actually discardable if you promise never to
>>> remove the module.
>>>
>> I don't understand please explain. 
>> What does a driver need to do if it needs a consistent shutdown retine?
>> module or built in? unload or shutdown?
> 
> It needs to register a reboot notifier, which gdth does.
> 
> However, the notifier is only called on reboot, so it also needs to
> clean up correctly on module exit as well.
> 
> The alternative for GDTH would be to process the SCSI SYNCHRONIZE CACHE

Why would we think that the controller does not support this command
is it not in the mandatory section of the standard?

> command.  That's done by a shutdown notifier from sd, so the correct
> thing would always get done; however it does mean the driver has to be
> in a condition to process the last sync cache command.

Why would it not be ready? what do other drivers do?
The drivers is ready until the very last module's .exit. Is that good
enough?

> 
> For the quick fix, just keep the current infrastructure and put back the
> gdth_flush() command where it can be effective.
> 

Just did. But if needed I would prefer to emulate the SCSI SYNCHRONIZE CACHE
command and not that boot notifier thing. Please advise.

> James
> 
> 

Boaz

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
  2008-02-13 17:18                   ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash Boaz Harrosh
@ 2008-02-13 17:33                     ` James Bottomley
  0 siblings, 0 replies; 31+ messages in thread
From: James Bottomley @ 2008-02-13 17:33 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi,
	linux-kernel, Joerg Dorchain, Jon Chelton,
	Stefan Priebe - allied internet ag

On Wed, 2008-02-13 at 19:18 +0200, Boaz Harrosh wrote:
> On Wed, Feb 13 2008 at 19:03 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> > It needs to register a reboot notifier, which gdth does.
> > 
> > However, the notifier is only called on reboot, so it also needs to
> > clean up correctly on module exit as well.
> > 
> > The alternative for GDTH would be to process the SCSI SYNCHRONIZE CACHE
> 
> Why would we think that the controller does not support this command
> is it not in the mandatory section of the standard?

Um, because the controller isn't a SCSI device.  It's an emulated device
which means the SCSI comands are processed in the driver.  It does look
like the driver<->HBA communication is some sort of translated dialect
of SCSI.

> > command.  That's done by a shutdown notifier from sd, so the correct
> > thing would always get done; however it does mean the driver has to be
> > in a condition to process the last sync cache command.
> 
> Why would it not be ready? what do other drivers do?
> The drivers is ready until the very last module's .exit. Is that good
> enough?

shutdown is called as part of device removal and module unload ...
usually from scsi_remove_host().  So you can't tear down command
processing before that point.

> > 
> > For the quick fix, just keep the current infrastructure and put back the
> > gdth_flush() command where it can be effective.
> > 
> 
> Just did. But if needed I would prefer to emulate the SCSI SYNCHRONIZE CACHE
> command and not that boot notifier thing. Please advise.

I think such a change, though desirable, would be too large to count as
a bug fix.

James



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
  2008-02-13 17:12                   ` Boaz Harrosh
@ 2008-02-13 17:36                     ` James Bottomley
  2008-02-14 10:49                       ` Boaz Harrosh
  0 siblings, 1 reply; 31+ messages in thread
From: James Bottomley @ 2008-02-13 17:36 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi,
	linux-kernel, Joerg Dorchain, Jon Chelton,
	Stefan Priebe - allied internet ag

On Wed, 2008-02-13 at 19:12 +0200, Boaz Harrosh wrote:
> On Wed, Feb 13 2008 at 19:03 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> > On Wed, 2008-02-13 at 18:50 +0200, Boaz Harrosh wrote:
> >> On Wed, Feb 13 2008 at 18:45 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> >>> On Wed, 2008-02-13 at 18:33 +0200, Boaz Harrosh wrote:
> >>>> On Wed, Feb 13 2008 at 17:54 +0200, Boaz Harrosh <bharrosh@panasas.com> wrote:
> >>>>> On Wed, Feb 13 2008 at 17:44 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> >>>>>> On Tue, 2008-02-12 at 19:40 +0200, Boaz Harrosh wrote:
> >>>>>>> -	gdth_flush(ha);
> >>>>>>> -
> >>>>>> This piece doesn't look right.  gdth_flush() forces the internal cache
> >>>>>> to disk backing.  If you remove it, you're taking the chance that the
> >>>>>> machine will be powered off without a writeback which can cause data
> >>>>>> corruption.
> >>>>>>
> >>>>>> James
> >>>>>>
> >>>>> Yes. 
> >>>>> I have more problems reported, with exit, and am just sending one more patch that puts
> >>>>> this back in. Which was tested.
> >>>>>
> >>>>> So I will resend this one plus one new one.
> >>>>>
> >>>>> Boaz
> >>>>>
> >>>> The gdth driver would do a register_reboot_notifier(&gdth_notifier);
> >>>> to a gdth_halt() function, which would then redo half of what gdth_exit
> >>>> does, and wrongly so, and crash.  
> >>>>
> >>>> Are we guaranteed in todays kernel that modules .exit function be called
> >>>> on an halt or reboot? If so then there is no need for duplications and
> >>>> the gdth_halt() should go.
> >>> No.  The __exit section is actually discardable if you promise never to
> >>> remove the module.
> >>>
> >> I don't understand please explain. 
> >> What does a driver need to do if it needs a consistent shutdown retine?
> >> module or built in? unload or shutdown?
> > 
> > It needs to register a reboot notifier, which gdth does.
> > 
> > However, the notifier is only called on reboot, so it also needs to
> > clean up correctly on module exit as well.
> > 
> > The alternative for GDTH would be to process the SCSI SYNCHRONIZE CACHE
> > command.  That's done by a shutdown notifier from sd, so the correct
> > thing would always get done; however it does mean the driver has to be
> > in a condition to process the last sync cache command.
> > 
> > For the quick fix, just keep the current infrastructure and put back the
> > gdth_flush() command where it can be effective.
> > 
> > James
> > 
> > 
> Totally untested.
> 
> ---
> From: Boaz Harrosh <bharrosh@panasas.com>
> Subject: [PATCH] gdth: bugfix for the at-exit problems
> 
> gdth_exit would first remove all cards then stop the timer
> and would not sync with the timer function. This caused a crash
> in gdth_timer() when module was unloaded.
> So del_timer_sync the timer before we delete the cards.
> 
> also the reboot notifier function would crash. So unify
> the exit and halt functions with a gdth_shutdown() that's
> called by both.
> 
> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
> ---
>  drivers/scsi/gdth.c |   99 ++++++++++++++++++++------------------------------
>  1 files changed, 40 insertions(+), 59 deletions(-)
> 
> diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c
> index 8eb78be..7bb9b45 100644
> --- a/drivers/scsi/gdth.c
> +++ b/drivers/scsi/gdth.c
> @@ -183,7 +183,6 @@ static int gdth_ioctl(struct inode *inode, struct file *filep,
>                        unsigned int cmd, unsigned long arg);
>  
>  static void gdth_flush(gdth_ha_str *ha);
> -static int gdth_halt(struct notifier_block *nb, ulong event, void *buf);
>  static int gdth_queuecommand(Scsi_Cmnd *scp,void (*done)(Scsi_Cmnd *));
>  static int __gdth_queuecommand(gdth_ha_str *ha, struct scsi_cmnd *scp,
>  				struct gdth_cmndinfo *cmndinfo);
> @@ -418,12 +417,6 @@ static inline void gdth_set_sglist(struct scsi_cmnd *cmd,
>  #include "gdth_proc.h"
>  #include "gdth_proc.c"
>  
> -/* notifier block to get a notify on system shutdown/halt/reboot */
> -static struct notifier_block gdth_notifier = {
> -    gdth_halt, NULL, 0
> -};
> -static int notifier_disabled = 0;
> -
>  static gdth_ha_str *gdth_find_ha(int hanum)
>  {
>  	gdth_ha_str *ha;
> @@ -3793,6 +3786,8 @@ static void gdth_timeout(ulong data)
>      gdth_ha_str *ha;
>      ulong flags;
>  
> +    BUG_ON(list_empty(&gdth_instances));
> +
>      ha = list_first_entry(&gdth_instances, gdth_ha_str, list);
>      spin_lock_irqsave(&ha->smp_lock, flags);
>  
> @@ -4668,45 +4663,6 @@ static void gdth_flush(gdth_ha_str *ha)
>      }
>  }
>  
> -/* shutdown routine */
> -static int gdth_halt(struct notifier_block *nb, ulong event, void *buf)
> -{
> -    gdth_ha_str *ha;
> -#ifndef __alpha__
> -    gdth_cmd_str    gdtcmd;
> -    char            cmnd[MAX_COMMAND_SIZE];   
> -#endif
> -
> -    if (notifier_disabled)
> -        return NOTIFY_OK;
> -
> -    TRACE2(("gdth_halt() event %d\n",(int)event));
> -    if (event != SYS_RESTART && event != SYS_HALT && event != SYS_POWER_OFF)
> -        return NOTIFY_DONE;
> -
> -    notifier_disabled = 1;
> -    printk("GDT-HA: Flushing all host drives .. ");
> -    list_for_each_entry(ha, &gdth_instances, list) {
> -        gdth_flush(ha);
> -
> -#ifndef __alpha__
> -        /* controller reset */
> -        memset(cmnd, 0xff, MAX_COMMAND_SIZE);
> -        gdtcmd.BoardNode = LOCALBOARD;
> -        gdtcmd.Service = CACHESERVICE;
> -        gdtcmd.OpCode = GDT_RESET;
> -        TRACE2(("gdth_halt(): reset controller %d\n", ha->hanum));
> -        gdth_execute(ha->shost, &gdtcmd, cmnd, 10, NULL);
> -#endif
> -    }
> -    printk("Done.\n");
> -
> -#ifdef GDTH_STATISTICS
> -    del_timer(&gdth_timer);
> -#endif
> -    return NOTIFY_OK;
> -}
> -
>  /* configure lun */
>  static int gdth_slave_configure(struct scsi_device *sdev)
>  {
> @@ -5141,13 +5097,13 @@ static void gdth_remove_one(gdth_ha_str *ha)
>  
>  	scsi_remove_host(shp);
>  
> +	gdth_flush(ha);
> +
>  	if (ha->sdev) {
>  		scsi_free_host_dev(ha->sdev);
>  		ha->sdev = NULL;
>  	}
>  
> -	gdth_flush(ha);
> -
>  	if (shp->irq)
>  		free_irq(shp->irq,ha);
>  
> @@ -5173,6 +5129,40 @@ static void gdth_remove_one(gdth_ha_str *ha)
>  	scsi_host_put(shp);
>  }
>  
> +static void gdth_shutdown(void);
> +static int gdth_halt(struct notifier_block *nb, ulong event, void *buf)
> +{
> +	TRACE2(("gdth_halt() event %d\n",(int)event));
> +	if (event != SYS_RESTART && event != SYS_HALT && event != SYS_POWER_OFF)
> +		return NOTIFY_DONE;
> +
> +	gdth_shutdown();
> +	return NOTIFY_OK;
> +}
> +
> +static struct notifier_block gdth_notifier = {
> +    gdth_halt, NULL, 0
> +};
> +
> +bool gdth_shutdown_done;

Static police alert!  Just make it static and move it into
gdth_shutdown()

> +static void gdth_shutdown()
> +{
> +	gdth_ha_str *ha;
> +	if (gdth_shutdown_done)
> +		return;
> +
> +	gdth_shutdown_done = true;
> +	unregister_chrdev(major,"gdth");
> +	unregister_reboot_notifier(&gdth_notifier);

I'm not sure you can do this, aren't reboot notifiers called with the
rwsem held?  In which case the unregister which also takes the rwsem
will hang the system.

James



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
  2008-02-13  9:03         ` Boaz Harrosh
@ 2008-02-13 19:38           ` Jan Engelhardt
  2008-02-14 15:58             ` Boaz Harrosh
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Engelhardt @ 2008-02-13 19:38 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Stefan Priebe - allied internet ag, Sven Köhler,
	Christoph Hellwig, Jeff Garzik, linux-scsi, James Bottomley,
	linux-kernel, Joerg Dorchain, Jon Chelton


On Feb 13 2008 11:03, Boaz Harrosh wrote:
>> 
>> I've tested this patch now - and it works fine. Now rmmod, halt and 
>> reboot also works.
>> 
>> Stefan Priebe
>> 
>This is grate news Stefan. Thank you very much for all your time
>and effort, with out we could not have fixed all this.

Do you have a git tree with the latest pieces?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
  2008-02-13 17:03                 ` James Bottomley
  2008-02-13 17:12                   ` Boaz Harrosh
  2008-02-13 17:18                   ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash Boaz Harrosh
@ 2008-02-14  6:51                   ` Christoph Hellwig
  2 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2008-02-14  6:51 UTC (permalink / raw)
  To: James Bottomley
  Cc: Boaz Harrosh, Sven K?hler, Christoph Hellwig, Jeff Garzik,
	linux-scsi, linux-kernel, Joerg Dorchain, Jon Chelton,
	Stefan Priebe - allied internet ag

On Wed, Feb 13, 2008 at 11:03:45AM -0600, James Bottomley wrote:
> > I don't understand please explain. 
> > What does a driver need to do if it needs a consistent shutdown retine?
> > module or built in? unload or shutdown?
> 
> It needs to register a reboot notifier, which gdth does.

Well, for crappy legacy driver that's the way, but it's not really
recommended.  As soon as a driver uses the proper driver models,
e.g. gdth for pci using Jeff's pci hotplug patches it can just
implement the ->shutdown method that is called before shutdown/kexec
and can do the right thing.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
  2008-02-13 17:36                     ` James Bottomley
@ 2008-02-14 10:49                       ` Boaz Harrosh
  2008-02-14 11:58                         ` [PATCH] gdth: bugfix for the at-exit problems Boaz Harrosh
  0 siblings, 1 reply; 31+ messages in thread
From: Boaz Harrosh @ 2008-02-14 10:49 UTC (permalink / raw)
  To: James Bottomley
  Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi,
	linux-kernel, Joerg Dorchain, Jon Chelton,
	Stefan Priebe - allied internet ag

On Wed, Feb 13 2008 at 19:36 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
<snip>
>> ---
>> From: Boaz Harrosh <bharrosh@panasas.com>
>> Subject: [PATCH] gdth: bugfix for the at-exit problems
>>
>> gdth_exit would first remove all cards then stop the timer
>> and would not sync with the timer function. This caused a crash
>> in gdth_timer() when module was unloaded.
>> So del_timer_sync the timer before we delete the cards.
>>
>> also the reboot notifier function would crash. So unify
>> the exit and halt functions with a gdth_shutdown() that's
>> called by both.
>>
>> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
>> ---
<snip>
>> +static struct notifier_block gdth_notifier = {
>> +    gdth_halt, NULL, 0
>> +};
>> +
>> +bool gdth_shutdown_done;
> 

right forgot the static. But I use it in gdth_init(), so it
must be external. Unless you promise me that gdth_init() will
never ever be called after a call to shutdown.
Any way the hot-plug patch changes all that. This is only
for 2.6.24 bugfixs.

> Static police alert!  Just make it static and move it into
> gdth_shutdown()
> 
>> +static void gdth_shutdown()
>> +{
>> +	gdth_ha_str *ha;
>> +	if (gdth_shutdown_done)
>> +		return;
>> +
>> +	gdth_shutdown_done = true;
>> +	unregister_chrdev(major,"gdth");
>> +	unregister_reboot_notifier(&gdth_notifier);
> 
> I'm not sure you can do this, aren't reboot notifiers called with the
> rwsem held?  In which case the unregister which also takes the rwsem
> will hang the system.
> 
humm, can't remove a notifier from within the notifier. Thanks James for 
the catch, it's what happens when you don't test your own patches.

I have moved unregister_reboot_notifier to gdth_exit.
> James
> 

Will send a new version for review. Please note that this is a bugfix patch
on top of 2.6.24. It is not needed for Jeff's hot-plug path.

There will be one more bugfix patch for a crash at the user-mode ioctl code.

Boaz



^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH] gdth: bugfix for the at-exit problems
  2008-02-14 10:49                       ` Boaz Harrosh
@ 2008-02-14 11:58                         ` Boaz Harrosh
  2008-02-14 16:10                           ` James Bottomley
  0 siblings, 1 reply; 31+ messages in thread
From: Boaz Harrosh @ 2008-02-14 11:58 UTC (permalink / raw)
  To: James Bottomley
  Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi,
	linux-kernel, Joerg Dorchain, Jon Chelton,
	Stefan Priebe - allied internet ag


This is a bugfix for the 2.6.24.x stable releases.

gdth_exit would first remove all cards then stop the timer
and would not sync with the timer function. This caused a crash
in gdth_timer() when module was unloaded.
So del_timer_sync the timer before we delete the cards.

also the reboot notifier function would crash. So unify
the exit and halt functions with a gdth_shutdown() that's
called by both.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 drivers/scsi/gdth.c |   90 ++++++++++++++++++++------------------------------
 1 files changed, 36 insertions(+), 54 deletions(-)

diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c
index 8eb78be..3828b23 100644
--- a/drivers/scsi/gdth.c
+++ b/drivers/scsi/gdth.c
@@ -183,7 +183,6 @@ static int gdth_ioctl(struct inode *inode, struct file *filep,
                       unsigned int cmd, unsigned long arg);
 
 static void gdth_flush(gdth_ha_str *ha);
-static int gdth_halt(struct notifier_block *nb, ulong event, void *buf);
 static int gdth_queuecommand(Scsi_Cmnd *scp,void (*done)(Scsi_Cmnd *));
 static int __gdth_queuecommand(gdth_ha_str *ha, struct scsi_cmnd *scp,
 				struct gdth_cmndinfo *cmndinfo);
@@ -418,12 +417,6 @@ static inline void gdth_set_sglist(struct scsi_cmnd *cmd,
 #include "gdth_proc.h"
 #include "gdth_proc.c"
 
-/* notifier block to get a notify on system shutdown/halt/reboot */
-static struct notifier_block gdth_notifier = {
-    gdth_halt, NULL, 0
-};
-static int notifier_disabled = 0;
-
 static gdth_ha_str *gdth_find_ha(int hanum)
 {
 	gdth_ha_str *ha;
@@ -3793,6 +3786,8 @@ static void gdth_timeout(ulong data)
     gdth_ha_str *ha;
     ulong flags;
 
+    BUG_ON(list_empty(&gdth_instances));
+
     ha = list_first_entry(&gdth_instances, gdth_ha_str, list);
     spin_lock_irqsave(&ha->smp_lock, flags);
 
@@ -4668,45 +4663,6 @@ static void gdth_flush(gdth_ha_str *ha)
     }
 }
 
-/* shutdown routine */
-static int gdth_halt(struct notifier_block *nb, ulong event, void *buf)
-{
-    gdth_ha_str *ha;
-#ifndef __alpha__
-    gdth_cmd_str    gdtcmd;
-    char            cmnd[MAX_COMMAND_SIZE];   
-#endif
-
-    if (notifier_disabled)
-        return NOTIFY_OK;
-
-    TRACE2(("gdth_halt() event %d\n",(int)event));
-    if (event != SYS_RESTART && event != SYS_HALT && event != SYS_POWER_OFF)
-        return NOTIFY_DONE;
-
-    notifier_disabled = 1;
-    printk("GDT-HA: Flushing all host drives .. ");
-    list_for_each_entry(ha, &gdth_instances, list) {
-        gdth_flush(ha);
-
-#ifndef __alpha__
-        /* controller reset */
-        memset(cmnd, 0xff, MAX_COMMAND_SIZE);
-        gdtcmd.BoardNode = LOCALBOARD;
-        gdtcmd.Service = CACHESERVICE;
-        gdtcmd.OpCode = GDT_RESET;
-        TRACE2(("gdth_halt(): reset controller %d\n", ha->hanum));
-        gdth_execute(ha->shost, &gdtcmd, cmnd, 10, NULL);
-#endif
-    }
-    printk("Done.\n");
-
-#ifdef GDTH_STATISTICS
-    del_timer(&gdth_timer);
-#endif
-    return NOTIFY_OK;
-}
-
 /* configure lun */
 static int gdth_slave_configure(struct scsi_device *sdev)
 {
@@ -5141,13 +5097,13 @@ static void gdth_remove_one(gdth_ha_str *ha)
 
 	scsi_remove_host(shp);
 
+	gdth_flush(ha);
+
 	if (ha->sdev) {
 		scsi_free_host_dev(ha->sdev);
 		ha->sdev = NULL;
 	}
 
-	gdth_flush(ha);
-
 	if (shp->irq)
 		free_irq(shp->irq,ha);
 
@@ -5173,6 +5129,22 @@ static void gdth_remove_one(gdth_ha_str *ha)
 	scsi_host_put(shp);
 }
 
+static void gdth_shutdown(void);
+static int gdth_halt(struct notifier_block *nb, ulong event, void *buf)
+{
+	TRACE2(("gdth_halt() event %d\n", (int)event));
+	if (event != SYS_RESTART && event != SYS_HALT && event != SYS_POWER_OFF)
+		return NOTIFY_DONE;
+
+	gdth_shutdown();
+	return NOTIFY_OK;
+}
+
+static struct notifier_block gdth_notifier = {
+    gdth_halt, NULL, 0
+};
+static bool gdth_shutdown_done;
+
 static int __init gdth_init(void)
 {
 	if (disable) {
@@ -5185,6 +5157,7 @@ static int __init gdth_init(void)
 	       GDTH_VERSION_STR);
 
 	/* initializations */
+	gdth_shutdown_done = false;
 	gdth_polling = TRUE;
 	gdth_clear_events();
 
@@ -5235,23 +5208,32 @@ static int __init gdth_init(void)
 	add_timer(&gdth_timer);
 #endif
 	major = register_chrdev(0,"gdth", &gdth_fops);
-	notifier_disabled = 0;
 	register_reboot_notifier(&gdth_notifier);
 	gdth_polling = FALSE;
 	return 0;
 }
 
-static void __exit gdth_exit(void)
+static void gdth_shutdown()
 {
 	gdth_ha_str *ha;
 
-	list_for_each_entry(ha, &gdth_instances, list)
-		gdth_remove_one(ha);
+	if (gdth_shutdown_done)
+		return;
+
+	gdth_shutdown_done = true;
+	unregister_chrdev(major, "gdth");
 
 #ifdef GDTH_STATISTICS
-	del_timer(&gdth_timer);
+	del_timer_sync(&gdth_timer);
 #endif
-	unregister_chrdev(major,"gdth");
+
+	list_for_each_entry(ha, &gdth_instances, list)
+		gdth_remove_one(ha);
+}
+
+static void __exit gdth_exit(void)
+{
+	gdth_shutdown();
 	unregister_reboot_notifier(&gdth_notifier);
 }
 
-- 
1.5.3.3



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash
  2008-02-13 19:38           ` Jan Engelhardt
@ 2008-02-14 15:58             ` Boaz Harrosh
  0 siblings, 0 replies; 31+ messages in thread
From: Boaz Harrosh @ 2008-02-14 15:58 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Stefan Priebe - allied internet ag, Sven Köhler,
	Christoph Hellwig, Jeff Garzik, linux-scsi, James Bottomley,
	linux-kernel, Joerg Dorchain, Jon Chelton

On Wed, Feb 13 2008 at 21:38 +0200, Jan Engelhardt <jengelh@computergmbh.de> wrote:
> On Feb 13 2008 11:03, Boaz Harrosh wrote:
>>> I've tested this patch now - and it works fine. Now rmmod, halt and 
>>> reboot also works.
>>>
>>> Stefan Priebe
>>>
>> This is grate news Stefan. Thank you very much for all your time
>> and effort, with out we could not have fixed all this.
> 
> Do you have a git tree with the latest pieces?
No, scsi-misc I guess ;)

I could put it here:
git://git.bhalevy.com/open-osd gdth

branch give me an hours

Boaz

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] gdth: bugfix for the at-exit problems
  2008-02-14 11:58                         ` [PATCH] gdth: bugfix for the at-exit problems Boaz Harrosh
@ 2008-02-14 16:10                           ` James Bottomley
  2008-02-14 16:18                             ` Boaz Harrosh
  0 siblings, 1 reply; 31+ messages in thread
From: James Bottomley @ 2008-02-14 16:10 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi,
	linux-kernel, Joerg Dorchain, Jon Chelton,
	Stefan Priebe - allied internet ag

On Thu, 2008-02-14 at 13:58 +0200, Boaz Harrosh wrote:
> This is a bugfix for the 2.6.24.x stable releases.
> 
> gdth_exit would first remove all cards then stop the timer
> and would not sync with the timer function. This caused a crash
> in gdth_timer() when module was unloaded.
> So del_timer_sync the timer before we delete the cards.
> 
> also the reboot notifier function would crash. So unify
> the exit and halt functions with a gdth_shutdown() that's
> called by both.

The patch looks fine now, thanks.  Can we actually get a tester just to
make sure there's nothing I missed.

James



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] gdth: bugfix for the at-exit problems
  2008-02-14 16:10                           ` James Bottomley
@ 2008-02-14 16:18                             ` Boaz Harrosh
  0 siblings, 0 replies; 31+ messages in thread
From: Boaz Harrosh @ 2008-02-14 16:18 UTC (permalink / raw)
  To: James Bottomley
  Cc: Sven Köhler, Christoph Hellwig, Jeff Garzik, linux-scsi,
	linux-kernel, Joerg Dorchain, Jon Chelton,
	Stefan Priebe - allied internet ag

On Thu, Feb 14 2008 at 18:10 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> On Thu, 2008-02-14 at 13:58 +0200, Boaz Harrosh wrote:
>> This is a bugfix for the 2.6.24.x stable releases.
>>
>> gdth_exit would first remove all cards then stop the timer
>> and would not sync with the timer function. This caused a crash
>> in gdth_timer() when module was unloaded.
>> So del_timer_sync the timer before we delete the cards.
>>
>> also the reboot notifier function would crash. So unify
>> the exit and halt functions with a gdth_shutdown() that's
>> called by both.
> 
> The patch looks fine now, thanks.  Can we actually get a tester just to
> make sure there's nothing I missed.
> 
> James
> 
> 

Yes, and the tester reported, a breakage. We are on it.
Apparently, you cannot do a full deallocation of resources
at reboot notifier, nor would you want to I guess.

But you can do the flush. The exit call is never called
on a reboot and the card access is valid to the end.

Please comment?

So I pretty much reverted that patch, but did leave some
cleanups.

Also we found the other problems reported with user-mode tools
and cat /proc/sys/gdth/0

so 2 patches on the way above reverted. Give us a few ours to test
every thing.

Thanks
Boaz


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2008-02-14 16:20 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-01-30 19:47 [BUG?] GDTH driver not working after upgrade to 2.6.24 Sven Köhler
2008-01-31 10:08 ` Boaz Harrosh
2008-01-31 11:07   ` Sven Köhler
2008-01-31 12:35     ` Boaz Harrosh
2008-01-31 16:39       ` Jan Engelhardt
2008-01-31 16:52         ` Boaz Harrosh
2008-02-12 17:30   ` [BUGFIXES 0/2] gdth: fix 2.6.24 driver breakage Boaz Harrosh
2008-02-12 17:35     ` [BUGFIX 1/2] gdth: scan for scsi devices Boaz Harrosh
2008-02-12 18:05       ` Christoph Hellwig
2008-02-12 17:40     ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash Boaz Harrosh
2008-02-13  7:06       ` Stefan Priebe - allied internet ag
2008-02-13  9:03         ` Boaz Harrosh
2008-02-13 19:38           ` Jan Engelhardt
2008-02-14 15:58             ` Boaz Harrosh
2008-02-13 10:48       ` Boaz Harrosh
2008-02-13 15:44       ` James Bottomley
2008-02-13 15:54         ` Boaz Harrosh
2008-02-13 16:33           ` Boaz Harrosh
2008-02-13 16:35             ` [PATCH ver2] gdth: bugfix for the at-exit problems Boaz Harrosh
2008-02-13 16:45             ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash James Bottomley
2008-02-13 16:50               ` Boaz Harrosh
2008-02-13 17:03                 ` James Bottomley
2008-02-13 17:12                   ` Boaz Harrosh
2008-02-13 17:36                     ` James Bottomley
2008-02-14 10:49                       ` Boaz Harrosh
2008-02-14 11:58                         ` [PATCH] gdth: bugfix for the at-exit problems Boaz Harrosh
2008-02-14 16:10                           ` James Bottomley
2008-02-14 16:18                             ` Boaz Harrosh
2008-02-13 17:18                   ` [BUGFIX 2/2] gdth: bugfix for the Timer at exit crash Boaz Harrosh
2008-02-13 17:33                     ` James Bottomley
2008-02-14  6:51                   ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).