LKML Archive on lore.kernel.org
* qla2xxx BUG: workqueue leaked lock or atomic
@ 2007-02-26 13:31 Andre Noll
2007-02-26 18:26 ` Andrew Vasquez
0 siblings, 1 reply; 19+ messages in thread
From: Andre Noll @ 2007-02-26 13:31 UTC (permalink / raw)
To: Andrew Vasquez; +Cc: linux-kernel, linux-scsi, James Bottomley
[-- Attachment #1: Type: text/plain, Size: 6353 bytes --]
Hi
On linux-2.6.20.1, we're seeing hard lockups with 2 raid systems
connected to a qla2xxx card and used as a single volume via lvm.
The system seems to lock up only if data gets written to both raid
systems at the same time.
On a standard kernel nothing makes it to the log, the system just
freezes. So we tried a lockdep kernel which reports two BUGs during
boot, see below.
Could this be related to our problem?
Thanks
Andre
[ 64.150773] Loading iSCSI transport class v2.0-724.
[ 64.151096] QLogic Fibre Channel HBA Driver
[ 64.151405] ACPI: PCI Interrupt 0000:05:08.0[A] -> GSI 32 (level, low) -> IRQ 32
[ 64.151821] qla2xxx 0000:05:08.0: Found an ISP2422, irq 32, iobase 0xffffc20000006000
[ 64.152231] qla2xxx 0000:05:08.0: Configuring PCI space...
[ 64.152498] qla2xxx 0000:05:08.0: Configure NVRAM parameters...
[ 64.159088] qla2xxx 0000:05:08.0: Verifying loaded RISC code...
[ 74.169623] qla2xxx 0000:05:08.0: Firmware image unavailable.
[ 74.169737] qla2xxx 0000:05:08.0: Firmware images can be retrieved from: ftp://ftp.qlogic.com/outgoing/linux/firmware/.
[ 74.169902] qla2xxx 0000:05:08.0: Attempting to load (potentially outdated) firmware from flash.
[ 74.760935] qla2xxx 0000:05:08.0: Allocated (64 KB) for EFT...
[ 74.761186] qla2xxx 0000:05:08.0: Allocated (1413 KB) for firmware dump...
[ 74.776988] scsi0 : qla2xxx
[ 74.961451] qla2xxx 0000:05:08.0:
[ 74.961452] QLogic Fibre Channel HBA Driver: 8.01.07-k4
[ 74.961453] QLogic HP AE369-60001 - QLA2340
[ 74.961454] ISP2422: PCI-X Mode 1 (133 MHz) @ 0000:05:08.0 hdma+, host#=0, fw=4.00.70 [IP]
[ 74.961970] ACPI: PCI Interrupt 0000:05:08.1[B] -> GSI 33 (level, low) -> IRQ 33
[ 74.962296] qla2xxx 0000:05:08.1: Found an ISP2422, irq 33, iobase 0xffffc20000172000
[ 74.962662] qla2xxx 0000:05:08.1: Configuring PCI space...
[ 74.962914] qla2xxx 0000:05:08.1: Configure NVRAM parameters...
[ 74.969494] qla2xxx 0000:05:08.1: Verifying loaded RISC code...
[ 75.353426] qla2xxx 0000:05:08.0: LIP reset occured (f7f7).
[ 75.385670] qla2xxx 0000:05:08.0: LIP occured (f7f7).
[ 75.388282] qla2xxx 0000:05:08.0: LOOP UP detected (2 Gbps).
[ 75.778656] BUG: at kernel/lockdep.c:1860 trace_hardirqs_on()
[ 75.778771]
[ 75.778772] Call Trace:
[ 75.778967] <IRQ> [<ffffffff8024b877>] trace_hardirqs_on+0xd7/0x180
[ 75.779154] [<ffffffff8052bc1b>] _spin_unlock_irq+0x2b/0x40
[ 75.779271] [<ffffffff804605d7>] qla2x00_process_completed_request+0x137/0x1d0
[ 75.779424] [<ffffffff804606f2>] qla2x00_status_entry+0x82/0xa40
[ 75.779541] [<ffffffff8024b17f>] __lock_acquire+0xcdf/0xd90
[ 75.779657] [<ffffffff8052bcb2>] _spin_unlock_irqrestore+0x42/0x60
[ 75.779775] [<ffffffff8046228e>] qla24xx_intr_handler+0x4e/0x2b0
[ 75.779892] [<ffffffff804613e1>] qla24xx_process_response_queue+0xc1/0x1c0
[ 75.780012] [<ffffffff80462414>] qla24xx_intr_handler+0x1d4/0x2b0
[ 75.780131] [<ffffffff8025e950>] handle_IRQ_event+0x20/0x60
[ 75.780270] [<ffffffff802604ad>] handle_fasteoi_irq+0xbd/0x110
[ 75.780411] [<ffffffff8020cf62>] do_IRQ+0x132/0x1a0
[ 75.780545] [<ffffffff80208430>] default_idle+0x0/0x60
[ 75.780682] [<ffffffff8020a236>] ret_from_intr+0x0/0xf
[ 75.780818] <EOI> [<ffffffff80208467>] default_idle+0x37/0x60
[ 75.781021] [<ffffffff80208469>] default_idle+0x39/0x60
[ 75.781156] [<ffffffff80208467>] default_idle+0x37/0x60
[ 75.781294] [<ffffffff802084f1>] cpu_idle+0x61/0x90
[ 75.781429] [<ffffffff806d6f8b>] start_secondary+0x51b/0x530
[ 75.781569]
[ 75.781873] scsi 0:0:0:0: Direct-Access transtec T6100F16R1-E 342I PQ: 0 ANSI: 5
[ 75.782532] BUG: workqueue leaked lock or atomic: scsi_wq_0/0x00000000/362
[ 75.782678] last function: fc_scsi_scan_rport+0x0/0x90
[ 75.782878] 1 lock held by scsi_wq_0/362:
[ 75.783008] #0: (&shost->scan_mutex){--..}, at: [<ffffffff80529fe5>] mutex_lock+0x25/0x30
[ 75.783517]
[ 75.783518] Call Trace:
[ 75.783754] [<ffffffff80248319>] debug_show_held_locks+0x9/0x10
[ 75.783896] [<ffffffff8023eb49>] run_workqueue+0x149/0x1a0
[ 75.784036] [<ffffffff802427c0>] keventd_create_kthread+0x0/0x90
[ 75.784180] [<ffffffff8023edc1>] worker_thread+0x151/0x190
[ 75.784322] [<ffffffff80227e80>] default_wake_function+0x0/0x10
[ 75.784463] [<ffffffff8023ec70>] worker_thread+0x0/0x190
[ 75.784600] [<ffffffff80242a2a>] kthread+0xda/0x110
[ 75.784737] [<ffffffff8020ab08>] child_rip+0xa/0x12
[ 75.784875] [<ffffffff8052bc1b>] _spin_unlock_irq+0x2b/0x40
[ 75.785014] [<ffffffff8020a28c>] restore_args+0x0/0x30
[ 75.785149] [<ffffffff80242950>] kthread+0x0/0x110
[ 75.785285] [<ffffffff8020aafe>] child_rip+0x0/0x12
[ 75.785417]
[ 84.980341] qla2xxx 0000:05:08.1: Firmware image unavailable.
[ 84.980455] qla2xxx 0000:05:08.1: Firmware images can be retrieved from: ftp://ftp.qlogic.com/outgoing/linux/firmware/.
[ 84.980620] qla2xxx 0000:05:08.1: Attempting to load (potentially outdated) firmware from flash.
[ 85.571726] qla2xxx 0000:05:08.1: Allocated (64 KB) for EFT...
[ 85.571956] qla2xxx 0000:05:08.1: Allocated (1413 KB) for firmware dump...
[ 85.587766] scsi1 : qla2xxx
[ 85.718476] qla2xxx 0000:05:08.1:
[ 85.718478] QLogic Fibre Channel HBA Driver: 8.01.07-k4
[ 85.718479] QLogic HP AE369-60001 - QLA2340
[ 85.718480] ISP2422: PCI-X Mode 1 (133 MHz) @ 0000:05:08.1 hdma+, host#=1, fw=4.00.70 [IP]
[ 85.719505] sda : very big device. try to use READ CAPACITY(16).
[ 85.719727] SCSI device sda: 11714863104 512-byte hdwr sectors (5998010 MB)
[ 85.720114] sda: Write Protect is off
[ 85.720219] sda: Mode Sense: 9b 00 00 08
[ 85.720608] SCSI device sda: write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 85.721008] sda : very big device. try to use READ CAPACITY(16).
[ 85.721206] SCSI device sda: 11714863104 512-byte hdwr sectors (5998010 MB)
[ 85.721552] sda: Write Protect is off
[ 85.721680] sda: Mode Sense: 9b 00 00 08
[ 85.722088] SCSI device sda: write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 85.722298] sda: unknown partition table
[ 85.722897] sd 0:0:0:0: Attached scsi disk sda
[ 85.723205] sd 0:0:0:0: Attached scsi generic sg0 type 0
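Side note on the capacity figure in the sd lines above: it follows from the sector count. The sketch below is only the arithmetic, assuming decimal megabytes (10^6 bytes) and round-to-nearest; the sd driver's own rounding may differ, and the function name is made up for illustration.

```python
SECTOR_SIZE = 512  # bytes per sector, as stated in the sd log lines


def sectors_to_mb(sectors: int, sector_size: int = SECTOR_SIZE) -> int:
    """Convert a sector count to decimal megabytes, rounded to nearest.

    Assumption: 1 MB = 1_000_000 bytes. This is not the driver's code.
    """
    total_bytes = sectors * sector_size
    return (total_bytes + 500_000) // 1_000_000


if __name__ == "__main__":
    # Sector count reported for sda in the log above.
    print(sectors_to_mb(11714863104))
```

Under these assumptions the result for sda agrees with the 5998010 MB printed in the log.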
* Re: qla2xxx BUG: workqueue leaked lock or atomic
2007-02-26 13:31 qla2xxx BUG: workqueue leaked lock or atomic Andre Noll
@ 2007-02-26 18:26 ` Andrew Vasquez
2007-02-27 10:11 ` Andre Noll
0 siblings, 1 reply; 19+ messages in thread
From: Andrew Vasquez @ 2007-02-26 18:26 UTC (permalink / raw)
To: Andre Noll; +Cc: Andrew Vasquez, linux-kernel, linux-scsi, James Bottomley
On Mon, 26 Feb 2007, Andre Noll wrote:
> On linux-2.6.20.1, we're seeing hard lockups with 2 raid systems
> connected to a qla2xxx card and used as a single volume via lvm.
> The system seems to lock up only if data gets written to both raid
> systems at the same time.
>
> On a standard kernel nothing makes it to the log, the system just
> freezes. So we tried a lockdep kernel which reports two BUGs during
> boot, see below.
>
> Could this be related to our problem?
Before we proceed further, could you retrieve the latest firmware
release for 24xx type HBAs:
> [ 64.151096] QLogic Fibre Channel HBA Driver
> [ 64.151405] ACPI: PCI Interrupt 0000:05:08.0[A] -> GSI 32 (level, low) -> IRQ 32
> [ 64.151821] qla2xxx 0000:05:08.0: Found an ISP2422, irq 32, iobase 0xffffc20000006000
> [ 64.152231] qla2xxx 0000:05:08.0: Configuring PCI space...
> [ 64.152498] qla2xxx 0000:05:08.0: Configure NVRAM parameters...
> [ 64.159088] qla2xxx 0000:05:08.0: Verifying loaded RISC code...
> [ 74.169623] qla2xxx 0000:05:08.0: Firmware image unavailable.
> [ 74.169737] qla2xxx 0000:05:08.0: Firmware images can be retrieved from: ftp://ftp.qlogic.com/outgoing/linux/firmware/.
> [ 74.169902] qla2xxx 0000:05:08.0: Attempting to load (potentially outdated) firmware from flash.
> [ 74.760935] qla2xxx 0000:05:08.0: Allocated (64 KB) for EFT...
> [ 74.761186] qla2xxx 0000:05:08.0: Allocated (1413 KB) for firmware dump...
> [ 74.776988] scsi0 : qla2xxx
> [ 74.961451] qla2xxx 0000:05:08.0:
> [ 74.961452] QLogic Fibre Channel HBA Driver: 8.01.07-k4
> [ 74.961453] QLogic HP AE369-60001 - QLA2340
> [ 74.961454] ISP2422: PCI-X Mode 1 (133 MHz) @ 0000:05:08.0 hdma+, host#=0, fw=4.00.70 [IP]
You are loading some stale firmware that's left over on the card --
I'm not even sure what 4.00.70 is, as the latest release firmware is
4.00.27. You can retrieve the image here:
ftp://ftp.qlogic.com/outgoing/linux/firmware/ql2400_fw.bin
Let's start there... before we move on to this:
> [ 75.778656] BUG: at kernel/lockdep.c:1860 trace_hardirqs_on()
> [ 75.778771]
> [ 75.778772] Call Trace:
> [ 75.778967] <IRQ> [<ffffffff8024b877>] trace_hardirqs_on+0xd7/0x180
> [ 75.779154] [<ffffffff8052bc1b>] _spin_unlock_irq+0x2b/0x40
> [ 75.779271] [<ffffffff804605d7>] qla2x00_process_completed_request+0x137/0x1d0
> [ 75.779424] [<ffffffff804606f2>] qla2x00_status_entry+0x82/0xa40
> [ 75.779541] [<ffffffff8024b17f>] __lock_acquire+0xcdf/0xd90
> [ 75.779657] [<ffffffff8052bcb2>] _spin_unlock_irqrestore+0x42/0x60
> [ 75.779775] [<ffffffff8046228e>] qla24xx_intr_handler+0x4e/0x2b0
> [ 75.779892] [<ffffffff804613e1>] qla24xx_process_response_queue+0xc1/0x1c0
> [ 75.780012] [<ffffffff80462414>] qla24xx_intr_handler+0x1d4/0x2b0
> [ 75.780131] [<ffffffff8025e950>] handle_IRQ_event+0x20/0x60
Hmm....
Regards,
Andrew Vasquez
* Re: qla2xxx BUG: workqueue leaked lock or atomic
2007-02-26 18:26 ` Andrew Vasquez
@ 2007-02-27 10:11 ` Andre Noll
2007-02-27 14:35 ` Andre Noll
2007-02-27 18:51 ` Andrew Vasquez
0 siblings, 2 replies; 19+ messages in thread
From: Andre Noll @ 2007-02-27 10:11 UTC (permalink / raw)
To: Andrew Vasquez; +Cc: linux-kernel, linux-scsi, James Bottomley
[-- Attachment #1: Type: text/plain, Size: 7880 bytes --]
On 10:26, Andrew Vasquez wrote:
> You are loading some stale firmware that's left over on the card --
> I'm not even sure what 4.00.70 is, as the latest release firmware is
> 4.00.27.
That's the firmware which came with the card. Anyway, I just upgraded
the firmware, but the bug remains. The backtrace differs a bit though
as now the tg3 network driver seems to be involved as well.
Thanks for your help
Andre
[ 67.511167] qla2xxx 0000:05:08.0: Allocated (64 KB) for EFT...
[ 67.511434] qla2xxx 0000:05:08.0: Allocated (1413 KB) for firmware dump...
[ 67.531231] scsi0 : qla2xxx
[ 67.854344] qla2xxx 0000:05:08.0:
[ 67.854346] QLogic Fibre Channel HBA Driver: 8.01.07-k4
[ 67.854347] QLogic HP AE369-60001 - QLA2340
[ 67.854348] ISP2422: PCI-X Mode 1 (133 MHz) @ 0000:05:08.0 hdma+, host#=0, fw=4.00.27 [IP]
[ 67.854881] ACPI: PCI Interrupt 0000:05:08.1[B] -> GSI 33 (level, low) -> IRQ 33
[ 67.855230] qla2xxx 0000:05:08.1: Found an ISP2422, irq 33, iobase 0xffffc20000012000
[ 67.855645] qla2xxx 0000:05:08.1: Configuring PCI space...
[ 67.855907] qla2xxx 0000:05:08.1: Configure NVRAM parameters...
[ 67.862486] qla2xxx 0000:05:08.1: Verifying loaded RISC code...
[ 68.106663] qla2xxx 0000:05:08.1: Allocated (64 KB) for EFT...
[ 68.107058] qla2xxx 0000:05:08.1: Allocated (1413 KB) for firmware dump...
[ 68.126759] scsi1 : qla2xxx
[ 68.196783] Adding 6540152k swap on /dev/md2. Priority:-1 extents:1 across:6540152k
[ 68.260645] qla2xxx 0000:05:08.0: LIP reset occured (f8f7).
[ 68.296027] qla2xxx 0000:05:08.0: LIP occured (f8f7).
[ 68.298214] qla2xxx 0000:05:08.0: LOOP UP detected (2 Gbps).
[ 68.326627] qla2xxx 0000:05:08.1:
[ 68.326628] QLogic Fibre Channel HBA Driver: 8.01.07-k4
[ 68.326630] QLogic HP AE369-60001 - QLA2340
[ 68.326631] ISP2422: PCI-X Mode 1 (133 MHz) @ 0000:05:08.1 hdma+, host#=1, fw=4.00.27 [IP]
[ 68.504335] EXT3 FS on md1, internal journal
[ 68.524627] PM: Writing back config space on device 0000:03:06.0 at offset b (was 164814e4, writing d00e11)
[ 68.524644] PM: Writing back config space on device 0000:03:06.0 at offset 3 (was 804000, writing 804010)
[ 68.524650] PM: Writing back config space on device 0000:03:06.0 at offset 2 (was 2000000, writing 2000010)
[ 68.524657] PM: Writing back config space on device 0000:03:06.0 at offset 1 (was 2b00000, writing 2b00146)
[ 68.532665] BUG: at kernel/lockdep.c:1860 trace_hardirqs_on()
[ 68.532784]
[ 68.532785] Call Trace:
[ 68.532979] <IRQ> [<ffffffff8024b877>] trace_hardirqs_on+0xd7/0x180
[ 68.533168] [<ffffffff80511f5b>] _spin_unlock_irq+0x2b/0x40
[ 68.533295] [<ffffffff88032747>] :qla2xxx:qla2x00_process_completed_request+0x137/0x1d0
[ 68.533457] [<ffffffff88032862>] :qla2xxx:qla2x00_status_entry+0x82/0xa40
[ 68.533577] [<ffffffff8024b17f>] __lock_acquire+0xcdf/0xd90
[ 68.533693] [<ffffffff80511ff2>] _spin_unlock_irqrestore+0x42/0x60
[ 68.533816] [<ffffffff880343fe>] :qla2xxx:qla24xx_intr_handler+0x4e/0x2b0
[ 68.533942] [<ffffffff88033551>] :qla2xxx:qla24xx_process_response_queue+0xc1/0x1c0
[ 68.534102] [<ffffffff88034584>] :qla2xxx:qla24xx_intr_handler+0x1d4/0x2b0
[ 68.534224] [<ffffffff8025e950>] handle_IRQ_event+0x20/0x60
[ 68.534339] [<ffffffff802604ad>] handle_fasteoi_irq+0xbd/0x110
[ 68.534459] [<ffffffff8020cf62>] do_IRQ+0x132/0x1a0
[ 68.534574] [<ffffffff8020a236>] ret_from_intr+0x0/0xf
[ 68.534687] <EOI> [<ffffffff803ad15c>] __delay+0xc/0x20
[ 68.534862] [<ffffffff803ad1a7>] __const_udelay+0x37/0x40
[ 68.534982] [<ffffffff88006737>] :tg3:tg3_chip_reset+0x547/0x670
[ 68.535103] [<ffffffff8800df2d>] :tg3:tg3_reset_hw+0x5d/0x1790
[ 68.535218] [<ffffffff803ad1e7>] __udelay+0x37/0x40
[ 68.535333] [<ffffffff8800408d>] :tg3:_tw32_flush+0x6d/0x80
[ 68.535451] [<ffffffff88012196>] :tg3:tg3_open+0x2d6/0x610
[ 68.535569] [<ffffffff8800f6a2>] :tg3:tg3_init_hw+0x42/0x50
[ 68.535687] [<ffffffff880121a3>] :tg3:tg3_open+0x2e3/0x610
[ 68.535804] [<ffffffff804b36e3>] dev_open+0x43/0x90
[ 68.535917] [<ffffffff804b2814>] dev_change_flags+0x74/0x160
[ 68.536034] [<ffffffff804f3e66>] devinet_ioctl+0x2e6/0x730
[ 68.536149] [<ffffffff804b4bc2>] dev_ioctl+0x302/0x340
[ 68.536264] [<ffffffff803aa71b>] __up_read+0x9b/0xb0
[ 68.536378] [<ffffffff804f42fc>] inet_ioctl+0x4c/0x70
[ 68.536494] [<ffffffff804a73ec>] sock_ioctl+0x1fc/0x230
[ 68.536610] [<ffffffff8029c701>] do_ioctl+0x31/0xa0
[ 68.536722] [<ffffffff8029ca2b>] vfs_ioctl+0x2bb/0x2e0
[ 68.536836] [<ffffffff8029ca9a>] sys_ioctl+0x4a/0x80
[ 68.536948] [<ffffffff80209cee>] system_call+0x7e/0x83
[ 68.537059]
[ 68.712832] scsi 0:0:0:0: Direct-Access transtec T6100F16R1-E 342I PQ: 0 ANSI: 5
[ 68.713384] sda : very big device. try to use READ CAPACITY(16).
[ 68.713594] SCSI device sda: 11714863104 512-byte hdwr sectors (5998010 MB)
[ 68.713976] sda: Write Protect is off
[ 68.714079] sda: Mode Sense: 9b 00 00 08
[ 68.714483] SCSI device sda: write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 68.714876] sda : very big device. try to use READ CAPACITY(16).
[ 68.715080] SCSI device sda: 11714863104 512-byte hdwr sectors (5998010 MB)
[ 68.715436] sda: Write Protect is off
[ 68.715539] sda: Mode Sense: 9b 00 00 08
[ 68.715944] SCSI device sda: write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 68.718244] sda: unknown partition table
[ 68.718707] sd 0:0:0:0: Attached scsi disk sda
[ 68.718945] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 68.719413] BUG: workqueue leaked lock or atomic: scsi_wq_0/0x00000000/2138
[ 68.719556] last function: fc_scsi_scan_rport+0x0/0x90
[ 68.719754] 1 lock held by scsi_wq_0/2138:
[ 68.719878] #0: (&shost->scan_mutex){--..}, at: [<ffffffff80510325>] mutex_lock+0x25/0x30
[ 68.720380]
[ 68.720381] Call Trace:
[ 68.720616] [<ffffffff80248319>] debug_show_held_locks+0x9/0x10
[ 68.720757] [<ffffffff8023eb49>] run_workqueue+0x149/0x1a0
[ 68.720891] [<ffffffff802427c0>] keventd_create_kthread+0x0/0x90
[ 68.721030] [<ffffffff8023edc1>] worker_thread+0x151/0x190
[ 68.721167] [<ffffffff80227e80>] default_wake_function+0x0/0x10
[ 68.721307] [<ffffffff8023ec70>] worker_thread+0x0/0x190
[ 68.721443] [<ffffffff80242a2a>] kthread+0xda/0x110
[ 68.721575] [<ffffffff8020ab08>] child_rip+0xa/0x12
[ 68.721709] [<ffffffff80511f5b>] _spin_unlock_irq+0x2b/0x40
[ 68.721842] [<ffffffff8020a28c>] restore_args+0x0/0x30
[ 68.721973] [<ffffffff80242950>] kthread+0x0/0x110
[ 68.722106] [<ffffffff8020aafe>] child_rip+0x0/0x12
[ 68.722240]
[ 68.762666] qla2xxx 0000:05:08.1: LIP reset occured (f7f7).
[ 68.797954] qla2xxx 0000:05:08.1: LIP occured (f7f7).
[ 68.800134] qla2xxx 0000:05:08.1: LOOP UP detected (2 Gbps).
[ 69.127937] scsi 1:0:0:0: Direct-Access ADVUNI OXYGENRAID 416F 341B PQ: 0 ANSI: 3
[ 69.128528] sdb : very big device. try to use READ CAPACITY(16).
[ 69.128777] SCSI device sdb: 9370656768 512-byte hdwr sectors (4797776 MB)
[ 69.129220] sdb: Write Protect is off
[ 69.129326] sdb: Mode Sense: 8f 00 00 08
[ 69.129878] SCSI device sdb: write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 69.130342] sdb : very big device. try to use READ CAPACITY(16).
[ 69.130585] SCSI device sdb: 9370656768 512-byte hdwr sectors (4797776 MB)
[ 69.131006] sdb: Write Protect is off
[ 69.131110] sdb: Mode Sense: 8f 00 00 08
[ 69.131660] SCSI device sdb: write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 69.131843] sdb: unknown partition table
[ 69.132401] sd 1:0:0:0: Attached scsi disk sdb
[ 69.132624] sd 1:0:0:0: Attached scsi generic sg1 type 0
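For anyone comparing the two traces by hand: each frame has the form [<address>] symbol+0xoffset/0xsize, with an optional :module: prefix when the code lives in a module. The following is a small sketch of a parser for that layout. The regular expression and the names Frame and parse_frame are assumptions made for this note, not part of any kernel tool.

```python
import re
from typing import NamedTuple, Optional

# Matches "[<addr>] :module:symbol+0xoff/0xsize"; the module part is optional.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?::(?P<module>\w+):)?"
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


class Frame(NamedTuple):
    addr: int               # return address shown in the trace
    module: Optional[str]   # None for code built into the kernel image
    symbol: str
    offset: int             # offset of addr within the function
    size: int               # total size of the function


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one call-trace line; return None if it holds no frame."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return Frame(
        addr=int(m["addr"], 16),
        module=m["module"],
        symbol=m["symbol"],
        offset=int(m["offset"], 16),
        size=int(m["size"], 16),
    )


if __name__ == "__main__":
    line = ("[ 68.533295] [<ffffffff88032747>] "
            ":qla2xxx:qla2x00_process_completed_request+0x137/0x1d0")
    frame = parse_frame(line)
    # addr - offset gives the start address of the function.
    print(frame.module, frame.symbol, hex(frame.addr - frame.offset))
```

Subtracting the offset from the address gives the start of the function, which is useful when matching frames between a built-in and a modular build such as the two boots shown in this thread.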
* Re: qla2xxx BUG: workqueue leaked lock or atomic
2007-02-27 10:11 ` Andre Noll
@ 2007-02-27 14:35 ` Andre Noll
2007-02-27 18:51 ` Andrew Vasquez
1 sibling, 0 replies; 19+ messages in thread
From: Andre Noll @ 2007-02-27 14:35 UTC (permalink / raw)
To: Andrew Vasquez; +Cc: linux-kernel, linux-scsi, James Bottomley
[-- Attachment #1: Type: text/plain, Size: 555 bytes --]
On 11:11, Andre Noll wrote:
> On 10:26, Andrew Vasquez wrote:
> > You are loading some stale firmware that's left over on the card --
> > I'm not even sure what 4.00.70 is, as the latest release firmware is
> > 4.00.27.
>
> That's the firmware which came with the card. Anyway, I just upgraded
> the firmware, but the bug remains.
The system crashed again, btw, this time resulting in a kernel panic
instead of just locking up silently. Here's a screenshot:
http://systemlinux.org/~maan/shots/qla2xxx-crash-huangho2.png
Regards
Andre
* Re: qla2xxx BUG: workqueue leaked lock or atomic
2007-02-27 10:11 ` Andre Noll
2007-02-27 14:35 ` Andre Noll
@ 2007-02-27 18:51 ` Andrew Vasquez
2007-02-28 15:18 ` Andre Noll
1 sibling, 1 reply; 19+ messages in thread
From: Andrew Vasquez @ 2007-02-27 18:51 UTC (permalink / raw)
To: Andre Noll; +Cc: linux-kernel, linux-scsi, James Bottomley
On Tue, 27 Feb 2007, Andre Noll wrote:
> On 10:26, Andrew Vasquez wrote:
> > You are loading some stale firmware that's left over on the card --
> > I'm not even sure what 4.00.70 is, as the latest release firmware is
> > 4.00.27.
>
> That's the firmware which came with the card. Anyway, I just upgraded
> the firmware, but the bug remains. The backtrace differs a bit though
> as now the tg3 network driver seems to be involved as well.
>
> Thanks for your help
> Andre
...
> [ 68.532665] BUG: at kernel/lockdep.c:1860 trace_hardirqs_on()
> [ 68.532784]
> [ 68.532785] Call Trace:
> [ 68.532979] <IRQ> [<ffffffff8024b877>] trace_hardirqs_on+0xd7/0x180
> [ 68.533168] [<ffffffff80511f5b>] _spin_unlock_irq+0x2b/0x40
> [ 68.533295] [<ffffffff88032747>] :qla2xxx:qla2x00_process_completed_request+0x137/0x1d0
> [ 68.533457] [<ffffffff88032862>] :qla2xxx:qla2x00_status_entry+0x82/0xa40
> [ 68.533577] [<ffffffff8024b17f>] __lock_acquire+0xcdf/0xd90
> [ 68.533693] [<ffffffff80511ff2>] _spin_unlock_irqrestore+0x42/0x60
> [ 68.533816] [<ffffffff880343fe>] :qla2xxx:qla24xx_intr_handler+0x4e/0x2b0
> [ 68.533942] [<ffffffff88033551>] :qla2xxx:qla24xx_process_response_queue+0xc1/0x1c0
> [ 68.534102] [<ffffffff88034584>] :qla2xxx:qla24xx_intr_handler+0x1d4/0x2b0
Ok, since 2.6.20, there has been a patch added to qla2xxx which drops the
spin_unlock_irq() call while attempting to ramp-up the queue-depth:
commit befede3dabd204e9c546cbfbe391b29286c57da2
Author: Seokmann Ju <seokmann.ju@qlogic.com>
Date: Tue Jan 9 11:37:52 2007 -0800
[SCSI] qla2xxx: correct locking while call starget_for_each_device()
Removed spin_unlock_irq()/spin_lock_irq() pairs surrounding
starget_for_each_device() calls.
As Matthew W. pointed out, starget_for_each_device() can be called under
a spinlock being held.
The change has been tested and verified on qla2xxx.ko module.
Thanks Matthew W. and Hisashi H. for help.
Signed-off-by: Andrew Vasquez <Andrew.vasquez@qlogic.com>
Signed-off-by: Seokmann Ju <Seokmann.ju@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
http://marc.theaimsgroup.com/?l=linux-scsi&m=116837234904583&w=2
Could you try the latest 2.6.21-rc which contains the correction?
Regards,
Andrew Vasquez
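To make the complaint in the trace concrete, here is a toy userspace model of the interrupt-enable flag. It is not kernel code. It assumes the interrupt handler takes its lock irqsave-style and that the pre-patch code released it around a callback with an unlock that enables interrupts unconditionally, as the commit message above describes. The class ToyCpu, the function run_handler and the warning text are all invented for this sketch.

```python
from typing import List


class ToyCpu:
    """Toy model of one CPU's interrupt-enable flag (illustration only)."""

    def __init__(self) -> None:
        self.irqs_enabled = True
        self.in_hardirq = False
        self.warnings: List[str] = []

    def lock_irqsave(self) -> bool:
        # Remember the previous IRQ state, then disable IRQs.
        saved = self.irqs_enabled
        self.irqs_enabled = False
        return saved

    def unlock_irqrestore(self, saved: bool) -> None:
        # Put the IRQ state back exactly as it was before the lock.
        self.irqs_enabled = saved

    def lock_irq(self) -> None:
        self.irqs_enabled = False

    def unlock_irq(self) -> None:
        # Enables IRQs unconditionally; the model records a warning when
        # this happens inside a hard interrupt handler.
        if self.in_hardirq:
            self.warnings.append("IRQs enabled in hardirq context")
        self.irqs_enabled = True


def run_handler(drop_lock_around_callback: bool) -> List[str]:
    """Simulate one interrupt; return the warnings the model recorded."""
    cpu = ToyCpu()
    cpu.in_hardirq = True
    cpu.irqs_enabled = False          # handlers are entered with IRQs off
    saved = cpu.lock_irqsave()
    if drop_lock_around_callback:     # assumed pre-patch pattern
        cpu.unlock_irq()
        # ... per-device callback would run here ...
        cpu.lock_irq()
    cpu.unlock_irqrestore(saved)
    cpu.in_hardirq = False
    return cpu.warnings


if __name__ == "__main__":
    print("dropping the lock:", run_handler(True))
    print("keeping the lock: ", run_handler(False))
```

In the model, only the variant that drops the lock around the callback records a warning, which mirrors the trace_hardirqs_on() report reached through _spin_unlock_irq in the backtrace.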
* Re: qla2xxx BUG: workqueue leaked lock or atomic
2007-02-27 18:51 ` Andrew Vasquez
@ 2007-02-28 15:18 ` Andre Noll
2007-02-28 15:37 ` Andre Noll
0 siblings, 1 reply; 19+ messages in thread
From: Andre Noll @ 2007-02-28 15:18 UTC (permalink / raw)
To: Andrew Vasquez; +Cc: linux-kernel, linux-scsi, James Bottomley
[-- Attachment #1: Type: text/plain, Size: 676 bytes --]
On 10:51, Andrew Vasquez wrote:
> On Tue, 27 Feb 2007, Andre Noll wrote:
> > [ 68.532665] BUG: at kernel/lockdep.c:1860 trace_hardirqs_on()
>
> Ok, since 2.6.20, there has been a patch added to qla2xxx which drops the
> spin_unlock_irq() call while attempting to ramp-up the queue-depth:
>
> Could you try the latest 2.6.21-rc which contains the correction?
With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
writing to both raid systems at the same time via lvm still locks up
the system within minutes.
As lockdep revealed another dm-related lock problem on this kernel,
I guess I'll have to bother the lvm people on this.
Thanks
Andre
* Re: qla2xxx BUG: workqueue leaked lock or atomic
2007-02-28 15:18 ` Andre Noll
@ 2007-02-28 15:37 ` Andre Noll
2007-03-07 4:39 ` Andrew Morton
2007-03-07 18:46 ` Jens Axboe
0 siblings, 2 replies; 19+ messages in thread
From: Andre Noll @ 2007-02-28 15:37 UTC (permalink / raw)
To: Andrew Vasquez; +Cc: linux-kernel, linux-scsi, James Bottomley
[-- Attachment #1: Type: text/plain, Size: 319 bytes --]
On 16:18, Andre Noll wrote:
> With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
> writing to both raid systems at the same time via lvm still locks up
> the system within minutes.
Screenshot of the resulting kernel panic:
http://systemlinux.org/~maan/shots/kernel-panic-21-rc2-huangho2.png
Andre
* Re: qla2xxx BUG: workqueue leaked lock or atomic
2007-02-28 15:37 ` Andre Noll
@ 2007-03-07 4:39 ` Andrew Morton
2007-03-07 17:09 ` Andre Noll
2007-03-07 18:46 ` Jens Axboe
1 sibling, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2007-03-07 4:39 UTC (permalink / raw)
To: Andre Noll
Cc: Andrew Vasquez, linux-kernel, linux-scsi, James Bottomley,
Jens Axboe, Alasdair G Kergon, Adrian Bunk
On Wed, 28 Feb 2007 16:37:22 +0100 Andre Noll <maan@systemlinux.org> wrote:
> On 16:18, Andre Noll wrote:
>
> > With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
> > writing to both raid systems at the same time via lvm still locks up
> > the system within minutes.
>
> Screenshot of the resulting kernel panic:
>
> http://systemlinux.org/~maan/shots/kernel-panic-21-rc2-huangho2.png
>
It died in CFQ. Please try a different IO scheduler. Use something
like
echo deadline > /sys/block/sda/queue/scheduler
This could still be the old qla2xxx bug, or it could be a new qla2xxx bug,
or it could be a block bug, or it could be an LVM bug.
Adrian, can we please track this as a regression?
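To confirm that the scheduler switch took effect, the same sysfs file can be read back: it lists the available schedulers and marks the active one in square brackets. A minimal sketch follows, assuming that file layout; the helper names parse_schedulers and read_schedulers are made up for this note, and the device name is only an example.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def parse_schedulers(text: str) -> Tuple[Optional[str], List[str]]:
    """Split the sysfs scheduler line into (active, available).

    Assumes the usual layout, e.g. "noop anticipatory deadline [cfq]",
    where the bracketed name is the scheduler currently in use.
    """
    names = text.split()
    active = next(
        (n[1:-1] for n in names if n.startswith("[") and n.endswith("]")),
        None,
    )
    available = [n.strip("[]") for n in names]
    return active, available


def read_schedulers(device: str = "sda") -> Tuple[Optional[str], List[str]]:
    """Read and parse /sys/block/<device>/queue/scheduler."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_schedulers(path.read_text())


if __name__ == "__main__":
    # Sample contents; on a live system call read_schedulers("sda") instead.
    print(parse_schedulers("noop anticipatory [deadline] cfq\n"))
```

Note that the setting is per block device, so with two arrays (sda and sdb here) it has to be applied and checked for each one.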
* Re: qla2xxx BUG: workqueue leaked lock or atomic
2007-03-07 4:39 ` Andrew Morton
@ 2007-03-07 17:09 ` Andre Noll
2007-03-07 19:45 ` Andrew Morton
0 siblings, 1 reply; 19+ messages in thread
From: Andre Noll @ 2007-03-07 17:09 UTC (permalink / raw)
To: Andrew Morton
Cc: Andrew Vasquez, linux-kernel, linux-scsi, James Bottomley,
Jens Axboe, Alasdair G Kergon, Adrian Bunk
[-- Attachment #1: Type: text/plain, Size: 1400 bytes --]
On 20:39, Andrew Morton wrote:
> On Wed, 28 Feb 2007 16:37:22 +0100 Andre Noll <maan@systemlinux.org> wrote:
>
> > On 16:18, Andre Noll wrote:
> >
> > > With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
> > > writing to both raid systems at the same time via lvm still locks up
> > > the system within minutes.
> >
> > Screenshot of the resulting kernel panic:
> >
> > http://systemlinux.org/~maan/shots/kernel-panic-21-rc2-huangho2.png
> >
>
> It died in CFQ. Please try a different IO scheduler. Use something
> like
>
> echo deadline > /sys/block/sda/queue/scheduler
>
> This could still be the old qla2xxx bug, or it could be a new qla2xxx bug,
> or it could be a block bug, or it could be an LVM bug.
OK. I'm running with deadline right now. But I guess this kernel
panic was caused by an LVM bug because lockdep reported problems with
LVM. Nobody responded to my bug report on the LVM mailing list (see
http://www.redhat.com/archives/linux-lvm/2007-February/msg00102.html).
Non-working snapshots and no help from the mailing list convinced me
to ditch the lvm setup [1] in favour of linear software raid. This
means I can't do lvm-related tests any more.
BTW: Are ext3 filesystem sizes greater than 8T now officially
supported?
Thanks
Andre
[1] vg of two hardware raids, 10T together, a single lv and some snapshots
--
* Re: qla2xxx BUG: workqueue leaked lock or atomic
2007-02-28 15:37 ` Andre Noll
2007-03-07 4:39 ` Andrew Morton
@ 2007-03-07 18:46 ` Jens Axboe
2007-03-08 8:52 ` Andre Noll
1 sibling, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2007-03-07 18:46 UTC (permalink / raw)
To: Andre Noll; +Cc: Andrew Vasquez, linux-kernel, linux-scsi, James Bottomley
On Wed, Feb 28 2007, Andre Noll wrote:
> On 16:18, Andre Noll wrote:
>
> > With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
> > writing to both raid systems at the same time via lvm still locks up
> > the system within minutes.
>
> Screenshot of the resulting kernel panic:
>
> http://systemlinux.org/~maan/shots/kernel-panic-21-rc2-huangho2.png
Do you have the full oops as well?
--
Jens Axboe
* Re: qla2xxx BUG: workqueue leaked lock or atomic
2007-03-07 17:09 ` Andre Noll
@ 2007-03-07 19:45 ` Andrew Morton
2007-03-07 20:05 ` Mingming Cao
0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2007-03-07 19:45 UTC (permalink / raw)
To: Andre Noll
Cc: Andrew Vasquez, linux-kernel, linux-scsi, James Bottomley,
Jens Axboe, Alasdair G Kergon, Adrian Bunk, linux-ext4
On Wed, 7 Mar 2007 18:09:55 +0100 Andre Noll <maan@systemlinux.org> wrote:
> On 20:39, Andrew Morton wrote:
> > On Wed, 28 Feb 2007 16:37:22 +0100 Andre Noll <maan@systemlinux.org> wrote:
> >
> > > On 16:18, Andre Noll wrote:
> > >
> > > > With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
> > > > writing to both raid systems at the same time via lvm still locks up
> > > > the system within minutes.
> > >
> > > Screenshot of the resulting kernel panic:
> > >
> > > http://systemlinux.org/~maan/shots/kernel-panic-21-rc2-huangho2.png
> > >
> >
> > It died in CFQ. Please try a different IO scheduler. Use something
> > like
> >
> > echo deadline > /sys/block/sda/queue/scheduler
> >
> > This could still be the old qla2xxx bug, or it could be a new qla2xxx bug,
> > or it could be a block bug, or it could be an LVM bug.
>
> OK. I'm running with deadline right now. But I guess this kernel
> panic was caused by an LVM bug because lockdep reported problems with
> LVM. Nobody responded to my bug report on the LVM mailing list (see
> http://www.redhat.com/archives/linux-lvm/2007-February/msg00102.html).
>
> Non-working snapshots and no help from the mailing list convinced me
> to ditch the lvm setup [1] in favour of linear software raid. This
> means I can't do lvm-related tests any more.
Sigh.
> BTW: Are ext3 filesystem sizes greater than 8T now officially
> supported?
I think so, but I don't know how much 16TB testing developers and
distros are doing - perhaps the linux-ext4 denizens can tell us?
* Re: qla2xxx BUG: workqueue leaked lock or atomic
2007-03-07 19:45 ` Andrew Morton
@ 2007-03-07 20:05 ` Mingming Cao
2007-03-09 9:36 ` Andre Noll
0 siblings, 1 reply; 19+ messages in thread
From: Mingming Cao @ 2007-03-07 20:05 UTC (permalink / raw)
To: Andrew Morton
Cc: Andre Noll, Andrew Vasquez, linux-kernel, linux-scsi,
James Bottomley, Jens Axboe, Alasdair G Kergon, Adrian Bunk,
linux-ext4
On Wed, 2007-03-07 at 11:45 -0800, Andrew Morton wrote:
> On Wed, 7 Mar 2007 18:09:55 +0100 Andre Noll <maan@systemlinux.org> wrote:
>
> > On 20:39, Andrew Morton wrote:
> > > On Wed, 28 Feb 2007 16:37:22 +0100 Andre Noll <maan@systemlinux.org> wrote:
> > >
> > > > On 16:18, Andre Noll wrote:
> > > >
> > > > > With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
> > > > > writing to both raid systems at the same time via lvm still locks up
> > > > > the system within minutes.
> > > >
> > > > Screenshot of the resulting kernel panic:
> > > >
> > > > http://systemlinux.org/~maan/shots/kernel-panic-21-rc2-huangho2.png
> > > >
> > >
> > > It died in CFQ. Please try a different IO scheduler. Use something
> > > like
> > >
> > > echo deadline > /sys/block/sda/queue/scheduler
> > >
> > > This could still be the old qla2xxx bug, or it could be a new qla2xxx bug,
> > > or it could be a block bug, or it could be an LVM bug.
> >
> > OK. I'm running with deadline right now. But I guess this kernel
> > panic was caused by an LVM bug because lockdep reported problems with
> > LVM. Nobody responded to my bug report on the LVM mailing list (see
> > http://www.redhat.com/archives/linux-lvm/2007-February/msg00102.html).
> >
> > Non-working snapshots and no help from the mailing list convinced me
> > to ditch the lvm setup [1] in favour of linear software raid. This
> > means I can't do lvm-related tests any more.
>
> Sigh.
>
> > BTW: Are ext3 filesystem sizes greater than 8T now officially
> > supported?
>
> I think so, but I don't know how much 16TB testing developers and
> distros are doing - perhaps the linux-ext4 denizens can tell us?
> -
IBM has done some testing (dbench, fsstress, fsx, tiobench, iozone etc)
on 10TB ext3, I think RedHat and BULL have done similar test on >8TB
ext3 too.
Mingming
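For context on the 8T and 16TB figures: they fall out of the block-number width. The sketch below is only the arithmetic, assuming a 4 KiB block size and 32-bit block numbers, with the lower figure corresponding to block numbers treated as signed. It also assumes the filesystem spans both arrays, using the sector counts from the sd log lines earlier in the thread. The function name is illustrative.

```python
BLOCK_SIZE = 4096   # assumed ext3 block size in bytes
SECTOR_SIZE = 512   # from the sd log lines
TIB = 2 ** 40


def max_fs_bytes(block_number_bits: int, block_size: int = BLOCK_SIZE) -> int:
    """Largest filesystem addressable with the given block-number width."""
    return (2 ** block_number_bits) * block_size


signed_limit = max_fs_bytes(31)     # block numbers treated as signed 32-bit
unsigned_limit = max_fs_bytes(32)   # full unsigned 32-bit range

# sda and sdb sector counts as reported in the boot log.
volume_bytes = (11714863104 + 9370656768) * SECTOR_SIZE

if __name__ == "__main__":
    print(f"signed limit:   {signed_limit // TIB} TiB")
    print(f"unsigned limit: {unsigned_limit // TIB} TiB")
    print(f"volume size:    {volume_bytes / TIB:.2f} TiB")
```

Under these assumptions the combined volume sits between the two limits, which is why the question of support above 8T matters for this particular setup.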
* Re: qla2xxx BUG: workqueue leaked lock or atomic
2007-03-07 18:46 ` Jens Axboe
@ 2007-03-08 8:52 ` Andre Noll
2007-03-08 9:02 ` Jens Axboe
0 siblings, 1 reply; 19+ messages in thread
From: Andre Noll @ 2007-03-08 8:52 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Vasquez, linux-kernel, linux-scsi, James Bottomley
[-- Attachment #1: Type: text/plain, Size: 742 bytes --]
On 19:46, Jens Axboe wrote:
> On Wed, Feb 28 2007, Andre Noll wrote:
> > On 16:18, Andre Noll wrote:
> >
> > > With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
> > > writing to both raid systems at the same time via lvm still locks up
> > > the system within minutes.
> >
> > Screenshot of the resulting kernel panic:
> >
> > http://systemlinux.org/~maan/shots/kernel-panic-21-rc2-huangho2.png
>
> Do you have the full oops as well?
Unfortunately not, as there's no way to scroll up after a kernel panic
(the screenshot was taken by using a KVM switch which just sends the
video output over ethernet).
Thanks
Andre
--
The only person who always got his work done by Friday was Robinson Crusoe
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
* Re: qla2xxx BUG: workqueue leaked lock or atomic
2007-03-08 8:52 ` Andre Noll
@ 2007-03-08 9:02 ` Jens Axboe
2007-03-08 9:33 ` Andre Noll
0 siblings, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2007-03-08 9:02 UTC (permalink / raw)
To: Andre Noll; +Cc: Andrew Vasquez, linux-kernel, linux-scsi, James Bottomley
On Thu, Mar 08 2007, Andre Noll wrote:
> On 19:46, Jens Axboe wrote:
> > On Wed, Feb 28 2007, Andre Noll wrote:
> > > On 16:18, Andre Noll wrote:
> > >
> > > > With 2.6.21-rc2 I am unable to reproduce this BUG message. However,
> > > > writing to both raid systems at the same time via lvm still locks up
> > > > the system within minutes.
> > >
> > > Screenshot of the resulting kernel panic:
> > >
> > > http://systemlinux.org/~maan/shots/kernel-panic-21-rc2-huangho2.png
> >
> > Do you have the full oops as well?
>
> Unfortunately not, as there's no way to scroll up after a kernel panic
> (the screenshot was taken by using a KVM switch which just sends the
> video output over ethernet).
Do you still have the vmlinux? It'd be interesting to see what
$ gdb vmlinux
(gdb) l *cfq_dispatch_insert+0x28
says; here that'd be the cfqq dereference. And that must be valid: it's set
at allocation time and only cleared after free. So unless lvm issues
private requests that aren't properly allocated, this whole thing looks
very bizarre.
--
Jens Axboe
* Re: qla2xxx BUG: workqueue leaked lock or atomic
2007-03-08 9:02 ` Jens Axboe
@ 2007-03-08 9:33 ` Andre Noll
2007-03-08 9:36 ` Jens Axboe
0 siblings, 1 reply; 19+ messages in thread
From: Andre Noll @ 2007-03-08 9:33 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Vasquez, linux-kernel, linux-scsi, James Bottomley
[-- Attachment #1: Type: text/plain, Size: 480 bytes --]
On 10:02, Jens Axboe wrote:
> Do you still have the vmlinux? It'd be interesting to see what
>
> $ gdb vmlinux
> (gdb) l *cfq_dispatch_insert+0x28
>
> says,
The vmlinux in the kernel dir is dated March 5 and my bug report
was Feb 28. So I'm afraid it's gone. I tried the gdb command anyway
but it only gave me
No symbol table is loaded. Use the "file" command.
Sorry
Andre
--
The only person who always got his work done by Friday was Robinson Crusoe
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
* Re: qla2xxx BUG: workqueue leaked lock or atomic
2007-03-08 9:33 ` Andre Noll
@ 2007-03-08 9:36 ` Jens Axboe
2007-03-08 10:29 ` Andre Noll
0 siblings, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2007-03-08 9:36 UTC (permalink / raw)
To: Andre Noll; +Cc: Andrew Vasquez, linux-kernel, linux-scsi, James Bottomley
On Thu, Mar 08 2007, Andre Noll wrote:
> On 10:02, Jens Axboe wrote:
> > Do you still have the vmlinux? It'd be interesting to see what
> >
> > $ gdb vmlinux
> > (gdb) l *cfq_dispatch_insert+0x28
> >
> > says,
>
> The vmlinux in the kernel dir is dated March 5 and my bug report
> was Feb 28. So I'm afraid it's gone. I tried the gdb command anyway
> but it only gave me
>
> No symbol table is loaded. Use the "file" command.
Yeah, you'd need CONFIG_DEBUG_INFO enabled as well. I don't think there
were any CFQ changes between Feb 28 and March 5, so you could probably
still try it out. A quicker way:
- Edit .config and set CONFIG_DEBUG_INFO=y (near the bottom)
- make oldconfig
- rm block/cfq-iosched.o
- make block/cfq-iosched.o
- gdb block/cfq-iosched.o
(gdb) l *cfq_dispatch_insert+0x28
and see what that says. Should not take you more than a minute or so,
would appreciate it!
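
The steps listed above could also be run non-interactively. What follows is only a sketch under a few assumptions: it is run from the top of the kernel source tree, GNU sed is available for `sed -i`, and the function names `list_expr` and `resolve_oops` are made up for illustration. `make oldconfig` may still stop to ask questions if the config changes pull in new options.

```shell
#!/bin/sh
# Hypothetical helpers; nothing runs until resolve_oops is called explicitly.

# Print the gdb "list" expression for SYMBOL+OFFSET,
# in the same form as used in this thread.
list_expr() {
    printf 'list *%s+%s\n' "$1" "$2"
}

# Rebuild one object file with debug info and list the source at SYMBOL+OFFSET.
# Usage: resolve_oops block/cfq-iosched.o cfq_dispatch_insert 0x28
resolve_oops() {
    obj=$1 sym=$2 off=$3
    # Drop any existing CONFIG_DEBUG_INFO line (set or "is not set"),
    # then append the enabled setting.
    sed -i '/CONFIG_DEBUG_INFO[ =]/d' .config &&
    echo 'CONFIG_DEBUG_INFO=y' >> .config &&
    make oldconfig &&
    rm -f "$obj" &&
    make "$obj" &&
    # -batch exits after running the -ex command instead of opening a prompt.
    gdb -batch -ex "$(list_expr "$sym" "$off")" "$obj"
}
```

Calling `resolve_oops block/cfq-iosched.o cfq_dispatch_insert 0x28` would then rebuild only that object and print the source lines around the offset, without rebuilding the whole kernel.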
--
Jens Axboe
* Re: qla2xxx BUG: workqueue leaked lock or atomic
2007-03-08 9:36 ` Jens Axboe
@ 2007-03-08 10:29 ` Andre Noll
2007-03-08 10:35 ` Jens Axboe
0 siblings, 1 reply; 19+ messages in thread
From: Andre Noll @ 2007-03-08 10:29 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Vasquez, linux-kernel, linux-scsi, James Bottomley
[-- Attachment #1: Type: text/plain, Size: 1421 bytes --]
On 10:36, Jens Axboe wrote:
> - Edit .config and set CONFIG_DEBUG_INFO=y (near the bottom)
> - make oldconfig
> - rm block/cfq-iosched.o
> - make block/cfq-iosched.o
> - gdb block/cfq-iosched.o
>
> (gdb) l *cfq_dispatch_insert+0x28
>
> and see what that says. Should not take you more than a minute or so,
> would appreciate it!
No problem, here we go:
# gdb block/cfq-iosched.o
GNU gdb 6.4-debian
Copyright 2005 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) l *cfq_dispatch_insert+0x28
0xcf8 is in cfq_dispatch_insert (block/cfq-iosched.c:865).
860 }
861
862 static void cfq_dispatch_insert(request_queue_t *q, struct request *rq)
863 {
864 struct cfq_data *cfqd = q->elevator->elevator_data;
865 struct cfq_queue *cfqq = RQ_CFQQ(rq);
866
867 cfq_remove_request(rq);
868 cfqq->on_dispatch[rq_is_sync(rq)]++;
869 elv_dispatch_sort(q, rq);
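
For reading the gdb output above: since block/cfq-iosched.o is an unlinked object file, the 0xcf8 that gdb prints is presumably an offset within that object rather than a kernel virtual address. Subtracting the 0x28 from the oops gives the start of cfq_dispatch_insert inside the object. A small sketch of that arithmetic (the helper name `symbol_base` is made up for illustration):

```shell
#!/bin/sh
# symbol_base ADDR OFFSET: print the function's start address, given the
# address gdb reported and the +OFFSET taken from the oops.
symbol_base() {
    printf '0x%x\n' $(( $1 - $2 ))
}

symbol_base 0xcf8 0x28
```

The result can be compared against the address of cfq_dispatch_insert in the symbol table of the same object file (for example from `nm block/cfq-iosched.o`) to check that the object matches the kernel that produced the oops.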
Regards
Andre
--
The only person who always got his work done by Friday was Robinson Crusoe
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
* Re: qla2xxx BUG: workqueue leaked lock or atomic
2007-03-08 10:29 ` Andre Noll
@ 2007-03-08 10:35 ` Jens Axboe
0 siblings, 0 replies; 19+ messages in thread
From: Jens Axboe @ 2007-03-08 10:35 UTC (permalink / raw)
To: Andre Noll; +Cc: Andrew Vasquez, linux-kernel, linux-scsi, James Bottomley
On Thu, Mar 08 2007, Andre Noll wrote:
> On 10:36, Jens Axboe wrote:
> > - Edit .config and set CONFIG_DEBUG_INFO=y (near the bottom)
> > - make oldconfig
> > - rm block/cfq-iosched.o
> > - make block/cfq-iosched.o
> > - gdb block/cfq-iosched.o
> >
> > (gdb) l *cfq_dispatch_insert+0x28
> >
> > and see what that says. Should not take you more than a minute or so,
> > would appreciate it!
>
> No problem, here we go:
>
> # gdb block/cfq-iosched.o
> GNU gdb 6.4-debian
> Copyright 2005 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
>
> (gdb) l *cfq_dispatch_insert+0x28
> 0xcf8 is in cfq_dispatch_insert (block/cfq-iosched.c:865).
> 860 }
> 861
> 862 static void cfq_dispatch_insert(request_queue_t *q, struct request *rq)
> 863 {
> 864 struct cfq_data *cfqd = q->elevator->elevator_data;
> 865 struct cfq_queue *cfqq = RQ_CFQQ(rq);
> 866
> 867 cfq_remove_request(rq);
> 868 cfqq->on_dispatch[rq_is_sync(rq)]++;
> 869 elv_dispatch_sort(q, rq);
Ok, so it's ->next_rq being NULL or invalid. Similar to the report from
Dan last week, that's a bit worrisome. I'll have to look further into
that.
--
Jens Axboe
* Re: qla2xxx BUG: workqueue leaked lock or atomic
2007-03-07 20:05 ` Mingming Cao
@ 2007-03-09 9:36 ` Andre Noll
0 siblings, 0 replies; 19+ messages in thread
From: Andre Noll @ 2007-03-09 9:36 UTC (permalink / raw)
To: Mingming Cao
Cc: Andrew Morton, Andrew Vasquez, linux-kernel, linux-scsi,
James Bottomley, Jens Axboe, Alasdair G Kergon, Adrian Bunk,
linux-ext4
[-- Attachment #1: Type: text/plain, Size: 1347 bytes --]
On 12:05, Mingming Cao wrote:
> > > BTW: Are ext3 filesystem sizes greater than 8T now officially
> > > supported?
> >
> > I think so, but I don't know how much 16TB testing developers and
> > distros are doing - perhaps the linux-ext4 denizens can tell us?
> > -
>
> IBM has done some testing (dbench, fsstress, fsx, tiobench, iozone etc)
> on 10TB ext3, I think RedHat and BULL have done similar tests on >8TB
> ext3 too.
Thanks. I'm asking because some days ago I tried to create a 10T ext3
filesystem on a linear software raid over two hardware raids, and it
failed horribly. mke2fs from e2fsprogs-1.39 refused to create such a
large filesystem but did it with -F, and I could mount it afterwards.
But writing data immediately produced zillions of errors and only
power-cycling the box helped.
We're now using a 7.9T filesystem on the same hardware. That seems
to work fine on 2.6.21-rc2, so I think this is an ext3 problem. I
cannot completely rule out other reasons though as the underlying
qla2xxx driver also had some problems on earlier kernels.
We'd much rather have a 10T filesystem if possible. So if you have
time to look into the issue I would be willing to recreate the 10T
filesystem and send details.
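
For context on the 8T boundary discussed above, here is a rough block-count calculation. It is a sketch only, assuming 4 KiB filesystem blocks and reading "T" as TiB; neither is stated in this thread. The usual explanation for trouble above 8T is block numbers being held in signed 32-bit variables somewhere in the kernel or in e2fsprogs, which is likewise an assumption here and not something confirmed in this thread. The helper names are made up for illustration, and 7 TiB stands in for the 7.9T filesystem because shell arithmetic is integer-only.

```shell
#!/bin/sh
# Number of filesystem blocks needed for SIZE_TIB tebibytes
# at BLOCK_SIZE bytes per block.
blocks_for_tib() {
    echo $(( $1 * (1 << 40) / $2 ))
}

# Succeeds if BLOCKS still fits in a signed 32-bit block number (below 2^31).
fits_signed_32() {
    [ "$1" -lt $(( 1 << 31 )) ]
}

for tib in 7 8 10 16; do
    blocks=$(blocks_for_tib "$tib" 4096)
    if fits_signed_32 "$blocks"; then
        verdict="below 2^31"
    else
        verdict="at or above 2^31"
    fi
    echo "$tib TiB: $blocks blocks ($verdict)"
done
```

Under these assumptions a 10T filesystem needs more than 2^31 blocks while anything below 8T does not, which would be consistent with the 7.9T filesystem working and the 10T one failing.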
Regards
Andre
--
The only person who always got his work done by Friday was Robinson Crusoe
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
end of thread, other threads:[~2007-03-09 9:46 UTC | newest]
Thread overview: 19+ messages
2007-02-26 13:31 qla2xxx BUG: workqueue leaked lock or atomic Andre Noll
2007-02-26 18:26 ` Andrew Vasquez
2007-02-27 10:11 ` Andre Noll
2007-02-27 14:35 ` Andre Noll
2007-02-27 18:51 ` Andrew Vasquez
2007-02-28 15:18 ` Andre Noll
2007-02-28 15:37 ` Andre Noll
2007-03-07 4:39 ` Andrew Morton
2007-03-07 17:09 ` Andre Noll
2007-03-07 19:45 ` Andrew Morton
2007-03-07 20:05 ` Mingming Cao
2007-03-09 9:36 ` Andre Noll
2007-03-07 18:46 ` Jens Axboe
2007-03-08 8:52 ` Andre Noll
2007-03-08 9:02 ` Jens Axboe
2007-03-08 9:33 ` Andre Noll
2007-03-08 9:36 ` Jens Axboe
2007-03-08 10:29 ` Andre Noll
2007-03-08 10:35 ` Jens Axboe