* [BUG] 2.6.23-git18 Kernel oops in sg helpers
@ 2007-10-23 15:19 Kamalesh Babulal
  2007-10-23 18:44 ` Jens Axboe
  2007-10-23 22:42 ` FUJITA Tomonori
  0 siblings, 2 replies; 13+ messages in thread
From: Kamalesh Babulal @ 2007-10-23 15:19 UTC (permalink / raw)
  To: LKML; +Cc: Jens Axboe, Andy Whitcroft

Hi,

A kernel oops is triggered while running the fsx-linux test, followed by a CPU
soft lockup, on the AMD box.

Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP: 
 [<ffffffff8021f2f6>] gart_map_sg+0x26c/0x406
PGD 10185b067 PUD 10075b067 PMD 0 
Oops: 0002 [1] SMP 
CPU 3 
Modules linked in:
Pid: 18676, comm: fsx-linux Not tainted 2.6.23-git18-autokern1 #1
RIP: 0010:[<ffffffff8021f2f6>]  [<ffffffff8021f2f6>] gart_map_sg+0x26c/0x406
RSP: 0000:ffff810181edf948  EFLAGS: 00010002
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000004 RSI: 0000000000000002 RDI: ffffffff80573dac
RBP: ffff81018ca9a020 R08: 0000000000000004 R09: ffff810181edf8d4
R10: 00000000000000db R11: ffffffff8041926c R12: ffff81018ca9a040
R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000003
FS:  0000000000000000(0000) GS:ffff81018071e380(0063) knlGS:00000000f7f9a900
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 0000000000000018 CR3: 0000000101281000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process fsx-linux (pid: 18676, threadinfo ffff810181ede000, task ffff810181fc0720)
Stack:  0000000300000001 ffff810100000000 ffff81018ca9a040 0000000000000001
 0000000200000002 ffff81018ca9a000 ffff81010079d870 ffff810002903c40
 ffff810082692000 ffff810180712bd0 ffff810002903c70 0000000002000000
Call Trace:
 [<ffffffff803ecb6b>] scsi_dma_map+0x3f/0x4e
 [<ffffffff803fd3fd>] mptscsih_qcmd+0x1bc/0x4af
 [<ffffffff803e6b41>] scsi_dispatch_cmd+0x1e7/0x277
 [<ffffffff803ec0b8>] scsi_request_fn+0x2df/0x369
 [<ffffffff80350e4c>] cfq_insert_request+0x2a6/0x2ae
 [<ffffffff80346b91>] elv_insert+0xcf/0x18a
 [<ffffffff8034a3d6>] __make_request+0x550/0x58b
 [<ffffffff8034a62e>] generic_make_request+0x1bb/0x1f0
 [<ffffffff8034a737>] submit_bio+0xd4/0xdf
 [<ffffffff802a13f7>] dio_bio_submit+0x52/0x66
 [<ffffffff802a2107>] __blockdev_direct_IO+0x813/0xa1c
 [<ffffffff80260f14>] pagevec_lookup_tag+0x1a/0x21
 [<ffffffff802df355>] ext3_direct_IO+0x107/0x19e
 [<ffffffff802dfd8c>] ext3_get_block+0x0/0xe2
 [<ffffffff8025a7b7>] generic_file_direct_IO+0xcb/0x111
 [<ffffffff8025aebb>] generic_file_aio_read+0x86/0x160
 [<ffffffff8027e7a6>] do_sync_read+0xc8/0x10b
 [<ffffffff80298141>] __mark_inode_dirty+0x29/0x17d
 [<ffffffff80245f75>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80290ce3>] notify_change+0x255/0x26a
 [<ffffffff802813dd>] vfs_getattr+0x2b/0x2f
 [<ffffffff802814c5>] vfs_fstat+0x33/0x3a
 [<ffffffff8027e894>] vfs_read+0xab/0x12e
 [<ffffffff8027eb98>] sys_read+0x45/0x6e
 [<ffffffff80222922>] ia32_sysret+0x0/0xa
Code: c7 41 18 00 00 00 00 8b 44 24 20 e9 7b 01 00 00 e8 27 f8 ff 
RIP  [<ffffffff8021f2f6>] gart_map_sg+0x26c/0x406
 RSP <ffff810181edf948>
CR2: 0000000000000018
BUG: soft lockup - CPU#3 stuck for 11s! [swapper:0]
CPU 3:
Modules linked in:
Pid: 0, comm: swapper Tainted: G      D 2.6.23-git18-autokern1 #1
RIP: 0010:[<ffffffff8048b971>]  [<ffffffff8048b971>] _spin_lock_irqsave+0x15/0x24
RSP: 0000:ffff81000177fe98  EFLAGS: 00000286
RAX: 0000000000000282 RBX: ffff81018f6f0000 RCX: ffff81018f6f0068
RDX: ffff810082692800 RSI: 0000000000000001 RDI: ffff810082692850
RBP: ffff81000177fe10 R08: 0000000000000028 R09: 0000000000000086
R10: 0000000000000001 R11: 0000000000000028 R12: ffffffff8020c256
R13: 0000000000000001 R14: ffff810082692800 R15: 0000000000000028
FS:  0000000000000000(0000) GS:ffff81018071e380(0000) knlGS:00000000f7caf080
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000810e708 CR3: 000000018195f000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

Call Trace:
 <IRQ>  [<ffffffff803e8c5d>] scsi_eh_scmd_add+0x2c/0x9c
 [<ffffffff803e8dab>] scsi_times_out+0x0/0x87
 [<ffffffff803e8e19>] scsi_times_out+0x6e/0x87
 [<ffffffff8023bc46>] run_timer_softirq+0x14f/0x1a0
 [<ffffffff8022d6cf>] scheduler_tick+0xff/0x10b
 [<ffffffff802384e1>] __do_softirq+0x50/0xbb
 [<ffffffff8020c7ac>] call_softirq+0x1c/0x28
 [<ffffffff8020e54f>] do_softirq+0x2e/0x97
 [<ffffffff8021c684>] smp_apic_timer_interrupt+0x3e/0x51
 [<ffffffff802099ee>] default_idle+0x0/0x3d
 [<ffffffff8020c256>] apic_timer_interrupt+0x66/0x70
 <EOI>  [<ffffffff80209a17>] default_idle+0x29/0x3d
 [<ffffffff80209bc4>] cpu_idle+0x8b/0xae



-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.



* Re: [BUG] 2.6.23-git18 Kernel oops in sg helpers
  2007-10-23 15:19 [BUG] 2.6.23-git18 Kernel oops in sg helpers Kamalesh Babulal
@ 2007-10-23 18:44 ` Jens Axboe
  2007-10-24 11:54   ` Andy Whitcroft
  2007-10-23 22:42 ` FUJITA Tomonori
  1 sibling, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2007-10-23 18:44 UTC (permalink / raw)
  To: Kamalesh Babulal; +Cc: LKML, Andy Whitcroft

On Tue, Oct 23 2007, Kamalesh Babulal wrote:
> Hi,
> 
> Kernel oops is triggered while running fsx-linux test, followed by cpu softlock
> over the AMD box
> 
> Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP: 
>  [<ffffffff8021f2f6>] gart_map_sg+0x26c/0x406
> PGD 10185b067 PUD 10075b067 PMD 0 
> Oops: 0002 [1] SMP 
> CPU 3 
> Modules linked in:
> Pid: 18676, comm: fsx-linux Not tainted 2.6.23-git18-autokern1 #1
> RIP: 0010:[<ffffffff8021f2f6>]  [<ffffffff8021f2f6>] gart_map_sg+0x26c/0x406
> RSP: 0000:ffff810181edf948  EFLAGS: 00010002

Can you check where gart_map_sg+0x26c is? Make sure you have
CONFIG_DEBUG_INFO enabled, then do:

$ gdb vmlinux
(gdb) l *gart_map_sg+0x26c

Thanks!

-- 
Jens Axboe



* Re: [BUG] 2.6.23-git18 Kernel oops in sg helpers
  2007-10-23 15:19 [BUG] 2.6.23-git18 Kernel oops in sg helpers Kamalesh Babulal
  2007-10-23 18:44 ` Jens Axboe
@ 2007-10-23 22:42 ` FUJITA Tomonori
  2007-10-24  8:32   ` Jens Axboe
  1 sibling, 1 reply; 13+ messages in thread
From: FUJITA Tomonori @ 2007-10-23 22:42 UTC (permalink / raw)
  To: kamalesh; +Cc: linux-kernel, jens.axboe, apw, tomof

On Tue, 23 Oct 2007 20:49:40 +0530
Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:

> Hi,
> 
> Kernel oops is triggered while running fsx-linux test, followed by cpu softlock
> over the AMD box
> 
> Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP: 
>  [<ffffffff8021f2f6>] gart_map_sg+0x26c/0x406
> PGD 10185b067 PUD 10075b067 PMD 0 

Does this work?


diff --git a/arch/x86/kernel/pci-gart_64.c b/arch/x86/kernel/pci-gart_64.c
index c56e9ee..ae7e016 100644
--- a/arch/x86/kernel/pci-gart_64.c
+++ b/arch/x86/kernel/pci-gart_64.c
@@ -338,7 +338,6 @@ static int __dma_map_cont(struct scatterlist *start, int nelems,
 		
 		BUG_ON(s != start && s->offset);
 		if (s == start) {
-			*sout = *s; 
 			sout->dma_address = iommu_bus_base;
 			sout->dma_address += iommu_page*PAGE_SIZE + s->offset;
 			sout->dma_length = s->length;
@@ -365,7 +364,7 @@ static inline int dma_map_cont(struct scatterlist *start, int nelems,
 {
 	if (!need) {
 		BUG_ON(nelems != 1);
-		*sout = *start;
+		sout->dma_address = start->dma_address;
 		sout->dma_length = start->length;
 		return 0;
 	}
-- 
1.5.2.4



* Re: [BUG] 2.6.23-git18 Kernel oops in sg helpers
  2007-10-23 22:42 ` FUJITA Tomonori
@ 2007-10-24  8:32   ` Jens Axboe
  2007-10-24  8:50     ` Benny Halevy
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2007-10-24  8:32 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: kamalesh, linux-kernel, apw, tomof

On Wed, Oct 24 2007, FUJITA Tomonori wrote:
> On Tue, 23 Oct 2007 20:49:40 +0530
> Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
> 
> > Hi,
> > 
> > Kernel oops is triggered while running fsx-linux test, followed by cpu softlock
> > over the AMD box
> > 
> > Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP: 
> >  [<ffffffff8021f2f6>] gart_map_sg+0x26c/0x406
> > PGD 10185b067 PUD 10075b067 PMD 0 
> 
> Does this work?
> 
> 
> diff --git a/arch/x86/kernel/pci-gart_64.c b/arch/x86/kernel/pci-gart_64.c
> index c56e9ee..ae7e016 100644
> --- a/arch/x86/kernel/pci-gart_64.c
> +++ b/arch/x86/kernel/pci-gart_64.c
> @@ -338,7 +338,6 @@ static int __dma_map_cont(struct scatterlist *start, int nelems,
>  		
>  		BUG_ON(s != start && s->offset);
>  		if (s == start) {
> -			*sout = *s; 
>  			sout->dma_address = iommu_bus_base;
>  			sout->dma_address += iommu_page*PAGE_SIZE + s->offset;
>  			sout->dma_length = s->length;
> @@ -365,7 +364,7 @@ static inline int dma_map_cont(struct scatterlist *start, int nelems,
>  {
>  	if (!need) {
>  		BUG_ON(nelems != 1);
> -		*sout = *start;
> +		sout->dma_address = start->dma_address;
>  		sout->dma_length = start->length;
>  		return 0;
>  	}
> -- 
> 1.5.2.4

Care to write up a proper changelog?

-- 
Jens Axboe



* Re: [BUG] 2.6.23-git18 Kernel oops in sg helpers
  2007-10-24  8:32   ` Jens Axboe
@ 2007-10-24  8:50     ` Benny Halevy
  2007-10-25  8:53       ` Benny Halevy
  0 siblings, 1 reply; 13+ messages in thread
From: Benny Halevy @ 2007-10-24  8:50 UTC (permalink / raw)
  To: Jens Axboe, FUJITA Tomonori; +Cc: kamalesh, linux-kernel, apw, tomof

On Oct. 24, 2007, 10:32 +0200, Jens Axboe <jens.axboe@oracle.com> wrote:
> On Wed, Oct 24 2007, FUJITA Tomonori wrote:
>> On Tue, 23 Oct 2007 20:49:40 +0530
>> Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
>>
>>> Hi,
>>>
>>> Kernel oops is triggered while running fsx-linux test, followed by cpu softlock
>>> over the AMD box
>>>
>>> Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP: 
>>>  [<ffffffff8021f2f6>] gart_map_sg+0x26c/0x406
>>> PGD 10185b067 PUD 10075b067 PMD 0 
>> Does this work?
>>
>>
>> diff --git a/arch/x86/kernel/pci-gart_64.c b/arch/x86/kernel/pci-gart_64.c
>> index c56e9ee..ae7e016 100644
>> --- a/arch/x86/kernel/pci-gart_64.c
>> +++ b/arch/x86/kernel/pci-gart_64.c
>> @@ -338,7 +338,6 @@ static int __dma_map_cont(struct scatterlist *start, int nelems,
>>  		
>>  		BUG_ON(s != start && s->offset);
>>  		if (s == start) {
>> -			*sout = *s; 
>>  			sout->dma_address = iommu_bus_base;
>>  			sout->dma_address += iommu_page*PAGE_SIZE + s->offset;
>>  			sout->dma_length = s->length;
>> @@ -365,7 +364,7 @@ static inline int dma_map_cont(struct scatterlist *start, int nelems,
>>  {
>>  	if (!need) {
>>  		BUG_ON(nelems != 1);
>> -		*sout = *start;
>> +		sout->dma_address = start->dma_address;

I don't see how this could fix anything, since "s" above and "start" here are
still dereferenced.  Also, this makes sout->dma_address inconsistent with
sout->page_link and with the end marker.

Benny

>>  		sout->dma_length = start->length;
>>  		return 0;
>>  	}
>> -- 
>> 1.5.2.4
> 
> Care to write up a proper changelog?
> 



* Re: [BUG] 2.6.23-git18 Kernel oops in sg helpers
  2007-10-23 18:44 ` Jens Axboe
@ 2007-10-24 11:54   ` Andy Whitcroft
  2007-10-24 12:25     ` Jens Axboe
  2007-10-24 12:40     ` FUJITA Tomonori
  0 siblings, 2 replies; 13+ messages in thread
From: Andy Whitcroft @ 2007-10-24 11:54 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Kamalesh Babulal, LKML

On Tue, Oct 23, 2007 at 08:44:20PM +0200, Jens Axboe wrote:
> On Tue, Oct 23 2007, Kamalesh Babulal wrote:
> > Hi,
> > 
> > Kernel oops is triggered while running fsx-linux test, followed by cpu softlock
> > over the AMD box
> > 
> > Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP: 
> >  [<ffffffff8021f2f6>] gart_map_sg+0x26c/0x406
> > PGD 10185b067 PUD 10075b067 PMD 0 
> > Oops: 0002 [1] SMP 
> > CPU 3 
> > Modules linked in:
> > Pid: 18676, comm: fsx-linux Not tainted 2.6.23-git18-autokern1 #1
> > RIP: 0010:[<ffffffff8021f2f6>]  [<ffffffff8021f2f6>] gart_map_sg+0x26c/0x406
> > RSP: 0000:ffff810181edf948  EFLAGS: 00010002
> 
> Can you check where gart_map_sg+0x26c is at? Make sure you have
> CONFIG_DEBUG_INFO defined, then do:
> 
> $ gdb vmlinux
> $ l *gart_map_sg+0x26c

Ok, this problem still seems to be about in 2.6.24-rc1.  Here is the gdb
output from that version, the panic (also below) seems the same:

(gdb) l *gart_map_sg+0x26c
0xffffffff8022011e is in gart_map_sg (arch/x86/kernel/pci-gart_64.c:433).
428                     goto error;
429             out++;
430             flush_gart();
431             if (out < nents) {
432                     sgmap = sg_next(sgmap);
433                     sgmap->dma_length = 0;
434             }
435             return out;
436
437     error:

So it seems sg_next() has returned NULL.
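
The faulting address fits that reading. Below is a minimal userspace sketch, assuming the x86-64 scatterlist field order of that era without CONFIG_DEBUG_SG. The struct mock_scatterlist and the helper dma_length_store_address() are hypothetical stand-ins, not the kernel header. With that layout, dma_length sits at offset 0x18, so a store to sgmap->dma_length through a NULL sgmap touches address 0x18, which is what CR2 reports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace mock-up of a scatterlist entry. The field order is an
 * assumption modeled on the x86-64 layout of the time, without
 * CONFIG_DEBUG_SG; it is not copied from the kernel header.
 */
struct mock_scatterlist {
	uint64_t page_link;   /* page pointer plus chain/end flag bits */
	uint32_t offset;
	uint32_t length;
	uint64_t dma_address;
	uint32_t dma_length;
};

/* Under the assumed layout, dma_length lands at offset 0x18. */
static_assert(offsetof(struct mock_scatterlist, dma_length) == 0x18,
	      "dma_length is expected at offset 0x18");

/* Address that a store to sg->dma_length would touch for a given sg value. */
static inline uintptr_t dma_length_store_address(uintptr_t sg)
{
	return sg + offsetof(struct mock_scatterlist, dma_length);
}
```

The "Code:" bytes c7 41 18 00 00 00 00 decode to movl $0x0,0x18(%rcx), and RCX is 0 in the register dump, which is consistent with the same store.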

-apw

elm3b6 login: -- 0:conmux-control -- time-stamp -- Oct/24/07  3:31:05 --
-- 0:conmux-control -- time-stamp -- Oct/24/07  3:46:40 --
Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP: 
 [<ffffffff8022011e>] gart_map_sg+0x26c/0x406
PGD 101a8f067 PUD 10193c067 PMD 0 
Oops: 0002 [1] SMP 
CPU 3 
Modules linked in:
Pid: 18339, comm: fsx-linux Not tainted 2.6.24-rc1-autokern1 #1
RIP: 0010:[<ffffffff8022011e>]  [<ffffffff8022011e>] gart_map_sg+0x26c/0x406
RSP: 0000:ffff810181e03948  EFLAGS: 00010002
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000004 RSI: 0000000000000002 RDI: ffffffff8057918c
RBP: ffff810181d0d820 R08: 0000000000000004 R09: ffff810181e038d4
R10: 00000000000000db R11: ffffffff804198f0 R12: ffff810181d0d840
R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000003
FS:  0000000000000000(0000) GS:ffff81018071e380(0063) knlGS:00000000f7fb9900
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 0000000000000018 CR3: 0000000101a39000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process fsx-linux (pid: 18339, threadinfo ffff810181e02000, task ffff810181f2f560)
Stack:  0000000300000001 ffff810100000000 ffff810181d0d840 0000000000000001
 0000000200000002 ffff810181d0d800 ffff810100773870 ffff810002905da0
 ffff8100022d6000 ffff8101807082f0 ffff810002905dd0 0000000002000000
Call Trace:
 [<ffffffff803ed20b>] scsi_dma_map+0x3f/0x4e
 [<ffffffff803fda81>] mptscsih_qcmd+0x1bc/0x4af
 [<ffffffff803e71ad>] scsi_dispatch_cmd+0x1e7/0x277
 [<ffffffff803ec758>] scsi_request_fn+0x2df/0x369
 [<ffffffff803514a8>] cfq_insert_request+0x2a6/0x2ae
 [<ffffffff803471f5>] elv_insert+0xcf/0x18a
 [<ffffffff8034aa31>] __make_request+0x550/0x58b
 [<ffffffff8034ac89>] generic_make_request+0x1bb/0x1f0
 [<ffffffff8034ad92>] submit_bio+0xd4/0xdf
 [<ffffffff802a15fb>] dio_bio_submit+0x52/0x66
 [<ffffffff802a230b>] __blockdev_direct_IO+0x813/0xa1c
 [<ffffffff80261108>] pagevec_lookup_tag+0x1a/0x21
 [<ffffffff802df9b9>] ext3_direct_IO+0x107/0x19e
 [<ffffffff802e03f0>] ext3_get_block+0x0/0xe2
 [<ffffffff8025a9ab>] generic_file_direct_IO+0xcb/0x111
 [<ffffffff8025b0af>] generic_file_aio_read+0x86/0x160
 [<ffffffff8027e9a2>] do_sync_read+0xc8/0x10b
 [<ffffffff80298345>] __mark_inode_dirty+0x29/0x17d
 [<ffffffff80246141>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80290ee7>] notify_change+0x255/0x26a
 [<ffffffff802815d9>] vfs_getattr+0x2b/0x2f
 [<ffffffff802816c1>] vfs_fstat+0x33/0x3a
 [<ffffffff8027ea90>] vfs_read+0xab/0x12e
 [<ffffffff8027ed94>] sys_read+0x45/0x6e
 [<ffffffff802229d2>] ia32_sysret+0x0/0xa


Code: c7 41 18 00 00 00 00 8b 44 24 20 e9 7b 01 00 00 e8 27 f8 ff 
RIP  [<ffffffff8022011e>] gart_map_sg+0x26c/0x406
 RSP <ffff810181e03948>
CR2: 0000000000000018


* Re: [BUG] 2.6.23-git18 Kernel oops in sg helpers
  2007-10-24 11:54   ` Andy Whitcroft
@ 2007-10-24 12:25     ` Jens Axboe
  2007-10-24 12:40     ` FUJITA Tomonori
  1 sibling, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2007-10-24 12:25 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: Kamalesh Babulal, LKML

On Wed, Oct 24 2007, Andy Whitcroft wrote:
> On Tue, Oct 23, 2007 at 08:44:20PM +0200, Jens Axboe wrote:
> > On Tue, Oct 23 2007, Kamalesh Babulal wrote:
> > > Hi,
> > > 
> > > Kernel oops is triggered while running fsx-linux test, followed by cpu softlock
> > > over the AMD box
> > > 
> > > Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP: 
> > >  [<ffffffff8021f2f6>] gart_map_sg+0x26c/0x406
> > > PGD 10185b067 PUD 10075b067 PMD 0 
> > > Oops: 0002 [1] SMP 
> > > CPU 3 
> > > Modules linked in:
> > > Pid: 18676, comm: fsx-linux Not tainted 2.6.23-git18-autokern1 #1
> > > RIP: 0010:[<ffffffff8021f2f6>]  [<ffffffff8021f2f6>] gart_map_sg+0x26c/0x406
> > > RSP: 0000:ffff810181edf948  EFLAGS: 00010002
> > 
> > Can you check where gart_map_sg+0x26c is at? Make sure you have
> > CONFIG_DEBUG_INFO defined, then do:
> > 
> > $ gdb vmlinux
> > $ l *gart_map_sg+0x26c
> 
> Ok, this problem still seems to be about in 2.6.24-rc1.  Here is the gdb
> output from that version, the panic (also below) seems the same:
> 
> (gdb) l *gart_map_sg+0x26c
> 0xffffffff8022011e is in gart_map_sg (arch/x86/kernel/pci-gart_64.c:433).
> 428                     goto error;
> 429             out++;
> 430             flush_gart();
> 431             if (out < nents) {
> 432                     sgmap = sg_next(sgmap);
> 433                     sgmap->dma_length = 0;
> 434             }
> 435             return out;
> 436
> 437     error:
> 
> So it seems sg_next has returned 0.

Interesting. Can you add a

        printk("mapped %d of %d\n", out, nents);

prior to that sg_next() call and reproduce?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [BUG] 2.6.23-git18 Kernel oops in sg helpers
  2007-10-24 11:54   ` Andy Whitcroft
  2007-10-24 12:25     ` Jens Axboe
@ 2007-10-24 12:40     ` FUJITA Tomonori
  2007-10-24 16:08       ` Kamalesh Babulal
  1 sibling, 1 reply; 13+ messages in thread
From: FUJITA Tomonori @ 2007-10-24 12:40 UTC (permalink / raw)
  To: apw; +Cc: jens.axboe, kamalesh, linux-kernel

On Wed, 24 Oct 2007 12:54:36 +0100
Andy Whitcroft <apw@shadowen.org> wrote:

> On Tue, Oct 23, 2007 at 08:44:20PM +0200, Jens Axboe wrote:
> > On Tue, Oct 23 2007, Kamalesh Babulal wrote:
> > > Hi,
> > > 
> > > Kernel oops is triggered while running fsx-linux test, followed by cpu softlock
> > > over the AMD box
> > > 
> > > Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP: 
> > >  [<ffffffff8021f2f6>] gart_map_sg+0x26c/0x406
> > > PGD 10185b067 PUD 10075b067 PMD 0 
> > > Oops: 0002 [1] SMP 
> > > CPU 3 
> > > Modules linked in:
> > > Pid: 18676, comm: fsx-linux Not tainted 2.6.23-git18-autokern1 #1
> > > RIP: 0010:[<ffffffff8021f2f6>]  [<ffffffff8021f2f6>] gart_map_sg+0x26c/0x406
> > > RSP: 0000:ffff810181edf948  EFLAGS: 00010002
> > 
> > Can you check where gart_map_sg+0x26c is at? Make sure you have
> > CONFIG_DEBUG_INFO defined, then do:
> > 
> > $ gdb vmlinux
> > $ l *gart_map_sg+0x26c
> 
> Ok, this problem still seems to be about in 2.6.24-rc1.  Here is the gdb
> output from that version, the panic (also below) seems the same:
> 
> (gdb) l *gart_map_sg+0x26c
> 0xffffffff8022011e is in gart_map_sg (arch/x86/kernel/pci-gart_64.c:433).
> 428                     goto error;
> 429             out++;
> 430             flush_gart();
> 431             if (out < nents) {
> 432                     sgmap = sg_next(sgmap);
> 433                     sgmap->dma_length = 0;
> 434             }
> 435             return out;
> 436
> 437     error:
> 
> So it seems sg_next has returned 0.

Have you tried this?

http://marc.info/?l=linux-kernel&m=119317981406073&w=2


* Re: [BUG] 2.6.23-git18 Kernel oops in sg helpers
  2007-10-24 12:40     ` FUJITA Tomonori
@ 2007-10-24 16:08       ` Kamalesh Babulal
  2007-10-24 18:06         ` Jens Axboe
  2007-10-24 22:09         ` FUJITA Tomonori
  0 siblings, 2 replies; 13+ messages in thread
From: Kamalesh Babulal @ 2007-10-24 16:08 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: apw, jens.axboe, linux-kernel

FUJITA Tomonori wrote:
> On Wed, 24 Oct 2007 12:54:36 +0100
> Andy Whitcroft <apw@shadowen.org> wrote:
> 
>> On Tue, Oct 23, 2007 at 08:44:20PM +0200, Jens Axboe wrote:
>>> On Tue, Oct 23 2007, Kamalesh Babulal wrote:
>>>> Hi,
>>>>
>>>> Kernel oops is triggered while running fsx-linux test, followed by cpu softlock
>>>> over the AMD box
>>>>
>>>> Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP: 
>>>>  [<ffffffff8021f2f6>] gart_map_sg+0x26c/0x406
>>>> PGD 10185b067 PUD 10075b067 PMD 0 
>>>> Oops: 0002 [1] SMP 
>>>> CPU 3 
>>>> Modules linked in:
>>>> Pid: 18676, comm: fsx-linux Not tainted 2.6.23-git18-autokern1 #1
>>>> RIP: 0010:[<ffffffff8021f2f6>]  [<ffffffff8021f2f6>] gart_map_sg+0x26c/0x406
>>>> RSP: 0000:ffff810181edf948  EFLAGS: 00010002
>>> Can you check where gart_map_sg+0x26c is at? Make sure you have
>>> CONFIG_DEBUG_INFO defined, then do:
>>>
>>> $ gdb vmlinux
>>> $ l *gart_map_sg+0x26c
>> Ok, this problem still seems to be about in 2.6.24-rc1.  Here is the gdb
>> output from that version, the panic (also below) seems the same:
>>
>> (gdb) l *gart_map_sg+0x26c
>> 0xffffffff8022011e is in gart_map_sg (arch/x86/kernel/pci-gart_64.c:433).
>> 428                     goto error;
>> 429             out++;
>> 430             flush_gart();
>> 431             if (out < nents) {
>> 432                     sgmap = sg_next(sgmap);
>> 433                     sgmap->dma_length = 0;
>> 434             }
>> 435             return out;
>> 436
>> 437     error:
>>
>> So it seems sg_next has returned 0.
> 
> Have you tried this?
> 
> http://marc.info/?l=linux-kernel&m=119317981406073&w=2
> -
Hi,
Thanks, this patch solves the kernel oops.
-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.


* Re: [BUG] 2.6.23-git18 Kernel oops in sg helpers
  2007-10-24 16:08       ` Kamalesh Babulal
@ 2007-10-24 18:06         ` Jens Axboe
  2007-10-24 22:09         ` FUJITA Tomonori
  1 sibling, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2007-10-24 18:06 UTC (permalink / raw)
  To: Kamalesh Babulal; +Cc: FUJITA Tomonori, apw, linux-kernel

On Wed, Oct 24 2007, Kamalesh Babulal wrote:
> FUJITA Tomonori wrote:
> > On Wed, 24 Oct 2007 12:54:36 +0100
> > Andy Whitcroft <apw@shadowen.org> wrote:
> > 
> >> On Tue, Oct 23, 2007 at 08:44:20PM +0200, Jens Axboe wrote:
> >>> On Tue, Oct 23 2007, Kamalesh Babulal wrote:
> >>>> Hi,
> >>>>
> >>>> Kernel oops is triggered while running fsx-linux test, followed by cpu softlock
> >>>> over the AMD box
> >>>>
> >>>> Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP: 
> >>>>  [<ffffffff8021f2f6>] gart_map_sg+0x26c/0x406
> >>>> PGD 10185b067 PUD 10075b067 PMD 0 
> >>>> Oops: 0002 [1] SMP 
> >>>> CPU 3 
> >>>> Modules linked in:
> >>>> Pid: 18676, comm: fsx-linux Not tainted 2.6.23-git18-autokern1 #1
> >>>> RIP: 0010:[<ffffffff8021f2f6>]  [<ffffffff8021f2f6>] gart_map_sg+0x26c/0x406
> >>>> RSP: 0000:ffff810181edf948  EFLAGS: 00010002
> >>> Can you check where gart_map_sg+0x26c is at? Make sure you have
> >>> CONFIG_DEBUG_INFO defined, then do:
> >>>
> >>> $ gdb vmlinux
> >>> $ l *gart_map_sg+0x26c
> >> Ok, this problem still seems to be about in 2.6.24-rc1.  Here is the gdb
> >> output from that version, the panic (also below) seems the same:
> >>
> >> (gdb) l *gart_map_sg+0x26c
> >> 0xffffffff8022011e is in gart_map_sg (arch/x86/kernel/pci-gart_64.c:433).
> >> 428                     goto error;
> >> 429             out++;
> >> 430             flush_gart();
> >> 431             if (out < nents) {
> >> 432                     sgmap = sg_next(sgmap);
> >> 433                     sgmap->dma_length = 0;
> >> 434             }
> >> 435             return out;
> >> 436
> >> 437     error:
> >>
> >> So it seems sg_next has returned 0.
> > 
> > Have you tried this?
> > 
> > http://marc.info/?l=linux-kernel&m=119317981406073&w=2
> > -
> Hi,
> Thanks, this patch solves the kernel oops.

Tomo, please do write the proper changelog so we can get this upstream.

-- 
Jens Axboe



* Re: [BUG] 2.6.23-git18 Kernel oops in sg helpers
  2007-10-24 16:08       ` Kamalesh Babulal
  2007-10-24 18:06         ` Jens Axboe
@ 2007-10-24 22:09         ` FUJITA Tomonori
  2007-10-25  5:34           ` Jens Axboe
  1 sibling, 1 reply; 13+ messages in thread
From: FUJITA Tomonori @ 2007-10-24 22:09 UTC (permalink / raw)
  To: kamalesh, jens.axboe; +Cc: fujita.tomonori, apw, linux-kernel, tomof

On Wed, 24 Oct 2007 21:38:30 +0530
Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:

> FUJITA Tomonori wrote:
> > On Wed, 24 Oct 2007 12:54:36 +0100
> > Andy Whitcroft <apw@shadowen.org> wrote:
> > 
> >> On Tue, Oct 23, 2007 at 08:44:20PM +0200, Jens Axboe wrote:
> >>> On Tue, Oct 23 2007, Kamalesh Babulal wrote:
> >>>> Hi,
> >>>>
> >>>> Kernel oops is triggered while running fsx-linux test, followed by cpu softlock
> >>>> over the AMD box
> >>>>
> >>>> Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP: 
> >>>>  [<ffffffff8021f2f6>] gart_map_sg+0x26c/0x406
> >>>> PGD 10185b067 PUD 10075b067 PMD 0 
> >>>> Oops: 0002 [1] SMP 
> >>>> CPU 3 
> >>>> Modules linked in:
> >>>> Pid: 18676, comm: fsx-linux Not tainted 2.6.23-git18-autokern1 #1
> >>>> RIP: 0010:[<ffffffff8021f2f6>]  [<ffffffff8021f2f6>] gart_map_sg+0x26c/0x406
> >>>> RSP: 0000:ffff810181edf948  EFLAGS: 00010002
> >>> Can you check where gart_map_sg+0x26c is at? Make sure you have
> >>> CONFIG_DEBUG_INFO defined, then do:
> >>>
> >>> $ gdb vmlinux
> >>> $ l *gart_map_sg+0x26c
> >> Ok, this problem still seems to be about in 2.6.24-rc1.  Here is the gdb
> >> output from that version, the panic (also below) seems the same:
> >>
> >> (gdb) l *gart_map_sg+0x26c
> >> 0xffffffff8022011e is in gart_map_sg (arch/x86/kernel/pci-gart_64.c:433).
> >> 428                     goto error;
> >> 429             out++;
> >> 430             flush_gart();
> >> 431             if (out < nents) {
> >> 432                     sgmap = sg_next(sgmap);
> >> 433                     sgmap->dma_length = 0;
> >> 434             }
> >> 435             return out;
> >> 436
> >> 437     error:
> >>
> >> So it seems sg_next has returned 0.
> > 
> > Have you tried this?
> > 
> > http://marc.info/?l=linux-kernel&m=119317981406073&w=2
> > -
> Hi,
> Thanks, this patch solves the kernel oops.

Thanks for testing!

Jens, here's the proper changelog.

-
From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Subject: [PATCH] x86: pci-gart fix

map_sg could copy the last sg element to another position (if merging
some elements). It breaks sg chaining. This copies only
dma_address/length instead of the whole sg element.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
---
 arch/x86/kernel/pci-gart_64.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/pci-gart_64.c b/arch/x86/kernel/pci-gart_64.c
index c56e9ee..ae7e016 100644
--- a/arch/x86/kernel/pci-gart_64.c
+++ b/arch/x86/kernel/pci-gart_64.c
@@ -338,7 +338,6 @@ static int __dma_map_cont(struct scatterlist *start, int nelems,
 		
 		BUG_ON(s != start && s->offset);
 		if (s == start) {
-			*sout = *s; 
 			sout->dma_address = iommu_bus_base;
 			sout->dma_address += iommu_page*PAGE_SIZE + s->offset;
 			sout->dma_length = s->length;
@@ -365,7 +364,7 @@ static inline int dma_map_cont(struct scatterlist *start, int nelems,
 {
 	if (!need) {
 		BUG_ON(nelems != 1);
-		*sout = *start;
+		sout->dma_address = start->dma_address;
 		sout->dma_length = start->length;
 		return 0;
 	}
-- 
1.5.2.4



* Re: [BUG] 2.6.23-git18 Kernel oops in sg helpers
  2007-10-24 22:09         ` FUJITA Tomonori
@ 2007-10-25  5:34           ` Jens Axboe
  0 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2007-10-25  5:34 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: kamalesh, fujita.tomonori, apw, linux-kernel

On Thu, Oct 25 2007, FUJITA Tomonori wrote:
> On Wed, 24 Oct 2007 21:38:30 +0530
> Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
> 
> > FUJITA Tomonori wrote:
> > > On Wed, 24 Oct 2007 12:54:36 +0100
> > > Andy Whitcroft <apw@shadowen.org> wrote:
> > > 
> > >> On Tue, Oct 23, 2007 at 08:44:20PM +0200, Jens Axboe wrote:
> > >>> On Tue, Oct 23 2007, Kamalesh Babulal wrote:
> > >>>> Hi,
> > >>>>
> > >>>> Kernel oops is triggered while running fsx-linux test, followed by cpu softlock
> > >>>> over the AMD box
> > >>>>
> > >>>> Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP: 
> > >>>>  [<ffffffff8021f2f6>] gart_map_sg+0x26c/0x406
> > >>>> PGD 10185b067 PUD 10075b067 PMD 0 
> > >>>> Oops: 0002 [1] SMP 
> > >>>> CPU 3 
> > >>>> Modules linked in:
> > >>>> Pid: 18676, comm: fsx-linux Not tainted 2.6.23-git18-autokern1 #1
> > >>>> RIP: 0010:[<ffffffff8021f2f6>]  [<ffffffff8021f2f6>] gart_map_sg+0x26c/0x406
> > >>>> RSP: 0000:ffff810181edf948  EFLAGS: 00010002
> > >>> Can you check where gart_map_sg+0x26c is at? Make sure you have
> > >>> CONFIG_DEBUG_INFO defined, then do:
> > >>>
> > >>> $ gdb vmlinux
> > >>> $ l *gart_map_sg+0x26c
> > >> Ok, this problem still seems to be about in 2.6.24-rc1.  Here is the gdb
> > >> output from that version, the panic (also below) seems the same:
> > >>
> > >> (gdb) l *gart_map_sg+0x26c
> > >> 0xffffffff8022011e is in gart_map_sg (arch/x86/kernel/pci-gart_64.c:433).
> > >> 428                     goto error;
> > >> 429             out++;
> > >> 430             flush_gart();
> > >> 431             if (out < nents) {
> > >> 432                     sgmap = sg_next(sgmap);
> > >> 433                     sgmap->dma_length = 0;
> > >> 434             }
> > >> 435             return out;
> > >> 436
> > >> 437     error:
> > >>
> > >> So it seems sg_next has returned 0.
> > > 
> > > Have you tried this?
> > > 
> > > http://marc.info/?l=linux-kernel&m=119317981406073&w=2
> > > -
> > Hi,
> > Thanks, this patch solves the kernel oops.
> 
> Thanks for testing!
> 
> Jens, here's the proper changelog.

Thanks, applied!

-- 
Jens Axboe



* Re: [BUG] 2.6.23-git18 Kernel oops in sg helpers
  2007-10-24  8:50     ` Benny Halevy
@ 2007-10-25  8:53       ` Benny Halevy
  0 siblings, 0 replies; 13+ messages in thread
From: Benny Halevy @ 2007-10-25  8:53 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: Jens Axboe, kamalesh, linux-kernel, apw, tomof

On Oct. 24, 2007, 10:50 +0200, Benny Halevy <bhalevy@panasas.com> wrote:
> On Oct. 24, 2007, 10:32 +0200, Jens Axboe <jens.axboe@oracle.com> wrote:
>> On Wed, Oct 24 2007, FUJITA Tomonori wrote:
>>> On Tue, 23 Oct 2007 20:49:40 +0530
>>> Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Kernel oops is triggered while running fsx-linux test, followed by cpu softlock
>>>> over the AMD box
>>>>
>>>> Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP: 
>>>>  [<ffffffff8021f2f6>] gart_map_sg+0x26c/0x406
>>>> PGD 10185b067 PUD 10075b067 PMD 0 
>>> Does this work?
>>>
>>>
>>> diff --git a/arch/x86/kernel/pci-gart_64.c b/arch/x86/kernel/pci-gart_64.c
>>> index c56e9ee..ae7e016 100644
>>> --- a/arch/x86/kernel/pci-gart_64.c
>>> +++ b/arch/x86/kernel/pci-gart_64.c
>>> @@ -338,7 +338,6 @@ static int __dma_map_cont(struct scatterlist *start, int nelems,
>>>  		
>>>  		BUG_ON(s != start && s->offset);
>>>  		if (s == start) {
>>> -			*sout = *s; 
>>>  			sout->dma_address = iommu_bus_base;
>>>  			sout->dma_address += iommu_page*PAGE_SIZE + s->offset;
>>>  			sout->dma_length = s->length;
>>> @@ -365,7 +364,7 @@ static inline int dma_map_cont(struct scatterlist *start, int nelems,
>>>  {
>>>  	if (!need) {
>>>  		BUG_ON(nelems != 1);
>>> -		*sout = *start;
>>> +		sout->dma_address = start->dma_address;
> 
> I don't see this could fix anything since "s" above and "start" here are still
> dereferenced.  Also, this makes sout->dma_address inconsistent with sout->page_link
> and with the end marker.

OK, it took me a day to figure out why the fix works :)
The end-of-list marker was copied into sout, and later, at line 432,
sg_next(sgmap) returned NULL because sgmap had become the last entry in the
list (which is, strangely, correct for the DMA-mapped vector).

431:	if (out < nents) {
432:		sgmap = sg_next(sgmap);
433:		sgmap->dma_length = 0;
434:	}
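
A simplified userspace model of that sequence, as a sketch only: struct mock_sg, MOCK_SG_END, mock_sg_next() and next_after_mapping() are hypothetical names, the end marker is assumed to be bit 1 of page_link, and chain handling is left out. It maps the last entry of a three-entry list into slot 1, as when entries 0 and 1 have been merged, once by copying the whole element and once by copying only the DMA fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed flag: bit 1 of page_link marks the last entry of the list. */
#define MOCK_SG_END 0x02UL

/* Userspace stand-in for struct scatterlist (hypothetical, simplified). */
struct mock_sg {
	unsigned long page_link;   /* page pointer plus flag bits */
	unsigned int length;
	uint64_t dma_address;
	unsigned int dma_length;
};

/* Simplified sg_next(): NULL at the end marker, no chain handling. */
static struct mock_sg *mock_sg_next(struct mock_sg *sg)
{
	return (sg->page_link & MOCK_SG_END) ? NULL : sg + 1;
}

/*
 * Fill a three-entry list, then map the last entry into slot 1, as happens
 * when entries 0 and 1 were merged into slot 0. Returns what the simplified
 * sg_next() yields for slot 1 afterwards.
 */
static struct mock_sg *next_after_mapping(struct mock_sg *list, int copy_whole)
{
	struct mock_sg *sout;
	const struct mock_sg *start;
	int i;

	assert(list != NULL);
	sout = &list[1];
	start = &list[2];

	for (i = 0; i < 3; i++) {
		list[i].page_link = 0x1000UL * (unsigned long)(i + 1);
		list[i].length = 4096;
		list[i].dma_address = 0x100000ULL * (uint64_t)(i + 1);
		list[i].dma_length = 0;
	}
	list[2].page_link |= MOCK_SG_END;

	if (copy_whole)
		*sout = *start;  /* old code: the end marker comes along */
	else
		sout->dma_address = start->dma_address;  /* patched: DMA fields only */
	sout->dma_length = start->length;

	return mock_sg_next(sout);
}
```

With copy_whole set, the function returns NULL, which corresponds to the store at line 433 faulting. With the patched behaviour it returns the third entry, where dma_length = 0 can be stored safely.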

Alas, the dma mapping convention apparently requires dma_length == 0
as a terminator if the "compressed" list for dma mapping is shorter than
the sg list.

Although this change does not keep each sg->dma_address in sync with each
sg->page_link, previously there was nothing to keep sg->length in sync with
sg->dma_length either. So I actually think that keeping the DMA mapping and
the page mappings orthogonal and independent may be even better, since the
original sg list can still be reused safely even after DMA mapping.

> 
> Benny
> 
>>>  		sout->dma_length = start->length;
>>>  		return 0;
>>>  	}
>>> -- 
>>> 1.5.2.4
>> Care to write up a proper changelog?
>>
> 
> 



Thread overview: 13+ messages
2007-10-23 15:19 [BUG] 2.6.23-git18 Kernel oops in sg helpers Kamalesh Babulal
2007-10-23 18:44 ` Jens Axboe
2007-10-24 11:54   ` Andy Whitcroft
2007-10-24 12:25     ` Jens Axboe
2007-10-24 12:40     ` FUJITA Tomonori
2007-10-24 16:08       ` Kamalesh Babulal
2007-10-24 18:06         ` Jens Axboe
2007-10-24 22:09         ` FUJITA Tomonori
2007-10-25  5:34           ` Jens Axboe
2007-10-23 22:42 ` FUJITA Tomonori
2007-10-24  8:32   ` Jens Axboe
2007-10-24  8:50     ` Benny Halevy
2007-10-25  8:53       ` Benny Halevy
