LKML Archive on lore.kernel.org
* Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
@ 2007-01-20 12:23 Justin Piszcz
2007-01-20 12:46 ` Justin Piszcz
2007-01-22 21:01 ` Chuck Ebbert
0 siblings, 2 replies; 15+ messages in thread
From: Justin Piszcz @ 2007-01-20 12:23 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-raid, xfs, Neil Brown
[-- Attachment #1: Type: TEXT/PLAIN, Size: 3336 bytes --]
My .config is attached. Please let me know if any other information is
needed, and please CC me, as I am not on the list (lkml). Thanks!
Running kernel 2.6.19.2 on an MD RAID5 volume. Copying files over Samba to
the RAID5 array, which runs XFS.
Any idea what happened here?
[473795.214705] BUG: unable to handle kernel paging request at virtual address fffb92b0
[473795.214715] printing eip:
[473795.214718] c0358b14
[473795.214721] *pde = 00003067
[473795.214723] *pte = 00000000
[473795.214726] Oops: 0000 [#1]
[473795.214729] PREEMPT SMP
[473795.214736] CPU: 0
[473795.214737] EIP: 0060:[<c0358b14>] Not tainted VLI
[473795.214738] EFLAGS: 00010286 (2.6.19.2 #1)
[473795.214746] EIP is at copy_data+0x6c/0x179
[473795.214750] eax: 00000000 ebx: 00001000 ecx: 00000354 edx: fffb9000
[473795.214754] esi: fffb92b0 edi: da86c2b0 ebp: 00001000 esp: f7927dc4
[473795.214757] ds: 007b es: 007b ss: 0068
[473795.214761] Process md4_raid5 (pid: 1305, ti=f7926000 task=f7ea9030 task.ti=f7926000)
[473795.214765] Stack: c1ba7c40 00000003 f5538c80 00000001 da86c000 00000009 00000000 0000006c
[473795.214790] 00001000 da8536a8 aa6fee90 f5538c80 00000190 c0358d00 aa6fee88 0000ffff
[473795.214863] d7c5794c 00000001 da853488 f6fbec70 f6fbebc0 00000001 00000005 00000001
[473795.214876] Call Trace:
[473795.214880] [<c0358d00>] compute_parity5+0xdf/0x497
[473795.214887] [<c035b0dd>] handle_stripe+0x930/0x2986
[473795.214892] [<c01146b9>] find_busiest_group+0x124/0x4fd
[473795.214898] [<c03580e0>] release_stripe+0x21/0x2e
[473795.214902] [<c035d233>] raid5d+0x100/0x161
[473795.214907] [<c036b03c>] md_thread+0x40/0x103
[473795.214912] [<c012dbbe>] autoremove_wake_function+0x0/0x4b
[473795.214917] [<c036affc>] md_thread+0x0/0x103
[473795.214922] [<c012da1a>] kthread+0xfc/0x100
[473795.214926] [<c012d91e>] kthread+0x0/0x100
[473795.214930] [<c0103b4b>] kernel_thread_helper+0x7/0x1c
[473795.214935] =======================
[473795.214938] Code: 14 39 d1 0f 8d 10 01 00 00 89 c8 01 c0 01 c8 01 c0 01 c0 89 44 24 1c eb 51 89 d9 c1 e9 02 8b 7c 24 10 01 f7 8b 44 24 18 8d 34 02 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 c7 44 24 04 03 00 00 00 89 14
[473795.215017] EIP: [<c0358b14>] copy_data+0x6c/0x179 SS:ESP 0068:f7927dc4
[473795.215024] <6>note: md4_raid5[1305] exited with preempt_count 2
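For readers decoding the trace: the `Oops: 0000` value is the x86 page-fault error code, printed in hex. The sketch below splits it into its three low flag bits. The type and function names (`pf_error`, `decode_pf_error`, `check_first_oops`) are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative type, not a kernel structure. */
struct pf_error {
    bool protection_fault; /* bit 0: set = protection violation, clear = page not present */
    bool write_access;     /* bit 1: set = write, clear = read */
    bool user_mode;        /* bit 2: set = fault raised in user mode */
};

/* Split an x86 page-fault error code into its three low flag bits. */
struct pf_error decode_pf_error(unsigned int code)
{
    struct pf_error e;

    e.protection_fault = (code & 0x1u) != 0;
    e.write_access     = (code & 0x2u) != 0;
    e.user_mode        = (code & 0x4u) != 0;
    return e;
}

/* The error code printed in the oops above is 0000. */
void check_first_oops(void)
{
    struct pf_error e = decode_pf_error(0x0000u);

    assert(!e.protection_fault); /* page not present, matching *pte = 00000000 */
    assert(!e.write_access);     /* the faulting access was a read */
    assert(!e.user_mode);        /* the fault was raised in kernel mode */
}
```

Read this way, the first oops is a kernel-mode read from a page that was not mapped at the time of the access.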
# mdadm -D /dev/md4
/dev/md4:
Version : 01.00.03
Creation Time : Wed Jan 10 15:58:52 2007
Raid Level : raid5
Array Size : 1562834432 (1490.44 GiB 1600.34 GB)
Device Size : 781417216 (372.61 GiB 400.09 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 4
Persistence : Superblock is persistent
Update Time : Sat Jan 20 07:15:01 2007
State : active
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 128K
Name : 4
UUID : 7f453e18:893e4dd9:6e810372:4c724f49
Events : 33
Number Major Minor RaidDevice State
0 8 33 0 active sync /dev/sdc1
1 8 81 1 active sync /dev/sdf1
2 8 113 2 active sync /dev/sdh1
3 8 65 3 active sync /dev/sde1
5 8 49 4 active sync /dev/sdd1
[-- Attachment #2: Type: APPLICATION/octet-stream, Size: 7224 bytes --]
* Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
2007-01-20 12:23 Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5) Justin Piszcz
@ 2007-01-20 12:46 ` Justin Piszcz
2007-01-22 21:01 ` Chuck Ebbert
1 sibling, 0 replies; 15+ messages in thread
From: Justin Piszcz @ 2007-01-20 12:46 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-raid, xfs, Neil Brown
On Sat, 20 Jan 2007, Justin Piszcz wrote:
> My .config is attached, please let me know if any other information is
> needed and please CC (lkml) as I am not on the list, thanks!
>
> Running Kernel 2.6.19.2 on a MD RAID5 volume. Copying files over Samba to
> the RAID5 running XFS.
>
> Any idea what happened here?
>
It happened again under heavy read I/O when I was running md5sum -c on
some of my files.
[ 551.942958] BUG: unable to handle kernel paging request at virtual address fffb97b0
[ 551.942970] printing eip:
[ 551.942972] c0358bd8
[ 551.942974] *pde = 00003067
[ 551.942976] *pte = 00000000
[ 551.942980] Oops: 0002 [#1]
[ 551.942982] PREEMPT SMP
[ 551.942989] CPU: 0
[ 551.942990] EIP: 0060:[<c0358bd8>] Not tainted VLI
[ 551.942991] EFLAGS: 00010286 (2.6.19.2 #1)
[ 551.942999] EIP is at copy_data+0x130/0x179
[ 551.943001] eax: 00000000 ebx: 00001000 ecx: 00000214 edx: fffb9000
[ 551.943005] esi: dd2007b0 edi: fffb97b0 ebp: 00001000 esp: f76ffe1c
[ 551.943007] ds: 007b es: 007b ss: 0068
[ 551.943011] Process md4_raid5 (pid: 1309, ti=f76fe000 task=f7081560 task.ti=f76fe000)
[ 551.943013] Stack: c1d880c0 00000003 cd2f0540 00000000 dd200000 0000000e 00000000 000000a8
[ 551.943027] 00001000 cd2f0540 dd1f1adc f6435c48 dd1f1ad8 c035a977 34f3db20 c027be16
[ 551.943043] c0553328 00000002 00000002 c01146b9 f6435c48 c0553328 f6435c48 dd1f193c
[ 551.943056] Call Trace:
[ 551.943059] [<c035a977>] handle_stripe+0x1ca/0x2986
[ 551.943065] [<c027be16>] __next_cpu+0x22/0x33
[ 551.943072] [<c01146b9>] find_busiest_group+0x124/0x4fd
[ 551.943136] [<c01140af>] __wake_up+0x32/0x43
[ 551.943140] [<c03580e0>] release_stripe+0x21/0x2e
[ 551.943145] [<c035d233>] raid5d+0x100/0x161
[ 551.943150] [<c036b03c>] md_thread+0x40/0x103
[ 551.943155] [<c012dbbe>] autoremove_wake_function+0x0/0x4b
[ 551.943160] [<c036affc>] md_thread+0x0/0x103
[ 551.943165] [<c012da1a>] kthread+0xfc/0x100
[ 551.943169] [<c012d91e>] kthread+0x0/0x100
[ 551.943173] [<c0103b4b>] kernel_thread_helper+0x7/0x1c
[ 551.943178] =======================
[ 551.943180] Code: 8b 4c 24 08 8b 41 2c 8b 4c 24 1c 03 54 08 08 8b 44 24 0c 85 c0 0f 85 3a ff ff ff 89 d9 c1 e9 02 8b 44 24 18 8d 3c 02 03 74 24 10 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 e9 37 ff ff ff 01 ee 89 74 24
[ 551.943254] EIP: [<c0358bd8>] copy_data+0x130/0x179 SS:ESP 0068:f76ffe1c
[ 551.943262] <6>note: md4_raid5[1309] exited with preempt_count 3
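The register dumps in both traces allow a quick sanity check of how far the copy had progressed when it faulted. The sketch below is an inference, not something stated in the logs: it assumes `ebx` (0x1000) is the copy length in bytes, `edx` (fffb9000) is the base of the mapped page, and `ecx` is the dword count still pending for the `rep movsl` marked `<f3> a5` in the Code line (the preceding bytes `89 d9 c1 e9 02` load `ecx` with `ebx` shifted right by 2). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes already copied by an interrupted `rep movsl`, given the total
 * length in bytes and the dword count still left in %ecx.
 */
uint32_t bytes_copied(uint32_t len_bytes, uint32_t ecx_remaining)
{
    uint32_t total_dwords = len_bytes / 4;

    assert(ecx_remaining <= total_dwords);
    return (total_dwords - ecx_remaining) * 4;
}

/* Address the copy had reached inside the mapping that starts at base. */
uint32_t copy_position(uint32_t base, uint32_t len_bytes, uint32_t ecx_remaining)
{
    return base + bytes_copied(len_bytes, ecx_remaining);
}

/* Values taken from the two register dumps in this thread. */
void check_both_oopses(void)
{
    /* First oops: ebx=0x1000, ecx=0x354, edx=0xfffb9000, fault at esi=0xfffb92b0. */
    assert(copy_position(0xfffb9000u, 0x1000u, 0x354u) == 0xfffb92b0u);
    /* Second oops: ebx=0x1000, ecx=0x214, edx=0xfffb9000, fault at edi=0xfffb97b0. */
    assert(copy_position(0xfffb9000u, 0x1000u, 0x214u) == 0xfffb97b0u);
}
```

Under those assumptions, both copies faulted partway through the page (0x2b0 and 0x7b0 bytes in), which suggests the mapping was valid when the copy started and went away while it was running.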
I will run resync/check on this array and then see if that fixes it.
Justin.
* Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
2007-01-20 12:23 Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5) Justin Piszcz
2007-01-20 12:46 ` Justin Piszcz
@ 2007-01-22 21:01 ` Chuck Ebbert
2007-01-22 21:59 ` Neil Brown
2007-01-24 23:37 ` Justin Piszcz
1 sibling, 2 replies; 15+ messages in thread
From: Chuck Ebbert @ 2007-01-22 21:01 UTC (permalink / raw)
To: Justin Piszcz; +Cc: linux-kernel, linux-raid, xfs, Neil Brown
Justin Piszcz wrote:
> My .config is attached, please let me know if any other information is
> needed and please CC (lkml) as I am not on the list, thanks!
>
> Running Kernel 2.6.19.2 on a MD RAID5 volume. Copying files over Samba to
> the RAID5 running XFS.
>
> Any idea what happened here?
>
> [473795.214705] BUG: unable to handle kernel paging request at virtual address fffb92b0
> [473795.214715] printing eip:
> [473795.214718] c0358b14
> [473795.214721] *pde = 00003067
> [473795.214723] *pte = 00000000
> [473795.214726] Oops: 0000 [#1]
> [473795.214729] PREEMPT SMP
> [473795.214736] CPU: 0
> [473795.214737] EIP: 0060:[<c0358b14>] Not tainted VLI
> [473795.214738] EFLAGS: 00010286 (2.6.19.2 #1)
> [473795.214746] EIP is at copy_data+0x6c/0x179
> [473795.214750] eax: 00000000 ebx: 00001000 ecx: 00000354 edx: fffb9000
> [473795.214754] esi: fffb92b0 edi: da86c2b0 ebp: 00001000 esp: f7927dc4
> [473795.214757] ds: 007b es: 007b ss: 0068
> [473795.214761] Process md4_raid5 (pid: 1305, ti=f7926000 task=f7ea9030 task.ti=f7926000)
> [473795.214765] Stack: c1ba7c40 00000003 f5538c80 00000001 da86c000 00000009 00000000 0000006c
> [473795.214790] 00001000 da8536a8 aa6fee90 f5538c80 00000190 c0358d00 aa6fee88 0000ffff
> [473795.214863] d7c5794c 00000001 da853488 f6fbec70 f6fbebc0 00000001 00000005 00000001
> [473795.214876] Call Trace:
> [473795.214880] [<c0358d00>] compute_parity5+0xdf/0x497
> [473795.214887] [<c035b0dd>] handle_stripe+0x930/0x2986
> [473795.214892] [<c01146b9>] find_busiest_group+0x124/0x4fd
> [473795.214898] [<c03580e0>] release_stripe+0x21/0x2e
> [473795.214902] [<c035d233>] raid5d+0x100/0x161
> [473795.214907] [<c036b03c>] md_thread+0x40/0x103
> [473795.214912] [<c012dbbe>] autoremove_wake_function+0x0/0x4b
> [473795.214917] [<c036affc>] md_thread+0x0/0x103
> [473795.214922] [<c012da1a>] kthread+0xfc/0x100
> [473795.214926] [<c012d91e>] kthread+0x0/0x100
> [473795.214930] [<c0103b4b>] kernel_thread_helper+0x7/0x1c
> [473795.214935] =======================
> [473795.214938] Code: 14 39 d1 0f 8d 10 01 00 00 89 c8 01 c0 01 c8 01 c0 01 c0 89 44 24 1c eb 51 89 d9 c1 e9 02 8b 7c 24 10 01 f7 8b 44 24 18 8d 34 02 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 c7 44 24 04 03 00 00 00 89 14
> [473795.215017] EIP: [<c0358b14>] copy_data+0x6c/0x179 SS:ESP 0068:f7927dc4
>
Without digging too deeply, I'd say you've hit the same bug Sami Farin and
others have reported starting with 2.6.19: pages mapped with kmap_atomic()
become unmapped during memcpy() or similar operations. Try disabling
preempt -- that seems to be the common factor.
* Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
2007-01-22 21:01 ` Chuck Ebbert
@ 2007-01-22 21:59 ` Neil Brown
2007-01-23 1:44 ` Dan Williams
2007-01-23 10:56 ` Justin Piszcz
2007-01-24 23:37 ` Justin Piszcz
1 sibling, 2 replies; 15+ messages in thread
From: Neil Brown @ 2007-01-22 21:59 UTC (permalink / raw)
To: Chuck Ebbert; +Cc: Justin Piszcz, linux-kernel, linux-raid, xfs
On Monday January 22, cebbert@redhat.com wrote:
> Justin Piszcz wrote:
> > My .config is attached, please let me know if any other information is
> > needed and please CC (lkml) as I am not on the list, thanks!
> >
> > Running Kernel 2.6.19.2 on a MD RAID5 volume. Copying files over Samba to
> > the RAID5 running XFS.
> >
> > Any idea what happened here?
....
> >
> Without digging too deeply, I'd say you've hit the same bug Sami Farin and
> others have reported starting with 2.6.19: pages mapped with kmap_atomic()
> become unmapped during memcpy() or similar operations. Try disabling
> preempt -- that seems to be the common factor.
That is exactly the conclusion I had just come to (a kmap_atomic page
must be being unmapped during memcpy). I wasn't aware that others had
reported it - thanks for that.
Turning off CONFIG_PREEMPT certainly seems like a good idea.
NeilBrown
* Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
2007-01-22 21:59 ` Neil Brown
@ 2007-01-23 1:44 ` Dan Williams
2007-01-23 2:06 ` Neil Brown
2007-01-23 10:56 ` Justin Piszcz
1 sibling, 1 reply; 15+ messages in thread
From: Dan Williams @ 2007-01-23 1:44 UTC (permalink / raw)
To: Neil Brown; +Cc: Chuck Ebbert, Justin Piszcz, linux-kernel, linux-raid, xfs
On 1/22/07, Neil Brown <neilb@suse.de> wrote:
> On Monday January 22, cebbert@redhat.com wrote:
> > Justin Piszcz wrote:
> > > My .config is attached, please let me know if any other information is
> > > needed and please CC (lkml) as I am not on the list, thanks!
> > >
> > > Running Kernel 2.6.19.2 on a MD RAID5 volume. Copying files over Samba to
> > > the RAID5 running XFS.
> > >
> > > Any idea what happened here?
> ....
> > >
> > Without digging too deeply, I'd say you've hit the same bug Sami Farin and
> > others have reported starting with 2.6.19: pages mapped with kmap_atomic()
> > become unmapped during memcpy() or similar operations. Try disabling
> > preempt -- that seems to be the common factor.
>
> That is exactly the conclusion I had just come to (a kmap_atomic page
> must be being unmapped during memcpy). I wasn't aware that others had
> reported it - thanks for that.
>
> Turning off CONFIG_PREEMPT certainly seems like a good idea.
>
Coming from an ARM background I am not yet versed in the inner
workings of kmap_atomic, but if you have time for a question I am
curious as to why spin_lock(&sh->lock) is not sufficient pre-emption
protection for copy_data() in this case?
> NeilBrown
Regards,
Dan
* Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
2007-01-23 1:44 ` Dan Williams
@ 2007-01-23 2:06 ` Neil Brown
0 siblings, 0 replies; 15+ messages in thread
From: Neil Brown @ 2007-01-23 2:06 UTC (permalink / raw)
To: Dan Williams; +Cc: Chuck Ebbert, Justin Piszcz, linux-kernel, linux-raid, xfs
On Monday January 22, dan.j.williams@gmail.com wrote:
> On 1/22/07, Neil Brown <neilb@suse.de> wrote:
> > On Monday January 22, cebbert@redhat.com wrote:
> > > Justin Piszcz wrote:
> > > > My .config is attached, please let me know if any other information is
> > > > needed and please CC (lkml) as I am not on the list, thanks!
> > > >
> > > > Running Kernel 2.6.19.2 on a MD RAID5 volume. Copying files over Samba to
> > > > the RAID5 running XFS.
> > > >
> > > > Any idea what happened here?
> > ....
> > > >
> > > Without digging too deeply, I'd say you've hit the same bug Sami Farin and
> > > others have reported starting with 2.6.19: pages mapped with kmap_atomic()
> > > become unmapped during memcpy() or similar operations. Try disabling
> > > preempt -- that seems to be the common factor.
> >
> > That is exactly the conclusion I had just come to (a kmap_atomic page
> > must be being unmapped during memcpy). I wasn't aware that others had
> > reported it - thanks for that.
> >
> > Turning off CONFIG_PREEMPT certainly seems like a good idea.
> >
> Coming from an ARM background I am not yet versed in the inner
> workings of kmap_atomic, but if you have time for a question I am
> curious as to why spin_lock(&sh->lock) is not sufficient pre-emption
> protection for copy_data() in this case?
>
Presumably there is a bug somewhere.
kmap_atomic itself calls inc_preempt_count, so preemption should
be disabled at least until kunmap_atomic is called.
But apparently it is not. The symptoms point exactly to the page getting
unmapped when it shouldn't be. Until that bug is found and fixed, the
workaround of turning off CONFIG_PREEMPT seems to make sense.
Of course, it would be great if someone who can easily reproduce this
bug could do the 'git bisect' thing to find out where the bug crept
in.
NeilBrown
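The expectation described above can be restated as a toy user-space model. This is only a sketch of the counting argument, not kernel code: every `toy_` name is hypothetical, and the model assumes that both `spin_lock()` and `kmap_atomic()` raise the preempt count on a CONFIG_PREEMPT kernel and that their counterparts lower it again.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the per-task preempt counter; not kernel code. */
static int toy_preempt_count;

void toy_spin_lock(void)   { toy_preempt_count++; }
void toy_kmap_atomic(void) { toy_preempt_count++; }

void toy_kunmap_atomic(void)
{
    assert(toy_preempt_count > 0);
    toy_preempt_count--;
}

void toy_spin_unlock(void)
{
    assert(toy_preempt_count > 0);
    toy_preempt_count--;
}

/* In this model the scheduler may preempt only when the count is zero. */
bool toy_preemptible(void)
{
    return toy_preempt_count == 0;
}

/*
 * Model of the sequence under discussion: take the stripe lock, map the
 * page, copy, unmap, unlock. Returns whether preemption would have been
 * possible at the point where the memcpy runs.
 */
bool toy_copy_is_preemptible(void)
{
    bool preemptible_during_copy;

    toy_spin_lock();
    toy_kmap_atomic();
    preemptible_during_copy = toy_preemptible(); /* memcpy would run here */
    toy_kunmap_atomic();
    toy_spin_unlock();
    return preemptible_during_copy;
}
```

In the model the copy is never preemptible, which is why the observed unmapping points to a bug elsewhere. The `exited with preempt_count 2` note in the first oops is at least consistent with two such increments being outstanding at the time of the fault, though nothing in the thread confirms which calls they were.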
* Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
2007-01-22 21:59 ` Neil Brown
2007-01-23 1:44 ` Dan Williams
@ 2007-01-23 10:56 ` Justin Piszcz
2007-01-23 11:08 ` Michael Tokarev
1 sibling, 1 reply; 15+ messages in thread
From: Justin Piszcz @ 2007-01-23 10:56 UTC (permalink / raw)
To: Neil Brown; +Cc: Chuck Ebbert, linux-kernel, linux-raid, xfs
On Tue, 23 Jan 2007, Neil Brown wrote:
> On Monday January 22, cebbert@redhat.com wrote:
> > Justin Piszcz wrote:
> > > My .config is attached, please let me know if any other information is
> > > needed and please CC (lkml) as I am not on the list, thanks!
> > >
> > > Running Kernel 2.6.19.2 on a MD RAID5 volume. Copying files over Samba to
> > > the RAID5 running XFS.
> > >
> > > Any idea what happened here?
> ....
> > >
> > Without digging too deeply, I'd say you've hit the same bug Sami Farin and
> > others have reported starting with 2.6.19: pages mapped with kmap_atomic()
> > become unmapped during memcpy() or similar operations. Try disabling
> > preempt -- that seems to be the common factor.
>
> That is exactly the conclusion I had just come to (a kmap_atomic page
> must be being unmapped during memcpy). I wasn't aware that others had
> reported it - thanks for that.
>
> Turning off CONFIG_PREEMPT certainly seems like a good idea.
>
> NeilBrown
>
Is this a bug that can or will be fixed or should I disable pre-emption on
critical and/or server machines?
Justin.
* Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
2007-01-23 10:56 ` Justin Piszcz
@ 2007-01-23 11:08 ` Michael Tokarev
2007-01-23 11:59 ` Justin Piszcz
0 siblings, 1 reply; 15+ messages in thread
From: Michael Tokarev @ 2007-01-23 11:08 UTC (permalink / raw)
To: Justin Piszcz; +Cc: linux-kernel, linux-raid
Justin Piszcz wrote:
[]
> Is this a bug that can or will be fixed or should I disable pre-emption on
> critical and/or server machines?
Disabling pre-emption on critical and/or server machines seems to be a good
idea in the first place. IMHO anyway.. ;)
/mjt
* Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
2007-01-23 11:08 ` Michael Tokarev
@ 2007-01-23 11:59 ` Justin Piszcz
2007-01-23 12:48 ` Michael Tokarev
0 siblings, 1 reply; 15+ messages in thread
From: Justin Piszcz @ 2007-01-23 11:59 UTC (permalink / raw)
To: Michael Tokarev; +Cc: linux-kernel, linux-raid, xfs, Alan Piszcz
On Tue, 23 Jan 2007, Michael Tokarev wrote:
> Justin Piszcz wrote:
> []
> > Is this a bug that can or will be fixed or should I disable pre-emption on
> > critical and/or server machines?
>
> Disabling pre-emption on critical and/or server machines seems to be a good
> idea in the first place. IMHO anyway.. ;)
>
> /mjt
>
So for a server system, the following options should be as follows:
Preemption Model (No Forced Preemption (Server)) --->
[ ] Preempt The Big Kernel Lock
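In .config terms, the two menu entries above are expected to correspond to the fragment below. The symbol names are taken from the 2.6-era kernel/Kconfig.preempt as best recalled, so treat them as an assumption and verify them against your own tree.

```
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
# CONFIG_PREEMPT_BKL is not set
```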
Also, my mobo has HPET timer support in the BIOS. Is there any reason to
use this on a server? I do run X on it via the Intel 965 chipset video.
So the bottom line is: make sure not to use preemption on servers, or else
you will get weird spinlock issues or deadlocks on RAID devices. Good to know!
Thanks!
Justin.
* Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
2007-01-23 11:59 ` Justin Piszcz
@ 2007-01-23 12:48 ` Michael Tokarev
2007-01-23 13:46 ` Justin Piszcz
0 siblings, 1 reply; 15+ messages in thread
From: Michael Tokarev @ 2007-01-23 12:48 UTC (permalink / raw)
To: Justin Piszcz; +Cc: linux-kernel, linux-raid, xfs, Alan Piszcz
Justin Piszcz wrote:
>
> On Tue, 23 Jan 2007, Michael Tokarev wrote:
>
>> Disabling pre-emption on critical and/or server machines seems to be a good
>> idea in the first place. IMHO anyway.. ;)
>
> So bottom line is make sure not to use preemption on servers or else you
> will get weird spinlock/deadlocks on RAID devices--GOOD To know!
This is not the reason. The reason is that preemption usually works worse
on servers, especially heavily loaded ones: the more often you interrupt
(kernel) work, the more needless context switches you'll have, and the
slower the whole thing runs.
Another point is that with preemption enabled, we have more chances to
hit one bug or another somewhere. Those bugs should be found and fixed
for sure, but important servers and data usually aren't the place for
bug hunting.
/mjt
* Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
2007-01-23 12:48 ` Michael Tokarev
@ 2007-01-23 13:46 ` Justin Piszcz
0 siblings, 0 replies; 15+ messages in thread
From: Justin Piszcz @ 2007-01-23 13:46 UTC (permalink / raw)
To: Michael Tokarev; +Cc: linux-kernel, linux-raid, xfs, Alan Piszcz
On Tue, 23 Jan 2007, Michael Tokarev wrote:
> Justin Piszcz wrote:
> >
> > On Tue, 23 Jan 2007, Michael Tokarev wrote:
> >
> >> Disabling pre-emption on critical and/or server machines seems to be a good
> >> idea in the first place. IMHO anyway.. ;)
> >
> > So bottom line is make sure not to use preemption on servers or else you
> > will get weird spinlock/deadlocks on RAID devices--GOOD To know!
>
> This is not the reason. The reason is that preemption usually works worse
> on servers, especially heavily loaded ones: the more often you interrupt
> (kernel) work, the more needless context switches you'll have, and the
> slower the whole thing runs.
>
> Another point is that with preemption enabled, we have more chances to
> hit one bug or another somewhere. Those bugs should be found and fixed
> for sure, but important servers and data usually aren't the place for
> bug hunting.
>
> /mjt
>
Thanks for the update/info.
Justin.
* Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
2007-01-22 21:01 ` Chuck Ebbert
2007-01-22 21:59 ` Neil Brown
@ 2007-01-24 23:37 ` Justin Piszcz
2007-01-26 9:25 ` Andrew Morton
1 sibling, 1 reply; 15+ messages in thread
From: Justin Piszcz @ 2007-01-24 23:37 UTC (permalink / raw)
To: Chuck Ebbert; +Cc: linux-kernel, linux-raid, xfs, Neil Brown
On Mon, 22 Jan 2007, Chuck Ebbert wrote:
> Justin Piszcz wrote:
> > My .config is attached, please let me know if any other information is
> > needed and please CC (lkml) as I am not on the list, thanks!
> >
> > Running Kernel 2.6.19.2 on a MD RAID5 volume. Copying files over Samba to
> > the RAID5 running XFS.
> >
> > Any idea what happened here?
> >
> > [473795.214705] BUG: unable to handle kernel paging request at virtual address fffb92b0
> > [473795.214715] printing eip:
> > [473795.214718] c0358b14
> > [473795.214721] *pde = 00003067
> > [473795.214723] *pte = 00000000
> > [473795.214726] Oops: 0000 [#1]
> > [473795.214729] PREEMPT SMP
> > [473795.214736] CPU: 0
> > [473795.214737] EIP: 0060:[<c0358b14>] Not tainted VLI
> > [473795.214738] EFLAGS: 00010286 (2.6.19.2 #1)
> > [473795.214746] EIP is at copy_data+0x6c/0x179
> > [473795.214750] eax: 00000000 ebx: 00001000 ecx: 00000354 edx: fffb9000
> > [473795.214754] esi: fffb92b0 edi: da86c2b0 ebp: 00001000 esp: f7927dc4
> > [473795.214757] ds: 007b es: 007b ss: 0068
> > [473795.214761] Process md4_raid5 (pid: 1305, ti=f7926000 task=f7ea9030 task.ti=f7926000)
> > [473795.214765] Stack: c1ba7c40 00000003 f5538c80 00000001 da86c000 00000009 00000000 0000006c
> > [473795.214790] 00001000 da8536a8 aa6fee90 f5538c80 00000190 c0358d00 aa6fee88 0000ffff
> > [473795.214863] d7c5794c 00000001 da853488 f6fbec70 f6fbebc0 00000001 00000005 00000001
> > [473795.214876] Call Trace:
> > [473795.214880] [<c0358d00>] compute_parity5+0xdf/0x497
> > [473795.214887] [<c035b0dd>] handle_stripe+0x930/0x2986
> > [473795.214892] [<c01146b9>] find_busiest_group+0x124/0x4fd
> > [473795.214898] [<c03580e0>] release_stripe+0x21/0x2e
> > [473795.214902] [<c035d233>] raid5d+0x100/0x161
> > [473795.214907] [<c036b03c>] md_thread+0x40/0x103
> > [473795.214912] [<c012dbbe>] autoremove_wake_function+0x0/0x4b
> > [473795.214917] [<c036affc>] md_thread+0x0/0x103
> > [473795.214922] [<c012da1a>] kthread+0xfc/0x100
> > [473795.214926] [<c012d91e>] kthread+0x0/0x100
> > [473795.214930] [<c0103b4b>] kernel_thread_helper+0x7/0x1c
> > [473795.214935] =======================
> > [473795.214938] Code: 14 39 d1 0f 8d 10 01 00 00 89 c8 01 c0 01 c8 01 c0 01 c0 89 44 24 1c eb 51 89 d9 c1 e9 02 8b 7c 24 10 01 f7 8b 44 24 18 8d 34 02 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 c7 44 24 04 03 00 00 00 89 14
> > [473795.215017] EIP: [<c0358b14>] copy_data+0x6c/0x179 SS:ESP 0068:f7927dc4
> >
> Without digging too deeply, I'd say you've hit the same bug Sami Farin and
> others have reported starting with 2.6.19: pages mapped with kmap_atomic()
> become unmapped during memcpy() or similar operations. Try disabling
> preempt -- that seems to be the common factor.
>
After I run some other tests, I am going to re-run this test and see if it
OOPSes again with PREEMPT off.
Justin.
* Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
2007-01-24 23:37 ` Justin Piszcz
@ 2007-01-26 9:25 ` Andrew Morton
2007-01-26 9:37 ` Justin Piszcz
2007-01-26 12:31 ` Justin Piszcz
0 siblings, 2 replies; 15+ messages in thread
From: Andrew Morton @ 2007-01-26 9:25 UTC (permalink / raw)
To: Justin Piszcz; +Cc: Chuck Ebbert, linux-kernel, linux-raid, xfs, Neil Brown
On Wed, 24 Jan 2007 18:37:15 -0500 (EST)
Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> > Without digging too deeply, I'd say you've hit the same bug Sami Farin and
> > others have reported starting with 2.6.19: pages mapped with kmap_atomic()
> > become unmapped during memcpy() or similar operations. Try disabling
> > preempt -- that seems to be the common factor.
>
> After I run some other tests, I am going to re-run this test and see if it
> OOPSes again with PREEMPT off.
Strange. The below debug patch might catch it - please run with this
applied.
--- a/arch/i386/mm/highmem.c~kmap_atomic-debugging
+++ a/arch/i386/mm/highmem.c
@@ -30,7 +30,43 @@ void *kmap_atomic(struct page *page, enu
 {
 	enum fixed_addresses idx;
 	unsigned long vaddr;
+	static unsigned warn_count = 10;
 
+	if (unlikely(warn_count == 0))
+		goto skip;
+
+	if (unlikely(in_interrupt())) {
+		if (in_irq()) {
+			if (type != KM_IRQ0 && type != KM_IRQ1 &&
+			    type != KM_BIO_SRC_IRQ && type != KM_BIO_DST_IRQ &&
+			    type != KM_BOUNCE_READ) {
+				WARN_ON(1);
+				warn_count--;
+			}
+		} else if (!irqs_disabled()) {	/* softirq */
+			if (type != KM_IRQ0 && type != KM_IRQ1 &&
+			    type != KM_SOFTIRQ0 && type != KM_SOFTIRQ1 &&
+			    type != KM_SKB_SUNRPC_DATA &&
+			    type != KM_SKB_DATA_SOFTIRQ &&
+			    type != KM_BOUNCE_READ) {
+				WARN_ON(1);
+				warn_count--;
+			}
+		}
+	}
+
+	if (type == KM_IRQ0 || type == KM_IRQ1 || type == KM_BOUNCE_READ) {
+		if (!irqs_disabled()) {
+			WARN_ON(1);
+			warn_count--;
+		}
+	} else if (type == KM_SOFTIRQ0 || type == KM_SOFTIRQ1) {
+		if (irq_count() == 0 && !irqs_disabled()) {
+			WARN_ON(1);
+			warn_count--;
+		}
+	}
+skip:
 	/* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
 	pagefault_disable();
 	if (!PageHighMem(page))
_
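The rules the debug patch enforces can be read more easily as a pure function. What follows is a user-space restatement of the patch's conditions, offered as a sketch only: the `toy_` names and enum values are hypothetical, `TOY_KM_USER0` is included merely as an example of a slot type the patch does not list, the context is reduced to three booleans, and the `warn_count` budget is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Slot types named in the patch, plus one unlisted example. Values are arbitrary. */
enum toy_km_type {
    TOY_KM_USER0,
    TOY_KM_IRQ0, TOY_KM_IRQ1,
    TOY_KM_SOFTIRQ0, TOY_KM_SOFTIRQ1,
    TOY_KM_BIO_SRC_IRQ, TOY_KM_BIO_DST_IRQ,
    TOY_KM_BOUNCE_READ,
    TOY_KM_SKB_SUNRPC_DATA, TOY_KM_SKB_DATA_SOFTIRQ
};

/* Simplified execution context; the kernel derives this from preempt_count. */
struct toy_ctx {
    bool in_hardirq;
    bool in_softirq;
    bool irqs_disabled;
};

static bool toy_is_irq_slot(enum toy_km_type t)
{
    return t == TOY_KM_IRQ0 || t == TOY_KM_IRQ1 || t == TOY_KM_BOUNCE_READ;
}

static bool toy_is_softirq_slot(enum toy_km_type t)
{
    return t == TOY_KM_SOFTIRQ0 || t == TOY_KM_SOFTIRQ1;
}

/* Number of WARN_ON(1) hits the patch's checks would produce: 0, 1 or 2. */
int toy_kmap_warnings(enum toy_km_type type, struct toy_ctx c)
{
    int warns = 0;
    bool in_interrupt = c.in_hardirq || c.in_softirq;

    if (in_interrupt) {
        if (c.in_hardirq) {
            if (!toy_is_irq_slot(type) &&
                type != TOY_KM_BIO_SRC_IRQ && type != TOY_KM_BIO_DST_IRQ)
                warns++;
        } else if (!c.irqs_disabled) { /* softirq */
            if (!toy_is_irq_slot(type) && !toy_is_softirq_slot(type) &&
                type != TOY_KM_SKB_SUNRPC_DATA &&
                type != TOY_KM_SKB_DATA_SOFTIRQ)
                warns++;
        }
    }

    if (toy_is_irq_slot(type)) {
        if (!c.irqs_disabled)
            warns++;
    } else if (toy_is_softirq_slot(type)) {
        if (!in_interrupt && !c.irqs_disabled)
            warns++;
    }
    return warns;
}
```

In this restatement, an unlisted slot type used from process context produces no warning, while the same slot used from hard interrupt context produces one.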
* Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
2007-01-26 9:25 ` Andrew Morton
@ 2007-01-26 9:37 ` Justin Piszcz
2007-01-26 12:31 ` Justin Piszcz
1 sibling, 0 replies; 15+ messages in thread
From: Justin Piszcz @ 2007-01-26 9:37 UTC (permalink / raw)
To: Andrew Morton; +Cc: Chuck Ebbert, linux-kernel, linux-raid, xfs, Neil Brown
On Fri, 26 Jan 2007, Andrew Morton wrote:
> On Wed, 24 Jan 2007 18:37:15 -0500 (EST)
> Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
>
> > > Without digging too deeply, I'd say you've hit the same bug Sami Farin and
> > > others have reported starting with 2.6.19: pages mapped with kmap_atomic()
> > > become unmapped during memcpy() or similar operations. Try disabling
> > > preempt -- that seems to be the common factor.
> >
> > After I run some other tests, I am going to re-run this test and see if it
> > OOPSes again with PREEMPT off.
>
> Strange. The below debug patch might catch it - please run with this
> applied.
>
>
> --- a/arch/i386/mm/highmem.c~kmap_atomic-debugging
> +++ a/arch/i386/mm/highmem.c
> @@ -30,7 +30,43 @@ void *kmap_atomic(struct page *page, enu
> {
> enum fixed_addresses idx;
> unsigned long vaddr;
> + static unsigned warn_count = 10;
>
> + if (unlikely(warn_count == 0))
> + goto skip;
> +
> + if (unlikely(in_interrupt())) {
> + if (in_irq()) {
> + if (type != KM_IRQ0 && type != KM_IRQ1 &&
> + type != KM_BIO_SRC_IRQ && type != KM_BIO_DST_IRQ &&
> + type != KM_BOUNCE_READ) {
> + WARN_ON(1);
> + warn_count--;
> + }
> + } else if (!irqs_disabled()) { /* softirq */
> + if (type != KM_IRQ0 && type != KM_IRQ1 &&
> + type != KM_SOFTIRQ0 && type != KM_SOFTIRQ1 &&
> + type != KM_SKB_SUNRPC_DATA &&
> + type != KM_SKB_DATA_SOFTIRQ &&
> + type != KM_BOUNCE_READ) {
> + WARN_ON(1);
> + warn_count--;
> + }
> + }
> + }
> +
> + if (type == KM_IRQ0 || type == KM_IRQ1 || type == KM_BOUNCE_READ) {
> + if (!irqs_disabled()) {
> + WARN_ON(1);
> + warn_count--;
> + }
> + } else if (type == KM_SOFTIRQ0 || type == KM_SOFTIRQ1) {
> + if (irq_count() == 0 && !irqs_disabled()) {
> + WARN_ON(1);
> + warn_count--;
> + }
> + }
> +skip:
> /* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
> pagefault_disable();
> if (!PageHighMem(page))
> _
>
>
The RAID5 bug may be hard to trigger; I have only made it happen once so
far (but I only tried it once, as I don't like locking up the RAID :)). I will
re-run the test after applying this patch.
Justin.
* Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
2007-01-26 9:25 ` Andrew Morton
2007-01-26 9:37 ` Justin Piszcz
@ 2007-01-26 12:31 ` Justin Piszcz
1 sibling, 0 replies; 15+ messages in thread
From: Justin Piszcz @ 2007-01-26 12:31 UTC (permalink / raw)
To: Andrew Morton; +Cc: Chuck Ebbert, linux-kernel, linux-raid, xfs, Neil Brown
Just re-ran the test 4-5 times, could not reproduce this one, but I'll
keep running this kernel w/patch for a while and see if it happens again.
On Fri, 26 Jan 2007, Andrew Morton wrote:
> On Wed, 24 Jan 2007 18:37:15 -0500 (EST)
> Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
>
> > > Without digging too deeply, I'd say you've hit the same bug Sami Farin and
> > > others have reported starting with 2.6.19: pages mapped with kmap_atomic()
> > > become unmapped during memcpy() or similar operations. Try disabling
> > > preempt -- that seems to be the common factor.
> >
> > After I run some other tests, I am going to re-run this test and see if it
> > OOPSes again with PREEMPT off.
>
> Strange. The below debug patch might catch it - please run with this
> applied.
>
>
> --- a/arch/i386/mm/highmem.c~kmap_atomic-debugging
> +++ a/arch/i386/mm/highmem.c
> @@ -30,7 +30,43 @@ void *kmap_atomic(struct page *page, enu
> {
> enum fixed_addresses idx;
> unsigned long vaddr;
> + static unsigned warn_count = 10;
>
> + if (unlikely(warn_count == 0))
> + goto skip;
> +
> + if (unlikely(in_interrupt())) {
> + if (in_irq()) {
> + if (type != KM_IRQ0 && type != KM_IRQ1 &&
> + type != KM_BIO_SRC_IRQ && type != KM_BIO_DST_IRQ &&
> + type != KM_BOUNCE_READ) {
> + WARN_ON(1);
> + warn_count--;
> + }
> + } else if (!irqs_disabled()) { /* softirq */
> + if (type != KM_IRQ0 && type != KM_IRQ1 &&
> + type != KM_SOFTIRQ0 && type != KM_SOFTIRQ1 &&
> + type != KM_SKB_SUNRPC_DATA &&
> + type != KM_SKB_DATA_SOFTIRQ &&
> + type != KM_BOUNCE_READ) {
> + WARN_ON(1);
> + warn_count--;
> + }
> + }
> + }
> +
> + if (type == KM_IRQ0 || type == KM_IRQ1 || type == KM_BOUNCE_READ) {
> + if (!irqs_disabled()) {
> + WARN_ON(1);
> + warn_count--;
> + }
> + } else if (type == KM_SOFTIRQ0 || type == KM_SOFTIRQ1) {
> + if (irq_count() == 0 && !irqs_disabled()) {
> + WARN_ON(1);
> + warn_count--;
> + }
> + }
> +skip:
> /* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
> pagefault_disable();
> if (!PageHighMem(page))
> _
>
Thread overview: 15+ messages
2007-01-20 12:23 Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5) Justin Piszcz
2007-01-20 12:46 ` Justin Piszcz
2007-01-22 21:01 ` Chuck Ebbert
2007-01-22 21:59 ` Neil Brown
2007-01-23 1:44 ` Dan Williams
2007-01-23 2:06 ` Neil Brown
2007-01-23 10:56 ` Justin Piszcz
2007-01-23 11:08 ` Michael Tokarev
2007-01-23 11:59 ` Justin Piszcz
2007-01-23 12:48 ` Michael Tokarev
2007-01-23 13:46 ` Justin Piszcz
2007-01-24 23:37 ` Justin Piszcz
2007-01-26 9:25 ` Andrew Morton
2007-01-26 9:37 ` Justin Piszcz
2007-01-26 12:31 ` Justin Piszcz