LKML Archive on lore.kernel.org
* Kernel 2.6.20 does not work anymore with SCSI or SATA on old Opteron / Xeon servers
From: Stefan Priebe @ 2007-03-18 20:50 UTC
To: linux-kernel
Hello!
We have a very strange problem with kernel 2.6.20.x.
If I try to access a SCSI or SATA disk (tested with Adaptec U320
ASC-29320, ICP Vortex 9024, Promise TX300), the whole server hangs
completely, with no output and no error on the screen. It does not
happen on all our systems: only old 604-pin Xeons and Socket 940
Opterons are affected. Socket F Opterons and Socket 771 Xeons work fine.
I've also tested acpi=off and pci=routeirq, but neither helps. The
systems work fine with 2.6.19.x and earlier.
Stefan
* Re: Kernel 2.6.20 does not work anymore with SCSI or SATA on old Opteron / Xeon servers
From: Andrew Morton @ 2007-03-20 7:27 UTC
To: Stefan Priebe; +Cc: linux-kernel, linux-scsi
On Sun, 18 Mar 2007 21:50:46 +0100 Stefan Priebe <stefan@prie.be> wrote:
> Hello!
>
> We have a very strange problem with kernel 2.6.20.x.
>
> If I try to access a SCSI or SATA disk (tested with Adaptec U320
> ASC-29320, ICP Vortex 9024, Promise TX300), the whole server hangs
> completely, with no output and no error on the screen. It does not
> happen on all our systems: only old 604-pin Xeons and Socket 940
> Opterons are affected. Socket F Opterons and Socket 771 Xeons work fine.
>
> I've also tested acpi=off and pci=routeirq, but neither helps. The
> systems work fine with 2.6.19.x and earlier.
Well that's a bit sad.
Could you please set up netconsole
(Documentation/networking/netconsole.txt) and add initcall_debug to the
kernel boot command line and then send us the full bootup logs?
(Even better: serial console with earlyprintk).
If that doesn't shed any light, we might have to ask you to perform a
git-bisect search to find the buggy commit, I'm afraid.
* Re: Kernel 2.6.20 does not work anymore with SCSI or SATA on old Opteron / Xeon servers
From: Stefan Priebe @ 2007-03-20 10:33 UTC
To: Andrew Morton; +Cc: linux-kernel, linux-scsi
Hello!
Here is more information... the problem seems to be a little more
specific.
1.) I've booted these systems via NFS and then tried to access
/dev/sda or /dev/sdb, for example with fdisk, and this does not work.
2.) I've now tested the following kernels:
2.6.18.8 - works
2.6.19.7 - works
2.6.20 - does not work
2.6.21-rc4 - does not work
3.) The funny thing is, I can boot the whole system with 2.6.18.8, for
example, partition the hard disk with fdisk, format it, and copy the
whole image with a 2.6.20.3 kernel onto it. The installed server then
works perfectly, and I can also run fdisk on /dev/sdb and so on. It only
fails if the system itself is booted via NFS...
Stefan
* Re: Kernel 2.6.20 does not work anymore with SCSI or SATA on old Opteron / Xeon servers
From: Olaf Kirch @ 2007-03-20 10:54 UTC
To: stefan; +Cc: linux-kernel, linux-scsi
On Tuesday 20 March 2007 11:33, Stefan Priebe wrote:
> 1.) I've booted these systems through NFS and would like to access
> /dev/sda or /dev/sdb then. For example via fdisk and this does not work.
What do you mean by "booted through NFS"? Do you mean the machine
runs with the root file system mounted via NFS? Or does it mean you
booted, and started the NFS server?
Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
okir@lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax
* Re: Kernel 2.6.20 does not work anymore with SCSI or SATA on old Opteron / Xeon servers
From: Stefan Priebe @ 2007-03-20 10:59 UTC
To: Olaf Kirch; +Cc: linux-kernel, linux-scsi
Hello!
It runs with nfsroot:
# mount
192.168.0.100:/PXE/debian on / type nfs (rw)
Kernel command line: nfs root=/dev/nfs nfsroot=192.168.0.100:/PXE/debian ip=dhcp
Stefan
* Re: Kernel 2.6.20 does not work anymore with SCSI or SATA on old Opteron / Xeon servers
From: Stefan Priebe @ 2007-03-20 11:20 UTC
To: Olaf Kirch; +Cc: linux-kernel, linux-scsi
Hello!
Here is some more information:
- sometimes the whole system crashes, sometimes it stays alive
- if it stays alive, fdisk consumes 99% CPU
- fdisk cannot be killed, not even with kill -9
- the same happens with a cat on /dev/sdX
- there is no problem when accessing /dev/hdX
Stefan
* Re: Kernel 2.6.20 does not work anymore with SCSI or SATA on old Opteron / Xeon servers
From: Stefan Priebe @ 2007-03-20 12:23 UTC
To: Olaf Kirch; +Cc: linux-kernel, linux-scsi
> - on a 2.6.20 system, try "dd if=/dev/sdb of=/dev/null bs=4k count=1" or
> something like this (with NFS root) - does this crash, too?
No, it does not crash. It is also no problem to set count= to 10000 or
so, or to change bs to 16k ...
> - do you have ACLs on files in /dev?
no
> - enable the sysrq key, make sure kernel messages go to the console
> by using "dmesg -n7", and when the kernel hangs, try sysrq-p, and
> sysrq-t
> (sysrq is documented in Documentation/sysrq.txt in the kernel source)
> - try to capture the oops message - there must be one.
OK, I've done the following:
1.) I've set up netconsole
2.) dmesg -n7
3.) fdisk /dev/sda
4.) sysrq-t / sysrq-p
So here is the output of sysrq-p and sysrq-t; it hangs at nfs_sync_mapping_wait:
SysRq : Show Regs
Pid: 1598, comm: fdisk
EIP: 0060:[<c03bf506>] CPU: 0
EIP is at _spin_lock+0x7/0xf
EFLAGS: 00000286 Not tainted (2.6.20.3 #6)
EAX: c3117afc EBX: c3117a2c ECX: 00000020 EDX: 00000000
ESI: f7b63ed4 EDI: f7b63f04 EBP: f7b63edc DS: 007b ES: 007b GS: 00d8
CR0: 8005003b CR2: b7f00f90 CR3: 033ea000 CR4: 000006d0
[<c01b5c92>] nfs_sync_mapping_wait+0x83/0x1aa
[<c01516c5>] cache_alloc_refill+0xc8/0x196
[<c01b5eca>] nfs_sync_mapping_range+0x97/0xb6
[<c01ae5cf>] nfs_getattr+0x3a/0x96
[<c01ae595>] nfs_getattr+0x0/0x96
[<c01565d9>] vfs_getattr+0x21/0x30
[<c01566a3>] vfs_fstat+0x22/0x31
[<c0156c51>] sys_fstat64+0xf/0x23
[<c015da9c>] sys_ioctl+0x33/0x4b
[<c0114358>] do_page_fault+0x0/0x549
[<c010291c>] syscall_call+0x7/0xb
[<c03b0033>] call_verify+0x182/0x36f
=======================
SysRq : Show State
free sibling
task PC stack pid father child younger older
init S C0117721 0 1 0 2 (NOTLB)
c313fc48 00000082 c312fa90 c0117721 00100100 00200200 f7da9600 f7941e40
00000010 c313fc04 00000008 00000002 c3022700 c312fa90 c312fb9c 000008dd
64bf803e 00000029 c312f030 c313fc90 00000000 c30013c0 c03b3515 c03b352f
Call Trace:
[<c0117721>] default_wake_function+0x0/0xc
[<c03b3515>] rpc_wait_bit_interruptible+0x0/0x1f
[<c03b352f>] rpc_wait_bit_interruptible+0x1a/0x1f
[<c03beb38>] __wait_on_bit+0x2c/0x51
[<c03b3515>] rpc_wait_bit_interruptible+0x0/0x1f
[<c03bebd0>] out_of_line_wait_on_bit+0x73/0x7b
[<c012c950>] wake_bit_function+0x0/0x3c
[<c012c950>] wake_bit_function+0x0/0x3c
[<c03b3c6a>] __rpc_execute+0xdb/0x18b
[<c03b354d>] rpc_set_active+0x19/0x57
[<c03af1ef>] rpc_call_sync+0x71/0x98
[<c01b1824>] nfs_proc_getattr+0x5b/0x7f
[<c01ae981>] __nfs_revalidate_inode+0xe7/0x21a
[<c01ad415>] nfs_permission+0x0/0x133
[<c01ad415>] nfs_permission+0x0/0x133
[<c01ad527>] nfs_permission+0x112/0x133
[<c01ad415>] nfs_permission+0x0/0x133
[<c0159928>] permission+0x94/0xa2
[<c0159e57>] __link_path_walk+0x6c/0xa59
[<c013e20c>] __alloc_pages+0x4a/0x2a3
[<c015a883>] link_path_walk+0x3f/0xa4
[<c015abc5>] do_path_lookup+0x170/0x18b
[<c015ae0c>] __user_walk_fd+0x2d/0x43
[<c0156601>] vfs_stat_fd+0x19/0x40
[<c0156c0b>] sys_stat64+0xf/0x23
[<c02456d4>] copy_to_user+0x2f/0x37
[<c01234f6>] do_gettimeofday+0x35/0x119
[<c011f93e>] sys_time+0x1e/0x2e
[<c010291c>] syscall_call+0x7/0xb
=======================
ksoftirqd/0 S C33442C0 0 3 1 4 2 (L-TLB)
c3149fb8 00000046 c013cd73 c33442c0 00000000 c30131e0 00000003 f7931900
c301321c 00000000 c33f5030 00000000 c3012700 c3136030 c313613c 000001d9
a733fbbd 00000004 c04a8cc0 c0539380 c0539380 c0120494 fffffffc c01204d6
Call Trace:
[<c013cd73>] mempool_free+0x65/0x6a
[<c0120494>] ksoftirqd+0x0/0xa7
[<c01204d6>] ksoftirqd+0x42/0xa7
[<c012c5e6>] kthread+0x72/0x96
[<c012c574>] kthread+0x0/0x96
[<c01034f7>] kernel_thread_helper+0x7/0x10
=======================
migration/1 S F745BF24 0 4 1 5 3 (L-TLB)
c314bfb0 00000046 00000092 f745bf24 00000001 f745bf70 c314bf94 f7ab03c0
00000000 00000001 f745bf74 00000001 c301a700 c3139a90 c3139b9c 000023c5
b7d09ccb 00000004 c312f560 c301b054 c301a700 00000001 c314bfc4 c0118643
Call Trace:
[<c0118643>] migration_thread+0x7a/0xd2
[<c01185c9>] migration_thread+0x0/0xd2
[<c012c5e6>] kthread+0x72/0x96
[<c012c574>] kthread+0x0/0x96
[<c01034f7>] kernel_thread_helper+0x7/0x10
=======================
ksoftirqd/1 S C301B1A0 0 5 1 6 4 (L-TLB)
c316ffb8 00000046 00000000 c301b1a0 00000008 c012a884 c301b1e0 f7f39040
c012aa25 c301b21c 00000000 00000001 c301a700 c3139560 c313966c 00000c4f
48c808e9 00000004 c312f560 c0539380 c0539380 c0120494 fffffffc c01204d6
Call Trace:
[<c012a884>] rcu_do_batch+0x1a/0x7f
[<c012aa25>] __rcu_process_callbacks+0x8f/0xa1
[<c0120494>] ksoftirqd+0x0/0xa7
[<c01204d6>] ksoftirqd+0x42/0xa7
[<c012c5e6>] kthread+0x72/0x96
[<c012c574>] kthread+0x0/0x96
[<c01034f7>] kernel_thread_helper+0x7/0x10
=======================
migration/2 S F7B63F24 0 6 1 7 5 (L-TLB)
c3171fb0 00000046 00000092 f7b63f24 00000001 f7b63f70 c3171f94 f79703c0
00000000 00000001 f7b63f74 00000002 c3022700 c3139030 c313913c 000011f0
482d3411 00000022 c312f030 c3023054 c3022700 00000002 c3171fc4 c0118643
Call Trace:
[<c0118643>] migration_thread+0x7a/0xd2
[<c01185c9>] migration_thread+0x0/0xd2
[<c012c5e6>] kthread+0x72/0x96
[<c012c574>] kthread+0x0/0x96
[<c01034f7>] kernel_thread_helper+0x7/0x10
=======================
ksoftirqd/2 S C324D780 0 7 1 8 6 (L-TLB)
c3175fb8 00000046 c013cd73 c324d780 00000000 c30231e0 00000003 f7ba2740
c302321c 00000000 c053ab90 00000002 c3022700 c3155a90 c3155b9c 00000564
610707d5 00000004 c312f030 c0539380 c0539380 c0120494 fffffffc c01204d6
Call Trace:
[<c013cd73>] mempool_free+0x65/0x6a
[<c0120494>] ksoftirqd+0x0/0xa7
[<c01204d6>] ksoftirqd+0x42/0xa7
[<c012c5e6>] kthread+0x72/0x96
[<c012c574>] kthread+0x0/0x96
[<c01034f7>] kernel_thread_helper+0x7/0x10
=======================
migration/3 S F74F1F24 0 8 1 9 7 (L-TLB)
c3177fb0 00000046 00000092 f74f1f24 00000001 f74f1f70 c3177f94 f7ab03c0
00000000 00000001 f74f1f74 00000003 c302a700 c3155560 c315566c 00000ea1
b2116928 00000004 c3136a90 c302b054 c302a700 00000003 c3177fc4 c0118643
Call Trace:
[<c0118643>] migration_thread+0x7a/0xd2
[<c01185c9>] migration_thread+0x0/0xd2
[<c012c5e6>] kthread+0x72/0x96
[<c012c574>] kthread+0x0/0x96
[<c01034f7>] kernel_thread_helper+0x7/0x10
=======================
ksoftirqd/3 S C317BFC4 0 9 1 10 8 (L-TLB)
c317bfb8 00000046 c03be392 c317bfc4 00000046 00000086 c313fee8
[remainder of the sysrq-t output is garbled in the archive]
Stefan
Olaf Kirch wrote:
> On Tuesday 20 March 2007 11:59, Stefan Priebe wrote:
>> Kernel command line: nfs root=/dev/nfs nfsroot=192.168.0.100:/PXE/debian ip=dhcp
>
> Some things that may be worth trying:
>
> - on a 2.6.20 system, try "dd if=/dev/sdb of=/dev/null bs=4k count=1" or
> something like this (with NFS root) - does this crash, too?
>
> - do you have ACLs on files in /dev?
>
> - enable the sysrq key, make sure kernel messages go to the console
> by using "dmesg -n7", and when the kernel hangs, try sysrq-p, and
> sysrq-t
> (sysrq is documented in Documentation/sysrq.txt in the kernel source)
>
> - try to capture the oops message - there must be one.
>
> Olaf
* Re: Kernel 2.6.20 does not work anymore with SCSI or SATA on old Opteron / Xeon servers
From: Stefan Priebe @ 2007-03-20 13:28 UTC
To: Olaf Kirch; +Cc: linux-kernel, linux-scsi, akpm
[-- Attachment #1: Type: text/plain, Size: 293 bytes --]
Hello!
With the sysrq output I've found the function which is the problem:
inode.c => nfs_getattr => nfs_sync_mapping_range
I've also found the attached patch, which is not included in any stable
release nor in 2.6.21.X, but has been public since 20 Feb 2007.
I think this is very important.
Stefan Priebe
[-- Attachment #2: linux-2.6.20-001-fix_block_device_getattr.dif --]
[-- Type: text/plain, Size: 818 bytes --]
commit 090ad38f8ceea3cc048981e9fe9cc62ed43fee58
Author: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Tue Feb 20 19:28:07 2007 -0500
NFS: nfs_getattr() can't call nfs_sync_mapping_range() for non-regular files
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index af53c02..93d046c 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -429,7 +429,8 @@ int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
 	int err;

 	/* Flush out writes to the server in order to update c/mtime */
-	nfs_sync_mapping_range(inode->i_mapping, 0, 0, FLUSH_NOCOMMIT);
+	if (S_ISREG(inode->i_mode))
+		nfs_sync_mapping_range(inode->i_mapping, 0, 0, FLUSH_NOCOMMIT);

 	/*
* We may force a getattr if the user cares about atime.
* Re: Kernel 2.6.20 does not work anymore with SCSI or SATA on old Opteron / Xeon servers
From: Chuck Ebbert @ 2007-03-20 16:01 UTC
To: stefan; +Cc: Olaf Kirch, linux-kernel, linux-scsi, akpm
Stefan Priebe wrote:
> Hello!
>
> With the sysrq output I've found the function which is the problem:
> inode.c => nfs_getattr => nfs_sync_mapping_range
>
> I've also found the attached patch - which is not included in any stable
> release nor in 2.6.21.X but is public since 20.02.07
>
> I think this is very important.
>
It is queued for 2.6.20.4.