LKML Archive on lore.kernel.org
* stuck nfsd processes with 2.6.24-rc2
@ 2007-11-13 19:07 Christian Kujau
  2007-11-13 19:41 ` J. Bruce Fields
  2007-11-13 20:45 ` Mathieu Desnoyers
  0 siblings, 2 replies; 6+ messages in thread
From: Christian Kujau @ 2007-11-13 19:07 UTC (permalink / raw)
  To: linux-kernel; +Cc: bfields

Hi there,

I noticed that I cannot use kernel nfsd any more with 2.6.24-rc2, last 
working kernel as of now is 2.6.23.1. First I was using nfsv4 but 
switching to nfsv3 did not help either: exported shares can be mounted 
(client: 2.6-git/powerpc32, nfs-common-1.1.1~git-20070709-3ubuntu1), but 
running "ls /mountpoint" (even without "-l") on the client is enough to 
get the [nfsd] processes in "D" state.
Restarting the rpc.nfsd process on the server does not help much; the new 
rpc.nfsd processes get stuck quickly.

For more details please see http://nerdbynature.de/bits/2.6.24-rc2/nfsd/

I haven't seen this particular issue on the list recently, hence my post.
I'll try to git-bisect my way through the changesets, but it'll take some 
time...

Thanks,
Christian.
-- 
BOFH excuse #314:

You need to upgrade your VESA local bus to a MasterCard local bus.


* Re: stuck nfsd processes with 2.6.24-rc2
  2007-11-13 19:07 stuck nfsd processes with 2.6.24-rc2 Christian Kujau
@ 2007-11-13 19:41 ` J. Bruce Fields
  2007-11-13 21:27   ` Christian Kujau
  2007-11-13 20:45 ` Mathieu Desnoyers
  1 sibling, 1 reply; 6+ messages in thread
From: J. Bruce Fields @ 2007-11-13 19:41 UTC (permalink / raw)
  To: Christian Kujau; +Cc: linux-kernel

On Tue, Nov 13, 2007 at 08:07:55PM +0100, Christian Kujau wrote:
> I noticed that I cannot use kernel nfsd any more with 2.6.24-rc2, last 
> working kernel as of now is 2.6.23.1. First I was using nfsv4 but switching 
> to nfsv3 did not help either: exported shares can be mounted (client: 
> 2.6-git/powerpc32, nfs-common-1.1.1~git-20070709-3ubuntu1), but running "ls 
> /mountpoint" (even without "-l") on the client is enough to get the [nfsd] 
> processes in "D" state.
> Restarting the rpc.nfsd process on the server does not help much; the new 
> rpc.nfsd processes get stuck quickly.
>
> For more details please see http://nerdbynature.de/bits/2.6.24-rc2/nfsd/
>
> I haven't seen this particular issue on the list recently, hence my post.
> I'll try to git-bisect my way through the changesets, but it'll take some 
> time...

Could you get a sysrq-T trace?  (First get the [nfsd] processes in [D]
state.  Then type sysrq-T, or "echo t >/proc/sysrq-trigger".  Then
extract the stuff that's dumped to the logs.)

--b.
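
The procedure requested above could be sketched roughly as follows. This is
only an illustration: the function names and the output file name are made up
for this sketch, and it assumes a procps-style "ps" plus a kernel built with
magic SysRq support. Nothing runs until the functions are called.

```shell
# Print "PID STAT COMMAND" for nfsd threads in uninterruptible sleep ("D").
# Reads `ps -eo pid,stat,comm` style lines on stdin, so it works on saved
# output as well as on a live system.
nfsd_in_d_state() {
    awk '$2 ~ /^D/ && $3 == "nfsd" { print $1, $2, $3 }'
}

# Ask the kernel to dump all task states, then save the kernel log.
# Needs root; the default file name is only an example.
dump_task_states() {
    echo t > /proc/sysrq-trigger && dmesg > "${1:-sysrq-t.txt}"
}

# Typical use on the server, once the client's "ls" has hung:
#   ps -eo pid,stat,comm | nfsd_in_d_state
#   dump_task_states nfsd-hang.txt
```

If the kernel ring buffer is too small to hold the whole dump, the copy that
syslog wrote to disk is likely to be more complete than the dmesg output.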


* Re: stuck nfsd processes with 2.6.24-rc2
  2007-11-13 19:07 stuck nfsd processes with 2.6.24-rc2 Christian Kujau
  2007-11-13 19:41 ` J. Bruce Fields
@ 2007-11-13 20:45 ` Mathieu Desnoyers
  1 sibling, 0 replies; 6+ messages in thread
From: Mathieu Desnoyers @ 2007-11-13 20:45 UTC (permalink / raw)
  To: Christian Kujau; +Cc: linux-kernel, bfields

* Christian Kujau (lists@nerdbynature.de) wrote:
> Hi there,
>
> I noticed that I cannot use kernel nfsd any more with 2.6.24-rc2, last 
> working kernel as of now is 2.6.23.1. First I was using nfsv4 but switching 
> to nfsv3 did not help either: exported shares can be mounted (client: 
> 2.6-git/powerpc32, nfs-common-1.1.1~git-20070709-3ubuntu1), but running "ls 
> /mountpoint" (even without "-l") on the client is enough to get the [nfsd] 
> processes in "D" state.
> Restarting the rpc.nfsd process on the server does not help much; the new rpc.nfsd 
> processes get stuck quickly.
>
> For more details please see http://nerdbynature.de/bits/2.6.24-rc2/nfsd/
>
> I haven't seen this particular issue on the list recently, hence my post.
> I'll try to git-bisect my way through the changesets, but it'll take some 
> time...
>

Hrm, sounds like an interesting case for LTTng. Feel free to try it
and get the text dump of the last events that led to the nfsd hang.

See http://ltt.polymtl.ca for the patchset for 2.6.24-rc3-git3,
just follow the quickstart guide to get started.

(the overall process could be faster than a git bisect....)

Mathieu


> Thanks,
> Christian.
> -- 
> BOFH excuse #314:
>
> You need to upgrade your VESA local bus to a MasterCard local bus.

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: stuck nfsd processes with 2.6.24-rc2
  2007-11-13 19:41 ` J. Bruce Fields
@ 2007-11-13 21:27   ` Christian Kujau
  2007-11-14  1:36     ` Christian Kujau
  0 siblings, 1 reply; 6+ messages in thread
From: Christian Kujau @ 2007-11-13 21:27 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-kernel, mathieu.desnoyers

On Tue, 13 Nov 2007, J. Bruce Fields wrote:
> Could you get a sysrq-T trace?  (First get the [nfsd] processes in [D]

Ah, I forgot about that. Will do as soon as I get a working kernel again. 
I'm in the middle of git-bisecting and I had to mark the last 2 versions 
as "bad" but only because they 1) Oopsed during boot or 2) could not load 
the kernel image:

-------------------------------------
# git-bisect log
git-bisect start
# bad: [6e800af233e0bdf108efb7bd23c11ea6fa34cdeb] ACPI: add documentation for deprecated /proc/acpi/battery in ACPI_PROCFS
git-bisect bad 6e800af233e0bdf108efb7bd23c11ea6fa34cdeb				0)
# good: [bbf25010f1a6b761914430f5fca081ec8c7accd1] Linux 2.6.23
git-bisect good bbf25010f1a6b761914430f5fca081ec8c7accd1			OK
# good: [fba956c46a72f9e7503fd464ffee43c632307e31] Map volume and brightness events on thinkpads
git-bisect good fba956c46a72f9e7503fd464ffee43c632307e31			OK
# bad: [7b1915a989ea4d426d0fd98974ab80f30ef1d779] mm/oom_kill.c: Use list_for_each_entry instead of list_for_each
git-bisect bad 7b1915a989ea4d426d0fd98974ab80f30ef1d779				1)
# bad: [c223701cf6c706f42840631c1ca919a18e6e2800] ide: add "hdx=nodma" kernel parameter
git-bisect bad c223701cf6c706f42840631c1ca919a18e6e2800				2)

0) was at 2.6.24-rc2 or so
1) Oops during bootup, but not NFS related, IIRC
2) Grub could not load the kernel: "linux 'zimage' kernel too big, try
    'make bzImage'". Maybe related to the x86/i386 merge?
-------------------------------------

I've built these kernels like so:

  # make mrproper
  # cp ~/config.2.6.23.1 .config
  # yes '' | make oldconfig
  # make
  # sudo cp arch/i386/boot/bzImage System.map /boot/2.6/
  # sudo make modules_install

Marking versions as "bad" because of unrelated stuff might poison the 
results somehow, or make them unusable again. Hm...

@Mathieu: I'll try LTTng later on, many thanks for the hint!

Christian.
-- 
BOFH excuse #90:

Budget cuts
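
On the worry above about "bad" marks poisoning the result: git versions that
provide "git bisect skip" let an untestable commit be set aside instead of
being called bad, and "git bisect run" treats exit status 125 from its test
script the same way. A minimal sketch of that mapping, where the outcome
labels are hypothetical names invented for this example:

```shell
# Map the outcome of one build-and-boot attempt to the exit status that
# "git bisect run" expects: 0 = good, 125 = skip, any other value from
# 1 to 127 = bad, 128 and above = abort the bisection.
bisect_exit_code() {
    case "$1" in
        nfs-ok)            echo 0 ;;    # nfsd answered the "ls" without hanging
        nfs-hang)          echo 1 ;;    # the bug being hunted showed up
        boot-oops|no-boot) echo 125 ;;  # unrelated failure: commit cannot be tested
        *)                 echo 128 ;;  # unknown outcome: stop rather than guess
    esac
}
```

When bisecting by hand, the equivalent of status 125 is to type
"git bisect skip" at a commit that oopses or will not load, rather than
"git bisect bad".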


* Re: stuck nfsd processes with 2.6.24-rc2
  2007-11-13 21:27   ` Christian Kujau
@ 2007-11-14  1:36     ` Christian Kujau
  2007-11-14  3:23       ` J. Bruce Fields
  0 siblings, 1 reply; 6+ messages in thread
From: Christian Kujau @ 2007-11-14  1:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: J. Bruce Fields

On Tue, 13 Nov 2007, Christian Kujau wrote:
> Ah, I forgot about that. Will do as soon as I get a working kernel again. I'm 
> in the middle of git-bisecting and I had to mark the last 2 versions as "bad" 
> but only because they 1) Oopsed during boot or 2) could not load the kernel 
> image:

Same again: after 3 working versions I have to mark yet another one as 
"bad" - not because the NFS bug showed up, but I was unable to boot again 
("linux 'zimage' kernel too big" message again). So now I am at:

-------------------------------------
git-bisect start
# bad: [6e800af233e0bdf108efb7bd23c11ea6fa34cdeb] ACPI: add documentation for deprecated /proc/acpi/battery in ACPI_PROCFS
git-bisect bad 6e800af233e0bdf108efb7bd23c11ea6fa34cdeb
# good: [bbf25010f1a6b761914430f5fca081ec8c7accd1] Linux 2.6.23
git-bisect good bbf25010f1a6b761914430f5fca081ec8c7accd1
# good: [fba956c46a72f9e7503fd464ffee43c632307e31] Map volume and brightness events on thinkpads
git-bisect good fba956c46a72f9e7503fd464ffee43c632307e31
# bad: [7b1915a989ea4d426d0fd98974ab80f30ef1d779] mm/oom_kill.c: Use list_for_each_entry instead of list_for_each
git-bisect bad 7b1915a989ea4d426d0fd98974ab80f30ef1d779
# bad: [c223701cf6c706f42840631c1ca919a18e6e2800] ide: add "hdx=nodma" kernel parameter
git-bisect bad c223701cf6c706f42840631c1ca919a18e6e2800
# good: [1d677a6dfaac1d1cf51a7f58847077240985faf2] pm3fb: hardware cursor support
git-bisect good 1d677a6dfaac1d1cf51a7f58847077240985faf2
# good: [291702f017efdfe556cb87b8530eb7d1ff08cbae] [ALSA] Support ASUS P701 eeepc [0x1043 0x82a1] support
git-bisect good 291702f017efdfe556cb87b8530eb7d1ff08cbae
# good: [92d15c2ccbb3e31a3fc71ad28fdb55e1319383c0] Merge branch 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block
git-bisect good 92d15c2ccbb3e31a3fc71ad28fdb55e1319383c0
# bad: [f77bf01425b11947eeb3b5b54685212c302741b8] kbuild: introduce ccflags-y, asflags-y and ldflags-y
git-bisect bad f77bf01425b11947eeb3b5b54685212c302741b8
-------------------------------------

I'm not sure whether it's sane to continue at all ("only 19 revisions 
left!"); but this has to wait until tomorrow anyway...

@Bruce: sysrq-t will have to wait too, sorry.


thanks for any ideas,
Christian.
-- 
BOFH excuse #70:

nesting roaches shorted out the ether cable
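
One possible way to recover a bisection like the one logged above, sketched
under the assumption that the installed git supports "git bisect skip" and
"git bisect replay": rewrite the saved log so that the commits marked bad for
unrelated reasons become skips, then replay it. The function name and the
file names are invented for this sketch.

```shell
# Read a bisect log on stdin and turn "bad" entries for the given commits
# into "skip" entries; every other line passes through unchanged.
mark_untestable() {
    awk -v shas="$*" '
        BEGIN { n = split(shas, list, " "); for (i = 1; i <= n; i++) skip[list[i]] = 1 }
        $1 == "git-bisect" && $2 == "bad" && ($3 in skip) { print "git bisect skip " $3; next }
        $1 == "git" && $2 == "bisect" && $3 == "bad" && ($4 in skip) { print "git bisect skip " $4; next }
        { print }
    '
}

# Possible use with the three commits that failed for unrelated reasons:
#   git bisect log > bisect.log
#   mark_untestable 7b1915a989ea4d426d0fd98974ab80f30ef1d779 \
#                   c223701cf6c706f42840631c1ca919a18e6e2800 \
#                   f77bf01425b11947eeb3b5b54685212c302741b8 \
#       < bisect.log > fixed.log
#   git bisect reset && git bisect replay fixed.log
```

With skipped commits in the range, git may only be able to report a set of
candidate commits rather than a single first bad one.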


* Re: stuck nfsd processes with 2.6.24-rc2
  2007-11-14  1:36     ` Christian Kujau
@ 2007-11-14  3:23       ` J. Bruce Fields
  0 siblings, 0 replies; 6+ messages in thread
From: J. Bruce Fields @ 2007-11-14  3:23 UTC (permalink / raw)
  To: Christian Kujau; +Cc: linux-kernel

On Wed, Nov 14, 2007 at 02:36:18AM +0100, Christian Kujau wrote:
> On Tue, 13 Nov 2007, Christian Kujau wrote:
>> Ah, I forgot about that. Will do as soon as I get a working kernel again. 
>> I'm in the middle of git-bisecting and I had to mark the last 2 versions 
>> as "bad" but only because they 1) Oopsed during boot or 2) could not load 
>> the kernel image:
>
> Same again: after 3 working versions I have to mark yet another one as 
> "bad" - not because the NFS bug showed up, but I was unable to boot again 
> ("linux 'zimage' kernel too big" message again). So now I am at:

Hm, so assuming I got the gitk invocation right (I basically just ran
gitk ^goodsha1 ^goodsha2 ...etc... badsha1 badsha2 ... fs/nfsd include/linux/nfsd net/sunrpc include/linux/sunrpc),
nothing really obvious pops out from that list.  I think the only
nonobvious change that touches code you're likely to be hitting is the
exportfs rewrite.

Have you checked that the filesystem itself is OK?  (If you 'ls
/path/to/exported/filesystem' on the server does that work?)

Anyway if you get the git-bisect results, that'll be very helpful.  Till
then, yeah, sysrq-T is probably still the first thing to try.

--b.
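
The revision arguments for an invocation like the gitk one above can be
derived from a saved bisect log instead of being typed by hand. A rough
sketch, with a made-up function name and an example log file name:

```shell
# Read a bisect log on stdin and print one revision argument per line:
# "^<sha>" for every commit marked good, "<sha>" for every commit marked bad.
bisect_rev_args() {
    awk '
        $1 != "git-bisect" && !($1 == "git" && $2 == "bisect") { next }
        $1 == "git-bisect" { cmd = $2; rev = $3 }
        $1 == "git"        { cmd = $3; rev = $4 }
        cmd == "good" { print "^" rev }
        cmd == "bad"  { print rev }
    '
}

# Possible use, limited to the NFS server and sunrpc paths:
#   git log --oneline $(bisect_rev_args < bisect.log) -- \
#       fs/nfsd include/linux/nfsd net/sunrpc include/linux/sunrpc
```

The same argument list can be handed to gitk in place of "git log --oneline".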



