* NFS deadlock
@ 2004-05-18 10:15 Jaco Kroon
From: Jaco Kroon @ 2004-05-18 10:15 UTC
To: linux-kernel; +Cc: acooks
Hello there
I've once again got problems with the kernel locking up. I'm now
convinced that it has something to do with NFS.
Previously we've had 2 machines that locked up, plus my one at home,
making three machines. Sometimes they would recover by themselves
after some time; other times they could be left for 2 days or so without
recovering. All three of these use NFS to export files to other
machines. It's the only thing we can find they have in common other
than the x86 architecture, but if that were the cause, other machines would be dying as well.
It should be noted that none of these runs on the newest hardware, but
that should not matter, since none of our other servers do either. We have
a 3rd NFS server, which doesn't take nearly as heavy a load via NFS. I've
been wondering why it hasn't locked up too, and this morning (right
now in fact) it has decided that it is its turn and is currently unusable.
If anybody else is experiencing similar problems, or has possible
workarounds, it would be appreciated if you could share your knowledge.
Jaco
===========================================
This message and attachments are subject to a disclaimer. Please refer to www.it.up.ac.za/documentation/governance/disclaimer/ for full details.
Hierdie boodskap en aanhangsels is aan 'n vrywaringsklousule onderhewig. Volledige besonderhede is by www.it.up.ac.za/documentation/governance/disclaimer/ beskikbaar.
===========================================
* Re: NFS deadlock
From: samg @ 2004-05-19 16:09 UTC
To: Jaco Kroon; +Cc: linux-kernel, acooks
Jaco,
How are your boxes locking up? I have NFS in use every day.
Does rpc die?
What kernel are you using?
And are you transferring linux to linux, or to some other platform?
The only time I had problems was when my client locked up
because I disconnected the server, and it hung the client;
the only solution (based on the way I connected) was to reboot.
To make matters worse, I ran a script that used du every day, and
so there were 12+ instances of du, all trying to run about.
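
A minimal sketch of how one might spot this kind of pile-up on a Linux client: processes blocked on an unreachable NFS server typically sit in uninterruptible sleep (state D), which can be read from /proc/<pid>/stat. The function names here are hypothetical and not part of any tool mentioned in this thread; treat it as an illustration only.

```python
import os


def parse_stat_state(stat_line: str) -> tuple[int, str, str]:
    """Return (pid, command name, state letter) from one /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so we split on the first "(" and the last ")".
    """
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren].strip())
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """List (pid, command) for every process currently in state D."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return []  # no /proc here (not Linux, or not mounted)
    stuck = []
    for entry in entries:
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat_state(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the line was unreadable
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

Run periodically, a growing list of du entries in state D would match the situation described above.
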
I would suggest using a program like sysstat, or sar, to help you
analyse the issues at hand.
-sam
* Re: NFS deadlock
From: Jaco Kroon @ 2004-05-20 8:10 UTC
To: samg; +Cc: linux-kernel
Oh, sorry.
The one box at home is currently running 2.6.5, with the intent of
upgrading to 2.6.6 as soon as I can find the time. It uses an underlying
ext3 file system. The same goes for the single client it serves.
The one at the office that serves up to a hundred or so clients
currently runs 2.6.4 with a patch for the dpt_i2o driver. It has an ext3
partition, but the one being served up via NFS is reiserfs. The clients
are running 2.6.5 at the moment (anything between 0 and 320 clients max
at any time, usually between 20 and 50 clients depending on lab usage).
The other server is using 2.4.24 (waiting for the dpt_i2o driver in the
2.6 kernel) with ext3 file systems and once again reiserfs for the nfs
exported part of the file system. Variety of clients in this case, from
2.4.20 kernels, right through to 2.6.6 kernels.
The machine that died yesterday is also running a 2.6.5 kernel, ext3
file system. Its two clients are the first of the two servers above and
another machine that runs kernel 2.6.6.
What affects the regularity of the crashes seems to be the load placed
on the server by clients. In my case at home the client is considerably faster
than the server, which will impose a relatively high load. I wish I
had more time to check this out. I'm suspecting some kind of race
condition that gets triggered by either heavy system load or a heavy
skew between speeds on the client/server. I might be totally wrong
though ...
Transfers in our case are always between linux and linux (at least as far
as we can control it, we are not aware of any other clients and would
probably manage to get such a person expelled should we find him).
We've experienced the client lock-ups as well. The client eventually times out
after a *long* time; we usually bounce the server before that happens.
This can be explained and is in my opinion quite normal.
What do sysstat and sar do? How can I use them to analyse the problem?
Jaco
--
"The strength of the Constitution lies entirely in the determination of each
citizen to defend it. Only if every single citizen feels duty bound to do
his share in this defense are the constitutional rights secure."
-- Albert Einstein
* Re: NFS deadlock
From: Sam Gill @ 2004-05-20 14:46 UTC
To: Jaco Kroon; +Cc: samg, linux-kernel
Jaco,
sysstat is a package on debian that includes the sar utilities,
such as sadc.
It creates a directory in
redhat: /var/log/sa
debian: /var/log/sysstat
It captures statistics on the boxes it's installed on
and saves them to a file saXX, where XX is the day.
Once the saXX files are being collected, a cron job usually generates
a report file sarXX, which you can then read manually; it
will give you statistics about every 15 minutes. You can also
graph these numbers; I have been working on a program to
do this, but it is not quite finished yet.
You can change the frequency (cron), and manually generate the
sarXX graphs. It helped me diagnose the situation with a couple of
failing computers. You have to know how to interpret the graphs,
but if you send me the saXX or sarXX files I can decode them for you.
I am planning on taking the project open-source, but I am not there
quite yet.
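
As a rough illustration of the file layout described above, the following sketch builds the path of a saXX file and the sar command line that would read it. The helper names are hypothetical, the two-digit zero padding of XX is an assumption, and -q (run queue and load average report) is just one of several report flags that could be used with -f.

```python
import datetime

# Data directories as given in this message; other distributions may differ.
SA_DIRS = {
    "redhat": "/var/log/sa",
    "debian": "/var/log/sysstat",
}


def sa_file_path(distro: str, day: int) -> str:
    """Return the path of the saXX data file for a day of the month."""
    if not 1 <= day <= 31:
        raise ValueError(f"day of month out of range: {day}")
    # Assumption: XX is the day of the month, zero padded to two digits.
    return f"{SA_DIRS[distro]}/sa{day:02d}"


def sar_command(distro: str, day: int, report_flag: str = "-q") -> list[str]:
    """Build (but do not run) a sar invocation that reads one saXX file."""
    return ["sar", report_flag, "-f", sa_file_path(distro, day)]


if __name__ == "__main__":
    today = datetime.date.today().day
    # Print the command so it can be inspected before running it by hand.
    print(" ".join(sar_command("debian", today)))
```

Running the printed command on the server should show the collected load figures for that day, provided sysstat was already collecting data.
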
thanks,
-sam