LKML Archive on lore.kernel.org
From: "Sam Gill" <samg@seven4sky.com>
To: "Jaco Kroon" <jkroon@cs.up.ac.za>
Cc: samg@seven4sky.com, linux-kernel@vger.kernel.org
Subject: Re: NFS deadlock
Date: Thu, 20 May 2004 10:46:54 -0400 (EDT)
Message-ID: <2124.128.150.143.219.1085064414.squirrel@webmail.seven4sky.com>
In-Reply-To: <40AC67F8.5010307@cs.up.ac.za>

Jaco,

sysstat is a package on Debian that includes the sar utilities,
such as sadc.

It creates a directory in:
Red Hat: /var/log/sa
Debian: /var/log/sysstat

It captures statistics on the boxes it's installed on
and saves them to a file saXX, where XX is the day of the month.

Once the saXX files are being collected, a cron job usually generates
a text report, sarXX, which you can then read by hand; it gives you
statistics for roughly every 15 minutes. You can also graph these
numbers. I have been working on a program to do this, but it is not
quite finished yet.
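To make the file naming concrete, here is a minimal Python sketch that builds the saXX (binary data) and sarXX (text report) paths for a given day. It assumes the default directories listed above, a zero-padded two-digit day suffix, and that the sarXX report sits alongside the saXX data file; `SA_DIRS` and `sysstat_files` are hypothetical names used only for illustration.

```python
from datetime import date
from pathlib import PurePosixPath

# Hypothetical mapping of distro name to the default sysstat directory
# mentioned above (Red Hat: /var/log/sa, Debian: /var/log/sysstat).
SA_DIRS = {
    "redhat": PurePosixPath("/var/log/sa"),
    "debian": PurePosixPath("/var/log/sysstat"),
}


def sysstat_files(distro: str, day: date) -> tuple[str, str]:
    """Return (saXX data file, sarXX report file) paths for the given day.

    Assumes XX is the day of the month, zero-padded to two digits
    (sa01 .. sa31), and that both files live in the same directory.
    """
    base = SA_DIRS[distro]
    suffix = f"{day.day:02d}"
    return str(base / f"sa{suffix}"), str(base / f"sar{suffix}")


if __name__ == "__main__":
    print(sysstat_files("debian", date(2004, 5, 20)))
```

For example, `sysstat_files("debian", date(2004, 5, 20))` gives `/var/log/sysstat/sa20` and `/var/log/sysstat/sar20`; the binary saXX file can then be read with `sar -f /var/log/sysstat/sa20`.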

You can change the sampling frequency (in the cron job) and generate
the sarXX reports manually. This helped me diagnose a couple of failing
computers. You have to know how to interpret the graphs, but if you
send me the saXX or sarXX files, I can decode them for you.

I plan to make the project open source, but I am not
quite there yet.

thanks,
 -sam

> Oh, sorry.
>
> The box at home is currently running 2.6.5, and I intend to upgrade
> to 2.6.6 as soon as I can find the time.  It uses an underlying ext3
> file system.  The same goes for the single client it serves.
>
> The one at the office, which serves up to a hundred or so clients,
> currently runs 2.6.4 with a patch for the dpt_i2o driver.  It has an
> ext3 partition, but the one being exported via NFS is reiserfs.  The
> clients are running 2.6.5 at the moment (anywhere between 0 and 320
> clients max at any time, usually between 20 and 50 depending on lab
> usage).  The other server is running 2.4.24 (waiting for the dpt_i2o
> driver in the 2.6 kernel) with ext3 file systems and, once again,
> reiserfs for the NFS-exported part of the file system.  It has a
> variety of clients, from 2.4.20 kernels right through to 2.6.6 kernels.
>
> The machine that died yesterday is also running a 2.6.5 kernel with an
> ext3 file system.  Its two clients are the first of the two servers
> above and another machine that runs kernel 2.6.6.
>
> What affects the regularity of the crashes seems to be the load placed
> on it by clients.  In my case at home the client is considerably faster
> than the server, which will enforce a relatively high load.  I wish I
> had more time to check this out.  I suspect some kind of race
> condition that gets triggered by either heavy system load or a large
> speed difference between the client and the server.  I might be
> totally wrong though ...
>
> Transfers in our case are always between Linux and Linux (at least as
> far as we can control it; we are not aware of any other clients and
> would probably manage to get such a person expelled should we find him).
>
> We've experienced the client lock-ups as well.  The client eventually
> times out after a *long* time; we usually bounce the server before that
> happens.  This can be explained and is, in my opinion, quite normal.
>
> What do sysstat and sar do?  How can I use them to analyse the problem?
>
> Jaco
>
> samg@seven4sky.com wrote:
>
>>Jaco,
>>
>>How are your boxes locking up?  I have NFS in use every day.
>>Does RPC die?
>>
>>What kernel are you using?
>>And are you transferring Linux to Linux, or to some other platform?
>>
>>The only time I had problems was when I disconnected the server and
>>it hung my client; the only solution (given the way I had connected)
>>was to reboot.  To make matters worse, I ran a script that used du
>>every day, so there were 12+ instances of du, all trying to run.
>>
>>I would suggest using a program like sysstat, or sar, to help you
>>analyse the issues at hand.
>>
>> -sam
>>
>>
>>
>>>Hello there
>>>
>>>I've once again got problems with the kernel locking up.  I'm now
>>>convinced that it has something to do with NFS.
>>>
>>>Previously we've had two machines that locked up, plus my one at home,
>>>making three machines.  Sometimes they would recover by themselves
>>>after some time; other times they could be left for 2 days or so without
>>>recovering.  All three of these use NFS to export files to other
>>>machines; it's the only thing we can find they have in common other
>>>than the x86 architecture, but then other machines would be dying as well.
>>>None of these run on the newest hardware, but that should not matter,
>>>since none of our other servers do either.  We have a third NFS server,
>>>which doesn't take nearly as heavy a load via NFS.  I've been wondering
>>>why it hasn't locked up too, and this morning (right now, in fact) it
>>>decided that it was its turn and is currently unusable.
>>>
>>>If anybody else is experiencing similar problems, or has possible
>>>workarounds, I would appreciate it if you could share your knowledge.
>>>
>>>Jaco
>>>
>>>===========================================
>>>This message and attachments are subject to a disclaimer. Please refer to
>>>www.it.up.ac.za/documentation/governance/disclaimer/ for full details.
>>>===========================================
>>>
>>>
>>>
>>>
>>
>>
>>
>
> --
> "The strength of the Constitution lies entirely in the determination of
> each citizen to defend it.  Only if every single citizen feels duty bound
> to do his share in this defense are the constitutional rights secure."
> -- Albert Einstein
>
>



Thread overview: 4+ messages
2004-05-18 10:15 Jaco Kroon
2004-05-19 16:09 ` samg
2004-05-20  8:10   ` Jaco Kroon
2004-05-20 14:46     ` Sam Gill [this message]
