LKML Archive on lore.kernel.org
* NFS deadlock
@ 2004-05-18 10:15 Jaco Kroon
  2004-05-19 16:09 ` samg
  0 siblings, 1 reply; 4+ messages in thread
From: Jaco Kroon @ 2004-05-18 10:15 UTC
To: linux-kernel; +Cc: acooks

Hello there,

I've once again got problems with the kernel locking up. I'm now
convinced that it has something to do with NFS.

Previously we've had 2 machines that locked up, plus my one at home,
resulting in three machines. Sometimes they would recover by themselves
after some time; other times they could be left for 2 days or so without
recovering. All three of these use NFS to export files to other
machines. It's the only thing we can find that they have in common, other
than the x86 architecture, but then other machines would be dying as well.
It should be noted that none of these runs on the newest hardware, but
that should not matter, and neither do any of our other servers. We have
a 3rd NFS server, which doesn't take nearly as heavy a load via NFS. I've
been wondering why it hadn't locked up as well, and this morning (right
now, in fact) it has decided that it is its turn and is currently
unusable.

If anybody else is experiencing similar problems, or has possible
workarounds, it would be appreciated if you could share your knowledge.

Jaco

===========================================
This message and attachments are subject to a disclaimer. Please refer to
www.it.up.ac.za/documentation/governance/disclaimer/ for full details.
===========================================
* Re: NFS deadlock
  2004-05-18 10:15 NFS deadlock Jaco Kroon
@ 2004-05-19 16:09 ` samg
  2004-05-20  8:10   ` Jaco Kroon
  0 siblings, 1 reply; 4+ messages in thread
From: samg @ 2004-05-19 16:09 UTC
To: Jaco Kroon; +Cc: linux-kernel, acooks

Jaco,

How are your boxes locking up? I have NFS in use every day.
Does rpc die?

What kernel are you using?
And are you transferring Linux to Linux, or to some other platform?

The only time I had problems was when my client locked up
because I disconnected the server, and it hung the client;
the only solution (based on the way I connected) was to reboot.
To make matters worse, I ran a script that used du every day, and
so there were 12+ instances of du, all trying to run about.

I would suggest using a program like sysstat, or sar, to help you
analyse the issues at hand.

 -sam
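The pile-up of du processes described above is a common side effect of a scheduled job that touches a hung mount. The following is a minimal Python sketch, not something from the thread, of one way to stop a daily job from stacking up: it takes a non-blocking exclusive lock before running the command and skips the run if an earlier instance still holds it. The function names and the lock file path in the usage note are hypothetical. A lock does not unstick a process that is already blocked on NFS; it only keeps new instances from joining it.

```python
import fcntl
import subprocess


def try_lock(path):
    """Return an open file holding an exclusive lock, or None if it is already held."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the holder.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def run_exclusive(command, lock_path):
    """Run `command` unless a previous run still holds the lock.

    Returns the command's exit status, or None if the run was skipped.
    """
    lock = try_lock(lock_path)
    if lock is None:
        return None
    try:
        return subprocess.call(command)
    finally:
        # Closing the file releases the lock.
        lock.close()
```

A cron-driven script could then call something like `run_exclusive(["du", "-s", "/export"], "/tmp/daily-du.lock")`, where both the directory and the lock path are placeholders.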
* Re: NFS deadlock
  2004-05-19 16:09 ` samg
@ 2004-05-20  8:10   ` Jaco Kroon
  2004-05-20 14:46     ` Sam Gill
  0 siblings, 1 reply; 4+ messages in thread
From: Jaco Kroon @ 2004-05-20 8:10 UTC
To: samg; +Cc: linux-kernel

Oh, sorry.

The one box at home is running 2.6.5 currently, with the intent of
upgrading to 2.6.6 as soon as I can find the time. It uses an underlying
ext3 file system. The same goes for the single client it serves.

The one at the office that serves up to a hundred or so clients
currently runs 2.6.4 with a patch for the dpt_i2o driver; it has an ext3
partition, but the one being served up via NFS is reiserfs. The clients
are running 2.6.5 at the moment (anything between 0 and 320 clients max
at any time, usually between 20 and 50 clients depending on lab usage).
The other server is using 2.4.24 (waiting for the dpt_i2o driver in the
2.6 kernel) with ext3 file systems and once again reiserfs for the
NFS-exported part of the file system. There is a variety of clients in
this case, from 2.4.20 kernels right through to 2.6.6 kernels.

The machine that died yesterday is also running a 2.6.5 kernel, with an
ext3 file system. Of its two clients, one is the first of the two servers
above and the other runs kernel 2.6.6 as well.

What affects the regularity of the crashes seems to be the load placed
on the server by clients. In my case at home the client is considerably
faster than the server, which puts a relatively high load on it. I wish I
had more time to check this out. I'm suspecting some kind of race
condition that gets triggered by either heavy system load or a heavy
skew between speeds on the client/server. I might be totally wrong
though ...

Transfers in our case are always between Linux and Linux (at least as far
as we can control it; we are not aware of any other clients and would
probably manage to get such a person expelled should we find him).

We've experienced the client lock-ups as well. The client eventually
times out after a *long* time; we usually bounce the server before that
happens. This can be explained and is in my opinion quite normal.

What do sysstat and sar do? How can I use them to analyse the problem?

Jaco

--
"The strength of the Constitution lies entirely in the determination of each
citizen to defend it. Only if every single citizen feels duty bound to do
his share in this defense are the constitutional rights secure."
-- Albert Einstein
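For the client-side hangs mentioned above, a probe with a deadline can tell a responsive mount from a stuck one without blocking the caller. This is a rough Python sketch under stated assumptions, not a method proposed in the thread: the function name is hypothetical, the stat call is run in a child process so that only the child blocks, and the parent polls instead of waiting, because a process stuck in uninterruptible I/O may not exit even after being killed.

```python
import subprocess
import sys
import time


def stat_returns_within(path, timeout=5.0):
    """Return True if a stat of `path` in a child process finishes within `timeout` seconds.

    True means the filesystem answered, even if the answer was an error
    such as "no such file"; False means the call was still blocked.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if child.poll() is not None:
            return True
        time.sleep(0.05)
    # Best effort only: do not wait() here, the child may never exit.
    child.kill()
    return False
```

Calling it on each NFS mount point from a monitoring script would give a simple record of when a mount stopped answering, which could then be lined up against server load.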
* Re: NFS deadlock
  2004-05-20  8:10   ` Jaco Kroon
@ 2004-05-20 14:46     ` Sam Gill
  0 siblings, 0 replies; 4+ messages in thread
From: Sam Gill @ 2004-05-20 14:46 UTC
To: Jaco Kroon; +Cc: samg, linux-kernel

Jaco,

sysstat is a package on Debian that includes the sar utilities, such as
sadc. It creates a directory:

  Red Hat: /var/log/sa
  Debian:  /var/log/sysstat

It captures statistics on the boxes it's installed on and saves them to a
file saXX, where XX is the day. Once you have the saXX files, a cron job
usually generates a report file sarXX, which you can then read manually,
and it will give you statistics about every 15 minutes.

You can also graph these numbers; I have been working on a program to do
this, but it is not quite finished yet.

You can change the frequency (cron), and manually generate the sarXX
graphs.

It helped me diagnose a couple of failing computers. You have to know
how to interpret the graphs, but if you send me the saXX or sarXX files
I can decode them for you. I am planning on taking the project
open-source, but I am not quite there yet.

thanks,
 -sam
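The file layout described in this message can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in the message (a per-distribution data directory and one saXX file per day of the month); the function names and the dictionary keys are hypothetical. The sketch builds the `sar -f <file>` command line, which asks sar to read records from a saved data file, but does not run it.

```python
import datetime

# Data directories as given in the message above.
SA_DIRS = {
    "redhat": "/var/log/sa",
    "debian": "/var/log/sysstat",
}


def sa_file(distro, day):
    """Return the path of the daily data file saXX, where XX is the day of the month."""
    return f"{SA_DIRS[distro]}/sa{day.day:02d}"


def sar_command(distro, day):
    """Return an argument list that asks sar to read the given day's data file."""
    return ["sar", "-f", sa_file(distro, day)]


if __name__ == "__main__":
    # Example: the day the last message in this thread was sent.
    print(" ".join(sar_command("debian", datetime.date(2004, 5, 20))))
```

Passing the resulting list to `subprocess.run` on a machine where sysstat has been collecting data would print that day's report; adding further sar options to the list selects other statistics.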
end of thread, other threads:[~2004-05-21 22:53 UTC | newest]

Thread overview: 4+ messages
2004-05-18 10:15 NFS deadlock Jaco Kroon
2004-05-19 16:09 ` samg
2004-05-20  8:10   ` Jaco Kroon
2004-05-20 14:46     ` Sam Gill