LKML Archive on lore.kernel.org help / color / mirror / Atom feed
* Interactivity issue in 2.6.25-rc3 @ 2008-02-28 18:44 Carlos R. Mafra 2008-02-28 19:18 ` Ingo Molnar 0 siblings, 1 reply; 10+ messages in thread From: Carlos R. Mafra @ 2008-02-28 18:44 UTC (permalink / raw) To: mingo; +Cc: linux-kernel I want to report a bad interactivity which happened in my desktop running the latest kernel (2.6.25-rc3-00081-g7704a8b). I tried to play 'flightgear' and the desktop became _very_ slow while flightgear was "loading scenery objects" (a task which never finished and I could not play it). The desktop is a P4 @ 3.0 GHz, 512MB, with the nv graphics driver. I ran 'fgfs' firstly as a regular user (mafra) and tried to execute Ingo's cfs-debug script as root in a terminal (mrxvt) which had 3 open tabs. It took minutes to switch tabs and run cfs-debug-info.sh, which completed its job only after I managed to kill flightgear (after more than a couple minutes). The result is here: http://www.ift.unesp.br/users/crmafra/cfs-debug-info-2008.02.28-14.04.59 [ I want to emphasize that I started the script while the system was slow, waited a few minutes, killed flightgear, and then the script finished its job. ] As I have the group scheduler enabled, I tried to run flightgear as root to see what happens. I could not even start to run the debug script, because each letter I typed took many seconds to appear. The mouse pointer however was responsive, but Window Maker wasn't. I tried to switch desktops, and after a few minutes the pressing of "Alt+5" took effect and the desktop 5 appeared ok in the screen. Then I tried to go back to desktop 1 with "Alt+1" and after exactly 12 minutes I gave up and hit "Ctrl-Alt-Backspace". After 5 or 10 seconds X was killed and everything was normal again (no need to reboot). Is this experience normal? I know that trying to play a game without 3D is a crazy thing, but it didn't feel _fair_ to wait minutes to see the result of a key press. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Interactivity issue in 2.6.25-rc3 2008-02-28 18:44 Interactivity issue in 2.6.25-rc3 Carlos R. Mafra @ 2008-02-28 19:18 ` Ingo Molnar 2008-02-28 19:54 ` Ray Lee 0 siblings, 1 reply; 10+ messages in thread From: Ingo Molnar @ 2008-02-28 19:18 UTC (permalink / raw) To: linux-kernel, Carlos R. Mafra * Carlos R. Mafra <crmafra2@gmail.com> wrote: > I want to report a bad interactivity which happened in my desktop > running the latest kernel (2.6.25-rc3-00081-g7704a8b). > > I tried to play 'flightgear' and the desktop became _very_ slow while > flightgear was "loading scenery objects" (a task which never finished > and I could not play it). > > The desktop is a P4 @ 3.0 GHz, 512MB, with the nv graphics driver. yes, but this means you run the soft-3D driver under Xorg, right? That is known to starve its clients. The stats you sent show no worse than a few tens of msecs delays caused by CPU scheduling itself: mrxvt (3730, #threads: 1) se.wait_max : 18.679129 so the scheduler kept things within the ~40 msecs default latency target: .sysctl_sched_latency : 40.000000 Ingo ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Interactivity issue in 2.6.25-rc3 2008-02-28 19:18 ` Ingo Molnar @ 2008-02-28 19:54 ` Ray Lee [not found] ` <20080228210627.GA4337@localhost.ift.unesp.br> 0 siblings, 1 reply; 10+ messages in thread From: Ray Lee @ 2008-02-28 19:54 UTC (permalink / raw) To: Carlos R. Mafra; +Cc: linux-kernel, Ingo Molnar On Thu, Feb 28, 2008 at 11:18 AM, Ingo Molnar <mingo@elte.hu> wrote: > > * Carlos R. Mafra <crmafra2@gmail.com> wrote: > > > I want to report a bad interactivity which happened in my desktop > > running the latest kernel (2.6.25-rc3-00081-g7704a8b). > > > > I tried to play 'flightgear' and the desktop became _very_ slow while > > flightgear was "loading scenery objects" (a task which never finished > > and I could not play it). > > > > The desktop is a P4 @ 3.0 GHz, 512MB, with the nv graphics driver. > > yes, but this means you run the soft-3D driver under Xorg, right? That > is known to starve its clients. The stats you sent show no worse than a > few tens of msecs delays caused by CPU scheduling itself: > > mrxvt (3730, #threads: 1) > se.wait_max : 18.679129 > > so the scheduler kept things within the ~40 msecs default latency > target: > > .sysctl_sched_latency : 40.000000 [...] >> As I have the group scheduler enabled, I tried to run flightgear as >> root to see what happens Also, please try disabling the group scheduler and run the test again. The group scheduler has known bad interactivity issues. Also be on the watch for any abnormally high disk activity, to rule out starvation due to the kernel choosing poor candidates for swap-in/swap-out. (Running a vmstat 1 in a console ahead of time and watching for high IO Wait -- the last column printed -- will give you a good indicator other than the drive LED.) Ray ^ permalink raw reply [flat|nested] 10+ messages in thread
[parent not found: <20080228210627.GA4337@localhost.ift.unesp.br>]
* Re: Interactivity issue in 2.6.25-rc3 [not found] ` <20080228210627.GA4337@localhost.ift.unesp.br> @ 2008-02-28 21:28 ` Ray Lee 2008-02-29 15:57 ` Ingo Molnar 2008-02-29 16:04 ` Ingo Molnar 1 sibling, 1 reply; 10+ messages in thread From: Ray Lee @ 2008-02-28 21:28 UTC (permalink / raw) To: Carlos R. Mafra, kernel list; +Cc: mingo Please keep all CCs. Others that will read lkml later will want to know what you found out. On Thu, Feb 28, 2008 at 1:06 PM, Carlos R. Mafra <crmafra2@gmail.com> wrote: > On Thu 28.Feb'08 at 11:54:04 -0800, Ray Lee wrote: > > On Thu, Feb 28, 2008 at 11:18 AM, Ingo Molnar <mingo@elte.hu> wrote: > > > > > > * Carlos R. Mafra <crmafra2@gmail.com> wrote: > > > > > > > I want to report a bad interactivity which happened in my desktop > > > > running the latest kernel (2.6.25-rc3-00081-g7704a8b). > > > > > > > > I tried to play 'flightgear' and the desktop became _very_ slow while > > > > flightgear was "loading scenery objects" (a task which never finished > > > > and I could not play it). > > > > > > > > The desktop is a P4 @ 3.0 GHz, 512MB, with the nv graphics driver. > > > > > > yes, but this means you run the soft-3D driver under Xorg, right? That > > > is known to starve its clients. The stats you sent show no worse than a > > > few tens of msecs delays caused by CPU scheduling itself: > > > > > > mrxvt (3730, #threads: 1) > > > se.wait_max : 18.679129 > > > > > Thanks Ingo! > > Hm, but I remember that my desktop became unusable. I was experiencing > latencies _much_ higher than msecs. But I think I have an explanation > for this low number, if you excuse my attempt to understand this. > > Could it be that your debug script generated these numbers while > fgfs was being killed, and then it felt no more the bad latencies > fgfs was causing? > > This was the first scenario: > > 1) Start 'fgfs' as normal user. > > 2) Wait a few seconds until fgfs' message "loading scenery objects" > appeared. > > 3) Everything becomes _very_ slow (measured in seconds, not msecs), > so I notice something is wrong (at least I thought it was). > > 4) Killed fgfs with Ctrl-c. > > 5) I go to Ingo's page and download his debug scripts. > > 6) Preparing for the battle to follow, I run cfs-debug-info-clear.sh > > 7) Start 'fgfs' and system becomes slow while loading scenery objects. > > 8) I try to reach the mrxvt terminal to run cfs-debug-info.sh. Each > letter I type takes seconds to appear. > > 9) I manage to start cfs-debug-info.sh. I could read the first line after > a few seconds: > > sched info dump (of tasks, modules, hw, dmesg, config, fs): > > But I am sure I did not see the > "gathering statistics for 15 seconds ..." > > As that was the first time ever that I used this script, I didn't > even know what I was supposed to do, but I was waiting for more > than a minute and nothing happened. > > 10) I managed to change tab and kill fgfs with Crtl-c. > > 11) I got back to the debug script tab and waited a few seconds > for it to finish. > > 12) That is the debug log which I put in the page mentioned in the > first email. > > I am sorry to especulate about it, but maybe the script got the > latencies after (or meanwhile) I was killing fgfs. > > > > Also, please try disabling the group scheduler and run the test again. > > The group scheduler has known bad interactivity issues. Also be on the > > watch for any abnormally high disk activity, to rule out starvation > > due to the kernel choosing poor candidates for swap-in/swap-out. > > (Running a vmstat 1 in a console ahead of time and watching for high > > IO Wait -- the last column printed -- will give you a good indicator > > other than the drive LED.) > > I have rebooted and tried to repeat the experiment, but I couldn't > reproduce the bad interactivity I reported earlier. Flightgear > loaded the scenery (which it did not do before) and the airplane > appeared in the screen. The game was slow, but I had a very good > usability of the desktop and I could type things almost normally, > I could switch desktops etc. Definitely not what happened before! > > So I am sorry for the noise, but something bad happened before > and unfortunately I am not sure I could run Ingo's debug script > correctly. I'll be more patient if that happens again. > > Anyway, I want to thank Ingo and Ray for their replies and > would like to humbly ask: > > Is it an scheduler anomaly if 'se.wait_max' is bigger than > 40 msecs for _any_ of the processes which appear in the > debug script log? In other words, is the scheduler > mathematically build to not allow latencies higher than > 40 msecs? Ingo will have to answer the scheduler part of that. But it's good to keep in mind that the scheduler can't do anything about slowdowns due to tasks being swapped out or waiting on reads from the disk. I mention this as shortly after CFS got into mainline, something changed in the VM that seems to make my system spend a lot more time in IO wait, causing the system to be much less responsive than it used to be. Unfortunately it seems to be dependent upon the history of what tasks I've run and their memory usage, so it's been hard to come up with a reproducible test case (well that and a complete lack of time). All I can say is that I've seen what you've reported as well, though it had nothing to do with using any 3d applications, just a browser, editor, gcc, etc. Ray ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Interactivity issue in 2.6.25-rc3 2008-02-28 21:28 ` Ray Lee @ 2008-02-29 15:57 ` Ingo Molnar 2008-02-29 18:33 ` Ray Lee 0 siblings, 1 reply; 10+ messages in thread From: Ingo Molnar @ 2008-02-29 15:57 UTC (permalink / raw) To: Ray Lee; +Cc: Carlos R. Mafra, kernel list, Arjan van de Ven * Ray Lee <ray-lk@madrabbit.org> wrote: > But it's good to keep in mind that the scheduler can't do anything > about slowdowns due to tasks being swapped out or waiting on reads > from the disk. I mention this as shortly after CFS got into mainline, > something changed in the VM that seems to make my system spend a lot > more time in IO wait, causing the system to be much less responsive > than it used to be. > > Unfortunately it seems to be dependent upon the history of what tasks > I've run and their memory usage, so it's been hard to come up with a > reproducible test case (well that and a complete lack of time). All I > can say is that I've seen what you've reported as well, though it had > nothing to do with using any 3d applications, just a browser, editor, > gcc, etc. what does latencytop say about those delays, what causes them? It should be able to capture all sorts of latency sources in the system. Ingo ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Interactivity issue in 2.6.25-rc3 2008-02-29 15:57 ` Ingo Molnar @ 2008-02-29 18:33 ` Ray Lee 0 siblings, 0 replies; 10+ messages in thread From: Ray Lee @ 2008-02-29 18:33 UTC (permalink / raw) To: Ingo Molnar; +Cc: Carlos R. Mafra, kernel list, Arjan van de Ven On Fri, Feb 29, 2008 at 7:57 AM, Ingo Molnar <mingo@elte.hu> wrote: > > * Ray Lee <ray-lk@madrabbit.org> wrote: > > > But it's good to keep in mind that the scheduler can't do anything > > about slowdowns due to tasks being swapped out or waiting on reads > > from the disk. I mention this as shortly after CFS got into mainline, > > something changed in the VM that seems to make my system spend a lot > > more time in IO wait, causing the system to be much less responsive > > than it used to be. > > > > Unfortunately it seems to be dependent upon the history of what tasks > > I've run and their memory usage, so it's been hard to come up with a > > reproducible test case (well that and a complete lack of time). All I > > can say is that I've seen what you've reported as well, though it had > > nothing to do with using any 3d applications, just a browser, editor, > > gcc, etc. > > what does latencytop say about those delays, what causes them? It should > be able to capture all sorts of latency sources in the system. Ah, thanks for the reminder. The last I tried it, it didn't show anything other than scheduler: waiting on cpu, but using a git clone version captured this almost immediately: EXT3: Waiting for journal access 2261.1 msec 15.0 % Writing a page to disk 112.4 msec 0.7 % Scheduler: waiting for cpu 57.9 msec 17.7 % default_wake_function __queue_work call_usermodehe 5.9 msec 0.1 % Waiting for event (poll) 5.0 msec 37.5 % Waiting for data on unix socket 5.0 msec 2.1 % Userspace lock contention 4.9 msec 23.5 % Waiting for TTY data 4.9 msec 0.3 % Sending TCP/IP data 4.9 msec 0.0 % I'm running postgresql in the background on this system and it does regular sync's, so perhaps that's related? It doesn't look like latencytop saves the trace data anywhere other than my eyes? I'll hack in a log and see what else it gets. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Interactivity issue in 2.6.25-rc3 [not found] ` <20080228210627.GA4337@localhost.ift.unesp.br> 2008-02-28 21:28 ` Ray Lee @ 2008-02-29 16:04 ` Ingo Molnar 2008-02-29 17:14 ` Carlos R. Mafra 1 sibling, 1 reply; 10+ messages in thread From: Ingo Molnar @ 2008-02-29 16:04 UTC (permalink / raw) To: Carlos R. Mafra; +Cc: ray-lk, linux-kernel (on-list) * Carlos R. Mafra <crmafra2@gmail.com> wrote: > Is it an scheduler anomaly if 'se.wait_max' is bigger than 40 msecs > for _any_ of the processes which appear in the debug script log? In > other words, is the scheduler mathematically build to not allow > latencies higher than 40 msecs? it is definitely an anomaly if it's bigger than 40 msecs if you clear all stats via cfs-debug-info-clear.sh and the large latencies appear after that. You can force it to go above 40 msecs if you run more than say 40 CPU hogs in parallel, so it's not "mathematical", but you should never see large latencies under normal workloads - and that includes almost everything but "insanely high" workloads. and obviously, even if you only 'feel' long delays that's too an anomaly by definition, no matter what the scripts say about it. It might even be a scheduler anomaly as well: for example if the scheduler clock has an anomaly - on which the delay statistics are based too. generally, latencytop gives a pretty good idea about where app delays come from. (As a second-level mechanism, in sched-devel.git you can try the latency tracer.) Ingo ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Interactivity issue in 2.6.25-rc3 2008-02-29 16:04 ` Ingo Molnar @ 2008-02-29 17:14 ` Carlos R. Mafra 2008-02-29 18:36 ` Ingo Molnar 0 siblings, 1 reply; 10+ messages in thread From: Carlos R. Mafra @ 2008-02-29 17:14 UTC (permalink / raw) To: Ingo Molnar; +Cc: ray-lk, linux-kernel On Fri 29.Feb'08 at 17:04:08 +0100, Ingo Molnar wrote: > > (on-list) > > * Carlos R. Mafra <crmafra2@gmail.com> wrote: > > > Is it an scheduler anomaly if 'se.wait_max' is bigger than 40 msecs > > for _any_ of the processes which appear in the debug script log? In > > other words, is the scheduler mathematically build to not allow > > latencies higher than 40 msecs? > > it is definitely an anomaly if it's bigger than 40 msecs if you clear > all stats via cfs-debug-info-clear.sh and the large latencies appear > after that. You can force it to go above 40 msecs if you run more than > say 40 CPU hogs in parallel, so it's not "mathematical", but you should > never see large latencies under normal workloads - and that includes > almost everything but "insanely high" workloads. Thank you for the explanation! > and obviously, even if you only 'feel' long delays that's too an anomaly > by definition, no matter what the scripts say about it. It might even be > a scheduler anomaly as well: for example if the scheduler clock has an > anomaly - on which the delay statistics are based too. But if the scripts say all 'se.wait_max' are < 40 msecs than it is not CFS' fault, right? Even if it takes 3 seconds for a typed letter to appear in the terminal? > generally, latencytop gives a pretty good idea about where app delays > come from. (As a second-level mechanism, in sched-devel.git you can try > the latency tracer.) Yeah, I must try latencytop to check for more things before sending an email reporting possible problems. Thanks again! ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Interactivity issue in 2.6.25-rc3 2008-02-29 17:14 ` Carlos R. Mafra @ 2008-02-29 18:36 ` Ingo Molnar 0 siblings, 0 replies; 10+ messages in thread From: Ingo Molnar @ 2008-02-29 18:36 UTC (permalink / raw) To: ray-lk, linux-kernel * Carlos R. Mafra <crmafra2@gmail.com> wrote: > > and obviously, even if you only 'feel' long delays that's too an > > anomaly by definition, no matter what the scripts say about it. It > > might even be a scheduler anomaly as well: for example if the > > scheduler clock has an anomaly - on which the delay statistics are > > based too. > > But if the scripts say all 'se.wait_max' are < 40 msecs than it is not > CFS' fault, right? Even if it takes 3 seconds for a typed letter to > appear in the terminal? yeah, in 99.9% of the cases it's not the scheduler's fault. Ingo ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Interactivity issue in 2.6.25-rc3 @ 2008-02-28 21:21 Carlos R. Mafra 0 siblings, 0 replies; 10+ messages in thread From: Carlos R. Mafra @ 2008-02-28 21:21 UTC (permalink / raw) To: linux-kernel; +Cc: ray-lk, mingo Resending with lkml Cc:ed. On Thu 28.Feb'08 at 11:54:04 -0800, Ray Lee wrote: > On Thu, Feb 28, 2008 at 11:18 AM, Ingo Molnar <mingo@elte.hu> wrote: > > > > * Carlos R. Mafra <crmafra2@gmail.com> wrote: > > > > > I want to report a bad interactivity which happened in my desktop > > > running the latest kernel (2.6.25-rc3-00081-g7704a8b). > > > > > > I tried to play 'flightgear' and the desktop became _very_ slow while > > > flightgear was "loading scenery objects" (a task which never finished > > > and I could not play it). > > > > > > The desktop is a P4 @ 3.0 GHz, 512MB, with the nv graphics driver. > > > > yes, but this means you run the soft-3D driver under Xorg, right? That > > is known to starve its clients. The stats you sent show no worse than a > > few tens of msecs delays caused by CPU scheduling itself: > > > > mrxvt (3730, #threads: 1) > > se.wait_max : 18.679129 > > Thanks Ingo! Hm, but I remember that my desktop became unusable. I was experiencing latencies _much_ higher than msecs. But I think I have an explanation for this low number, if you excuse my attempt to understand this. Could it be that your debug script generated these numbers while fgfs was being killed, and then it felt no more the bad latencies fgfs was causing? This was the first scenario: 1) Start 'fgfs' as normal user. 2) Wait a few seconds until fgfs' message "loading scenery objects" appeared. 3) Everything becomes _very_ slow (measured in seconds, not msecs), so I notice something is wrong (at least I thought it was). 4) Killed fgfs with Ctrl-c. 5) I go to Ingo's page and download his debug scripts. 6) Preparing for the battle to follow, I run cfs-debug-info-clear.sh 7) Start 'fgfs' and system becomes slow while loading scenery objects. 8) I try to reach the mrxvt terminal to run cfs-debug-info.sh. Each letter I type takes seconds to appear. 9) I manage to start cfs-debug-info.sh. I could read the first line after a few seconds: sched info dump (of tasks, modules, hw, dmesg, config, fs): But I am sure I did not see the "gathering statistics for 15 seconds ..." As that was the first time ever that I used this script, I didn't even know what I was supposed to do, but I was waiting for more than a minute and nothing happened. 10) I managed to change tab and kill fgfs with Crtl-c. 11) I got back to the debug script tab and waited a few seconds for it to finish. 12) That is the debug log which I put in the page mentioned in the first email. I am sorry to especulate about it, but maybe the script got the latencies after (or meanwhile) I was killing fgfs. > Also, please try disabling the group scheduler and run the test again. > The group scheduler has known bad interactivity issues. Also be on the > watch for any abnormally high disk activity, to rule out starvation > due to the kernel choosing poor candidates for swap-in/swap-out. > (Running a vmstat 1 in a console ahead of time and watching for high > IO Wait -- the last column printed -- will give you a good indicator > other than the drive LED.) I have rebooted and tried to repeat the experiment, but I couldn't reproduce the bad interactivity I reported earlier. Flightgear loaded the scenery (which it did not do before) and the airplane appeared in the screen. The game was slow, but I had a very good usability of the desktop and I could type things almost normally, I could switch desktops etc. Definitely not what happened before! So I am sorry for the noise, but something bad happened before and unfortunately I am not sure I could run Ingo's debug script correctly. I'll be more patient if that happens again. Anyway, I want to thank Ingo and Ray for their replies and would like to humbly ask: Is it an scheduler anomaly if 'se.wait_max' is bigger than 40 msecs for _any_ of the processes which appear in the debug script log? In other words, is the scheduler mathematically build to not allow latencies higher than 40 msecs? ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2008-02-29 18:37 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2008-02-28 18:44 Interactivity issue in 2.6.25-rc3 Carlos R. Mafra 2008-02-28 19:18 ` Ingo Molnar 2008-02-28 19:54 ` Ray Lee [not found] ` <20080228210627.GA4337@localhost.ift.unesp.br> 2008-02-28 21:28 ` Ray Lee 2008-02-29 15:57 ` Ingo Molnar 2008-02-29 18:33 ` Ray Lee 2008-02-29 16:04 ` Ingo Molnar 2008-02-29 17:14 ` Carlos R. Mafra 2008-02-29 18:36 ` Ingo Molnar 2008-02-28 21:21 Carlos R. Mafra
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).