LKML Archive on lore.kernel.org
* [BUG] scheduler: strange behavior with massive interactive processes
@ 2007-03-27 1:34 Satoru Takeuchi
2007-03-27 5:04 ` Mike Galbraith
2007-03-27 19:14 ` Ingo Molnar
0 siblings, 2 replies; 10+ messages in thread
From: Satoru Takeuchi @ 2007-03-27 1:34 UTC (permalink / raw)
To: Linux Kernel; +Cc: Ingo Molnar
Hi Ingo and all,
When I was running a large number of interactive processes, I found that some of them
occupy CPU time while the others hardly run.
It seems that the processes which occupy CPU time always have the maximum effective
priority (default + 5) while the others have max - 1. What happens here is:
1. If there is a moderate number of max-interactive processes, they can be
re-inserted into the active queue again and again without their priority
dropping.
2. In this case, the others seldom run, and can't get the maximum effective priority
at the next timeslice expiry, because the scheduler considers them to have slept too long.
3. Goto 1, OOPS!
Unfortunately I haven't been able to make a patch resolving this problem
yet. Any ideas?
I also attach a test program which easily reproduces this problem.
Test program flow:
1. The first process starts the child processes and waits for 5 minutes.
2. Each child process executes a "work 8 msec and sleep 1 msec" loop
continuously.
3. After 3 minutes have passed, each child process prints the number of loops
it executed.
What is expected:
Each child process executes a nearly equal number of loops.
Test environment:
- kernel: 2.6.20(*1)
- # of CPUs: 1 or 2
- # of child processes: 200 or 400
- nice value: 0 or 20(*2)
*1) I confirmed that 2.6.21-rc5 has no change regarding this problem.
*2) If a process has nice 20, the scheduler never regards it as interactive.
Test results:
-----------+----------------+------+------------------------------------
# of CPUs | # of processes | nice | result
-----------+----------------+------+------------------------------------
| | 20 | looks good
1(i386) | +------+------------------------------------
| | 0 | 4 processes occupy 98% of CPU time
-----------+ 200 +------+------------------------------------
| | 20 | looks good
| +------+------------------------------------
| | 0 | 8 processes occupy 72% of CPU time
2(ia64) +----------------+------+------------------------------------
| 400 | 20 | looks good
| +------+------------------------------------
| | 0 | 8 processes occupy 98% of CPU time
-----------+----------------+------+------------------------------------
FYI, 2.6.21-rc3-mm1 (with the RSDL scheduler enabled) works fine in all cases :-)
Thanks,
Satoru
-------------------------------------------------------------------------------
/*
* massive_intr - run @nproc interactive processes and print the number of
* loops(*1) each process executes in @runtime secs.
*
* *1) "work 8 msec and sleep 1msec" loop
*
* Usage: massive_intr <nproc> <runtime>
*
* @nproc: number of processes
* @runtime: execute time[sec]
*
* ex) If you want to run 300 processes for 5 mins, issue the
* command as follows:
*
* $ massive_intr 300 300
*
* How to build:
*
* cc -o massive_intr massive_intr.c -lrt
*
*
* Copyright (C) 2007 Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
*
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or (at
* your option) any later version.
*
* This program is distributed in the hope that it will be useful, but
* WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public License along
* with this program; if not, write to the Free Software Foundation, Inc.,
* 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
*
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/
#include <sys/time.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <unistd.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
#include <errno.h>
#include <err.h>
#include <signal.h>	/* needed for kill(), sigaction(), SIGUSR1/SIGUSR2 */
#define WORK_MSECS 8
#define SLEEP_MSECS 1
#define MAX_PROC 1024
#define SAMPLE_COUNT 1000000000
#define USECS_PER_SEC 1000000
#define USECS_PER_MSEC 1000
#define NSECS_PER_MSEC 1000000
#define SHMEMSIZE 4096
static const char *shmname = "/sched_interactive_shmem";
static void *shmem;
static sem_t *printsem;
static int nproc;
static int runtime;
static int fd;
static time_t *first;
static pid_t pid[MAX_PROC];
static volatile sig_atomic_t return_code;	/* written from a signal handler */
static void cleanup_resources(void)
{
if (sem_destroy(printsem) < 0)
warn("sem_destroy() failed");
if (munmap(shmem, SHMEMSIZE) < 0)
warn("munmap() failed");
if (close(fd) < 0)
warn("close() failed");
}
static void abnormal_exit(void)
{
if (kill(getppid(), SIGUSR2) < 0)
err(EXIT_FAILURE, "kill() failed");
exit(EXIT_FAILURE);
}
static void sighandler(int signo)
{
}
static void sighandler2(int signo)
{
return_code = EXIT_FAILURE;
}
static void loopfnc(int nloop)
{
/* volatile so the compiler can't optimize the busy loop away */
volatile int i;
for (i = 0; i < nloop; i++)
;
}
static int loop_per_msec(void)
{
struct timeval tv[2];
long long before, after;
if (gettimeofday(&tv[0], NULL) < 0)
return -1;
loopfnc(SAMPLE_COUNT);
if (gettimeofday(&tv[1], NULL) < 0)
return -1;
/* use 64-bit arithmetic; tv_sec * USECS_PER_SEC overflows an int */
before = (long long)tv[0].tv_sec*USECS_PER_SEC+tv[0].tv_usec;
after = (long long)tv[1].tv_sec*USECS_PER_SEC+tv[1].tv_usec;
return SAMPLE_COUNT/(after - before)*USECS_PER_MSEC;
}
}
static void *test_job(void *arg)
{
int l = (int)arg;
int count = 0;
time_t current;
sigset_t sigset;
struct sigaction sa;
struct timespec ts = { 0, NSECS_PER_MSEC*SLEEP_MSECS};
sa.sa_handler = sighandler;
if (sigemptyset(&sa.sa_mask) < 0) {
warn("sigemptyset() failed");
abnormal_exit();
}
sa.sa_flags = 0;
if (sigaction(SIGUSR1, &sa, NULL) < 0) {
warn("sigaction() failed");
abnormal_exit();
}
if (sigemptyset(&sigset) < 0) {
warn("sigemptyset() failed");
abnormal_exit();
}
sigsuspend(&sigset);
if (errno != EINTR) {
warn("sigsuspend() failed");
abnormal_exit();
}
/* main loop */
do {
loopfnc(WORK_MSECS*l);
if (nanosleep(&ts, NULL) < 0) {
warn("nanosleep() failed");
abnormal_exit();
}
count++;
if (time(&current) == -1) {
warn("time() failed");
abnormal_exit();
}
} while (difftime(current, *first) < runtime);
if (sem_wait(printsem) < 0) {
warn("sem_wait() failed");
abnormal_exit();
}
printf("%06d\t%08d\n", getpid(), count);
if (sem_post(printsem) < 0) {
warn("sem_post() failed");
abnormal_exit();
}
exit(EXIT_SUCCESS);
}
static void usage(void)
{
fprintf(stderr,
"Usage : massive_intr <nproc> <runtime>\n"
"\t\tnproc : number of processes\n"
"\t\truntime : execute time[sec]\n");
exit(EXIT_FAILURE);
}
int main(int argc, char **argv)
{
int i, j;
int status;
sigset_t sigset;
struct sigaction sa;
int c;
if (argc != 3)
usage();
errno = 0;
nproc = strtol(argv[1], NULL, 10);
if (errno || nproc < 1 || nproc > MAX_PROC)
err(EXIT_FAILURE, "invalid nproc");
errno = 0;
runtime = strtol(argv[2], NULL, 10);
if (errno || runtime <= 0)
err(EXIT_FAILURE, "invalid runtime");
sa.sa_handler = sighandler2;
if (sigemptyset(&sa.sa_mask) < 0)
err(EXIT_FAILURE, "sigemptyset() failed");
sa.sa_flags = 0;
if (sigaction(SIGUSR2, &sa, NULL) < 0)
err(EXIT_FAILURE, "sigaction() failed");
if (sigemptyset(&sigset) < 0)
err(EXIT_FAILURE, "sigemptyset() failed");
if (sigaddset(&sigset, SIGUSR1) < 0)
err(EXIT_FAILURE, "sigaddset() failed");
if (sigaddset(&sigset, SIGUSR2) < 0)
err(EXIT_FAILURE, "sigaddset() failed");
if (sigprocmask(SIG_BLOCK, &sigset, NULL) < 0)
err(EXIT_FAILURE, "sigprocmask() failed");
/* setup shared memory */
if ((fd = shm_open(shmname, O_CREAT | O_RDWR, 0644)) < 0)
err(EXIT_FAILURE, "shm_open() failed");
if (shm_unlink(shmname) < 0) {
warn("shm_unlink() failed");
goto err_close;
}
if (ftruncate(fd, SHMEMSIZE) < 0) {
warn("ftruncate() failed");
goto err_close;
}
shmem = mmap(NULL, SHMEMSIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if (shmem == MAP_FAILED) {
warn("mmap() failed");
goto err_close;
}
printsem = shmem;
first = shmem + sizeof(*printsem);
/* initialize semaphore */
if ((sem_init(printsem, 1, 1)) < 0) {
warn("sem_init() failed");
goto err_unmap;
}
if ((c = loop_per_msec()) < 0) {
fprintf(stderr, "loop_per_msec() failed\n");
goto err_sem;
}
for (i = 0; i < nproc; i++) {
pid[i] = fork();
if (pid[i] == -1) {
warn("fork() failed\n");
for (j = 0; j < i; j++)
if (kill(pid[j], SIGKILL) < 0)
warn("kill() failed");
goto err_sem;
}
if (pid[i] == 0)
test_job((void *)c);
}
if (sigemptyset(&sigset) < 0) {
warn("sigemptyset() failed");
goto err_proc;
}
if (sigaddset(&sigset, SIGUSR2) < 0) {
warn("sigaddset() failed");
goto err_proc;
}
if (sigprocmask(SIG_UNBLOCK, &sigset, NULL) < 0) {
warn("sigprocmask() failed");
goto err_proc;
}
if (time(first) < 0) {
warn("time() failed");
goto err_proc;
}
if ((kill(0, SIGUSR1)) == -1) {
warn("kill() failed");
goto err_proc;
}
for (i = 0; i < nproc; i++) {
if (wait(&status) < 0) {
warn("wait() failed");
goto err_proc;
}
}
cleanup_resources();
exit(return_code);
err_proc:
for (i = 0; i < nproc; i++)
if (kill(pid[i], SIGKILL) < 0)
if (errno != ESRCH)
warn("kill() failed");
err_sem:
if (sem_destroy(printsem) < 0)
warn("sem_destroy() failed");
err_unmap:
if (munmap(shmem, SHMEMSIZE) < 0)
warn("munmap() failed");
err_close:
if (close(fd) < 0)
warn("close() failed");
exit(EXIT_FAILURE);
}
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [BUG] scheduler: strange behavior with massive interactive processes
2007-03-27 1:34 [BUG] scheduler: strange behavior with massive interactive processes Satoru Takeuchi
@ 2007-03-27 5:04 ` Mike Galbraith
2007-03-28 8:38 ` Mike Galbraith
2007-03-27 19:14 ` Ingo Molnar
1 sibling, 1 reply; 10+ messages in thread
From: Mike Galbraith @ 2007-03-27 5:04 UTC (permalink / raw)
To: Satoru Takeuchi; +Cc: Linux Kernel, Ingo Molnar
On Tue, 2007-03-27 at 10:34 +0900, Satoru Takeuchi wrote:
> Hi Ingo and all,
Hi,
> When I was running a large number of interactive processes, I found that some of them
> occupy CPU time while the others hardly run.
>
> It seems that the processes which occupy CPU time always have the maximum effective
> priority (default + 5) while the others have max - 1. What happens here is:
>
> 1. If there is a moderate number of max-interactive processes, they can be
> re-inserted into the active queue again and again without their priority
> dropping.
> 2. In this case, the others seldom run, and can't get the maximum effective priority
> at the next timeslice expiry, because the scheduler considers them to have slept too long.
> 3. Goto 1, OOPS!
Ah, that's a new twist. Cool testcase. A mechanism which was put in
specifically to prevent tasks from jumping straight to max interactive,
and thus hurting truly interactive tasks, is starving these not truly
interactive (but nothing says they couldn't be) tasks.
One way to prevent this may be to do something very simple like (I did
this here a while back) upon timeslice completion (or possibly better,
round-robin interval [whenever]), promote a task from a lower priority
queue. Tasks are then always crawling up the ladder, and _will_ be
inserted into the stream no matter what is going on in the system. The
lower the task's priority, the longer it takes to crawl up, but it will
get there. I'll try this again, and see how it handles your testcase,
and how it affects interactivity. There are many other ways of course:
limiting queue runtime, etc.
-Mike
* Re: [BUG] scheduler: strange behavior with massive interactive processes
2007-03-27 1:34 [BUG] scheduler: strange behavior with massive interactive processes Satoru Takeuchi
2007-03-27 5:04 ` Mike Galbraith
@ 2007-03-27 19:14 ` Ingo Molnar
2007-03-28 1:16 ` Satoru Takeuchi
1 sibling, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2007-03-27 19:14 UTC (permalink / raw)
To: Satoru Takeuchi; +Cc: Linux Kernel
* Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> wrote:
> Hi Ingo and all,
>
> When I was executing massive interactive processes, I found that some
> of them occupy CPU time and the others hardly run.
yeah.
> I also attach the test program which easily recreates this problem.
thanks, this is really helpful - does the patch below improve the
situation?
Ingo
---------------------->
Subject: [patch] sched: improve starvation prevention
From: Ingo Molnar <mingo@elte.hu>
improve starvation prevention by guaranteeing STARVATION_LIMIT. This
also simplifies the code a bit.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/sched.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -235,7 +235,7 @@ struct rq {
*/
unsigned long nr_uninterruptible;
- unsigned long expired_timestamp;
+ unsigned long switch_timestamp;
/* Cached timestamp set by update_cpu_clock() */
unsigned long long most_recent_timestamp;
struct task_struct *curr, *idle;
@@ -3103,9 +3103,9 @@ static inline int expired_starving(struc
{
if (rq->curr->static_prio > rq->best_expired_prio)
return 1;
- if (!STARVATION_LIMIT || !rq->expired_timestamp)
+ if (!STARVATION_LIMIT)
return 0;
- if (jiffies - rq->expired_timestamp > STARVATION_LIMIT * rq->nr_running)
+ if (jiffies - rq->switch_timestamp > STARVATION_LIMIT)
return 1;
return 0;
}
@@ -3218,8 +3218,6 @@ static void task_running_tick(struct rq
p->time_slice = task_timeslice(p);
p->first_time_slice = 0;
- if (!rq->expired_timestamp)
- rq->expired_timestamp = jiffies;
if (!TASK_INTERACTIVE(p) || expired_starving(rq)) {
enqueue_task(p, rq->expired);
if (p->static_prio < rq->best_expired_prio)
@@ -3406,7 +3404,7 @@ need_resched_nonpreemptible:
idle_balance(cpu, rq);
if (!rq->nr_running) {
next = rq->idle;
- rq->expired_timestamp = 0;
+ rq->switch_timestamp = jiffies;
goto switch_tasks;
}
}
@@ -3420,7 +3418,7 @@ need_resched_nonpreemptible:
rq->active = rq->expired;
rq->expired = array;
array = rq->active;
- rq->expired_timestamp = 0;
+ rq->switch_timestamp = jiffies;
rq->best_expired_prio = MAX_PRIO;
}
* Re: [BUG] scheduler: strange behavior with massive interactive processes
2007-03-27 19:14 ` Ingo Molnar
@ 2007-03-28 1:16 ` Satoru Takeuchi
2007-03-31 8:16 ` Satoru Takeuchi
0 siblings, 1 reply; 10+ messages in thread
From: Satoru Takeuchi @ 2007-03-28 1:16 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Satoru Takeuchi, Linux Kernel
Hi Ingo,
At Tue, 27 Mar 2007 21:14:20 +0200,
Ingo Molnar wrote:
>
>
> * Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> wrote:
>
> > Hi Ingo and all,
> >
> > When I was executing massive interactive processes, I found that some
> > of them occupy CPU time and the others hardly run.
>
> yeah.
>
> > I also attach the test program which easily recreates this problem.
>
> thanks, this is really helpful - does the patch below improve the
> situation?
Thank you for your quick response. Although I'm a bit busy and can't test
it now, I'll probably be able to test it and report the results by the end
of this week.
Satoru
* Re: [BUG] scheduler: strange behavior with massive interactive processes
2007-03-27 5:04 ` Mike Galbraith
@ 2007-03-28 8:38 ` Mike Galbraith
2007-03-28 11:45 ` Ingo Molnar
2007-03-31 10:15 ` Satoru Takeuchi
0 siblings, 2 replies; 10+ messages in thread
From: Mike Galbraith @ 2007-03-28 8:38 UTC (permalink / raw)
To: Satoru Takeuchi; +Cc: Linux Kernel, Ingo Molnar
Greetings!
On Tue, 2007-03-27 at 07:04 +0200, Mike Galbraith wrote:
> On Tue, 2007-03-27 at 10:34 +0900, Satoru Takeuchi wrote:
> > When I was running a large number of interactive processes, I found that some of them
> > occupy CPU time while the others hardly run.
> >
> > It seems that the processes which occupy CPU time always have the maximum effective
> > priority (default + 5) while the others have max - 1. What happens here is:
> >
> > 1. If there is a moderate number of max-interactive processes, they can be
> > re-inserted into the active queue again and again without their priority
> > dropping.
> > 2. In this case, the others seldom run, and can't get the maximum effective priority
> > at the next timeslice expiry, because the scheduler considers them to have slept too long.
> > 3. Goto 1, OOPS!
>
> Ah, that's a new twist. Cool testcase. A mechanism which was put in
> specifically to prevent tasks from jumping straight to max interactive,
> and thus hurting truly interactive tasks, is starving these not truly
> interactive (but nothing says they couldn't be) tasks.
I puttered around with your testcase a bit, and didn't see interactive
tasks starving other interactive tasks so much as plain old interactive
tasks starving expired tasks, which they're supposed to be able to do,
but to a limited degree. Merely forcing array switches fixes that here,
however array switches can be _very_ rough on interactivity, so I tried
a hybrid of the forced array switch and...
> One way to prevent this may be to do something very simple like (I did
> this here a while back) upon timeslice completion (or possibly better,
> round-robin interval [whenever]), promote a task from a lower priority
> queue. Tasks are then always crawling up the ladder, and _will_ be
> inserted into the stream no matter what is going on in the system.
...the above, and it seems to work. Below is a rough (second) cut,
which tries to preserve interactivity, but also ensure that
non-interactive tasks _will_ receive cpu in a reasonable amount of time.
It only engages when we're not expiring non-interactive tasks, ie should
have no effect until we're exclusively running interactive tasks, but
then it should also prevent the interactive/interactive starvation
scenario as well. It doesn't attempt to enforce fairness, only prevent
terminal starvation. With this patch^W(mess?;) I got the numbers below,
whereas without it, the distribution was pretty ugly. I didn't try it
with many many attackers. (many many false interactive tasks are doomed
unless you detect and switch behavior)
Interactivity still seems to be fine with reasonable non-interactive
loads despite ~reserving more bandwidth for non-interactive tasks. Only
lightly tested, YMMV, and of course the standard guarantee applies ;)
-Mike
root@Homer: ./massive_intr 20 180
009417 00003450
009419 00002965
009423 00003068
009406 00002565
009403 00002964
009421 00002949
009422 00002314
009420 00002879
009418 00002730
009415 00003126
009405 00002737
009408 00003664
009424 00003355
009414 00002947
009416 00004340
009407 00003424
009412 00003491
009404 00003101
009413 00002553
009425 00003396
2.6.21-rc5
--- kernel/sched.c.org 2007-03-27 15:47:49.000000000 +0200
+++ kernel/sched.c 2007-03-28 08:53:17.000000000 +0200
@@ -234,7 +234,8 @@ struct rq {
*/
unsigned long nr_uninterruptible;
- unsigned long expired_timestamp;
+ unsigned long switch_timestamp;
+ unsigned long last_expired;
/* Cached timestamp set by update_cpu_clock() */
unsigned long long most_recent_timestamp;
struct task_struct *curr, *idle;
@@ -3051,9 +3052,9 @@ static inline int expired_starving(struc
{
if (rq->curr->static_prio > rq->best_expired_prio)
return 1;
- if (!STARVATION_LIMIT || !rq->expired_timestamp)
+ if (!STARVATION_LIMIT)
return 0;
- if (jiffies - rq->expired_timestamp > STARVATION_LIMIT * rq->nr_running)
+ if (jiffies - rq->switch_timestamp > STARVATION_LIMIT * rq->nr_running)
return 1;
return 0;
}
@@ -3131,8 +3132,75 @@ void account_steal_time(struct task_stru
cpustat->steal = cputime64_add(cpustat->steal, tmp);
}
+/*
+ * Promote and requeue the next lower priority task. If no task
+ * is available in the active array, switch to the expired array.
+ * @rq: runqueue to search.
+ * @prio: priority at which to begin search.
+ */
+static inline int promote_next_lower(struct rq *rq, int prio)
+{
+ struct prio_array *array = rq->active;
+ unsigned long *bitmap;
+ struct task_struct *p = NULL;
+ unsigned long long now = rq->most_recent_timestamp;
+ int idx = prio + 1, found_noninteractive = 0;
+
+repeat:
+ bitmap = array->bitmap;
+ idx = find_next_bit(bitmap, MAX_PRIO, idx);
+ if (idx < MAX_PRIO) {
+ struct list_head *queue = array->queue + idx;
+
+ p = list_entry(queue->next, struct task_struct, run_list);
+ if (!TASK_INTERACTIVE(p))
+ found_noninteractive = 1;
+
+ /* Skip non-starved queues. */
+ if (now < p->last_ran + JIFFIES_TO_NS(HZ)) {
+ idx++;
+ p = NULL;
+ goto repeat;
+ }
+ } else if (!found_noninteractive && array == rq->active){
+ /* Nobody home. Check the expired array. */
+ array = rq->expired;
+ idx = 0;
+ p = NULL;
+ goto repeat;
+ }
+
+ /* Found one, requeue it. */
+ if (p) {
+ dequeue_task(p, p->array);
+ if (array == rq->active)
+ p->prio--;
+ /*
+ * If we pulled a task from the expired array, correct
+ * expired task info.
+ */
+ else {
+ idx = sched_find_first_bit(array->bitmap);
+ if (idx < MAX_PRIO) {
+ /* FIXME this isn't perfect */
+ if (rq->best_expired_prio > idx)
+ rq->best_expired_prio = idx;
+ } else {
+ rq->best_expired_prio = MAX_PRIO;
+#if 0
+ rq->switch_timestamp = jiffies;
+#endif
+ }
+ }
+ enqueue_task(p, rq->active);
+ }
+ return p != NULL;
+}
+
static void task_running_tick(struct rq *rq, struct task_struct *p)
{
+ static unsigned long warn_me = INITIAL_JIFFIES;
+ static int promoted;
if (p->array != rq->active) {
/* Task has expired but was not scheduled yet */
set_tsk_need_resched(p);
@@ -3162,20 +3230,31 @@ static void task_running_tick(struct rq
goto out_unlock;
}
if (!--p->time_slice) {
+ int was_interactive = TASK_INTERACTIVE(p);
+
dequeue_task(p, rq->active);
set_tsk_need_resched(p);
p->prio = effective_prio(p);
p->time_slice = task_timeslice(p);
p->first_time_slice = 0;
- if (!rq->expired_timestamp)
- rq->expired_timestamp = jiffies;
if (!TASK_INTERACTIVE(p) || expired_starving(rq)) {
enqueue_task(p, rq->expired);
if (p->static_prio < rq->best_expired_prio)
rq->best_expired_prio = p->static_prio;
- } else
+ if (!was_interactive)
+ rq->last_expired = jiffies;
+ } else {
+ int limit = STARVATION_LIMIT;
enqueue_task(p, rq->active);
+
+ /*
+ * Start promoting lower priority tasks if we haven't
+ * expired a task within STARVATION_LIMIT tics.
+ */
+ if (time_after(jiffies, rq->last_expired + limit))
+ promoted += promote_next_lower(rq, p->prio);
+ }
} else {
/*
* Prevent a too long timeslice allowing a task to monopolize
@@ -3204,6 +3283,12 @@ static void task_running_tick(struct rq
}
out_unlock:
spin_unlock(&rq->lock);
+ if (time_after(jiffies, warn_me)) {
+ if (promoted)
+ printk(KERN_ERR "%d tasks promoted\n", promoted);
+ promoted = 0;
+ warn_me = jiffies + HZ;
+ }
}
/*
@@ -3356,7 +3441,7 @@ need_resched_nonpreemptible:
idle_balance(cpu, rq);
if (!rq->nr_running) {
next = rq->idle;
- rq->expired_timestamp = 0;
+ rq->switch_timestamp = jiffies;
goto switch_tasks;
}
}
@@ -3370,7 +3455,7 @@ need_resched_nonpreemptible:
rq->active = rq->expired;
rq->expired = array;
array = rq->active;
- rq->expired_timestamp = 0;
+ rq->switch_timestamp = jiffies;
rq->best_expired_prio = MAX_PRIO;
}
* Re: [BUG] scheduler: strange behavior with massive interactive processes
2007-03-28 8:38 ` Mike Galbraith
@ 2007-03-28 11:45 ` Ingo Molnar
2007-03-28 11:51 ` Mike Galbraith
2007-03-31 10:15 ` Satoru Takeuchi
1 sibling, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2007-03-28 11:45 UTC (permalink / raw)
To: Mike Galbraith; +Cc: Satoru Takeuchi, Linux Kernel
* Mike Galbraith <efault@gmx.de> wrote:
> + struct task_struct *p = NULL;
(small nit: extra space at the end of line.)
> + rq->best_expired_prio = MAX_PRIO;
> +#if 0
> + rq->switch_timestamp = jiffies;
> +#endif
remove this chunk? (i dont think we want to touch switch_timestamp here)
Ingo
* Re: [BUG] scheduler: strange behavior with massive interactive processes
2007-03-28 11:45 ` Ingo Molnar
@ 2007-03-28 11:51 ` Mike Galbraith
0 siblings, 0 replies; 10+ messages in thread
From: Mike Galbraith @ 2007-03-28 11:51 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Satoru Takeuchi, Linux Kernel
(oops, wrong button, went without CCs. sorry for duplicate)
On Wed, 2007-03-28 at 13:45 +0200, Ingo Molnar wrote:
> * Mike Galbraith <efault@gmx.de> wrote:
>
> > + struct task_struct *p = NULL;
>
> (small nit: extra space at the end of line.)
>
> > + rq->best_expired_prio = MAX_PRIO;
> > +#if 0
> > + rq->switch_timestamp = jiffies;
> > +#endif
>
> remove this chunk? (i dont think we want to touch switch_timestamp here)
I'll clean it up, and see if I can do better.
-Mike
* Re: [BUG] scheduler: strange behavior with massive interactive processes
2007-03-28 1:16 ` Satoru Takeuchi
@ 2007-03-31 8:16 ` Satoru Takeuchi
0 siblings, 0 replies; 10+ messages in thread
From: Satoru Takeuchi @ 2007-03-31 8:16 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Satoru Takeuchi, Linux Kernel
Hi Ingo,
> > > Hi Ingo and all,
> > >
> > > When I was executing massive interactive processes, I found that some
> > > of them occupy CPU time and the others hardly run.
> >
> > yeah.
> >
> > > I also attach the test program which easily recreates this problem.
> >
> > thanks, this is really helpful - does the patch below improve the
> > situation?
I tested your patch and it seems to work well.
Test environment
================
- kernel: 2.6.21-rc5 with or without Ingo's patch
- others: same as my initial mail except for omitting nice 19 cases
Result (without Ingo's patch)
=============================
+---------+-----------+------+------+------+--------+
| # of | # of | avg | max | min | stdev |
| CPUs | processes | (*1) | (*2) | (*3) | (*4) |
+---------+-----------+------+------+------+--------+
| 1(i386) | 200 | 162 | 8258 | 1 | 1113 |
+---------+-----------+------+------+------+--------+
| | | 378 | 9314 | 2 | 1421 |
| 2(ia64) | 400 +------+------+------+--------+
| | | 189 |12544 | 1 | 1443 |
+---------+-----------+------+------+------+--------+
*1) average number of loops among all processes
*2) maximum number of loops among all processes
*3) minimum number of loops among all processes
*4) standard deviation
Result (with Ingo's patch)
==========================
+---------+-----------+------+------+------+--------+
| # of | # of | avg | max | min | stdev |
| CPUs | processes | | | | |
+---------+-----------+------+------+------+--------+
| 1(i386) | 200 | 153 | 232 | 128 | 7.67 |
+---------+-----------+------+------+------+--------+
| | | 376 | 451 | 291 | 17.6 |
| 2(ia64) | 400 +------+------+------+--------+
| | | 188 | 236 | 137 | 14.5 |
+---------+-----------+------+------+------+--------+
Although it is not perfectly fair, it is certain that this patch really improves
the situation, dramatically so. Thank you very much!
Satoru
* Re: [BUG] scheduler: strange behavior with massive interactive processes
2007-03-28 8:38 ` Mike Galbraith
2007-03-28 11:45 ` Ingo Molnar
@ 2007-03-31 10:15 ` Satoru Takeuchi
2007-03-31 10:29 ` Mike Galbraith
1 sibling, 1 reply; 10+ messages in thread
From: Satoru Takeuchi @ 2007-03-31 10:15 UTC (permalink / raw)
To: Mike Galbraith; +Cc: Satoru Takeuchi, Linux Kernel, Ingo Molnar
Hi Mike,
> I puttered around with your testcase a bit, and didn't see interactive
> tasks starving other interactive tasks so much as plain old interactive
> tasks starving expired tasks, which they're supposed to be able to do,
I inserted trace code observing all context switches into the kernel and
confirmed that fewer than 10 processes having max prio certainly run
continuously, and the others (having max - 1 prio) can run only at the
beginning of the program or when the runqueue is expired (the chance is about
once every 200 secs in the 200 [procs/CPU] case, and their CPU time is taken
away after only 1 tick) on each CPU.
> Interactivity still seems to be fine with reasonable non-interactive
> loads despite ~reserving more bandwidth for non-interactive tasks. Only
> lightly tested, YMMV, and of course the standard guarantee applies ;)
I've only looked at your patch briefly and can't make an accurate comment yet.
For the time being, however, I ran the same test as in my initial mail.
Test environment
================
- kernel: 2.6.21-rc5 with or without Mike's patch
- others: same as my initial mail except for omitting nice 19 cases
Result (without Mike's patch)
=============================
+---------+-----------+------+------+------+--------+
| # of | # of | avg | max | min | stdev |
| CPUs | processes | (*1) | (*2) | (*3) | (*4) |
+---------+-----------+------+------+------+--------+
| 1(i386) | 200 | 162 | 8258 | 1 | 1113 |
+---------+-----------+------+------+------+--------+
| | | 378 | 9314 | 2 | 1421 |
| 2(ia64) | 400 +------+------+------+--------+
| | | 189 |12544 | 1 | 1443 |
+---------+-----------+------+------+------+--------+
*1) average number of loops among all processes
*2) maximum number of loops among all processes
*3) minimum number of loops among all processes
*4) standard deviation
Result (with Mike's patch)
==========================
+---------+-----------+------+------+------+--------+
| # of | # of | avg | max | min | stdev |
| CPUs | processes | | | | |
+---------+-----------+------+------+------+--------+
| 1(i386) | 200 | 154 | 1114 | 1 | 210 |
+---------+-----------+------+------+------+--------+
| | | 373 | 1328 | 108 | 246 |
| 2(ia64) | 400 +------+------+------+--------+
| | | 186 | 1169 | 1 | 211 |
+---------+-----------+------+------+------+--------+
I also gathered the data, varying the # of processes, for the 1 CPU (i386) case:
+---------+-----------+------+------+------+--------+
| # of | # of | avg | max | min | stdev |
| CPUs | processes | | | | |
+---------+-----------+------+------+------+--------+
| | 25 | 1208 | 1787 | 987 | 237 |
| +-----------+------+------+------+--------+
| | 50 | 868 | 1631 | 559 | 275 |
| 1(i386) +-----------+------+------+------+--------+
| | 100 | 319 | 1017 | 25 | 232 |
| +-----------+------+------+------+--------+
| | 200(*1) | 154 | 1114 | 1 | 210 |
+---------+-----------+------+------+------+--------+
*1) Same as in the above table, repeated for easy comparison
It seems to depend highly on the # of processes, and at present Ingo's
patch looks better.
Thanks,
Satoru
* Re: [BUG] scheduler: strange behavior with massive interactive processes
2007-03-31 10:15 ` Satoru Takeuchi
@ 2007-03-31 10:29 ` Mike Galbraith
0 siblings, 0 replies; 10+ messages in thread
From: Mike Galbraith @ 2007-03-31 10:29 UTC (permalink / raw)
To: Satoru Takeuchi; +Cc: Linux Kernel, Ingo Molnar
On Sat, 2007-03-31 at 19:15 +0900, Satoru Takeuchi wrote:
> It seems to highly depend on # of processes and at present, Ingo's patch
> looks better.
Yeah, Ingo's patch forces the array switch where I try to avoid it if at
all possible. I'm looking for ways to know for sure when you just have to
bite that bullet.
Thanks for testing.
-Mike
end of thread, other threads:[~2007-03-31 10:29 UTC | newest]
Thread overview: 10+ messages
-- links below jump to the message on this page --
2007-03-27 1:34 [BUG] scheduler: strange behavior with massive interactive processes Satoru Takeuchi
2007-03-27 5:04 ` Mike Galbraith
2007-03-28 8:38 ` Mike Galbraith
2007-03-28 11:45 ` Ingo Molnar
2007-03-28 11:51 ` Mike Galbraith
2007-03-31 10:15 ` Satoru Takeuchi
2007-03-31 10:29 ` Mike Galbraith
2007-03-27 19:14 ` Ingo Molnar
2007-03-28 1:16 ` Satoru Takeuchi
2007-03-31 8:16 ` Satoru Takeuchi