From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
To: Ingo Molnar <mingo@kernel.org>, Peter Zijlstra <peterz@infradead.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
Mel Gorman <mgorman@techsingularity.net>,
Rik van Riel <riel@surriel.com>,
Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
Thomas Gleixner <tglx@linutronix.de>
Subject: [PATCH 16/19] sched/numa: Detect if node actively handling migration
Date: Mon, 4 Jun 2018 15:30:25 +0530
Message-ID: <1528106428-19992-17-git-send-email-srikar@linux.vnet.ibm.com>
In-Reply-To: <1528106428-19992-1-git-send-email-srikar@linux.vnet.ibm.com>
If a node is already the destination of a task migration under NUMA
balancing, then any parallel movement to that node is restricted. Detect
this as early as possible and avoid evaluating a task movement to such a
node.
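
As an illustration, here is a minimal stand-alone C sketch (not kernel
code; node_info and numa_check_dst_node are made-up names, and
active_node_migrate stands in for the per-node flag introduced earlier
in this series) of how the early check drops the move option:

#include <stdbool.h>

struct node_info {
	int active_node_migrate;	/* non-zero while a migration already targets this node */
};

/* If the destination node is already handling a migration, stop
 * considering a plain move early; swap candidates can still be
 * evaluated afterwards. */
static void numa_check_dst_node(const struct node_info *dst, bool *move)
{
	if (*move && dst->active_node_migrate)
		*move = false;
}
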
While here, avoid task migration if the NUMA imbalance is very small.
This matters especially when two tasks A and B race with each other to
find the best cpu to swap with: if task A has already found a task/cpu
pair to swap and is only looking for a marginally better cpu, while task
B has not yet found any cpu/task to swap with, task A can win the race
and deprive task B of a task/cpu to swap with.
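
The small-improvement cutoff can be sketched the same way (stand-alone
C; the helper name is made up, but SMALLIMP and the comparison are taken
from the patch below):

#define SMALLIMP	30	/* ~1998/64, where 1998 (2*999) is the maximum numa importance */

/* Reject a candidate whose improvement is tiny, or not clearly better
 * than the best candidate found so far. */
static int numa_improvement_worthwhile(long imp, long best_imp)
{
	return imp >= SMALLIMP && imp > best_imp + SMALLIMP / 2;
}
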
Testcase   Time:         Min         Max         Avg      StdDev
numa01.sh  Real:      493.19      672.88      597.51       59.38
numa01.sh   Sys:      150.09      245.48      207.76       34.26
numa01.sh  User:    41928.51    53779.17    48747.06     3901.39
numa02.sh  Real:       60.63       62.87       61.22        0.83
numa02.sh   Sys:       16.64       27.97       20.25        4.06
numa02.sh  User:     5222.92     5309.60     5254.03       29.98
numa03.sh  Real:      821.52      902.15      863.60       32.41
numa03.sh   Sys:      112.04      130.66      118.35        7.08
numa03.sh  User:    62245.16    69165.14    66443.04     2450.32
numa04.sh  Real:      414.53      519.57      476.25       37.00
numa04.sh   Sys:      181.84      335.67      280.41       54.07
numa04.sh  User:    33924.50    39115.39    37343.78     1934.26
numa05.sh  Real:      408.30      441.45      417.90       12.05
numa05.sh   Sys:      233.41      381.60      295.58       57.37
numa05.sh  User:    33301.31    35972.50    34335.19      938.94

Testcase   Time:         Min         Max         Avg      StdDev     %Change
numa01.sh  Real:      428.48      837.17      700.45      162.77      -14.6%
numa01.sh   Sys:       78.64      247.70      164.45       58.32      26.33%
numa01.sh  User:    37487.25    63728.06    54399.27    10088.13      -10.3%
numa02.sh  Real:       60.07       62.65       61.41        0.85      -0.30%
numa02.sh   Sys:       15.83       29.36       21.04        4.48      -3.75%
numa02.sh  User:     5194.27     5280.60     5236.55       28.01      0.333%
numa03.sh  Real:      814.33      881.93      849.69       27.06      1.637%
numa03.sh   Sys:      111.45      134.02      125.28        7.69      -5.53%
numa03.sh  User:    63007.36    68013.46    65590.46     2023.37      1.299%
numa04.sh  Real:      412.19      438.75      424.43        9.28      12.20%
numa04.sh   Sys:      232.97      315.77      268.98       26.98      4.249%
numa04.sh  User:    33997.30    35292.88    34711.66      415.78      7.582%
numa05.sh  Real:      394.88      449.45      424.30       22.53      -1.50%
numa05.sh   Sys:      262.03      390.10      314.53       51.01      -6.02%
numa05.sh  User:    33389.03    35684.40    34561.34      942.34      -0.65%
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
kernel/sched/fair.c | 37 +++++++++++++++++++++++++++----------
1 file changed, 27 insertions(+), 10 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c388ecf..6851412 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1535,14 +1535,22 @@ static bool load_too_imbalanced(long src_load, long dst_load,
 }
 
 /*
+ * Maximum numa importance can be 1998 (2*999);
+ * SMALLIMP @ 30 would be close to 1998/64.
+ * Used to deter task migration.
+ */
+#define SMALLIMP	30
+
+/*
  * This checks if the overall compute and NUMA accesses of the system would
  * be improved if the source tasks was migrated to the target dst_cpu taking
  * into account that it might be best if task running on the dst_cpu should
  * be exchanged with the source task
  */
 static void task_numa_compare(struct task_numa_env *env,
-			      long taskimp, long groupimp, bool move)
+			      long taskimp, long groupimp, bool *move)
 {
+	pg_data_t *pgdat = NODE_DATA(cpu_to_node(env->dst_cpu));
 	struct rq *dst_rq = cpu_rq(env->dst_cpu);
 	struct task_struct *cur;
 	long src_load, dst_load;
@@ -1554,6 +1562,9 @@ static void task_numa_compare(struct task_numa_env *env,
 	if (READ_ONCE(dst_rq->numa_migrate_on))
 		return;
 
+	if (*move && READ_ONCE(pgdat->active_node_migrate))
+		*move = false;
+
 	rcu_read_lock();
 	cur = task_rcu_dereference(&dst_rq->curr);
 	if (cur && ((cur->flags & PF_EXITING) || is_idle_task(cur)))
@@ -1567,10 +1578,10 @@ static void task_numa_compare(struct task_numa_env *env,
 		goto unlock;
 
 	if (!cur) {
-		if (!move || imp <= env->best_imp)
-			goto unlock;
-		else
+		if (*move && moveimp >= env->best_imp)
 			goto assign;
+		else
+			goto unlock;
 	}
 
 	/*
@@ -1610,16 +1621,22 @@ static void task_numa_compare(struct task_numa_env *env,
 			       task_weight(cur, env->dst_nid, dist);
 	}
 
-	if (imp <= env->best_imp)
-		goto unlock;
-
-	if (move && moveimp > imp && moveimp > env->best_imp) {
-		imp = moveimp - 1;
+	if (*move && moveimp > imp && moveimp > env->best_imp) {
+		imp = moveimp;
 		cur = NULL;
 		goto assign;
 	}
 
 	/*
+	 * If the numa importance is less than SMALLIMP,
+	 * task migration might only result in ping pong
+	 * of tasks and also hurt performance due to cache
+	 * misses.
+	 */
+	if (imp < SMALLIMP || imp <= env->best_imp + SMALLIMP / 2)
+		goto unlock;
+
+	/*
 	 * In the overloaded case, try and keep the load balanced.
 	 */
 	load = task_h_load(env->p) - task_h_load(cur);
@@ -1675,7 +1692,7 @@ static void task_numa_find_cpu(struct task_numa_env *env,
 			continue;
 
 		env->dst_cpu = cpu;
-		task_numa_compare(env, taskimp, groupimp, move);
+		task_numa_compare(env, taskimp, groupimp, &move);
 	}
 }
--
1.8.3.1