Linux-Fsdevel Archive on lore.kernel.org
From: Peter Zijlstra <peterz@infradead.org>
To: Qais Yousef <qais.yousef@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>,
Doug Anderson <dianders@chromium.org>,
Jonathan Corbet <corbet@lwn.net>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Luis Chamberlain <mcgrof@kernel.org>,
Kees Cook <keescook@chromium.org>,
Iurii Zaikin <yzaikin@google.com>,
Quentin Perret <qperret@google.com>,
Valentin Schneider <valentin.schneider@arm.com>,
Patrick Bellasi <patrick.bellasi@matbug.net>,
Pavan Kondeti <pkondeti@codeaurora.org>,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v6 1/2] sched/uclamp: Add a new sysctl to control RT default boost value
Date: Mon, 13 Jul 2020 13:21:25 +0200
Message-ID: <20200713112125.GG10769@hirez.programming.kicks-ass.net>
In-Reply-To: <20200706142839.26629-2-qais.yousef@arm.com>
On Mon, Jul 06, 2020 at 03:28:38PM +0100, Qais Yousef wrote:
> +static void __uclamp_sync_util_min_rt_default(struct task_struct *p)
> +{
> + unsigned int default_util_min;
> + struct uclamp_se *uc_se;
> +
> + WARN_ON_ONCE(!rcu_read_lock_held());
> +
> + if (!rt_task(p))
> + return;
> +
> + uc_se = &p->uclamp_req[UCLAMP_MIN];
> +
> + /* Only sync if user didn't override the default */
> + if (uc_se->user_defined)
> + return;
> +
> + /* Sync with smp_wmb() in uclamp_sync_util_min_rt_default() */
> + smp_rmb();
> + default_util_min = sysctl_sched_uclamp_util_min_rt_default;
> + uclamp_se_set(uc_se, default_util_min, false);
> +}
> +
> +static void uclamp_sync_util_min_rt_default(void)
> +{
> + struct task_struct *g, *p;
> +
> + /*
> + * Make sure the updated sysctl_sched_uclamp_util_min_rt_default which
> + * was just written is synchronized against any future read on another
> + * cpu.
> + */
> + smp_wmb();
> +
> + /*
> + * Wait for all updaters to observe the new change.
> + *
> + * There are 2 races to deal with here:
> + *
> + * 1. fork()->copy_process()
> + *
> + * If a task was concurrently forking, for_each_process_thread()
> + * will not see it, hence it could have copied the old value and
> + * we missed the opportunity to update it.
> + *
> + * This should be handled by sched_post_fork() where it'll ensure
> + * it performs the sync after the fork.
> + *
> + * 2. fork()->sched_post_fork()
> + * __setscheduler_uclamp()
> + *
> + * Both of these functions could read the old value but then get
> + * preempted, during which a user might write new value to
> + * sysctl_sched_uclamp_util_min_rt_default.
> + *
> + * // read sysctl_sched_uclamp_util_min_rt_default;
> + * // PREEMPT-OUT
> + * .
> + * . <-- sync happens here
> + * .
> + * // PREEMPT-IN
> + * // write p->uclamp_req[UCLAMP_MIN]
> + *
> + * That section is protected with rcu_read_lock(), so
> + * synchronize_rcu() will guarantee it has finished before we
> + * perform the update. Hence ensure that this sync happens after
> + * any concurrent sync which should guarantee correctness.
> + */
> + synchronize_rcu();
> +
> + rcu_read_lock();
> + for_each_process_thread(g, p)
> + __uclamp_sync_util_min_rt_default(p);
> + rcu_read_unlock();
> +}
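For readers following the quoted helper, here is a minimal userspace sketch of the per-task rule it applies: skip non-RT tasks, skip tasks whose value the user set explicitly, otherwise copy the current default. All names below (model_task, model_uclamp_se, model_rt_default, model_sync_rt_default, model_demo) are hypothetical stand-ins and not kernel definitions, the values are arbitrary, and the sketch deliberately ignores locking and memory ordering.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; these are not the kernel's types. */
struct model_uclamp_se {
	unsigned int value;
	bool user_defined;
};

struct model_task {
	bool is_rt;                     /* stands in for rt_task(p) */
	struct model_uclamp_se req_min; /* stands in for p->uclamp_req[UCLAMP_MIN] */
};

/* Stands in for sysctl_sched_uclamp_util_min_rt_default (arbitrary value). */
static unsigned int model_rt_default = 1024;

/* Same three steps as the quoted helper, minus any synchronization. */
static void model_sync_rt_default(struct model_task *t)
{
	if (!t->is_rt)
		return;

	/* Only sync if the user did not override the default. */
	if (t->req_min.user_defined)
		return;

	t->req_min.value = model_rt_default;
	t->req_min.user_defined = false;
}

/* Small usage example: only the untouched RT task follows the default. */
static void model_demo(void)
{
	struct model_task rt = { .is_rt = true };
	struct model_task pinned = { .is_rt = true, .req_min = { 300, true } };
	struct model_task cfs = { .is_rt = false };

	model_rt_default = 512;
	model_sync_rt_default(&rt);
	model_sync_rt_default(&pinned);
	model_sync_rt_default(&cfs);

	assert(rt.req_min.value == 512);     /* followed the new default */
	assert(pinned.req_min.value == 300); /* user override preserved */
	assert(cfs.req_min.value == 0);      /* non-RT task left alone */
}
```

The sketch only illustrates which tasks get updated; the races discussed in the quoted comment are about when this function runs relative to a concurrent write of the default.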
It's Monday, and I cannot get my brain working... I cannot decipher the
comments you have with the smp_[rw]mb(); what actual ordering do they
enforce?
Also, your synchronize_rcu() relies on write_lock() being
non-preemptible, which isn't true on PREEMPT_RT.
The below seems simpler...
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1013,8 +1013,6 @@ static void __uclamp_sync_util_min_rt_de
unsigned int default_util_min;
struct uclamp_se *uc_se;
- WARN_ON_ONCE(!rcu_read_lock_held());
-
if (!rt_task(p))
return;
@@ -1024,8 +1022,6 @@ static void __uclamp_sync_util_min_rt_de
if (uc_se->user_defined)
return;
- /* Sync with smp_wmb() in uclamp_sync_util_min_rt_default() */
- smp_rmb();
default_util_min = sysctl_sched_uclamp_util_min_rt_default;
uclamp_se_set(uc_se, default_util_min, false);
}
@@ -1035,47 +1031,21 @@ static void uclamp_sync_util_min_rt_defa
struct task_struct *g, *p;
/*
- * Make sure the updated sysctl_sched_uclamp_util_min_rt_default which
- * was just written is synchronized against any future read on another
- * cpu.
- */
- smp_wmb();
-
- /*
- * Wait for all updaters to observe the new change.
- *
- * There are 2 races to deal with here:
- *
- * 1. fork()->copy_process()
- *
- * If a task was concurrently forking, for_each_process_thread()
- * will not see it, hence it could have copied the old value and
- * we missed the opportunity to update it.
- *
- * This should be handled by sched_post_fork() where it'll ensure
- * it performs the sync after the fork.
- *
- * 2. fork()->sched_post_fork()
- * __setscheduler_uclamp()
- *
- * Both of these functions could read the old value but then get
- * preempted, during which a user might write new value to
- * sysctl_sched_uclamp_util_min_rt_default.
- *
- * // read sysctl_sched_uclamp_util_min_rt_default;
- * // PREEMPT-OUT
- * .
- * . <-- sync happens here
- * .
- * // PREEMPT-IN
- * // write p->uclamp_req[UCLAMP_MIN]
- *
- * That section is protected with rcu_read_lock(), so
- * synchronize_rcu() will guarantee it has finished before we
- * perform the update. Hence ensure that this sync happens after
- * any concurrent sync which should guarantee correctness.
- */
- synchronize_rcu();
+ * copy_process()                      sysctl_uclamp
+ *                                       uclamp_min_rt = X;
+ *   write_lock(&tasklist_lock)          read_lock(&tasklist_lock)
+ *   // link thread                      smp_mb__after_spinlock()
+ *   write_unlock(&tasklist_lock)        read_unlock(&tasklist_lock);
+ *   sched_post_fork()                   for_each_process_thread()
+ *     __uclamp_sync_rt()                  __uclamp_sync_rt()
+ *
+ * Ensures that either sched_post_fork() will observe the new
+ * uclamp_min_rt or for_each_process_thread() will observe the new
+ * task.
+ */
+ read_lock(&tasklist_lock);
+ smp_mb__after_spinlock();
+ read_unlock(&tasklist_lock);
rcu_read_lock();
for_each_process_thread(g, p)
@@ -1408,6 +1378,9 @@ int sysctl_sched_uclamp_handler(struct c
uclamp_update_root_tg();
}
+ if (old_min_rt != sysctl_sched_uclamp_util_min_rt_default)
+ uclamp_sync_util_min_rt_default();
+
/*
* We update all RUNNABLE tasks only when task groups are in use.
* Otherwise, keep it simple and do just a lazy update at each next
@@ -1466,9 +1439,7 @@ static void __setscheduler_uclamp(struct
* at runtime.
*/
if (unlikely(rt_task(p) && clamp_id == UCLAMP_MIN)) {
- rcu_read_lock();
__uclamp_sync_util_min_rt_default(p);
- rcu_read_unlock();
} else {
uclamp_se_set(uc_se, uclamp_none(clamp_id), false);
}
@@ -1521,6 +1492,11 @@ static void __init init_uclamp_rq(struct
rq->uclamp_flags = 0;
}
+static void uclamp_post_fork(struct task_struct *p)
+{
+ __uclamp_sync_util_min_rt_default(p);
+}
+
static void __init init_uclamp(void)
{
struct uclamp_se uc_max = {};
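
The "either sched_post_fork() observes the new value or for_each_process_thread() observes the new task" argument in the comment above can be modelled in userspace. The sketch below is an analogy only: it assumes a pthread mutex in place of tasklist_lock and C11 atomics in place of the shared variables, and every name in it (model_tasklist_lock, model_min_rt, model_task_linked, model_forker, model_updater, model_race_once) is hypothetical. A POSIX mutex already orders the accesses in this model, so the sketch has no counterpart to smp_mb__after_spinlock() and says nothing about whether the kernel's rwlock needs it.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins, not kernel objects. */
static pthread_mutex_t model_tasklist_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int model_min_rt;       /* 0 = old default, 1 = new default */
static atomic_bool model_task_linked; /* true once the task is on the list */
static bool forker_saw_new_value;     /* read only after pthread_join() */
static bool updater_saw_new_task;     /* read only after pthread_join() */

/* Left column: link the task under the lock, then read the default. */
static void *model_forker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&model_tasklist_lock);
	atomic_store_explicit(&model_task_linked, true, memory_order_relaxed);
	pthread_mutex_unlock(&model_tasklist_lock);

	/* Models the read done from sched_post_fork(). */
	forker_saw_new_value =
		atomic_load_explicit(&model_min_rt, memory_order_relaxed) == 1;
	return NULL;
}

/* Right column: write the default, pass through the lock, walk the list. */
static void *model_updater(void *arg)
{
	(void)arg;
	atomic_store_explicit(&model_min_rt, 1, memory_order_relaxed);

	pthread_mutex_lock(&model_tasklist_lock);
	pthread_mutex_unlock(&model_tasklist_lock);

	/* Models for_each_process_thread() looking for the new task. */
	updater_saw_new_task =
		atomic_load_explicit(&model_task_linked, memory_order_relaxed);
	return NULL;
}

/* Runs one race; returns true if at least one side observed the other. */
static bool model_race_once(void)
{
	pthread_t f, u;

	atomic_store(&model_min_rt, 0);
	atomic_store(&model_task_linked, false);
	forker_saw_new_value = false;
	updater_saw_new_task = false;

	if (pthread_create(&f, NULL, model_forker, NULL) != 0)
		abort();
	if (pthread_create(&u, NULL, model_updater, NULL) != 0)
		abort();
	pthread_join(f, NULL);
	pthread_join(u, NULL);

	return forker_saw_new_value || updater_saw_new_task;
}
```

Whichever thread takes the mutex second is ordered after everything the first thread did before releasing it, so the second thread's later read must see the first thread's write. Calling model_race_once() in a loop should therefore never return false; both sides seeing the other's write is allowed, both missing it is not.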