LKML Archive on lore.kernel.org
* [PATCH tip/core/rcu 0/5] Documentation updates for v5.15
@ 2021-07-21 20:08 Paul E. McKenney
  2021-07-21 20:08 ` [PATCH rcu 1/5] Documentation/RCU: Fix emphasis markers Paul E. McKenney
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Paul E. McKenney @ 2021-07-21 20:08 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel

Hello!

This series provides documentation updates:

1.	Documentation/RCU: Fix emphasis markers, courtesy of Akira
	Yokosawa.

2.	Documentation/RCU: Fix nested inline markup, courtesy of Akira
	Yokosawa.

3.	Fix a typo in Documentation/RCU/stallwarn.rst, courtesy of
	Haocheng Xie.

4.	Add a quick quiz to explain further why we need
	smp_mb__after_unlock_lock(), courtesy of Frederic Weisbecker.

5.	Update stallwarn.rst with recent changes.

						Thanx, Paul

------------------------------------------------------------------------

 Documentation/RCU/stallwarn.rst                                         |   23 +++++--
 b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst |   29 ++++++++++
 b/Documentation/RCU/Design/Requirements/Requirements.rst                |    8 +-
 b/Documentation/RCU/checklist.rst                                       |   24 ++++----
 b/Documentation/RCU/rcu_dereference.rst                                 |    6 +-
 b/Documentation/RCU/stallwarn.rst                                       |    8 +-
 6 files changed, 69 insertions(+), 29 deletions(-)


* [PATCH rcu 1/5] Documentation/RCU: Fix emphasis markers
From: Paul E. McKenney @ 2021-07-21 20:08 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Akira Yokosawa,
	Paul E . McKenney

From: Akira Yokosawa <akiyks@gmail.com>

"-foo-" does not work as emphasis in ReST markup.
Use "*foo*" instead.

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 Documentation/RCU/checklist.rst       | 24 ++++++++++++------------
 Documentation/RCU/rcu_dereference.rst |  6 +++---
 Documentation/RCU/stallwarn.rst       |  8 ++++----
 3 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/Documentation/RCU/checklist.rst b/Documentation/RCU/checklist.rst
index 01cc21f17f7bd..f4545b7c9a63d 100644
--- a/Documentation/RCU/checklist.rst
+++ b/Documentation/RCU/checklist.rst
@@ -37,7 +37,7 @@ over a rather long period of time, but improvements are always welcome!
 
 1.	Does the update code have proper mutual exclusion?
 
-	RCU does allow -readers- to run (almost) naked, but -writers- must
+	RCU does allow *readers* to run (almost) naked, but *writers* must
 	still use some sort of mutual exclusion, such as:
 
 	a.	locking,
@@ -73,7 +73,7 @@ over a rather long period of time, but improvements are always welcome!
 	critical section is every bit as bad as letting them leak out
 	from under a lock.  Unless, of course, you have arranged some
 	other means of protection, such as a lock or a reference count
-	-before- letting them out of the RCU read-side critical section.
+	*before* letting them out of the RCU read-side critical section.
 
 3.	Does the update code tolerate concurrent accesses?
 
@@ -101,7 +101,7 @@ over a rather long period of time, but improvements are always welcome!
 	c.	Make updates appear atomic to readers.	For example,
 		pointer updates to properly aligned fields will
 		appear atomic, as will individual atomic primitives.
-		Sequences of operations performed under a lock will -not-
+		Sequences of operations performed under a lock will *not*
 		appear to be atomic to RCU readers, nor will sequences
 		of multiple atomic primitives.
 
@@ -333,7 +333,7 @@ over a rather long period of time, but improvements are always welcome!
 	for example) may be omitted.
 
 10.	Conversely, if you are in an RCU read-side critical section,
-	and you don't hold the appropriate update-side lock, you -must-
+	and you don't hold the appropriate update-side lock, you *must*
 	use the "_rcu()" variants of the list macros.  Failing to do so
 	will break Alpha, cause aggressive compilers to generate bad code,
 	and confuse people trying to read your code.
@@ -359,12 +359,12 @@ over a rather long period of time, but improvements are always welcome!
 	callback pending, then that RCU callback will execute on some
 	surviving CPU.	(If this was not the case, a self-spawning RCU
 	callback would prevent the victim CPU from ever going offline.)
-	Furthermore, CPUs designated by rcu_nocbs= might well -always-
+	Furthermore, CPUs designated by rcu_nocbs= might well *always*
 	have their RCU callbacks executed on some other CPUs, in fact,
 	for some  real-time workloads, this is the whole point of using
 	the rcu_nocbs= kernel boot parameter.
 
-13.	Unlike other forms of RCU, it -is- permissible to block in an
+13.	Unlike other forms of RCU, it *is* permissible to block in an
 	SRCU read-side critical section (demarked by srcu_read_lock()
 	and srcu_read_unlock()), hence the "SRCU": "sleepable RCU".
 	Please note that if you don't need to sleep in read-side critical
@@ -411,16 +411,16 @@ over a rather long period of time, but improvements are always welcome!
 14.	The whole point of call_rcu(), synchronize_rcu(), and friends
 	is to wait until all pre-existing readers have finished before
 	carrying out some otherwise-destructive operation.  It is
-	therefore critically important to -first- remove any path
+	therefore critically important to *first* remove any path
 	that readers can follow that could be affected by the
-	destructive operation, and -only- -then- invoke call_rcu(),
+	destructive operation, and *only then* invoke call_rcu(),
 	synchronize_rcu(), or friends.
 
 	Because these primitives only wait for pre-existing readers, it
 	is the caller's responsibility to guarantee that any subsequent
 	readers will execute safely.
 
-15.	The various RCU read-side primitives do -not- necessarily contain
+15.	The various RCU read-side primitives do *not* necessarily contain
 	memory barriers.  You should therefore plan for the CPU
 	and the compiler to freely reorder code into and out of RCU
 	read-side critical sections.  It is the responsibility of the
@@ -459,8 +459,8 @@ over a rather long period of time, but improvements are always welcome!
 	pass in a function defined within a loadable module, then it in
 	necessary to wait for all pending callbacks to be invoked after
 	the last invocation and before unloading that module.  Note that
-	it is absolutely -not- sufficient to wait for a grace period!
-	The current (say) synchronize_rcu() implementation is -not-
+	it is absolutely *not* sufficient to wait for a grace period!
+	The current (say) synchronize_rcu() implementation is *not*
 	guaranteed to wait for callbacks registered on other CPUs.
 	Or even on the current CPU if that CPU recently went offline
 	and came back online.
@@ -470,7 +470,7 @@ over a rather long period of time, but improvements are always welcome!
 	-	call_rcu() -> rcu_barrier()
 	-	call_srcu() -> srcu_barrier()
 
-	However, these barrier functions are absolutely -not- guaranteed
+	However, these barrier functions are absolutely *not* guaranteed
 	to wait for a grace period.  In fact, if there are no call_rcu()
 	callbacks waiting anywhere in the system, rcu_barrier() is within
 	its rights to return immediately.
diff --git a/Documentation/RCU/rcu_dereference.rst b/Documentation/RCU/rcu_dereference.rst
index f3e587acb4deb..0b418a5b243c5 100644
--- a/Documentation/RCU/rcu_dereference.rst
+++ b/Documentation/RCU/rcu_dereference.rst
@@ -43,7 +43,7 @@ Follow these rules to keep your RCU code working properly:
 	-	Set bits and clear bits down in the must-be-zero low-order
 		bits of that pointer.  This clearly means that the pointer
 		must have alignment constraints, for example, this does
-		-not- work in general for char* pointers.
+		*not* work in general for char* pointers.
 
 	-	XOR bits to translate pointers, as is done in some
 		classic buddy-allocator algorithms.
@@ -174,7 +174,7 @@ Follow these rules to keep your RCU code working properly:
 		Please see the "CONTROL DEPENDENCIES" section of
 		Documentation/memory-barriers.txt for more details.
 
-	-	The pointers are not equal -and- the compiler does
+	-	The pointers are not equal *and* the compiler does
 		not have enough information to deduce the value of the
 		pointer.  Note that the volatile cast in rcu_dereference()
 		will normally prevent the compiler from knowing too much.
@@ -360,7 +360,7 @@ in turn destroying the ordering between this load and the loads of the
 return values.  This can result in "p->b" returning pre-initialization
 garbage values.
 
-In short, rcu_dereference() is -not- optional when you are going to
+In short, rcu_dereference() is *not* optional when you are going to
 dereference the resulting pointer.
 
 
diff --git a/Documentation/RCU/stallwarn.rst b/Documentation/RCU/stallwarn.rst
index 7148e9be08c34..1cc944aec46f2 100644
--- a/Documentation/RCU/stallwarn.rst
+++ b/Documentation/RCU/stallwarn.rst
@@ -32,7 +32,7 @@ warnings:
 
 -	Booting Linux using a console connection that is too slow to
 	keep up with the boot-time console-message rate.  For example,
-	a 115Kbaud serial console can be -way- too slow to keep up
+	a 115Kbaud serial console can be *way* too slow to keep up
 	with boot-time message rates, and will frequently result in
 	RCU CPU stall warning messages.  Especially if you have added
 	debug printk()s.
@@ -105,7 +105,7 @@ warnings:
 	leading the realization that the CPU had failed.
 
 The RCU, RCU-sched, and RCU-tasks implementations have CPU stall warning.
-Note that SRCU does -not- have CPU stall warnings.  Please note that
+Note that SRCU does *not* have CPU stall warnings.  Please note that
 RCU only detects CPU stalls when there is a grace period in progress.
 No grace period, no CPU stall warnings.
 
@@ -145,7 +145,7 @@ CONFIG_RCU_CPU_STALL_TIMEOUT
 	this parameter is checked only at the beginning of a cycle.
 	So if you are 10 seconds into a 40-second stall, setting this
 	sysfs parameter to (say) five will shorten the timeout for the
-	-next- stall, or the following warning for the current stall
+	*next* stall, or the following warning for the current stall
 	(assuming the stall lasts long enough).  It will not affect the
 	timing of the next warning for the current stall.
 
@@ -202,7 +202,7 @@ causing stalls, and that the stall was affecting RCU-sched.  This message
 will normally be followed by stack dumps for each CPU.  Please note that
 PREEMPT_RCU builds can be stalled by tasks as well as by CPUs, and that
 the tasks will be indicated by PID, for example, "P3421".  It is even
-possible for an rcu_state stall to be caused by both CPUs -and- tasks,
+possible for an rcu_state stall to be caused by both CPUs *and* tasks,
 in which case the offending CPUs and tasks will all be called out in the list.
 
 CPU 2's "(3 GPs behind)" indicates that this CPU has not interacted with
-- 
2.31.1.189.g2e36527f23
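
The conversion made by this patch, "-foo-" to "*foo*", can be sketched
as a small script.  This is only an illustration and not the tooling
used for the patch: the regular expression and the fix_emphasis() name
are assumptions, and the pattern is deliberately conservative so that
hyphenated words such as "read-side" and list bullets are left alone.
Note that it turns "-only- -then-" into "*only* *then*", whereas the
patch merged that case by hand into "*only then*".

```python
import re

# Hypothetical pattern: a single word wrapped in hyphens (for example
# "-not-") that is not part of a longer hyphenated token.
DASH_EMPHASIS = re.compile(r"(?<![\w-])-([A-Za-z]+)-(?![\w-])")


def fix_emphasis(text: str) -> str:
    """Rewrite -word- as *word*, leaving ordinary hyphenated words alone."""
    return DASH_EMPHASIS.sub(r"*\1*", text)


if __name__ == "__main__":
    print(fix_emphasis("you -must- use the _rcu() variants"))
    print(fix_emphasis("an RCU read-side critical section"))
```

Any output of such a script would still need review by eye, since a
regular expression cannot tell intended emphasis from other uses of
hyphens in every case.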



* [PATCH rcu 2/5] Documentation/RCU: Fix nested inline markup
From: Paul E. McKenney @ 2021-07-21 20:08 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Akira Yokosawa,
	Paul E . McKenney

From: Akira Yokosawa <akiyks@gmail.com>

To avoid the ``foo`` markup inside the `bar`__ hyperlink marker,
use the "replace" directive [1].

This should restore the intended appearance of the link.

Tested with sphinx versions 1.7.9 and 2.4.4.

[1]: https://docutils.sourceforge.io/docs/ref/rst/directives.html#replace

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 Documentation/RCU/Design/Requirements/Requirements.rst | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst
index 38a39476fc248..45278e2974c04 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.rst
+++ b/Documentation/RCU/Design/Requirements/Requirements.rst
@@ -362,9 +362,8 @@ do_something_gp() uses rcu_dereference() to fetch from ``gp``:
       12 }
 
 The rcu_dereference() uses volatile casts and (for DEC Alpha) memory
-barriers in the Linux kernel. Should a `high-quality implementation of
-C11 ``memory_order_consume``
-[PDF] <http://www.rdrop.com/users/paulmck/RCU/consume.2015.07.13a.pdf>`__
+barriers in the Linux kernel. Should a |high-quality implementation of
+C11 memory_order_consume [PDF]|_
 ever appear, then rcu_dereference() could be implemented as a
 ``memory_order_consume`` load. Regardless of the exact implementation, a
 pointer fetched by rcu_dereference() may not be used outside of the
@@ -374,6 +373,9 @@ element has been passed from RCU to some other synchronization
 mechanism, most commonly locking or `reference
 counting <https://www.kernel.org/doc/Documentation/RCU/rcuref.txt>`__.
 
+.. |high-quality implementation of C11 memory_order_consume [PDF]| replace:: high-quality implementation of C11 ``memory_order_consume`` [PDF]
+.. _high-quality implementation of C11 memory_order_consume [PDF]: http://www.rdrop.com/users/paulmck/RCU/consume.2015.07.13a.pdf
+
 In short, updaters use rcu_assign_pointer() and readers use
 rcu_dereference(), and these two RCU API elements work together to
 ensure that readers have a consistent view of newly added data elements.
-- 
2.31.1.189.g2e36527f23
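
The substitution pattern used in this patch has three parts: a
"|label|_" reference in the running text, a "replace" definition that
carries the inline markup, and a hyperlink target with the same label.
A minimal sketch that generates the three pieces is shown below.  The
substitution_link() helper is hypothetical and is not part of docutils
or Sphinx, and it assumes the label contains no colon, which would
otherwise need quoting in the target line.

```python
from typing import Tuple


def substitution_link(label: str, markup: str, url: str) -> Tuple[str, str]:
    """Return (inline reference, definition lines) for a substitution link.

    label  -- plain-text name shared by the substitution and the target
    markup -- replacement text, which may contain inline markup
    url    -- destination of the hyperlink
    """
    if ":" in label:
        raise ValueError("this sketch does not handle ':' in the label")
    reference = f"|{label}|_"
    definitions = (
        f".. |{label}| replace:: {markup}\n"
        f".. _{label}: {url}"
    )
    return reference, definitions


if __name__ == "__main__":
    ref, defs = substitution_link(
        "high-quality implementation of C11 memory_order_consume [PDF]",
        "high-quality implementation of C11 ``memory_order_consume`` [PDF]",
        "http://www.rdrop.com/users/paulmck/RCU/consume.2015.07.13a.pdf",
    )
    print(ref)
    print(defs)
```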



* [PATCH rcu 3/5] docs: Fix a typo in Documentation/RCU/stallwarn.rst
From: Paul E. McKenney @ 2021-07-21 20:08 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Haocheng Xie, Paul E . McKenney

From: Haocheng Xie <xiehaocheng.cn@gmail.com>

Add the missing ')' in the documentation.

Signed-off-by: Haocheng Xie <xiehaocheng.cn@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 Documentation/RCU/stallwarn.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/RCU/stallwarn.rst b/Documentation/RCU/stallwarn.rst
index 1cc944aec46f2..f1c49c626e934 100644
--- a/Documentation/RCU/stallwarn.rst
+++ b/Documentation/RCU/stallwarn.rst
@@ -224,7 +224,7 @@ is the number that had executed since boot at the time that this CPU
 last noted the beginning of a grace period, which might be the current
 (stalled) grace period, or it might be some earlier grace period (for
 example, if the CPU might have been in dyntick-idle mode for an extended
-time period.  The number after the "/" is the number that have executed
+time period).  The number after the "/" is the number that have executed
 since boot until the current time.  If this latter number stays constant
 across repeated stall-warning messages, it is possible that RCU's softirq
 handlers are no longer able to execute on this CPU.  This can happen if
-- 
2.31.1.189.g2e36527f23
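
The paragraph touched by this patch describes the "softirq=" field of a
stall warning: the number before the "/" was sampled when the CPU last
noted the beginning of a grace period, and the number after it is the
count up to the current time.  A minimal sketch of checking whether the
latter number stays constant across repeated warnings follows.  The
function names are hypothetical, and the input format is assumed to
match the "softirq=1453/1455" example in stallwarn.rst.

```python
import re
from typing import Iterable, Optional, Tuple

SOFTIRQ_FIELD = re.compile(r"softirq=(\d+)/(\d+)")


def softirq_counts(line: str) -> Optional[Tuple[int, int]]:
    """Return (count at grace-period start, current count), or None."""
    match = SOFTIRQ_FIELD.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def softirq_looks_stuck(lines: Iterable[str]) -> bool:
    """True if the current count is identical in two or more warnings."""
    parsed = [softirq_counts(line) for line in lines]
    current = [counts[1] for counts in parsed if counts is not None]
    return len(current) >= 2 and len(set(current)) == 1


if __name__ == "__main__":
    first = "2-...: (3 GPs behind) idle=06c/0/0 softirq=1453/1455 fqs=0"
    print(softirq_counts(first))
```

A True result only suggests that RCU's softirq handlers might no longer
be running on that CPU.  It is a hint for further investigation, not a
diagnosis.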



* [PATCH rcu 4/5] rcu/doc: Add a quick quiz to explain further why we need smp_mb__after_unlock_lock()
From: Paul E. McKenney @ 2021-07-21 20:08 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Frederic Weisbecker,
	Neeraj Upadhyay, Uladzislau Rezki, Boqun Feng, Paul E . McKenney

From: Frederic Weisbecker <frederic@kernel.org>

Add some critical pieces of explanation that were missing, thanks to
Paul's explanations, to help readers understand why full memory barriers
are needed throughout the whole grace-period state machine.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
[ paulmck: Adjust code block per Akira Yokosawa. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 .../Tree-RCU-Memory-Ordering.rst              | 29 +++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
index 11cdab037bff6..eeb351296df11 100644
--- a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
+++ b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
@@ -112,6 +112,35 @@ on PowerPC.
 The ``smp_mb__after_unlock_lock()`` invocations prevent this
 ``WARN_ON()`` from triggering.
 
++-----------------------------------------------------------------------+
+| **Quick Quiz**:                                                       |
++-----------------------------------------------------------------------+
+| But the chain of rcu_node-structure lock acquisitions guarantees      |
+| that new readers will see all of the updater's pre-grace-period       |
+| accesses and also guarantees that the updater's post-grace-period     |
+| accesses will see all of the old reader's accesses.  So why do we     |
+| need all of those calls to smp_mb__after_unlock_lock()?               |
++-----------------------------------------------------------------------+
+| **Answer**:                                                           |
++-----------------------------------------------------------------------+
+| Because we must provide ordering for RCU's polling grace-period       |
+| primitives, for example, get_state_synchronize_rcu() and              |
+| poll_state_synchronize_rcu().  Consider this code::                   |
+|                                                                       |
+|  CPU 0                                     CPU 1                      |
+|  ----                                      ----                       |
+|  WRITE_ONCE(X, 1)                          WRITE_ONCE(Y, 1)           |
+|  g = get_state_synchronize_rcu()           smp_mb()                   |
+|  while (!poll_state_synchronize_rcu(g))    r1 = READ_ONCE(X)          |
+|          continue;                                                    |
+|  r0 = READ_ONCE(Y)                                                    |
+|                                                                       |
+| RCU guarantees that the outcome r0 == 0 && r1 == 0 will not           |
+| happen, even if CPU 1 is in an RCU extended quiescent state           |
+| (idle or offline) and thus won't interact directly with the RCU       |
+| core processing at all.                                               |
++-----------------------------------------------------------------------+
+
 This approach must be extended to include idle CPUs, which need
 RCU's grace-period memory ordering guarantee to extend to any
 RCU read-side critical sections preceding and following the current
-- 
2.31.1.189.g2e36527f23
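
The two-CPU example in the quick quiz is a store-buffering pattern: each
CPU writes one variable and then reads the other.  The toy model below
enumerates every interleaving under the assumption of sequential
consistency, that is, it assumes that both the grace-period wait on
CPU 0 and the smp_mb() on CPU 1 act as full barriers that keep each
CPU's write ordered before its read.  It does not model RCU, real
hardware, or the Linux-kernel memory model, and all names in it are
illustrative.

```python
from itertools import combinations

# Each CPU's program, in program order.  The polling loop on CPU 0 and
# the smp_mb() on CPU 1 are represented only by the assumption that the
# write stays ordered before the read.
CPU0 = [("write", "X"), ("read", "Y", "r0")]
CPU1 = [("write", "Y"), ("read", "X", "r1")]


def outcomes():
    """Return the set of (r0, r1) values reachable by any interleaving."""
    results = set()
    total = len(CPU0) + len(CPU1)
    # Choose which positions in the merged order belong to CPU 0.
    for cpu0_slots in combinations(range(total), len(CPU0)):
        ops0, ops1 = iter(CPU0), iter(CPU1)
        order = [next(ops0) if i in cpu0_slots else next(ops1)
                 for i in range(total)]
        memory = {"X": 0, "Y": 0}
        registers = {}
        for op in order:
            if op[0] == "write":
                memory[op[1]] = 1
            else:
                registers[op[2]] = memory[op[1]]
        results.add((registers["r0"], registers["r1"]))
    return results


if __name__ == "__main__":
    print(sorted(outcomes()))
```

With full ordering on both sides, r0 == 0 && r1 == 0 never appears among
the results.  If either side were allowed to reorder its write after its
read, that outcome would become reachable, which is why a weaker
guarantee from the grace-period machinery would not be enough here.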



* [PATCH rcu 5/5] doc: Update stallwarn.rst with recent changes
From: Paul E. McKenney @ 2021-07-21 20:08 UTC (permalink / raw)
  To: rcu
  Cc: linux-kernel, kernel-team, mingo, jiangshanlai, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, joel, Paul E. McKenney, Jeff Layton

This commit calls out the possibility of self-detected stalls, adds new
messages, and calls out the usefulness of stack traces.

Reported-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 Documentation/RCU/stallwarn.rst | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/Documentation/RCU/stallwarn.rst b/Documentation/RCU/stallwarn.rst
index f1c49c626e934..5036df24ae61c 100644
--- a/Documentation/RCU/stallwarn.rst
+++ b/Documentation/RCU/stallwarn.rst
@@ -189,8 +189,8 @@ rcupdate.rcu_task_stall_timeout
 Interpreting RCU's CPU Stall-Detector "Splats"
 ==============================================
 
-For non-RCU-tasks flavors of RCU, when a CPU detects that it is stalling,
-it will print a message similar to the following::
+For non-RCU-tasks flavors of RCU, when a CPU detects that some other
+CPU is stalling, it will print a message similar to the following::
 
 	INFO: rcu_sched detected stalls on CPUs/tasks:
 	2-...: (3 GPs behind) idle=06c/0/0 softirq=1453/1455 fqs=0
@@ -204,6 +204,8 @@ PREEMPT_RCU builds can be stalled by tasks as well as by CPUs, and that
 the tasks will be indicated by PID, for example, "P3421".  It is even
 possible for an rcu_state stall to be caused by both CPUs *and* tasks,
 in which case the offending CPUs and tasks will all be called out in the list.
+In some cases, CPUs will detect themselves stalling, which will result
+in a self-detected stall.
 
 CPU 2's "(3 GPs behind)" indicates that this CPU has not interacted with
 the RCU core for the past three grace periods.  In contrast, CPU 16's "(0
@@ -283,7 +285,8 @@ If the relevant grace-period kthread has been unable to run prior to
 the stall warning, as was the case in the "All QSes seen" line above,
 the following additional line is printed::
 
-	kthread starved for 23807 jiffies! g7075 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1 ->cpu=5
+	rcu_sched kthread starved for 23807 jiffies! g7075 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1 ->cpu=5
+	Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
 
 Starving the grace-period kthreads of CPU time can of course result
 in RCU CPU stall warnings even when all CPUs and tasks have passed
@@ -313,15 +316,21 @@ is the current ``TIMER_SOFTIRQ`` count on cpu 4.  If this value does not
 change on successive RCU CPU stall warnings, there is further reason to
 suspect a timer problem.
 
+These messages are usually followed by stack dumps of the CPUs and tasks
+involved in the stall.  These stack traces can help you locate the cause
+of the stall, keeping in mind that the CPU detecting the stall will have
+an interrupt frame that is mainly devoted to detecting the stall.
+
 
 Multiple Warnings From One Stall
 ================================
 
-If a stall lasts long enough, multiple stall-warning messages will be
-printed for it.  The second and subsequent messages are printed at
+If a stall lasts long enough, multiple stall-warning messages will
+be printed for it.  The second and subsequent messages are printed at
 longer intervals, so that the time between (say) the first and second
 message will be about three times the interval between the beginning
-of the stall and the first message.
+of the stall and the first message.  It can be helpful to compare the
+stack dumps for the different messages for the same stalled grace period.
 
 
 Stall Warnings for Expedited Grace Periods
-- 
2.31.1.189.g2e36527f23
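
The "kthread starved" line updated by this patch can be picked apart
mechanically when scanning console logs.  The sketch below is based only
on the example line in stallwarn.rst.  The parse_starved() name and the
dictionary keys are assumptions, and the flavor prefix is treated as
optional so that the older format without "rcu_sched" is also accepted.

```python
import re
from typing import Dict, Optional, Union

STARVED_LINE = re.compile(
    r"(?:(?P<flavor>\w+) )?kthread starved for (?P<jiffies>\d+) jiffies! "
    r"g(?P<g>-?\d+) f(?P<f>(?:0x)?[0-9a-fA-F]+) "
    r"(?P<gp_state_name>\w+)\((?P<gp_state>\d+)\) "
    r"->state=(?P<task_state>(?:0x)?[0-9a-fA-F]+) ->cpu=(?P<cpu>-?\d+)"
)


def parse_starved(line: str) -> Optional[Dict[str, Union[str, int, None]]]:
    """Split a 'kthread starved' line into fields, or return None."""
    match = STARVED_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the plainly decimal fields; leave the others as strings.
    for key in ("jiffies", "g", "gp_state", "cpu"):
        fields[key] = int(fields[key])
    return fields


if __name__ == "__main__":
    example = ("rcu_sched kthread starved for 23807 jiffies! g7075 f0x0 "
               "RCU_GP_WAIT_FQS(3) ->state=0x1 ->cpu=5")
    print(parse_starved(example))
```

Running this over the successive warnings for one stalled grace period
gives a quick way to see whether the starvation time keeps growing,
alongside comparing the stack dumps as suggested above.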

