LKML Archive on lore.kernel.org
* [PATCH v2 0/6] arm64: perf: Proper cap_user_time* support
@ 2020-07-15  2:05 Leo Yan
  2020-07-15  2:05 ` [PATCH v2 1/6] sched_clock: Expose struct clock_read_data Leo Yan
                   ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: Leo Yan @ 2020-07-15  2:05 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, Thomas Gleixner, Paul Cercueil,
	Ben Dooks (Codethink),
	Ahmed S. Darwish, Adrian Hunter, Kan Liang, linux-arm-kernel,
	linux-kernel
  Cc: Leo Yan

This patch set rebases Peter's patches for supporting the
cap_user_time/cap_user_time_short ABI on Arm64, and exports the Arm arch
timer counter related parameters from the kernel to the perf tool.

This version contains two changes compared to Peter's original patch
set [1]:

The first change is in the calculation of 'time_zero'; the old patch
used the formula:

  userpg->time_zero -= (rd->epoch_cyc * rd->mult) >> rd->shift;

Testing showed that if 'rd->epoch_cyc' is a large counter value, the
multiplication by 'rd->mult' can easily overflow.  This patch set
therefore switches to a quot/rem approach for the calculation, which
avoids the overflow:

  quot = rd->epoch_cyc >> rd->shift;
  rem = rd->epoch_cyc & (((u64)1 << rd->shift) - 1);
  ns = quot * rd->mult + ((rem * rd->mult) >> rd->shift);
  userpg->time_zero -= ns;
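
For illustration, here is a quick userspace model (Python, with 64-bit
truncation applied explicitly; the counter width and mult/shift values
below are made up but representative) of why the single multiply wraps
while the quot/rem form stays exact:

```python
MASK64 = (1 << 64) - 1

def cyc_to_ns_naive(cyc, mult, shift):
    # Single multiply, truncated to 64 bits as 'u64' arithmetic would do.
    return ((cyc * mult) & MASK64) >> shift

def cyc_to_ns_quot_rem(cyc, mult, shift):
    # Split cyc at the shift boundary so neither partial product
    # exceeds 64 bits (for a 56-bit counter and a 32-bit mult).
    quot = cyc >> shift
    rem = cyc & ((1 << shift) - 1)
    return quot * mult + ((rem * mult) >> shift)

# Representative (assumed) values: 56-bit counter near its maximum,
# mult/shift equivalent to 20 ns per cycle (a 50 MHz timer).
cyc, mult, shift = (1 << 56) - 1, 20 << 26, 26

assert cyc_to_ns_quot_rem(cyc, mult, shift) == cyc * 20   # exact
assert cyc_to_ns_naive(cyc, mult, shift) != cyc * 20      # wrapped
```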

The second change is the new patch 'tools headers UAPI: Update
tools's copy of linux/perf_event.h', which updates the perf tool
header so that the headers stay consistent between kernel and user
space.

This patch set has been rebased on the mainline kernel at the latest
commit 11ba468877bb ("Linux 5.8-rc5"); it has been verified with the
perf tool for Arm SPE timestamp enabling.  The patch set for Arm SPE
timestamp enabling will be sent out separately.


[1] https://lkml.org/lkml/2020/5/12/481


Leo Yan (1):
  tools headers UAPI: Update tools's copy of linux/perf_event.h

Peter Zijlstra (5):
  sched_clock: Expose struct clock_read_data
  arm64: perf: Implement correct cap_user_time
  arm64: perf: Only advertise cap_user_time for arch_timer
  perf: Add perf_event_mmap_page::cap_user_time_short ABI
  arm64: perf: Add cap_user_time_short

 arch/arm64/kernel/perf_event.c        | 59 ++++++++++++++++++++-------
 include/linux/sched_clock.h           | 28 +++++++++++++
 include/uapi/linux/perf_event.h       | 23 +++++++++--
 kernel/time/sched_clock.c             | 41 ++++++-------------
 tools/include/uapi/linux/perf_event.h | 23 +++++++++--
 5 files changed, 126 insertions(+), 48 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 1/6] sched_clock: Expose struct clock_read_data
  2020-07-15  2:05 [PATCH v2 0/6] arm64: perf: Proper cap_user_time* support Leo Yan
@ 2020-07-15  2:05 ` Leo Yan
  2020-07-15  5:56   ` Ahmed S. Darwish
  2020-07-15  2:05 ` [PATCH v2 2/6] arm64: perf: Implement correct cap_user_time Leo Yan
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Leo Yan @ 2020-07-15  2:05 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, Thomas Gleixner, Paul Cercueil,
	Ben Dooks (Codethink),
	Ahmed S. Darwish, Adrian Hunter, Kan Liang, linux-arm-kernel,
	linux-kernel

From: Peter Zijlstra <peterz@infradead.org>

In order to support perf_event_mmap_page::cap_time features, an
architecture needs, aside from a userspace readable counter register,
to expose the exact clock data so that userspace can convert the
counter register into a correct timestamp.

Provide struct clock_read_data and two (seqcount) helpers so that
architectures (arm64 in particular) can expose the numbers to userspace.
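
For readers unfamiliar with the latch technique, the following toy
model (Python, single writer assumed; a loose sketch of the two-copy
scheme indexed by 'seq & 1' below, not the kernel code itself) shows
how the low bit of the sequence count selects which copy readers use
while the writer updates the other one:

```python
class LatchModel:
    """Toy seqcount latch: two copies of the data; the low bit of the
    sequence count selects the copy that readers may safely use."""
    def __init__(self, data):
        self.seq = 0
        self.copies = [dict(data), dict(data)]

    def update(self, **changes):
        # Write side (single writer): flip readers to copy 1, update
        # copy 0, flip readers back, then bring copy 1 in sync.
        self.seq += 1                       # odd: readers use copies[1]
        self.copies[0].update(changes)
        self.seq += 1                       # even: readers use copies[0]
        self.copies[1].update(changes)

    def read_begin(self):
        # Mirrors sched_clock_read_begin(): snapshot seq, pick a copy.
        seq = self.seq
        return self.copies[seq & 1], seq

    def read_retry(self, seq):
        # Mirrors sched_clock_read_retry(): true if an update raced us.
        return seq != self.seq

latch = LatchModel({"epoch_ns": 0, "epoch_cyc": 0})
latch.update(epoch_ns=1000, epoch_cyc=64)
while True:
    rd, seq = latch.read_begin()
    snapshot = rd["epoch_ns"]
    if not latch.read_retry(seq):
        break
assert snapshot == 1000
```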

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/sched_clock.h | 28 +++++++++++++++++++++++++
 kernel/time/sched_clock.c   | 41 ++++++++++++-------------------------
 2 files changed, 41 insertions(+), 28 deletions(-)

diff --git a/include/linux/sched_clock.h b/include/linux/sched_clock.h
index 0bb04a96a6d4..528718e4ed52 100644
--- a/include/linux/sched_clock.h
+++ b/include/linux/sched_clock.h
@@ -6,6 +6,34 @@
 #define LINUX_SCHED_CLOCK
 
 #ifdef CONFIG_GENERIC_SCHED_CLOCK
+/**
+ * struct clock_read_data - data required to read from sched_clock()
+ *
+ * @epoch_ns:		sched_clock() value at last update
+ * @epoch_cyc:		Clock cycle value at last update.
+ * @sched_clock_mask:   Bitmask for two's complement subtraction of non 64bit
+ *			clocks.
+ * @read_sched_clock:	Current clock source (or dummy source when suspended).
+ * @mult:		Multipler for scaled math conversion.
+ * @shift:		Shift value for scaled math conversion.
+ *
+ * Care must be taken when updating this structure; it is read by
+ * some very hot code paths. It occupies <=40 bytes and, when combined
+ * with the seqcount used to synchronize access, comfortably fits into
+ * a 64 byte cache line.
+ */
+struct clock_read_data {
+	u64 epoch_ns;
+	u64 epoch_cyc;
+	u64 sched_clock_mask;
+	u64 (*read_sched_clock)(void);
+	u32 mult;
+	u32 shift;
+};
+
+extern struct clock_read_data *sched_clock_read_begin(unsigned int *seq);
+extern int sched_clock_read_retry(unsigned int seq);
+
 extern void generic_sched_clock_init(void);
 
 extern void sched_clock_register(u64 (*read)(void), int bits,
diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index fa3f800d7d76..0acaadc3156c 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -19,31 +19,6 @@
 
 #include "timekeeping.h"
 
-/**
- * struct clock_read_data - data required to read from sched_clock()
- *
- * @epoch_ns:		sched_clock() value at last update
- * @epoch_cyc:		Clock cycle value at last update.
- * @sched_clock_mask:   Bitmask for two's complement subtraction of non 64bit
- *			clocks.
- * @read_sched_clock:	Current clock source (or dummy source when suspended).
- * @mult:		Multipler for scaled math conversion.
- * @shift:		Shift value for scaled math conversion.
- *
- * Care must be taken when updating this structure; it is read by
- * some very hot code paths. It occupies <=40 bytes and, when combined
- * with the seqcount used to synchronize access, comfortably fits into
- * a 64 byte cache line.
- */
-struct clock_read_data {
-	u64 epoch_ns;
-	u64 epoch_cyc;
-	u64 sched_clock_mask;
-	u64 (*read_sched_clock)(void);
-	u32 mult;
-	u32 shift;
-};
-
 /**
  * struct clock_data - all data needed for sched_clock() (including
  *                     registration of a new clock source)
@@ -93,6 +68,17 @@ static inline u64 notrace cyc_to_ns(u64 cyc, u32 mult, u32 shift)
 	return (cyc * mult) >> shift;
 }
 
+struct clock_read_data *sched_clock_read_begin(unsigned int *seq)
+{
+	*seq = raw_read_seqcount(&cd.seq);
+	return cd.read_data + (*seq & 1);
+}
+
+int sched_clock_read_retry(unsigned int seq)
+{
+	return read_seqcount_retry(&cd.seq, seq);
+}
+
 unsigned long long notrace sched_clock(void)
 {
 	u64 cyc, res;
@@ -100,13 +86,12 @@ unsigned long long notrace sched_clock(void)
 	struct clock_read_data *rd;
 
 	do {
-		seq = raw_read_seqcount(&cd.seq);
-		rd = cd.read_data + (seq & 1);
+		rd = sched_clock_read_begin(&seq);
 
 		cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
 		      rd->sched_clock_mask;
 		res = rd->epoch_ns + cyc_to_ns(cyc, rd->mult, rd->shift);
-	} while (read_seqcount_retry(&cd.seq, seq));
+	} while (sched_clock_read_retry(seq));
 
 	return res;
 }
-- 
2.17.1



* [PATCH v2 2/6] arm64: perf: Implement correct cap_user_time
  2020-07-15  2:05 [PATCH v2 0/6] arm64: perf: Proper cap_user_time* support Leo Yan
  2020-07-15  2:05 ` [PATCH v2 1/6] sched_clock: Expose struct clock_read_data Leo Yan
@ 2020-07-15  2:05 ` Leo Yan
  2020-07-15  8:38   ` Peter Zijlstra
  2020-07-15  2:05 ` [PATCH v2 3/6] arm64: perf: Only advertise cap_user_time for arch_timer Leo Yan
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Leo Yan @ 2020-07-15  2:05 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, Thomas Gleixner, Paul Cercueil,
	Ben Dooks (Codethink),
	Ahmed S. Darwish, Adrian Hunter, Kan Liang, linux-arm-kernel,
	linux-kernel

From: Peter Zijlstra <peterz@infradead.org>

As reported by Leo, the existing implementation is broken when the
clock and counter don't intersect at 0.

Use the sched_clock's struct clock_read_data information to correctly
implement cap_user_time and cap_user_time_zero.

Note that the ARM64 counter is architecturally only guaranteed to be
56bit wide (implementations are allowed to be wider) and the existing
perf ABI cannot deal with wrap-around.

This implementation should also be faster than the old one, seeing how
we don't need to recompute mult and shift all the time.

[leoyan: Use quot/rem to convert cyc to ns to avoid overflow]

Reported-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/arm64/kernel/perf_event.c | 40 ++++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 4d7879484cec..35c2c737d4af 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -19,6 +19,7 @@
 #include <linux/of.h>
 #include <linux/perf/arm_pmu.h>
 #include <linux/platform_device.h>
+#include <linux/sched_clock.h>
 #include <linux/smp.h>
 
 /* ARMv8 Cortex-A53 specific event types. */
@@ -1165,28 +1166,49 @@ device_initcall(armv8_pmu_driver_init)
 void arch_perf_update_userpage(struct perf_event *event,
 			       struct perf_event_mmap_page *userpg, u64 now)
 {
-	u32 freq;
-	u32 shift;
+	struct clock_read_data *rd;
+	unsigned int seq;
+	u64 quot, rem, ns;
 
 	/*
 	 * Internal timekeeping for enabled/running/stopped times
 	 * is always computed with the sched_clock.
 	 */
-	freq = arch_timer_get_rate();
 	userpg->cap_user_time = 1;
+	userpg->cap_user_time_zero = 1;
+
+	do {
+		rd = sched_clock_read_begin(&seq);
+
+		userpg->time_mult = rd->mult;
+		userpg->time_shift = rd->shift;
+		userpg->time_zero = rd->epoch_ns;
+
+		/*
+		 * This isn't strictly correct, the ARM64 counter can be
+		 * 'short' and then we get funnies when it wraps. The correct
+		 * thing would be to extend the perf ABI with a cycle and mask
+		 * value, but because wrapping on ARM64 is very rare in
+		 * practise this 'works'.
+		 */
+		quot = rd->epoch_cyc >> rd->shift;
+		rem = rd->epoch_cyc & (((u64)1 << rd->shift) - 1);
+		ns = quot * rd->mult + ((rem * rd->mult) >> rd->shift);
+		userpg->time_zero -= ns;
+
+	} while (sched_clock_read_retry(seq));
+
+	userpg->time_offset = userpg->time_zero - now;
 
-	clocks_calc_mult_shift(&userpg->time_mult, &shift, freq,
-			NSEC_PER_SEC, 0);
 	/*
 	 * time_shift is not expected to be greater than 31 due to
 	 * the original published conversion algorithm shifting a
 	 * 32-bit value (now specifies a 64-bit value) - refer
 	 * perf_event_mmap_page documentation in perf_event.h.
 	 */
-	if (shift == 32) {
-		shift = 31;
+	if (userpg->time_shift == 32) {
+		userpg->time_shift = 31;
 		userpg->time_mult >>= 1;
 	}
-	userpg->time_shift = (u16)shift;
-	userpg->time_offset = -now;
+
 }
-- 
2.17.1



* [PATCH v2 3/6] arm64: perf: Only advertise cap_user_time for arch_timer
  2020-07-15  2:05 [PATCH v2 0/6] arm64: perf: Proper cap_user_time* support Leo Yan
  2020-07-15  2:05 ` [PATCH v2 1/6] sched_clock: Expose struct clock_read_data Leo Yan
  2020-07-15  2:05 ` [PATCH v2 2/6] arm64: perf: Implement correct cap_user_time Leo Yan
@ 2020-07-15  2:05 ` Leo Yan
  2020-07-15  2:05 ` [PATCH v2 4/6] perf: Add perf_event_mmap_page::cap_user_time_short ABI Leo Yan
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Leo Yan @ 2020-07-15  2:05 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, Thomas Gleixner, Paul Cercueil,
	Ben Dooks (Codethink),
	Ahmed S. Darwish, Adrian Hunter, Kan Liang, linux-arm-kernel,
	linux-kernel

From: Peter Zijlstra <peterz@infradead.org>

When sched_clock is running on anything other than arch_timer, don't
advertise cap_user_time*.

Requested-by: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/arm64/kernel/perf_event.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 35c2c737d4af..76f6afd28b48 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -13,6 +13,8 @@
 #include <asm/sysreg.h>
 #include <asm/virt.h>
 
+#include <clocksource/arm_arch_timer.h>
+
 #include <linux/acpi.h>
 #include <linux/clocksource.h>
 #include <linux/kvm_host.h>
@@ -1170,16 +1172,15 @@ void arch_perf_update_userpage(struct perf_event *event,
 	unsigned int seq;
 	u64 quot, rem, ns;
 
-	/*
-	 * Internal timekeeping for enabled/running/stopped times
-	 * is always computed with the sched_clock.
-	 */
-	userpg->cap_user_time = 1;
-	userpg->cap_user_time_zero = 1;
+	userpg->cap_user_time = 0;
+	userpg->cap_user_time_zero = 0;
 
 	do {
 		rd = sched_clock_read_begin(&seq);
 
+		if (rd->read_sched_clock != arch_timer_read_counter)
+			return;
+
 		userpg->time_mult = rd->mult;
 		userpg->time_shift = rd->shift;
 		userpg->time_zero = rd->epoch_ns;
@@ -1211,4 +1212,10 @@ void arch_perf_update_userpage(struct perf_event *event,
 		userpg->time_mult >>= 1;
 	}
 
+	/*
+	 * Internal timekeeping for enabled/running/stopped times
+	 * is always computed with the sched_clock.
+	 */
+	userpg->cap_user_time = 1;
+	userpg->cap_user_time_zero = 1;
 }
-- 
2.17.1



* [PATCH v2 4/6] perf: Add perf_event_mmap_page::cap_user_time_short ABI
  2020-07-15  2:05 [PATCH v2 0/6] arm64: perf: Proper cap_user_time* support Leo Yan
                   ` (2 preceding siblings ...)
  2020-07-15  2:05 ` [PATCH v2 3/6] arm64: perf: Only advertise cap_user_time for arch_timer Leo Yan
@ 2020-07-15  2:05 ` Leo Yan
  2020-07-15  2:05 ` [PATCH v2 5/6] arm64: perf: Add cap_user_time_short Leo Yan
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Leo Yan @ 2020-07-15  2:05 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, Thomas Gleixner, Paul Cercueil,
	Ben Dooks (Codethink),
	Ahmed S. Darwish, Adrian Hunter, Kan Liang, linux-arm-kernel,
	linux-kernel

From: Peter Zijlstra <peterz@infradead.org>

In order to support short clock counters, provide an ABI extension.

As a whole:

    u64 time, delta, cyc = read_cycle_counter();

+   if (cap_user_time_short)
+	cyc = time_cycle + ((cyc - time_cycle) & time_mask);

    delta = mul_u64_u32_shr(cyc, time_mult, time_shift);

    if (cap_user_time_zero)
	time = time_zero + delta;

    delta += time_offset;
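
Modelled in userspace, the recipe above might look like the following
sketch (Python, 64-bit wrap emulated with an explicit mask; all field
values are invented for illustration). The point is that the
cap_user_time_short correction re-extends a wrapped short counter
before the usual mult/shift scaling:

```python
MASK64 = (1 << 64) - 1

def read_timestamp(cyc, pg):
    # pg is a dict standing in for the mmap'ed perf_event_mmap_page.
    if pg["cap_user_time_short"]:
        # Re-extend the short counter: a correction on top of
        # cap_user_time, harmless if the counter never wrapped.
        cyc = (pg["time_cycles"]
               + ((cyc - pg["time_cycles"]) & pg["time_mask"])) & MASK64
    # Python ints don't overflow; real code would use a 64x32->96 bit
    # multiply such as mul_u64_u32_shr().
    delta = (cyc * pg["time_mult"]) >> pg["time_shift"]
    if pg["cap_user_time_zero"]:
        return (pg["time_zero"] + delta) & MASK64
    return None

# Invented example: 56-bit counter whose epoch sits just before a wrap.
pg = {
    "cap_user_time_short": 1,
    "cap_user_time_zero": 1,
    "time_cycles": (1 << 56) - 100,   # epoch cycle value near the wrap
    "time_mask": (1 << 56) - 1,
    "time_mult": 20 << 26,            # ~50 MHz timer: 20 ns per cycle
    "time_shift": 26,
    "time_zero": 0,
}

before = read_timestamp((1 << 56) - 50, pg)   # 50 cycles before the wrap
after = read_timestamp(50, pg)                # 100 cycles later, post-wrap
assert after > before                         # time stays monotonic
```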

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/uapi/linux/perf_event.h | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 7b2d6fc9e6ed..21a1edd08cbe 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -532,9 +532,10 @@ struct perf_event_mmap_page {
 				cap_bit0_is_deprecated	: 1, /* Always 1, signals that bit 0 is zero */
 
 				cap_user_rdpmc		: 1, /* The RDPMC instruction can be used to read counts */
-				cap_user_time		: 1, /* The time_* fields are used */
+				cap_user_time		: 1, /* The time_{shift,mult,offset} fields are used */
 				cap_user_time_zero	: 1, /* The time_zero field is used */
-				cap_____res		: 59;
+				cap_user_time_short	: 1, /* the time_{cycle,mask} fields are used */
+				cap_____res		: 58;
 		};
 	};
 
@@ -593,13 +594,29 @@ struct perf_event_mmap_page {
 	 *               ((rem * time_mult) >> time_shift);
 	 */
 	__u64	time_zero;
+
 	__u32	size;			/* Header size up to __reserved[] fields. */
+	__u32	__reserved_1;
+
+	/*
+	 * If cap_usr_time_short, the hardware clock is less than 64bit wide
+	 * and we must compute the 'cyc' value, as used by cap_usr_time, as:
+	 *
+	 *   cyc = time_cycles + ((cyc - time_cycles) & time_mask)
+	 *
+	 * NOTE: this form is explicitly chosen such that cap_usr_time_short
+	 *       is a correction on top of cap_usr_time, and code that doesn't
+	 *       know about cap_usr_time_short still works under the assumption
+	 *       the counter doesn't wrap.
+	 */
+	__u64	time_cycles;
+	__u64	time_mask;
 
 		/*
 		 * Hole for extension of the self monitor capabilities
 		 */
 
-	__u8	__reserved[118*8+4];	/* align to 1k. */
+	__u8	__reserved[116*8];	/* align to 1k. */
 
 	/*
 	 * Control data for the mmap() data buffer.
-- 
2.17.1



* [PATCH v2 5/6] arm64: perf: Add cap_user_time_short
  2020-07-15  2:05 [PATCH v2 0/6] arm64: perf: Proper cap_user_time* support Leo Yan
                   ` (3 preceding siblings ...)
  2020-07-15  2:05 ` [PATCH v2 4/6] perf: Add perf_event_mmap_page::cap_user_time_short ABI Leo Yan
@ 2020-07-15  2:05 ` Leo Yan
  2020-07-15  2:05 ` [PATCH v2 6/6] tools headers UAPI: Update tools's copy of linux/perf_event.h Leo Yan
  2020-07-15  5:17 ` [PATCH v2 0/6] arm64: perf: Proper cap_user_time* support Ahmed S. Darwish
  6 siblings, 0 replies; 23+ messages in thread
From: Leo Yan @ 2020-07-15  2:05 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, Thomas Gleixner, Paul Cercueil,
	Ben Dooks (Codethink),
	Ahmed S. Darwish, Adrian Hunter, Kan Liang, linux-arm-kernel,
	linux-kernel

From: Peter Zijlstra <peterz@infradead.org>

This completes the ARM64 cap_user_time support.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/arm64/kernel/perf_event.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 76f6afd28b48..1e0f15305f67 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -1174,6 +1174,7 @@ void arch_perf_update_userpage(struct perf_event *event,
 
 	userpg->cap_user_time = 0;
 	userpg->cap_user_time_zero = 0;
+	userpg->cap_user_time_short = 0;
 
 	do {
 		rd = sched_clock_read_begin(&seq);
@@ -1184,13 +1185,13 @@ void arch_perf_update_userpage(struct perf_event *event,
 		userpg->time_mult = rd->mult;
 		userpg->time_shift = rd->shift;
 		userpg->time_zero = rd->epoch_ns;
+		userpg->time_cycles = rd->epoch_cyc;
+		userpg->time_mask = rd->sched_clock_mask;
 
 		/*
-		 * This isn't strictly correct, the ARM64 counter can be
-		 * 'short' and then we get funnies when it wraps. The correct
-		 * thing would be to extend the perf ABI with a cycle and mask
-		 * value, but because wrapping on ARM64 is very rare in
-		 * practise this 'works'.
+		 * Subtract the cycle base, such that software that
+		 * doesn't know about cap_user_time_short still 'works'
+		 * assuming no wraps.
 		 */
 		quot = rd->epoch_cyc >> rd->shift;
 		rem = rd->epoch_cyc & (((u64)1 << rd->shift) - 1);
@@ -1218,4 +1219,5 @@ void arch_perf_update_userpage(struct perf_event *event,
 	 */
 	userpg->cap_user_time = 1;
 	userpg->cap_user_time_zero = 1;
+	userpg->cap_user_time_short = 1;
 }
-- 
2.17.1



* [PATCH v2 6/6] tools headers UAPI: Update tools's copy of linux/perf_event.h
  2020-07-15  2:05 [PATCH v2 0/6] arm64: perf: Proper cap_user_time* support Leo Yan
                   ` (4 preceding siblings ...)
  2020-07-15  2:05 ` [PATCH v2 5/6] arm64: perf: Add cap_user_time_short Leo Yan
@ 2020-07-15  2:05 ` Leo Yan
  2020-07-15  5:17 ` [PATCH v2 0/6] arm64: perf: Proper cap_user_time* support Ahmed S. Darwish
  6 siblings, 0 replies; 23+ messages in thread
From: Leo Yan @ 2020-07-15  2:05 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, Thomas Gleixner, Paul Cercueil,
	Ben Dooks (Codethink),
	Ahmed S. Darwish, Adrian Hunter, Kan Liang, linux-arm-kernel,
	linux-kernel
  Cc: Leo Yan

To get the changes in the commit:

  "perf: Add perf_event_mmap_page::cap_user_time_short ABI"

This update is a prerequisite for adding support for the ABI extension
for short clock counters.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/include/uapi/linux/perf_event.h | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 7b2d6fc9e6ed..21a1edd08cbe 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -532,9 +532,10 @@ struct perf_event_mmap_page {
 				cap_bit0_is_deprecated	: 1, /* Always 1, signals that bit 0 is zero */
 
 				cap_user_rdpmc		: 1, /* The RDPMC instruction can be used to read counts */
-				cap_user_time		: 1, /* The time_* fields are used */
+				cap_user_time		: 1, /* The time_{shift,mult,offset} fields are used */
 				cap_user_time_zero	: 1, /* The time_zero field is used */
-				cap_____res		: 59;
+				cap_user_time_short	: 1, /* the time_{cycle,mask} fields are used */
+				cap_____res		: 58;
 		};
 	};
 
@@ -593,13 +594,29 @@ struct perf_event_mmap_page {
 	 *               ((rem * time_mult) >> time_shift);
 	 */
 	__u64	time_zero;
+
 	__u32	size;			/* Header size up to __reserved[] fields. */
+	__u32	__reserved_1;
+
+	/*
+	 * If cap_usr_time_short, the hardware clock is less than 64bit wide
+	 * and we must compute the 'cyc' value, as used by cap_usr_time, as:
+	 *
+	 *   cyc = time_cycles + ((cyc - time_cycles) & time_mask)
+	 *
+	 * NOTE: this form is explicitly chosen such that cap_usr_time_short
+	 *       is a correction on top of cap_usr_time, and code that doesn't
+	 *       know about cap_usr_time_short still works under the assumption
+	 *       the counter doesn't wrap.
+	 */
+	__u64	time_cycles;
+	__u64	time_mask;
 
 		/*
 		 * Hole for extension of the self monitor capabilities
 		 */
 
-	__u8	__reserved[118*8+4];	/* align to 1k. */
+	__u8	__reserved[116*8];	/* align to 1k. */
 
 	/*
 	 * Control data for the mmap() data buffer.
-- 
2.17.1



* Re: [PATCH v2 0/6] arm64: perf: Proper cap_user_time* support
  2020-07-15  2:05 [PATCH v2 0/6] arm64: perf: Proper cap_user_time* support Leo Yan
                   ` (5 preceding siblings ...)
  2020-07-15  2:05 ` [PATCH v2 6/6] tools headers UAPI: Update tools's copy of linux/perf_event.h Leo Yan
@ 2020-07-15  5:17 ` Ahmed S. Darwish
  2020-07-15  6:29   ` Leo Yan
  6 siblings, 1 reply; 23+ messages in thread
From: Ahmed S. Darwish @ 2020-07-15  5:17 UTC (permalink / raw)
  To: Leo Yan
  Cc: Will Deacon, Mark Rutland, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, Thomas Gleixner, Paul Cercueil,
	Ben Dooks (Codethink),
	Adrian Hunter, Kan Liang, linux-arm-kernel, linux-kernel

On Wed, Jul 15, 2020 at 10:05:06AM +0800, Leo Yan wrote:
...
>
> In this version, there have two changes comparing to Peter's original
> patch set [1]:
>
...
>
> [1] https://lkml.org/lkml/2020/5/12/481
>

Nitpick: please avoid using https://lkml.org:

  1) It's a non-official external service
  2) The opaque URLs it uses drop the most important info for uniquely
     identifying e-mails: the Message-Id.

Thus if the site one day goes down, and at times it did, the reference
is almost gone forever.

Use "https://lkml.kernel.org/r/<message-id>". The link becomes:

  https://lkml.kernel.org/r/20200512124058.833263033@infradead.org

thanks,

--
Ahmed S. Darwish
Linutronix GmbH


* Re: [PATCH v2 1/6] sched_clock: Expose struct clock_read_data
  2020-07-15  2:05 ` [PATCH v2 1/6] sched_clock: Expose struct clock_read_data Leo Yan
@ 2020-07-15  5:56   ` Ahmed S. Darwish
  2020-07-15  6:54     ` Leo Yan
  2020-07-15  8:12     ` Peter Zijlstra
  0 siblings, 2 replies; 23+ messages in thread
From: Ahmed S. Darwish @ 2020-07-15  5:56 UTC (permalink / raw)
  To: Leo Yan, Peter Zijlstra
  Cc: Will Deacon, Mark Rutland, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Catalin Marinas,
	Thomas Gleixner, Paul Cercueil, Ben Dooks (Codethink),
	Adrian Hunter, Kan Liang, jogness, linux-arm-kernel,
	linux-kernel

On Wed, Jul 15, 2020 at 10:05:07AM +0800, Leo Yan wrote:
> From: Peter Zijlstra <peterz@infradead.org>
>
...
>
> Provide struct clock_read_data and two (seqcount) helpers so that
> architectures (arm64 in specific) can expose the numbers to userspace.
>
...
>
> +struct clock_read_data *sched_clock_read_begin(unsigned int *seq)
> +{
> +	*seq = raw_read_seqcount(&cd.seq);
> +	return cd.read_data + (*seq & 1);
> +}
> +
...

Hmm, this seqcount_t is actually a latch seqcount. I know the original
code also used raw_read_seqcount(), but while at it, let's use the
proper read API for seqcount_t latchers: raw_read_seqcount_latch().

raw_read_seqcount_latch() has no read memory barrier though, and a
suspicious claim that READ_ONCE() pairs with an smp_wmb() (??). But if
its implementation is wrong, let's fix it there instead.

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH


* Re: [PATCH v2 0/6] arm64: perf: Proper cap_user_time* support
  2020-07-15  5:17 ` [PATCH v2 0/6] arm64: perf: Proper cap_user_time* support Ahmed S. Darwish
@ 2020-07-15  6:29   ` Leo Yan
  0 siblings, 0 replies; 23+ messages in thread
From: Leo Yan @ 2020-07-15  6:29 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Will Deacon, Mark Rutland, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, Thomas Gleixner, Paul Cercueil,
	Ben Dooks (Codethink),
	Adrian Hunter, Kan Liang, linux-arm-kernel, linux-kernel

Hi Ahmed,

On Wed, Jul 15, 2020 at 07:17:15AM +0200, Ahmed S. Darwish wrote:
> On Wed, Jul 15, 2020 at 10:05:06AM +0800, Leo Yan wrote:
> ...
> >
> > In this version, there have two changes comparing to Peter's original
> > patch set [1]:
> >
> ...
> >
> > [1] https://lkml.org/lkml/2020/5/12/481
> >
> 
> Nitpick: please avoid using https://lkml.org:
> 
>   1) It's a non-official external service
>   2) The opaque URLs it uses drop the most important info for uniquely
>      identifying e-mails: the Message-Id.
> 
> Thus if the site one day goes down, and at times it did, the reference
> is almost gone forever.
> 
> Use "https://lkml.kernel.org/r/<message-id>". The link becomes:
> 
>   https://lkml.kernel.org/r/20200512124058.833263033@infradead.org

Thanks for sharing the good practice; I will follow this fashion for
links from now on.

Thanks,
Leo


* Re: [PATCH v2 1/6] sched_clock: Expose struct clock_read_data
  2020-07-15  5:56   ` Ahmed S. Darwish
@ 2020-07-15  6:54     ` Leo Yan
  2020-07-15  7:21       ` Ahmed S. Darwish
  2020-07-15  8:12     ` Peter Zijlstra
  1 sibling, 1 reply; 23+ messages in thread
From: Leo Yan @ 2020-07-15  6:54 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Peter Zijlstra, Will Deacon, Mark Rutland, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, Thomas Gleixner, Paul Cercueil,
	Ben Dooks (Codethink),
	Adrian Hunter, Kan Liang, jogness, linux-arm-kernel,
	linux-kernel

On Wed, Jul 15, 2020 at 07:56:50AM +0200, Ahmed S. Darwish wrote:
> On Wed, Jul 15, 2020 at 10:05:07AM +0800, Leo Yan wrote:
> > From: Peter Zijlstra <peterz@infradead.org>
> >
> ...
> >
> > Provide struct clock_read_data and two (seqcount) helpers so that
> > architectures (arm64 in specific) can expose the numbers to userspace.
> >
> ...
> >
> > +struct clock_read_data *sched_clock_read_begin(unsigned int *seq)
> > +{
> > +	*seq = raw_read_seqcount(&cd.seq);
> > +	return cd.read_data + (*seq & 1);
> > +}
> > +
> ...
> 
> Hmm, this seqcount_t is actually a latch seqcount. I know the original
> code also used raw_read_seqcount(), but while at it, let's use the
> proper read API for seqcount_t latchers: raw_read_seqcount_latch().

Good point.  To be honest, I cannot give a good judgement on memory
barrier related things myself :)

I read a bit of the documentation for the latch technique [1].
Compared to raw_read_seqcount_latch(), the function raw_read_seqcount()
contains an smp_rmb(); IIUC, the *read* memory barrier is used to
support kcsan.

The call flow for smp_rmb() and kcsan is as below:

  sched_clock_read_begin()
    `-> raw_read_seqcount()
          `-> smp_rmb()
                `-> kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX)

  sched_clock_read_retry()
    `-> read_seqcount_retry()
          `-> smp_rmb()
                `-> kcsan_atomic_next(0)

So the question is: should we support kcsan or not in this flow?

> raw_read_seqcount_latch() has no read memory barrier though, and a
> suspicious claim that READ_ONCE() pairs with an smp_wmb() (??). But if
> its implementation is wrong, let's fix it there instead.

I don't think we need to pair with smp_wmb(); since this is mainly
about data reading, an smp_rmb() would be sufficient [2].

Thanks,
Leo

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/seqlock.h?h=v5.8-rc5#n321
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/seqlock.h?h=v5.8-rc5#n373


* Re: [PATCH v2 1/6] sched_clock: Expose struct clock_read_data
  2020-07-15  6:54     ` Leo Yan
@ 2020-07-15  7:21       ` Ahmed S. Darwish
  0 siblings, 0 replies; 23+ messages in thread
From: Ahmed S. Darwish @ 2020-07-15  7:21 UTC (permalink / raw)
  To: Leo Yan
  Cc: Peter Zijlstra, Will Deacon, Mark Rutland, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, Thomas Gleixner, Paul Cercueil,
	Ben Dooks (Codethink),
	Adrian Hunter, Kan Liang, jogness, linux-arm-kernel,
	linux-kernel

On Wed, Jul 15, 2020 at 02:54:07PM +0800, Leo Yan wrote:
> On Wed, Jul 15, 2020 at 07:56:50AM +0200, Ahmed S. Darwish wrote:
> > On Wed, Jul 15, 2020 at 10:05:07AM +0800, Leo Yan wrote:
> > > From: Peter Zijlstra <peterz@infradead.org>
> > >
> > ...
> > >
> > > Provide struct clock_read_data and two (seqcount) helpers so that
> > > architectures (arm64 in specific) can expose the numbers to userspace.
> > >
> > ...
> > >
> > > +struct clock_read_data *sched_clock_read_begin(unsigned int *seq)
> > > +{
> > > +	*seq = raw_read_seqcount(&cd.seq);
> > > +	return cd.read_data + (*seq & 1);
> > > +}
> > > +
> > ...
> >
> > Hmm, this seqcount_t is actually a latch seqcount. I know the original
> > code also used raw_read_seqcount(), but while at it, let's use the
> > proper read API for seqcount_t latchers: raw_read_seqcount_latch().
>
> Good point.  To be honest, I think myself cannot give a good judgement
> for memory barrier related thing :)
>
> I read a bit the document for the latch technique [1], comparing to
> raw_read_seqcount_latch(), the function raw_read_seqcount() contains
> smp_rmb(), IIUC, the *read* memory barrier is used to support for
> kcsan.
>

The smp_rmb() has no relation whatsoever to KCSAN. It pairs with the
write memory barriers in the seqcount_t write path.

AFAIK, PeterZ is the author of this patch, so let's wait for his input
here.

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 1/6] sched_clock: Expose struct clock_read_data
  2020-07-15  5:56   ` Ahmed S. Darwish
  2020-07-15  6:54     ` Leo Yan
@ 2020-07-15  8:12     ` Peter Zijlstra
  2020-07-15  8:14       ` peterz
  1 sibling, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2020-07-15  8:12 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Leo Yan, Will Deacon, Mark Rutland, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, Thomas Gleixner, Paul Cercueil,
	Ben Dooks (Codethink),
	Adrian Hunter, Kan Liang, jogness, linux-arm-kernel,
	linux-kernel

On Wed, Jul 15, 2020 at 07:56:50AM +0200, Ahmed S. Darwish wrote:
> On Wed, Jul 15, 2020 at 10:05:07AM +0800, Leo Yan wrote:
> > From: Peter Zijlstra <peterz@infradead.org>
> >
> ...
> >
> > Provide struct clock_read_data and two (seqcount) helpers so that
> > architectures (arm64 in specific) can expose the numbers to userspace.
> >
> ...
> >
> > +struct clock_read_data *sched_clock_read_begin(unsigned int *seq)
> > +{
> > +	*seq = raw_read_seqcount(&cd.seq);
> > +	return cd.read_data + (*seq & 1);
> > +}
> > +
> ...
> 
> Hmm, this seqcount_t is actually a latch seqcount. I know the original
> code also used raw_read_seqcount(), but while at it, let's use the
> proper read API for seqcount_t latchers: raw_read_seqcount_latch().
> 
> raw_read_seqcount_latch() has no read memory barrier though, and a
> suspicious claim that READ_ONCE() pairs with an smp_wmb() (??). But if
> its implementation is wrong, let's fix it there instead.

It's supposed to be a dependent load, so READ_ONCE() is sufficient.
Except, of course, the C standard has other ideas, so a compiler is
allowed to wreck that, but they mostly don't :-)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 1/6] sched_clock: Expose struct clock_read_data
  2020-07-15  8:12     ` Peter Zijlstra
@ 2020-07-15  8:14       ` peterz
  2020-07-15  9:23         ` Ahmed S. Darwish
  0 siblings, 1 reply; 23+ messages in thread
From: peterz @ 2020-07-15  8:14 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Leo Yan, Will Deacon, Mark Rutland, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, Thomas Gleixner, Paul Cercueil,
	Ben Dooks (Codethink),
	Adrian Hunter, Kan Liang, jogness, linux-arm-kernel,
	linux-kernel

On Wed, Jul 15, 2020 at 10:12:22AM +0200, Peter Zijlstra wrote:
> On Wed, Jul 15, 2020 at 07:56:50AM +0200, Ahmed S. Darwish wrote:
> > On Wed, Jul 15, 2020 at 10:05:07AM +0800, Leo Yan wrote:
> > > From: Peter Zijlstra <peterz@infradead.org>
> > >
> > ...
> > >
> > > Provide struct clock_read_data and two (seqcount) helpers so that
> > > architectures (arm64 in specific) can expose the numbers to userspace.
> > >
> > ...
> > >
> > > +struct clock_read_data *sched_clock_read_begin(unsigned int *seq)
> > > +{
> > > +	*seq = raw_read_seqcount(&cd.seq);
> > > +	return cd.read_data + (*seq & 1);
> > > +}
> > > +
> > ...
> > 
> > Hmm, this seqcount_t is actually a latch seqcount. I know the original
> > code also used raw_read_seqcount(), but while at it, let's use the
> > proper read API for seqcount_t latchers: raw_read_seqcount_latch().
> > 
> > raw_read_seqcount_latch() has no read memory barrier though, and a
> > suspicious claim that READ_ONCE() pairs with an smp_wmb() (??). But if
> > its implementation is wrong, let's fix it there instead.
> 
> It's supposed to be a dependent load, so READ_ONCE() is sufficient.
> Except, of course, the C standard has other ideas, so a compiler is
> allowed to wreck that, but they mostly don't :-)

Also see:

  https://lkml.kernel.org/r/20200625085745.GD117543@hirez.programming.kicks-ass.net

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/6] arm64: perf: Implement correct cap_user_time
  2020-07-15  2:05 ` [PATCH v2 2/6] arm64: perf: Implement correct cap_user_time Leo Yan
@ 2020-07-15  8:38   ` Peter Zijlstra
  2020-07-15 15:39     ` Leo Yan
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2020-07-15  8:38 UTC (permalink / raw)
  To: Leo Yan
  Cc: Will Deacon, Mark Rutland, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Catalin Marinas,
	Thomas Gleixner, Paul Cercueil, Ben Dooks (Codethink),
	Ahmed S. Darwish, Adrian Hunter, Kan Liang, linux-arm-kernel,
	linux-kernel

On Wed, Jul 15, 2020 at 10:05:08AM +0800, Leo Yan wrote:

> [leoyan: Use quot/rem to convert cyc to ns to avoid overflow]

> +		quot = rd->epoch_cyc >> rd->shift;
> +		rem = rd->epoch_cyc & (((u64)1 << rd->shift) - 1);
> +		ns = quot * rd->mult + ((rem * rd->mult) >> rd->shift);
> +		userpg->time_zero -= ns;

I think we have mul_u64_u32_shr() for that.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 1/6] sched_clock: Expose struct clock_read_data
  2020-07-15  8:14       ` peterz
@ 2020-07-15  9:23         ` Ahmed S. Darwish
  2020-07-15  9:52           ` Peter Zijlstra
  2020-09-10 15:08           ` [tip: locking/core] time/sched_clock: Use raw_read_seqcount_latch() during suspend tip-bot2 for Ahmed S. Darwish
  0 siblings, 2 replies; 23+ messages in thread
From: Ahmed S. Darwish @ 2020-07-15  9:23 UTC (permalink / raw)
  To: peterz
  Cc: Leo Yan, Will Deacon, Mark Rutland, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, Thomas Gleixner, Paul Cercueil,
	Ben Dooks (Codethink),
	Adrian Hunter, Kan Liang, jogness, linux-arm-kernel,
	linux-kernel

On Wed, Jul 15, 2020 at 10:14:43AM +0200, peterz@infradead.org wrote:
> On Wed, Jul 15, 2020 at 10:12:22AM +0200, Peter Zijlstra wrote:
> > On Wed, Jul 15, 2020 at 07:56:50AM +0200, Ahmed S. Darwish wrote:
> > > On Wed, Jul 15, 2020 at 10:05:07AM +0800, Leo Yan wrote:
> > > > From: Peter Zijlstra <peterz@infradead.org>
> > > >
> > > ...
> > > >
> > > > Provide struct clock_read_data and two (seqcount) helpers so that
> > > > architectures (arm64 in specific) can expose the numbers to userspace.
> > > >
> > > ...
> > > >
> > > > +struct clock_read_data *sched_clock_read_begin(unsigned int *seq)
> > > > +{
> > > > +	*seq = raw_read_seqcount(&cd.seq);
> > > > +	return cd.read_data + (*seq & 1);
> > > > +}
> > > > +
> > > ...
> > >
> > > Hmm, this seqcount_t is actually a latch seqcount. I know the original
> > > code also used raw_read_seqcount(), but while at it, let's use the
> > > proper read API for seqcount_t latchers: raw_read_seqcount_latch().
> > >
> > > raw_read_seqcount_latch() has no read memory barrier though, and a
> > > suspicious claim that READ_ONCE() pairs with an smp_wmb() (??). But if
> > > its implementation is wrong, let's fix it there instead.
> >
> > It's supposed to be a dependent load, so READ_ONCE() is sufficient.
> > Except, of course, the C standard has other ideas, so a compiler is
> > allowed to wreck that, but they mostly don't :-)
>
> Also see:
>
>   https://lkml.kernel.org/r/20200625085745.GD117543@hirez.programming.kicks-ass.net

Oh, spot on.

Can we then please replace the raw_read_seqcount(), in the original
patch which started this discussion, with raw_read_seqcount_latch()?

I see that you already did something similar for timekeeping.c:
7fc26327b756 ("seqlock: Introduce raw_read_seqcount_latch()").

Confession time: I have an internal patch series which introduces
seqcount_latch_t. Having a standardized set of accessors for the
seqcount latch read and write paths will make everything much more
streamlined :-)

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 1/6] sched_clock: Expose struct clock_read_data
  2020-07-15  9:23         ` Ahmed S. Darwish
@ 2020-07-15  9:52           ` Peter Zijlstra
  2020-07-15 11:59             ` [PATCH] time/sched_clock: Use raw_read_seqcount_latch() Ahmed S. Darwish
  2020-09-10 15:08           ` [tip: locking/core] time/sched_clock: Use raw_read_seqcount_latch() during suspend tip-bot2 for Ahmed S. Darwish
  1 sibling, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2020-07-15  9:52 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Leo Yan, Will Deacon, Mark Rutland, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, Thomas Gleixner, Paul Cercueil,
	Ben Dooks (Codethink),
	Adrian Hunter, Kan Liang, jogness, linux-arm-kernel,
	linux-kernel

On Wed, Jul 15, 2020 at 11:23:45AM +0200, Ahmed S. Darwish wrote:
> 
> Can we then please replace the raw_read_seqcount(), in the original
> patch which started this discussion, with raw_read_seqcount_latch()?

Separate patch please, but ACK for making the change.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] time/sched_clock: Use raw_read_seqcount_latch()
  2020-07-15  9:52           ` Peter Zijlstra
@ 2020-07-15 11:59             ` Ahmed S. Darwish
  2020-07-15 15:29               ` Leo Yan
  0 siblings, 1 reply; 23+ messages in thread
From: Ahmed S. Darwish @ 2020-07-15 11:59 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra, Leo Yan
  Cc: Will Deacon, John Ogness, Sebastian A. Siewior, LKML, Ahmed S. Darwish

sched_clock uses seqcount_t latching to switch between two storage
places protected by the sequence counter. This allows it to have
interruptible, NMI-safe, seqcount_t write side critical sections.

Since 7fc26327b756 ("seqlock: Introduce raw_read_seqcount_latch()"),
raw_read_seqcount_latch() became the standardized way for seqcount_t
latch read paths. Due to the dependent load, it also has one read
memory barrier less than the currently used raw_read_seqcount() API.

Use raw_read_seqcount_latch() for the seqcount_t latch read path.

Link: https://lkml.kernel.org/r/20200625085745.GD117543@hirez.programming.kicks-ass.net
Link: https://lkml.kernel.org/r/20200715092345.GA231464@debian-buster-darwi.lab.linutronix.de
References: 1809bfa44e10 ("timers, sched/clock: Avoid deadlock during read from NMI")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
---
 kernel/time/sched_clock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index fa3f800d7d76..ea007928d681 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -100,7 +100,7 @@ unsigned long long notrace sched_clock(void)
 	struct clock_read_data *rd;

 	do {
-		seq = raw_read_seqcount(&cd.seq);
+		seq = raw_read_seqcount_latch(&cd.seq);
 		rd = cd.read_data + (seq & 1);

 		cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
--
2.20.1

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] time/sched_clock: Use raw_read_seqcount_latch()
  2020-07-15 11:59             ` [PATCH] time/sched_clock: Use raw_read_seqcount_latch() Ahmed S. Darwish
@ 2020-07-15 15:29               ` Leo Yan
  2020-07-15 15:58                 ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Leo Yan @ 2020-07-15 15:29 UTC (permalink / raw)
  To: Ahmed S. Darwish
  Cc: Thomas Gleixner, Peter Zijlstra, Will Deacon, John Ogness,
	Sebastian A. Siewior, LKML

Hi Peter, Ahmed,

On Wed, Jul 15, 2020 at 01:59:01PM +0200, Ahmed S. Darwish wrote:
> sched_clock uses seqcount_t latching to switch between two storage
> places protected by the sequence counter. This allows it to have
> interruptible, NMI-safe, seqcount_t write side critical sections.
> 
> Since 7fc26327b756 ("seqlock: Introduce raw_read_seqcount_latch()"),
> raw_read_seqcount_latch() became the standardized way for seqcount_t
> latch read paths. Due to the dependent load, it also has one read
> memory barrier less than the currently used raw_read_seqcount() API.
> 
> Use raw_read_seqcount_latch() for the seqcount_t latch read path.
> 
> Link: https://lkml.kernel.org/r/20200625085745.GD117543@hirez.programming.kicks-ass.net
> Link: https://lkml.kernel.org/r/20200715092345.GA231464@debian-buster-darwi.lab.linutronix.de
> References: 1809bfa44e10 ("timers, sched/clock: Avoid deadlock during read from NMI")
> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
> ---
>  kernel/time/sched_clock.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
> index fa3f800d7d76..ea007928d681 100644
> --- a/kernel/time/sched_clock.c
> +++ b/kernel/time/sched_clock.c
> @@ -100,7 +100,7 @@ unsigned long long notrace sched_clock(void)
>  	struct clock_read_data *rd;
> 
>  	do {
> -		seq = raw_read_seqcount(&cd.seq);
> +		seq = raw_read_seqcount_latch(&cd.seq);

Understand this is doing the same thing with __ktime_get_fast_ns() and
I saw Peter acked to make change for this.

Just want to confirm, since this patch introduces conflict with the
patch set "arm64: perf: Proper cap_user_time* support" [1], I should
rebase the patch set on top of this patch, right?

Thanks,
Leo

[1] https://patchwork.kernel.org/cover/11664031/

>  		rd = cd.read_data + (seq & 1);
> 
>  		cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
> --
> 2.20.1

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/6] arm64: perf: Implement correct cap_user_time
  2020-07-15  8:38   ` Peter Zijlstra
@ 2020-07-15 15:39     ` Leo Yan
  0 siblings, 0 replies; 23+ messages in thread
From: Leo Yan @ 2020-07-15 15:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Mark Rutland, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Catalin Marinas,
	Thomas Gleixner, Paul Cercueil, Ben Dooks (Codethink),
	Ahmed S. Darwish, Adrian Hunter, Kan Liang, linux-arm-kernel,
	linux-kernel

Hi Peter,

On Wed, Jul 15, 2020 at 10:38:00AM +0200, Peter Zijlstra wrote:
> On Wed, Jul 15, 2020 at 10:05:08AM +0800, Leo Yan wrote:
> 
> > [leoyan: Use quot/rem to convert cyc to ns to avoid overflow]
> 
> > +		quot = rd->epoch_cyc >> rd->shift;
> > +		rem = rd->epoch_cyc & (((u64)1 << rd->shift) - 1);
> > +		ns = quot * rd->mult + ((rem * rd->mult) >> rd->shift);
> > +		userpg->time_zero -= ns;
> 
> I think we have mul_u64_u32_shr() for that.

Will fix it in next spin.

Thanks for suggestion,
Leo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] time/sched_clock: Use raw_read_seqcount_latch()
  2020-07-15 15:29               ` Leo Yan
@ 2020-07-15 15:58                 ` Peter Zijlstra
  2020-07-16  5:22                   ` Leo Yan
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2020-07-15 15:58 UTC (permalink / raw)
  To: Leo Yan
  Cc: Ahmed S. Darwish, Thomas Gleixner, Will Deacon, John Ogness,
	Sebastian A. Siewior, LKML

On Wed, Jul 15, 2020 at 11:29:26PM +0800, Leo Yan wrote:
> Hi Peter, Ahmed,
> 
> On Wed, Jul 15, 2020 at 01:59:01PM +0200, Ahmed S. Darwish wrote:
> > sched_clock uses seqcount_t latching to switch between two storage
> > places protected by the sequence counter. This allows it to have
> > interruptible, NMI-safe, seqcount_t write side critical sections.
> > 
> > Since 7fc26327b756 ("seqlock: Introduce raw_read_seqcount_latch()"),
> > raw_read_seqcount_latch() became the standardized way for seqcount_t
> > latch read paths. Due to the dependent load, it also has one read
> > memory barrier less than the currently used raw_read_seqcount() API.
> > 
> > Use raw_read_seqcount_latch() for the seqcount_t latch read path.
> > 
> > Link: https://lkml.kernel.org/r/20200625085745.GD117543@hirez.programming.kicks-ass.net
> > Link: https://lkml.kernel.org/r/20200715092345.GA231464@debian-buster-darwi.lab.linutronix.de
> > References: 1809bfa44e10 ("timers, sched/clock: Avoid deadlock during read from NMI")
> > Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
> > ---
> >  kernel/time/sched_clock.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
> > index fa3f800d7d76..ea007928d681 100644
> > --- a/kernel/time/sched_clock.c
> > +++ b/kernel/time/sched_clock.c
> > @@ -100,7 +100,7 @@ unsigned long long notrace sched_clock(void)
> >  	struct clock_read_data *rd;
> > 
> >  	do {
> > -		seq = raw_read_seqcount(&cd.seq);
> > +		seq = raw_read_seqcount_latch(&cd.seq);
> 
> Understand this is doing the same thing with __ktime_get_fast_ns() and
> I saw Peter acked to make change for this.
> 
> Just want to confirm, since this patch introduces conflict with the
> patch set "arm64: perf: Proper cap_user_time* support" [1], I should
> rebase the patch set on top of this patch, right?

Or rebase this patch on top of yours and include it, either way.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] time/sched_clock: Use raw_read_seqcount_latch()
  2020-07-15 15:58                 ` Peter Zijlstra
@ 2020-07-16  5:22                   ` Leo Yan
  0 siblings, 0 replies; 23+ messages in thread
From: Leo Yan @ 2020-07-16  5:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ahmed S. Darwish, Thomas Gleixner, Will Deacon, John Ogness,
	Sebastian A. Siewior, LKML

Hi Peter, Ahemd,

On Wed, Jul 15, 2020 at 05:58:50PM +0200, Peter Zijlstra wrote:

[...]

> > > diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
> > > index fa3f800d7d76..ea007928d681 100644
> > > --- a/kernel/time/sched_clock.c
> > > +++ b/kernel/time/sched_clock.c
> > > @@ -100,7 +100,7 @@ unsigned long long notrace sched_clock(void)
> > >  	struct clock_read_data *rd;
> > > 
> > >  	do {
> > > -		seq = raw_read_seqcount(&cd.seq);
> > > +		seq = raw_read_seqcount_latch(&cd.seq);
> > 
> > Understand this is doing the same thing with __ktime_get_fast_ns() and
> > I saw Peter acked to make change for this.
> > 
> > Just want to confirm, since this patch introduces conflict with the
> > patch set "arm64: perf: Proper cap_user_time* support" [1], I should
> > rebase the patch set on top of this patch, right?
> 
> Or rebase this patch on top of yours and include it, either way.

Have rebased this patch and included it in the patch set v3 for
"arm64: perf: Proper cap_user_time* support" [1].

Thanks!
Leo

[1] https://lore.kernel.org/linux-arm-kernel/20200716051130.4359-3-leo.yan@linaro.org/T/#u

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [tip: locking/core] time/sched_clock: Use raw_read_seqcount_latch() during suspend
  2020-07-15  9:23         ` Ahmed S. Darwish
  2020-07-15  9:52           ` Peter Zijlstra
@ 2020-09-10 15:08           ` tip-bot2 for Ahmed S. Darwish
  1 sibling, 0 replies; 23+ messages in thread
From: tip-bot2 for Ahmed S. Darwish @ 2020-09-10 15:08 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Ahmed S. Darwish, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     58faf20a086bd34f91983609e26eac3d5fe76be3
Gitweb:        https://git.kernel.org/tip/58faf20a086bd34f91983609e26eac3d5fe76be3
Author:        Ahmed S. Darwish <a.darwish@linutronix.de>
AuthorDate:    Thu, 27 Aug 2020 13:40:37 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 10 Sep 2020 11:19:28 +02:00

time/sched_clock: Use raw_read_seqcount_latch() during suspend

sched_clock uses seqcount_t latching to switch between two storage
places protected by the sequence counter. This allows it to have
interruptible, NMI-safe, seqcount_t write side critical sections.

Since 7fc26327b756 ("seqlock: Introduce raw_read_seqcount_latch()"),
raw_read_seqcount_latch() became the standardized way for seqcount_t
latch read paths. Due to the dependent load, it has one read memory
barrier less than the currently used raw_read_seqcount() API.

Use raw_read_seqcount_latch() for the suspend path.

Commit aadd6e5caaac ("time/sched_clock: Use raw_read_seqcount_latch()")
missed changing that instance of raw_read_seqcount().

References: 1809bfa44e10 ("timers, sched/clock: Avoid deadlock during read from NMI")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200715092345.GA231464@debian-buster-darwi.lab.linutronix.de
---
 kernel/time/sched_clock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 1c03eec..8c6b5fe 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -258,7 +258,7 @@ void __init generic_sched_clock_init(void)
  */
 static u64 notrace suspended_sched_clock_read(void)
 {
-	unsigned int seq = raw_read_seqcount(&cd.seq);
+	unsigned int seq = raw_read_seqcount_latch(&cd.seq);
 
 	return cd.read_data[seq & 1].epoch_cyc;
 }

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, back to index

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-15  2:05 [PATCH v2 0/6] arm64: perf: Proper cap_user_time* support Leo Yan
2020-07-15  2:05 ` [PATCH v2 1/6] sched_clock: Expose struct clock_read_data Leo Yan
2020-07-15  5:56   ` Ahmed S. Darwish
2020-07-15  6:54     ` Leo Yan
2020-07-15  7:21       ` Ahmed S. Darwish
2020-07-15  8:12     ` Peter Zijlstra
2020-07-15  8:14       ` peterz
2020-07-15  9:23         ` Ahmed S. Darwish
2020-07-15  9:52           ` Peter Zijlstra
2020-07-15 11:59             ` [PATCH] time/sched_clock: Use raw_read_seqcount_latch() Ahmed S. Darwish
2020-07-15 15:29               ` Leo Yan
2020-07-15 15:58                 ` Peter Zijlstra
2020-07-16  5:22                   ` Leo Yan
2020-09-10 15:08           ` [tip: locking/core] time/sched_clock: Use raw_read_seqcount_latch() during suspend tip-bot2 for Ahmed S. Darwish
2020-07-15  2:05 ` [PATCH v2 2/6] arm64: perf: Implement correct cap_user_time Leo Yan
2020-07-15  8:38   ` Peter Zijlstra
2020-07-15 15:39     ` Leo Yan
2020-07-15  2:05 ` [PATCH v2 3/6] arm64: perf: Only advertise cap_user_time for arch_timer Leo Yan
2020-07-15  2:05 ` [PATCH v2 4/6] perf: Add perf_event_mmap_page::cap_user_time_short ABI Leo Yan
2020-07-15  2:05 ` [PATCH v2 5/6] arm64: perf: Add cap_user_time_short Leo Yan
2020-07-15  2:05 ` [PATCH v2 6/6] tools headers UAPI: Update tools's copy of linux/perf_event.h Leo Yan
2020-07-15  5:17 ` [PATCH v2 0/6] arm64: perf: Proper cap_user_time* support Ahmed S. Darwish
2020-07-15  6:29   ` Leo Yan

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lkml.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lkml.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lkml.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lkml.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lkml.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lkml.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lkml.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lkml.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lkml.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lkml.kernel.org/lkml/9 lkml/git/9.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lkml.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git