LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jason Cooper <jason@lakedaemon.net>,
	Russell King <linux@arm.linux.org.uk>,
	Will Deacon <will.deacon@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Marc Zyngier <marc.zyngier@arm.com>,
	Stephen Boyd <sboyd@codeaurora.org>,
	John Stultz <john.stultz@linaro.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, patches@linaro.org,
	linaro-kernel@lists.linaro.org,
	Sumit Semwal <sumit.semwal@linaro.org>,
	Dirk Behme <dirk.behme@de.bosch.com>,
	Daniel Drake <drake@endlessm.com>,
	Dmitry Pervushin <dpervushin@gmail.com>,
	Tim Sander <tim@krieglstein.org>
Subject: Re: [PATCH 3.19-rc2 v15 4/8] sched_clock: Avoid deadlock during read from NMI
Date: Sat, 24 Jan 2015 23:40:46 +0100 (CET)	[thread overview]
Message-ID: <alpine.DEB.2.11.1501242248170.5526@nanos> (raw)
In-Reply-To: <1422022952-31552-5-git-send-email-daniel.thompson@linaro.org>

On Fri, 23 Jan 2015, Daniel Thompson wrote:
> This patch fixes that problem by providing banked clock data in a
> similar manner to Thomas Gleixner's 4396e058c52e("timekeeping: Provide
> fast and NMI safe access to CLOCK_MONOTONIC").

By some definition of similar.

> -struct clock_data {
> -	ktime_t wrap_kt;
> +struct clock_data_banked {
>  	u64 epoch_ns;
>  	u64 epoch_cyc;
> -	seqcount_t seq;
> -	unsigned long rate;
> +	u64 (*read_sched_clock)(void);
> +	u64 sched_clock_mask;
>  	u32 mult;
>  	u32 shift;
>  	bool suspended;
>  };
>  
> +struct clock_data {
> +	ktime_t wrap_kt;
> +	seqcount_t seq;
> +	unsigned long rate;
> +	struct clock_data_banked bank[2];
> +};

....

> -static u64 __read_mostly (*read_sched_clock)(void) = jiffy_sched_clock_read;
> +static struct clock_data cd = {
> +	.bank = {
> +		[0] = {
> +			.mult	= NSEC_PER_SEC / HZ,
> +			.read_sched_clock = jiffy_sched_clock_read,
> +		},
> +	},
> +};

If you had carefully studied the changes which made it possible to do
the nmi safe clock monotonic accessor then you'd had noticed that I
went a great way to optimize the cache foot print first and then add
this new fangled thing.

So in the first place 'cd' lacks ____cacheline_aligned. It should have
been there before, but that's a different issue. You should have
noticed.

Secondly, I don't see any hint that you actually thought about the
cache foot print of the result struct clock_data. 

struct clock_data {
	ktime_t wrap_kt;
	seqcount_t seq;
	unsigned long rate;
	struct clock_data_banked bank[2];
};

wrap_kt and rate are completely irrelevant for the hotpath. The whole
thing up to the last member of bank[0] still fits into 64 byte on both
32 and 64bit, but that's not by design and not documented so anyone
who is aware of cache foot print issues will go WTF when the first
member of a hot path data structure is completely irrelevant.

>  static inline u64 notrace cyc_to_ns(u64 cyc, u32 mult, u32 shift)
>  {
> @@ -58,50 +65,82 @@ static inline u64 notrace cyc_to_ns(u64 cyc, u32 mult, u32 shift)
>  
>  unsigned long long notrace sched_clock(void)
>  {
> -	u64 epoch_ns;
> -	u64 epoch_cyc;
>  	u64 cyc;
>  	unsigned long seq;
> -
> -	if (cd.suspended)
> -		return cd.epoch_ns;
> +	struct clock_data_banked *b;
> +	u64 res;

So we now have

  	u64 cyc;
  	unsigned long seq;
	struct clock_data_banked *b;
	u64 res;

Let me try a different version of that:

	struct clock_data_banked *b;
  	unsigned long seq;
	u64 res, cyc;

Can you spot the difference in the reading experience?
 
>  	do {
> -		seq = raw_read_seqcount_begin(&cd.seq);
> -		epoch_cyc = cd.epoch_cyc;
> -		epoch_ns = cd.epoch_ns;
> +		seq = raw_read_seqcount(&cd.seq);
> +		b = cd.bank + (seq & 1);
> +		if (b->suspended) {
> +			res = b->epoch_ns;

So now we have read_sched_clock as a pointer in the bank. Why do you
still need b->suspended?

What's wrong with setting b->read_sched_clock to NULL at suspend and
restore the proper pointer on resume and use that as a conditional?
 
It would allow the compiler to generate better code, but that's
obviously not the goal here. Darn, this is hot path code and not some
random driver.

> +		} else {
> +			cyc = b->read_sched_clock();
> +			cyc = (cyc - b->epoch_cyc) & b->sched_clock_mask;
> +			res = b->epoch_ns + cyc_to_ns(cyc, b->mult, b->shift);

It would allow the following optimization as well:

   	 res = b->epoch_ns;
	 if (b->read_sched_clock) {
	    	...
	 }

If you think that compilers are smart enough to figure all that out
for you, you might get surprised. The more clear your code is the
better is the chance that the compiler gets it right. We have seen the
opposite of that as well, but that's clearly a compiler bug.

> +/*
> + * Start updating the banked clock data.
> + *
> + * sched_clock will never observe mis-matched data even if called from
> + * an NMI. We do this by maintaining an odd/even copy of the data and
> + * steering sched_clock to one or the other using a sequence counter.
> + * In order to preserve the data cache profile of sched_clock as much
> + * as possible the system reverts back to the even copy when the update
> + * completes; the odd copy is used *only* during an update.
> + *
> + * The caller is responsible for avoiding simultaneous updates.
> + */
> +static struct clock_data_banked *update_bank_begin(void)
> +{
> +	/* update the backup (odd) bank and steer readers towards it */
> +	memcpy(cd.bank + 1, cd.bank, sizeof(struct clock_data_banked));
> +	raw_write_seqcount_latch(&cd.seq);
> +
> +	return cd.bank;
> +}
> +
> +/*
> + * Finalize update of banked clock data.
> + *
> + * This is just a trivial switch back to the primary (even) copy.
> + */
> +static void update_bank_end(void)
> +{
> +	raw_write_seqcount_latch(&cd.seq);
>  }

What's wrong with having a master struct

struct master_data {
	struct clock_data_banked master_data;
	ktime_t wrap_kt;
	unsigned long rate;
	u64 (*real_read_sched_clock)(void);
};

Then you only have to care about the serialization of the master_data
update and then the hotpath data update would be the same simple
function as update_fast_timekeeper(). And it would have the same
ordering scheme and aside of that the resulting code would be simpler,
more intuitive to read and I'm pretty sure faster.

Thanks,

	tglx



  reply	other threads:[~2015-01-24 22:41 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-01-23 14:22 [PATCH 3.19-rc2 v15 0/8] irq/arm: Implement arch_trigger_all_cpu_backtrace Daniel Thompson
2015-01-23 14:22 ` [PATCH 3.19-rc2 v15 1/8] irqchip: gic: Optimize locking in gic_raise_softirq Daniel Thompson
2015-01-23 14:22 ` [PATCH 3.19-rc2 v15 2/8] irqchip: gic: Make gic_raise_softirq FIQ-safe Daniel Thompson
2015-01-23 14:22 ` [PATCH 3.19-rc2 v15 3/8] irqchip: gic: Introduce plumbing for IPI FIQ Daniel Thompson
2015-01-23 14:22 ` [PATCH 3.19-rc2 v15 4/8] sched_clock: Avoid deadlock during read from NMI Daniel Thompson
2015-01-24 22:40   ` Thomas Gleixner [this message]
2015-01-26 20:28     ` Daniel Thompson
2015-01-23 14:22 ` [PATCH 3.19-rc2 v15 5/8] printk: Simple implementation for NMI backtracing Daniel Thompson
2015-01-24 21:44   ` Thomas Gleixner
2015-01-26 17:21     ` Daniel Thompson
2015-01-23 14:22 ` [PATCH 3.19-rc2 v15 6/8] x86/nmi: Use common printk functions Daniel Thompson
2015-01-23 14:22 ` [PATCH 3.19-rc2 v15 7/8] ARM: Add support for on-demand backtrace of other CPUs Daniel Thompson
2015-01-23 14:22 ` [PATCH 3.19-rc2 v15 8/8] ARM: Fix on-demand backtrace triggered by IRQ Daniel Thompson
2015-02-03 19:06 ` [PATCH 3.19-rc6 v16 0/6] irq/arm: Implement arch_trigger_all_cpu_backtrace Daniel Thompson
2015-02-03 19:06   ` [PATCH 3.19-rc6 v16 1/6] irqchip: gic: Optimize locking in gic_raise_softirq Daniel Thompson
2015-02-26 20:31     ` Nicolas Pitre
2015-02-26 21:05       ` Daniel Thompson
2015-02-26 21:33         ` Nicolas Pitre
2015-02-03 19:06   ` [PATCH 3.19-rc6 v16 2/6] irqchip: gic: Make gic_raise_softirq FIQ-safe Daniel Thompson
2015-02-26 20:33     ` Nicolas Pitre
2015-02-03 19:06   ` [PATCH 3.19-rc6 v16 3/6] irqchip: gic: Introduce plumbing for IPI FIQ Daniel Thompson
2015-02-03 19:06   ` [PATCH 3.19-rc6 v16 4/6] printk: Simple implementation for NMI backtracing Daniel Thompson
2015-02-03 19:06   ` [PATCH 3.19-rc6 v16 5/6] x86/nmi: Use common printk functions Daniel Thompson
2015-02-03 19:06   ` [PATCH 3.19-rc6 v16 6/6] ARM: Add support for on-demand backtrace of other CPUs Daniel Thompson
2015-03-04 10:12 ` [PATCH 4.0-rc1 v17 0/6] irq/arm: Implement arch_trigger_all_cpu_backtrace Daniel Thompson
2015-03-04 10:12   ` [PATCH 4.0-rc1 v17 1/6] irqchip: gic: Optimize locking in gic_raise_softirq Daniel Thompson
2015-03-04 10:12   ` [PATCH 4.0-rc1 v17 2/6] irqchip: gic: Make gic_raise_softirq FIQ-safe Daniel Thompson
2015-03-04 10:12   ` [PATCH 4.0-rc1 v17 3/6] irqchip: gic: Introduce plumbing for IPI FIQ Daniel Thompson
2015-03-04 10:12   ` [PATCH 4.0-rc1 v17 4/6] printk: Simple implementation for NMI backtracing Daniel Thompson
2015-03-04 16:13     ` Joe Perches
2015-03-04 16:20       ` Steven Rostedt
2015-03-04 16:33         ` Daniel Thompson
2015-03-04 17:21           ` Joe Perches
2015-03-05 12:11             ` Daniel Thompson
2015-03-04 10:12   ` [PATCH 4.0-rc1 v17 5/6] x86/nmi: Use common printk functions Daniel Thompson
2015-03-05  0:54     ` Ingo Molnar
2015-03-05 12:29       ` Daniel Thompson
2015-03-05 19:46         ` Ingo Molnar
2015-03-06 19:02           ` Daniel Thompson
2015-03-04 10:12   ` [PATCH 4.0-rc1 v17 6/6] ARM: Add support for on-demand backtrace of other CPUs Daniel Thompson
2015-03-12 13:39 ` [PATCH 4.0-rc2 v18 0/6] irq/arm: Implement arch_trigger_all_cpu_backtrace Daniel Thompson
2015-03-12 13:39   ` [PATCH 4.0-rc2 v18 1/6] irqchip: gic: Optimize locking in gic_raise_softirq Daniel Thompson
2015-03-12 13:39   ` [PATCH 4.0-rc2 v18 2/6] irqchip: gic: Make gic_raise_softirq FIQ-safe Daniel Thompson
2015-03-12 13:39   ` [PATCH 4.0-rc2 v18 3/6] irqchip: gic: Introduce plumbing for IPI FIQ Daniel Thompson
2015-03-12 13:39   ` [PATCH 4.0-rc2 v18 4/6] printk: Simple implementation for NMI backtracing Daniel Thompson
2015-03-12 13:39   ` [PATCH 4.0-rc2 v18 5/6] x86/nmi: Use common printk functions Daniel Thompson
2015-03-12 13:39   ` [PATCH 4.0-rc2 v18 6/6] ARM: Add support for on-demand backtrace of other CPUs Daniel Thompson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.11.1501242248170.5526@nanos \
    --to=tglx@linutronix.de \
    --cc=catalin.marinas@arm.com \
    --cc=daniel.thompson@linaro.org \
    --cc=dirk.behme@de.bosch.com \
    --cc=dpervushin@gmail.com \
    --cc=drake@endlessm.com \
    --cc=jason@lakedaemon.net \
    --cc=john.stultz@linaro.org \
    --cc=linaro-kernel@lists.linaro.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@arm.linux.org.uk \
    --cc=marc.zyngier@arm.com \
    --cc=patches@linaro.org \
    --cc=rostedt@goodmis.org \
    --cc=sboyd@codeaurora.org \
    --cc=sumit.semwal@linaro.org \
    --cc=tim@krieglstein.org \
    --cc=will.deacon@arm.com \
    --subject='Re: [PATCH 3.19-rc2 v15 4/8] sched_clock: Avoid deadlock during read from NMI' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).