LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* RFC: revert 43fa5460fe60
@ 2015-02-24  0:43 Jörn Engel
  2015-02-24  0:46 ` Jörn Engel
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Jörn Engel @ 2015-02-24  0:43 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Peter Zijlstra, Ingo Molnar, Gregory Haskins, linux-kernel

Hello Steven!

I came across a silly problem that tempted me to revert 43fa5460fe60.
We had a high-priority realtime thread woken, TIF_NEED_RESCHED was set
for the running thread and the realtime thread didn't run for >2s.
Problem was a system call that mapped a ton of device memory and never
hit a cond_resched() point.  Obvious solution is to fix the long-running
system call.

Applying that solution quickly turns into a game of whack-a-mole.  Not
the worst game in the world and all those moles surely deserve a solid
hit on the head.  But what is annoying in my case is that I had plenty
of idle cpus during the entire time and the high-priority thread was
allowed to run anywhere.  So if the thread had been moved to a different
runqueue immediately there would have been zero delay.  Sure, the cache
is semi-cold or the migration may even be cross-package.  That is a
tradeoff we are willing to make and we set the cpumask explicitly that
way.  We want this thread to run quickly, anywhere.

So we could check for currently idle cpus when waking realtime threads.
If there are any, immediately move the woken thread over.  Maybe have a
check whether the running thread on the current cpu is in a syscall and
retain current behaviour if not.

Now this is not quite the same as reverting 43fa5460fe60 and I would
like to verify the idea before I spend time on a patch you would never
consider merging anyway.

Jörn

--
As more agents are added, systems become more reliable in the total-effort
case, but less reliable in the weakest-link case.  What are the implications?
Well, software companies schould hire more software testers and fewer (but
more competent) programmers.
-- Ross Anderson

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: RFC: revert 43fa5460fe60
  2015-02-24  0:43 RFC: revert 43fa5460fe60 Jörn Engel
@ 2015-02-24  0:46 ` Jörn Engel
  2015-02-24  7:30 ` RFC: revert 43fa5460fe60 ("sched: Try not to migrate higher priority RT tasks") Ingo Molnar
  2015-02-24 15:33 ` RFC: revert 43fa5460fe60 Steven Rostedt
  2 siblings, 0 replies; 6+ messages in thread
From: Jörn Engel @ 2015-02-24  0:46 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel

Looks like Gregory Haskins' email bounces.  Should have figured as much.

Jörn

--
The only real mistake is the one from which we learn nothing.
-- John Powell

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: RFC: revert 43fa5460fe60 ("sched: Try not to migrate higher priority RT tasks")
  2015-02-24  0:43 RFC: revert 43fa5460fe60 Jörn Engel
  2015-02-24  0:46 ` Jörn Engel
@ 2015-02-24  7:30 ` Ingo Molnar
  2015-02-24 15:33 ` RFC: revert 43fa5460fe60 Steven Rostedt
  2 siblings, 0 replies; 6+ messages in thread
From: Ingo Molnar @ 2015-02-24  7:30 UTC (permalink / raw)
  To: Jörn Engel
  Cc: Steven Rostedt, Peter Zijlstra, Ingo Molnar, Gregory Haskins,
	linux-kernel


* Jörn Engel <joern@purestorage.com> wrote:

> Hello Steven!
> 
> I came across a silly problem that tempted me to revert 43fa5460fe60.

So just to save everyone having to go to the Git tree to 
remember which patch that was:

  43fa5460fe60 ("sched: Try not to migrate higher priority RT tasks")

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: RFC: revert 43fa5460fe60
  2015-02-24  0:43 RFC: revert 43fa5460fe60 Jörn Engel
  2015-02-24  0:46 ` Jörn Engel
  2015-02-24  7:30 ` RFC: revert 43fa5460fe60 ("sched: Try not to migrate higher priority RT tasks") Ingo Molnar
@ 2015-02-24 15:33 ` Steven Rostedt
  2015-02-24 17:19   ` Jörn Engel
  2 siblings, 1 reply; 6+ messages in thread
From: Steven Rostedt @ 2015-02-24 15:33 UTC (permalink / raw)
  To: Jörn Engel
  Cc: Steven Rostedt, Peter Zijlstra, Ingo Molnar, Gregory Haskins,
	linux-kernel

On Tue, 24 Feb 2015 09:55:09 -0500
Jörn Engel <joern@purestorage.com> wrote:

> Hello Steven!
> 
> I came across a silly problem that tempted me to revert 43fa5460fe60.
> We had a high-priority realtime thread woken, TIF_NEED_RESCHED was set
> for the running thread and the realtime thread didn't run for >2s.
> Problem was a system call that mapped a ton of device memory and never
> hit a cond_resched() point.  Obvious solution is to fix the long-running
> system call.

I'm assuming that you are running a non PREEMPT kernel
(PREEMPT_VOLUNTARY doesn't count). If that's the case, then it is very
much likely that you will hit long latencies. Regardless of this change
or not (you will continue to need to whack-a-mole here).

> 
> Applying that solution quickly turns into a game of whack-a-mole.  Not
> the worst game in the world and all those moles surely deserve a solid
> hit on the head.  But what is annoying in my case is that I had plenty
> of idle cpus during the entire time and the high-priority thread was
> allowed to run anywhere.  So if the thread had been moved to a different
> runqueue immediately there would have been zero delay.  Sure, the cache
> is semi-cold or the migration may even be cross-package.  That is a
> tradeoff we are willing to make and we set the cpumask explicitly that
> way.  We want this thread to run quickly, anywhere.
> 
> So we could check for currently idle cpus when waking realtime threads.
> If there are any, immediately move the woken thread over.  Maybe have a
> check whether the running thread on the current cpu is in a syscall and
> retain current behaviour if not.
> 
> Now this is not quite the same as reverting 43fa5460fe60 and I would
> like to verify the idea before I spend time on a patch you would never
> consider merging anyway.

The thing is, the patch you want to revert helped a lot for
CONFIG_PREEMPT kernels. It doesn't make sense to gut a change for a non
CONFIG_PREEMPT kernel as that already suffers huge latencies.

This is considering that you are indeed not running a CONFIG_PREEMPT
kernel, as you stated that it doesn't preempt until it hits a
cond_resched().

-- Steve



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: RFC: revert 43fa5460fe60
  2015-02-24 15:33 ` RFC: revert 43fa5460fe60 Steven Rostedt
@ 2015-02-24 17:19   ` Jörn Engel
  2015-02-24 17:41     ` Steven Rostedt
  0 siblings, 1 reply; 6+ messages in thread
From: Jörn Engel @ 2015-02-24 17:19 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Steven Rostedt, Peter Zijlstra, Ingo Molnar, Gregory Haskins,
	linux-kernel

On Tue, Feb 24, 2015 at 10:33:44AM -0500, Steven Rostedt wrote:
> On Tue, 24 Feb 2015 09:55:09 -0500
> Jörn Engel <joern@purestorage.com> wrote:
> 
> > I came across a silly problem that tempted me to revert 43fa5460fe60.
> > We had a high-priority realtime thread woken, TIF_NEED_RESCHED was set
> > for the running thread and the realtime thread didn't run for >2s.
> > Problem was a system call that mapped a ton of device memory and never
> > hit a cond_resched() point.  Obvious solution is to fix the long-running
> > system call.
> 
> I'm assuming that you are running a non PREEMPT kernel
> (PREEMPT_VOLUNTARY doesn't count). If that's the case, then it is very
> much likely that you will hit long latencies. Regardless of this change
> or not (you will continue to need to whack-a-mole here).

Correct.

> > Applying that solution quickly turns into a game of whack-a-mole.  Not
> > the worst game in the world and all those moles surely deserve a solid
> > hit on the head.  But what is annoying in my case is that I had plenty
> > of idle cpus during the entire time and the high-priority thread was
> > allowed to run anywhere.  So if the thread had been moved to a different
> > runqueue immediately there would have been zero delay.  Sure, the cache
> > is semi-cold or the migration may even be cross-package.  That is a
> > tradeoff we are willing to make and we set the cpumask explicitly that
> > way.  We want this thread to run quickly, anywhere.
> > 
> > So we could check for currently idle cpus when waking realtime threads.
> > If there are any, immediately move the woken thread over.  Maybe have a
> > check whether the running thread on the current cpu is in a syscall and
> > retain current behaviour if not.
> > 
> > Now this is not quite the same as reverting 43fa5460fe60 and I would
> > like to verify the idea before I spend time on a patch you would never
> > consider merging anyway.
> 
> The thing is, the patch you want to revert helped a lot for
> CONFIG_PREEMPT kernels. It doesn't make sense to gut a change for a non
> CONFIG_PREEMPT kernel as that already suffers huge latencies.
> 
> This is considering that you are indeed not running a CONFIG_PREEMPT
> kernel, as you stated that it doesn't preempt until it hits a
> cond_resched().

Well, reverting was my first instinct, but for different reasons I think
it is wrong.  Simply reverting can result in the high priority thread
moving from one cpu with a running process to a different cpu with a
running process.  In both cases you may trip over a mole, so nothing
much is gained.

But if you know that the destination cpu is idle, you can avoid any
moles, give or take a small race window maybe.  The moles are still
present and you still need some debug tool to detect them and fix them
over time.  But as cpus increase over time, your chances of getting
lucky in spite of bad kernel code also increase.

Is that a worthwhile approach, at least for non PREEMPT?

Jörn

--
Schrödinger's cat is <BLINK>not</BLINK> dead.
-- Illiad

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: RFC: revert 43fa5460fe60
  2015-02-24 17:19   ` Jörn Engel
@ 2015-02-24 17:41     ` Steven Rostedt
  0 siblings, 0 replies; 6+ messages in thread
From: Steven Rostedt @ 2015-02-24 17:41 UTC (permalink / raw)
  To: Jörn Engel
  Cc: Steven Rostedt, Peter Zijlstra, Ingo Molnar, Gregory Haskins,
	linux-kernel

On Tue, 24 Feb 2015 09:19:06 -0800
Jörn Engel <joern@purestorage.com> wrote:

> Well, reverting was my first instinct, but for different reasons I think
> it is wrong.  Simply reverting can result in the high priority thread
> moving from one cpu with a running process to a different cpu with a
> running process.  In both cases you may trip over a mole, so nothing
> much is gained.
> 
> But if you know that the destination cpu is idle, you can avoid any
> moles, give or take a small race window maybe.  The moles are still
> present and you still need some debug tool to detect them and fix them
> over time.  But as cpus increase over time, your chances of getting
> lucky in spite of bad kernel code also increase.
> 
> Is that a worthwhile approach, at least for non PREEMPT?

I don't know. Could probably add it if CONFIG_PREEMPT is not set. Just
check if an idle CPU is available in that case and move it, as non
PREEMPT kernels will have long latencies anyway.

-- Steve

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-02-24 17:41 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-24  0:43 RFC: revert 43fa5460fe60 Jörn Engel
2015-02-24  0:46 ` Jörn Engel
2015-02-24  7:30 ` RFC: revert 43fa5460fe60 ("sched: Try not to migrate higher priority RT tasks") Ingo Molnar
2015-02-24 15:33 ` RFC: revert 43fa5460fe60 Steven Rostedt
2015-02-24 17:19   ` Jörn Engel
2015-02-24 17:41     ` Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).