LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [git pull for -mm] CPU isolation extensions (updated2)
@ 2008-02-12  4:10 Max Krasnyansky
  2008-02-12  6:41 ` Nick Piggin
  2008-02-12 18:59 ` Peter Zijlstra
  0 siblings, 2 replies; 12+ messages in thread
From: Max Krasnyansky @ 2008-02-12  4:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: torvalds, LKML, Ingo Molnar, Paul Jackson, Peter Zijlstra,
	gregkh, Rusty Russell

Andrew, it looks like Linus decided not to pull this stuff.
Can we please put it into -mm then?

My tree is here
	git://git.kernel.org/pub/scm/linux/kernel/git/maxk/cpuisol-2.6.git
Please use the 'master' branch (or 'for-linus'; they are identical).

There are no changes since last time I sent it. Details below.
Patches were sent out two days ago. I can resend them if needed.

Thanx
Max

----
 
Diffstat:
 Documentation/ABI/testing/sysfs-devices-system-cpu |   41 +++++++
 Documentation/cpu-isolation.txt                    |  113 +++++++++++++++++++++
 arch/x86/Kconfig                                   |    1 
 arch/x86/kernel/genapic_flat_64.c                  |    4 
 drivers/base/cpu.c                                 |   48 ++++++++
 include/linux/cpumask.h                            |    3 
 kernel/Kconfig.cpuisol                             |   42 +++++++
 kernel/Makefile                                    |    4 
 kernel/cpu.c                                       |   54 ++++++++++
 kernel/sched.c                                     |   36 ------
 kernel/stop_machine.c                              |    8 +
 kernel/workqueue.c                                 |   30 ++++-
 12 files changed, 337 insertions(+), 47 deletions(-)

This addresses all of Andrew's comments on the last submission. Details here:
   http://marc.info/?l=linux-kernel&m=120236394012766&w=2

There are no code changes since last time, besides a minor fix moving an on-stack array 
to __initdata as suggested by Andrew. The rest is just documentation updates. 

List of commits
   cpuisol: Make cpu isolation configurable and export isolated map
   cpuisol: Do not route IRQs to the CPUs isolated at boot
   cpuisol: Do not schedule workqueues on the isolated CPUs
   cpuisol: Move on-stack array used for boot cmd parsing into __initdata
   cpuisol: Documentation updates
   cpuisol: Minor updates to the Kconfig options
   cpuisol: Do not halt isolated CPUs with Stop Machine

As suggested by Ingo, I'm CC'ing everyone who is even remotely connected/affected ;-)
 
Ingo, Peter - Scheduler.
   There are _no_ changes in this area besides moving the cpu_*_map maps from kernel/sched.c 
   to kernel/cpu.c.
    
Paul - Cpuset
   Again there are _no_ changes in this area.
   For reasons why cpuset is not the right mechanism for cpu isolation, see this thread:
      http://marc.info/?l=linux-kernel&m=120180692331461&w=2

Rusty - Stop machine.
   After doing a bunch of testing over the last three days I actually downgraded the stop machine 
   changes from [highly experimental] to simply [experimental]. Please see this thread 
   for more info: http://marc.info/?l=linux-kernel&m=120243837206248&w=2
   Short story is that I ran several insmod/rmmod workloads on live multi-core boxes 
   with stop machine _completely_ disabled and did not see any issues. Rusty has not had
   a chance to reply yet; I'm hoping that we'll be able to make "stop machine" completely
   optional for some configurations.

Greg - ABI documentation.
   Nothing interesting here. I simply added 
	Documentation/ABI/testing/sysfs-devices-system-cpu
   and documented some of the attributes exposed in there.
   Suggested by Andrew.

I believe this is ready for inclusion, and my impression is that Andrew is ok with that. 
Most changes are very simple and do not affect existing behavior. As I mentioned before, I've 
been using the workqueue and stop machine changes in production for a couple of years now and have 
high confidence in them. Still, they are marked as experimental for now, just to be safe.

My original explanation is included below.

btw I'll be out skiing/snowboarding for the next 4 days and will have sporadic email access.
I'll do my best to address questions/concerns (if any) during that time.

Thanx
Max

----------------------------------------------------------------------------------------------
This patch series extends CPU isolation support. Yes, most people want to virtualize 
CPUs these days, and I want to isolate them :) .

The primary idea here is to be able to use some CPU cores as dedicated engines for running
user-space code with minimal kernel overhead/intervention; think of it as an SPE in the 
Cell processor. I'd like to be able to run a CPU-intensive (100%) RT task on one of the 
processors without adversely affecting, or being affected by, the other system activities. 
System activities here include _kernel_ activities as well. 

I'm personally using this for hard realtime purposes. With CPU isolation it's very easy to 
achieve single-digit usec worst-case and around 200 nsec average response times on off-the-shelf
multi-processor/core systems (vanilla kernel plus these patches), even under extreme system load. 
I'm working with legal folks on releasing a hard RT user-space framework for that.
I believe that with the current multi-core CPU trend we will see more and more applications that 
exploit this capability: RT gaming engines, simulators, hard RT apps, etc.

Hence the proposal is to extend the current CPU isolation feature.
The new definition of CPU isolation would be:
---
1. Isolated CPU(s) must not be subject to scheduler load balancing.
  Users must explicitly bind threads in order to run on those CPU(s).

2. By default, interrupts must not be routed to the isolated CPU(s).
  Users must route interrupts (if any) to those CPUs explicitly.

3. In general, kernel subsystems must avoid activity on the isolated CPU(s) as much as possible.
  This includes workqueues, per-CPU threads, etc.
  This feature is configurable and is disabled by default.
---

I've been maintaining this stuff since around 2.6.18, and it's been running in production
environments for a couple of years now. It's been tested on all kinds of machines, from NUMA
boxes like the HP xw9300/9400 to tiny uTCA boards like the Mercury AXA110.
The messiest part used to be the SLAB garbage collector changes. With the new SLUB all that mess 
goes away (ie no changes are necessary). Also, CFS seems to handle CPU hotplug much better than O(1) 
did (ie domains are recomputed dynamically), so isolation can be done at any time (via sysfs). 
So this seems like a good time to merge. 

We've had scheduler support for CPU isolation ever since the O(1) scheduler went in. In other words,
#1 is already supported. These patches do not change/affect that functionality in any way. 
#2 is a trivial one-liner change to the IRQ init code. 
#3 is addressed by a couple of separate patches. The main problem here is that an RT thread can prevent
kernel threads from running, and the machine gets stuck because other CPUs are waiting for those threads
to run and report back.

Folks involved in scheduler/cpuset development provided a lot of feedback on the first series
of patches. I believe I managed to explain and clarify every aspect. 
Paul Jackson initially suggested implementing #2 and #3 using the cpusets subsystem. Paul and I looked 
at it more closely and determined that exporting cpu_isolated_map instead is a better option. 

The last patch, to stop machine, is potentially unsafe and is marked as highly experimental. Unfortunately 
it's currently the only option that allows dynamic module insertion/removal for the above scenarios. 
If people still feel that it's too ugly I can revert that change and keep it in a separate tree 
for now.

Thanx
Max

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [git pull for -mm] CPU isolation extensions (updated2)
  2008-02-12  4:10 [git pull for -mm] CPU isolation extensions (updated2) Max Krasnyansky
@ 2008-02-12  6:41 ` Nick Piggin
  2008-02-12  6:44   ` David Miller
  2008-02-12 18:59 ` Peter Zijlstra
  1 sibling, 1 reply; 12+ messages in thread
From: Nick Piggin @ 2008-02-12  6:41 UTC (permalink / raw)
  To: Max Krasnyansky
  Cc: Andrew Morton, torvalds, LKML, Ingo Molnar, Paul Jackson,
	Peter Zijlstra, gregkh, Rusty Russell

On Tuesday 12 February 2008 15:10, Max Krasnyansky wrote:

> Rusty - Stop machine.
>    After doing a bunch of testing over the last three days I actually
> downgraded the stop machine changes from [highly experimental] to simply
> [experimental]. Please see this thread for more info:
> http://marc.info/?l=linux-kernel&m=120243837206248&w=2 Short story is that
> I ran several insmod/rmmod workloads on live multi-core boxes with stop
> machine _completely_ disabled and did not see any issues. Rusty has not had
> a chance to reply yet; I'm hoping that we'll be able to make "stop machine"
> completely optional for some configurations.

stop machine is used for more than just module loading and unloading.
I don't think you can just disable it.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [git pull for -mm] CPU isolation extensions (updated2)
  2008-02-12  6:41 ` Nick Piggin
@ 2008-02-12  6:44   ` David Miller
  2008-02-13  3:32     ` Max Krasnyansky
  0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2008-02-12  6:44 UTC (permalink / raw)
  To: nickpiggin
  Cc: maxk, akpm, torvalds, linux-kernel, mingo, pj, a.p.zijlstra,
	gregkh, rusty

From: Nick Piggin <nickpiggin@yahoo.com.au>
Date: Tue, 12 Feb 2008 17:41:21 +1100

> stop machine is used for more than just module loading and unloading.
> I don't think you can just disable it.

Right, in particular it is used for CPU hotplug.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [git pull for -mm] CPU isolation extensions (updated2)
  2008-02-12  4:10 [git pull for -mm] CPU isolation extensions (updated2) Max Krasnyansky
  2008-02-12  6:41 ` Nick Piggin
@ 2008-02-12 18:59 ` Peter Zijlstra
  2008-02-13  3:59   ` Max Krasnyansky
  2008-02-13  5:19   ` Steven Rostedt
  1 sibling, 2 replies; 12+ messages in thread
From: Peter Zijlstra @ 2008-02-12 18:59 UTC (permalink / raw)
  To: Max Krasnyansky
  Cc: Andrew Morton, torvalds, LKML, Ingo Molnar, Paul Jackson, gregkh,
	Rusty Russell, Oleg Nesterov, Thomas Gleixner, Steven Rostedt


On Mon, 2008-02-11 at 20:10 -0800, Max Krasnyansky wrote:
> Andrew, looks like Linus decided not to pull this stuff.
> Can we please put it into -mm then.
> 
> My tree is here
> 	git://git.kernel.org/pub/scm/linux/kernel/git/maxk/cpuisol-2.6.git
> Please use 'master' branch (or 'for-linus' they are identical).

I'm wondering why you insist on offering a git tree that bypasses the
regular maintainers. Why not post the patches and walk the normal route?

To me this feels rather aggressive, which makes me feel less inclined to
look at it.

> ----
>  
> Diffstat:
>  Documentation/ABI/testing/sysfs-devices-system-cpu |   41 +++++++
>  Documentation/cpu-isolation.txt                    |  113 +++++++++++++++++++++
>  arch/x86/Kconfig                                   |    1 
>  arch/x86/kernel/genapic_flat_64.c                  |    4 
>  drivers/base/cpu.c                                 |   48 ++++++++
>  include/linux/cpumask.h                            |    3 
>  kernel/Kconfig.cpuisol                             |   42 +++++++
>  kernel/Makefile                                    |    4 
>  kernel/cpu.c                                       |   54 ++++++++++
>  kernel/sched.c                                     |   36 ------
>  kernel/stop_machine.c                              |    8 +
>  kernel/workqueue.c                                 |   30 ++++-
>  12 files changed, 337 insertions(+), 47 deletions(-)
> 
> This addresses all Andrew's comments for the last submission. Details here:
>    http://marc.info/?l=linux-kernel&m=120236394012766&w=2
> 
> There are no code changes since last time, besides minor fix for moving on-stack array 
> to __initdata as suggested by Andrew. Other stuff is just documentation updates. 
> 
> List of commits
>    cpuisol: Make cpu isolation configurable and export isolated map
>    cpuisol: Do not route IRQs to the CPUs isolated at boot
>    cpuisol: Do not schedule workqueues on the isolated CPUs
>    cpuisol: Move on-stack array used for boot cmd parsing into __initdata
>    cpuisol: Documentation updates
>    cpuisol: Minor updates to the Kconfig options
>    cpuisol: Do not halt isolated CPUs with Stop Machine
> 
> As suggested by Ingo, I'm CC'ing everyone who is even remotely connected/affected ;-)

You forgot Oleg; he does a lot of the workqueue work.

I'm worried by your approach of never starting any workqueues on these
cpus. Like you said, it breaks Oprofile and others that depend on
cpu-local workqueues being present.

Under normal circumstances these workqueues will not do any work;
someone needs to provide work for them. That is, workqueues are passive.

So I think your approach is the wrong way around. Instead of taking the
workqueue away, take away those that generate the work.

> Ingo, Peter - Scheduler.
>    There are _no_ changes in this area besides moving the cpu_*_map maps from kernel/sched.c 
>    to kernel/cpu.c.

Ingo (and Thomas) do the genirq bits.

The IRQ isolation concept isn't wrong. But it seems to me that
arch/x86/kernel/genapic_flat_64.c isn't the best place to do this.
It only considers one architecture; if you do this, please make it work
across all of them.

> Paul - Cpuset
>    Again there are _no_ changes in this area.
>    For reasons why cpuset is not the right mechanism for cpu isolation see this thread
>       http://marc.info/?l=linux-kernel&m=120180692331461&w=2
> 
> Rusty - Stop machine.
>    After doing a bunch of testing over the last three days I actually downgraded the stop machine 
>    changes from [highly experimental] to simply [experimental]. Please see this thread 
>    for more info: http://marc.info/?l=linux-kernel&m=120243837206248&w=2
>    Short story is that I ran several insmod/rmmod workloads on live multi-core boxes 
>    with stop machine _completely_ disabled and did not see any issues. Rusty has not had
>    a chance to reply yet; I'm hoping that we'll be able to make "stop machine" completely
>    optional for some configurations.

I too think this is very wrong; stop machine is used by a lot of
things, including those that modify kernel code. You really need to
replace all stop machine users with a more robust solution before you
can do this.

> Greg - ABI documentation.
>    Nothing interesting here. I simply added 
> 	Documentation/ABI/testing/sysfs-devices-system-cpu
>    and documented some of the attributes exposed in there.
>    Suggested by Andrew.

Not having seen the latest patches, I'm still not fond of the isolation
interface.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [git pull for -mm] CPU isolation extensions (updated2)
  2008-02-12  6:44   ` David Miller
@ 2008-02-13  3:32     ` Max Krasnyansky
  2008-02-13  4:11       ` Nick Piggin
  0 siblings, 1 reply; 12+ messages in thread
From: Max Krasnyansky @ 2008-02-13  3:32 UTC (permalink / raw)
  To: David Miller
  Cc: nickpiggin, akpm, torvalds, linux-kernel, mingo, pj,
	a.p.zijlstra, gregkh, rusty

David Miller wrote:
> From: Nick Piggin <nickpiggin@yahoo.com.au>
> Date: Tue, 12 Feb 2008 17:41:21 +1100
> 
>> stop machine is used for more than just module loading and unloading.
>> I don't think you can just disable it.
> 
> Right, in particular it is used for CPU hotplug.
Ooops. Totally missed that. And a bunch of other places.

[maxk@duo2 cpuisol-2.6.git]$ git grep -l stop_machine_run
Documentation/cpu-hotplug.txt
arch/s390/kernel/kprobes.c
drivers/char/hw_random/intel-rng.c
include/linux/stop_machine.h
kernel/cpu.c
kernel/module.c
kernel/stop_machine.c
mm/page_alloc.c

I wonder why I did not see any issues when I disabled stop machine completely.
I mentioned in the other thread that I commented out the part that actually halts
the machine and ran it for several hours on my dual-core laptop and on the quad-core
server. I tried all kinds of workloads, including constant module removal
and insertion, and cpu hotplug as well. It cannot be just luck :).

Clearly though, you guys are right. It cannot simply be disabled. Based on the 
above grep it's needed for CPU hotplug, memory hotplug, kprobes on s390 and the Intel
RNG driver. Hopefully we can avoid it at least for module insertion/removal.

Max




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [git pull for -mm] CPU isolation extensions (updated2)
  2008-02-12 18:59 ` Peter Zijlstra
@ 2008-02-13  3:59   ` Max Krasnyansky
  2008-02-13  5:19   ` Steven Rostedt
  1 sibling, 0 replies; 12+ messages in thread
From: Max Krasnyansky @ 2008-02-13  3:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, torvalds, LKML, Ingo Molnar, Paul Jackson, gregkh,
	Rusty Russell, Oleg Nesterov, Thomas Gleixner, Steven Rostedt

Peter Zijlstra wrote:
> On Mon, 2008-02-11 at 20:10 -0800, Max Krasnyansky wrote:
>> Andrew, looks like Linus decided not to pull this stuff.
>> Can we please put it into -mm then.
>>
>> My tree is here
>> 	git://git.kernel.org/pub/scm/linux/kernel/git/maxk/cpuisol-2.6.git
>> Please use 'master' branch (or 'for-linus' they are identical).
> 
> I'm wondering why you insist on offering a git tree that bypasses the
> regular maintainers. Why not post the patches and walk the normal route?
> 
> To me this feels rather aggressive, which makes me feel less inclined to
> look at it.
Peter, it may sound stupid, but I'm honestly not sure what you mean. Please bear
with me; I do not mean to sound arrogant. I'm looking for advice here.
So here are some questions:

- First, who would the regular maintainer be in this case?
I felt that cpu isolation can just sit in its own tree since it does not seem 
to belong to any existing subsystem.
So far people have suggested -mm and -sched.
I do not think it has much to do with -sched.
-mm seems more general purpose; since Linus did not pull it directly I asked 
Andrew to take this stuff into -mm. He was already ok with the patches when I 
sent the original pull request to Linus.

- Is it not easier for a regular maintainer (whoever it turns out to be in this case)
to pull from git rather than use patches?
In any case, I did post patches along with the pull request. So for example if Andrew 
prefers patches he could take those instead of the git tree. In fact, if you look at my 
email, I mentioned that I can repost the patches if needed.

- And last but not least, I want to be able to just tell people who want to use CPU 
isolation "Go get this tree and use it". Git is the best for that.

I can see how the pull request to Linus may have been a bit aggressive. But then again,
I posted patches (_without_ a pull request), got feedback from you, Paul and a couple of 
other guys, and addressed/explained the issues/questions. Then I posted the patches again 
(_without_ a pull request) and got _zero_ replies, even though folks who replied to the first
patchset were replying to other things in the same timeframe. So I figured that since I had
addressed everything, you guys were happy; why not push it to Linus?

So what did I do wrong?

Max






>> ----
>>  
>> Diffstat:
>>  Documentation/ABI/testing/sysfs-devices-system-cpu |   41 +++++++
>>  Documentation/cpu-isolation.txt                    |  113 +++++++++++++++++++++
>>  arch/x86/Kconfig                                   |    1 
>>  arch/x86/kernel/genapic_flat_64.c                  |    4 
>>  drivers/base/cpu.c                                 |   48 ++++++++
>>  include/linux/cpumask.h                            |    3 
>>  kernel/Kconfig.cpuisol                             |   42 +++++++
>>  kernel/Makefile                                    |    4 
>>  kernel/cpu.c                                       |   54 ++++++++++
>>  kernel/sched.c                                     |   36 ------
>>  kernel/stop_machine.c                              |    8 +
>>  kernel/workqueue.c                                 |   30 ++++-
>>  12 files changed, 337 insertions(+), 47 deletions(-)
>>
>> This addresses all Andrew's comments for the last submission. Details here:
>>    http://marc.info/?l=linux-kernel&m=120236394012766&w=2
>>
>> There are no code changes since last time, besides minor fix for moving on-stack array 
>> to __initdata as suggested by Andrew. Other stuff is just documentation updates. 
>>
>> List of commits
>>    cpuisol: Make cpu isolation configurable and export isolated map
>>    cpuisol: Do not route IRQs to the CPUs isolated at boot
>>    cpuisol: Do not schedule workqueues on the isolated CPUs
>>    cpuisol: Move on-stack array used for boot cmd parsing into __initdata
>>    cpuisol: Documentation updates
>>    cpuisol: Minor updates to the Kconfig options
>>    cpuisol: Do not halt isolated CPUs with Stop Machine
>>
>> As suggested by Ingo, I'm CC'ing everyone who is even remotely connected/affected ;-)
> 
> You forgot Oleg; he does a lot of the workqueue work.
> 
> I'm worried by your approach of never starting any workqueues on these
> cpus. Like you said, it breaks Oprofile and others that depend on
> cpu-local workqueues being present.
> 
> Under normal circumstances these workqueues will not do any work;
> someone needs to provide work for them. That is, workqueues are passive.
> 
> So I think your approach is the wrong way around. Instead of taking the
> workqueue away, take away those that generate the work.
> 
>> Ingo, Peter - Scheduler.
>>    There are _no_ changes in this area besides moving the cpu_*_map maps from kernel/sched.c 
>>    to kernel/cpu.c.
> 
> Ingo (and Thomas) do the genirq bits.
> 
> The IRQ isolation concept isn't wrong. But it seems to me that
> arch/x86/kernel/genapic_flat_64.c isn't the best place to do this.
> It only considers one architecture; if you do this, please make it work
> across all of them.
> 
>> Paul - Cpuset
>>    Again there are _no_ changes in this area.
>>    For reasons why cpuset is not the right mechanism for cpu isolation see this thread
>>       http://marc.info/?l=linux-kernel&m=120180692331461&w=2
>>
>> Rusty - Stop machine.
>>    After doing a bunch of testing over the last three days I actually downgraded the stop machine 
>>    changes from [highly experimental] to simply [experimental]. Please see this thread 
>>    for more info: http://marc.info/?l=linux-kernel&m=120243837206248&w=2
>>    Short story is that I ran several insmod/rmmod workloads on live multi-core boxes 
>>    with stop machine _completely_ disabled and did not see any issues. Rusty has not had
>>    a chance to reply yet; I'm hoping that we'll be able to make "stop machine" completely
>>    optional for some configurations.
> 
> I too think this is very wrong; stop machine is used by a lot of
> things, including those that modify kernel code. You really need to
> replace all stop machine users with a more robust solution before you
> can do this.
> 
>> Greg - ABI documentation.
>>    Nothing interesting here. I simply added 
>> 	Documentation/ABI/testing/sysfs-devices-system-cpu
>>    and documented some of the attributes exposed in there.
>>    Suggested by Andrew.
> 
> Not having seen the latest patches, I'm still not fond of the isolation
> interface.
> 
> 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [git pull for -mm] CPU isolation extensions (updated2)
  2008-02-13  3:32     ` Max Krasnyansky
@ 2008-02-13  4:11       ` Nick Piggin
  2008-02-13  6:06         ` Max Krasnyansky
  0 siblings, 1 reply; 12+ messages in thread
From: Nick Piggin @ 2008-02-13  4:11 UTC (permalink / raw)
  To: Max Krasnyansky
  Cc: David Miller, akpm, torvalds, linux-kernel, mingo, pj,
	a.p.zijlstra, gregkh, rusty

On Wednesday 13 February 2008 14:32, Max Krasnyansky wrote:
> David Miller wrote:
> > From: Nick Piggin <nickpiggin@yahoo.com.au>
> > Date: Tue, 12 Feb 2008 17:41:21 +1100
> >
> >> stop machine is used for more than just module loading and unloading.
> >> I don't think you can just disable it.
> >
> > Right, in particular it is used for CPU hotplug.
>
> Ooops. Totally missed that. And a bunch of other places.
>
> [maxk@duo2 cpuisol-2.6.git]$ git grep -l stop_machine_run
> Documentation/cpu-hotplug.txt
> arch/s390/kernel/kprobes.c
> drivers/char/hw_random/intel-rng.c
> include/linux/stop_machine.h
> kernel/cpu.c
> kernel/module.c
> kernel/stop_machine.c
> mm/page_alloc.c
>
> I wonder why I did not see any issues when I disabled stop machine
> completely. I mentioned in the other thread that I commented out the part
> that actually halts the machine and ran it for several hours on my dual
> core laptop and on the quad core server. Tried all kinds of workloads,
> which include constant module removal and insertion, and cpu hotplug as
> well. It cannot be just luck :).

It really is. With subtle races, it can take a lot more than a few
hours. Consider that we still have subtle races in the kernel now,
which are almost never or only rarely hit in maybe 10,000 hours * every
single person who has been using the current kernel for the past
year.

For a less theoretical example -- when I was writing the RCU radix
tree code, I tried to run directed stress tests on a 64 CPU Altix
machine (which found no bugs). Then I ran it on a dedicated test
harness that could actually do a lot more than the existing kernel
users are able to, and promptly found a couple more bugs (on a 2
CPU system).

But your primary defence against concurrency bugs _has_ to be
knowing the code and all its interactions.


> Clearly though, you guys are right. It cannot be simply disabled. Based on
> the above grep it's needed for CPU hotplug, mem hotplug, kprobes on s390
> and intel rng driver. Hopefully we can avoid it at least in module
> insertion/removal.

Yes, reducing the number of users by going through their code and
showing that it is safe is the right way to do this. Also, could you
avoid module insertion/removal?

FWIW, I think the idea of trying to make Linux give hard
realtime guarantees is just insane. If that is what you want, you
would IMO be much better off spending the effort on something like
improving adeos and communication/administration between Linux and
the hard-rt kernel.

But don't let me dissuade you from making these good improvements
to Linux as well :) Just that it isn't really going to be hard-rt
in general.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [git pull for -mm] CPU isolation extensions (updated2)
  2008-02-12 18:59 ` Peter Zijlstra
  2008-02-13  3:59   ` Max Krasnyansky
@ 2008-02-13  5:19   ` Steven Rostedt
  2008-02-13  5:47     ` Max Krasnyansky
  1 sibling, 1 reply; 12+ messages in thread
From: Steven Rostedt @ 2008-02-13  5:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Max Krasnyansky, Andrew Morton, torvalds, LKML, Ingo Molnar,
	Paul Jackson, gregkh, Rusty Russell, Oleg Nesterov,
	Thomas Gleixner


On Tue, 12 Feb 2008, Peter Zijlstra wrote:
> >
> > Rusty - Stop machine.
> >    After doing a bunch of testing over the last three days I actually downgraded the stop machine
> >    changes from [highly experimental] to simply [experimental]. Please see this thread
> >    for more info: http://marc.info/?l=linux-kernel&m=120243837206248&w=2
> >    Short story is that I ran several insmod/rmmod workloads on live multi-core boxes
> >    with stop machine _completely_ disabled and did not see any issues. Rusty has not had
> >    a chance to reply yet; I'm hoping that we'll be able to make "stop machine" completely
> >    optional for some configurations.
>

This part really scares me. The fact that you have run several
insmod/rmmod workloads without kstop_machine doesn't mean that it is
safe. A lot of the races that things like this protect against may only happen
under load once a month. But the fact that they happen at all is reason to have
the protection.

Before taking out any protection, please analyze it in detail and report
your findings on why something is not needed. Not just some general hand
waving and "it doesn't crash on my box".

Besides that, kstop_machine may be used by other features that can have an
impact.

Again, if you have a system that can't handle things like kstop_machine,
then don't do things that require a kstop_machine run. All modules should
be loaded, and no new modules should be added, while the system is
performing critical work. I see no reason for disabling kstop_machine.

-- Steve


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [git pull for -mm] CPU isolation extensions (updated2)
  2008-02-13  5:19   ` Steven Rostedt
@ 2008-02-13  5:47     ` Max Krasnyansky
  0 siblings, 0 replies; 12+ messages in thread
From: Max Krasnyansky @ 2008-02-13  5:47 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Andrew Morton, torvalds, LKML, Ingo Molnar,
	Paul Jackson, gregkh, Rusty Russell, Oleg Nesterov,
	Thomas Gleixner



Steven Rostedt wrote:
> On Tue, 12 Feb 2008, Peter Zijlstra wrote:
>>> Rusty - Stop machine.
>>>    After doing a bunch of testing over the last three days I actually downgraded the stop machine
>>>    changes from [highly experimental] to simply [experimental]. Please see this thread
>>>    for more info: http://marc.info/?l=linux-kernel&m=120243837206248&w=2
>>>    Short story is that I ran several insmod/rmmod workloads on live multi-core boxes
>>>    with stop machine _completely_ disabled and did not see any issues. Rusty has not had
>>>    a chance to reply yet; I'm hoping that we'll be able to make "stop machine" completely
>>>    optional for some configurations.
> 
> This part really scares me. The fact that you have run several
> insmod/rmmod workloads without kstop_machine doesn't mean that it is
> safe. A lot of the races that things like this protect against may only happen
> under load once a month. But the fact that they happen at all is reason to have
> the protection.
> 
> Before taking out any protection, please analyze it in detail and report
> your findings on why something is not needed. Not just some general hand
> waving and "it doesn't crash on my box".
Sure. I did not say let's disable it. I was hoping we could, and I wanted to see what Rusty 
Russell has to say about this.

> Besides that, kstop_machine may be used by other features that can have an
> impact.
Yes it is. I missed a few. Nick and Dave already pointed out CPU hotplug. 
I looked around and found more users, so disabling stop machine completely is definitely out.

> Again, if you have a system that can't handle things like kstop_machine,
> then don't do things that require a kstop_machine run. All modules should
> be loaded, and no new modules should be added, while the system is
> performing critical work. I see no reason for disabling kstop_machine.
I'm considering that option. So far it does not seem practical, at least given the way we use those
machines at this point. If we can prove that at least not halting the isolated CPUs is safe, that'd
be better.

Max

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [git pull for -mm] CPU isolation extensions (updated2)
  2008-02-13  4:11       ` Nick Piggin
@ 2008-02-13  6:06         ` Max Krasnyansky
  2008-02-13  6:22           ` Nick Piggin
  0 siblings, 1 reply; 12+ messages in thread
From: Max Krasnyansky @ 2008-02-13  6:06 UTC (permalink / raw)
  To: Nick Piggin
  Cc: David Miller, akpm, torvalds, linux-kernel, mingo, pj,
	a.p.zijlstra, gregkh, rusty

Nick Piggin wrote:
> On Wednesday 13 February 2008 14:32, Max Krasnyansky wrote:
>> David Miller wrote:
>>> From: Nick Piggin <nickpiggin@yahoo.com.au>
>>> Date: Tue, 12 Feb 2008 17:41:21 +1100
>>>
>>>> stop machine is used for more than just module loading and unloading.
>>>> I don't think you can just disable it.
>>> Right, in particular it is used for CPU hotplug.
>> Ooops. Totally missed that. And a bunch of other places.
>>
>> [maxk@duo2 cpuisol-2.6.git]$ git grep -l stop_machine_run
>> Documentation/cpu-hotplug.txt
>> arch/s390/kernel/kprobes.c
>> drivers/char/hw_random/intel-rng.c
>> include/linux/stop_machine.h
>> kernel/cpu.c
>> kernel/module.c
>> kernel/stop_machine.c
>> mm/page_alloc.c
>>
>> I wonder why I did not see any issues when I disabled stop machine
>> completely. I mentioned in the other thread that I commented out the part
>> that actually halts the machine and ran it for several hours on my dual
>> core laptop and on the quad core server. Tried all kinds of workloads,
>> which include constant module removal and insertion, and cpu hotplug as
>> well. It cannot be just luck :).
> 
> It really is. With subtle races, it can take a lot more than a few
> hours. Consider that we have subtle races still in the kernel now,
> which are almost never or rarely hit in maybe 10,000 hours * every
> single person who has been using the current kernel for the past
> year.
> 
> For a less theoretical example -- when I was writing the RCU radix
> tree code, I tried to run directed stress tests on a 64 CPU Altix
> machine (which found no bugs). Then I ran it on a dedicated test
> harness that could actually do a lot more than the existing kernel
> users are able to, and promptly found a couple more bugs (on a 2
> CPU system).
> 
> But your primary defence against concurrency bugs _has_ to be
> knowing the code and all its interactions.
100% agree.
Btw, for modules it does not seem like luck (i.e. that it worked fine for me):
subsystems are supposed to register/unregister cleanly anyway. But I can of
course be wrong. We'll see what Rusty says.

>> Clearly though, you guys are right. It cannot be simply disabled. Based on
>> the above grep it's needed for CPU hotplug, mem hotplug, kprobes on s390
>> and intel rng driver. Hopefully we can avoid it at least in module
>> insertion/removal.
> 
> Yes, reducing the number of users by going through their code and
> showing that it is safe, is the right way to do this. Also, you
> could avoid module insertion/removal?
I could. But it'd be nice if I did not have to :)

> FWIW, I think the idea of trying to turn Linux into giving hard
> realtime guarantees is just insane. If that is what you want, you
> would IMO be much better off to spend effort with something like
> improving Adeos and communication/administration between Linux and
> the hard-rt kernel.
> 
> But don't let me dissuade you from making these good improvements
> to Linux as well :) Just that it isn't really going to be hard-rt
> in general.
Actually that's the cool thing about CPU isolation. Get rid of all latency sources
from the CPU(s) and you get yourself as hard-RT as it gets.
I mean I _already_ have multi-core hard-RT systems that show ~1.2 usec worst case and
~200 nsec average latency. I do not even need Adeos/Xenomai or Preempt-RT, just a few
very small patches. And it can be used for non-RT stuff too.
 
Max


* Re: [git pull for -mm] CPU isolation extensions (updated2)
  2008-02-13  6:06         ` Max Krasnyansky
@ 2008-02-13  6:22           ` Nick Piggin
  2008-02-13 17:11             ` Max Krasnyansky
  0 siblings, 1 reply; 12+ messages in thread
From: Nick Piggin @ 2008-02-13  6:22 UTC (permalink / raw)
  To: Max Krasnyansky
  Cc: David Miller, akpm, torvalds, linux-kernel, mingo, pj,
	a.p.zijlstra, gregkh, rusty

On Wednesday 13 February 2008 17:06, Max Krasnyansky wrote:
> Nick Piggin wrote:

> > But don't let me dissuade you from making these good improvements
> > to Linux as well :) Just that it isn't really going to be hard-rt
> > in general.
>
> Actually that's the cool thing about CPU isolation. Get rid of all latency
> sources from the CPU(s) and you get yourself as hard-RT as it gets.

Hmm, maybe. Removing all sources of latency from the CPU kind of
implies that you have to audit the whole kernel for sources of
latency.

> I mean I _already_ have multi-core hard-RT systems that show ~1.2 usec
> worst case and ~200nsec average latency. I do not even need Adeos/Xenomai
> or Preempt-RT just a few very small patches. And it can be used for non RT
> stuff too.

OK, but you then are very restricted in what you can do, and easily
can break it especially if you run any userspace on that CPU. If
you just run a kernel module that, after setup, doesn't use any
other kernel resources except interrupt handling, then you might be
OK (depending on whether even interrupt handling can run into
contended locks)...

If you started doing very much more, then you can easily run into
trouble.


* Re: [git pull for -mm] CPU isolation extensions (updated2)
  2008-02-13  6:22           ` Nick Piggin
@ 2008-02-13 17:11             ` Max Krasnyansky
  0 siblings, 0 replies; 12+ messages in thread
From: Max Krasnyansky @ 2008-02-13 17:11 UTC (permalink / raw)
  To: Nick Piggin
  Cc: David Miller, akpm, torvalds, linux-kernel, mingo, pj,
	a.p.zijlstra, gregkh, rusty



Nick Piggin wrote:
> On Wednesday 13 February 2008 17:06, Max Krasnyansky wrote:
>> Nick Piggin wrote:
> 
>>> But don't let me dissuade you from making these good improvements
>>> to Linux as well :) Just that it isn't really going to be hard-rt
>>> in general.
>> Actually that's the cool thing about CPU isolation. Get rid of all latency
>> sources from the CPU(s) and you get yourself as hard-RT as it gets.
> 
> Hmm, maybe. Removing all sources of latency from the CPU kind of
> implies that you have to audit the whole kernel for sources of
> latency.
That's exactly where CPU isolation comes in. It makes sure that an isolated
CPU is excluded from:
1. HW interrupts. This means no softirqs, etc.
2. Things like workqueues, stop machine, etc. This typically means no timers, etc.
3. Scheduler load balancing (we have had support for that for a while now).

All that's left on that CPU is the scheduler tick and IPIs, and those are just fine.
At that point it's up to the app to use or not to use kernel services.
In other words no auditing is required. It's RT-preempt that needs the audit in
order to be general-purpose RT.
 
>> I mean I _already_ have multi-core hard-RT systems that show ~1.2 usec
>> worst case and ~200nsec average latency. I do not even need Adeos/Xenomai
>> or Preempt-RT just a few very small patches. And it can be used for non RT
>> stuff too.
> 
> OK, but you then are very restricted in what you can do, and easily
> can break it especially if you run any userspace on that CPU. If
> you just run a kernel module that, after setup, doesn't use any
> other kernel resources except interrupt handling, then you might be
> OK (depending on whether even interrupt handling can run into
> contended locks)...
> 
> If you started doing very much more, then you can easily run into
> trouble.
Yes, I'm definitely not selling it as general purpose. And no, it's not just kernel code,
it's pure user-space code. Carefully designed user-space code, that is.
The model is pretty simple. Let's say you have a dual cpu/core box. The app can be
partitioned like this:
- CPU0 handles HW irqs, runs general services, etc and soft-RT threads
- CPU1 runs hard-RT threads or a special engine. For the description of the engine
see http://marc.info/?l=linux-kernel&m=120232425515556&w=2 

Hard-RT threads do not need any system calls besides pthread mutexes and signals (those
are perfectly fine).
They can use direct HW access (if needed), i.e. memory-mapping a device and accessing
it without syscalls (see libe1000.sf.net for an example).
Communication between hard-RT and soft-RT threads is lock-less (single-reader/single-writer
queues, etc).
It may sound fairly limited but you'd be surprised how much you can do. It's relatively
easy to design the app that way once you get the hang of it :). I'm working with our legal
folks on releasing the user-space framework and the aforementioned engine with a bunch of examples.

Max


end of thread, other threads:[~2008-02-13 17:13 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-02-12  4:10 [git pull for -mm] CPU isolation extensions (updated2) Max Krasnyansky
2008-02-12  6:41 ` Nick Piggin
2008-02-12  6:44   ` David Miller
2008-02-13  3:32     ` Max Krasnyansky
2008-02-13  4:11       ` Nick Piggin
2008-02-13  6:06         ` Max Krasnyansky
2008-02-13  6:22           ` Nick Piggin
2008-02-13 17:11             ` Max Krasnyansky
2008-02-12 18:59 ` Peter Zijlstra
2008-02-13  3:59   ` Max Krasnyansky
2008-02-13  5:19   ` Steven Rostedt
2008-02-13  5:47     ` Max Krasnyansky
