LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [-mm patch] i386: enable 4k stacks by default
@ 2007-04-28 19:19 Adrian Bunk
  2007-04-28 21:18 ` Zan Lynx
  0 siblings, 1 reply; 51+ messages in thread
From: Adrian Bunk @ 2007-04-28 19:19 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Grant Coady

4k stacks have become a well-tested feature used fore a long time in
Fedora and even in RHEL 4.

Defaulting to 4k stacks in -mm kernel will give some more testing
coverage and should show whether there are problems left.

Keeping the option for now should make the people happy who want to use
the experimental -mm kernel but don't trust the well-tested 4k stacks.

Additionally, make it more obvious that available stack space is not 
being halved.

Signed-off-by: Grant Coady <gcoady@gmail.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

---

This is the original patch that does not the opposite of what it should do.

This patch has been sent on:
- 14 Jan 2006
- 5 Jan 2006

 Kconfig.debug |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

--- linux-2.6.15a/arch/i386/Kconfig.debug	2005-10-28 10:02:08.000000000 +1000
+++ linux-2.6.15b/arch/i386/Kconfig.debug	2006-01-05 09:39:22.000000000 +1100
@@ -53,14 +53,15 @@
 	  of memory corruptions.
 
 config 4KSTACKS
-	bool "Use 4Kb for kernel stacks instead of 8Kb"
-	depends on DEBUG_KERNEL
+	bool "Use 4Kb + 4Kb for kernel stacks instead of 8Kb" if DEBUG_KERNEL
+	default y
 	help
 	  If you say Y here the kernel will use a 4Kb stacksize for the
 	  kernel stack attached to each process/thread. This facilitates
 	  running more threads on a system and also reduces the pressure
 	  on the VM subsystem for higher order allocations. This option
-	  will also use IRQ stacks to compensate for the reduced stackspace.
+	  will also use separate 4Kb IRQ stacks to compensate for the 
+	  reduced stackspace.
 
 config X86_FIND_SMP_CONFIG
 	bool


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [-mm patch] i386: enable 4k stacks by default
  2007-04-28 19:19 [-mm patch] i386: enable 4k stacks by default Adrian Bunk
@ 2007-04-28 21:18 ` Zan Lynx
  2007-04-30  3:58   ` David Chinner
  2007-04-30  8:55   ` Neil Brown
  0 siblings, 2 replies; 51+ messages in thread
From: Zan Lynx @ 2007-04-28 21:18 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: Linux Kernel

[-- Attachment #1: Type: text/plain, Size: 501 bytes --]

On Sat, 2007-04-28 at 21:19 +0200, Adrian Bunk wrote:
> 4k stacks have become a well-tested feature used fore a long time in
> Fedora and even in RHEL 4.

So has anyone fixed the bugs involving ext3 and LVM snapshots on top of
DM mirror?

Our 32-bit Fedora Core 5 dual Opteron server at work crashed once a week
while running rdiff-backup off a snapshot.  Installing a 64-bit kernel
stopped the crashes, and I believe the 64-bit still uses the bigger
stacks.
-- 
Zan Lynx <zlynx@acm.org>

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [-mm patch] i386: enable 4k stacks by default
  2007-04-28 21:18 ` Zan Lynx
@ 2007-04-30  3:58   ` David Chinner
  2007-04-30  8:17     ` Alan Cox
  2007-04-30  8:55   ` Neil Brown
  1 sibling, 1 reply; 51+ messages in thread
From: David Chinner @ 2007-04-30  3:58 UTC (permalink / raw)
  To: Zan Lynx; +Cc: Adrian Bunk, Linux Kernel

On Sat, Apr 28, 2007 at 03:18:38PM -0600, Zan Lynx wrote:
> On Sat, 2007-04-28 at 21:19 +0200, Adrian Bunk wrote:
> > 4k stacks have become a well-tested feature used fore a long time in
> > Fedora and even in RHEL 4.
> 
> So has anyone fixed the bugs involving ext3 and LVM snapshots on top of
> DM mirror?

I doubt it.  Every time this comes up the problem of stacked I/O
configuration being able to reliably blow the 4k stack limit comes
up. Usually it's XFS that is blamed, but it seems ext3 and reiser
can both suffer from the same problem.  And now we can add things
like unionfs/ecryptfs into the stack as well....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [-mm patch] i386: enable 4k stacks by default
  2007-04-30  3:58   ` David Chinner
@ 2007-04-30  8:17     ` Alan Cox
  2007-04-30 10:26       ` Andi Kleen
  0 siblings, 1 reply; 51+ messages in thread
From: Alan Cox @ 2007-04-30  8:17 UTC (permalink / raw)
  To: David Chinner; +Cc: Zan Lynx, Adrian Bunk, Linux Kernel

> I doubt it.  Every time this comes up the problem of stacked I/O
> configuration being able to reliably blow the 4k stack limit comes
> up. Usually it's XFS that is blamed, but it seems ext3 and reiser
> can both suffer from the same problem.  And now we can add things
> like unionfs/ecryptfs into the stack as well....

The 8K stack selection does not change this, it just means that if there
is a problem it'll randomly blow up when the heavy IRQ user and the heavy
non-IRQ user occur at the right moments. If this still occurs for some
combinations then the fix would be 8K + 4K IRQ stack, not just to use 8K
stack

Alan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [-mm patch] i386: enable 4k stacks by default
  2007-04-28 21:18 ` Zan Lynx
  2007-04-30  3:58   ` David Chinner
@ 2007-04-30  8:55   ` Neil Brown
  2007-04-30  8:59     ` Christoph Hellwig
  1 sibling, 1 reply; 51+ messages in thread
From: Neil Brown @ 2007-04-30  8:55 UTC (permalink / raw)
  To: Zan Lynx; +Cc: Adrian Bunk, Linux Kernel

On Saturday April 28, zlynx@acm.org wrote:
> On Sat, 2007-04-28 at 21:19 +0200, Adrian Bunk wrote:
> > 4k stacks have become a well-tested feature used fore a long time in
> > Fedora and even in RHEL 4.
> 
> So has anyone fixed the bugs involving ext3 and LVM snapshots on top of
> DM mirror?

Well -mm has a patch which makes stacked block devices much less of an
issue, but this hasn't made it -linus yet - I think the dm developer
isn't happy that dm works properly with it.

> 
> Our 32-bit Fedora Core 5 dual Opteron server at work crashed once a week
> while running rdiff-backup off a snapshot.  Installing a 64-bit kernel
> stopped the crashes, and I believe the 64-bit still uses the bigger
> stacks.

I wonder if FC-4 has the generic_make_request patch as well as 4K
stacks.  If does, then there is obviously something else to be fixed.
If it doesn't, then maybe that is all that is needed.

http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/broken-out/md-dm-reduce-stack-usage-with-stacked-block-devices.patch

NeilBrown

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [-mm patch] i386: enable 4k stacks by default
  2007-04-30  8:55   ` Neil Brown
@ 2007-04-30  8:59     ` Christoph Hellwig
  2007-04-30 11:30       ` Jens Axboe
  0 siblings, 1 reply; 51+ messages in thread
From: Christoph Hellwig @ 2007-04-30  8:59 UTC (permalink / raw)
  To: Neil Brown; +Cc: Zan Lynx, Adrian Bunk, Linux Kernel

On Mon, Apr 30, 2007 at 06:55:52PM +1000, Neil Brown wrote:
> On Saturday April 28, zlynx@acm.org wrote:
> > On Sat, 2007-04-28 at 21:19 +0200, Adrian Bunk wrote:
> > > 4k stacks have become a well-tested feature used fore a long time in
> > > Fedora and even in RHEL 4.
> > 
> > So has anyone fixed the bugs involving ext3 and LVM snapshots on top of
> > DM mirror?
> 
> Well -mm has a patch which makes stacked block devices much less of an
> issue, but this hasn't made it -linus yet - I think the dm developer
> isn't happy that dm works properly with it.

I get a little tired about this objection.  If the particular dm code
was racy before and this can in theory make it worse it's their problem,
and putting in perfectly fine code will give them an incentive to
fix their year old race.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [-mm patch] i386: enable 4k stacks by default
  2007-04-30  8:17     ` Alan Cox
@ 2007-04-30 10:26       ` Andi Kleen
  2007-04-30 10:48         ` Christoph Hellwig
  0 siblings, 1 reply; 51+ messages in thread
From: Andi Kleen @ 2007-04-30 10:26 UTC (permalink / raw)
  To: Alan Cox; +Cc: David Chinner, Zan Lynx, Adrian Bunk, Linux Kernel

Alan Cox <alan@lxorguk.ukuu.org.uk> writes:

> If this still occurs for some
> combinations then the fix would be 8K + 4K IRQ stack, not just to use 8K
> stack

Yes i've been thinking for some time doing that would be a good idea.

-Andi

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [-mm patch] i386: enable 4k stacks by default
  2007-04-30 10:26       ` Andi Kleen
@ 2007-04-30 10:48         ` Christoph Hellwig
  2007-04-30 12:13           ` Andi Kleen
  0 siblings, 1 reply; 51+ messages in thread
From: Christoph Hellwig @ 2007-04-30 10:48 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Alan Cox, David Chinner, Zan Lynx, Adrian Bunk, Linux Kernel

On Mon, Apr 30, 2007 at 12:26:42PM +0200, Andi Kleen wrote:
> Alan Cox <alan@lxorguk.ukuu.org.uk> writes:
> 
> > If this still occurs for some
> > combinations then the fix would be 8K + 4K IRQ stack, not just to use 8K
> > stack
> 
> Yes i've been thinking for some time doing that would be a good idea.

Yes, the non-irqstack case should definitively go away.  And 8k
kernel stacks isn't that little given how much most 64bit architectures
have.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [-mm patch] i386: enable 4k stacks by default
  2007-04-30  8:59     ` Christoph Hellwig
@ 2007-04-30 11:30       ` Jens Axboe
  2007-04-30 23:24         ` Neil Brown
  0 siblings, 1 reply; 51+ messages in thread
From: Jens Axboe @ 2007-04-30 11:30 UTC (permalink / raw)
  To: Christoph Hellwig, Neil Brown, Zan Lynx, Adrian Bunk, Linux Kernel

On Mon, Apr 30 2007, Christoph Hellwig wrote:
> On Mon, Apr 30, 2007 at 06:55:52PM +1000, Neil Brown wrote:
> > On Saturday April 28, zlynx@acm.org wrote:
> > > On Sat, 2007-04-28 at 21:19 +0200, Adrian Bunk wrote:
> > > > 4k stacks have become a well-tested feature used fore a long time in
> > > > Fedora and even in RHEL 4.
> > > 
> > > So has anyone fixed the bugs involving ext3 and LVM snapshots on top of
> > > DM mirror?
> > 
> > Well -mm has a patch which makes stacked block devices much less of an
> > issue, but this hasn't made it -linus yet - I think the dm developer
> > isn't happy that dm works properly with it.
> 
> I get a little tired about this objection.  If the particular dm code
> was racy before and this can in theory make it worse it's their problem,
> and putting in perfectly fine code will give them an incentive to
> fix their year old race.

Agree, Neil would you mind if I merged the patch?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [-mm patch] i386: enable 4k stacks by default
  2007-04-30 10:48         ` Christoph Hellwig
@ 2007-04-30 12:13           ` Andi Kleen
  2007-04-30 17:38             ` William Lee Irwin III
  0 siblings, 1 reply; 51+ messages in thread
From: Andi Kleen @ 2007-04-30 12:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andi Kleen, Alan Cox, David Chinner, Zan Lynx, Adrian Bunk, Linux Kernel

Christoph Hellwig <hch@infradead.org> writes:

> On Mon, Apr 30, 2007 at 12:26:42PM +0200, Andi Kleen wrote:
> > Alan Cox <alan@lxorguk.ukuu.org.uk> writes:
> > 
> > > If this still occurs for some
> > > combinations then the fix would be 8K + 4K IRQ stack, not just to use 8K
> > > stack
> > 
> > Yes i've been thinking for some time doing that would be a good idea.
> 
> Yes, the non-irqstack case should definitively go away.  And 8k
> kernel stacks isn't that little given how much most 64bit architectures
> have.

Actually looking at the code it would need some fixes first:

/*
 * These should really be __section__(".bss.page_aligned") as well, but
 * gcc's 3.0 and earlier don't handle that correctly.
 */
static char softirq_stack[NR_CPUS * THREAD_SIZE]
                __attribute__((__aligned__(THREAD_SIZE)));

static char hardirq_stack[NR_CPUS * THREAD_SIZE]
                __attribute__((__aligned__(THREAD_SIZE)));


With 8K stacks and NR_CPUS==128 that would be 2MB statically reserved. Yuck.
Really needs to be dynamically allocated. I'll take a look once the .22 
big merge is done.

-Andi

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [-mm patch] i386: enable 4k stacks by default
  2007-04-30 12:13           ` Andi Kleen
@ 2007-04-30 17:38             ` William Lee Irwin III
  2007-04-30 17:40               ` [1/6] make stack size configurable (was: Re: [-mm patch] i386: enable 4k stacks by default) William Lee Irwin III
                                 ` (7 more replies)
  0 siblings, 8 replies; 51+ messages in thread
From: William Lee Irwin III @ 2007-04-30 17:38 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christoph Hellwig, Alan Cox, David Chinner, Zan Lynx,
	Adrian Bunk, Linux Kernel

On Mon, Apr 30, 2007 at 02:13:16PM +0200, Andi Kleen wrote:
> Actually looking at the code it would need some fixes first:
> /*
>  * These should really be __section__(".bss.page_aligned") as well, but
>  * gcc's 3.0 and earlier don't handle that correctly.
>  */
> static char softirq_stack[NR_CPUS * THREAD_SIZE]
>                 __attribute__((__aligned__(THREAD_SIZE)));
> 
> static char hardirq_stack[NR_CPUS * THREAD_SIZE]
>                 __attribute__((__aligned__(THREAD_SIZE)));
> 
> With 8K stacks and NR_CPUS==128 that would be 2MB statically reserved. Yuck.
> Really needs to be dynamically allocated. I'll take a look once the .22 
> big merge is done.

Here's what I did for i386 for someone concerned about blowing the stack.


-- wli

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [1/6] make stack size configurable (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-04-30 17:38             ` William Lee Irwin III
@ 2007-04-30 17:40               ` William Lee Irwin III
  2007-04-30 18:10                 ` Christoph Hellwig
  2007-04-30 18:25                 ` Adrian Bunk
  2007-04-30 17:43               ` [2/6] add config option to vmalloc stacks " William Lee Irwin III
                                 ` (6 subsequent siblings)
  7 siblings, 2 replies; 51+ messages in thread
From: William Lee Irwin III @ 2007-04-30 17:40 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christoph Hellwig, Alan Cox, David Chinner, Zan Lynx,
	Adrian Bunk, Linux Kernel

On Mon, Apr 30, 2007 at 10:38:19AM -0700, William Lee Irwin III wrote:
> Here's what I did for i386 for someone concerned about blowing the stack.

Make more stack sizes configurable, adding options for deeper stacks.
This is largely for differential diagnosis in cases where stack overflows
are suspected of silently corrupting memory or causing other problems not
detected immediately.

Signed-off-by: William Irwin <wli@holomorphy.com>


Index: stack-paranoia/arch/i386/Kconfig.debug
===================================================================
--- stack-paranoia.orig/arch/i386/Kconfig.debug	2007-04-30 10:26:15.863869853 -0700
+++ stack-paranoia/arch/i386/Kconfig.debug	2007-04-30 10:31:43.878562345 -0700
@@ -56,6 +56,10 @@
 	  portion of the kernel code won't be covered by a 2MB TLB anymore.
 	  If in doubt, say "N".
 
+choice
+	prompt "Stack size"
+	default 8KSTACKS
+
 config 4KSTACKS
 	bool "Use 4Kb for kernel stacks instead of 8Kb"
 	depends on DEBUG_KERNEL
@@ -66,6 +70,37 @@
 	  on the VM subsystem for higher order allocations. This option
 	  will also use IRQ stacks to compensate for the reduced stackspace.
 
+config 8KSTACKS
+	bool "Use 8KB for kernel stacks"
+	help
+	  If you say Y here, the kernel will use its standard stacksize.
+
+config 16KSTACKS
+	bool "Use 16KB for kernel stacks"
+	help
+	  If you say Y here, the kernel will use a 16KB stacksize.
+	  This impedes running more threads on a system and also increases
+	  the pressure on the VM subsystem for higher order allocations.
+	  This option will also use IRQ stacks to for additional safety.
+
+config 32KSTACKS
+	bool "Use 32KB for kernel stacks"
+	help
+	  If you say Y here, the kernel will use a 32KB stacksize.
+	  This impedes running more threads on a system and also increases
+	  the pressure on the VM subsystem for higher order allocations.
+	  This option will also use IRQ stacks to for additional safety.
+
+config 64KSTACKS
+	bool "Use 64KB for kernel stacks"
+	help
+	  If you say Y here, the kernel will use a 64KB stacksize.
+	  This impedes running more threads on a system and also increases
+	  the pressure on the VM subsystem for higher order allocations.
+	  This option will also use IRQ stacks to for additional safety.
+
+endchoice
+
 config X86_FIND_SMP_CONFIG
 	bool
 	depends on X86_LOCAL_APIC || X86_VOYAGER
Index: stack-paranoia/include/asm-i386/module.h
===================================================================
--- stack-paranoia.orig/include/asm-i386/module.h	2007-04-30 10:26:31.992788987 -0700
+++ stack-paranoia/include/asm-i386/module.h	2007-04-30 10:31:43.882562573 -0700
@@ -64,8 +64,14 @@
 
 #ifdef CONFIG_4KSTACKS
 #define MODULE_STACKSIZE "4KSTACKS "
-#else
-#define MODULE_STACKSIZE ""
+#elif defined(CONFIG_8KSTACKS)
+#define MODULE_STACKSIZE "8KSTACKS "
+#elif defined(CONFIG_16KSTACKS)
+#define MODULE_STACKSIZE "16KSTACKS "
+#elif defined(CONFIG_32KSTACKS)
+#define MODULE_STACKSIZE "32KSTACKS "
+#elif defined(CONFIG_64KSTACKS)
+#define MODULE_STACKSIZE "64KSTACKS "
 #endif
 
 #define MODULE_ARCH_VERMAGIC MODULE_PROC_FAMILY MODULE_STACKSIZE
Index: stack-paranoia/include/asm-i386/thread_info.h
===================================================================
--- stack-paranoia.orig/include/asm-i386/thread_info.h	2007-04-30 10:26:32.104795370 -0700
+++ stack-paranoia/include/asm-i386/thread_info.h	2007-04-30 10:31:43.882562573 -0700
@@ -54,9 +54,17 @@
 
 #define PREEMPT_ACTIVE		0x10000000
 #ifdef CONFIG_4KSTACKS
-#define THREAD_SIZE            (4096)
-#else
+#define THREAD_SIZE		(4096)
+#elif defined(CONFIG_8KSTACKS)
 #define THREAD_SIZE		(8192)
+#elif defined(CONFIG_16KSTACKS)
+#define THREAD_SIZE		(16384)
+#elif defined(CONFIG_32KSTACKS)
+#define THREAD_SIZE		(32768)
+#elif defined(CONFIG_64KSTACKS)
+#define THREAD_SIZE		(65536)
+#else
+#error stack size unconfigured
 #endif
 
 #define STACK_WARN             (THREAD_SIZE/8)

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [2/6] add config option to vmalloc stacks (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-04-30 17:38             ` William Lee Irwin III
  2007-04-30 17:40               ` [1/6] make stack size configurable (was: Re: [-mm patch] i386: enable 4k stacks by default) William Lee Irwin III
@ 2007-04-30 17:43               ` William Lee Irwin III
  2007-04-30 18:11                 ` Christoph Hellwig
  2007-05-04  5:35                 ` Joseph Fannin
  2007-04-30 17:44               ` [3/6] make IRQ stacks independently configurable " William Lee Irwin III
                                 ` (5 subsequent siblings)
  7 siblings, 2 replies; 51+ messages in thread
From: William Lee Irwin III @ 2007-04-30 17:43 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christoph Hellwig, Alan Cox, David Chinner, Zan Lynx,
	Adrian Bunk, Linux Kernel

On Mon, Apr 30, 2007 at 10:38:19AM -0700, William Lee Irwin III wrote:
> Here's what I did for i386 for someone concerned about blowing the stack.

Add a config option to vmalloc() task stacks so that stack overflows are
detected without fail, and with a fatal failure mode at that.

Signed-off-by: William Irwin <wli@holomorphy.com>


Index: stack-paranoia/arch/i386/Kconfig.debug
===================================================================
--- stack-paranoia.orig/arch/i386/Kconfig.debug	2007-04-30 10:31:43.878562345 -0700
+++ stack-paranoia/arch/i386/Kconfig.debug	2007-04-30 10:32:56.182682722 -0700
@@ -35,6 +35,15 @@
 
 	  This option will slow down process creation somewhat.
 
+config VMALLOC_STACK
+	bool "vmalloc() the stack"
+	depends on DEBUG_KERNEL
+	help
+	  Allocates the stack physically discontiguously and from high
+	  memory. Furthermore an unmapped guard page follows the stack.
+	  This is not for end-users. It's intended to trigger fatal
+	  system errors under various forms of stack abuse.
+
 comment "Page alloc debug is incompatible with Software Suspend on i386"
 	depends on DEBUG_KERNEL && SOFTWARE_SUSPEND
 
Index: stack-paranoia/arch/i386/kernel/process.c
===================================================================
--- stack-paranoia.orig/arch/i386/kernel/process.c	2007-04-30 10:26:15.979876464 -0700
+++ stack-paranoia/arch/i386/kernel/process.c	2007-04-30 10:32:56.178682494 -0700
@@ -25,6 +25,7 @@
 #include <linux/stddef.h>
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
+#include <linux/workqueue.h>
 #include <linux/user.h>
 #include <linux/a.out.h>
 #include <linux/interrupt.h>
@@ -322,6 +323,58 @@
 	show_trace(NULL, regs, &regs->esp);
 }
 
+#ifdef CONFIG_VMALLOC_STACK
+struct thread_info *alloc_thread_info(struct task_struct *unused)
+{
+	int i;
+	struct page *pages[THREAD_SIZE/PAGE_SIZE], **tmp = pages;
+	struct vm_struct *area;
+
+	/*
+	 * passing VM_IOREMAP for the sake of alignment is why
+	 * all this is done by hand.
+	 */
+	area = get_vm_area(THREAD_SIZE, VM_IOREMAP);
+	if (!area)
+		return NULL;
+	for (i = 0; i < THREAD_SIZE/PAGE_SIZE; ++i) {
+		pages[i] = alloc_page(GFP_HIGHUSER);
+		if (!pages[i])
+			goto out_free_pages;
+	}
+	/* implicitly transfer page refcounts to the vm_struct */
+	if (map_vm_area(area, PAGE_KERNEL, &tmp))
+		goto out_remove_area;
+	/* it may be worth poisoning, save thread_info proper */
+	return (struct thread_info *)area->addr;
+out_remove_area:
+	remove_vm_area(area);
+out_free_pages:
+	do {
+		__free_page(pages[--i]);
+	} while (i >= 0);
+	return NULL;
+}
+
+static void work_free_thread_info(struct work_struct *work)
+{
+	int i;
+	void *p = work;
+
+	for (i = 0; i < THREAD_SIZE/PAGE_SIZE; ++i)
+		__free_page(vmalloc_to_page(p + PAGE_SIZE*i));
+	vfree(p);
+}
+
+void free_thread_info(struct thread_info *info)
+{
+	struct work_struct *work = (struct work_struct *)info;
+
+	INIT_WORK(work, work_free_thread_info);
+	schedule_work(work);
+}
+#endif
+
 /*
  * This gets run with %ebx containing the
  * function to call, and %edx containing
Index: stack-paranoia/include/asm-i386/module.h
===================================================================
--- stack-paranoia.orig/include/asm-i386/module.h	2007-04-30 10:31:43.882562573 -0700
+++ stack-paranoia/include/asm-i386/module.h	2007-04-30 10:32:56.182682722 -0700
@@ -74,6 +74,13 @@
 #define MODULE_STACKSIZE "64KSTACKS "
 #endif
 
-#define MODULE_ARCH_VERMAGIC MODULE_PROC_FAMILY MODULE_STACKSIZE
+#ifdef CONFIG_VMALLOC_STACK
+#define MODULE_VMALLOC_STACK "VMALLOCSTACKS "
+#else
+#define MODULE_VMALLOC_STACK ""
+#endif
+
+#define MODULE_ARCH_VERMAGIC MODULE_PROC_FAMILY MODULE_STACKSIZE \
+		MODULE_VMALLOC_STACK
 
 #endif /* _ASM_I386_MODULE_H */
Index: stack-paranoia/include/asm-i386/thread_info.h
===================================================================
--- stack-paranoia.orig/include/asm-i386/thread_info.h	2007-04-30 10:31:43.882562573 -0700
+++ stack-paranoia/include/asm-i386/thread_info.h	2007-04-30 10:32:56.182682722 -0700
@@ -102,6 +102,11 @@
 }
 
 /* thread information allocation */
+#ifdef CONFIG_VMALLOC_STACK
+struct task_struct;
+struct thread_info *alloc_thread_info(struct task_struct *);
+void free_thread_info(struct thread_info *);
+#else /* !CONFIG_VMALLOC_STACK */
 #ifdef CONFIG_DEBUG_STACK_USAGE
 #define alloc_thread_info(tsk) kzalloc(THREAD_SIZE, GFP_KERNEL)
 #else
@@ -109,6 +114,7 @@
 #endif
 
 #define free_thread_info(info)	kfree(info)
+#endif /* !CONFIG_VMALLOC_STACK */
 
 #else /* !__ASSEMBLY__ */
 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [3/6] make IRQ stacks independently configurable (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-04-30 17:38             ` William Lee Irwin III
  2007-04-30 17:40               ` [1/6] make stack size configurable (was: Re: [-mm patch] i386: enable 4k stacks by default) William Lee Irwin III
  2007-04-30 17:43               ` [2/6] add config option to vmalloc stacks " William Lee Irwin III
@ 2007-04-30 17:44               ` William Lee Irwin III
  2007-04-30 18:11                 ` Christoph Hellwig
  2007-04-30 17:45               ` [4/6] go BUG on vmallocspace in __pa() " William Lee Irwin III
                                 ` (4 subsequent siblings)
  7 siblings, 1 reply; 51+ messages in thread
From: William Lee Irwin III @ 2007-04-30 17:44 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christoph Hellwig, Alan Cox, David Chinner, Zan Lynx,
	Adrian Bunk, Linux Kernel

On Mon, Apr 30, 2007 at 10:38:19AM -0700, William Lee Irwin III wrote:
> Here's what I did for i386 for someone concerned about blowing the stack.

Divorce IRQ stack configuration from CONFIG_4KSTACKS, as it's believed to
be an important safety measure regardless of stack size by some users.

Signed-off-by: William Irwin <wli@holomorphy.com>


Index: stack-paranoia/arch/i386/Kconfig.debug
===================================================================
--- stack-paranoia.orig/arch/i386/Kconfig.debug	2007-04-30 10:32:56.182682722 -0700
+++ stack-paranoia/arch/i386/Kconfig.debug	2007-04-30 10:34:07.058721717 -0700
@@ -35,6 +35,14 @@
 
 	  This option will slow down process creation somewhat.
 
+config IRQSTACKS
+	bool "IRQ stacks"
+	depends on DEBUG_KERNEL
+	help
+	  Arranges for per-cpu interrupt stacks so that interrupts
+	  and interrupt-time processing don't consume space on task
+	  stacks.
+
 config VMALLOC_STACK
 	bool "vmalloc() the stack"
 	depends on DEBUG_KERNEL
@@ -76,8 +84,7 @@
 	  If you say Y here the kernel will use a 4Kb stacksize for the
 	  kernel stack attached to each process/thread. This facilitates
 	  running more threads on a system and also reduces the pressure
-	  on the VM subsystem for higher order allocations. This option
-	  will also use IRQ stacks to compensate for the reduced stackspace.
+	  on the VM subsystem for higher order allocations.
 
 config 8KSTACKS
 	bool "Use 8KB for kernel stacks"
@@ -90,7 +97,6 @@
 	  If you say Y here, the kernel will use a 16KB stacksize.
 	  This impedes running more threads on a system and also increases
 	  the pressure on the VM subsystem for higher order allocations.
-	  This option will also use IRQ stacks to for additional safety.
 
 config 32KSTACKS
 	bool "Use 32KB for kernel stacks"
@@ -98,15 +104,14 @@
 	  If you say Y here, the kernel will use a 32KB stacksize.
 	  This impedes running more threads on a system and also increases
 	  the pressure on the VM subsystem for higher order allocations.
-	  This option will also use IRQ stacks to for additional safety.
 
 config 64KSTACKS
 	bool "Use 64KB for kernel stacks"
+	depends on !IRQSTACKS
 	help
 	  If you say Y here, the kernel will use a 64KB stacksize.
 	  This impedes running more threads on a system and also increases
 	  the pressure on the VM subsystem for higher order allocations.
-	  This option will also use IRQ stacks to for additional safety.
 
 endchoice
 
Index: stack-paranoia/arch/i386/kernel/irq.c
===================================================================
--- stack-paranoia.orig/arch/i386/kernel/irq.c	2007-04-30 10:26:15.967875780 -0700
+++ stack-paranoia/arch/i386/kernel/irq.c	2007-04-30 10:34:07.058721717 -0700
@@ -47,7 +47,7 @@
 #endif
 }
 
-#ifdef CONFIG_4KSTACKS
+#ifdef CONFIG_IRQSTACKS
 /*
  * per-CPU IRQ handling contexts (thread information and stack)
  */
@@ -71,7 +71,7 @@
 	/* high bit used in ret_from_ code */
 	int irq = ~regs->orig_eax;
 	struct irq_desc *desc = irq_desc + irq;
-#ifdef CONFIG_4KSTACKS
+#ifdef CONFIG_IRQSTACKS
 	union irq_ctx *curctx, *irqctx;
 	u32 *isp;
 #endif
@@ -99,7 +99,7 @@
 	}
 #endif
 
-#ifdef CONFIG_4KSTACKS
+#ifdef CONFIG_IRQSTACKS
 
 	curctx = (union irq_ctx *) current_thread_info();
 	irqctx = hardirq_ctx[smp_processor_id()];
@@ -144,7 +144,7 @@
 	return 1;
 }
 
-#ifdef CONFIG_4KSTACKS
+#ifdef CONFIG_IRQSTACKS
 
 /*
  * These should really be __section__(".bss.page_aligned") as well, but
Index: stack-paranoia/include/asm-i386/irq.h
===================================================================
--- stack-paranoia.orig/include/asm-i386/irq.h	2007-04-30 10:26:31.904783972 -0700
+++ stack-paranoia/include/asm-i386/irq.h	2007-04-30 10:34:07.058721717 -0700
@@ -24,7 +24,7 @@
 # define ARCH_HAS_NMI_WATCHDOG		/* See include/linux/nmi.h */
 #endif
 
-#ifdef CONFIG_4KSTACKS
+#ifdef CONFIG_IRQSTACKS
   extern void irq_ctx_init(int cpu);
   extern void irq_ctx_exit(int cpu);
 # define __ARCH_HAS_DO_SOFTIRQ
Index: stack-paranoia/include/asm-i386/module.h
===================================================================
--- stack-paranoia.orig/include/asm-i386/module.h	2007-04-30 10:32:56.182682722 -0700
+++ stack-paranoia/include/asm-i386/module.h	2007-04-30 10:34:07.058721717 -0700
@@ -80,7 +80,13 @@
 #define MODULE_VMALLOC_STACK ""
 #endif
 
+#ifdef CONFIG_IRQSTACKS
+#define MODULE_IRQSTACKS "IRQSTACKS "
+#else
+#define MODULE_IRQSTACKS ""
+#endif
+
 #define MODULE_ARCH_VERMAGIC MODULE_PROC_FAMILY MODULE_STACKSIZE \
-		MODULE_VMALLOC_STACK
+		MODULE_VMALLOC_STACK MODULE_IRQSTACKS
 
 #endif /* _ASM_I386_MODULE_H */

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [4/6] go BUG on vmallocspace in __pa() (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-04-30 17:38             ` William Lee Irwin III
                                 ` (2 preceding siblings ...)
  2007-04-30 17:44               ` [3/6] make IRQ stacks independently configurable " William Lee Irwin III
@ 2007-04-30 17:45               ` William Lee Irwin III
  2007-04-30 18:52                 ` Andi Kleen
  2007-05-02 22:31                 ` [4/6] go BUG on vmallocspace in __pa() Jeremy Fitzhardinge
  2007-04-30 17:46               ` [5/6] dynamically allocate IRQ stacks (was: Re: [-mm patch] i386: enable 4k stacks by default) William Lee Irwin III
                                 ` (3 subsequent siblings)
  7 siblings, 2 replies; 51+ messages in thread
From: William Lee Irwin III @ 2007-04-30 17:45 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christoph Hellwig, Alan Cox, David Chinner, Zan Lynx,
	Adrian Bunk, Linux Kernel

On Mon, Apr 30, 2007 at 10:38:19AM -0700, William Lee Irwin III wrote:
> Here's what I did for i386 for someone concerned about blowing the stack.

Add checks to __pa() so it goes BUG() on vmallocspace addresses.

Signed-off-by: William Irwin <wli@holomorphy.com>


Index: stack-paranoia/arch/i386/kernel/doublefault.c
===================================================================
--- stack-paranoia.orig/arch/i386/kernel/doublefault.c	2007-04-30 10:26:15.959875324 -0700
+++ stack-paranoia/arch/i386/kernel/doublefault.c	2007-04-30 10:34:55.201465216 -0700
@@ -62,5 +62,5 @@
 	.ss		= __KERNEL_DS,
 	.ds		= __USER_DS,
 
-	.__cr3		= __pa(swapper_pg_dir)
+	.__cr3		= (unsigned long)swapper_pg_dir - PAGE_OFFSET,
 };
Index: stack-paranoia/arch/i386/mm/pgtable.c
===================================================================
--- stack-paranoia.orig/arch/i386/mm/pgtable.c	2007-04-30 10:26:16.075881935 -0700
+++ stack-paranoia/arch/i386/mm/pgtable.c	2007-04-30 10:34:55.201465216 -0700
@@ -181,6 +181,15 @@
 #endif
 }
 
+unsigned long __kvaddr_to_paddr(unsigned long kvaddr)
+{
+	if (high_memory)
+		BUG_ON(kvaddr >= VMALLOC_START);
+	else
+		BUG_ON(kvaddr >= (unsigned long)__va(MAXMEM));
+	return kvaddr - PAGE_OFFSET;
+}
+
 pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
 {
 	return (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
Index: stack-paranoia/include/asm-i386/page.h
===================================================================
--- stack-paranoia.orig/include/asm-i386/page.h	2007-04-30 10:26:32.012790127 -0700
+++ stack-paranoia/include/asm-i386/page.h	2007-04-30 10:34:55.197464988 -0700
@@ -118,11 +118,17 @@
 #define __PAGE_OFFSET		((unsigned long)CONFIG_PAGE_OFFSET)
 #endif
 
-
 #define PAGE_OFFSET		((unsigned long)__PAGE_OFFSET)
+
+#ifdef __ASSEMBLY__
+#define __pa(x)			((unsigned long)(x)-PAGE_OFFSET)
+#else
+extern unsigned long __kvaddr_to_paddr(unsigned long);
+#define __pa(x)			__kvaddr_to_paddr((unsigned long)(x))
+#endif
+
 #define VMALLOC_RESERVE		((unsigned long)__VMALLOC_RESERVE)
 #define MAXMEM			(-__PAGE_OFFSET-__VMALLOC_RESERVE)
-#define __pa(x)			((unsigned long)(x)-PAGE_OFFSET)
 /* __pa_symbol should be used for C visible symbols.
    This seems to be the official gcc blessed way to do such arithmetic. */
 #define __pa_symbol(x)          __pa(RELOC_HIDE((unsigned long)(x),0))

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [5/6] dynamically allocate IRQ stacks (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-04-30 17:38             ` William Lee Irwin III
                                 ` (3 preceding siblings ...)
  2007-04-30 17:45               ` [4/6] go BUG on vmallocspace in __pa() " William Lee Irwin III
@ 2007-04-30 17:46               ` William Lee Irwin III
  2007-04-30 19:49                 ` Zwane Mwaikambo
  2007-04-30 17:47               ` [6/6] arrange for a guard page on cpu 0's IRQ stack " William Lee Irwin III
                                 ` (2 subsequent siblings)
  7 siblings, 1 reply; 51+ messages in thread
From: William Lee Irwin III @ 2007-04-30 17:46 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christoph Hellwig, Alan Cox, David Chinner, Zan Lynx,
	Adrian Bunk, Linux Kernel

On Mon, Apr 30, 2007 at 10:38:19AM -0700, William Lee Irwin III wrote:
> Here's what I did for i386 for someone concerned about blowing the stack.

Fix up the conflict between IRQ stacks and deep stacks by dynamically
allocating IRQ stacks.

Signed-off-by: William Irwin <wli@holomorphy.com>


Index: stack-paranoia/arch/i386/Kconfig.debug
===================================================================
--- stack-paranoia.orig/arch/i386/Kconfig.debug	2007-04-30 10:34:07.058721717 -0700
+++ stack-paranoia/arch/i386/Kconfig.debug	2007-04-30 10:36:14.553987258 -0700
@@ -107,7 +107,6 @@
 
 config 64KSTACKS
 	bool "Use 64KB for kernel stacks"
-	depends on !IRQSTACKS
 	help
 	  If you say Y here, the kernel will use a 64KB stacksize.
 	  This impedes running more threads on a system and also increases
Index: stack-paranoia/arch/i386/kernel/irq.c
===================================================================
--- stack-paranoia.orig/arch/i386/kernel/irq.c	2007-04-30 10:34:07.058721717 -0700
+++ stack-paranoia/arch/i386/kernel/irq.c	2007-04-30 10:36:14.553987258 -0700
@@ -17,9 +17,11 @@
 #include <linux/notifier.h>
 #include <linux/cpu.h>
 #include <linux/delay.h>
+#include <linux/bootmem.h>
 
 #include <asm/apic.h>
 #include <asm/uaccess.h>
+#include <asm/pgtable.h>
 
 DEFINE_PER_CPU(irq_cpustat_t, irq_stat) ____cacheline_internodealigned_in_smp;
 EXPORT_PER_CPU_SYMBOL(irq_stat);
@@ -56,8 +58,8 @@
 	u32                     stack[THREAD_SIZE/sizeof(u32)];
 };
 
-static union irq_ctx *hardirq_ctx[NR_CPUS] __read_mostly;
-static union irq_ctx *softirq_ctx[NR_CPUS] __read_mostly;
+static DEFINE_PER_CPU(union irq_ctx *, hardirq_ctx);
+static DEFINE_PER_CPU(union irq_ctx *, softirq_ctx);
 #endif
 
 /*
@@ -102,7 +104,7 @@
 #ifdef CONFIG_IRQSTACKS
 
 	curctx = (union irq_ctx *) current_thread_info();
-	irqctx = hardirq_ctx[smp_processor_id()];
+	irqctx = per_cpu(hardirq_ctx, smp_processor_id());
 
 	/*
 	 * this is where we switch to the IRQ stack. However, if we are
@@ -150,11 +152,44 @@
  * These should really be __section__(".bss.page_aligned") as well, but
  * gcc's 3.0 and earlier don't handle that correctly.
  */
-static char softirq_stack[NR_CPUS * THREAD_SIZE]
-		__attribute__((__aligned__(THREAD_SIZE)));
+static DEFINE_PER_CPU(char *, softirq_stack);
+static DEFINE_PER_CPU(char *, hardirq_stack);
 
-static char hardirq_stack[NR_CPUS * THREAD_SIZE]
-		__attribute__((__aligned__(THREAD_SIZE)));
+#ifdef CONFIG_VMALLOC_STACK
+static void * __init __alloc_irqstack(int cpu)
+{
+	int i;
+	struct page *pages[THREAD_SIZE/PAGE_SIZE], **tmp = pages;
+	struct vm_struct *area;
+
+	if (!cpu)
+		return __alloc_bootmem(THREAD_SIZE, THREAD_SIZE,
+						__pa(MAX_DMA_ADDRESS));
+
+	/* failures here are unrecoverable anyway */
+	area = get_vm_area(THREAD_SIZE, VM_IOREMAP);
+	for (i = 0; i < ARRAY_SIZE(pages); ++i)
+		pages[i] = alloc_page(GFP_HIGHUSER);
+	map_vm_area(area, PAGE_KERNEL, &tmp);
+	return area->addr;
+}
+#else /* !CONFIG_VMALLOC_STACK */
+static void * __init __alloc_irqstack(int cpu)
+{
+	if (!cpu)
+		return __alloc_bootmem(THREAD_SIZE, THREAD_SIZE,
+						__pa(MAX_DMA_ADDRESS));
+
+	return (void *)__get_free_pages(GFP_KERNEL,
+					ilog2(THREAD_SIZE/PAGE_SIZE));
+}
+#endif /* !CONFIG_VMALLOC_STACK */
+
+static void __init alloc_irqstacks(int cpu)
+{
+	per_cpu(softirq_stack, cpu) = __alloc_irqstack(cpu);
+	per_cpu(hardirq_stack, cpu) = __alloc_irqstack(cpu);
+}
 
 /*
  * allocate per-cpu stacks for hardirq and for softirq processing
@@ -163,34 +198,36 @@
 {
 	union irq_ctx *irqctx;
 
-	if (hardirq_ctx[cpu])
+	if (per_cpu(hardirq_ctx, cpu))
 		return;
 
-	irqctx = (union irq_ctx*) &hardirq_stack[cpu*THREAD_SIZE];
+	alloc_irqstacks(cpu);
+
+	irqctx = (union irq_ctx*)per_cpu(hardirq_stack, cpu);
 	irqctx->tinfo.task              = NULL;
 	irqctx->tinfo.exec_domain       = NULL;
 	irqctx->tinfo.cpu               = cpu;
 	irqctx->tinfo.preempt_count     = HARDIRQ_OFFSET;
 	irqctx->tinfo.addr_limit        = MAKE_MM_SEG(0);
 
-	hardirq_ctx[cpu] = irqctx;
+	per_cpu(hardirq_ctx, cpu) = irqctx;
 
-	irqctx = (union irq_ctx*) &softirq_stack[cpu*THREAD_SIZE];
+	irqctx = (union irq_ctx*)per_cpu(softirq_stack, cpu);
 	irqctx->tinfo.task              = NULL;
 	irqctx->tinfo.exec_domain       = NULL;
 	irqctx->tinfo.cpu               = cpu;
 	irqctx->tinfo.preempt_count     = 0;
 	irqctx->tinfo.addr_limit        = MAKE_MM_SEG(0);
 
-	softirq_ctx[cpu] = irqctx;
+	per_cpu(softirq_ctx, cpu) = irqctx;
 
 	printk("CPU %u irqstacks, hard=%p soft=%p\n",
-		cpu,hardirq_ctx[cpu],softirq_ctx[cpu]);
+		cpu, per_cpu(hardirq_ctx, cpu), per_cpu(softirq_ctx, cpu));
 }
 
 void irq_ctx_exit(int cpu)
 {
-	hardirq_ctx[cpu] = NULL;
+	per_cpu(hardirq_ctx, cpu) = NULL;
 }
 
 extern asmlinkage void __do_softirq(void);
@@ -209,7 +246,7 @@
 
 	if (local_softirq_pending()) {
 		curctx = current_thread_info();
-		irqctx = softirq_ctx[smp_processor_id()];
+		irqctx = per_cpu(softirq_ctx, smp_processor_id());
 		irqctx->tinfo.task = curctx->task;
 		irqctx->tinfo.previous_esp = current_stack_pointer;
 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [6/6] arrange for a guard page on cpu 0's IRQ stack (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-04-30 17:38             ` William Lee Irwin III
                                 ` (4 preceding siblings ...)
  2007-04-30 17:46               ` [5/6] dynamically allocate IRQ stacks (was: Re: [-mm patch] i386: enable 4k stacks by default) William Lee Irwin III
@ 2007-04-30 17:47               ` William Lee Irwin III
  2007-04-30 18:22               ` [-mm patch] i386: enable 4k stacks by default Jan Engelhardt
  2007-04-30 18:51               ` Andi Kleen
  7 siblings, 0 replies; 51+ messages in thread
From: William Lee Irwin III @ 2007-04-30 17:47 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christoph Hellwig, Alan Cox, David Chinner, Zan Lynx,
	Adrian Bunk, Linux Kernel

On Mon, Apr 30, 2007 at 10:38:19AM -0700, William Lee Irwin III wrote:
> Here's what I did for i386 for someone concerned about blowing the stack.

vmap() cpu 0's IRQ stack to ensure a guard page for it.

Signed-off-by: William Irwin <wli@holomorphy.com>


Index: stack-paranoia/arch/i386/kernel/irq.c
===================================================================
--- stack-paranoia.orig/arch/i386/kernel/irq.c	2007-04-30 10:36:14.553987258 -0700
+++ stack-paranoia/arch/i386/kernel/irq.c	2007-04-30 10:37:09.909141769 -0700
@@ -17,6 +17,7 @@
 #include <linux/notifier.h>
 #include <linux/cpu.h>
 #include <linux/delay.h>
+#include <linux/mm.h>
 #include <linux/bootmem.h>
 
 #include <asm/apic.h>
@@ -156,6 +157,41 @@
 static DEFINE_PER_CPU(char *, hardirq_stack);
 
 #ifdef CONFIG_VMALLOC_STACK
+static void * __init irq_remap_stack(void *stack)
+{
+	int i;
+	struct page *pages[THREAD_SIZE/PAGE_SIZE];
+
+	for (i = 0; i < ARRAY_SIZE(pages); ++i)
+		pages[i] =  virt_to_page(stack + PAGE_SIZE*i);
+	return vmap(pages, THREAD_SIZE/PAGE_SIZE, VM_IOREMAP, PAGE_KERNEL);
+}
+
+static int __init irq_guard_cpu0(void)
+{
+	unsigned long flags;
+	void *tmp;
+
+	tmp = irq_remap_stack(per_cpu(softirq_stack, 0));
+	if (!tmp)
+		return -ENOMEM;
+	else {
+		local_irq_save(flags);
+		per_cpu(softirq_stack, 0) = tmp;
+		local_irq_restore(flags);
+	}
+	tmp = irq_remap_stack(per_cpu(hardirq_stack, 0));
+	if (!tmp)
+		return -ENOMEM;
+	else {
+		local_irq_save(flags);
+		per_cpu(hardirq_stack, 0) = tmp;
+		local_irq_restore(flags);
+	}
+	return 0;
+}
+core_initcall(irq_guard_cpu0);
+
 static void * __init __alloc_irqstack(int cpu)
 {
 	int i;

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [1/6] make stack size configurable (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-04-30 17:40               ` [1/6] make stack size configurable (was: Re: [-mm patch] i386: enable 4k stacks by default) William Lee Irwin III
@ 2007-04-30 18:10                 ` Christoph Hellwig
  2007-04-30 18:13                   ` William Lee Irwin III
  2007-04-30 18:25                 ` Adrian Bunk
  1 sibling, 1 reply; 51+ messages in thread
From: Christoph Hellwig @ 2007-04-30 18:10 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Andi Kleen, Christoph Hellwig, Alan Cox, David Chinner, Zan Lynx,
	Adrian Bunk, Linux Kernel

On Mon, Apr 30, 2007 at 10:40:47AM -0700, William Lee Irwin III wrote:
> On Mon, Apr 30, 2007 at 10:38:19AM -0700, William Lee Irwin III wrote:
> > Here's what I did for i386 for someone concerned about blowing the stack.
> 
> Make more stack sizes configurable, adding options for deeper stacks.
> This is largely for differential diagnosis in cases where stack overflows
> are suspected of silently corrupting memory or causing other problems not
> detected immediately.

I don't think we want anything bigger than 8, if you really need that
for debuggin it's easy enough to hack up on demand.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [2/6] add config option to vmalloc stacks (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-04-30 17:43               ` [2/6] add config option to vmalloc stacks " William Lee Irwin III
@ 2007-04-30 18:11                 ` Christoph Hellwig
  2007-04-30 18:25                   ` Jan Engelhardt
  2007-04-30 19:09                   ` William Lee Irwin III
  2007-05-04  5:35                 ` Joseph Fannin
  1 sibling, 2 replies; 51+ messages in thread
From: Christoph Hellwig @ 2007-04-30 18:11 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Andi Kleen, Christoph Hellwig, Alan Cox, David Chinner, Zan Lynx,
	Adrian Bunk, Linux Kernel

On Mon, Apr 30, 2007 at 10:43:10AM -0700, William Lee Irwin III wrote:
> On Mon, Apr 30, 2007 at 10:38:19AM -0700, William Lee Irwin III wrote:
> > Here's what I did for i386 for someone concerned about blowing the stack.
> 
> Add a config option to vmalloc() task stacks so that stack overflows are
> detected without fail, and with a fatal failure mode at that.

Whee, this sounds like a really helpful although costly debugging option.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [3/6] make IRQ stacks independently configurable (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-04-30 17:44               ` [3/6] make IRQ stacks independently configurable " William Lee Irwin III
@ 2007-04-30 18:11                 ` Christoph Hellwig
  2007-04-30 18:14                   ` William Lee Irwin III
  0 siblings, 1 reply; 51+ messages in thread
From: Christoph Hellwig @ 2007-04-30 18:11 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Andi Kleen, Christoph Hellwig, Alan Cox, David Chinner, Zan Lynx,
	Adrian Bunk, Linux Kernel

On Mon, Apr 30, 2007 at 10:44:05AM -0700, William Lee Irwin III wrote:
> On Mon, Apr 30, 2007 at 10:38:19AM -0700, William Lee Irwin III wrote:
> > Here's what I did for i386 for someone concerned about blowing the stack.
> 
> Divorce IRQ stack configuration from CONFIG_4KSTACKS, as it's believed to
> be an important safety measure regardless of stack size by some users.

We should just make irqstacks the default and kill the non-irqstack code.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [1/6] make stack size configurable (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-04-30 18:10                 ` Christoph Hellwig
@ 2007-04-30 18:13                   ` William Lee Irwin III
  0 siblings, 0 replies; 51+ messages in thread
From: William Lee Irwin III @ 2007-04-30 18:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andi Kleen, Alan Cox, David Chinner, Zan Lynx, Adrian Bunk, Linux Kernel

On Mon, Apr 30, 2007 at 10:40:47AM -0700, William Lee Irwin III wrote:
>> Make more stack sizes configurable, adding options for deeper stacks.
>> This is largely for differential diagnosis in cases where stack overflows
>> are suspected of silently corrupting memory or causing other problems not
>> detected immediately.

On Mon, Apr 30, 2007 at 07:10:32PM +0100, Christoph Hellwig wrote:
> I don't think we want anything bigger than 8, if you really need that
> for debuggin it's easy enough to hack up on demand.

These were instrumentation patches of such a form. The point of all
this was mostly centered around 5/6, which bootmem allocates the
various IRQ stacks as mentioned earlier.


- wli

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [3/6] make IRQ stacks independently configurable (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-04-30 18:11                 ` Christoph Hellwig
@ 2007-04-30 18:14                   ` William Lee Irwin III
  0 siblings, 0 replies; 51+ messages in thread
From: William Lee Irwin III @ 2007-04-30 18:14 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andi Kleen, Alan Cox, David Chinner, Zan Lynx, Adrian Bunk, Linux Kernel

On Mon, Apr 30, 2007 at 10:44:05AM -0700, William Lee Irwin III wrote:
>> Divorce IRQ stack configuration from CONFIG_4KSTACKS, as it's believed to
>> be an important safety measure regardless of stack size by some users.

On Mon, Apr 30, 2007 at 07:11:24PM +0100, Christoph Hellwig wrote:
> We should just make irqstacks the default and kill the non-irqstack code.

For mainline that sounds like a great idea to me.


-- wli

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [-mm patch] i386: enable 4k stacks by default
  2007-04-30 17:38             ` William Lee Irwin III
                                 ` (5 preceding siblings ...)
  2007-04-30 17:47               ` [6/6] arrange for a guard page on cpu 0's IRQ stack " William Lee Irwin III
@ 2007-04-30 18:22               ` Jan Engelhardt
  2007-04-30 18:35                 ` William Lee Irwin III
  2007-04-30 18:51               ` Andi Kleen
  7 siblings, 1 reply; 51+ messages in thread
From: Jan Engelhardt @ 2007-04-30 18:22 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Andi Kleen, Christoph Hellwig, Alan Cox, David Chinner, Zan Lynx,
	Adrian Bunk, Linux Kernel


On Apr 30 2007 10:38, William Lee Irwin III wrote:
>> static char softirq_stack[NR_CPUS * THREAD_SIZE]
>>                 __attribute__((__aligned__(THREAD_SIZE)));
>> 
>> static char hardirq_stack[NR_CPUS * THREAD_SIZE]
>>                 __attribute__((__aligned__(THREAD_SIZE)));
>> 
>> With 8K stacks and NR_CPUS==128 that would be 2MB statically reserved. Yuck.
>> Really needs to be dynamically allocated. I'll take a look once the .22 
>> big merge is done.
>
>Here's what I did for i386 for someone concerned about blowing the stack.

If we really need it, then maybe a variable like CONFIG_LOG_BUF_SHIFT
could be hacked up, resulting in CONFIG_THREAD_SIZE_SHIFT, defaulting to 3
(=8KB).


Jan
-- 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [2/6] add config option to vmalloc stacks (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-04-30 18:11                 ` Christoph Hellwig
@ 2007-04-30 18:25                   ` Jan Engelhardt
  2007-04-30 19:09                   ` William Lee Irwin III
  1 sibling, 0 replies; 51+ messages in thread
From: Jan Engelhardt @ 2007-04-30 18:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: William Lee Irwin III, Andi Kleen, Alan Cox, David Chinner,
	Zan Lynx, Adrian Bunk, Linux Kernel


On Apr 30 2007 19:11, Christoph Hellwig wrote:
>On Mon, Apr 30, 2007 at 10:43:10AM -0700, William Lee Irwin III wrote:
>> On Mon, Apr 30, 2007 at 10:38:19AM -0700, William Lee Irwin III wrote:
>> > Here's what I did for i386 for someone concerned about blowing the stack.
>> 
>> Add a config option to vmalloc() task stacks so that stack overflows are
>> detected without fail, and with a fatal failure mode at that.
>
>Whee, this sounds like a really helpful although costly debugging option.

Options are there to be disabled, making the cost ideally zero for 
people who do not want a specific feature.


Jan
-- 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [1/6] make stack size configurable (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-04-30 17:40               ` [1/6] make stack size configurable (was: Re: [-mm patch] i386: enable 4k stacks by default) William Lee Irwin III
  2007-04-30 18:10                 ` Christoph Hellwig
@ 2007-04-30 18:25                 ` Adrian Bunk
  2007-04-30 18:32                   ` William Lee Irwin III
  1 sibling, 1 reply; 51+ messages in thread
From: Adrian Bunk @ 2007-04-30 18:25 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Andi Kleen, Christoph Hellwig, Alan Cox, David Chinner, Zan Lynx,
	Linux Kernel

On Mon, Apr 30, 2007 at 10:40:47AM -0700, William Lee Irwin III wrote:
> On Mon, Apr 30, 2007 at 10:38:19AM -0700, William Lee Irwin III wrote:
> > Here's what I did for i386 for someone concerned about blowing the stack.
> 
> Make more stack sizes configurable, adding options for deeper stacks.
> This is largely for differential diagnosis in cases where stack overflows
> are suspected of silently corrupting memory or causing other problems not
> detected immediately.
> 
> Signed-off-by: William Irwin <wli@holomorphy.com>
> 
> 
> Index: stack-paranoia/arch/i386/Kconfig.debug
> ===================================================================
> --- stack-paranoia.orig/arch/i386/Kconfig.debug	2007-04-30 10:26:15.863869853 -0700
> +++ stack-paranoia/arch/i386/Kconfig.debug	2007-04-30 10:31:43.878562345 -0700
> @@ -56,6 +56,10 @@
>  	  portion of the kernel code won't be covered by a 2MB TLB anymore.
>  	  If in doubt, say "N".
>  
> +choice
> +	prompt "Stack size"
> +	default 8KSTACKS
> +
>  config 4KSTACKS
>  	bool "Use 4Kb for kernel stacks instead of 8Kb"
>  	depends on DEBUG_KERNEL
> @@ -66,6 +70,37 @@
>  	  on the VM subsystem for higher order allocations. This option
>  	  will also use IRQ stacks to compensate for the reduced stackspace.
>  
> +config 8KSTACKS
> +	bool "Use 8KB for kernel stacks"
> +	help
> +	  If you say Y here, the kernel will use its standard stacksize.
>...

These are no questions a user should ever see, and it's impossible that 
a user will be able to choose the right answer - that's an 
implementation detail, and it has to be set correctly automatically 
without bothering the user.

Pro of bigger stacks:
- kernel developers have to care less about stack usage

Pros of smaller stacks:
- less memory usage
- higher order allocations are more likely to fail with fragmented
  memory

We are currently at "works with 8k stacks and without separate IRQ stacks"
and at "works in most situations with 4k stacks", so nothing > 8k should 
be required.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [1/6] make stack size configurable (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-04-30 18:25                 ` Adrian Bunk
@ 2007-04-30 18:32                   ` William Lee Irwin III
  0 siblings, 0 replies; 51+ messages in thread
From: William Lee Irwin III @ 2007-04-30 18:32 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Andi Kleen, Christoph Hellwig, Alan Cox, David Chinner, Zan Lynx,
	Linux Kernel

On Mon, Apr 30, 2007 at 08:25:38PM +0200, Adrian Bunk wrote:
> These are no questions a user should ever see, and it's impossible that 
> a user will be able to choose the right answer - that's an 
> implementation detail, and it has to be set correctly automatically 
> without bothering the user.
> Pro of bigger stacks:
> - kernel developers have to care less about stack usage
> Pros of smaller stacks:
> - less memory usage
> - higher order allocations are more likely to fail with fragmented
>   memory
> We are currently at "works with 8k stacks and without separate IRQ stacks"
> and at "works in most situations with 4k stacks", so nothing > 8k should 
> be required.

freitag mentioned that IRQ stacks should be bootmem allocated, so I
dredged up a series of debug patches I had that did so. This is in no
way intended as a suggestion we should e.g. merge config options as
in 1/6 for 64KB stacks.

This is all sort of meant to be cherrypicked mostly for 5/6's bootmem
allocation of IRQ stacks. Otherwise it's just a set of debug patches
I wrote. 1/6 is merely needed for 5/6 to eventually apply. 5/6 can't
really be used verbatim because it does various handling for vmalloc()
of stacks and depends on the patches in the series prior to it, but
it is something in the way of bootmem allocation of IRQ stacks being
out there already.


-- wli

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [-mm patch] i386: enable 4k stacks by default
  2007-04-30 18:22               ` [-mm patch] i386: enable 4k stacks by default Jan Engelhardt
@ 2007-04-30 18:35                 ` William Lee Irwin III
  0 siblings, 0 replies; 51+ messages in thread
From: William Lee Irwin III @ 2007-04-30 18:35 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Andi Kleen, Christoph Hellwig, Alan Cox, David Chinner, Zan Lynx,
	Adrian Bunk, Linux Kernel

On Apr 30 2007 10:38, William Lee Irwin III wrote:
>> Here's what I did for i386 for someone concerned about blowing the stack.

On Mon, Apr 30, 2007 at 08:22:43PM +0200, Jan Engelhardt wrote:
> If we really need it, then maybe a variable like CONFIG_LOG_BUF_SHIFT
> could be hacked up, resulting in CONFIG_THREAD_SIZE_SHIFT, defaulting to 3
> (=8KB).

I wouldn't bother with that. This is all just a sample of apparently
working code that bootmem allocates IRQ stacks.

It's a set of debug patches so there's no intention of any of this
being merged as-posted.


-- wli

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [-mm patch] i386: enable 4k stacks by default
  2007-04-30 17:38             ` William Lee Irwin III
                                 ` (6 preceding siblings ...)
  2007-04-30 18:22               ` [-mm patch] i386: enable 4k stacks by default Jan Engelhardt
@ 2007-04-30 18:51               ` Andi Kleen
  7 siblings, 0 replies; 51+ messages in thread
From: Andi Kleen @ 2007-04-30 18:51 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Andi Kleen, Christoph Hellwig, Alan Cox, David Chinner, Zan Lynx,
	Adrian Bunk, Linux Kernel

On Mon, Apr 30, 2007 at 10:38:19AM -0700, William Lee Irwin III wrote:
> On Mon, Apr 30, 2007 at 02:13:16PM +0200, Andi Kleen wrote:
> > Actually looking at the code it would need some fixes first:
> > /*
> >  * These should really be __section__(".bss.page_aligned") as well, but
> >  * gcc's 3.0 and earlier don't handle that correctly.
> >  */
> > static char softirq_stack[NR_CPUS * THREAD_SIZE]
> >                 __attribute__((__aligned__(THREAD_SIZE)));
> > 
> > static char hardirq_stack[NR_CPUS * THREAD_SIZE]
> >                 __attribute__((__aligned__(THREAD_SIZE)));
> > 
> > With 8K stacks and NR_CPUS==128 that would be 2MB statically reserved. Yuck.
> > Really needs to be dynamically allocated. I'll take a look once the .22 
> > big merge is done.
> 
> Here's what I did for i386 for someone concerned about blowing the stack.

Looks all good, except for the larger than 8k stack options.
I'll merge it up once the big merge is out of the way.

-Andi

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [4/6] go BUG on vmallocspace in __pa() (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-04-30 17:45               ` [4/6] go BUG on vmallocspace in __pa() " William Lee Irwin III
@ 2007-04-30 18:52                 ` Andi Kleen
  2007-04-30 18:58                   ` William Lee Irwin III
  2007-04-30 19:20                   ` Alan Cox
  2007-05-02 22:31                 ` [4/6] go BUG on vmallocspace in __pa() Jeremy Fitzhardinge
  1 sibling, 2 replies; 51+ messages in thread
From: Andi Kleen @ 2007-04-30 18:52 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Andi Kleen, Christoph Hellwig, Alan Cox, David Chinner, Zan Lynx,
	Adrian Bunk, Linux Kernel

On Mon, Apr 30, 2007 at 10:45:10AM -0700, William Lee Irwin III wrote:
> On Mon, Apr 30, 2007 at 10:38:19AM -0700, William Lee Irwin III wrote:
> > Here's what I did for i386 for someone concerned about blowing the stack.
> 
> Add checks to __pa() so it goes BUG() on vmallocspace addresses.

Sorry I think that's too costly to do. __pa is pretty common

-Andi

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [4/6] go BUG on vmallocspace in __pa() (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-04-30 18:52                 ` Andi Kleen
@ 2007-04-30 18:58                   ` William Lee Irwin III
  2007-04-30 19:20                   ` Alan Cox
  1 sibling, 0 replies; 51+ messages in thread
From: William Lee Irwin III @ 2007-04-30 18:58 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Christoph Hellwig, Alan Cox, David Chinner, Zan Lynx,
	Adrian Bunk, Linux Kernel

On Mon, Apr 30, 2007 at 10:38:19AM -0700, William Lee Irwin III wrote:
>>> Here's what I did for i386 for someone concerned about blowing the stack.

On Mon, Apr 30, 2007 at 10:45:10AM -0700, William Lee Irwin III wrote:
>> Add checks to __pa() so it goes BUG() on vmallocspace addresses.

On Mon, Apr 30, 2007 at 08:52:42PM +0200, Andi Kleen wrote:
> Sorry I think that's too costly to do. __pa is pretty common

The original intention essentially ignored efficiency. I've no
objection to dropping whatever pieces don't fit (or, for that matter,
whatever sort of NIH). The hope in posting all this was that some piece
might be found useful as an example or guide to doing the real patches
for mainline. If any of it is useful as it stands, that's good too.


-- wli

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [2/6] add config option to vmalloc stacks (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-04-30 18:11                 ` Christoph Hellwig
  2007-04-30 18:25                   ` Jan Engelhardt
@ 2007-04-30 19:09                   ` William Lee Irwin III
  2007-04-30 19:15                     ` Christoph Hellwig
  1 sibling, 1 reply; 51+ messages in thread
From: William Lee Irwin III @ 2007-04-30 19:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andi Kleen, Alan Cox, David Chinner, Zan Lynx, Adrian Bunk, Linux Kernel

On Mon, Apr 30, 2007 at 10:43:10AM -0700, William Lee Irwin III wrote:
>> Add a config option to vmalloc() task stacks so that stack overflows are
>> detected without fail, and with a fatal failure mode at that.

On Mon, Apr 30, 2007 at 07:11:04PM +0100, Christoph Hellwig wrote:
> Whee, this sounds like a really helpful although costly debugging option.

If desired, I can redo it standalone.

It should be clear from the remainder of the series of debug patches
that it's readily extensible to IRQ stacks for similar purposes, as
IRQ stacks are similarly vmalloc()'d by later patches in the series.

The patch to make __pa() BUG() on vmallocspace addresses may also be
worthy of consideration in conjunction with vmalloc()'ing the stack.
That particular patch was suggested by freitag.


-- wli

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [2/6] add config option to vmalloc stacks (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-04-30 19:09                   ` William Lee Irwin III
@ 2007-04-30 19:15                     ` Christoph Hellwig
  2007-04-30 19:23                       ` Bill Irwin
                                         ` (2 more replies)
  0 siblings, 3 replies; 51+ messages in thread
From: Christoph Hellwig @ 2007-04-30 19:15 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Christoph Hellwig, Andi Kleen, Alan Cox, David Chinner, Zan Lynx,
	Adrian Bunk, Linux Kernel

So if you want to invest some time into getting this into mergeable
shape I'd suggest you redo the patch series in the following way:

 patch 1: dynamic allocated irq stacks
 patch 2: make irqstacks unconditional, but allow selecting 4/8k stacks
 patch 3: introduce a DEBUG_STACK option that enables vmalloc'ed stacks
 	  and adds the __pa check.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [4/6] go BUG on vmallocspace in __pa() (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-04-30 18:52                 ` Andi Kleen
  2007-04-30 18:58                   ` William Lee Irwin III
@ 2007-04-30 19:20                   ` Alan Cox
  2007-04-30 19:26                     ` Bill Irwin
  1 sibling, 1 reply; 51+ messages in thread
From: Alan Cox @ 2007-04-30 19:20 UTC (permalink / raw)
  To: Andi Kleen
  Cc: William Lee Irwin III, Andi Kleen, Christoph Hellwig,
	David Chinner, Zan Lynx, Adrian Bunk, Linux Kernel

On Mon, 30 Apr 2007 20:52:42 +0200
Andi Kleen <andi@firstfloor.org> wrote:

> On Mon, Apr 30, 2007 at 10:45:10AM -0700, William Lee Irwin III wrote:
> > On Mon, Apr 30, 2007 at 10:38:19AM -0700, William Lee Irwin III wrote:
> > > Here's what I did for i386 for someone concerned about blowing the stack.
> > 
> > Add checks to __pa() so it goes BUG() on vmallocspace addresses.
> 
> Sorry I think that's too costly to do. __pa is pretty common

But not too costly to do if it is done solely with vmalloc the stack for
debug purposes. The bigger problem with the vmalloc approach is there are
still offenders who DMA off the kernel stack on i386 although I'd hope
they are all ancient... we'll find out with this anyway

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [2/6] add config option to vmalloc stacks (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-04-30 19:15                     ` Christoph Hellwig
@ 2007-04-30 19:23                       ` Bill Irwin
  2007-04-30 22:04                       ` Bill Irwin
  2007-05-01 22:36                       ` Matt Mackall
  2 siblings, 0 replies; 51+ messages in thread
From: Bill Irwin @ 2007-04-30 19:23 UTC (permalink / raw)
  To: Christoph Hellwig, Andi Kleen, Alan Cox, David Chinner, Zan Lynx,
	Adrian Bunk, Linux Kernel

On Mon, Apr 30, 2007 at 08:15:11PM +0100, Christoph Hellwig wrote:
> So if you want to invest some time into getting this into mergeable
> shape I'd suggest you redo the patch series in the following way:
>  patch 1: dynamic allocated irq stacks
>  patch 2: make irqstacks unconditional, but allow selecting 4/8k stacks
>  patch 3: introduce a DEBUG_STACK option that enables vmalloc'ed stacks
>  	  and adds the __pa check.

This was actually done for work purposes, hence the change of email
addresses.

I can redo it in such a manner, provided it's not duplicating mergework
Andi Kleen intends to do or otherwise includes pieces (like the __pa()
check) he'd prefer to drop.


-- wli

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [4/6] go BUG on vmallocspace in __pa() (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-04-30 19:20                   ` Alan Cox
@ 2007-04-30 19:26                     ` Bill Irwin
  0 siblings, 0 replies; 51+ messages in thread
From: Bill Irwin @ 2007-04-30 19:26 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andi Kleen, Christoph Hellwig, David Chinner, Zan Lynx,
	Adrian Bunk, Linux Kernel

On Mon, Apr 30, 2007 at 10:38:19AM -0700, William Lee Irwin III wrote:
>>>> Here's what I did for i386 for someone concerned about blowing the
>>>> stack.

On Mon, Apr 30, 2007 at 10:45:10AM -0700, William Lee Irwin III wrote:
>>> Add checks to __pa() so it goes BUG() on vmallocspace addresses.

On Mon, 30 Apr 2007 20:52:42 +0200 Andi Kleen <andi@firstfloor.org> wrote:
>> Sorry I think that's too costly to do. __pa is pretty common

On Mon, Apr 30, 2007 at 08:20:59PM +0100, Alan Cox wrote:
> But not too costly to do if it is done solely with vmalloc the stack
> for
> debug purposes. The bigger problem with the vmalloc approach is there
> are
> still offenders who DMA off the kernel stack on i386 although I'd hope
> they are all ancient... we'll find out with this anyway

Sorry about the email address switch. This is actually work-related.

The stack vmalloc() and __pa() patches were partially intended to catch
or otherwise deliberately break such offenders and go BUG() on them. The
__pa() check in particular is exclusively for the purpose of catching
them. The primary motive being the stack vmalloc() patches remained, of
course, establishing a guard page to trap stack overflows.


-- wli

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [5/6] dynamically allocate IRQ stacks (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-04-30 17:46               ` [5/6] dynamically allocate IRQ stacks (was: Re: [-mm patch] i386: enable 4k stacks by default) William Lee Irwin III
@ 2007-04-30 19:49                 ` Zwane Mwaikambo
  2007-04-30 20:03                   ` Bill Irwin
  2007-04-30 20:07                   ` Andi Kleen
  0 siblings, 2 replies; 51+ messages in thread
From: Zwane Mwaikambo @ 2007-04-30 19:49 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Andi Kleen, Christoph Hellwig, Alan Cox, David Chinner, Zan Lynx,
	Adrian Bunk, Linux Kernel

On Mon, 30 Apr 2007, William Lee Irwin III wrote:

> -static char softirq_stack[NR_CPUS * THREAD_SIZE]
> -		__attribute__((__aligned__(THREAD_SIZE)));
> +static DEFINE_PER_CPU(char *, softirq_stack);
> +static DEFINE_PER_CPU(char *, hardirq_stack);
>  
> -static char hardirq_stack[NR_CPUS * THREAD_SIZE]
> -		__attribute__((__aligned__(THREAD_SIZE)));
> +#ifdef CONFIG_VMALLOC_STACK
> +static void * __init __alloc_irqstack(int cpu)

How about just using DEFINE_PER_CPU and allowing the percpu code 
dynamically allocate.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [5/6] dynamically allocate IRQ stacks (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-04-30 19:49                 ` Zwane Mwaikambo
@ 2007-04-30 20:03                   ` Bill Irwin
  2007-04-30 20:07                   ` Andi Kleen
  1 sibling, 0 replies; 51+ messages in thread
From: Bill Irwin @ 2007-04-30 20:03 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: Andi Kleen, Christoph Hellwig, Alan Cox, David Chinner, Zan Lynx,
	Adrian Bunk, Linux Kernel

On Mon, 30 Apr 2007, William Lee Irwin III wrote:
>> -static char softirq_stack[NR_CPUS * THREAD_SIZE]
>> -		__attribute__((__aligned__(THREAD_SIZE)));
>> +static DEFINE_PER_CPU(char *, softirq_stack);
>> +static DEFINE_PER_CPU(char *, hardirq_stack);
>>  
>> -static char hardirq_stack[NR_CPUS * THREAD_SIZE]
>> -		__attribute__((__aligned__(THREAD_SIZE)));
>> +#ifdef CONFIG_VMALLOC_STACK
>> +static void * __init __alloc_irqstack(int cpu)

On Mon, Apr 30, 2007 at 12:49:04PM -0700, Zwane Mwaikambo wrote:
> How about just using DEFINE_PER_CPU and allowing the percpu code 
> dynamically allocate.

The indirection was needed to vmalloc()/vmap() the IRQ stacks. It
should probably be decided explicitly whether vmalloc() of the IRQ
stacks is considered desirable for any mainline merge of these things.

This code is by no means sacred to me, so I have no particular
preference with respect to the suggested tradeoffs or reworks. Whatever
people want out of all this for mainline (even if nothing) is fine by me.


-- wli

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [5/6] dynamically allocate IRQ stacks (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-04-30 19:49                 ` Zwane Mwaikambo
  2007-04-30 20:03                   ` Bill Irwin
@ 2007-04-30 20:07                   ` Andi Kleen
  1 sibling, 0 replies; 51+ messages in thread
From: Andi Kleen @ 2007-04-30 20:07 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: William Lee Irwin III, Andi Kleen, Christoph Hellwig, Alan Cox,
	David Chinner, Zan Lynx, Adrian Bunk, Linux Kernel

On Mon, Apr 30, 2007 at 12:49:04PM -0700, Zwane Mwaikambo wrote:
> On Mon, 30 Apr 2007, William Lee Irwin III wrote:
> 
> > -static char softirq_stack[NR_CPUS * THREAD_SIZE]
> > -		__attribute__((__aligned__(THREAD_SIZE)));
> > +static DEFINE_PER_CPU(char *, softirq_stack);
> > +static DEFINE_PER_CPU(char *, hardirq_stack);
> >  
> > -static char hardirq_stack[NR_CPUS * THREAD_SIZE]
> > -		__attribute__((__aligned__(THREAD_SIZE)));
> > +#ifdef CONFIG_VMALLOC_STACK
> > +static void * __init __alloc_irqstack(int cpu)
> 
> How about just using DEFINE_PER_CPU and allowing the percpu code 
> dynamically allocate.

That would require a true possible map first.

-Andi

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [2/6] add config option to vmalloc stacks (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-04-30 19:15                     ` Christoph Hellwig
  2007-04-30 19:23                       ` Bill Irwin
@ 2007-04-30 22:04                       ` Bill Irwin
  2007-05-01 22:36                       ` Matt Mackall
  2 siblings, 0 replies; 51+ messages in thread
From: Bill Irwin @ 2007-04-30 22:04 UTC (permalink / raw)
  To: Christoph Hellwig, Andi Kleen, Alan Cox, David Chinner, Zan Lynx,
	Adrian Bunk, Linux Kernel
  Cc: William Irwin

On Mon, Apr 30, 2007 at 08:15:11PM +0100, Christoph Hellwig wrote:
> So if you want to invest some time into getting this into mergeable
> shape I'd suggest you redo the patch series in the following way:
>  patch 1: dynamic allocated irq stacks
>  patch 2: make irqstacks unconditional, but allow selecting 4/8k stacks
>  patch 3: introduce a DEBUG_STACK option that enables vmalloc'ed stacks
>  	  and adds the __pa check.

I've not heard anything back on this from freitag, so I've redone the
patch series precisely as you've outlined above.

I'm taking the updated patch series through a battery of testboots
before posting for merging purposes.


-- wli

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [-mm patch] i386: enable 4k stacks by default
  2007-04-30 11:30       ` Jens Axboe
@ 2007-04-30 23:24         ` Neil Brown
  2007-05-01  8:01           ` Jens Axboe
  0 siblings, 1 reply; 51+ messages in thread
From: Neil Brown @ 2007-04-30 23:24 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, Zan Lynx, Adrian Bunk, Linux Kernel

On Monday April 30, jens.axboe@oracle.com wrote:
> On Mon, Apr 30 2007, Christoph Hellwig wrote:
> > On Mon, Apr 30, 2007 at 06:55:52PM +1000, Neil Brown wrote:
> > > On Saturday April 28, zlynx@acm.org wrote:
> > > > On Sat, 2007-04-28 at 21:19 +0200, Adrian Bunk wrote:
> > > > > 4k stacks have become a well-tested feature used fore a long time in
> > > > > Fedora and even in RHEL 4.
> > > > 
> > > > So has anyone fixed the bugs involving ext3 and LVM snapshots on top of
> > > > DM mirror?
> > > 
> > > Well -mm has a patch which makes stacked block devices much less of an
> > > issue, but this hasn't made it -linus yet - I think the dm developer
> > > isn't happy that dm works properly with it.
> > 
> > I get a little tired about this objection.  If the particular dm code
> > was racy before and this can in theory make it worse it's their problem,
> > and putting in perfectly fine code will give them an incentive to
> > fix their year old race.
> 
> Agree, Neil would you mind if I merged the patch?

I am perfectly happy with this patch being merged - yes.

NeilBrown

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [-mm patch] i386: enable 4k stacks by default
  2007-04-30 23:24         ` Neil Brown
@ 2007-05-01  8:01           ` Jens Axboe
  0 siblings, 0 replies; 51+ messages in thread
From: Jens Axboe @ 2007-05-01  8:01 UTC (permalink / raw)
  To: Neil Brown; +Cc: Christoph Hellwig, Zan Lynx, Adrian Bunk, Linux Kernel

On Tue, May 01 2007, Neil Brown wrote:
> On Monday April 30, jens.axboe@oracle.com wrote:
> > On Mon, Apr 30 2007, Christoph Hellwig wrote:
> > > On Mon, Apr 30, 2007 at 06:55:52PM +1000, Neil Brown wrote:
> > > > On Saturday April 28, zlynx@acm.org wrote:
> > > > > On Sat, 2007-04-28 at 21:19 +0200, Adrian Bunk wrote:
> > > > > > 4k stacks have become a well-tested feature used fore a long time in
> > > > > > Fedora and even in RHEL 4.
> > > > > 
> > > > > So has anyone fixed the bugs involving ext3 and LVM snapshots on top of
> > > > > DM mirror?
> > > > 
> > > > Well -mm has a patch which makes stacked block devices much less of an
> > > > issue, but this hasn't made it -linus yet - I think the dm developer
> > > > isn't happy that dm works properly with it.
> > > 
> > > I get a little tired about this objection.  If the particular dm code
> > > was racy before and this can in theory make it worse it's their problem,
> > > and putting in perfectly fine code will give them an incentive to
> > > fix their year old race.
> > 
> > Agree, Neil would you mind if I merged the patch?
> 
> I am perfectly happy with this patch being merged - yes.

Ok, queued for inclusion.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [2/6] add config option to vmalloc stacks (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-04-30 19:15                     ` Christoph Hellwig
  2007-04-30 19:23                       ` Bill Irwin
  2007-04-30 22:04                       ` Bill Irwin
@ 2007-05-01 22:36                       ` Matt Mackall
  2007-05-01 22:51                         ` Bill Irwin
  2 siblings, 1 reply; 51+ messages in thread
From: Matt Mackall @ 2007-05-01 22:36 UTC (permalink / raw)
  To: Christoph Hellwig, William Lee Irwin III, Andi Kleen, Alan Cox,
	David Chinner, Zan Lynx, Adrian Bunk, Linux Kernel

On Mon, Apr 30, 2007 at 08:15:11PM +0100, Christoph Hellwig wrote:
> So if you want to invest some time into getting this into mergeable
> shape I'd suggest you redo the patch series in the following way:
> 
>  patch 1: dynamic allocated irq stacks

Can we register them lazily at request_irq time?

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [2/6] add config option to vmalloc stacks (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-05-01 22:36                       ` Matt Mackall
@ 2007-05-01 22:51                         ` Bill Irwin
  2007-05-01 23:07                           ` Alan Cox
  2007-05-01 23:15                           ` Matt Mackall
  0 siblings, 2 replies; 51+ messages in thread
From: Bill Irwin @ 2007-05-01 22:51 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Christoph Hellwig, Andi Kleen, Alan Cox, David Chinner, Zan Lynx,
	Adrian Bunk, Linux Kernel, wli

On Mon, Apr 30, 2007 at 08:15:11PM +0100, Christoph Hellwig wrote:
>> So if you want to invest some time into getting this into mergeable
>> shape I'd suggest you redo the patch series in the following way:
>>  patch 1: dynamic allocated irq stacks

On Tue, May 01, 2007 at 05:36:06PM -0500, Matt Mackall wrote:
> Can we register them lazily at request_irq time?

These IRQ stacks are per-cpu, not per-IRQ. It may make sense to
implement per-IRQ stacks, in which case dynamic allocation at the time
of request_irq() will make sense.

Would you like me to implement per-IRQ IRQ stacks?


-- wli

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [2/6] add config option to vmalloc stacks (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-05-01 22:51                         ` Bill Irwin
@ 2007-05-01 23:07                           ` Alan Cox
  2007-05-01 23:23                             ` Bill Irwin
  2007-05-01 23:15                           ` Matt Mackall
  1 sibling, 1 reply; 51+ messages in thread
From: Alan Cox @ 2007-05-01 23:07 UTC (permalink / raw)
  To: Bill Irwin
  Cc: Matt Mackall, Christoph Hellwig, Andi Kleen, David Chinner,
	Zan Lynx, Adrian Bunk, Linux Kernel, wli

> These IRQ stacks are per-cpu, not per-IRQ. It may make sense to
> implement per-IRQ stacks, in which case dynamic allocation at the time
> of request_irq() will make sense.

This depends if active IRQ count exceeds active CPU count worst cases.
For the big boxes it might well do but for small ones we seem to be best
with per CPU.

Alan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [2/6] add config option to vmalloc stacks (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-05-01 22:51                         ` Bill Irwin
  2007-05-01 23:07                           ` Alan Cox
@ 2007-05-01 23:15                           ` Matt Mackall
  2007-05-01 23:27                             ` Bill Irwin
  1 sibling, 1 reply; 51+ messages in thread
From: Matt Mackall @ 2007-05-01 23:15 UTC (permalink / raw)
  To: Bill Irwin, Christoph Hellwig, Andi Kleen, Alan Cox,
	David Chinner, Zan Lynx, Adrian Bunk, Linux Kernel

On Tue, May 01, 2007 at 03:51:25PM -0700, Bill Irwin wrote:
> On Mon, Apr 30, 2007 at 08:15:11PM +0100, Christoph Hellwig wrote:
> >> So if you want to invest some time into getting this into mergeable
> >> shape I'd suggest you redo the patch series in the following way:
> >>  patch 1: dynamic allocated irq stacks
> 
> On Tue, May 01, 2007 at 05:36:06PM -0500, Matt Mackall wrote:
> > Can we register them lazily at request_irq time?
> 
> These IRQ stacks are per-cpu, not per-IRQ. It may make sense to
> implement per-IRQ stacks, in which case dynamic allocation at the time
> of request_irq() will make sense.
> 
> Would you like me to implement per-IRQ IRQ stacks?

It's probably the "right" thing to do, but it does have higher
overhead for most systems.

But it also gives a very obvious migration path to -rt's irq threads.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [2/6] add config option to vmalloc stacks (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-05-01 23:07                           ` Alan Cox
@ 2007-05-01 23:23                             ` Bill Irwin
  0 siblings, 0 replies; 51+ messages in thread
From: Bill Irwin @ 2007-05-01 23:23 UTC (permalink / raw)
  To: Alan Cox
  Cc: Bill Irwin, Matt Mackall, Christoph Hellwig, Andi Kleen,
	David Chinner, Zan Lynx, Adrian Bunk, Linux Kernel, wli

At some point in the past, I wrote:
>> These IRQ stacks are per-cpu, not per-IRQ. It may make sense to
>> implement per-IRQ stacks, in which case dynamic allocation at the time
>> of request_irq() will make sense.

On Wed, May 02, 2007 at 12:07:45AM +0100, Alan Cox wrote:
> This depends if active IRQ count exceeds active CPU count worst cases.
> For the big boxes it might well do but for small ones we seem to be best
> with per CPU.

I'll leave IRQ stacks per-CPU, then. The prevailing opinion on large
i386 does not favor doing much of anything to accommodate it.


-- wli

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [2/6] add config option to vmalloc stacks (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-05-01 23:15                           ` Matt Mackall
@ 2007-05-01 23:27                             ` Bill Irwin
  0 siblings, 0 replies; 51+ messages in thread
From: Bill Irwin @ 2007-05-01 23:27 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Bill Irwin, Christoph Hellwig, Andi Kleen, Alan Cox,
	David Chinner, Zan Lynx, Adrian Bunk, Linux Kernel, wli

On Tue, May 01, 2007 at 05:36:06PM -0500, Matt Mackall wrote:
>>> Can we register them lazily at request_irq time?

On Tue, May 01, 2007 at 03:51:25PM -0700, Bill Irwin wrote:
>> These IRQ stacks are per-cpu, not per-IRQ. It may make sense to
>> implement per-IRQ stacks, in which case dynamic allocation at the time
>> of request_irq() will make sense.
>> Would you like me to implement per-IRQ IRQ stacks?

On Tue, May 01, 2007 at 06:15:48PM -0500, Matt Mackall wrote:
> It's probably the "right" thing to do, but it does have higher
> overhead for most systems.
> But it also gives a very obvious migration path to -rt's irq threads.

Well, I don't really know how much of -rt or which pieces of it are
really going in. Otherwise it favors large-scale i386, which people
largely want to ignore/break/etc. I'll err on the side of caution and
not change it.


-- wli

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [4/6] go BUG on vmallocspace in __pa()
  2007-04-30 17:45               ` [4/6] go BUG on vmallocspace in __pa() " William Lee Irwin III
  2007-04-30 18:52                 ` Andi Kleen
@ 2007-05-02 22:31                 ` Jeremy Fitzhardinge
  2007-05-02 22:48                   ` Bill Irwin
  1 sibling, 1 reply; 51+ messages in thread
From: Jeremy Fitzhardinge @ 2007-05-02 22:31 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Andi Kleen, Christoph Hellwig, Alan Cox, David Chinner, Zan Lynx,
	Adrian Bunk, Linux Kernel

William Lee Irwin III wrote:
>  
> +unsigned long __kvaddr_to_paddr(unsigned long kvaddr)
> +{
> +	if (high_memory)
> +		BUG_ON(kvaddr >= VMALLOC_START);
> +	else
> +		BUG_ON(kvaddr >= (unsigned long)__va(MAXMEM));
> +	return kvaddr - PAGE_OFFSET;
> +}
>   

Needs to be exported so that modules can use __pa.  Though I suspect
most modules doing so are buggy:

WARNING: "__kvaddr_to_paddr" [sound/pci/snd-intel8x0.ko] undefined!
WARNING: "__kvaddr_to_paddr" [sound/core/snd-pcm.ko] undefined!
WARNING: "__kvaddr_to_paddr" [sound/core/snd-page-alloc.ko] undefined!
WARNING: "__kvaddr_to_paddr" [net/ieee80211/ieee80211_crypt_wep.ko] undefined!
WARNING: "__kvaddr_to_paddr" [fs/xfs/xfs.ko] undefined!
WARNING: "__kvaddr_to_paddr" [fs/nfsd/nfsd.ko] undefined!
WARNING: "__kvaddr_to_paddr" [drivers/usb/misc/usbtest.ko] undefined!
WARNING: "__kvaddr_to_paddr" [drivers/scsi/sg.ko] undefined!
WARNING: "__kvaddr_to_paddr" [drivers/parport/parport_pc.ko] undefined!
WARNING: "__kvaddr_to_paddr" [drivers/net/e1000/e1000.ko] undefined!
WARNING: "__kvaddr_to_paddr" [drivers/mmc/mmc_core.ko] undefined!
WARNING: "__kvaddr_to_paddr" [drivers/mmc/mmc_block.ko] undefined!
WARNING: "__kvaddr_to_paddr" [drivers/kvm/kvm.ko] undefined!
WARNING: "__kvaddr_to_paddr" [drivers/kvm/kvm-intel.ko] undefined!
WARNING: "__kvaddr_to_paddr" [drivers/ieee1394/sbp2.ko] undefined!
WARNING: "__kvaddr_to_paddr" [drivers/ieee1394/ohci1394.ko] undefined!
WARNING: "__kvaddr_to_paddr" [drivers/char/drm/drm.ko] undefined!


    J

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [4/6] go BUG on vmallocspace in __pa()
  2007-05-02 22:31                 ` [4/6] go BUG on vmallocspace in __pa() Jeremy Fitzhardinge
@ 2007-05-02 22:48                   ` Bill Irwin
  0 siblings, 0 replies; 51+ messages in thread
From: Bill Irwin @ 2007-05-02 22:48 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Andi Kleen, Christoph Hellwig, Alan Cox, David Chinner, Zan Lynx,
	Adrian Bunk, Linux Kernel, wli

William Lee Irwin III wrote:
>> +unsigned long __kvaddr_to_paddr(unsigned long kvaddr)
>> +{
>> +	if (high_memory)
>> +		BUG_ON(kvaddr >= VMALLOC_START);
>> +	else
>> +		BUG_ON(kvaddr >= (unsigned long)__va(MAXMEM));
>> +	return kvaddr - PAGE_OFFSET;
>> +}

On Wed, May 02, 2007 at 03:31:03PM -0700, Jeremy Fitzhardinge wrote:
> Needs to be exported so that modules can use __pa.  Though I suspect
> most modules doing so are buggy:

Done.


-- wli

This patch introduces CONFIG_DEBUG_STACK, which vmalloc()'s task and IRQ
stacks in order to establish guard pages. In such a manner any stack
overflow that references pages immediately adjacent to the stack is
immediately trapped with a fault, which precludes silent memory corruption
or difficult-to-decipher failure modes resulting from stack corruption.

It furthermore adds a check to __pa() to catch drivers trying to DMA off
the stack, which more generally flags incorrect attempts to use __pa()
on vmallocspace addresses.

Signed-off-by: William Irwin <bill.irwin@oracle.com>


Index: stack-paranoia/arch/i386/Kconfig.debug
===================================================================
--- stack-paranoia.orig/arch/i386/Kconfig.debug	2007-05-01 10:18:50.942170611 -0700
+++ stack-paranoia/arch/i386/Kconfig.debug	2007-05-01 10:19:47.145373449 -0700
@@ -35,6 +35,16 @@
 
 	  This option will slow down process creation somewhat.
 
+config DEBUG_STACK
+	bool "Debug stack overflows"
+	depends on DEBUG_KERNEL
+	help
+	  Allocates the stack physically discontiguously and from high
+	  memory. Furthermore an unmapped guard page follows the stack,
+	  which results in immediately trapping stack overflows instead
+	  of silent corruption. This is not for end-users. It's intended
+	  to trigger fatal system errors under various forms of stack abuse.
+
 comment "Page alloc debug is incompatible with Software Suspend on i386"
 	depends on DEBUG_KERNEL && SOFTWARE_SUSPEND
 
Index: stack-paranoia/arch/i386/kernel/process.c
===================================================================
--- stack-paranoia.orig/arch/i386/kernel/process.c	2007-05-01 10:18:50.950171067 -0700
+++ stack-paranoia/arch/i386/kernel/process.c	2007-05-01 10:19:47.145373449 -0700
@@ -25,6 +25,7 @@
 #include <linux/stddef.h>
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
+#include <linux/workqueue.h>
 #include <linux/user.h>
 #include <linux/a.out.h>
 #include <linux/interrupt.h>
@@ -322,6 +323,58 @@
 	show_trace(NULL, regs, &regs->esp);
 }
 
+#ifdef CONFIG_DEBUG_STACK
+struct thread_info *alloc_thread_info(struct task_struct *unused)
+{
+	int i;
+	struct page *pages[THREAD_SIZE/PAGE_SIZE], **tmp = pages;
+	struct vm_struct *area;
+
+	/*
+	 * passing VM_IOREMAP for the sake of alignment is why
+	 * all this is done by hand.
+	 */
+	area = get_vm_area(THREAD_SIZE, VM_IOREMAP);
+	if (!area)
+		return NULL;
+	for (i = 0; i < THREAD_SIZE/PAGE_SIZE; ++i) {
+		pages[i] = alloc_page(GFP_HIGHUSER);
+		if (!pages[i])
+			goto out_free_pages;
+	}
+	/* implicitly transfer page refcounts to the vm_struct */
+	if (map_vm_area(area, PAGE_KERNEL, &tmp))
+		goto out_remove_area;
+	/* it may be worth poisoning, save thread_info proper */
+	return (struct thread_info *)area->addr;
+out_remove_area:
+	remove_vm_area(area);
+out_free_pages:
+	do {
+		__free_page(pages[--i]);
+	} while (i >= 0);
+	return NULL;
+}
+
+static void work_free_thread_info(struct work_struct *work)
+{
+	int i;
+	void *p = work;
+
+	for (i = 0; i < THREAD_SIZE/PAGE_SIZE; ++i)
+		__free_page(vmalloc_to_page(p + PAGE_SIZE*i));
+	vfree(p);
+}
+
+void free_thread_info(struct thread_info *info)
+{
+	struct work_struct *work = (struct work_struct *)info;
+
+	INIT_WORK(work, work_free_thread_info);
+	schedule_work(work);
+}
+#endif
+
 /*
  * This gets run with %ebx containing the
  * function to call, and %edx containing
Index: stack-paranoia/include/asm-i386/module.h
===================================================================
--- stack-paranoia.orig/include/asm-i386/module.h	2007-05-01 10:18:50.998173802 -0700
+++ stack-paranoia/include/asm-i386/module.h	2007-05-01 10:19:47.145373449 -0700
@@ -68,6 +68,13 @@
 #define MODULE_STACKSIZE ""
 #endif
 
-#define MODULE_ARCH_VERMAGIC MODULE_PROC_FAMILY MODULE_STACKSIZE
+#ifdef CONFIG_DEBUG_STACK
+#define MODULE_DEBUG_STACK "DEBUG_STACKS "
+#else
+#define MODULE_DEBUG_STACK ""
+#endif
+
+#define MODULE_ARCH_VERMAGIC MODULE_PROC_FAMILY MODULE_STACKSIZE \
+		MODULE_DEBUG_STACK
 
 #endif /* _ASM_I386_MODULE_H */
Index: stack-paranoia/include/asm-i386/thread_info.h
===================================================================
--- stack-paranoia.orig/include/asm-i386/thread_info.h	2007-05-01 10:18:51.006174258 -0700
+++ stack-paranoia/include/asm-i386/thread_info.h	2007-05-01 10:19:47.149373677 -0700
@@ -94,6 +94,11 @@
 }
 
 /* thread information allocation */
+#ifdef CONFIG_DEBUG_STACK
+struct task_struct;
+struct thread_info *alloc_thread_info(struct task_struct *);
+void free_thread_info(struct thread_info *);
+#else /* !CONFIG_DEBUG_STACK */
 #ifdef CONFIG_DEBUG_STACK_USAGE
 #define alloc_thread_info(tsk) kzalloc(THREAD_SIZE, GFP_KERNEL)
 #else
@@ -101,6 +106,7 @@
 #endif
 
 #define free_thread_info(info)	kfree(info)
+#endif /* !CONFIG_DEBUG_STACK */
 
 #else /* !__ASSEMBLY__ */
 
Index: stack-paranoia/arch/i386/kernel/doublefault.c
===================================================================
--- stack-paranoia.orig/arch/i386/kernel/doublefault.c	2007-05-01 10:18:50.962171751 -0700
+++ stack-paranoia/arch/i386/kernel/doublefault.c	2007-05-01 10:19:47.149373677 -0700
@@ -62,5 +62,5 @@
 	.ss		= __KERNEL_DS,
 	.ds		= __USER_DS,
 
-	.__cr3		= __pa(swapper_pg_dir)
+	.__cr3		= (unsigned long)swapper_pg_dir - PAGE_OFFSET,
 };
Index: stack-paranoia/arch/i386/kernel/irq.c
===================================================================
--- stack-paranoia.orig/arch/i386/kernel/irq.c	2007-05-01 10:19:43.941190853 -0700
+++ stack-paranoia/arch/i386/kernel/irq.c	2007-05-01 10:20:41.160451593 -0700
@@ -18,7 +18,7 @@
 #include <linux/cpu.h>
 #include <linux/delay.h>
 #include <linux/bootmem.h>
-
+#include <linux/mm.h>
 #include <asm/apic.h>
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
@@ -145,6 +145,60 @@
 static DEFINE_PER_CPU(char *, softirq_stack);
 static DEFINE_PER_CPU(char *, hardirq_stack);
 
+#ifdef CONFIG_DEBUG_STACK
+static void * __init irq_remap_stack(void *stack)
+{
+	int i;
+	struct page *pages[THREAD_SIZE/PAGE_SIZE];
+
+	for (i = 0; i < ARRAY_SIZE(pages); ++i)
+		pages[i] =  virt_to_page(stack + PAGE_SIZE*i);
+	return vmap(pages, THREAD_SIZE/PAGE_SIZE, VM_IOREMAP, PAGE_KERNEL);
+}
+
+static int __init irq_guard_cpu0(void)
+{
+	unsigned long flags;
+	void *tmp;
+
+	tmp = irq_remap_stack(per_cpu(softirq_stack, 0));
+	if (!tmp)
+		return -ENOMEM;
+	else {
+		local_irq_save(flags);
+		per_cpu(softirq_stack, 0) = tmp;
+		local_irq_restore(flags);
+	}
+	tmp = irq_remap_stack(per_cpu(hardirq_stack, 0));
+	if (!tmp)
+		return -ENOMEM;
+	else {
+		local_irq_save(flags);
+		per_cpu(hardirq_stack, 0) = tmp;
+		local_irq_restore(flags);
+	}
+	return 0;
+}
+core_initcall(irq_guard_cpu0);
+
+static void * __init __alloc_irqstack(int cpu)
+{
+	int i;
+	struct page *pages[THREAD_SIZE/PAGE_SIZE], **tmp = pages;
+	struct vm_struct *area;
+
+	if (!slab_is_available())
+		return __alloc_bootmem(THREAD_SIZE, THREAD_SIZE,
+						__pa(MAX_DMA_ADDRESS));
+
+	/* failures here are unrecoverable anyway */
+	area = get_vm_area(THREAD_SIZE, VM_IOREMAP);
+	for (i = 0; i < ARRAY_SIZE(pages); ++i)
+		pages[i] = alloc_page(GFP_HIGHUSER);
+	map_vm_area(area, PAGE_KERNEL, &tmp);
+	return area->addr;
+}
+#else /* !CONFIG_DEBUG_STACK */
 static void * __init __alloc_irqstack(int cpu)
 {
 	if (!slab_is_available())
@@ -154,6 +208,7 @@
 	return (void *)__get_free_pages(GFP_KERNEL,
 					ilog2(THREAD_SIZE/PAGE_SIZE));
 }
+#endif /* !CONFIG_DEBUG_STACK */
 
 static void __init alloc_irqstacks(int cpu)
 {
Index: stack-paranoia/arch/i386/mm/pgtable.c
===================================================================
--- stack-paranoia.orig/arch/i386/mm/pgtable.c	2007-05-01 10:18:50.986173118 -0700
+++ stack-paranoia/arch/i386/mm/pgtable.c	2007-05-02 15:45:13.877793914 -0700
@@ -181,6 +181,18 @@
 #endif
 }
 
+#ifdef CONFIG_DEBUG_STACK
+unsigned long __kvaddr_to_paddr(unsigned long kvaddr)
+{
+	if (high_memory)
+		BUG_ON(kvaddr >= VMALLOC_START);
+	else
+		BUG_ON(kvaddr >= (unsigned long)__va(MAXMEM));
+	return kvaddr - PAGE_OFFSET;
+}
+EXPORT_SYMBOL(__kvaddr_to_paddr);
+#endif /* CONFIG_DEBUG_STACK */
+
 pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
 {
 	return (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
Index: stack-paranoia/include/asm-i386/page.h
===================================================================
--- stack-paranoia.orig/include/asm-i386/page.h	2007-05-01 10:18:51.022175170 -0700
+++ stack-paranoia/include/asm-i386/page.h	2007-05-01 10:19:47.149373677 -0700
@@ -118,11 +118,17 @@
 #define __PAGE_OFFSET		((unsigned long)CONFIG_PAGE_OFFSET)
 #endif
 
-
 #define PAGE_OFFSET		((unsigned long)__PAGE_OFFSET)
+
+#if defined(CONFIG_DEBUG_STACK) && !defined(__ASSEMBLY__)
+unsigned long __kvaddr_to_paddr(unsigned long);
+#define __pa(x)			__kvaddr_to_paddr((unsigned long)(x))
+#else /* !CONFIG_DEBUG_STACK */
+#define __pa(x)			((unsigned long)(x)-PAGE_OFFSET)
+#endif /* !CONFIG_DEBUG_STACK */
+
 #define VMALLOC_RESERVE		((unsigned long)__VMALLOC_RESERVE)
 #define MAXMEM			(-__PAGE_OFFSET-__VMALLOC_RESERVE)
-#define __pa(x)			((unsigned long)(x)-PAGE_OFFSET)
 /* __pa_symbol should be used for C visible symbols.
    This seems to be the official gcc blessed way to do such arithmetic. */
 #define __pa_symbol(x)          __pa(RELOC_HIDE((unsigned long)(x),0))

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [2/6] add config option to vmalloc stacks (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-04-30 17:43               ` [2/6] add config option to vmalloc stacks " William Lee Irwin III
  2007-04-30 18:11                 ` Christoph Hellwig
@ 2007-05-04  5:35                 ` Joseph Fannin
  2007-05-04  7:43                   ` Bill Irwin
  1 sibling, 1 reply; 51+ messages in thread
From: Joseph Fannin @ 2007-05-04  5:35 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Andi Kleen, Christoph Hellwig, Alan Cox, David Chinner, Zan Lynx,
	Adrian Bunk, Linux Kernel

On Mon, Apr 30, 2007 at 10:43:10AM -0700, William Lee Irwin III wrote:

> +	  Allocates the stack physically discontiguously and from high
> +	  memory. Furthermore an unmapped guard page follows the stack.
> +	  This is not for end-users. It's intended to trigger fatal
> +	  system errors under various forms of stack abuse.

    Why is this not for end-users?  Will it not trigger anything
useful unless set up properly, or is a big performace hit -- and how,
or what?

    All the kernel debug options are underdocumented this way -- I'd
like to have as many of them on as I can without absolutely killing
performance, (or rather, *you* would) -- but I can never tell without
grovelling all over for the info, which... well, I haven't done it
yet, anyway.

    "End-user" is just insufficently defined for anyone compiling
their own kernel.  Could you add a bit more text here describing what
the effect of physically discontiguous high-memory stacks is?  An
additional frobnitz dereference on every badda-bing badda-bang, likely
to double the time it takes to dance the hokey pokey?

   *shrug*  Some of those debug options probably don't get set very
often on kernels that are run for more than to see if it boots.

--
Joseph Fannin
jfannin@gmail.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [2/6] add config option to vmalloc stacks (was: Re: [-mm patch] i386: enable 4k stacks by default)
  2007-05-04  5:35                 ` Joseph Fannin
@ 2007-05-04  7:43                   ` Bill Irwin
  0 siblings, 0 replies; 51+ messages in thread
From: Bill Irwin @ 2007-05-04  7:43 UTC (permalink / raw)
  To: Andi Kleen, Christoph Hellwig, Alan Cox, David Chinner, Zan Lynx,
	Adrian Bunk, Linux Kernel, wli

On Mon, Apr 30, 2007 at 10:43:10AM -0700, William Lee Irwin III wrote:
>> +	  Allocates the stack physically discontiguously and from high
>> +	  memory. Furthermore an unmapped guard page follows the stack.
>> +	  This is not for end-users. It's intended to trigger fatal
>> +	  system errors under various forms of stack abuse.

On Fri, May 04, 2007 at 01:35:30AM -0400, Joseph Fannin wrote:
>     Why is this not for end-users?  Will it not trigger anything
> useful unless set up properly, or is a big performace hit -- and how,
> or what?
>     All the kernel debug options are underdocumented this way -- I'd
> like to have as many of them on as I can without absolutely killing
> performance, (or rather, *you* would) -- but I can never tell without
> grovelling all over for the info, which... well, I haven't done it
> yet, anyway.

There aren't many effective sideband methods to document things. If I
knew of an "expanded help" thing people could look at in Kconfig, I'd
write storybook documentation and put it there as I'm wont to do.


On Fri, May 04, 2007 at 01:35:30AM -0400, Joseph Fannin wrote:
>     "End-user" is just insufficently defined for anyone compiling
> their own kernel.  Could you add a bit more text here describing what
> the effect of physically discontiguous high-memory stacks is?  An
> additional frobnitz dereference on every badda-bing badda-bang, likely
> to double the time it takes to dance the hokey pokey?
>    *shrug*  Some of those debug options probably don't get set very
> often on kernels that are run for more than to see if it boots.

It's short for "whoever doesn't understand the terse jargon" as I'm
using it. The assumption here is that it's essentially for kernel
hackers who know all about kernel internals up-front anyway.

The option really is not to be trifled with. Maybe I could work with the
kerneldoc developers to arrange for outlets for more verbose
documentation for options (actually I'd like similar for API functions
as well; I'd like to write IRIX-style roman fleuve manpages for things).
There is a slight danger, though, that the documentation may get out of
synch. For now, I just have nowhere appropriate to put it.


-- wli

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2007-05-04  7:47 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-04-28 19:19 [-mm patch] i386: enable 4k stacks by default Adrian Bunk
2007-04-28 21:18 ` Zan Lynx
2007-04-30  3:58   ` David Chinner
2007-04-30  8:17     ` Alan Cox
2007-04-30 10:26       ` Andi Kleen
2007-04-30 10:48         ` Christoph Hellwig
2007-04-30 12:13           ` Andi Kleen
2007-04-30 17:38             ` William Lee Irwin III
2007-04-30 17:40               ` [1/6] make stack size configurable (was: Re: [-mm patch] i386: enable 4k stacks by default) William Lee Irwin III
2007-04-30 18:10                 ` Christoph Hellwig
2007-04-30 18:13                   ` William Lee Irwin III
2007-04-30 18:25                 ` Adrian Bunk
2007-04-30 18:32                   ` William Lee Irwin III
2007-04-30 17:43               ` [2/6] add config option to vmalloc stacks " William Lee Irwin III
2007-04-30 18:11                 ` Christoph Hellwig
2007-04-30 18:25                   ` Jan Engelhardt
2007-04-30 19:09                   ` William Lee Irwin III
2007-04-30 19:15                     ` Christoph Hellwig
2007-04-30 19:23                       ` Bill Irwin
2007-04-30 22:04                       ` Bill Irwin
2007-05-01 22:36                       ` Matt Mackall
2007-05-01 22:51                         ` Bill Irwin
2007-05-01 23:07                           ` Alan Cox
2007-05-01 23:23                             ` Bill Irwin
2007-05-01 23:15                           ` Matt Mackall
2007-05-01 23:27                             ` Bill Irwin
2007-05-04  5:35                 ` Joseph Fannin
2007-05-04  7:43                   ` Bill Irwin
2007-04-30 17:44               ` [3/6] make IRQ stacks independently configurable " William Lee Irwin III
2007-04-30 18:11                 ` Christoph Hellwig
2007-04-30 18:14                   ` William Lee Irwin III
2007-04-30 17:45               ` [4/6] go BUG on vmallocspace in __pa() " William Lee Irwin III
2007-04-30 18:52                 ` Andi Kleen
2007-04-30 18:58                   ` William Lee Irwin III
2007-04-30 19:20                   ` Alan Cox
2007-04-30 19:26                     ` Bill Irwin
2007-05-02 22:31                 ` [4/6] go BUG on vmallocspace in __pa() Jeremy Fitzhardinge
2007-05-02 22:48                   ` Bill Irwin
2007-04-30 17:46               ` [5/6] dynamically allocate IRQ stacks (was: Re: [-mm patch] i386: enable 4k stacks by default) William Lee Irwin III
2007-04-30 19:49                 ` Zwane Mwaikambo
2007-04-30 20:03                   ` Bill Irwin
2007-04-30 20:07                   ` Andi Kleen
2007-04-30 17:47               ` [6/6] arrange for a guard page on cpu 0's IRQ stack " William Lee Irwin III
2007-04-30 18:22               ` [-mm patch] i386: enable 4k stacks by default Jan Engelhardt
2007-04-30 18:35                 ` William Lee Irwin III
2007-04-30 18:51               ` Andi Kleen
2007-04-30  8:55   ` Neil Brown
2007-04-30  8:59     ` Christoph Hellwig
2007-04-30 11:30       ` Jens Axboe
2007-04-30 23:24         ` Neil Brown
2007-05-01  8:01           ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).