LKML Archive on lore.kernel.org
* [PATCH 2/4] x86_64: make early_node_mem return align address v2
       [not found] <200801291113.35974.yinghai.lu@sun.com>
@ 2008-01-29 19:14 ` Yinghai Lu
  2008-01-30  2:39   ` Yinghai Lu
  2008-01-30  3:28   ` [PATCH 2/2] x86_64: make bootmap_start page align v3 Yinghai Lu
  2008-01-29 19:15 ` [PATCH 3/4] x86_64: Use early reservation for early node data Yinghai Lu
  2008-01-29 19:16 ` [PATCH 4/4] x86_64: increse MAX_EARLY_RES for NODE_DATA and bootmap Yinghai Lu
  2 siblings, 2 replies; 23+ messages in thread
From: Yinghai Lu @ 2008-01-29 19:14 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andi Kleen, Christoph Lameter, Andrew Morton, linux-kernel

[PATCH 2/4] x86_64: make early_node_mem return align address v2

Boot oops when the system has 64G or 128G of RAM installed.

Calling initcall 0xffffffff80bc33b6: sctp_init+0x0/0x711()
BUG: unable to handle kernel NULL pointer dereference at 000000000000005f
IP: [<ffffffff802bfe55>] proc_register+0xe7/0x10f
PGD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #6
RIP: 0010:[<ffffffff802bfe55>]  [<ffffffff802bfe55>] proc_register+0xe7/0x10f
RSP: 0000:ffff810824c57e60  EFLAGS: 00010246
RAX: 000000000000d7d7 RBX: ffff811024c5fa80 RCX: ffff810824c57e08
RDX: 0000000000000000 RSI: 0000000000000195 RDI: ffffffff80cc2460
RBP: ffffffffffffffff R08: 0000000000000000 R09: ffff811024c5fa80
R10: 0000000000000000 R11: 0000000000000002 R12: ffff810824c57e6c
R13: 0000000000000000 R14: ffff810824c57ee0 R15: 00000006abd25bee
FS:  0000000000000000(0000) GS:ffffffff80b4d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000000005f CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff810824c56000, task ffff812024c52000)
Stack:  ffffffff80a57348 0000019500000000 ffff811024c5fa80 0000000000000000
 00000000ffffff97 ffffffff802bfef0 0000000000000000 ffffffffffffffff
 0000000000000000 ffffffff80bc3b4b ffff810824c57ee0 ffffffff80bc34a5
Call Trace:
 [<ffffffff802bfef0>] ? create_proc_entry+0x73/0x8a
 [<ffffffff80bc3b4b>] ? sctp_snmp_proc_init+0x1c/0x34
 [<ffffffff80bc34a5>] ? sctp_init+0xef/0x711
 [<ffffffff80b976e3>] ? kernel_init+0x175/0x2e1
 [<ffffffff8020ccf8>] ? child_rip+0xa/0x12
 [<ffffffff80b9756e>] ? kernel_init+0x0/0x2e1
 [<ffffffff8020ccee>] ? child_rip+0x0/0x12


Code: 1e 48 83 7b 38 00 75 08 48 c7 43 38 f0 e8 82 80 48 83 7b 30 00 75 08 48 c7 43 30 d0 e9 82 80 48 c7 c7 60 24 cc 80 e8 bd 5a 54 00 <48> 8b 45 60 48 89 6b 58 48 89 5d 60 48 89 43 50 fe 05 f5 25 a0
RIP  [<ffffffff802bfe55>] proc_register+0xe7/0x10f
 RSP <ffff810824c57e60>
CR2: 000000000000005f
---[ end trace 02c2d78def82877a ]---
Kernel panic - not syncing: Attempted to kill init!

It turns out that some variables near the end of bss are already corrupted.

in System.map we have
ffffffff80d40420 b rsi_table
ffffffff80d40620 B krb5_seq_lock
ffffffff80d40628 b i.20437
ffffffff80d40630 b xprt_rdma_inline_write_padding
ffffffff80d40638 b sunrpc_table_header
ffffffff80d40640 b zero
ffffffff80d40644 b min_memreg
ffffffff80d40648 b rpcrdma_tk_lock_g
ffffffff80d40650 B sctp_assocs_id_lock
ffffffff80d40658 B proc_net_sctp
ffffffff80d40660 B sctp_assocs_id
ffffffff80d40680 B sysctl_sctp_mem
ffffffff80d40690 B sysctl_sctp_rmem
ffffffff80d406a0 B sysctl_sctp_wmem
ffffffff80d406b0 b sctp_ctl_socket
ffffffff80d406b8 b sctp_pf_inet6_specific
ffffffff80d406c0 b sctp_pf_inet_specific
ffffffff80d406c8 b sctp_af_v4_specific
ffffffff80d406d0 b sctp_af_v6_specific
ffffffff80d406d8 b sctp_rand.33270
ffffffff80d406dc b sctp_memory_pressure
ffffffff80d406e0 b sctp_sockets_allocated
ffffffff80d406e4 b sctp_memory_allocated
ffffffff80d406e8 b sctp_sysctl_header
ffffffff80d406f0 b zero
ffffffff80d406f4 A __bss_stop
ffffffff80d406f4 A _end

and setup_node_bootmem() will use that page 0xd40000 for bootmap
Bootmem setup node 0 0000000000000000-0000000828000000
  NODE_DATA [000000000008a485 - 0000000000091484]
  bootmap [0000000000d406f4 -  0000000000e456f3] pages 105
Bootmem setup node 1 0000000828000000-0000001028000000
  NODE_DATA [0000000828000000 - 0000000828006fff]
  bootmap [0000000828007000 -  0000000828106fff] pages 100
Bootmem setup node 2 0000001028000000-0000001828000000
  NODE_DATA [0000001028000000 - 0000001028006fff]
  bootmap [0000001028007000 -  0000001028106fff] pages 100
Bootmem setup node 3 0000001828000000-0000002028000000
  NODE_DATA [0000001828000000 - 0000001828006fff]
  bootmap [0000001828007000 -  0000001828106fff] pages 100

This patch updates early_node_mem to round the address it returns up to a
page boundary, so that NODE_DATA and bootmap do not overlap the bss section.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>

Index: linux-2.6/arch/x86/mm/numa_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/numa_64.c
+++ linux-2.6/arch/x86/mm/numa_64.c
@@ -169,8 +169,14 @@ static void * __init early_node_mem(int 
 	unsigned long mem = find_e820_area(start, end, size);
 	void *ptr;
 
-	if (mem != -1L)
+	if (mem != -1L) {
+		/*
+		 * make sure we NODE_DATA, or bootmap is not overlapped
+		 * with bss section
+		 */
+		mem = round_up(mem, PAGE_SIZE);
 		return __va(mem);
+	}
 	ptr = __alloc_bootmem_nopanic(size,
 				SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS));
 	if (ptr == NULL) {


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 3/4] x86_64: Use early reservation for early node data
       [not found] <200801291113.35974.yinghai.lu@sun.com>
  2008-01-29 19:14 ` [PATCH 2/4] x86_64: make early_node_mem return align address v2 Yinghai Lu
@ 2008-01-29 19:15 ` Yinghai Lu
  2008-01-29 19:16 ` [PATCH 4/4] x86_64: increse MAX_EARLY_RES for NODE_DATA and bootmap Yinghai Lu
  2 siblings, 0 replies; 23+ messages in thread
From: Yinghai Lu @ 2008-01-29 19:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andi Kleen, Christoph Lameter, Andrew Morton, linux-kernel

From: Andi Kleen <ak@suse.de>

[PATCH 3/4] x86_64: Use early reservation for early node data

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>

===================================================================
--- linux.orig/arch/x86/mm/numa_64.c
+++ linux/arch/x86/mm/numa_64.c
@@ -175,6 +175,7 @@
 		 * with bss section
 		 */
 		mem = round_up(mem, PAGE_SIZE);
+		reserve_early(mem, mem + size);
 		return __va(mem);
 	}
 	ptr = __alloc_bootmem_nopanic(size,

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 4/4] x86_64: increse MAX_EARLY_RES for NODE_DATA and bootmap
       [not found] <200801291113.35974.yinghai.lu@sun.com>
  2008-01-29 19:14 ` [PATCH 2/4] x86_64: make early_node_mem return align address v2 Yinghai Lu
  2008-01-29 19:15 ` [PATCH 3/4] x86_64: Use early reservation for early node data Yinghai Lu
@ 2008-01-29 19:16 ` Yinghai Lu
  2008-01-30  2:57   ` Andi Kleen
  2 siblings, 1 reply; 23+ messages in thread
From: Yinghai Lu @ 2008-01-29 19:16 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andi Kleen, Christoph Lameter, Andrew Morton, linux-kernel

[PATCH 4/4] x86_64: increse MAX_EARLY_RES for NODE_DATA and bootmap

Otherwise early_node_mem will use up all the early reservation entries on an 8-node system.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>

diff --git a/arch/x86/kernel/e820_64.c b/arch/x86/kernel/e820_64.c
index f8b7beb..e3d3815 100644
--- a/arch/x86/kernel/e820_64.c
+++ b/arch/x86/kernel/e820_64.c
@@ -50,7 +50,7 @@ static unsigned long __initdata end_user_pfn = MAXMEM>>PAGE_SHIFT;
 /*
  * Early reserved memory areas.
  */
-#define MAX_EARLY_RES 20
+#define MAX_EARLY_RES 30
 
 struct early_res {
 	unsigned long start, end;



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/4] x86_64: make early_node_mem return align address v2
  2008-01-29 19:14 ` [PATCH 2/4] x86_64: make early_node_mem return align address v2 Yinghai Lu
@ 2008-01-30  2:39   ` Yinghai Lu
  2008-01-30  3:28   ` [PATCH 2/2] x86_64: make bootmap_start page align v3 Yinghai Lu
  1 sibling, 0 replies; 23+ messages in thread
From: Yinghai Lu @ 2008-01-30  2:39 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andi Kleen, Christoph Lameter, Andrew Morton, linux-kernel

On Tuesday 29 January 2008 11:14:48 am Yinghai Lu wrote:
> [PATCH 2/4] x86_64: make early_node_mem return align address v2
> 
> boot oops when system get 64g or 128 installed
> 

Can you apply this updated version along with the others?

early_node_mem should return a page-aligned address.

setup_node_bootmem needs bootmap_start to be page aligned; without this patch the bootmap will overlap with bss.

YH

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/4] x86_64: increse MAX_EARLY_RES for NODE_DATA and bootmap
  2008-01-29 19:16 ` [PATCH 4/4] x86_64: increse MAX_EARLY_RES for NODE_DATA and bootmap Yinghai Lu
@ 2008-01-30  2:57   ` Andi Kleen
  2008-01-30  3:25     ` Yinghai Lu
  0 siblings, 1 reply; 23+ messages in thread
From: Andi Kleen @ 2008-01-30  2:57 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Ingo Molnar, Christoph Lameter, Andrew Morton, linux-kernel

On Tuesday 29 January 2008 20:16, Yinghai Lu wrote:
> [PATCH 4/4] x86_64: increse MAX_EARLY_RES for NODE_DATA and bootmap
>
> otherise early_node_mem will use up these for 8 nodes system

Yes this was the problem with my early_reserve node bootmem patch.
It adds a node limit.

But even with the increase the limit is far too small. Probably best to not
use the patch. In theory it should not have been needed anyways, since
there are no interfering users and so nothing needs to be reserved here.

Whatever your problem is it needs to be solved differently.

-Andi



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/4] x86_64: increse MAX_EARLY_RES for NODE_DATA and bootmap
  2008-01-30  2:57   ` Andi Kleen
@ 2008-01-30  3:25     ` Yinghai Lu
  2008-01-31 13:24       ` Ingo Molnar
  0 siblings, 1 reply; 23+ messages in thread
From: Yinghai Lu @ 2008-01-30  3:25 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Ingo Molnar, Christoph Lameter, Andrew Morton, linux-kernel

On Tuesday 29 January 2008 06:57:54 pm Andi Kleen wrote:
> On Tuesday 29 January 2008 20:16, Yinghai Lu wrote:
> > [PATCH 4/4] x86_64: increse MAX_EARLY_RES for NODE_DATA and bootmap
> >
> > otherise early_node_mem will use up these for 8 nodes system
> 
> Yes this was the problem with my early_reserve node bootmem patch.
> It adds a node limit.
> 
> But even with increasing the limit is far too small. Probably best to not 
> use the patch. In theory it should not have been needed anyways because
> there is no need to reserve here because there are no interfering users.
> 
> Whatever your problem is it needs to be solved differently.

ok, discard 3, and 4.

how about 2 v2?

YH

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 2/2] x86_64: make bootmap_start page align v3
  2008-01-29 19:14 ` [PATCH 2/4] x86_64: make early_node_mem return align address v2 Yinghai Lu
  2008-01-30  2:39   ` Yinghai Lu
@ 2008-01-30  3:28   ` Yinghai Lu
  2008-01-30 18:51     ` PATCH] x86_64: make bootmap_start page align v4 Yinghai Lu
  1 sibling, 1 reply; 23+ messages in thread
From: Yinghai Lu @ 2008-01-30  3:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andi Kleen, Christoph Lameter, Andrew Morton, linux-kernel

[PATCH 2/2] x86_64: make bootmap_start page align v3

Boot oops when the system has 64G or 128G of RAM installed.

Calling initcall 0xffffffff80bc33b6: sctp_init+0x0/0x711()
BUG: unable to handle kernel NULL pointer dereference at 000000000000005f
IP: [<ffffffff802bfe55>] proc_register+0xe7/0x10f
PGD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #6
RIP: 0010:[<ffffffff802bfe55>]  [<ffffffff802bfe55>] proc_register+0xe7/0x10f
RSP: 0000:ffff810824c57e60  EFLAGS: 00010246
RAX: 000000000000d7d7 RBX: ffff811024c5fa80 RCX: ffff810824c57e08
RDX: 0000000000000000 RSI: 0000000000000195 RDI: ffffffff80cc2460
RBP: ffffffffffffffff R08: 0000000000000000 R09: ffff811024c5fa80
R10: 0000000000000000 R11: 0000000000000002 R12: ffff810824c57e6c
R13: 0000000000000000 R14: ffff810824c57ee0 R15: 00000006abd25bee
FS:  0000000000000000(0000) GS:ffffffff80b4d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000000005f CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff810824c56000, task ffff812024c52000)
Stack:  ffffffff80a57348 0000019500000000 ffff811024c5fa80 0000000000000000
 00000000ffffff97 ffffffff802bfef0 0000000000000000 ffffffffffffffff
 0000000000000000 ffffffff80bc3b4b ffff810824c57ee0 ffffffff80bc34a5
Call Trace:
 [<ffffffff802bfef0>] ? create_proc_entry+0x73/0x8a
 [<ffffffff80bc3b4b>] ? sctp_snmp_proc_init+0x1c/0x34
 [<ffffffff80bc34a5>] ? sctp_init+0xef/0x711
 [<ffffffff80b976e3>] ? kernel_init+0x175/0x2e1
 [<ffffffff8020ccf8>] ? child_rip+0xa/0x12
 [<ffffffff80b9756e>] ? kernel_init+0x0/0x2e1
 [<ffffffff8020ccee>] ? child_rip+0x0/0x12


Code: 1e 48 83 7b 38 00 75 08 48 c7 43 38 f0 e8 82 80 48 83 7b 30 00 75 08 48 c7 43 30 d0 e9 82 80 48 c7 c7 60 24 cc 80 e8 bd 5a 54 00 <48> 8b 45 60 48 89 6b 58 48 89 5d 60 48 89 43 50 fe 05 f5 25 a0
RIP  [<ffffffff802bfe55>] proc_register+0xe7/0x10f
 RSP <ffff810824c57e60>
CR2: 000000000000005f
---[ end trace 02c2d78def82877a ]---
Kernel panic - not syncing: Attempted to kill init!

It turns out that some variables near the end of bss are already corrupted.

in System.map we have
ffffffff80d40420 b rsi_table
ffffffff80d40620 B krb5_seq_lock
ffffffff80d40628 b i.20437
ffffffff80d40630 b xprt_rdma_inline_write_padding
ffffffff80d40638 b sunrpc_table_header
ffffffff80d40640 b zero
ffffffff80d40644 b min_memreg
ffffffff80d40648 b rpcrdma_tk_lock_g
ffffffff80d40650 B sctp_assocs_id_lock
ffffffff80d40658 B proc_net_sctp
ffffffff80d40660 B sctp_assocs_id
ffffffff80d40680 B sysctl_sctp_mem
ffffffff80d40690 B sysctl_sctp_rmem
ffffffff80d406a0 B sysctl_sctp_wmem
ffffffff80d406b0 b sctp_ctl_socket
ffffffff80d406b8 b sctp_pf_inet6_specific
ffffffff80d406c0 b sctp_pf_inet_specific
ffffffff80d406c8 b sctp_af_v4_specific
ffffffff80d406d0 b sctp_af_v6_specific
ffffffff80d406d8 b sctp_rand.33270
ffffffff80d406dc b sctp_memory_pressure
ffffffff80d406e0 b sctp_sockets_allocated
ffffffff80d406e4 b sctp_memory_allocated
ffffffff80d406e8 b sctp_sysctl_header
ffffffff80d406f0 b zero
ffffffff80d406f4 A __bss_stop
ffffffff80d406f4 A _end

and setup_node_bootmem() will use that page 0xd40000 for bootmap
Bootmem setup node 0 0000000000000000-0000000828000000
  NODE_DATA [000000000008a485 - 0000000000091484]
  bootmap [0000000000d406f4 -  0000000000e456f3] pages 105
Bootmem setup node 1 0000000828000000-0000001028000000
  NODE_DATA [0000000828000000 - 0000000828006fff]
  bootmap [0000000828007000 -  0000000828106fff] pages 100
Bootmem setup node 2 0000001028000000-0000001828000000
  NODE_DATA [0000001028000000 - 0000001028006fff]
  bootmap [0000001028007000 -  0000001028106fff] pages 100
Bootmem setup node 3 0000001828000000-0000002028000000
  NODE_DATA [0000001828000000 - 0000001828006fff]
  bootmap [0000001828007000 -  0000001828106fff] pages 100

This patch rounds bootmap_start up to a page boundary so that the bootmap
does not overlap the bss section.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>

Index: linux-2.6/arch/x86/mm/numa_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/numa_64.c
+++ linux-2.6/arch/x86/mm/numa_64.c
@@ -224,6 +224,9 @@ void __init setup_node_bootmem(int nodei
 	}
 	bootmap_start = __pa(bootmap);
 
+	/* make sure bootmap is not overlapped with bss section */
+	bootmap_start = round_up(bootmap_start, PAGE_SIZE);
+
 	bootmap_size = init_bootmem_node(NODE_DATA(nodeid),
 					 bootmap_start >> PAGE_SHIFT,
 					 start_pfn, end_pfn);

^ permalink raw reply	[flat|nested] 23+ messages in thread

* PATCH] x86_64: make bootmap_start page align v4
  2008-01-30  3:28   ` [PATCH 2/2] x86_64: make bootmap_start page align v3 Yinghai Lu
@ 2008-01-30 18:51     ` Yinghai Lu
  2008-01-30 19:08       ` Andi Kleen
  2008-01-31 13:07       ` Ingo Molnar
  0 siblings, 2 replies; 23+ messages in thread
From: Yinghai Lu @ 2008-01-30 18:51 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andi Kleen, Christoph Lameter, Andrew Morton, linux-kernel

[PATCH] x86_64: make bootmap_start page align v4

Boot oops when the system has 64G or 128G of RAM installed.

Calling initcall 0xffffffff80bc33b6: sctp_init+0x0/0x711()
BUG: unable to handle kernel NULL pointer dereference at 000000000000005f
IP: [<ffffffff802bfe55>] proc_register+0xe7/0x10f
PGD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #6
RIP: 0010:[<ffffffff802bfe55>]  [<ffffffff802bfe55>] proc_register+0xe7/0x10f
RSP: 0000:ffff810824c57e60  EFLAGS: 00010246
RAX: 000000000000d7d7 RBX: ffff811024c5fa80 RCX: ffff810824c57e08
RDX: 0000000000000000 RSI: 0000000000000195 RDI: ffffffff80cc2460
RBP: ffffffffffffffff R08: 0000000000000000 R09: ffff811024c5fa80
R10: 0000000000000000 R11: 0000000000000002 R12: ffff810824c57e6c
R13: 0000000000000000 R14: ffff810824c57ee0 R15: 00000006abd25bee
FS:  0000000000000000(0000) GS:ffffffff80b4d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000000005f CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff810824c56000, task ffff812024c52000)
Stack:  ffffffff80a57348 0000019500000000 ffff811024c5fa80 0000000000000000
 00000000ffffff97 ffffffff802bfef0 0000000000000000 ffffffffffffffff
 0000000000000000 ffffffff80bc3b4b ffff810824c57ee0 ffffffff80bc34a5
Call Trace:
 [<ffffffff802bfef0>] ? create_proc_entry+0x73/0x8a
 [<ffffffff80bc3b4b>] ? sctp_snmp_proc_init+0x1c/0x34
 [<ffffffff80bc34a5>] ? sctp_init+0xef/0x711
 [<ffffffff80b976e3>] ? kernel_init+0x175/0x2e1
 [<ffffffff8020ccf8>] ? child_rip+0xa/0x12
 [<ffffffff80b9756e>] ? kernel_init+0x0/0x2e1
 [<ffffffff8020ccee>] ? child_rip+0x0/0x12


Code: 1e 48 83 7b 38 00 75 08 48 c7 43 38 f0 e8 82 80 48 83 7b 30 00 75 08 48 c7 43 30 d0 e9 82 80 48 c7 c7 60 24 cc 80 e8 bd 5a 54 00 <48> 8b 45 60 48 89 6b 58 48 89 5d 60 48 89 43 50 fe 05 f5 25 a0
RIP  [<ffffffff802bfe55>] proc_register+0xe7/0x10f
 RSP <ffff810824c57e60>
CR2: 000000000000005f
---[ end trace 02c2d78def82877a ]---
Kernel panic - not syncing: Attempted to kill init!

It turns out that some variables near the end of bss are already corrupted.

in System.map we have
ffffffff80d40420 b rsi_table
ffffffff80d40620 B krb5_seq_lock
ffffffff80d40628 b i.20437
ffffffff80d40630 b xprt_rdma_inline_write_padding
ffffffff80d40638 b sunrpc_table_header
ffffffff80d40640 b zero
ffffffff80d40644 b min_memreg
ffffffff80d40648 b rpcrdma_tk_lock_g
ffffffff80d40650 B sctp_assocs_id_lock
ffffffff80d40658 B proc_net_sctp
ffffffff80d40660 B sctp_assocs_id
ffffffff80d40680 B sysctl_sctp_mem
ffffffff80d40690 B sysctl_sctp_rmem
ffffffff80d406a0 B sysctl_sctp_wmem
ffffffff80d406b0 b sctp_ctl_socket
ffffffff80d406b8 b sctp_pf_inet6_specific
ffffffff80d406c0 b sctp_pf_inet_specific
ffffffff80d406c8 b sctp_af_v4_specific
ffffffff80d406d0 b sctp_af_v6_specific
ffffffff80d406d8 b sctp_rand.33270
ffffffff80d406dc b sctp_memory_pressure
ffffffff80d406e0 b sctp_sockets_allocated
ffffffff80d406e4 b sctp_memory_allocated
ffffffff80d406e8 b sctp_sysctl_header
ffffffff80d406f0 b zero
ffffffff80d406f4 A __bss_stop
ffffffff80d406f4 A _end

and setup_node_bootmem() will use that page 0xd40000 for bootmap
Bootmem setup node 0 0000000000000000-0000000828000000
  NODE_DATA [000000000008a485 - 0000000000091484]
  bootmap [0000000000d406f4 -  0000000000e456f3] pages 105
Bootmem setup node 1 0000000828000000-0000001028000000
  NODE_DATA [0000000828000000 - 0000000828006fff]
  bootmap [0000000828007000 -  0000000828106fff] pages 100
Bootmem setup node 2 0000001028000000-0000001828000000
  NODE_DATA [0000001028000000 - 0000001028006fff]
  bootmap [0000001028007000 -  0000001028106fff] pages 100
Bootmem setup node 3 0000001828000000-0000002028000000
  NODE_DATA [0000001828000000 - 0000001828006fff]
  bootmap [0000001828007000 -  0000001828106fff] pages 100

This patch rounds bootmap_start up to a page boundary so that the bootmap
does not overlap the bss section.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>

Index: linux-2.6/arch/x86/mm/numa_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/numa_64.c
+++ linux-2.6/arch/x86/mm/numa_64.c
@@ -224,6 +224,14 @@ void __init setup_node_bootmem(int nodei
 	}
 	bootmap_start = __pa(bootmap);
 
+	/*
+	 * when you have 64g or 128g ram, bootmap will be pushed after bss
+	 * section, the bootmap we get from early_node_mem via find_e820_area
+	 * is not page aligned, we need to round it up to  make sure bootmap
+	 * is not overlapped with bss section
+	 */
+	bootmap_start = round_up(bootmap_start, PAGE_SIZE);
+
 	bootmap_size = init_bootmem_node(NODE_DATA(nodeid),
 					 bootmap_start >> PAGE_SHIFT,
 					 start_pfn, end_pfn);

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PATCH] x86_64: make bootmap_start page align v4
  2008-01-30 18:51     ` PATCH] x86_64: make bootmap_start page align v4 Yinghai Lu
@ 2008-01-30 19:08       ` Andi Kleen
  2008-01-30 20:15         ` Yinghai Lu
  2008-01-31 13:07       ` Ingo Molnar
  1 sibling, 1 reply; 23+ messages in thread
From: Andi Kleen @ 2008-01-30 19:08 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Ingo Molnar, Christoph Lameter, Andrew Morton, linux-kernel


> +	/*
> +	 * when you have 64g or 128g ram, bootmap will be pushed after bss
> +	 * section, the bootmap we get from early_node_mem via find_e820_area
> +	 * is not page aligned, we need to round it up to  make sure bootmap
> +	 * is not overlapped with bss section
> +	 */
> +	bootmap_start = round_up(bootmap_start, PAGE_SIZE);

The better solution would be to PAGE_ALIGN() the addresses
in bad_addr(). Or, better still, fix it so that no such alignment is needed
to avoid conflicts.

-Andi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: PATCH] x86_64: make bootmap_start page align v4
  2008-01-30 19:08       ` Andi Kleen
@ 2008-01-30 20:15         ` Yinghai Lu
  2008-01-30 23:23           ` [PATCH] x86_64: make bootmap_start page align v5 Yinghai Lu
  2008-01-31  3:02           ` PATCH] x86_64: make bootmap_start page align v4 Andi Kleen
  0 siblings, 2 replies; 23+ messages in thread
From: Yinghai Lu @ 2008-01-30 20:15 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Ingo Molnar, Christoph Lameter, Andrew Morton, linux-kernel

On Wednesday 30 January 2008 11:08:32 am Andi Kleen wrote:
> 
> > +	/*
> > +	 * when you have 64g or 128g ram, bootmap will be pushed after bss
> > +	 * section, the bootmap we get from early_node_mem via find_e820_area
> > +	 * is not page aligned, we need to round it up to  make sure bootmap
> > +	 * is not overlapped with bss section
> > +	 */
> > +	bootmap_start = round_up(bootmap_start, PAGE_SIZE);
> 
> The better solution would be to PAGE_ALIGN() the addresses
> in bad_addr(). Or better fix it that no such alignment is needed to not 
> get conflicts. 

I don't think so, that is setup_node_bootmem's problem.

It is supposed to round up instead of just shifting by PAGE_SHIFT...

Other callers just use the address as is instead of converting it to a page.

YH

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] x86_64: make bootmap_start page align v5
  2008-01-30 20:15         ` Yinghai Lu
@ 2008-01-30 23:23           ` Yinghai Lu
  2008-01-30 23:50             ` Yinghai Lu
  2008-01-31  3:02           ` PATCH] x86_64: make bootmap_start page align v4 Andi Kleen
  1 sibling, 1 reply; 23+ messages in thread
From: Yinghai Lu @ 2008-01-30 23:23 UTC (permalink / raw)
  To: Andi Kleen, Ingo Molnar; +Cc: Christoph Lameter, Andrew Morton, linux-kernel

[PATCH] x86_64: make bootmap_start page align v5

Boot oops when the system has 64G or 128G of RAM installed.

Calling initcall 0xffffffff80bc33b6: sctp_init+0x0/0x711()
BUG: unable to handle kernel NULL pointer dereference at 000000000000005f
IP: [<ffffffff802bfe55>] proc_register+0xe7/0x10f
PGD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #6
RIP: 0010:[<ffffffff802bfe55>]  [<ffffffff802bfe55>] proc_register+0xe7/0x10f
RSP: 0000:ffff810824c57e60  EFLAGS: 00010246
RAX: 000000000000d7d7 RBX: ffff811024c5fa80 RCX: ffff810824c57e08
RDX: 0000000000000000 RSI: 0000000000000195 RDI: ffffffff80cc2460
RBP: ffffffffffffffff R08: 0000000000000000 R09: ffff811024c5fa80
R10: 0000000000000000 R11: 0000000000000002 R12: ffff810824c57e6c
R13: 0000000000000000 R14: ffff810824c57ee0 R15: 00000006abd25bee
FS:  0000000000000000(0000) GS:ffffffff80b4d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000000005f CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff810824c56000, task ffff812024c52000)
Stack:  ffffffff80a57348 0000019500000000 ffff811024c5fa80 0000000000000000
 00000000ffffff97 ffffffff802bfef0 0000000000000000 ffffffffffffffff
 0000000000000000 ffffffff80bc3b4b ffff810824c57ee0 ffffffff80bc34a5
Call Trace:
 [<ffffffff802bfef0>] ? create_proc_entry+0x73/0x8a
 [<ffffffff80bc3b4b>] ? sctp_snmp_proc_init+0x1c/0x34
 [<ffffffff80bc34a5>] ? sctp_init+0xef/0x711
 [<ffffffff80b976e3>] ? kernel_init+0x175/0x2e1
 [<ffffffff8020ccf8>] ? child_rip+0xa/0x12
 [<ffffffff80b9756e>] ? kernel_init+0x0/0x2e1
 [<ffffffff8020ccee>] ? child_rip+0x0/0x12


Code: 1e 48 83 7b 38 00 75 08 48 c7 43 38 f0 e8 82 80 48 83 7b 30 00 75 08 48 c7 43 30 d0 e9 82 80 48 c7 c7 60 24 cc 80 e8 bd 5a 54 00 <48> 8b 45 60 48 89 6b 58 48 89 5d 60 48 89 43 50 fe 05 f5 25 a0
RIP  [<ffffffff802bfe55>] proc_register+0xe7/0x10f
 RSP <ffff810824c57e60>
CR2: 000000000000005f
---[ end trace 02c2d78def82877a ]---
Kernel panic - not syncing: Attempted to kill init!

It turns out that some variables near the end of bss are already corrupted.

in System.map we have
ffffffff80d40420 b rsi_table
ffffffff80d40620 B krb5_seq_lock
ffffffff80d40628 b i.20437
ffffffff80d40630 b xprt_rdma_inline_write_padding
ffffffff80d40638 b sunrpc_table_header
ffffffff80d40640 b zero
ffffffff80d40644 b min_memreg
ffffffff80d40648 b rpcrdma_tk_lock_g
ffffffff80d40650 B sctp_assocs_id_lock
ffffffff80d40658 B proc_net_sctp
ffffffff80d40660 B sctp_assocs_id
ffffffff80d40680 B sysctl_sctp_mem
ffffffff80d40690 B sysctl_sctp_rmem
ffffffff80d406a0 B sysctl_sctp_wmem
ffffffff80d406b0 b sctp_ctl_socket
ffffffff80d406b8 b sctp_pf_inet6_specific
ffffffff80d406c0 b sctp_pf_inet_specific
ffffffff80d406c8 b sctp_af_v4_specific
ffffffff80d406d0 b sctp_af_v6_specific
ffffffff80d406d8 b sctp_rand.33270
ffffffff80d406dc b sctp_memory_pressure
ffffffff80d406e0 b sctp_sockets_allocated
ffffffff80d406e4 b sctp_memory_allocated
ffffffff80d406e8 b sctp_sysctl_header
ffffffff80d406f0 b zero
ffffffff80d406f4 A __bss_stop
ffffffff80d406f4 A _end

and setup_node_bootmem() will use that page 0xd40000 for bootmap
Bootmem setup node 0 0000000000000000-0000000828000000
  NODE_DATA [000000000008a485 - 0000000000091484]
  bootmap [0000000000d406f4 -  0000000000e456f3] pages 105
Bootmem setup node 1 0000000828000000-0000001028000000
  NODE_DATA [0000000828000000 - 0000000828006fff]
  bootmap [0000000828007000 -  0000000828106fff] pages 100
Bootmem setup node 2 0000001028000000-0000001828000000
  NODE_DATA [0000001028000000 - 0000001028006fff]
  bootmap [0000001028007000 -  0000001028106fff] pages 100
Bootmem setup node 3 0000001828000000-0000002028000000
  NODE_DATA [0000001828000000 - 0000001828006fff]
  bootmap [0000001828007000 -  0000001828106fff] pages 100

The start of the bootmap needs to be page aligned so that it does not
overlap the bss section.

Andi said it is better to fix this in bad_addr, instead of in early_node_mem or
setup_node_bootmem directly.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>

Index: linux-2.6/arch/x86/kernel/e820_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820_64.c
+++ linux-2.6/arch/x86/kernel/e820_64.c
@@ -100,7 +100,7 @@ again:
 	for (i = 0; i < MAX_EARLY_RES && early_res[i].end; i++) {
 		struct early_res *r = &early_res[i];
 		if (last >= r->start && addr < r->end) {
-			*addrp = addr = r->end;
+			*addrp = addr = PAGE_ALIGN(r->end);
 			changed = 1;
 			goto again;
 		}

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] x86_64: make bootmap_start page align v5
  2008-01-30 23:23           ` [PATCH] x86_64: make bootmap_start page align v5 Yinghai Lu
@ 2008-01-30 23:50             ` Yinghai Lu
  0 siblings, 0 replies; 23+ messages in thread
From: Yinghai Lu @ 2008-01-30 23:50 UTC (permalink / raw)
  To: Andi Kleen, Ingo Molnar; +Cc: Christoph Lameter, Andrew Morton, linux-kernel

On Wednesday 30 January 2008 03:23:35 pm Yinghai Lu wrote:
> [PATCH] x86_64: make bootmap_start page align v5

Ingo,

Please consider applying one of:
v5
or
v4:  http://lkml.org/lkml/2008/1/30/377
which page-aligns the bootmap
or
v2: http://lkml.org/lkml/2008/1/29/349
which page-aligns both NODE_DATA and the bootmap.

Just pick one.

YH

v5:
Bootmem setup node 0 0000000000000000-0000000828000000
  NODE_DATA [000000000008b000 - 0000000000091fff]
  bootmap [0000000000d9d000 -  0000000000ea1fff] pages 105
Bootmem setup node 1 0000000828000000-0000001028000000
  NODE_DATA [0000000828000000 - 0000000828006fff]
  bootmap [0000000828007000 -  0000000828106fff] pages 100
Bootmem setup node 2 0000001028000000-0000001828000000
  NODE_DATA [0000001028000000 - 0000001028006fff]
  bootmap [0000001028007000 -  0000001028106fff] pages 100
Bootmem setup node 3 0000001828000000-0000002028000000
  NODE_DATA [0000001828000000 - 0000001828006fff]
  bootmap [0000001828007000 -  0000001828106fff] pages 100
early res: 0 [0-1000) BIOS data page
early res: 1 [6000-8000) SMP_TRAMPOLINE
early res: 2 [200000-d9c274) TEXT DATA BSS
early res: 3 [7e6f4000-7fff3a26) RAMDISK
early res: 4 [99800-9d800) EBDA
early res: 5 [8000-8a000) PGTABLE
early res: 6 [8a000-8a88b) MEMNODEMAP

v4:
Bootmem setup node 0 0000000000000000-0000000828000000
  NODE_DATA [000000000008a88b - 000000000009188a]
  bootmap [0000000000d9d000 -  0000000000ea1fff] pages 105
Bootmem setup node 1 0000000828000000-0000001028000000
  NODE_DATA [0000000828000000 - 0000000828006fff]
  bootmap [0000000828007000 -  0000000828106fff] pages 100
Bootmem setup node 2 0000001028000000-0000001828000000
  NODE_DATA [0000001028000000 - 0000001028006fff]
  bootmap [0000001028007000 -  0000001028106fff] pages 100
Bootmem setup node 3 0000001828000000-0000002028000000
  NODE_DATA [0000001828000000 - 0000001828006fff]
  bootmap [0000001828007000 -  0000001828106fff] pages 100
early res: 0 [0-1000) BIOS data page
early res: 1 [6000-8000) SMP_TRAMPOLINE
early res: 2 [200000-d9c274) TEXT DATA BSS
early res: 3 [7e6f4000-7fff3a26) RAMDISK
early res: 4 [99800-9d800) EBDA
early res: 5 [8000-8a000) PGTABLE
early res: 6 [8a000-8a88b) MEMNODEMAP

v2:
Bootmem setup node 0 0000000000000000-0000000828000000
  NODE_DATA [000000000008b000 - 0000000000091fff]
  bootmap [0000000000d9d000 -  0000000000ea1fff] pages 105
Bootmem setup node 1 0000000828000000-0000001028000000
  NODE_DATA [0000000828000000 - 0000000828006fff]
  bootmap [0000000828007000 -  0000000828106fff] pages 100
Bootmem setup node 2 0000001028000000-0000001828000000
  NODE_DATA [0000001028000000 - 0000001028006fff]
  bootmap [0000001028007000 -  0000001028106fff] pages 100
Bootmem setup node 3 0000001828000000-0000002028000000
  NODE_DATA [0000001828000000 - 0000001828006fff]
  bootmap [0000001828007000 -  0000001828106fff] pages 100
early res: 0 [0-1000) BIOS data page
early res: 1 [6000-8000) SMP_TRAMPOLINE
early res: 2 [200000-d9c274) TEXT DATA BSS
early res: 3 [7e6f4000-7fff3a26) RAMDISK
early res: 4 [99800-9d800) EBDA
early res: 5 [8000-8a000) PGTABLE
early res: 6 [8a000-8a88b) MEMNODEMAP
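
The "pages" figures in these logs are hexadecimal, which can be checked against the bootmap ranges. A minimal sketch in user-space C, assuming 4 KiB pages; pages_in_range() and is_page_aligned() are names made up for this illustration and are not kernel functions:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Number of pages covered by the inclusive byte range [first, last]. */
uint64_t pages_in_range(uint64_t first, uint64_t last)
{
	assert(last >= first);
	return (last + 1 - first) >> PAGE_SHIFT;
}

/* Nonzero when addr lies on a page boundary. */
int is_page_aligned(uint64_t addr)
{
	return (addr & ((UINT64_C(1) << PAGE_SHIFT) - 1)) == 0;
}
```

With the node 0 range above, pages_in_range(0xd9d000, 0xea1fff) gives 0x105, matching "pages 105". NODE_DATA at 0x8b000 (v2) is page aligned, while 0x8a88b (v4) is not.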



* Re: PATCH] x86_64: make bootmap_start page align v4
  2008-01-30 20:15         ` Yinghai Lu
  2008-01-30 23:23           ` [PATCH] x86_64: make bootmap_start page align v5 Yinghai Lu
@ 2008-01-31  3:02           ` Andi Kleen
  2008-01-31  3:29             ` Yinghai Lu
  1 sibling, 1 reply; 23+ messages in thread
From: Andi Kleen @ 2008-01-31  3:02 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Ingo Molnar, Christoph Lameter, Andrew Morton, linux-kernel


> I don't think so, that is setup_node_bootmem's problem.
>
> it is supposed to round_up instead of just doing a PAGE_SHIFT...
>
> other callers will just use the address instead of changing it to a page.

You're right. In this case it really is the best way to round up in the 
caller. I retract my earlier objection.

That said there is not really any true requirement for the bootmem
map to be page aligned (AFAIK) so an alternative might be to 
fix the bootmem interface to not take a PFN. But that would be a 
much more intrusive patch because you would need to fix 
other architectures too or add compat wrappers.

-Andi
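
Why a PFN-based interface forces page alignment can be seen with a little arithmetic. The sketch below is user-space C assuming 4 KiB pages; phys_to_pfn(), pfn_to_phys() and pfn_start_precedes() are hypothetical helpers written for this example, not the bootmem API:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Page frame number that contains a physical address (rounds down). */
uint64_t phys_to_pfn(uint64_t phys)
{
	return phys >> PAGE_SHIFT;
}

/* Physical address of the first byte of a page frame. */
uint64_t pfn_to_phys(uint64_t pfn)
{
	return pfn << PAGE_SHIFT;
}

/*
 * Nonzero when passing phys through a PFN would make the area start
 * below phys, i.e. inside whatever occupies the start of that page.
 */
int pfn_start_precedes(uint64_t phys)
{
	uint64_t page_start = pfn_to_phys(phys_to_pfn(phys));

	assert(page_start <= phys);	/* a right shift can only round down */
	return page_start < phys;
}
```

For an unaligned start such as 0xd406f4 (the bootmap start reported in the v6 changelog), the PFN is 0xd40 and the page begins at 0xd40000, below the address that was actually found free.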

* Re: PATCH] x86_64: make bootmap_start page align v4
  2008-01-31  3:02           ` PATCH] x86_64: make bootmap_start page align v4 Andi Kleen
@ 2008-01-31  3:29             ` Yinghai Lu
  0 siblings, 0 replies; 23+ messages in thread
From: Yinghai Lu @ 2008-01-31  3:29 UTC (permalink / raw)
  To: Andi Kleen, Ingo Molnar; +Cc: Christoph Lameter, Andrew Morton, linux-kernel

On Wednesday 30 January 2008 07:02:14 pm Andi Kleen wrote:
> 
> > I don't think so, that is setup_node_bootmem's problem.
> >
> > it is supposed to round_up instead of just doing a PAGE_SHIFT...
> >
> > other callers will just use the address instead of changing it to a page.
> 
> You're right. In this case it really is the best way to round up in the 
> caller. I retract my earlier objection.
> 
> That said there is not really any true requirement for the bootmem
> map to be page aligned (AFAIK) so an alternative might be to 
> fix the bootmem interface to not take a PFN. But that would be a 
> much more intrusive patch because you would need to fix 
> other architectures too or add compat wrappers.

thanks.

So we need v4.

Ingo,

can you apply the v4?

YH

* Re: PATCH] x86_64: make bootmap_start page align v4
  2008-01-30 18:51     ` PATCH] x86_64: make bootmap_start page align v4 Yinghai Lu
  2008-01-30 19:08       ` Andi Kleen
@ 2008-01-31 13:07       ` Ingo Molnar
  1 sibling, 0 replies; 23+ messages in thread
From: Ingo Molnar @ 2008-01-31 13:07 UTC (permalink / raw)
  To: Yinghai Lu; +Cc: Andi Kleen, Christoph Lameter, Andrew Morton, linux-kernel


* Yinghai Lu <Yinghai.Lu@Sun.COM> wrote:

> [PATCH] x86_64: make bootmap_start page align v3
> 
> boot oops when the system has 64g or 128g installed

> +++ linux-2.6/arch/x86/mm/numa_64.c
> @@ -224,6 +224,14 @@ void __init setup_node_bootmem(int nodei
>  	}
>  	bootmap_start = __pa(bootmap);
>  
> +	/*
> +	 * when you have 64g or 128g ram, bootmap will be pushed after bss
> +	 * section, the bootmap we get from early_node_mem via find_e820_area
> +	 * is not page aligned, we need to round it up to  make sure bootmap
> +	 * is not overlapped with bss section
> +	 */
> +	bootmap_start = round_up(bootmap_start, PAGE_SIZE);
> +

thanks, applied.

	Ingo
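
The effect of the added round_up() can be sketched in user-space C. round_up_pow2() below is a stand-in written for this example (it assumes a power-of-two alignment and 4 KiB pages) and is not copied from the kernel headers; same_page() is likewise a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE UINT64_C(4096)	/* assumption: 4 KiB pages */

/* Round x up to the next multiple of align; align must be a power of two. */
uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Nonzero when two addresses fall inside the same page. */
int same_page(uint64_t a, uint64_t b)
{
	return (a / PAGE_SIZE) == (b / PAGE_SIZE);
}
```

Taking the "TEXT DATA BSS" reservation [200000-d9c274) from the boot log in this thread: the last bss byte is 0xd9c273, and an unaligned bootmap start of 0xd9c274 lies in the same page. Rounded up, the start becomes 0xd9d000, which is the bootmap address printed in the log.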

* Re: [PATCH 4/4] x86_64: increse MAX_EARLY_RES for NODE_DATA and bootmap
  2008-01-30  3:25     ` Yinghai Lu
@ 2008-01-31 13:24       ` Ingo Molnar
  2008-01-31 13:34         ` Andi Kleen
  0 siblings, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2008-01-31 13:24 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Andi Kleen, Christoph Lameter, Andrew Morton, linux-kernel,
	Thomas Gleixner, H. Peter Anvin


* Yinghai Lu <Yinghai.Lu@Sun.COM> wrote:

> ok, discard 3, and 4.
> 
> how about 2 v2?

i'm leaning towards v4, but the more fundamental breakage is in the 
early_node_mem() ad-hoc allocator that got butchered into this code a 
year ago:

  commit a8062231d80239cf3405982858c02aea21a6066a
  Author: Andi Kleen <ak@suse.de>
  Date:   Fri Apr 7 19:49:21 2006 +0200

      [PATCH] x86_64: Handle empty PXMs that only contain hotplug memory

  ...
  +static void * __init
  +early_node_mem(int nodeid, unsigned long start, unsigned long end,
  +             unsigned long size)

and we are now suffering the side-effects of that hack.

what i suspect we need instead is a proper early-allocator that works in 
the e820 space.

	Ingo

* Re: [PATCH 4/4] x86_64: increse MAX_EARLY_RES for NODE_DATA and bootmap
  2008-01-31 13:24       ` Ingo Molnar
@ 2008-01-31 13:34         ` Andi Kleen
  2008-01-31 20:37           ` [PATCH] x86_64: add debug name for early_res Yinghai Lu
  0 siblings, 1 reply; 23+ messages in thread
From: Andi Kleen @ 2008-01-31 13:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Yinghai Lu, Christoph Lameter, Andrew Morton, linux-kernel,
	Thomas Gleixner, H. Peter Anvin

On Thursday 31 January 2008 14:24:38 Ingo Molnar wrote:
> 
> * Yinghai Lu <Yinghai.Lu@Sun.COM> wrote:
> 
> > ok, discard 3, and 4.
> > 
> > how about 2 v2?
> 
> i'm leaning towards v4, but the more fundamental breakage is in the 
> early_node_mem() ad-hoc allocator that got butchered into this code a 
> year ago:

No it has nothing to do with early_node_mem which is just a thin
wrapper around find_e820_area() anyways.

I think the problem is that the page alignment in bad_addr() and friends is not
always correct, e.g. the reserve_early() call for the kernel in head64.c really
needs to round up to pages. I suspect (not 100% sure yet) that this is the core
of the problem.

Note this was broken even before early reservation; the only difference
was that it was all hard coded in bad_addr() then.

There were various hacks around this in the past, but none fixed the problem 
completely.

>   commit a8062231d80239cf3405982858c02aea21a6066a
>   Author: Andi Kleen <ak@suse.de>
>   Date:   Fri Apr 7 19:49:21 2006 +0200
> 
>       [PATCH] x86_64: Handle empty PXMs that only contain hotplug memory
> 
>   ...
>   +static void * __init
>   +early_node_mem(int nodeid, unsigned long start, unsigned long end,
>   +             unsigned long size)
> 
> and we are now suffering the side-effects of that hack.
> 
> what i suspect we need instead is a proper early-allocator that works in 
> the e820 space.

That is find_e820_area(), or rather find_e820_area() + reserve_early(), now.

I had this implemented as a shrink wrapped function earlier for lockdep too,
but dropped the patch because there was a nasty ordering issue with the e820
command line parsing that I could not easily resolve.

-Andi
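
How skipping over reservations can hand back an unaligned address can be shown with a simplified user-space model. This is only a sketch: struct res, skip_reserved() and first_free() are invented for the illustration and are not the kernel's bad_addr() or find_e820_area(); the two table entries are taken from the "early res" lines in the boot log in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct res {
	uint64_t start, end;	/* half-open: [start, end) */
};

/* Two of the reservations printed in the boot log. */
const struct res demo_res[] = {
	{ 0x200000, 0xd9c274 },	/* TEXT DATA BSS */
	{ 0x8000,   0x8a000  },	/* PGTABLE */
};
const size_t demo_nres = sizeof(demo_res) / sizeof(demo_res[0]);

/*
 * If [*addr, *addr + size) overlaps a reservation, move *addr to the
 * end of that reservation. Returns nonzero when *addr was moved.
 */
int skip_reserved(const struct res *r, size_t n, uint64_t *addr,
		  uint64_t size)
{
	size_t i;
	int moved = 0;

	for (i = 0; i < n; i++) {
		if (*addr + size > r[i].start && *addr < r[i].end) {
			*addr = r[i].end;
			moved = 1;
		}
	}
	return moved;
}

/* Lowest address at or above addr where size bytes avoid every reservation. */
uint64_t first_free(const struct res *r, size_t n, uint64_t addr,
		    uint64_t size)
{
	assert(size > 0);
	while (skip_reserved(r, n, &addr, size))
		;
	return addr;
}
```

Because the kernel reservation ends at 0xd9c274, a search that starts inside it comes back with exactly 0xd9c274. Nothing in this model rounds the result, so any caller that needs a page boundary has to align it, or the reservation end itself has to be rounded up.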

* [PATCH] x86_64: add debug name for early_res
  2008-01-31 13:34         ` Andi Kleen
@ 2008-01-31 20:37           ` Yinghai Lu
  2008-01-31 20:44             ` [PATCH] x86_64: make bootmap_start page align v6 Yinghai Lu
  0 siblings, 1 reply; 23+ messages in thread
From: Yinghai Lu @ 2008-01-31 20:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Christoph Lameter, Andrew Morton, linux-kernel,
	Thomas Gleixner, H. Peter Anvin

[PATCH] x86_64: add debug name for early_res

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>

Index: linux-2.6/arch/x86/kernel/e820_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820_64.c
+++ linux-2.6/arch/x86/kernel/e820_64.c
@@ -54,30 +54,33 @@ static unsigned long __initdata end_user
 
 struct early_res {
 	unsigned long start, end;
+	char name[16];
 };
 static struct early_res early_res[MAX_EARLY_RES] __initdata = {
-	{ 0, PAGE_SIZE },			/* BIOS data page */
+	{ 0, PAGE_SIZE, "BIOS data page" },			/* BIOS data page */
 #ifdef CONFIG_SMP
-	{ SMP_TRAMPOLINE_BASE, SMP_TRAMPOLINE_BASE + 2*PAGE_SIZE },
+	{ SMP_TRAMPOLINE_BASE, SMP_TRAMPOLINE_BASE + 2*PAGE_SIZE, "SMP_TRAMPOLINE" },
 #endif
 	{}
 };
 
-void __init reserve_early(unsigned long start, unsigned long end)
+void __init reserve_early(unsigned long start, unsigned long end, char *name)
 {
 	int i;
 	struct early_res *r;
 	for (i = 0; i < MAX_EARLY_RES && early_res[i].end; i++) {
 		r = &early_res[i];
 		if (end > r->start && start < r->end)
-			panic("Overlapping early reservations %lx-%lx to %lx-%lx\n",
-			      start, end, r->start, r->end);
+			panic("Overlapping early reservations %lx-%lx %s to %lx-%lx %s\n",
+			      start, end - 1, name?name:"", r->start, r->end - 1, r->name);
 	}
 	if (i >= MAX_EARLY_RES)
 		panic("Too many early reservations");
 	r = &early_res[i];
 	r->start = start;
 	r->end = end;
+	if (name)
+		strncpy(r->name, name, sizeof(r->name) - 1);
 }
 
 void __init early_res_to_bootmem(void)
@@ -85,6 +88,8 @@ void __init early_res_to_bootmem(void)
 	int i;
 	for (i = 0; i < MAX_EARLY_RES && early_res[i].end; i++) {
 		struct early_res *r = &early_res[i];
+		printk(KERN_INFO "early res: %d [%lx-%lx] %s\n", i,
+			r->start, r->end - 1, r->name);
 		reserve_bootmem_generic(r->start, r->end - r->start);
 	}
 }
Index: linux-2.6/arch/x86/kernel/head64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/head64.c
+++ linux-2.6/arch/x86/kernel/head64.c
@@ -75,7 +75,7 @@ static __init void reserve_ebda(void)
 	if (ebda_size > 64*1024)
 		ebda_size = 64*1024;
 
-	reserve_early(ebda_addr, ebda_addr + ebda_size);
+	reserve_early(ebda_addr, ebda_addr + ebda_size, "EBDA");
 }
 
 void __init x86_64_start_kernel(char * real_mode_data)
@@ -105,14 +105,14 @@ void __init x86_64_start_kernel(char * r
 	pda_init(0);
 	copy_bootdata(__va(real_mode_data));
 
-	reserve_early(__pa_symbol(&_text), __pa_symbol(&_end));
+	reserve_early(__pa_symbol(&_text), __pa_symbol(&_end), "TEXT DATA BSS");
 
 	/* Reserve INITRD */
 	if (boot_params.hdr.type_of_loader && boot_params.hdr.ramdisk_image) {
 		unsigned long ramdisk_image = boot_params.hdr.ramdisk_image;
 		unsigned long ramdisk_size  = boot_params.hdr.ramdisk_size;
 		unsigned long ramdisk_end   = ramdisk_image + ramdisk_size;
-		reserve_early(ramdisk_image, ramdisk_end);
+		reserve_early(ramdisk_image, ramdisk_end, "RAMDISK");
 	}
 
 	reserve_ebda();
Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -420,7 +420,7 @@ void __init_refok init_memory_mapping(un
 		mmu_cr4_features = read_cr4();
 	__flush_tlb_all();
 
-	reserve_early(table_start << PAGE_SHIFT, table_end << PAGE_SHIFT);
+	reserve_early(table_start << PAGE_SHIFT, table_end << PAGE_SHIFT, "PGTABLE");
 }
 
 #ifndef CONFIG_NUMA
Index: linux-2.6/arch/x86/mm/numa_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/numa_64.c
+++ linux-2.6/arch/x86/mm/numa_64.c
@@ -103,7 +103,7 @@ static int __init allocate_cachealigned_
 	}
 	pad_addr = (nodemap_addr + pad) & ~pad;
 	memnodemap = phys_to_virt(pad_addr);
-	reserve_early(nodemap_addr, nodemap_addr + nodemap_size);
+	reserve_early(nodemap_addr, nodemap_addr + nodemap_size, "MEMNODEMAP");
 
 	printk(KERN_DEBUG "NUMA: Allocated memnodemap from %lx - %lx\n",
 	       nodemap_addr, nodemap_addr + nodemap_size);
Index: linux-2.6/include/asm-x86/e820_64.h
===================================================================
--- linux-2.6.orig/include/asm-x86/e820_64.h
+++ linux-2.6/include/asm-x86/e820_64.h
@@ -41,7 +41,7 @@ extern void finish_e820_parsing(void);
 extern struct e820map e820;
 extern void update_e820(void);
 
-extern void reserve_early(unsigned long start, unsigned long end);
+extern void reserve_early(unsigned long start, unsigned long end, char *name);
 extern void early_res_to_bootmem(void);
 
 #endif/*!__ASSEMBLY__*/
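
The overlap test and the name handling in the patch above can be exercised outside the kernel. The following is a user-space sketch only: reserve_early_model() is a hypothetical stand-in that returns error codes where the kernel code panics, the table size of 8 is an arbitrary choice for the example, and the explicit NUL termination is an addition of this sketch.

```c
#include <assert.h>
#include <string.h>

#define MAX_EARLY_RES 8	/* arbitrary size for this example */

struct early_res {
	unsigned long start, end;	/* half-open: [start, end) */
	char name[16];
};

static struct early_res early_res[MAX_EARLY_RES];

/*
 * Returns 0 on success, -1 if [start, end) overlaps an existing entry,
 * -2 if the table is full.
 */
int reserve_early_model(unsigned long start, unsigned long end,
			const char *name)
{
	struct early_res *r;
	int i;

	assert(start < end);
	/* An entry with end == 0 marks the first unused slot. */
	for (i = 0; i < MAX_EARLY_RES && early_res[i].end; i++) {
		r = &early_res[i];
		if (end > r->start && start < r->end)
			return -1;
	}
	if (i >= MAX_EARLY_RES)
		return -2;
	r = &early_res[i];
	r->start = start;
	r->end = end;
	if (name) {
		/* Copy at most 15 bytes and terminate explicitly. */
		strncpy(r->name, name, sizeof(r->name) - 1);
		r->name[sizeof(r->name) - 1] = '\0';
	}
	return 0;
}
```

Reserving [0x200000, 0xd9c274) and then [0xd9c000, 0xd9d000) reports an overlap, while [0xd9d000, 0xea2000) is accepted because the ranges are half-open. Names longer than 15 characters are truncated to fit name[16].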

* [PATCH] x86_64: make bootmap_start page align v6
  2008-01-31 20:37           ` [PATCH] x86_64: add debug name for early_res Yinghai Lu
@ 2008-01-31 20:44             ` Yinghai Lu
  2008-01-31 21:05               ` Ingo Molnar
  0 siblings, 1 reply; 23+ messages in thread
From: Yinghai Lu @ 2008-01-31 20:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Christoph Lameter, Andrew Morton, linux-kernel,
	Thomas Gleixner, H. Peter Anvin

[PATCH] x86_64: make bootmap_start page align v6

needs to be applied after "x86_64: add debug name for early_res"

boot oops when the system has 64g or 128g installed

Calling initcall 0xffffffff80bc33b6: sctp_init+0x0/0x711()
BUG: unable to handle kernel NULL pointer dereference at 000000000000005f
IP: [<ffffffff802bfe55>] proc_register+0xe7/0x10f
PGD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #6
RIP: 0010:[<ffffffff802bfe55>]  [<ffffffff802bfe55>] proc_register+0xe7/0x10f
RSP: 0000:ffff810824c57e60  EFLAGS: 00010246
RAX: 000000000000d7d7 RBX: ffff811024c5fa80 RCX: ffff810824c57e08
RDX: 0000000000000000 RSI: 0000000000000195 RDI: ffffffff80cc2460
RBP: ffffffffffffffff R08: 0000000000000000 R09: ffff811024c5fa80
R10: 0000000000000000 R11: 0000000000000002 R12: ffff810824c57e6c
R13: 0000000000000000 R14: ffff810824c57ee0 R15: 00000006abd25bee
FS:  0000000000000000(0000) GS:ffffffff80b4d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000000005f CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff810824c56000, task ffff812024c52000)
Stack:  ffffffff80a57348 0000019500000000 ffff811024c5fa80 0000000000000000
 00000000ffffff97 ffffffff802bfef0 0000000000000000 ffffffffffffffff
 0000000000000000 ffffffff80bc3b4b ffff810824c57ee0 ffffffff80bc34a5
Call Trace:
 [<ffffffff802bfef0>] ? create_proc_entry+0x73/0x8a
 [<ffffffff80bc3b4b>] ? sctp_snmp_proc_init+0x1c/0x34
 [<ffffffff80bc34a5>] ? sctp_init+0xef/0x711
 [<ffffffff80b976e3>] ? kernel_init+0x175/0x2e1
 [<ffffffff8020ccf8>] ? child_rip+0xa/0x12
 [<ffffffff80b9756e>] ? kernel_init+0x0/0x2e1
 [<ffffffff8020ccee>] ? child_rip+0x0/0x12


Code: 1e 48 83 7b 38 00 75 08 48 c7 43 38 f0 e8 82 80 48 83 7b 30 00 75 08 48 c7 43 30 d0 e9 82 80 48 c7 c7 60 24 cc 80 e8 bd 5a 54 00 <48> 8b 45 60 48 89 6b 58 48 89 5d 60 48 89 43 50 fe 05 f5 25 a0
RIP  [<ffffffff802bfe55>] proc_register+0xe7/0x10f
 RSP <ffff810824c57e60>
CR2: 000000000000005f
---[ end trace 02c2d78def82877a ]---
Kernel panic - not syncing: Attempted to kill init!

It turns out that some variables near the end of bss are already corrupted.

In System.map we have:
ffffffff80d40420 b rsi_table
ffffffff80d40620 B krb5_seq_lock
ffffffff80d40628 b i.20437
ffffffff80d40630 b xprt_rdma_inline_write_padding
ffffffff80d40638 b sunrpc_table_header
ffffffff80d40640 b zero
ffffffff80d40644 b min_memreg
ffffffff80d40648 b rpcrdma_tk_lock_g
ffffffff80d40650 B sctp_assocs_id_lock
ffffffff80d40658 B proc_net_sctp
ffffffff80d40660 B sctp_assocs_id
ffffffff80d40680 B sysctl_sctp_mem
ffffffff80d40690 B sysctl_sctp_rmem
ffffffff80d406a0 B sysctl_sctp_wmem
ffffffff80d406b0 b sctp_ctl_socket
ffffffff80d406b8 b sctp_pf_inet6_specific
ffffffff80d406c0 b sctp_pf_inet_specific
ffffffff80d406c8 b sctp_af_v4_specific
ffffffff80d406d0 b sctp_af_v6_specific
ffffffff80d406d8 b sctp_rand.33270
ffffffff80d406dc b sctp_memory_pressure
ffffffff80d406e0 b sctp_sockets_allocated
ffffffff80d406e4 b sctp_memory_allocated
ffffffff80d406e8 b sctp_sysctl_header
ffffffff80d406f0 b zero
ffffffff80d406f4 A __bss_stop
ffffffff80d406f4 A _end

and setup_node_bootmem() will use that same page, 0xd40000, for the bootmap:
Bootmem setup node 0 0000000000000000-0000000828000000
  NODE_DATA [000000000008a485 - 0000000000091484]
  bootmap [0000000000d406f4 -  0000000000e456f3] pages 105
Bootmem setup node 1 0000000828000000-0000001028000000
  NODE_DATA [0000000828000000 - 0000000828006fff]
  bootmap [0000000828007000 -  0000000828106fff] pages 100
Bootmem setup node 2 0000001028000000-0000001828000000
  NODE_DATA [0000001028000000 - 0000001028006fff]
  bootmap [0000001028007000 -  0000001028106fff] pages 100
Bootmem setup node 3 0000001828000000-0000002028000000
  NODE_DATA [0000001828000000 - 0000001828006fff]
  bootmap [0000001828007000 -  0000001828106fff] pages 100

Actually, setup_node_bootmem() wants NODE_DATA to be aligned, and the bootmap
to follow it on a page boundary.

This patch updates find_e820_area() to take an alignment argument, so that the
address it returns is already aligned as the caller requires.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>

Index: linux-2.6/arch/x86/kernel/e820_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/e820_64.c
+++ linux-2.6/arch/x86/kernel/e820_64.c
@@ -171,12 +171,13 @@ int __init e820_all_mapped(unsigned long
 }
 
 /*
- * Find a free area in a specific range.
+ * Find a free area with specified alignment in a specific range.
  */
 unsigned long __init find_e820_area(unsigned long start, unsigned long end,
-				    unsigned size)
+				    unsigned size, unsigned long align)
 {
 	int i;
+	unsigned long mask = ~(align - 1);
 
 	for (i = 0; i < e820.nr_map; i++) {
 		struct e820entry *ei = &e820.map[i];
@@ -190,7 +191,8 @@ unsigned long __init find_e820_area(unsi
 			continue;
 		while (bad_addr(&addr, size) && addr+size <= ei->addr+ei->size)
 			;
-		last = PAGE_ALIGN(addr) + size;
+		addr = (addr + align - 1) & mask;
+		last = addr + size;
 		if (last > ei->addr + ei->size)
 			continue;
 		if (last > end)
Index: linux-2.6/arch/x86/kernel/setup_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup_64.c
+++ linux-2.6/arch/x86/kernel/setup_64.c
@@ -182,7 +182,8 @@ contig_initmem_init(unsigned long start_
 	unsigned long bootmap_size, bootmap;
 
 	bootmap_size = bootmem_bootmap_pages(end_pfn)<<PAGE_SHIFT;
-	bootmap = find_e820_area(0, end_pfn<<PAGE_SHIFT, bootmap_size);
+	bootmap = find_e820_area(0, end_pfn<<PAGE_SHIFT, bootmap_size,
+				 PAGE_SIZE);
 	if (bootmap == -1L)
 		panic("Cannot find bootmem map of size %ld\n", bootmap_size);
 	bootmap_size = init_bootmem(bootmap >> PAGE_SHIFT, end_pfn);
Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -354,17 +354,10 @@ static void __init find_early_table_spac
 	 * need roughly 0.5KB per GB.
 	 */
 	start = 0x8000;
-	table_start = find_e820_area(start, end, tables);
+	table_start = find_e820_area(start, end, tables, PAGE_SIZE);
 	if (table_start == -1UL)
 		panic("Cannot find space for the kernel page tables");
 
-	/*
-	 * When you have a lot of RAM like 256GB, early_table will not fit
-	 * into 0x8000 range, find_e820_area() will find area after kernel
-	 * bss but the table_start is not page aligned, so need to round it
-	 * up to avoid overlap with bss:
-	 */
-	table_start = round_up(table_start, PAGE_SIZE);
 	table_start >>= PAGE_SHIFT;
 	table_end = table_start;
 
@@ -420,7 +413,9 @@ void __init_refok init_memory_mapping(un
 		mmu_cr4_features = read_cr4();
 	__flush_tlb_all();
 
-	reserve_early(table_start << PAGE_SHIFT, table_end << PAGE_SHIFT, "PGTABLE");
+	if (!after_bootmem)
+		reserve_early(table_start << PAGE_SHIFT,
+				 table_end << PAGE_SHIFT, "PGTABLE");
 }
 
 #ifndef CONFIG_NUMA
Index: linux-2.6/arch/x86/mm/numa_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/numa_64.c
+++ linux-2.6/arch/x86/mm/numa_64.c
@@ -84,25 +84,23 @@ static int __init populate_memnodemap(co
 
 static int __init allocate_cachealigned_memnodemap(void)
 {
-	unsigned long pad, pad_addr;
+	unsigned long addr;
 
 	memnodemap = memnode.embedded_map;
 	if (memnodemapsize <= ARRAY_SIZE(memnode.embedded_map))
 		return 0;
 
-	pad = L1_CACHE_BYTES - 1;
-	pad_addr = 0x8000;
-	nodemap_size = pad + sizeof(s16) * memnodemapsize;
-	nodemap_addr = find_e820_area(pad_addr, end_pfn<<PAGE_SHIFT,
-				      nodemap_size);
+	addr = 0x8000;
+	nodemap_size = round_up(sizeof(s16) * memnodemapsize, L1_CACHE_BYTES);
+	nodemap_addr = find_e820_area(addr, end_pfn<<PAGE_SHIFT,
+				      nodemap_size, L1_CACHE_BYTES);
 	if (nodemap_addr == -1UL) {
 		printk(KERN_ERR
 		       "NUMA: Unable to allocate Memory to Node hash map\n");
 		nodemap_addr = nodemap_size = 0;
 		return -1;
 	}
-	pad_addr = (nodemap_addr + pad) & ~pad;
-	memnodemap = phys_to_virt(pad_addr);
+	memnodemap = phys_to_virt(nodemap_addr);
 	reserve_early(nodemap_addr, nodemap_addr + nodemap_size, "MEMNODEMAP");
 
 	printk(KERN_DEBUG "NUMA: Allocated memnodemap from %lx - %lx\n",
@@ -164,15 +162,17 @@ int early_pfn_to_nid(unsigned long pfn)
 }
 
 static void * __init early_node_mem(int nodeid, unsigned long start,
-				    unsigned long end, unsigned long size)
+				    unsigned long end, unsigned long size,
+				    unsigned long align)
 {
-	unsigned long mem = find_e820_area(start, end, size);
+	unsigned long mem = find_e820_area(start, end, size, align);
 	void *ptr;
 
-	if (mem != -1L)
+	if (mem != -1L) {
+		mem = round_up(mem, align);
 		return __va(mem);
-	ptr = __alloc_bootmem_nopanic(size,
-				SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS));
+	}
+	ptr = __alloc_bootmem_nopanic(size, align, __pa(MAX_DMA_ADDRESS));
 	if (ptr == NULL) {
 		printk(KERN_ERR "Cannot find %lu bytes in node %d\n",
 		       size, nodeid);
@@ -198,7 +198,8 @@ void __init setup_node_bootmem(int nodei
 	start_pfn = start >> PAGE_SHIFT;
 	end_pfn = end >> PAGE_SHIFT;
 
-	node_data[nodeid] = early_node_mem(nodeid, start, end, pgdat_size);
+	node_data[nodeid] = early_node_mem(nodeid, start, end, pgdat_size,
+					   SMP_CACHE_BYTES);
 	if (node_data[nodeid] == NULL)
 		return;
 	nodedata_phys = __pa(node_data[nodeid]);
@@ -213,8 +214,12 @@ void __init setup_node_bootmem(int nodei
 	/* Find a place for the bootmem map */
 	bootmap_pages = bootmem_bootmap_pages(end_pfn - start_pfn);
 	bootmap_start = round_up(nodedata_phys + pgdat_size, PAGE_SIZE);
+	/*
+	 * SMP_CACHE_BYTES could be enough, but init_bootmem_node() takes
+	 * a PFN, so the bootmap must be aligned to PAGE_SIZE
+	 */
 	bootmap = early_node_mem(nodeid, bootmap_start, end,
-					bootmap_pages<<PAGE_SHIFT);
+				 bootmap_pages<<PAGE_SHIFT, PAGE_SIZE);
 	if (bootmap == NULL)  {
 		if (nodedata_phys < start || nodedata_phys >= end)
 			free_bootmem((unsigned long)node_data[nodeid],
Index: linux-2.6/include/asm-x86/e820_64.h
===================================================================
--- linux-2.6.orig/include/asm-x86/e820_64.h
+++ linux-2.6/include/asm-x86/e820_64.h
@@ -15,7 +15,7 @@
 
 #ifndef __ASSEMBLY__
 extern unsigned long find_e820_area(unsigned long start, unsigned long end, 
-				    unsigned size);
+				    unsigned size, unsigned long align);
 extern void add_memory_region(unsigned long start, unsigned long size, 
 			      int type);
 extern void setup_memory_region(void);
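
The idea of passing the alignment into the search can be modelled in user-space C. This is a sketch under stated assumptions, not the patched find_e820_area(): find_area(), struct res, NO_AREA and kernel_res are names invented here, a single usable range stands in for the e820 map, and the loop re-checks reservations after aligning, which is a choice made for this example rather than something taken from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_AREA UINT64_MAX	/* stand-in for the -1UL failure value */

struct res {
	uint64_t start, end;	/* half-open: [start, end) */
};

/* The "TEXT DATA BSS" reservation from the boot log in this thread. */
const struct res kernel_res[] = {
	{ 0x200000, 0xd9c274 },
};

/*
 * Find the lowest align-aligned address in [start, end) where size bytes
 * avoid every reservation. align must be a power of two.
 */
uint64_t find_area(uint64_t start, uint64_t end, const struct res *r,
		   size_t n, uint64_t size, uint64_t align)
{
	uint64_t mask, addr = start;
	size_t i;
	int moved;

	assert(align != 0 && (align & (align - 1)) == 0);
	mask = ~(align - 1);

	do {
		moved = 0;
		addr = (addr + align - 1) & mask;	/* same mask idiom as the patch */
		for (i = 0; i < n; i++) {
			if (addr + size > r[i].start && addr < r[i].end) {
				addr = r[i].end;	/* may be unaligned again */
				moved = 1;
			}
		}
	} while (moved);

	if (addr + size > end)
		return NO_AREA;
	return addr;
}
```

Asking for 0x105000 bytes with PAGE_SIZE alignment from 0x100000 upward skips the kernel image and lands on 0xd9d000, the bootmap address shown in the fixed boot log. With a 64-byte alignment, a search starting at 0x200000 would return 0xd9c280 instead.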

* Re: [PATCH] x86_64: make bootmap_start page align v6
  2008-01-31 20:44             ` [PATCH] x86_64: make bootmap_start page align v6 Yinghai Lu
@ 2008-01-31 21:05               ` Ingo Molnar
  2008-01-31 21:30                 ` Yinghai Lu
  2008-01-31 22:55                 ` [PATCH] x86_64: remove unneeded round_up Yinghai Lu
  0 siblings, 2 replies; 23+ messages in thread
From: Ingo Molnar @ 2008-01-31 21:05 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Andi Kleen, Christoph Lameter, Andrew Morton, linux-kernel,
	Thomas Gleixner, H. Peter Anvin


* Yinghai Lu <Yinghai.Lu@Sun.COM> wrote:

> [PATCH] x86_64: make bootmap_start page align v6
> 
> needs to be applied after "x86_64: add debug name for early_res"
> 
> boot oops when the system has 64g or 128g installed

thanks - this v6 approach looks a _lot_ saner because it solves the core 
problem: the fragility of the early allocator code. They are also 
cleanups, besides being fixes. Applied.

does this solve all the boot problems you were seeing with 64 or 128 GB 
of RAM?

	Ingo

* Re: [PATCH] x86_64: make bootmap_start page align v6
  2008-01-31 21:05               ` Ingo Molnar
@ 2008-01-31 21:30                 ` Yinghai Lu
  2008-01-31 22:55                 ` [PATCH] x86_64: remove unneeded round_up Yinghai Lu
  1 sibling, 0 replies; 23+ messages in thread
From: Yinghai Lu @ 2008-01-31 21:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Christoph Lameter, Andrew Morton, linux-kernel,
	Thomas Gleixner, H. Peter Anvin

On Thursday 31 January 2008 01:05:53 pm Ingo Molnar wrote:
> 
> * Yinghai Lu <Yinghai.Lu@Sun.COM> wrote:
> 
> > [PATCH] x86_64: make bootmap_start page align v6
> > 
> > needs to be applied after "x86_64: add debug name for early_res"
> > 
> > boot oops when the system has 64g or 128g installed
> 
> thanks - this v6 approach looks a _lot_ saner because it solves the core 
> problem: the fragility of the early allocator code. They are also 
> cleanups, besides being fixes. Applied.
> 
> does this solve all the boot problems you were seeing with 64 or 128 GB 
> of RAM?


yes. 

YH

* Re: [PATCH] x86_64: remove unneeded round_up
  2008-01-31 22:55                 ` [PATCH] x86_64: remove unneeded round_up Yinghai Lu
@ 2008-01-31 22:53                   ` Ingo Molnar
  0 siblings, 0 replies; 23+ messages in thread
From: Ingo Molnar @ 2008-01-31 22:53 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Andi Kleen, Christoph Lameter, Andrew Morton, linux-kernel,
	Thomas Gleixner, H. Peter Anvin


* Yinghai Lu <Yinghai.Lu@Sun.COM> wrote:

> -	if (mem != -1L) {
> -		mem = round_up(mem, align);
> +	if (mem != -1L)
>  		return __va(mem);
> -	}

thanks, applied.

It even reduces the size of the kernel a tiny bit:

   text    data     bss     dec     hex filename
   2963    4149    4352   11464    2cc8 numa_64.o.before
   2949    4149    4352   11450    2cba numa_64.o.after

and it's always a good sign for kernel quality when patches (that change 
functionality) have that effect :)

	Ingo

* [PATCH] x86_64: remove unneeded round_up
  2008-01-31 21:05               ` Ingo Molnar
  2008-01-31 21:30                 ` Yinghai Lu
@ 2008-01-31 22:55                 ` Yinghai Lu
  2008-01-31 22:53                   ` Ingo Molnar
  1 sibling, 1 reply; 23+ messages in thread
From: Yinghai Lu @ 2008-01-31 22:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Christoph Lameter, Andrew Morton, linux-kernel,
	Thomas Gleixner, H. Peter Anvin

[PATCH] x86_64: remove unneeded round_up

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>

diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index d585d27..5a02bf4 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -168,10 +168,9 @@ static void * __init early_node_mem(int nodeid, unsigned long start,
 	unsigned long mem = find_e820_area(start, end, size, align);
 	void *ptr;
 
-	if (mem != -1L) {
-		mem = round_up(mem, align);
+	if (mem != -1L)
 		return __va(mem);
-	}
+
 	ptr = __alloc_bootmem_nopanic(size, align, __pa(MAX_DMA_ADDRESS));
 	if (ptr == NULL) {
 		printk(KERN_ERR "Cannot find %lu bytes in node %d\n",
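
The round_up() can go because an address that is already aligned is left unchanged by rounding it again. A small user-space check of that property, assuming power-of-two alignments; round_up_pow2() and second_round_up_is_noop() are hypothetical names used only in this sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of align; align must be a power of two. */
uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/*
 * Returns 1 when, for every x in [lo, hi), rounding the already rounded
 * value a second time leaves it unchanged.
 */
int second_round_up_is_noop(uint64_t lo, uint64_t hi, uint64_t align)
{
	uint64_t x;

	for (x = lo; x < hi; x++) {
		uint64_t once = round_up_pow2(x, align);

		if (round_up_pow2(once, align) != once)
			return 0;
	}
	return 1;
}
```

Since find_e820_area() now returns an address aligned to the requested value, the extra rounding in early_node_mem() falls into exactly this second, redundant case.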

end of thread, other threads:[~2008-01-31 22:53 UTC | newest]

Thread overview: 23+ messages
     [not found] <200801291113.35974.yinghai.lu@sun.com>
2008-01-29 19:14 ` [PATCH 2/4] x86_64: make early_node_mem return align address v2 Yinghai Lu
2008-01-30  2:39   ` Yinghai Lu
2008-01-30  3:28   ` [PATCH 2/2] x86_64: make bootmap_start page align v3 Yinghai Lu
2008-01-30 18:51     ` PATCH] x86_64: make bootmap_start page align v4 Yinghai Lu
2008-01-30 19:08       ` Andi Kleen
2008-01-30 20:15         ` Yinghai Lu
2008-01-30 23:23           ` [PATCH] x86_64: make bootmap_start page align v5 Yinghai Lu
2008-01-30 23:50             ` Yinghai Lu
2008-01-31  3:02           ` PATCH] x86_64: make bootmap_start page align v4 Andi Kleen
2008-01-31  3:29             ` Yinghai Lu
2008-01-31 13:07       ` Ingo Molnar
2008-01-29 19:15 ` [PATCH 3/4] x86_64: Use early reservation for early node data Yinghai Lu
2008-01-29 19:16 ` [PATCH 4/4] x86_64: increse MAX_EARLY_RES for NODE_DATA and bootmap Yinghai Lu
2008-01-30  2:57   ` Andi Kleen
2008-01-30  3:25     ` Yinghai Lu
2008-01-31 13:24       ` Ingo Molnar
2008-01-31 13:34         ` Andi Kleen
2008-01-31 20:37           ` [PATCH] x86_64: add debug name for early_res Yinghai Lu
2008-01-31 20:44             ` [PATCH] x86_64: make bootmap_start page align v6 Yinghai Lu
2008-01-31 21:05               ` Ingo Molnar
2008-01-31 21:30                 ` Yinghai Lu
2008-01-31 22:55                 ` [PATCH] x86_64: remove unneeded round_up Yinghai Lu
2008-01-31 22:53                   ` Ingo Molnar
