LKML Archive on lore.kernel.org help / color / mirror / Atom feed
* [PATCH] Fix alignment of early reservation for EBDA @ 2008-02-24 17:46 Alexander van Heukelum 2008-02-24 19:27 ` Andi Kleen ` (2 more replies) 0 siblings, 3 replies; 62+ messages in thread From: Alexander van Heukelum @ 2008-02-24 17:46 UTC (permalink / raw) To: Andi Kleen; +Cc: Ingo Molnar, Thomas Gleixner, LKML, Alexander van Heukelum Hi Andi, My eyes fell on the following table in the boot messages: early res: 0 [0-fff] BIOS data page early res: 1 [6000-7fff] SMP_TRAMPOLINE early res: 2 [200000-374557] TEXT DATA BSS early res: 3 [9fc00-a0bff] EBDA early res: 4 [8000-afff] PGTABLE The memory reserved for the EBDA overflows into the area normally reserved for the VGA adaptor. It seems that you wanted to force the allocation to cover whole pages, like: early res: 3 [9f000-9ffff] EBDA This is what this patch implements. Is it really necessary to force the allocation to a page boundary? Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> arch/x86/kernel/head64.c | 12 +++++++----- 1 files changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c index ad24408..2c52543 100644 --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -53,7 +53,7 @@ static void __init copy_bootdata(char *real_mode_data) static __init void reserve_ebda(void) { - unsigned ebda_addr, ebda_size; + unsigned int ebda_addr, ebda_size, ebda_end; /* * there is a real-mode segmented pointer pointing to the @@ -70,12 +70,14 @@ static __init void reserve_ebda(void) /* Round EBDA up to pages */ if (ebda_size == 0) ebda_size = 1; + if (ebda_size > 64) + ebda_size = 64; ebda_size <<= 10; - ebda_size = round_up(ebda_size + (ebda_addr & ~PAGE_MASK), PAGE_SIZE); - if (ebda_size > 64*1024) - ebda_size = 64*1024; + + ebda_end = ALIGN(ebda_addr + ebda_size, PAGE_SIZE); + ebda_addr = ALIGN(ebda_addr - PAGE_SIZE, PAGE_SIZE); - reserve_early(ebda_addr, ebda_addr + ebda_size, "EBDA"); + reserve_early(ebda_addr, ebda_end, "EBDA"); } void __init x86_64_start_kernel(char * real_mode_data) ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] Fix alignment of early reservation for EBDA 2008-02-24 17:46 [PATCH] Fix alignment of early reservation for EBDA Alexander van Heukelum @ 2008-02-24 19:27 ` Andi Kleen 2008-02-24 19:41 ` Ingo Molnar 2008-02-25 2:18 ` H. Peter Anvin 2 siblings, 0 replies; 62+ messages in thread From: Andi Kleen @ 2008-02-24 19:27 UTC (permalink / raw) To: Alexander van Heukelum Cc: Ingo Molnar, Thomas Gleixner, LKML, Alexander van Heukelum On Sunday 24 February 2008 18:46:05 Alexander van Heukelum wrote: > Hi Andi, > > My eyes fell on the following table in the boot messages: > > early res: 0 [0-fff] BIOS data page > early res: 1 [6000-7fff] SMP_TRAMPOLINE > early res: 2 [200000-374557] TEXT DATA BSS > early res: 3 [9fc00-a0bff] EBDA > early res: 4 [8000-afff] PGTABLE > > The memory reserved for the EBDA overflows into the area normally > reserved for the VGA adaptor. It seems that you wanted to force > the allocation to cover whole pages, like: > > early res: 3 [9f000-9ffff] EBDA > > This is what this patch implements. Thanks. > Is it really necessary to force the allocation to a page boundary? In theory not, in practice it works around some problems in early allocations where the other users assume page alignment. At some point it needs to be cleaned up properly. -Andi ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] Fix alignment of early reservation for EBDA 2008-02-24 17:46 [PATCH] Fix alignment of early reservation for EBDA Alexander van Heukelum 2008-02-24 19:27 ` Andi Kleen @ 2008-02-24 19:41 ` Ingo Molnar 2008-02-24 20:53 ` Alexander van Heukelum 2008-02-25 2:18 ` H. Peter Anvin 2 siblings, 1 reply; 62+ messages in thread From: Ingo Molnar @ 2008-02-24 19:41 UTC (permalink / raw) To: Alexander van Heukelum Cc: Andi Kleen, Thomas Gleixner, LKML, Alexander van Heukelum * Alexander van Heukelum <heukelum@mailshack.com> wrote: > Hi Andi, > > My eyes fell on the following table in the boot messages: > > early res: 0 [0-fff] BIOS data page > early res: 1 [6000-7fff] SMP_TRAMPOLINE > early res: 2 [200000-374557] TEXT DATA BSS > early res: 3 [9fc00-a0bff] EBDA > early res: 4 [8000-afff] PGTABLE > > The memory reserved for the EBDA overflows into the area normally > reserved for the VGA adaptor. It seems that you wanted to force the > allocation to cover whole pages, like: well, that's what your EBDA descriptor says - it's set to 9fc00 which is 512 bytes below the VGA range. This behavior didnt really change over v2.6.24 (which reserved 'into' the VGA range too), it's just that in v2.6.25 we also print out these early reservations. Can you see any regression? There should be no harm from overlapping into the VGA range - these "reservations" only make RAM unavailable for normal allocations. your patch on the other hand rounds the EBDA area down which could in theory be unsafe on other boxes (where there could be real RAM above the EBDA area): the safest approach is to round the beginning of it down, the end of it up (to page boundary). Your patch _should_ be OK, but in practice it doesnt hurt to reserve a bit more around the edges than to accidentally give a page to the OS that the BIOS might rely upon. Ingo ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] Fix alignment of early reservation for EBDA 2008-02-24 19:41 ` Ingo Molnar @ 2008-02-24 20:53 ` Alexander van Heukelum 0 siblings, 0 replies; 62+ messages in thread From: Alexander van Heukelum @ 2008-02-24 20:53 UTC (permalink / raw) To: Ingo Molnar; +Cc: Andi Kleen, Thomas Gleixner, LKML, Alexander van Heukelum On Sun, 24 Feb 2008 20:41:21 +0100, "Ingo Molnar" <mingo@elte.hu> said: > > * Alexander van Heukelum <heukelum@mailshack.com> wrote: > > > Hi Andi, > > > > My eyes fell on the following table in the boot messages: > > > > early res: 0 [0-fff] BIOS data page > > early res: 1 [6000-7fff] SMP_TRAMPOLINE > > early res: 2 [200000-374557] TEXT DATA BSS > > early res: 3 [9fc00-a0bff] EBDA > > early res: 4 [8000-afff] PGTABLE > > > > The memory reserved for the EBDA overflows into the area normally > > reserved for the VGA adaptor. It seems that you wanted to force the > > allocation to cover whole pages, like: > > well, that's what your EBDA descriptor says - it's set to 9fc00 which is > 512 bytes below the VGA range. It's 1024 bytes below, but yes, the EBDA starts there. Then the first two bytes of the EBDA contain the value 0x0001, which means that its size is 1kb, so the BIOS is correct. > This behavior didnt really change over > v2.6.24 (which reserved 'into' the VGA range too), it's just that in > v2.6.25 we also print out these early reservations. Correct. I thought it was new code, but looking more closely, the behaviour has indeed not changed recently (note to self: git log -p somefile.c does not indicate in any way that code was moved from some other place.) > Can you see any > regression? There should be no harm from overlapping into the VGA range > - these "reservations" only make RAM unavailable for normal allocations. No regressions, it's just cosmetic. > your patch on the other hand rounds the EBDA area down which could in > theory be unsafe on other boxes (where there could be real RAM above the > EBDA area): the safest approach is to round the beginning of it down, > the end of it up (to page boundary). Not really: ebda_addr is just a local variable. If the system needs to find the start of the EBDA, it will just have to look at the 16 bit value at address 0x40E again. > Your patch _should_ be OK, but in > practice it doesnt hurt to reserve a bit more around the edges than to > accidentally give a page to the OS that the BIOS might rely upon. The patch is exactly trying to do that. The code that was there seemed to imply that the author wanted to allocate whole pages, in such a way that the allocation contained the whole EBDA. I think that is what it does after this patch. Greetings, Alexander -- Alexander van Heukelum heukelum@fastmail.fm -- http://www.fastmail.fm - And now for something completely different ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] Fix alignment of early reservation for EBDA 2008-02-24 17:46 [PATCH] Fix alignment of early reservation for EBDA Alexander van Heukelum 2008-02-24 19:27 ` Andi Kleen 2008-02-24 19:41 ` Ingo Molnar @ 2008-02-25 2:18 ` H. Peter Anvin 2008-02-25 16:54 ` Alexander van Heukelum 2 siblings, 1 reply; 62+ messages in thread From: H. Peter Anvin @ 2008-02-25 2:18 UTC (permalink / raw) To: Alexander van Heukelum Cc: Andi Kleen, Ingo Molnar, Thomas Gleixner, LKML, Alexander van Heukelum Alexander van Heukelum wrote: > Hi Andi, > > My eyes fell on the following table in the boot messages: > > early res: 0 [0-fff] BIOS data page > early res: 1 [6000-7fff] SMP_TRAMPOLINE > early res: 2 [200000-374557] TEXT DATA BSS > early res: 3 [9fc00-a0bff] EBDA > early res: 4 [8000-afff] PGTABLE > > The memory reserved for the EBDA overflows into the area normally > reserved for the VGA adaptor. It seems that you wanted to force > the allocation to cover whole pages, like: > > early res: 3 [9f000-9ffff] EBDA > > This is what this patch implements. > > Is it really necessary to force the allocation to a page boundary? > It is, but that rounding gets done in reserve_bootmem() anyway, so there is no need for the arch-specific code to do it. The 32-bit EBDA code hard-codes a size of 4K, which is probably equally wrong; my gut feel is that the right thing to do is to reserve from the EBDA up to the 640K mark (some BIOSes use an area like that for SMM stuff), possibly with some sanity checking. -hpa ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] Fix alignment of early reservation for EBDA 2008-02-25 2:18 ` H. Peter Anvin @ 2008-02-25 16:54 ` Alexander van Heukelum 2008-02-25 17:01 ` Ingo Molnar 2008-02-27 14:25 ` [PATCH] Fix alignment of early reservation for EBDA Andi Kleen 0 siblings, 2 replies; 62+ messages in thread From: Alexander van Heukelum @ 2008-02-25 16:54 UTC (permalink / raw) To: H. Peter Anvin, Andi Kleen Cc: Ingo Molnar, Thomas Gleixner, LKML, Alexander van Heukelum On Sun, 24 Feb 2008 18:18:16 -0800, "H. Peter Anvin" <hpa@zytor.com> said: > Alexander van Heukelum wrote: > > early res: 3 [9f000-9ffff] EBDA > > > > Is it really necessary to force the allocation to a page boundary? > > It is, but that rounding gets done in reserve_bootmem() anyway, so there > is no need for the arch-specific code to do it. > > The 32-bit EBDA code hard-codes a size of 4K, which is probably equally > wrong; my gut feel is that the right thing to do is to reserve from the > EBDA up to the 640K mark (some BIOSes use an area like that for SMM > stuff), possibly with some sanity checking. Then, how about reserving everything from the end of conventional memory up to the 1Mb mark? Like this: /* * The BIOS places the EBDA/XBDA at the top of conventional * memory, and usually decreases the reported amount of * conventional memory (int 0x12) too. */ static __init void reserve_ebda(void) { unsigned int lowmem, ebda_addr; /* end of low (conventional) memory */ lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES); lowmem <<= 10; /* start of EBDA area */ ebda_addr = *(unsigned short *)__va(BIOS_EBDA_SEGMENT); ebda_addr <<= 4; /* Fixup: bios puts an EBDA in the top 64K segment */ /* of conventional memory, but does not adjust lowmem. */ if ((lowmem - ebda_addr) <= 0x10000) lowmem = ebda_addr; /* Fixup: bios does not report an EBDA at all. */ /* Some old Dells seem to need 4k anyhow (bugzilla 2990) */ if ((ebda_addr == 0) && (lowmem >= 0x9f000)) lowmem = 0x9f000; /* Paranoia: should never happen, but... */ if (lowmem >= 0x100000) lowmem = 0xa0000; /* reserve all memory between lowmem and the 1MB mark */ reserve_early(lowmem, 0x100000, "BIOS reserved"); } Greetings, Alexander -- Alexander van Heukelum heukelum@fastmail.fm -- http://www.fastmail.fm - Choose from over 50 domains or use your own ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] Fix alignment of early reservation for EBDA 2008-02-25 16:54 ` Alexander van Heukelum @ 2008-02-25 17:01 ` Ingo Molnar 2008-02-25 18:07 ` [PATCH] reserve_early end-of-conventional-memory to 1MB Alexander van Heukelum 2008-02-27 14:25 ` [PATCH] Fix alignment of early reservation for EBDA Andi Kleen 1 sibling, 1 reply; 62+ messages in thread From: Ingo Molnar @ 2008-02-25 17:01 UTC (permalink / raw) To: Alexander van Heukelum Cc: H. Peter Anvin, Andi Kleen, Thomas Gleixner, LKML, Alexander van Heukelum * Alexander van Heukelum <heukelum@fastmail.fm> wrote: > Then, how about reserving everything from the end of conventional > memory up to the 1Mb mark? Like this: looks very good to me - could you send a patch against x86.git#testing? Ingo ^ permalink raw reply [flat|nested] 62+ messages in thread
* [PATCH] reserve_early end-of-conventional-memory to 1MB 2008-02-25 17:01 ` Ingo Molnar @ 2008-02-25 18:07 ` Alexander van Heukelum 2008-02-25 18:13 ` H. Peter Anvin 2008-02-28 13:13 ` [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit Alexander van Heukelum 0 siblings, 2 replies; 62+ messages in thread From: Alexander van Heukelum @ 2008-02-25 18:07 UTC (permalink / raw) To: Ingo Molnar Cc: Alexander van Heukelum, H. Peter Anvin, Andi Kleen, Thomas Gleixner, LKML Explicitly reserve_early the whole address range from the end of conventional memory as reported by the bios data area up to the 1Mb mark. Regard the info retrieved from the BIOS data area with a bit of paranoia, though, because some biosses forget to register the EBDA correctly. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> -- I think this patch is against -x86.git#testing :-/. Greetings, Alexander arch/x86/kernel/head64.c | 45 +++++++++++++++++++++++++++------------------ 1 files changed, 27 insertions(+), 18 deletions(-) diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c index 38f32e7..b684552 100644 --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -49,33 +49,42 @@ static void __init copy_bootdata(char *real_mode_data) } } -#define EBDA_ADDR_POINTER 0x40E +#define BIOS_EBDA_SEGMENT 0x40E +#define BIOS_LOWMEM_KILOBYTES 0x413 +/* + * The BIOS places the EBDA/XBDA at the top of conventional + * memory, and usually decreases the reported amount of + * conventional memory (int 0x12) too. + */ static __init void reserve_ebda(void) { - unsigned ebda_addr, ebda_size; + unsigned int lowmem, ebda_addr; - /* - * there is a real-mode segmented pointer pointing to the - * 4K EBDA area at 0x40E - */ - ebda_addr = *(unsigned short *)__va(EBDA_ADDR_POINTER); + /* end of low (conventional) memory */ + lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES); + lowmem <<= 10; + + /* start of EBDA area */ + ebda_addr = *(unsigned short *)__va(BIOS_EBDA_SEGMENT); ebda_addr <<= 4; - if (!ebda_addr) - return; + /* Fixup: bios puts an EBDA in the top 64K segment */ + /* of conventional memory, but does not adjust lowmem. */ + if ((lowmem - ebda_addr) <= 0x10000) + lowmem = ebda_addr; - ebda_size = *(unsigned short *)__va(ebda_addr); + /* Fixup: bios does not report an EBDA at all. */ + /* Some old Dells seem to need 4k anyhow (bugzilla 2990) */ + if ((ebda_addr == 0) && (lowmem >= 0x9f000)) + lowmem = 0x9f000; - /* Round EBDA up to pages */ - if (ebda_size == 0) - ebda_size = 1; - ebda_size <<= 10; - ebda_size = round_up(ebda_size + (ebda_addr & ~PAGE_MASK), PAGE_SIZE); - if (ebda_size > 64*1024) - ebda_size = 64*1024; + /* Paranoia: should never happen, but... */ + if (lowmem >= 0x100000) + lowmem = 0xa0000; - reserve_early(ebda_addr, ebda_addr + ebda_size, "EBDA"); + /* reserve all memory between lowmem and the 1MB mark */ + reserve_early(lowmem, 0x100000, "BIOS reserved"); } void __init x86_64_start_kernel(char * real_mode_data) ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve_early end-of-conventional-memory to 1MB 2008-02-25 18:07 ` [PATCH] reserve_early end-of-conventional-memory to 1MB Alexander van Heukelum @ 2008-02-25 18:13 ` H. Peter Anvin 2008-02-25 19:46 ` Alexander van Heukelum 2008-02-28 13:13 ` [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit Alexander van Heukelum 1 sibling, 1 reply; 62+ messages in thread From: H. Peter Anvin @ 2008-02-25 18:13 UTC (permalink / raw) To: Alexander van Heukelum Cc: Ingo Molnar, Alexander van Heukelum, Andi Kleen, Thomas Gleixner, LKML Alexander van Heukelum wrote: > > arch/x86/kernel/head64.c | 45 +++++++++++++++++++++++++++------------------ > 1 files changed, 27 insertions(+), 18 deletions(-) > > diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c > index 38f32e7..b684552 100644 > --- a/arch/x86/kernel/head64.c > +++ b/arch/x86/kernel/head64.c > @@ -49,33 +49,42 @@ static void __init copy_bootdata(char *real_mode_data) > } > } > > -#define EBDA_ADDR_POINTER 0x40E > +#define BIOS_EBDA_SEGMENT 0x40E > +#define BIOS_LOWMEM_KILOBYTES 0x413 > Linus has specific comments in the 32-bit code that states we do not want to use address 0x413 for anything - if nothing else because it's often lowered when there is boottime code involved like CD-ROM booting or PXE. Either way, the code should be shared between 32 and 64 bits. There is nothing bitsize-specific about it! -hpa ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve_early end-of-conventional-memory to 1MB 2008-02-25 18:13 ` H. Peter Anvin @ 2008-02-25 19:46 ` Alexander van Heukelum 2008-02-25 21:17 ` H. Peter Anvin 2008-02-26 9:30 ` Ingo Molnar 0 siblings, 2 replies; 62+ messages in thread From: Alexander van Heukelum @ 2008-02-25 19:46 UTC (permalink / raw) To: H. Peter Anvin Cc: Ingo Molnar, Andi Kleen, Thomas Gleixner, LKML, Alexander van Heukelum On Mon, 25 Feb 2008 10:13:17 -0800, "H. Peter Anvin" <hpa@zytor.com> said: > Alexander van Heukelum wrote: > > > > arch/x86/kernel/head64.c | 45 +++++++++++++++++++++++++++------------------ > > 1 files changed, 27 insertions(+), 18 deletions(-) > > > > diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c > > index 38f32e7..b684552 100644 > > --- a/arch/x86/kernel/head64.c > > +++ b/arch/x86/kernel/head64.c > > @@ -49,33 +49,42 @@ static void __init copy_bootdata(char *real_mode_data) > > } > > } > > > > -#define EBDA_ADDR_POINTER 0x40E > > +#define BIOS_EBDA_SEGMENT 0x40E > > +#define BIOS_LOWMEM_KILOBYTES 0x413 > > > > Linus has specific comments in the 32-bit code that states we do not > want to use address 0x413 for anything - if nothing else because it's > often lowered when there is boottime code involved like CD-ROM booting > or PXE. Hello hpa, I failed to find a comment referring to 0x413 or int 0x12 in arch/x86/kernel. I do know the comment in Documentation/i386/boot.txt, however, suggesting that "INT 12h" _should_ be used, but that it would be pointless for zImage and old bzImage kernels, because they _have_ to be started from 0x90000 anyway. New bzImage kernels do not have this limitation, and smart bootloaders simply put them at much lower addresses, like 0x40000. (I know: you know.) My point is just that the argument not to use 0x413 in the bootloader is not valid for this case. That leaves the possibility that code/data is inserted at the top of conventional memory by either the BIOS or some kind of bootloader. My view on this is that this code/data should be preserved: it was specifically asked from us by lowering 0x413. One just loses a tiny bit of usable memory, and if the program booting the kernel knows better it can unload the driver if it wants to. It's just not a kernel problem. > Either way, the code should be shared between 32 and 64 bits. > There is nothing bitsize-specific about it! Of course. That's also why I already added the old-Dell case ;). But one problem at a time, please! Greetings, Alexander -- Alexander van Heukelum heukelum@fastmail.fm -- http://www.fastmail.fm - I mean, what is it about a decent email service? ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve_early end-of-conventional-memory to 1MB 2008-02-25 19:46 ` Alexander van Heukelum @ 2008-02-25 21:17 ` H. Peter Anvin 2008-02-26 9:30 ` Ingo Molnar 1 sibling, 0 replies; 62+ messages in thread From: H. Peter Anvin @ 2008-02-25 21:17 UTC (permalink / raw) To: Alexander van Heukelum Cc: Ingo Molnar, Andi Kleen, Thomas Gleixner, LKML, Alexander van Heukelum Alexander van Heukelum wrote: > > Hello hpa, > > I failed to find a comment referring to 0x413 or int 0x12 in > arch/x86/kernel. I do know the comment in Documentation/i386/boot.txt, > however, suggesting that "INT 12h" _should_ be used, but that it would > be pointless for zImage and old bzImage kernels, because they _have_ > to be started from 0x90000 anyway. New bzImage kernels do not have > this limitation, and smart bootloaders simply put them at much > lower addresses, like 0x40000. (I know: you know.) My point is just > that the argument not to use 0x413 in the bootloader is not valid > for this case. > OK, let me see if I can dig up the comment I thought I found at one point, and maybe found out the history behind it. Either that, or find out that my memory is faulty ;) -hpa ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve_early end-of-conventional-memory to 1MB 2008-02-25 19:46 ` Alexander van Heukelum 2008-02-25 21:17 ` H. Peter Anvin @ 2008-02-26 9:30 ` Ingo Molnar 2008-02-27 14:26 ` Andi Kleen 2008-02-27 14:38 ` [PATCH] reserve_early end-of-conventional-memory to 1MB II - some numbers to put it into perspective Andi Kleen 1 sibling, 2 replies; 62+ messages in thread From: Ingo Molnar @ 2008-02-26 9:30 UTC (permalink / raw) To: Alexander van Heukelum Cc: H. Peter Anvin, Andi Kleen, Thomas Gleixner, LKML, Alexander van Heukelum * Alexander van Heukelum <heukelum@fastmail.fm> wrote: > > Either way, the code should be shared between 32 and 64 bits. There > > is nothing bitsize-specific about it! > > Of course. That's also why I already added the old-Dell case ;). But > one problem at a time, please! i've applied your patch to x86.git#testing. Feel free to send a code-unification patch too :-) Ingo ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve_early end-of-conventional-memory to 1MB 2008-02-26 9:30 ` Ingo Molnar @ 2008-02-27 14:26 ` Andi Kleen 2008-02-27 14:38 ` [PATCH] reserve_early end-of-conventional-memory to 1MB II - some numbers to put it into perspective Andi Kleen 1 sibling, 0 replies; 62+ messages in thread From: Andi Kleen @ 2008-02-27 14:26 UTC (permalink / raw) To: Ingo Molnar Cc: Alexander van Heukelum, H. Peter Anvin, Thomas Gleixner, LKML, Alexander van Heukelum Ingo Molnar wrote: > * Alexander van Heukelum <heukelum@fastmail.fm> wrote: > >>> Either way, the code should be shared between 32 and 64 bits. There >>> is nothing bitsize-specific about it! >> Of course. That's also why I already added the old-Dell case ;). But >> one problem at a time, please! > > i've applied your patch to x86.git#testing. Feel free to send a > code-unification patch too :-) So you've basically wasted a lot of memory with no concrete breakage known ...Please reconsider. -Andi ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve_early end-of-conventional-memory to 1MB II - some numbers to put it into perspective 2008-02-26 9:30 ` Ingo Molnar 2008-02-27 14:26 ` Andi Kleen @ 2008-02-27 14:38 ` Andi Kleen 2008-02-27 16:44 ` H. Peter Anvin 2008-02-27 20:01 ` Alexander van Heukelum 1 sibling, 2 replies; 62+ messages in thread From: Andi Kleen @ 2008-02-27 14:38 UTC (permalink / raw) To: Ingo Molnar Cc: Alexander van Heukelum, H. Peter Anvin, Thomas Gleixner, LKML, Alexander van Heukelum Ingo Molnar wrote: > * Alexander van Heukelum <heukelum@fastmail.fm> wrote: > >>> Either way, the code should be shared between 32 and 64 bits. There >>> is nothing bitsize-specific about it! >> Of course. That's also why I already added the old-Dell case ;). But >> one problem at a time, please! > > i've applied your patch to x86.git#testing. Feel free to send a > code-unification patch too :-) Just to give some perspective of this: On my laptop here BIOS-e820: 000000000009dc00 - 00000000000a0000 (reserved) BIOS-e820: 00000000000d2000 - 0000000000100000 (reserved) This means it reserves only ~193KB in the 640k-1MB area With this patch it will reserve 384KB instead. This means 191KB are lost. While that doesn't sound too much it worth as much as 382 patches that reduce kernel code size by 512bytes or worth 3820 patches that reduce kernel code by 100 bytes in terms of memory consumption. Now such kernel code size patches are always popular, but why undo that work by throwing away perfectly good memory elsewhere? Or also the laptop kernel does Freeing unused kernel memory: 340k freed This means the 193KB now lost are worth 56% of the complete memory that is freed by __initdata/__init. Just maintaining these annotations is a lot of work, but why do all that if we then throw away than half as much memory as they save so easily? -Andi ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve_early end-of-conventional-memory to 1MB II - some numbers to put it into perspective 2008-02-27 14:38 ` [PATCH] reserve_early end-of-conventional-memory to 1MB II - some numbers to put it into perspective Andi Kleen @ 2008-02-27 16:44 ` H. Peter Anvin 2008-02-27 20:01 ` Alexander van Heukelum 1 sibling, 0 replies; 62+ messages in thread From: H. Peter Anvin @ 2008-02-27 16:44 UTC (permalink / raw) To: Andi Kleen Cc: Ingo Molnar, Alexander van Heukelum, Thomas Gleixner, LKML, Alexander van Heukelum Andi Kleen wrote: > > Just to give some perspective of this: > > On my laptop here > > BIOS-e820: 000000000009dc00 - 00000000000a0000 (reserved) > BIOS-e820: 00000000000d2000 - 0000000000100000 (reserved) > > This means it reserves only ~193KB in the 640k-1MB area > > With this patch it will reserve 384KB instead. This means 191KB > are lost. While that doesn't sound too much it worth as much as > 382 patches that reduce kernel code size by 512bytes or > worth 3820 patches that reduce kernel code by 100 bytes in terms > of memory consumption. > > Now such kernel code size patches are always popular, but why undo that > work by throwing away perfectly good memory elsewhere? > > Or also the laptop kernel does > > Freeing unused kernel memory: 340k freed > > This means the 193KB now lost are worth 56% of the complete > memory that is freed by __initdata/__init. Just maintaining > these annotations is a lot of work, but why do all that if we > then throw away than half as much memory as they save so easily? > It doesn't waste any memory at all. It arguably wastes some *address space*, but that's an entirely different thing. Unless you have chipset-specific drivers to enable memory in the 640-1024K memory area, there is nothing there. -hpa ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve_early end-of-conventional-memory to 1MB II - some numbers to put it into perspective 2008-02-27 14:38 ` [PATCH] reserve_early end-of-conventional-memory to 1MB II - some numbers to put it into perspective Andi Kleen 2008-02-27 16:44 ` H. Peter Anvin @ 2008-02-27 20:01 ` Alexander van Heukelum 1 sibling, 0 replies; 62+ messages in thread From: Alexander van Heukelum @ 2008-02-27 20:01 UTC (permalink / raw) To: Andi Kleen, Ingo Molnar Cc: H. Peter Anvin, Thomas Gleixner, LKML, Alexander van Heukelum On Wed, 27 Feb 2008 15:38:28 +0100, "Andi Kleen" <ak@suse.de> said: > Ingo Molnar wrote: > > * Alexander van Heukelum <heukelum@fastmail.fm> wrote: > > > >>> Either way, the code should be shared between 32 and 64 bits. There > >>> is nothing bitsize-specific about it! > >> Of course. That's also why I already added the old-Dell case ;). But > >> one problem at a time, please! > > > > i've applied your patch to x86.git#testing. Feel free to send a > > code-unification patch too :-) > > Just to give some perspective of this: > > On my laptop here > > BIOS-e820: 000000000009dc00 - 00000000000a0000 (reserved) > BIOS-e820: 00000000000d2000 - 0000000000100000 (reserved) Hi Andi, Can you provide the complete 'raw' e820 info? This is at least not complete, and also might be after the sanitation. Do you mean that (1) there is usable RAM somewhere between 0xa0000 and 0xd2000? Or that (2) this second line should be RAM, not reserved? Case (1) would surpise me, because I expect the VGA adapter (which I assume is there...) to occupy 0xa0000 to 0xc0000 for the framebuffer. Also, your case would be a lot stronger if there were a line that explicitly indicated that there was RAM there. Case (2) would surprise me too, because a lot of software would expect the system BIOS to reside in at least the area 0xf0000 to 0x100000. jmp 0xf000:fff0 for a reboot, no? Laptops do not always have the VGA bios in the standard place, so the range 0xc0000-0xd2000 could well be unused. Still, I doubt that there is real RAM accessible in that region. So I think you have not correctly interpreted the e820 info. But, if you (or anyone else, for that matter) provide(s) a raw e820 map that shows usable RAM in the region 0xa0000-0x100000, the I agree that the patch should be reconsidered. > This means it reserves only ~193KB in the 640k-1MB area > > With this patch it will reserve 384KB instead. This means 191KB > are lost. The two lines you gave say that two regions are reserved. Nothing tells what is in between those regions. If a region is not covered by e820 at all, it is to be considered unusable, right? > While that doesn't sound too much it worth as much as > 382 patches that reduce kernel code size by 512bytes or > worth 3820 patches that reduce kernel code by 100 bytes in terms > of memory consumption. Agreed. > Now such kernel code size patches are always popular, but why > undo that work by throwing away perfectly good memory elsewhere? I don't intend to. I have never seen a machine with usable memory in the 0xa000-0x100000 region. > Or also the laptop kernel does > > Freeing unused kernel memory: 340k freed > > This means the 193KB now lost are worth 56% of the complete > memory that is freed by __initdata/__init. Just maintaining > these annotations is a lot of work, but why do all that if we > then throw away than half as much memory as they save so easily? Agreed, if... Greetings, Alexander -- Alexander van Heukelum heukelum@fastmail.fm -- http://www.fastmail.fm - mmm... Fastmail... ^ permalink raw reply [flat|nested] 62+ messages in thread
* [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit 2008-02-25 18:07 ` [PATCH] reserve_early end-of-conventional-memory to 1MB Alexander van Heukelum 2008-02-25 18:13 ` H. Peter Anvin @ 2008-02-28 13:13 ` Alexander van Heukelum 2008-02-28 13:28 ` [RFC] use realmode code to reserve end-of-conventional-memory to 1MB Alexander van Heukelum 2008-02-28 21:09 ` [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit Ian Campbell 1 sibling, 2 replies; 62+ messages in thread From: Alexander van Heukelum @ 2008-02-28 13:13 UTC (permalink / raw) To: Ingo Molnar, Ian Campbell Cc: Alexander van Heukelum, H. Peter Anvin, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, Mark McLoughlin, LKML The 32-bit code still uses reserve_bootmem, so this is not really a unification with the 64-bit version of the ebda reservation code, but at least it provides the same detection logic and reserves the same areas. This does not crash immediately on qemu. No further testing was done! Otherwise: Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> --- Hello Ian, Ingo, This patch should do the same reservations as the reserve_early patch for 64 bit. Maybe you could try if this solves the problems you are seeing with Xen? On the other hand, Xen should be able to use those legacy ranges as normal memory. Why doesn't it work? Does Xen use a special bootloader? In that case it could just add the reservation to the e820 memory map that is provided. Another possible approach to the whole problem is just to ammend the e820 memory map in the boot-code. Thanks to hpa, it is now easy to change this code. Then the kernel can just trust the e820 memory map, and only 'sophisticated' bootloaders need to do something similar by themselves. Greetings, Alexander diff --git a/arch/x86/kernel/setup_32.c b/arch/x86/kernel/setup_32.c index a1d7071..e12cc31 100644 --- a/arch/x86/kernel/setup_32.c +++ b/arch/x86/kernel/setup_32.c @@ -385,15 +385,47 @@ unsigned long __init find_max_low_pfn(void) return max_low_pfn; } +#define BIOS_EBDA_SEGMENT 0x40E +#define BIOS_LOWMEM_KILOBYTES 0x413 + /* - * workaround for Dell systems that neglect to reserve EBDA + * The BIOS places the EBDA/XBDA at the top of conventional + * memory, and usually decreases the reported amount of + * conventional memory (int 0x12) too. This also contains a + * workaround for Dell systems that neglect to reserve EBDA. + * The same workaround also avoids a problem with the AMD768MPX + * chipset: reserve a page before VGA to prevent PCI prefetch + * into it (errata #56). Usually the page is reserved anyways, + * unless you have no PS/2 mouse plugged in. */ static void __init reserve_ebda_region(void) { - unsigned int addr; - addr = get_bios_ebda(); - if (addr) - reserve_bootmem(addr, PAGE_SIZE, BOOTMEM_DEFAULT); + unsigned int lowmem, ebda_addr; + + /* end of low (conventional) memory */ + lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES); + lowmem <<= 10; + + /* start of EBDA area */ + ebda_addr = *(unsigned short *)__va(BIOS_EBDA_SEGMENT); + ebda_addr <<= 4; + + /* Fixup: bios puts an EBDA in the top 64K segment */ + /* of conventional memory, but does not adjust lowmem. */ + if ((lowmem - ebda_addr) <= 0x10000) + lowmem = ebda_addr; + + /* Fixup: bios does not report an EBDA at all. */ + /* Some old Dells seem to need 4k anyhow (bugzilla 2990) */ + if ((ebda_addr == 0) && (lowmem >= 0x9f000)) + lowmem = 0x9f000; + + /* Paranoia: should never happen, but... */ + if (lowmem >= 0x100000) + lowmem = 0xa0000; + + /* reserve all memory between lowmem and the 1MB mark */ + reserve_bootmem(lowmem, 0x100000 - lowmem, BOOTMEM_DEFAULT); } #ifndef CONFIG_NEED_MULTIPLE_NODES @@ -619,16 +651,9 @@ void __init setup_bootmem_allocator(void) */ reserve_bootmem(0, PAGE_SIZE, BOOTMEM_DEFAULT); - /* reserve EBDA region, it's a 4K region */ + /* reserve EBDA region */ reserve_ebda_region(); - /* could be an AMD 768MPX chipset. Reserve a page before VGA to prevent - PCI prefetch into it (errata #56). Usually the page is reserved anyways, - unless you have no PS/2 mouse plugged in. */ - if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD && - boot_cpu_data.x86 == 6) - reserve_bootmem(0xa0000 - 4096, 4096, BOOTMEM_DEFAULT); - #ifdef CONFIG_SMP /* * But first pinch a few for the stack/trampoline stuff ^ permalink raw reply [flat|nested] 62+ messages in thread
* [RFC] use realmode code to reserve end-of-conventional-memory to 1MB 2008-02-28 13:13 ` [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit Alexander van Heukelum @ 2008-02-28 13:28 ` Alexander van Heukelum 2008-02-28 21:12 ` Ian Campbell 2008-02-28 21:09 ` [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit Ian Campbell 1 sibling, 1 reply; 62+ messages in thread From: Alexander van Heukelum @ 2008-02-28 13:28 UTC (permalink / raw) To: Ingo Molnar, Ian Campbell Cc: Alexander van Heukelum, H. Peter Anvin, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, Mark McLoughlin, LKML Instead of using early reservations inside the kernel code, we could use the realmode code to modify the e820 memmap. This patch shows what that would look like. I have not looked at the case where the BIOS does not provide an e820 memmap yet. Probably a full solution would need to create a fake e820 memmap in that case. Comments? Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> diff --git a/arch/x86/boot/memory.c b/arch/x86/boot/memory.c index e77d89f..522920a 100644 --- a/arch/x86/boot/memory.c +++ b/arch/x86/boot/memory.c @@ -23,6 +23,7 @@ static int detect_memory_e820(void) int count = 0; u32 next = 0; u32 size, id; + u32 lowmem, ebda_addr; u8 err; struct e820entry *desc = boot_params.e820_map; @@ -50,13 +51,51 @@ static int detect_memory_e820(void) just return failure. */ if (id != SMAP) { count = 0; - break; + goto out; } count++; desc++; - } while (next && count < E820MAX); - + } while (next && count < (E820MAX - 1)); + + /* Some BIOSes do not reserve the EBDA/XBDA area correctly in. + The e820-map. Find out where the EBDA resides by looking at + the BIOS data area and reserve the EBDA and the following + legacy adapter area explicitly. */ +#define BIOS_EBDA_SEGMENT 0x40E +#define BIOS_LOWMEM_KILOBYTES 0x413 + + /* end of low (conventional) memory */ + set_fs(0); + lowmem = rdfs16(BIOS_LOWMEM_KILOBYTES); + lowmem <<= 10; + + /* start of EBDA area */ + ebda_addr = rdfs16(BIOS_EBDA_SEGMENT); + ebda_addr <<= 4; + + /* Fixup: bios puts an EBDA in the top 64K segment */ + /* of conventional memory, but does not adjust lowmem. */ + if ((lowmem - ebda_addr) <= 0x10000) + lowmem = ebda_addr; + + /* Fixup: bios does not report an EBDA at all. */ + /* Some old Dells seem to need 4k anyhow (bugzilla 2990) */ + if ((ebda_addr == 0) && (lowmem >= 0x9f000)) + lowmem = 0x9f000; + + /* Paranoia: should never happen, but... */ + if (lowmem >= 0x100000) + lowmem = 0xa0000; + + /* reserve all memory between lowmem and the 1MB mark */ + desc->addr = lowmem; + desc->size = 0x100000 - lowmem; + desc->type = E820_RESERVED; + count++; + desc++; + +out: return boot_params.e820_entries = count; } ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] use realmode code to reserve end-of-conventional-memory to 1MB 2008-02-28 13:28 ` [RFC] use realmode code to reserve end-of-conventional-memory to 1MB Alexander van Heukelum @ 2008-02-28 21:12 ` Ian Campbell 2008-02-28 21:14 ` H. Peter Anvin 0 siblings, 1 reply; 62+ messages in thread From: Ian Campbell @ 2008-02-28 21:12 UTC (permalink / raw) To: Alexander van Heukelum Cc: Ingo Molnar, Alexander van Heukelum, H. Peter Anvin, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, Mark McLoughlin, LKML On Thu, 2008-02-28 at 14:28 +0100, Alexander van Heukelum wrote: > Instead of using early reservations inside the kernel code, > we could use the realmode code to modify the e820 memmap. > This patch shows what that would look like. I have not looked > at the case where the BIOS does not provide an e820 memmap > yet. Probably a full solution would need to create a fake > e820 memmap in that case. An e820 is already faked up in machine_specific_memory_setup() if one doesn't already exist. > Comments? This won't work for Xen since the real-mode code never runs there. I think it could be fixed in xen_memory_setup() though if native goes down this route. Ian. > > Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> > > diff --git a/arch/x86/boot/memory.c b/arch/x86/boot/memory.c > index e77d89f..522920a 100644 > --- a/arch/x86/boot/memory.c > +++ b/arch/x86/boot/memory.c > @@ -23,6 +23,7 @@ static int detect_memory_e820(void) > int count = 0; > u32 next = 0; > u32 size, id; > + u32 lowmem, ebda_addr; > u8 err; > struct e820entry *desc = boot_params.e820_map; > > @@ -50,13 +51,51 @@ static int detect_memory_e820(void) > just return failure. */ > if (id != SMAP) { > count = 0; > - break; > + goto out; > } > > count++; > desc++; > - } while (next && count < E820MAX); > - > + } while (next && count < (E820MAX - 1)); > + > + /* Some BIOSes do not reserve the EBDA/XBDA area correctly in. > + The e820-map. Find out where the EBDA resides by looking at > + the BIOS data area and reserve the EBDA and the following > + legacy adapter area explicitly. */ > +#define BIOS_EBDA_SEGMENT 0x40E > +#define BIOS_LOWMEM_KILOBYTES 0x413 > + > + /* end of low (conventional) memory */ > + set_fs(0); > + lowmem = rdfs16(BIOS_LOWMEM_KILOBYTES); > + lowmem <<= 10; > + > + /* start of EBDA area */ > + ebda_addr = rdfs16(BIOS_EBDA_SEGMENT); > + ebda_addr <<= 4; > + > + /* Fixup: bios puts an EBDA in the top 64K segment */ > + /* of conventional memory, but does not adjust lowmem. */ > + if ((lowmem - ebda_addr) <= 0x10000) > + lowmem = ebda_addr; > + > + /* Fixup: bios does not report an EBDA at all. */ > + /* Some old Dells seem to need 4k anyhow (bugzilla 2990) */ > + if ((ebda_addr == 0) && (lowmem >= 0x9f000)) > + lowmem = 0x9f000; > + > + /* Paranoia: should never happen, but... */ > + if (lowmem >= 0x100000) > + lowmem = 0xa0000; > + > + /* reserve all memory between lowmem and the 1MB mark */ > + desc->addr = lowmem; > + desc->size = 0x100000 - lowmem; > + desc->type = E820_RESERVED; > + count++; > + desc++; > + > +out: > return boot_params.e820_entries = count; > } > > > -- Ian Campbell Nasrudin walked into a teahouse and declaimed, "The moon is more useful than the sun." "Why?", he was asked. "Because at night we need the light more." ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] use realmode code to reserve end-of-conventional-memory to 1MB 2008-02-28 21:12 ` Ian Campbell @ 2008-02-28 21:14 ` H. Peter Anvin 2008-02-28 23:16 ` Ian Campbell 0 siblings, 1 reply; 62+ messages in thread From: H. Peter Anvin @ 2008-02-28 21:14 UTC (permalink / raw) To: Ian Campbell Cc: Alexander van Heukelum, Ingo Molnar, Alexander van Heukelum, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, Mark McLoughlin, LKML Ian Campbell wrote: > On Thu, 2008-02-28 at 14:28 +0100, Alexander van Heukelum wrote: >> Instead of using early reservations inside the kernel code, >> we could use the realmode code to modify the e820 memmap. >> This patch shows what that would look like. I have not looked >> at the case where the BIOS does not provide an e820 memmap >> yet. Probably a full solution would need to create a fake >> e820 memmap in that case. > > An e820 is already faked up in machine_specific_memory_setup() if one > doesn't already exist. > >> Comments? > > This won't work for Xen since the real-mode code never runs there. I > think it could be fixed in xen_memory_setup() though if native goes down > this route. > s/could/should/. You need to set up your memory map more sensibly; it's not just the kernel, user space tries to access these areas too. -hpa ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] use realmode code to reserve end-of-conventional-memory to 1MB 2008-02-28 21:14 ` H. Peter Anvin @ 2008-02-28 23:16 ` Ian Campbell 2008-02-29 20:00 ` Ingo Molnar ` (3 more replies) 0 siblings, 4 replies; 62+ messages in thread From: Ian Campbell @ 2008-02-28 23:16 UTC (permalink / raw) To: H. Peter Anvin Cc: Alexander van Heukelum, Ingo Molnar, Alexander van Heukelum, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, Mark McLoughlin, LKML On Thu, 2008-02-28 at 13:14 -0800, H. Peter Anvin wrote: > Ian Campbell wrote: > > On Thu, 2008-02-28 at 14:28 +0100, Alexander van Heukelum wrote: > >> Instead of using early reservations inside the kernel code, > >> we could use the realmode code to modify the e820 memmap. > >> This patch shows what that would look like. I have not looked > >> at the case where the BIOS does not provide an e820 memmap > >> yet. Probably a full solution would need to create a fake > >> e820 memmap in that case. > > > > An e820 is already faked up in machine_specific_memory_setup() if one > > doesn't already exist. > > > >> Comments? > > > > This won't work for Xen since the real-mode code never runs there. I > > think it could be fixed in xen_memory_setup() though if native goes down > > this route. > > > > s/could/should/. > > You need to set up your memory map more sensibly; it's not just the > kernel, user space tries to access these areas too. Agreed, I think it's pure luck that Xen kernels have gotten away with it in the past. The patch below seems like the right thing to do. It certainly boots in a domU without the DMI problem (without any of the other related patches such as Alexander's). However ddcprobe hangs when run -- need to investigate some more, vm86 in general works ok (i.e. my vm86 test code passes). BTW Jeremy, the kernel doesn't use XENMEM_memory_map -- any reason other than it not being useful at the time? These days the tools can push an arbitrary e820 down for a guest which might be useful to support, although nothing interesting is done with it today. Ian. x86/xen: Construct e820 map with a hole between 640K-1M. Signed-off-by: Ian Campbell <ijc@hellion.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> --- arch/x86/xen/setup.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c index 3bad477..2341492 100644 --- a/arch/x86/xen/setup.c +++ b/arch/x86/xen/setup.c @@ -38,7 +38,8 @@ char * __init xen_memory_setup(void) unsigned long max_pfn = xen_start_info->nr_pages; e820.nr_map = 0; - add_memory_region(0, PFN_PHYS(max_pfn), E820_RAM); + add_memory_region(0, LOWMEMSIZE(), E820_RAM); + add_memory_region(HIGH_MEMORY, PFN_PHYS(max_pfn)-HIGH_MEMORY, E820_RAM); return "Xen"; } -- 1.5.4.2 -- Ian Campbell His mind is like a steel trap: full of mice. -- Foghorn Leghorn ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] use realmode code to reserve end-of-conventional-memory to 1MB 2008-02-28 23:16 ` Ian Campbell @ 2008-02-29 20:00 ` Ingo Molnar 2008-03-04 11:41 ` Mark McLoughlin ` (2 subsequent siblings) 3 siblings, 0 replies; 62+ messages in thread From: Ingo Molnar @ 2008-02-29 20:00 UTC (permalink / raw) To: Ian Campbell Cc: H. Peter Anvin, Alexander van Heukelum, Alexander van Heukelum, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, Mark McLoughlin, LKML * Ian Campbell <ijc@hellion.org.uk> wrote: > Agreed, I think it's pure luck that Xen kernels have gotten away with > it in the past. > > The patch below seems like the right thing to do. It certainly boots > in a domU without the DMI problem (without any of the other related > patches such as Alexander's). > > However ddcprobe hangs when run -- need to investigate some more, vm86 > in general works ok (i.e. my vm86 test code passes). looks good to me - please resend with a note to add it to x86.git once you are happy with it. Seems like 2.6.25 material as well. Ingo ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] use realmode code to reserve end-of-conventional-memory to 1MB 2008-02-28 23:16 ` Ian Campbell 2008-02-29 20:00 ` Ingo Molnar @ 2008-03-04 11:41 ` Mark McLoughlin 2008-03-04 14:33 ` Ingo Molnar 2008-03-04 17:44 ` Jeremy Fitzhardinge 2008-03-05 15:59 ` Eduardo Habkost 3 siblings, 1 reply; 62+ messages in thread From: Mark McLoughlin @ 2008-03-04 11:41 UTC (permalink / raw) To: Ian Campbell Cc: H. Peter Anvin, Alexander van Heukelum, Ingo Molnar, Alexander van Heukelum, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, LKML On Thu, 2008-02-28 at 23:16 +0000, Ian Campbell wrote: > On Thu, 2008-02-28 at 13:14 -0800, H. Peter Anvin wrote: > > > > You need to set up your memory map more sensibly; it's not just the > > kernel, user space tries to access these areas too. > > Agreed, I think it's pure luck that Xen kernels have gotten away with it > in the past. > > The patch below seems like the right thing to do. It certainly boots in > a domU without the DMI problem (without any of the other related patches > such as Alexander's). Yep, this patch on its own fixes 2.6.25-rc3 DomU boot for me here. > x86/xen: Construct e820 map with a hole between 640K-1M. > > Signed-off-by: Ian Campbell <ijc@hellion.org.uk> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: H. Peter Anvin <hpa@zytor.com> > Cc: Jeremy Fitzhardinge <jeremy@goop.org> > --- > arch/x86/xen/setup.c | 3 ++- > 1 files changed, 2 insertions(+), 1 deletions(-) > > diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c > index 3bad477..2341492 100644 > --- a/arch/x86/xen/setup.c > +++ b/arch/x86/xen/setup.c > @@ -38,7 +38,8 @@ char * __init xen_memory_setup(void) > unsigned long max_pfn = xen_start_info->nr_pages; > > e820.nr_map = 0; > - add_memory_region(0, PFN_PHYS(max_pfn), E820_RAM); > + add_memory_region(0, LOWMEMSIZE(), E820_RAM); > + add_memory_region(HIGH_MEMORY, PFN_PHYS(max_pfn)-HIGH_MEMORY, E820_RAM); Seems to me to be the right thing to do ... Acked-by: Mark McLoughlin <markmc@redhat.com> Cheers, Mark. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] use realmode code to reserve end-of-conventional-memory to 1MB 2008-03-04 11:41 ` Mark McLoughlin @ 2008-03-04 14:33 ` Ingo Molnar 2008-03-04 15:12 ` Ian Campbell ` (2 more replies) 0 siblings, 3 replies; 62+ messages in thread From: Ingo Molnar @ 2008-03-04 14:33 UTC (permalink / raw) To: Mark McLoughlin Cc: Ian Campbell, H. Peter Anvin, Alexander van Heukelum, Alexander van Heukelum, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, LKML * Mark McLoughlin <markmc@redhat.com> wrote: > > The patch below seems like the right thing to do. It certainly boots > > in a domU without the DMI problem (without any of the other related > > patches such as Alexander's). > > Yep, this patch on its own fixes 2.6.25-rc3 DomU boot for me here. thanks - i picked it up for v2.6.25 merging. i'm wondering, what triggered this bug, and why didnt we have these problems in the past? Ingo ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] use realmode code to reserve end-of-conventional-memory to 1MB 2008-03-04 14:33 ` Ingo Molnar @ 2008-03-04 15:12 ` Ian Campbell 2008-03-04 15:13 ` Jeremy Fitzhardinge 2008-03-04 16:15 ` Mark McLoughlin 2 siblings, 0 replies; 62+ messages in thread From: Ian Campbell @ 2008-03-04 15:12 UTC (permalink / raw) To: Ingo Molnar Cc: Mark McLoughlin, H. Peter Anvin, Alexander van Heukelum, Alexander van Heukelum, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, LKML On Tue, 2008-03-04 at 15:33 +0100, Ingo Molnar wrote: > * Mark McLoughlin <markmc@redhat.com> wrote: > > > > The patch below seems like the right thing to do. It certainly boots > > > in a domU without the DMI problem (without any of the other related > > > patches such as Alexander's). > > > > Yep, this patch on its own fixes 2.6.25-rc3 DomU boot for me here. > > thanks - i picked it up for v2.6.25 merging. Thanks, I'd been planning to resend once I got the ddcprobe thing sorted but I've been a bit snowed under (deadlines :-|). This patch is independently correct though and turns the ddcprobe failure from a domU kernel crash into a userspace hang. ddcprobe hangs because the page at pfn 0 is comprised entirely 0xc2 bytes (the Xen memory scrub poison value -- I'm running a debug hypervisor) which causes lrmi to try and jump to 0xc2c2*16+0xc2c2 to implement int $0xXX. gdb won't let me actually examine that address for some reason so I don't actually know what it's hitting to cause the loop/hang but no good can come of jumping to that address in any case... Sanest solution seems to me to be to explicitly zero PFN 0 -- I have a patch I just haven't had a chance to clean it up for submission yet. > i'm wondering, what triggered this bug, and why didnt we have these > problems in the past? Did someone make the boot allocator start at the bottom instead of the top of memory or something? Ian. -- Ian Campbell Current Noise: Nile - Even The Gods Must Die There is a natural hootchy-kootchy to a goldfish. -- Walt Disney ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] use realmode code to reserve end-of-conventional-memory to 1MB 2008-03-04 14:33 ` Ingo Molnar 2008-03-04 15:12 ` Ian Campbell @ 2008-03-04 15:13 ` Jeremy Fitzhardinge 2008-03-04 15:25 ` Ingo Molnar 2008-03-04 16:15 ` Mark McLoughlin 2 siblings, 1 reply; 62+ messages in thread From: Jeremy Fitzhardinge @ 2008-03-04 15:13 UTC (permalink / raw) To: Ingo Molnar Cc: Mark McLoughlin, Ian Campbell, H. Peter Anvin, Alexander van Heukelum, Alexander van Heukelum, Andi Kleen, Thomas Gleixner, LKML Ingo Molnar wrote: > i'm wondering, what triggered this bug, and why didnt we have these > problems in the past? I think it was a side effect of all the early_ioremap work, which changed the order in which things happened at boot time. J ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] use realmode code to reserve end-of-conventional-memory to 1MB 2008-03-04 15:13 ` Jeremy Fitzhardinge @ 2008-03-04 15:25 ` Ingo Molnar 2008-03-04 16:02 ` Jeremy Fitzhardinge 0 siblings, 1 reply; 62+ messages in thread From: Ingo Molnar @ 2008-03-04 15:25 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Mark McLoughlin, Ian Campbell, H. Peter Anvin, Alexander van Heukelum, Alexander van Heukelum, Andi Kleen, Thomas Gleixner, LKML * Jeremy Fitzhardinge <jeremy@goop.org> wrote: > Ingo Molnar wrote: >> i'm wondering, what triggered this bug, and why didnt we have these >> problems in the past? > > I think it was a side effect of all the early_ioremap work, which > changed the order in which things happened at boot time. which commits do you mean in particular? Ingo ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] use realmode code to reserve end-of-conventional-memory to 1MB 2008-03-04 15:25 ` Ingo Molnar @ 2008-03-04 16:02 ` Jeremy Fitzhardinge 2008-03-04 16:15 ` Ingo Molnar 0 siblings, 1 reply; 62+ messages in thread From: Jeremy Fitzhardinge @ 2008-03-04 16:02 UTC (permalink / raw) To: Ingo Molnar Cc: Mark McLoughlin, Ian Campbell, H. Peter Anvin, Alexander van Heukelum, Alexander van Heukelum, Andi Kleen, Thomas Gleixner, LKML Ingo Molnar wrote: >> I think it was a side effect of all the early_ioremap work, which >> changed the order in which things happened at boot time. >> > > which commits do you mean in particular? Good question. It might have been cpa as well. I'd have to look through to confirm the theory, but I think Xen got broken for other reasons around there, so simple bisect wouldn't be enough to identify the problem. J ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] use realmode code to reserve end-of-conventional-memory to 1MB 2008-03-04 16:02 ` Jeremy Fitzhardinge @ 2008-03-04 16:15 ` Ingo Molnar 0 siblings, 0 replies; 62+ messages in thread From: Ingo Molnar @ 2008-03-04 16:15 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Mark McLoughlin, Ian Campbell, H. Peter Anvin, Alexander van Heukelum, Alexander van Heukelum, Andi Kleen, Thomas Gleixner, LKML * Jeremy Fitzhardinge <jeremy@goop.org> wrote: > Ingo Molnar wrote: >>> I think it was a side effect of all the early_ioremap work, which changed >>> the order in which things happened at boot time. >>> >> >> which commits do you mean in particular? > > Good question. It might have been cpa as well. I'd have to look > through to confirm the theory, but I think Xen got broken for other > reasons around there, so simple bisect wouldn't be enough to identify > the problem. i'm trying to do bug forensics to figure out where we broke Xen - i really have the feeling that there could be other side-effects of whatever change caused this. Ingo ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] use realmode code to reserve end-of-conventional-memory to 1MB 2008-03-04 14:33 ` Ingo Molnar 2008-03-04 15:12 ` Ian Campbell 2008-03-04 15:13 ` Jeremy Fitzhardinge @ 2008-03-04 16:15 ` Mark McLoughlin 2008-03-04 16:23 ` Ingo Molnar 2 siblings, 1 reply; 62+ messages in thread From: Mark McLoughlin @ 2008-03-04 16:15 UTC (permalink / raw) To: Ingo Molnar Cc: Ian Campbell, H. Peter Anvin, Alexander van Heukelum, Alexander van Heukelum, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, LKML On Tue, 2008-03-04 at 15:33 +0100, Ingo Molnar wrote: > * Mark McLoughlin <markmc@redhat.com> wrote: > > > > The patch below seems like the right thing to do. It certainly boots > > > in a domU without the DMI problem (without any of the other related > > > patches such as Alexander's). > > > > Yep, this patch on its own fixes 2.6.25-rc3 DomU boot for me here. > > thanks - i picked it up for v2.6.25 merging. Great. > i'm wondering, what triggered this bug, and why didnt we have these > problems in the past? That's been bothering me too, but hadn't come up with anything until I just now looked again. I'm pretty certain that dmi_scan_machine() wasn't using a fixmap before (in 2.6.24) since I had to fix a broken xen dom0 patch to handle this new situation. Looking at 2.6.24 bt_ioremap(), it would have been hitting this: /* * Don't remap the low PCI/ISA area, it's always mapped.. */ if (phys_addr >= ISA_START_ADDRESS && last_addr < ISA_END_ADDRESS) return phys_to_virt(phys_addr); If I restore this behaviour (i.e. remove Ian's e820 hole and add the patch below), then I see us still allocating a page table at 0xf0000 but we don't then try and create a new writable mapping to it later, and everything goes fine. That's a little worrying though, surely? The mapping created by kernel_physical_mapping_init() is writable, so shouldn't Xen barf if you try and use a page within that mapping as a page table? Cheers, Mark. >From c43a8a1badf5347a57d66556a4ba497623d620d4 Mon Sep 17 00:00:00 2001 From: Mark McLoughlin <markmc@redhat.com> Date: Tue, 4 Mar 2008 15:32:03 +0000 Subject: [PATCH] x86: Don't remap the ISA region in early_ioremap() Restore the previous behaviour of bt_ioremap() where it would use the existing mapping of the ISA region rather than using a fixmap. Signed-off-by: Mark McLoughlin <markmc@redhat.com> --- arch/x86/mm/ioremap.c | 16 +++++++++++----- 1 files changed, 11 insertions(+), 5 deletions(-) diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index 882328e..89fdd8f 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -425,6 +425,13 @@ void __init *early_ioremap(unsigned long phys_addr, unsigned long size) return NULL; } + /* + * Don't remap the low PCI/ISA area, it's always mapped.. + */ + if (phys_addr >= ISA_START_ADDRESS && last_addr < ISA_END_ADDRESS) + return phys_to_virt(phys_addr); + + if (nesting >= FIX_BTMAPS_NESTING) { WARN_ON(1); return NULL; @@ -471,6 +478,10 @@ void __init early_iounmap(void *addr, unsigned long size) enum fixed_addresses idx; unsigned int nesting; + virt_addr = (unsigned long)addr; + if (virt_addr < fix_to_virt(FIX_BTMAP_BEGIN)) + return; + nesting = --early_ioremap_nested; WARN_ON(nesting < 0); @@ -480,11 +491,6 @@ void __init early_iounmap(void *addr, unsigned long size) dump_stack(); } - virt_addr = (unsigned long)addr; - if (virt_addr < fix_to_virt(FIX_BTMAP_BEGIN)) { - WARN_ON(1); - return; - } offset = virt_addr & ~PAGE_MASK; nrpages = PAGE_ALIGN(offset + size - 1) >> PAGE_SHIFT; -- 1.5.4.2 ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] use realmode code to reserve end-of-conventional-memory to 1MB 2008-03-04 16:15 ` Mark McLoughlin @ 2008-03-04 16:23 ` Ingo Molnar 0 siblings, 0 replies; 62+ messages in thread From: Ingo Molnar @ 2008-03-04 16:23 UTC (permalink / raw) To: Mark McLoughlin Cc: Ian Campbell, H. Peter Anvin, Alexander van Heukelum, Alexander van Heukelum, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, LKML * Mark McLoughlin <markmc@redhat.com> wrote: > > i'm wondering, what triggered this bug, and why didnt we have these > > problems in the past? > > That's been bothering me too, but hadn't come up with anything until I > just now looked again. > > I'm pretty certain that dmi_scan_machine() wasn't using a fixmap > before (in 2.6.24) since I had to fix a broken xen dom0 patch to > handle this new situation. ah, ok, indeed Jeremy's gut feeling about early_ioremap() was right - we didnt use fixmaps for early_ioremap() [ ex. bt_ioremap() ] in .24. We try to slowly get rid of the linearity assumptions from arch/x86, this was one of the steps. Ingo ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] use realmode code to reserve end-of-conventional-memory to 1MB 2008-02-28 23:16 ` Ian Campbell 2008-02-29 20:00 ` Ingo Molnar 2008-03-04 11:41 ` Mark McLoughlin @ 2008-03-04 17:44 ` Jeremy Fitzhardinge 2008-03-05 15:59 ` Eduardo Habkost 3 siblings, 0 replies; 62+ messages in thread From: Jeremy Fitzhardinge @ 2008-03-04 17:44 UTC (permalink / raw) To: Ian Campbell Cc: H. Peter Anvin, Alexander van Heukelum, Ingo Molnar, Alexander van Heukelum, Andi Kleen, Thomas Gleixner, Mark McLoughlin, LKML Ian Campbell wrote: > BTW Jeremy, the kernel doesn't use XENMEM_memory_map -- any reason other > than it not being useful at the time? These days the tools can push an > arbitrary e820 down for a guest which might be useful to support, > although nothing interesting is done with it today. No, never looked at that stuff in detail. I'll bear it in mind. J ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] use realmode code to reserve end-of-conventional-memory to 1MB 2008-02-28 23:16 ` Ian Campbell ` (2 preceding siblings ...) 2008-03-04 17:44 ` Jeremy Fitzhardinge @ 2008-03-05 15:59 ` Eduardo Habkost 2008-03-05 16:08 ` H. Peter Anvin 2008-03-05 16:38 ` Jeremy Fitzhardinge 3 siblings, 2 replies; 62+ messages in thread From: Eduardo Habkost @ 2008-03-05 15:59 UTC (permalink / raw) To: Ian Campbell Cc: H. Peter Anvin, Alexander van Heukelum, Ingo Molnar, Alexander van Heukelum, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, Mark McLoughlin, LKML On Thu, Feb 28, 2008 at 11:16:49PM +0000, Ian Campbell wrote: > > The patch below seems like the right thing to do. It certainly boots in > a domU without the DMI problem (without any of the other related patches > such as Alexander's). > > However ddcprobe hangs when run -- need to investigate some more, vm86 > in general works ok (i.e. my vm86 test code passes). > > BTW Jeremy, the kernel doesn't use XENMEM_memory_map -- any reason other > than it not being useful at the time? These days the tools can push an > arbitrary e820 down for a guest which might be useful to support, > although nothing interesting is done with it today. > > Ian. > > x86/xen: Construct e820 map with a hole between 640K-1M. > > Signed-off-by: Ian Campbell <ijc@hellion.org.uk> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: H. Peter Anvin <hpa@zytor.com> > Cc: Jeremy Fitzhardinge <jeremy@goop.org> > --- > arch/x86/xen/setup.c | 3 ++- > 1 files changed, 2 insertions(+), 1 deletions(-) > > diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c > index 3bad477..2341492 100644 > --- a/arch/x86/xen/setup.c > +++ b/arch/x86/xen/setup.c > @@ -38,7 +38,8 @@ char * __init xen_memory_setup(void) > unsigned long max_pfn = xen_start_info->nr_pages; > > e820.nr_map = 0; > - add_memory_region(0, PFN_PHYS(max_pfn), E820_RAM); > + add_memory_region(0, LOWMEMSIZE(), E820_RAM); > + add_memory_region(HIGH_MEMORY, PFN_PHYS(max_pfn)-HIGH_MEMORY, E820_RAM); Won't this waste 300+ KB of Perfectly Good RAM? Or I understood it incorrectly? I am aware that it would take more work to tell all kernel code that it shouldn't look for BIOS data on this region when running as a domU guest, but it seems that it would be a better solution. -- Eduardo ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] use realmode code to reserve end-of-conventional-memory to 1MB 2008-03-05 15:59 ` Eduardo Habkost @ 2008-03-05 16:08 ` H. Peter Anvin 2008-03-05 16:53 ` Eduardo Habkost 2008-03-05 16:38 ` Jeremy Fitzhardinge 1 sibling, 1 reply; 62+ messages in thread From: H. Peter Anvin @ 2008-03-05 16:08 UTC (permalink / raw) To: Eduardo Habkost Cc: Ian Campbell, Alexander van Heukelum, Ingo Molnar, Alexander van Heukelum, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, Mark McLoughlin, LKML Eduardo Habkost wrote: >> >> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c >> index 3bad477..2341492 100644 >> --- a/arch/x86/xen/setup.c >> +++ b/arch/x86/xen/setup.c >> @@ -38,7 +38,8 @@ char * __init xen_memory_setup(void) >> unsigned long max_pfn = xen_start_info->nr_pages; >> >> e820.nr_map = 0; >> - add_memory_region(0, PFN_PHYS(max_pfn), E820_RAM); >> + add_memory_region(0, LOWMEMSIZE(), E820_RAM); >> + add_memory_region(HIGH_MEMORY, PFN_PHYS(max_pfn)-HIGH_MEMORY, E820_RAM); > > Won't this waste 300+ KB of Perfectly Good RAM? Or I understood it > incorrectly? > > I am aware that it would take more work to tell all kernel code that it > shouldn't look for BIOS data on this region when running as a domU guest, > but it seems that it would be a better solution. > No, the right thing is for Xen to not try to map RAM in this area. -hpa ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] use realmode code to reserve end-of-conventional-memory to 1MB 2008-03-05 16:08 ` H. Peter Anvin @ 2008-03-05 16:53 ` Eduardo Habkost 2008-03-05 17:28 ` H. Peter Anvin 0 siblings, 1 reply; 62+ messages in thread From: Eduardo Habkost @ 2008-03-05 16:53 UTC (permalink / raw) To: H. Peter Anvin Cc: Ian Campbell, Alexander van Heukelum, Ingo Molnar, Alexander van Heukelum, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, Mark McLoughlin, LKML On Wed, Mar 05, 2008 at 08:08:01AM -0800, H. Peter Anvin wrote: > Eduardo Habkost wrote: > >> > >>diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c > >>index 3bad477..2341492 100644 > >>--- a/arch/x86/xen/setup.c > >>+++ b/arch/x86/xen/setup.c > >>@@ -38,7 +38,8 @@ char * __init xen_memory_setup(void) > >> unsigned long max_pfn = xen_start_info->nr_pages; > >> > >> e820.nr_map = 0; > >>- add_memory_region(0, PFN_PHYS(max_pfn), E820_RAM); > >>+ add_memory_region(0, LOWMEMSIZE(), E820_RAM); > >>+ add_memory_region(HIGH_MEMORY, PFN_PHYS(max_pfn)-HIGH_MEMORY, > >>E820_RAM); > > > >Won't this waste 300+ KB of Perfectly Good RAM? Or I understood it > >incorrectly? > > > >I am aware that it would take more work to tell all kernel code that it > >shouldn't look for BIOS data on this region when running as a domU guest, > >but it seems that it would be a better solution. > > > > No, the right thing is for Xen to not try to map RAM in this area. Why? If the Xen host is telling us there is valid RAM in this area, why can't we use it? Existing Xen hosts use those physical address ranges for RAM. We can't fix this on the e820 map on the guest side without making otherwise-valid RAM unused. -- Eduardo ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] use realmode code to reserve end-of-conventional-memory to 1MB 2008-03-05 16:53 ` Eduardo Habkost @ 2008-03-05 17:28 ` H. Peter Anvin 2008-03-05 17:28 ` Jeremy Fitzhardinge 0 siblings, 1 reply; 62+ messages in thread From: H. Peter Anvin @ 2008-03-05 17:28 UTC (permalink / raw) To: Eduardo Habkost Cc: Ian Campbell, Alexander van Heukelum, Ingo Molnar, Alexander van Heukelum, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, Mark McLoughlin, LKML Eduardo Habkost wrote: >>> >>> I am aware that it would take more work to tell all kernel code that it >>> shouldn't look for BIOS data on this region when running as a domU guest, >>> but it seems that it would be a better solution. >>> >> No, the right thing is for Xen to not try to map RAM in this area. > > Why? If the Xen host is telling us there is valid RAM in this area, > why can't we use it? > > Existing Xen hosts use those physical address ranges for RAM. We can't > fix this on the e820 map on the guest side without making otherwise-valid > RAM unused. > x86 has a number of historical assumptions, both in kernel and in userspace (consider dmidecode!) Trying to go against them is a losing proposition, *AND* a headache for the maintainers, which will have to consider "oh yes, and Xen does this dumb thing which goes against what all the hardware does." -hpa ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] use realmode code to reserve end-of-conventional-memory to 1MB 2008-03-05 17:28 ` H. Peter Anvin @ 2008-03-05 17:28 ` Jeremy Fitzhardinge 2008-03-05 17:38 ` H. Peter Anvin 0 siblings, 1 reply; 62+ messages in thread From: Jeremy Fitzhardinge @ 2008-03-05 17:28 UTC (permalink / raw) To: H. Peter Anvin Cc: Eduardo Habkost, Ian Campbell, Alexander van Heukelum, Ingo Molnar, Alexander van Heukelum, Andi Kleen, Thomas Gleixner, Mark McLoughlin, LKML H. Peter Anvin wrote: > x86 has a number of historical assumptions, both in kernel and in > userspace (consider dmidecode!) Trying to go against them is a losing > proposition, *AND* a headache for the maintainers, which will have to > consider "oh yes, and Xen does this dumb thing which goes against what > all the hardware does." Ahem. "Hardware does this dumb thing that Xen avoids". kthxbye ;) J ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] use realmode code to reserve end-of-conventional-memory to 1MB 2008-03-05 17:28 ` Jeremy Fitzhardinge @ 2008-03-05 17:38 ` H. Peter Anvin 0 siblings, 0 replies; 62+ messages in thread From: H. Peter Anvin @ 2008-03-05 17:38 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Eduardo Habkost, Ian Campbell, Alexander van Heukelum, Ingo Molnar, Alexander van Heukelum, Andi Kleen, Thomas Gleixner, Mark McLoughlin, LKML Jeremy Fitzhardinge wrote: > H. Peter Anvin wrote: >> x86 has a number of historical assumptions, both in kernel and in >> userspace (consider dmidecode!) Trying to go against them is a losing >> proposition, *AND* a headache for the maintainers, which will have to >> consider "oh yes, and Xen does this dumb thing which goes against what >> all the hardware does." > > Ahem. "Hardware does this dumb thing that Xen avoids". kthxbye ;) If hardware was trying to emulate Xen, you'd be right. -hpa ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] use realmode code to reserve end-of-conventional-memory to 1MB 2008-03-05 15:59 ` Eduardo Habkost 2008-03-05 16:08 ` H. Peter Anvin @ 2008-03-05 16:38 ` Jeremy Fitzhardinge 2008-03-05 17:27 ` H. Peter Anvin 1 sibling, 1 reply; 62+ messages in thread From: Jeremy Fitzhardinge @ 2008-03-05 16:38 UTC (permalink / raw) To: Eduardo Habkost Cc: Ian Campbell, H. Peter Anvin, Alexander van Heukelum, Ingo Molnar, Alexander van Heukelum, Andi Kleen, Thomas Gleixner, Mark McLoughlin, LKML Eduardo Habkost wrote: > On Thu, Feb 28, 2008 at 11:16:49PM +0000, Ian Campbell wrote: > >> The patch below seems like the right thing to do. It certainly boots in >> a domU without the DMI problem (without any of the other related patches >> such as Alexander's). >> >> However ddcprobe hangs when run -- need to investigate some more, vm86 >> in general works ok (i.e. my vm86 test code passes). >> >> BTW Jeremy, the kernel doesn't use XENMEM_memory_map -- any reason other >> than it not being useful at the time? These days the tools can push an >> arbitrary e820 down for a guest which might be useful to support, >> although nothing interesting is done with it today. >> >> Ian. >> >> x86/xen: Construct e820 map with a hole between 640K-1M. >> >> Signed-off-by: Ian Campbell <ijc@hellion.org.uk> >> Cc: Thomas Gleixner <tglx@linutronix.de> >> Cc: Ingo Molnar <mingo@redhat.com> >> Cc: H. Peter Anvin <hpa@zytor.com> >> Cc: Jeremy Fitzhardinge <jeremy@goop.org> >> --- >> arch/x86/xen/setup.c | 3 ++- >> 1 files changed, 2 insertions(+), 1 deletions(-) >> >> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c >> index 3bad477..2341492 100644 >> --- a/arch/x86/xen/setup.c >> +++ b/arch/x86/xen/setup.c >> @@ -38,7 +38,8 @@ char * __init xen_memory_setup(void) >> unsigned long max_pfn = xen_start_info->nr_pages; >> >> e820.nr_map = 0; >> - add_memory_region(0, PFN_PHYS(max_pfn), E820_RAM); >> + add_memory_region(0, LOWMEMSIZE(), E820_RAM); >> + add_memory_region(HIGH_MEMORY, PFN_PHYS(max_pfn)-HIGH_MEMORY, E820_RAM); >> > > Won't this waste 300+ KB of Perfectly Good RAM? Or I understood it > incorrectly? > > I am aware that it would take more work to tell all kernel code that it > shouldn't look for BIOS data on this region when running as a domU guest, > but it seems that it would be a better solution. For the moment that's true, but we should be able to release those pages. On the other hand, there's been talk of making Xen hand out memory in physically contigious 2M chunks, which means trying to unmap 300k won't be possible or worthwhile. I suspect real hardware will just waste this memory too (it won't remap that 320k of ram to somewhere else; it will just be shadowed) - which is nothing compared to chipsets which will just throw away 1G to make space for PCI... J ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [RFC] use realmode code to reserve end-of-conventional-memory to 1MB 2008-03-05 16:38 ` Jeremy Fitzhardinge @ 2008-03-05 17:27 ` H. Peter Anvin 0 siblings, 0 replies; 62+ messages in thread From: H. Peter Anvin @ 2008-03-05 17:27 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Eduardo Habkost, Ian Campbell, Alexander van Heukelum, Ingo Molnar, Alexander van Heukelum, Andi Kleen, Thomas Gleixner, Mark McLoughlin, LKML Jeremy Fitzhardinge wrote: >> >> I am aware that it would take more work to tell all kernel code that it >> shouldn't look for BIOS data on this region when running as a domU guest, >> but it seems that it would be a better solution. > > For the moment that's true, but we should be able to release those > pages. On the other hand, there's been talk of making Xen hand out > memory in physically contigious 2M chunks, which means trying to unmap > 300k won't be possible or worthwhile. I suspect real hardware will just > waste this memory too (it won't remap that 320k of ram to somewhere > else; it will just be shadowed) - which is nothing compared to chipsets > which will just throw away 1G to make space for PCI... > Not that 384K is all that much to worry about, but you could just map memory from 2 MB upward, and set the kernel load address to 2 MB (which is good for performance anyway.) There has been talk of making the default kernel load address 16 MB, to keep the DMA zone free. -hpa ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit 2008-02-28 13:13 ` [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit Alexander van Heukelum 2008-02-28 13:28 ` [RFC] use realmode code to reserve end-of-conventional-memory to 1MB Alexander van Heukelum @ 2008-02-28 21:09 ` Ian Campbell 2008-02-29 11:49 ` Alexander van Heukelum 1 sibling, 1 reply; 62+ messages in thread From: Ian Campbell @ 2008-02-28 21:09 UTC (permalink / raw) To: Alexander van Heukelum Cc: Ingo Molnar, Alexander van Heukelum, H. Peter Anvin, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, Mark McLoughlin, LKML On Thu, 2008-02-28 at 14:13 +0100, Alexander van Heukelum wrote: > The 32-bit code still uses reserve_bootmem, so this is not really > a unification with the 64-bit version of the ebda reservation code, > but at least it provides the same detection logic and reserves the > same areas. > > This does not crash immediately on qemu. No further testing was > done! Otherwise: I haven't tested extensively either but it does seem to solve the problem for Xen. Thanks! Ian > > Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> > > --- > > Hello Ian, Ingo, > > This patch should do the same reservations as the reserve_early > patch for 64 bit. Maybe you could try if this solves the problems > you are seeing with Xen? > > On the other hand, Xen should be able to use those legacy ranges > as normal memory. Why doesn't it work? Does Xen use a special > bootloader? In that case it could just add the reservation to > the e820 memory map that is provided. > > Another possible approach to the whole problem is just to ammend > the e820 memory map in the boot-code. Thanks to hpa, it is now easy > to change this code. Then the kernel can just trust the e820 memory > map, and only 'sophisticated' bootloaders need to do something > similar by themselves. > > Greetings, > Alexander > > diff --git a/arch/x86/kernel/setup_32.c b/arch/x86/kernel/setup_32.c > index a1d7071..e12cc31 100644 > --- a/arch/x86/kernel/setup_32.c > +++ b/arch/x86/kernel/setup_32.c > @@ -385,15 +385,47 @@ unsigned long __init find_max_low_pfn(void) > return max_low_pfn; > } > > +#define BIOS_EBDA_SEGMENT 0x40E > +#define BIOS_LOWMEM_KILOBYTES 0x413 > + > /* > - * workaround for Dell systems that neglect to reserve EBDA > + * The BIOS places the EBDA/XBDA at the top of conventional > + * memory, and usually decreases the reported amount of > + * conventional memory (int 0x12) too. This also contains a > + * workaround for Dell systems that neglect to reserve EBDA. > + * The same workaround also avoids a problem with the AMD768MPX > + * chipset: reserve a page before VGA to prevent PCI prefetch > + * into it (errata #56). Usually the page is reserved anyways, > + * unless you have no PS/2 mouse plugged in. > */ > static void __init reserve_ebda_region(void) > { > - unsigned int addr; > - addr = get_bios_ebda(); > - if (addr) > - reserve_bootmem(addr, PAGE_SIZE, BOOTMEM_DEFAULT); > + unsigned int lowmem, ebda_addr; > + > + /* end of low (conventional) memory */ > + lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES); > + lowmem <<= 10; > + > + /* start of EBDA area */ > + ebda_addr = *(unsigned short *)__va(BIOS_EBDA_SEGMENT); > + ebda_addr <<= 4; > + > + /* Fixup: bios puts an EBDA in the top 64K segment */ > + /* of conventional memory, but does not adjust lowmem. */ > + if ((lowmem - ebda_addr) <= 0x10000) > + lowmem = ebda_addr; > + > + /* Fixup: bios does not report an EBDA at all. */ > + /* Some old Dells seem to need 4k anyhow (bugzilla 2990) */ > + if ((ebda_addr == 0) && (lowmem >= 0x9f000)) > + lowmem = 0x9f000; > + > + /* Paranoia: should never happen, but... */ > + if (lowmem >= 0x100000) > + lowmem = 0xa0000; > + > + /* reserve all memory between lowmem and the 1MB mark */ > + reserve_bootmem(lowmem, 0x100000 - lowmem, BOOTMEM_DEFAULT); > } > > #ifndef CONFIG_NEED_MULTIPLE_NODES > @@ -619,16 +651,9 @@ void __init setup_bootmem_allocator(void) > */ > reserve_bootmem(0, PAGE_SIZE, BOOTMEM_DEFAULT); > > - /* reserve EBDA region, it's a 4K region */ > + /* reserve EBDA region */ > reserve_ebda_region(); > > - /* could be an AMD 768MPX chipset. Reserve a page before VGA to prevent > - PCI prefetch into it (errata #56). Usually the page is reserved anyways, > - unless you have no PS/2 mouse plugged in. */ > - if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD && > - boot_cpu_data.x86 == 6) > - reserve_bootmem(0xa0000 - 4096, 4096, BOOTMEM_DEFAULT); > - > #ifdef CONFIG_SMP > /* > * But first pinch a few for the stack/trampoline stuff > -- Ian Campbell In spite of everything, I still believe that people are good at heart. -- Anne Frank ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit 2008-02-28 21:09 ` [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit Ian Campbell @ 2008-02-29 11:49 ` Alexander van Heukelum 2008-02-29 17:14 ` Mark McLoughlin 0 siblings, 1 reply; 62+ messages in thread From: Alexander van Heukelum @ 2008-02-29 11:49 UTC (permalink / raw) To: Ingo Molnar Cc: Ian Campbell, Alexander van Heukelum, H. Peter Anvin, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, Mark McLoughlin, LKML This patch adds explicit detection of the EBDA and reservation of the rom and adapter address space 0xa0000-0x100000 to the i386 kernels. It uses reserve_bootmem instead of reserve_early, because reserve_early is not yet available on i386. Before this patch, the EBDA size was hardcoded as 4Kb. Also, the reservation of the adapter range was done by modifying the e820 map which is now not necessary any longer, and the code is removed from copy_e820_map. The changes in e820_64.c are only a change in the comment above copy_e820_map, and some changes of the types of local variables in that function such that the 32 and 64 bit versions become equal. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> --- On Thu, Feb 28, 2008 at 09:09:56PM +0000, Ian Campbell wrote: > On Thu, 2008-02-28 at 14:13 +0100, Alexander van Heukelum wrote: > > The 32-bit code still uses reserve_bootmem, so this is not really > > a unification with the 64-bit version of the ebda reservation code, > > but at least it provides the same detection logic and reserves the > > same areas. > > > > This does not crash immediately on qemu. No further testing was > > done! Otherwise: > > I haven't tested extensively either but it does seem to solve the > problem for Xen. > > Thanks! > Ian Thank you! Ingo, I think this is ready for -x86#testing. It boots to a small userspace in qemu (i386). If I should separate the cleanups, let me know. Greetings, Alexander arch/x86/kernel/e820_32.c | 30 +++++++------------------- arch/x86/kernel/e820_64.c | 15 +++++++++--- arch/x86/kernel/setup_32.c | 51 ++++++++++++++++++++++++++++++++----------- 3 files changed, 57 insertions(+), 39 deletions(-) diff --git a/arch/x86/kernel/e820_32.c b/arch/x86/kernel/e820_32.c index 4e16ef4..8ea7db2 100644 --- a/arch/x86/kernel/e820_32.c +++ b/arch/x86/kernel/e820_32.c @@ -444,44 +444,30 @@ int __init sanitize_e820_map(struct e820entry * biosmap, char * pnr_map) * set up memory. If we aren't, we'll fake a memory map. * * We check to see that the memory map contains at least 2 elements - * before we'll use it, because the detection code in setup.S may - * not be perfect and most every PC known to man has two memory + * before we'll use it, because the detection code in boot/memory.c + * may not be perfect and every PC known to man has two memory * regions: one from 0 to 640k, and one from 1mb up. (The IBM * thinkpad 560x, for example, does not cooperate with the memory * detection code.) */ -int __init copy_e820_map(struct e820entry * biosmap, int nr_map) +int __init copy_e820_map(struct e820entry *biosmap, int nr_map) { /* Only one memory region (or negative)? Ignore it */ if (nr_map < 2) return -1; do { - unsigned long long start = biosmap->addr; - unsigned long long size = biosmap->size; - unsigned long long end = start + size; - unsigned long type = biosmap->type; + u64 start = biosmap->addr; + u64 size = biosmap->size; + u64 end = start + size; + u32 type = biosmap->type; /* Overflow in 64 bits? Ignore the memory map. */ if (start > end) return -1; - /* - * Some BIOSes claim RAM in the 640k - 1M region. - * Not right. Fix it up. - */ - if (type == E820_RAM) { - if (start < 0x100000ULL && end > 0xA0000ULL) { - if (start < 0xA0000ULL) - add_memory_region(start, 0xA0000ULL-start, type); - if (end <= 0x100000ULL) - continue; - start = 0x100000ULL; - size = end - start; - } - } add_memory_region(start, size, type); - } while (biosmap++,--nr_map); + } while (biosmap++, --nr_map); return 0; } diff --git a/arch/x86/kernel/e820_64.c b/arch/x86/kernel/e820_64.c index dd68cfd..e7f0400 100644 --- a/arch/x86/kernel/e820_64.c +++ b/arch/x86/kernel/e820_64.c @@ -613,6 +613,13 @@ static int __init sanitize_e820_map(struct e820entry *biosmap, char *pnr_map) * If we're lucky and live on a modern system, the setup code * will have given us a memory map that we can use to properly * set up memory. If we aren't, we'll fake a memory map. + * + * We check to see that the memory map contains at least 2 elements + * before we'll use it, because the detection code in boot/memory.c + * may not be perfect and every PC known to man has two memory + * regions: one from 0 to 640k, and one from 1mb up. (The IBM + * thinkpad 560x, for example, does not cooperate with the memory + * detection code.) */ static int __init copy_e820_map(struct e820entry *biosmap, int nr_map) { @@ -621,10 +628,10 @@ static int __init copy_e820_map(struct e820entry *biosmap, int nr_map) return -1; do { - unsigned long start = biosmap->addr; - unsigned long size = biosmap->size; - unsigned long end = start + size; - unsigned long type = biosmap->type; + u64 start = biosmap->addr; + u64 size = biosmap->size; + u64 end = start + size; + u32 type = biosmap->type; /* Overflow in 64 bits? Ignore the memory map. */ if (start > end) diff --git a/arch/x86/kernel/setup_32.c b/arch/x86/kernel/setup_32.c index a1d7071..e12cc31 100644 --- a/arch/x86/kernel/setup_32.c +++ b/arch/x86/kernel/setup_32.c @@ -385,15 +385,47 @@ unsigned long __init find_max_low_pfn(void) return max_low_pfn; } +#define BIOS_EBDA_SEGMENT 0x40E +#define BIOS_LOWMEM_KILOBYTES 0x413 + /* - * workaround for Dell systems that neglect to reserve EBDA + * The BIOS places the EBDA/XBDA at the top of conventional + * memory, and usually decreases the reported amount of + * conventional memory (int 0x12) too. This also contains a + * workaround for Dell systems that neglect to reserve EBDA. + * The same workaround also avoids a problem with the AMD768MPX + * chipset: reserve a page before VGA to prevent PCI prefetch + * into it (errata #56). Usually the page is reserved anyways, + * unless you have no PS/2 mouse plugged in. */ static void __init reserve_ebda_region(void) { - unsigned int addr; - addr = get_bios_ebda(); - if (addr) - reserve_bootmem(addr, PAGE_SIZE, BOOTMEM_DEFAULT); + unsigned int lowmem, ebda_addr; + + /* end of low (conventional) memory */ + lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES); + lowmem <<= 10; + + /* start of EBDA area */ + ebda_addr = *(unsigned short *)__va(BIOS_EBDA_SEGMENT); + ebda_addr <<= 4; + + /* Fixup: bios puts an EBDA in the top 64K segment */ + /* of conventional memory, but does not adjust lowmem. */ + if ((lowmem - ebda_addr) <= 0x10000) + lowmem = ebda_addr; + + /* Fixup: bios does not report an EBDA at all. */ + /* Some old Dells seem to need 4k anyhow (bugzilla 2990) */ + if ((ebda_addr == 0) && (lowmem >= 0x9f000)) + lowmem = 0x9f000; + + /* Paranoia: should never happen, but... */ + if (lowmem >= 0x100000) + lowmem = 0xa0000; + + /* reserve all memory between lowmem and the 1MB mark */ + reserve_bootmem(lowmem, 0x100000 - lowmem, BOOTMEM_DEFAULT); } #ifndef CONFIG_NEED_MULTIPLE_NODES @@ -619,16 +651,9 @@ void __init setup_bootmem_allocator(void) */ reserve_bootmem(0, PAGE_SIZE, BOOTMEM_DEFAULT); - /* reserve EBDA region, it's a 4K region */ + /* reserve EBDA region */ reserve_ebda_region(); - /* could be an AMD 768MPX chipset. Reserve a page before VGA to prevent - PCI prefetch into it (errata #56). Usually the page is reserved anyways, - unless you have no PS/2 mouse plugged in. */ - if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD && - boot_cpu_data.x86 == 6) - reserve_bootmem(0xa0000 - 4096, 4096, BOOTMEM_DEFAULT); - #ifdef CONFIG_SMP /* * But first pinch a few for the stack/trampoline stuff ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit 2008-02-29 11:49 ` Alexander van Heukelum @ 2008-02-29 17:14 ` Mark McLoughlin 2008-02-29 18:38 ` Alexander van Heukelum 0 siblings, 1 reply; 62+ messages in thread From: Mark McLoughlin @ 2008-02-29 17:14 UTC (permalink / raw) To: Alexander van Heukelum Cc: Ingo Molnar, Ian Campbell, Alexander van Heukelum, H. Peter Anvin, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, LKML On Fri, 2008-02-29 at 12:49 +0100, Alexander van Heukelum wrote: > This patch adds explicit detection of the EBDA and reservation > of the rom and adapter address space 0xa0000-0x100000 to the > i386 kernels. It uses reserve_bootmem instead of reserve_early, > because reserve_early is not yet available on i386. > > Before this patch, the EBDA size was hardcoded as 4Kb. Also, the > reservation of the adapter range was done by modifying the e820 > map which is now not necessary any longer, and the code is removed > from copy_e820_map. > > The changes in e820_64.c are only a change in the comment above > copy_e820_map, and some changes of the types of local variables > in that function such that the 32 and 64 bit versions become equal. > > Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> > > --- > > On Thu, Feb 28, 2008 at 09:09:56PM +0000, Ian Campbell wrote: > > On Thu, 2008-02-28 at 14:13 +0100, Alexander van Heukelum wrote: > > > The 32-bit code still uses reserve_bootmem, so this is not really > > > a unification with the 64-bit version of the ebda reservation code, > > > but at least it provides the same detection logic and reserves the > > > same areas. > > > > > > This does not crash immediately on qemu. No further testing was > > > done! Otherwise: > > > > I haven't tested extensively either but it does seem to solve the > > problem for Xen. > > > > Thanks! > > Ian > > Thank you! > > Ingo, > > I think this is ready for -x86#testing. > It boots to a small userspace in qemu (i386). > If I should separate the cleanups, let me know. I haven't investigated in any detail, but with 2.6.25-rc3 and your patch I'm seeing a Xen guest hit this BUG: void __init smp_alloc_memory(void) { trampoline_base = alloc_bootmem_low_pages(PAGE_SIZE); /* * Has to be in very low memory so we can execute * real-mode AP code. */ if (__pa(trampoline_base) >= 0x9F000) BUG(); } Stack looks like: [<c137ef97>] smp_alloc_memory+0x25 <-- [<c137ef97>] smp_alloc_memory+0x25 [<c137a500>] setup_arch+0x28e [<c13735f7>] start_kernel+0x7a [<c1379240>] xen_start_kernel+0x300 Cheers, Mark. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit 2008-02-29 17:14 ` Mark McLoughlin @ 2008-02-29 18:38 ` Alexander van Heukelum 2008-02-29 18:44 ` H. Peter Anvin 2008-02-29 22:06 ` Mark McLoughlin 0 siblings, 2 replies; 62+ messages in thread From: Alexander van Heukelum @ 2008-02-29 18:38 UTC (permalink / raw) To: Mark McLoughlin, Alexander van Heukelum Cc: Ingo Molnar, Ian Campbell, H. Peter Anvin, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, LKML On Fri, 29 Feb 2008 17:14:07 +0000, "Mark McLoughlin" <markmc@redhat.com> said: > On Fri, 2008-02-29 at 12:49 +0100, Alexander van Heukelum wrote: > > This patch adds explicit detection of the EBDA and reservation > > of the rom and adapter address space 0xa0000-0x100000 to the > > i386 kernels. It uses reserve_bootmem instead of reserve_early, > > because reserve_early is not yet available on i386. > > > > Before this patch, the EBDA size was hardcoded as 4Kb. Also, the > > reservation of the adapter range was done by modifying the e820 > > map which is now not necessary any longer, and the code is removed > > from copy_e820_map. > > > > The changes in e820_64.c are only a change in the comment above > > copy_e820_map, and some changes of the types of local variables > > in that function such that the 32 and 64 bit versions become equal. > > > > Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> > > > > --- > > > > On Thu, Feb 28, 2008 at 09:09:56PM +0000, Ian Campbell wrote: > > > On Thu, 2008-02-28 at 14:13 +0100, Alexander van Heukelum wrote: > > > > The 32-bit code still uses reserve_bootmem, so this is not really > > > > a unification with the 64-bit version of the ebda reservation code, > > > > but at least it provides the same detection logic and reserves the > > > > same areas. > > > > > > > > This does not crash immediately on qemu. No further testing was > > > > done! Otherwise: > > > > > > I haven't tested extensively either but it does seem to solve the > > > problem for Xen. > > > > > > Thanks! > > > Ian > > > > Thank you! > > > > Ingo, > > > > I think this is ready for -x86#testing. > > It boots to a small userspace in qemu (i386). > > If I should separate the cleanups, let me know. > > I haven't investigated in any detail, but with 2.6.25-rc3 and your > patch I'm seeing a Xen guest hit this BUG: > > void __init smp_alloc_memory(void) > { > trampoline_base = alloc_bootmem_low_pages(PAGE_SIZE); > /* > * Has to be in very low memory so we can execute > * real-mode AP code. > */ > if (__pa(trampoline_base) >= 0x9F000) > BUG(); > } > > Stack looks like: > > [<c137ef97>] smp_alloc_memory+0x25 <-- > [<c137ef97>] smp_alloc_memory+0x25 > [<c137a500>] setup_arch+0x28e > [<c13735f7>] start_kernel+0x7a > [<c1379240>] xen_start_kernel+0x300 Ouch. My first guess is that the BIOS data area is completely non-existent for Xen. Is it guaranteed that the memory is zeroed out on boot? In that case we can special-case it easily: change: /* Paranoia: should never happen, but... */ if (lowmem >= 0x100000) lowmem = 0xa0000; into: /* Strange case, like Xen ;) */ if (lowmem == 0 || lowmem >= 0x100000) lowmem = 0x9f000; Can you test that? Greetings, Alexander -- Alexander van Heukelum heukelum@fastmail.fm -- http://www.fastmail.fm - Send your email first class ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit 2008-02-29 18:38 ` Alexander van Heukelum @ 2008-02-29 18:44 ` H. Peter Anvin 2008-02-29 18:56 ` Alexander van Heukelum 2008-02-29 22:06 ` Mark McLoughlin 1 sibling, 1 reply; 62+ messages in thread From: H. Peter Anvin @ 2008-02-29 18:44 UTC (permalink / raw) To: Alexander van Heukelum Cc: Mark McLoughlin, Alexander van Heukelum, Ingo Molnar, Ian Campbell, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, LKML Alexander van Heukelum wrote: > > My first guess is that the BIOS data area is completely non-existent for > Xen. > Is it guaranteed that the memory is zeroed out on boot? In that case we > can > special-case it easily: > > change: > /* Paranoia: should never happen, but... */ > if (lowmem >= 0x100000) > lowmem = 0xa0000; > > into: > /* Strange case, like Xen ;) */ > if (lowmem == 0 || lowmem >= 0x100000) > lowmem = 0x9f000; > > Can you test that? > The EBDA is optional anyway; I presume it should have a zero pointer if it isn't present. -hpa ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit 2008-02-29 18:44 ` H. Peter Anvin @ 2008-02-29 18:56 ` Alexander van Heukelum 0 siblings, 0 replies; 62+ messages in thread From: Alexander van Heukelum @ 2008-02-29 18:56 UTC (permalink / raw) To: H. Peter Anvin Cc: Mark McLoughlin, Alexander van Heukelum, Ingo Molnar, Ian Campbell, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, LKML On Fri, 29 Feb 2008 10:44:36 -0800, "H. Peter Anvin" <hpa@zytor.com> said: > Alexander van Heukelum wrote: > > > > My first guess is that the BIOS data area is completely non-existent for > > Xen. > > Is it guaranteed that the memory is zeroed out on boot? In that case we > > can > > special-case it easily: > > > > change: > > /* Paranoia: should never happen, but... */ > > if (lowmem >= 0x100000) > > lowmem = 0xa0000; > > > > into: > > /* Strange case, like Xen ;) */ > > if (lowmem == 0 || lowmem >= 0x100000) > > lowmem = 0x9f000; > > > > Can you test that? > > > > The EBDA is optional anyway; I presume it should have a zero pointer if > it isn't present. Correct, but the value that indicates the size of the conventional memory area must be set. At least on any real hardware. My guess is that both values read as 0 on Xen, which causes 0-0x100000 to be reserved. If that is always the case the workaround is fine, I think. Alexander > -hpa -- Alexander van Heukelum heukelum@fastmail.fm -- http://www.fastmail.fm - mmm... Fastmail... ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit 2008-02-29 18:38 ` Alexander van Heukelum 2008-02-29 18:44 ` H. Peter Anvin @ 2008-02-29 22:06 ` Mark McLoughlin 2008-02-29 22:26 ` Alexander van Heukelum 2008-03-01 16:09 ` [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit v2 Alexander van Heukelum 1 sibling, 2 replies; 62+ messages in thread From: Mark McLoughlin @ 2008-02-29 22:06 UTC (permalink / raw) To: Alexander van Heukelum Cc: Alexander van Heukelum, Ingo Molnar, Ian Campbell, H. Peter Anvin, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, LKML On Fri, 2008-02-29 at 19:38 +0100, Alexander van Heukelum wrote: > On Fri, 29 Feb 2008 17:14:07 +0000, "Mark McLoughlin" > <markmc@redhat.com> said: > > > I haven't investigated in any detail, but with 2.6.25-rc3 and your > > patch I'm seeing a Xen guest hit this BUG: > > > > void __init smp_alloc_memory(void) > > { > > trampoline_base = alloc_bootmem_low_pages(PAGE_SIZE); > > /* > > * Has to be in very low memory so we can execute > > * real-mode AP code. > > */ > > if (__pa(trampoline_base) >= 0x9F000) > > BUG(); > > } > > > > Stack looks like: > > > > [<c137ef97>] smp_alloc_memory+0x25 <-- > > [<c137ef97>] smp_alloc_memory+0x25 > > [<c137a500>] setup_arch+0x28e > > [<c13735f7>] start_kernel+0x7a > > [<c1379240>] xen_start_kernel+0x300 > > Ouch. > > My first guess is that the BIOS data area is completely non-existent for > Xen. > Is it guaranteed that the memory is zeroed out on boot? Yep, that seems to be the case > In that case we can special-case it easily: > > change: > /* Paranoia: should never happen, but... */ > if (lowmem >= 0x100000) > lowmem = 0xa0000; > > into: > /* Strange case, like Xen ;) */ > if (lowmem == 0 || lowmem >= 0x100000) > lowmem = 0x9f000; > > Can you test that? Yes, that fixes boot for me. Thanks, Mark. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit 2008-02-29 22:06 ` Mark McLoughlin @ 2008-02-29 22:26 ` Alexander van Heukelum 2008-03-01 16:09 ` [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit v2 Alexander van Heukelum 1 sibling, 0 replies; 62+ messages in thread From: Alexander van Heukelum @ 2008-02-29 22:26 UTC (permalink / raw) To: Mark McLoughlin Cc: Alexander van Heukelum, Ingo Molnar, Ian Campbell, H. Peter Anvin, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, LKML On Fri, 29 Feb 2008 22:06:59 +0000, "Mark McLoughlin" <markmc@redhat.com> said: > On Fri, 2008-02-29 at 19:38 +0100, Alexander van Heukelum wrote: > > My first guess is that the BIOS data area is completely non-existent for > > Xen. Is it guaranteed that the memory is zeroed out on boot? > > Yep, that seems to be the case > > > In that case we can special-case it easily: > > > > change: > > /* Paranoia: should never happen, but... */ > > if (lowmem >= 0x100000) > > lowmem = 0xa0000; > > > > into: > > /* Strange case, like Xen ;) */ > > if (lowmem == 0 || lowmem >= 0x100000) > > lowmem = 0x9f000; > > > > Can you test that? > > Yes, that fixes boot for me. Thanks for testing this. I'ld rather not count on Xen providing zeroed memory, though. I'll try to find a different solution. It's weekend anyhow ;). Greetings, Alexander -- Alexander van Heukelum heukelum@fastmail.fm -- http://www.fastmail.fm - Or how I learned to stop worrying and love email again ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit v2 2008-02-29 22:06 ` Mark McLoughlin 2008-02-29 22:26 ` Alexander van Heukelum @ 2008-03-01 16:09 ` Alexander van Heukelum 2008-03-01 16:12 ` [PATCH] reserve end-of-conventional-memory to 1MB on 64-bit add-on Alexander van Heukelum 2008-03-04 11:44 ` [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit v2 Mark McLoughlin 1 sibling, 2 replies; 62+ messages in thread From: Alexander van Heukelum @ 2008-03-01 16:09 UTC (permalink / raw) To: Ingo Molnar, Ian Campbell, Mark McLoughlin Cc: Alexander van Heukelum, H. Peter Anvin, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, LKML This patch adds explicit detection of the EBDA and reservation of the rom and adapter address space 0xa0000-0x100000 to the i386 kernels. Before this patch, the EBDA size was hardcoded as 4Kb. Also, the reservation of the adapter range was done by modifying the e820 map which is now not necessary any longer, and that code is removed from copy_e820_map. The amount of conventional memory and the start of the EBDA are detected by reading the BIOS data area directly. Paravirtual environments do not provide this area, so we bail out early in that case. They will just have to set up a correct memory map to start with. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> --- Hi Ingo, This is the second attempt at a i386-version of the ebda patch. I hope that one of the Xen people will be able to check that this does not break their setups, but I think it will be fine after their patch to exclude the 0x9f000- 0x100000 area explicitly in their setup. Greetings, Alexander diff --git a/arch/x86/kernel/e820_32.c b/arch/x86/kernel/e820_32.c index 4e16ef4..746d683 100644 --- a/arch/x86/kernel/e820_32.c +++ b/arch/x86/kernel/e820_32.c @@ -450,38 +450,25 @@ int __init sanitize_e820_map(struct e820entry * biosmap, char * pnr_map) * thinkpad 560x, for example, does not cooperate with the memory * detection code.) */ -int __init copy_e820_map(struct e820entry * biosmap, int nr_map) +int __init copy_e820_map(struct e820entry *biosmap, int nr_map) { /* Only one memory region (or negative)? Ignore it */ if (nr_map < 2) return -1; do { - unsigned long long start = biosmap->addr; - unsigned long long size = biosmap->size; - unsigned long long end = start + size; - unsigned long type = biosmap->type; + u64 start = biosmap->addr; + u64 size = biosmap->size; + u64 end = start + size; + u32 type = biosmap->type; /* Overflow in 64 bits? Ignore the memory map. */ if (start > end) return -1; - /* - * Some BIOSes claim RAM in the 640k - 1M region. - * Not right. Fix it up. - */ - if (type == E820_RAM) { - if (start < 0x100000ULL && end > 0xA0000ULL) { - if (start < 0xA0000ULL) - add_memory_region(start, 0xA0000ULL-start, type); - if (end <= 0x100000ULL) - continue; - start = 0x100000ULL; - size = end - start; - } - } add_memory_region(start, size, type); - } while (biosmap++,--nr_map); + } while (biosmap++, --nr_map); + return 0; } diff --git a/arch/x86/kernel/setup_32.c b/arch/x86/kernel/setup_32.c index a1d7071..20e537b 100644 --- a/arch/x86/kernel/setup_32.c +++ b/arch/x86/kernel/setup_32.c @@ -385,15 +385,60 @@ unsigned long __init find_max_low_pfn(void) return max_low_pfn; } +#define BIOS_EBDA_SEGMENT 0x40E +#define BIOS_LOWMEM_KILOBYTES 0x413 + /* - * workaround for Dell systems that neglect to reserve EBDA + * The BIOS places the EBDA/XBDA at the top of conventional + * memory, and usually decreases the reported amount of + * conventional memory (int 0x12) too. This also contains a + * workaround for Dell systems that neglect to reserve EBDA. + * The same workaround also avoids a problem with the AMD768MPX + * chipset: reserve a page before VGA to prevent PCI prefetch + * into it (errata #56). Usually the page is reserved anyways, + * unless you have no PS/2 mouse plugged in. */ static void __init reserve_ebda_region(void) { - unsigned int addr; - addr = get_bios_ebda(); - if (addr) - reserve_bootmem(addr, PAGE_SIZE, BOOTMEM_DEFAULT); + unsigned int lowmem, ebda_addr; + + /* To determine the position of the EBDA and the */ + /* end of conventional memory, we need to look at */ + /* the BIOS data area. In a paravirtual environment */ + /* that area is absent. We'll just have to assume */ + /* that the paravirt case can handle memory setup */ + /* correctly, without our help. */ +#ifdef CONFIG_PARAVIRT + if ((boot_params.hdr.version >= 0x207) && + (boot_params.hdr.hardware_subarch != 0)) { + return; + } +#endif + + /* end of low (conventional) memory */ + lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES); + lowmem <<= 10; + + /* start of EBDA area */ + ebda_addr = *(unsigned short *)__va(BIOS_EBDA_SEGMENT); + ebda_addr <<= 4; + + /* Fixup: bios puts an EBDA in the top 64K segment */ + /* of conventional memory, but does not adjust lowmem. */ + if ((lowmem - ebda_addr) <= 0x10000) + lowmem = ebda_addr; + + /* Fixup: bios does not report an EBDA at all. */ + /* Some old Dells seem to need 4k anyhow (bugzilla 2990) */ + if ((ebda_addr == 0) && (lowmem >= 0x9f000)) + lowmem = 0x9f000; + + /* Paranoia: should never happen, but... */ + if ((lowmem == 0) || (lowmem >= 0x100000)) + lowmem = 0x9f000; + + /* reserve all memory between lowmem and the 1MB mark */ + reserve_bootmem(lowmem, 0x100000 - lowmem, BOOTMEM_DEFAULT); } #ifndef CONFIG_NEED_MULTIPLE_NODES @@ -619,16 +664,9 @@ void __init setup_bootmem_allocator(void) */ reserve_bootmem(0, PAGE_SIZE, BOOTMEM_DEFAULT); - /* reserve EBDA region, it's a 4K region */ + /* reserve EBDA region */ reserve_ebda_region(); - /* could be an AMD 768MPX chipset. Reserve a page before VGA to prevent - PCI prefetch into it (errata #56). Usually the page is reserved anyways, - unless you have no PS/2 mouse plugged in. */ - if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD && - boot_cpu_data.x86 == 6) - reserve_bootmem(0xa0000 - 4096, 4096, BOOTMEM_DEFAULT); - #ifdef CONFIG_SMP /* * But first pinch a few for the stack/trampoline stuff ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve end-of-conventional-memory to 1MB on 64-bit add-on 2008-03-01 16:09 ` [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit v2 Alexander van Heukelum @ 2008-03-01 16:12 ` Alexander van Heukelum 2008-03-04 11:44 ` [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit v2 Mark McLoughlin 1 sibling, 0 replies; 62+ messages in thread From: Alexander van Heukelum @ 2008-03-01 16:12 UTC (permalink / raw) To: Ingo Molnar Cc: Alexander van Heukelum, Ian Campbell, Mark McLoughlin, H. Peter Anvin, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, LKML This patch is an add-on to the 64-bit ebda patch. It makes the functions reserve_ebda_region (renamed from reserve_ebda) and copy_e820_map equal to the 32-bit versions of the previous patch. Changes: Use u64 and u32 for local variables in copy_e820_map. The amount of conventional memory and the start of the EBDA are detected by reading the BIOS data area directly. Paravirtual environments do not provide this area, so we bail out early in that case. They will just have to set up a correct memory map to start with. Add a safety net for zeroed out BIOS data area. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> --- diff --git a/arch/x86/kernel/e820_32.c b/arch/x86/kernel/e820_32.c diff --git a/arch/x86/kernel/e820_64.c b/arch/x86/kernel/e820_64.c index dd68cfd..baa4223 100644 --- a/arch/x86/kernel/e820_64.c +++ b/arch/x86/kernel/e820_64.c @@ -621,10 +621,10 @@ static int __init copy_e820_map(struct e820entry *biosmap, int nr_map) return -1; do { - unsigned long start = biosmap->addr; - unsigned long size = biosmap->size; - unsigned long end = start + size; - unsigned long type = biosmap->type; + u64 start = biosmap->addr; + u64 size = biosmap->size; + u64 end = start + size; + u32 type = biosmap->type; /* Overflow in 64 bits? Ignore the memory map. */ if (start > end) diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c index b684552..269a6b4 100644 --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -55,12 +55,30 @@ static void __init copy_bootdata(char *real_mode_data) /* * The BIOS places the EBDA/XBDA at the top of conventional * memory, and usually decreases the reported amount of - * conventional memory (int 0x12) too. + * conventional memory (int 0x12) too. This also contains a + * workaround for Dell systems that neglect to reserve EBDA. + * The same workaround also avoids a problem with the AMD768MPX + * chipset: reserve a page before VGA to prevent PCI prefetch + * into it (errata #56). Usually the page is reserved anyways, + * unless you have no PS/2 mouse plugged in. */ -static __init void reserve_ebda(void) +static void __init reserve_ebda_region(void) { unsigned int lowmem, ebda_addr; + /* To determine the position of the EBDA and the */ + /* end of conventional memory, we need to look at */ + /* the BIOS data area. In a paravirtual environment */ + /* that area is absent. We'll just have to assume */ + /* that the paravirt case can handle memory setup */ + /* correctly, without our help. */ +#ifdef CONFIG_PARAVIRT + if ((boot_params.hdr.version >= 0x207) && + (boot_params.hdr.hardware_subarch != 0)) { + return; + } +#endif + /* end of low (conventional) memory */ lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES); lowmem <<= 10; @@ -80,8 +98,8 @@ static __init void reserve_ebda(void) lowmem = 0x9f000; /* Paranoia: should never happen, but... */ - if (lowmem >= 0x100000) - lowmem = 0xa0000; + if ((lowmem == 0) || (lowmem >= 0x100000)) + lowmem = 0x9f000; /* reserve all memory between lowmem and the 1MB mark */ reserve_early(lowmem, 0x100000, "BIOS reserved"); @@ -140,7 +158,7 @@ void __init x86_64_start_kernel(char * real_mode_data) reserve_early(ramdisk_image, ramdisk_end, "RAMDISK"); } - reserve_ebda(); + reserve_ebda_region(); /* * At this point everything still needed from the boot loader diff --git a/arch/x86/kernel/setup_32.c b/arch/x86/kernel/setup_32.c ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit v2 2008-03-01 16:09 ` [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit v2 Alexander van Heukelum 2008-03-01 16:12 ` [PATCH] reserve end-of-conventional-memory to 1MB on 64-bit add-on Alexander van Heukelum @ 2008-03-04 11:44 ` Mark McLoughlin 2008-03-04 13:31 ` Alexander van Heukelum ` (2 more replies) 1 sibling, 3 replies; 62+ messages in thread From: Mark McLoughlin @ 2008-03-04 11:44 UTC (permalink / raw) To: Alexander van Heukelum Cc: Ingo Molnar, Ian Campbell, Alexander van Heukelum, H. Peter Anvin, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, LKML On Sat, 2008-03-01 at 17:09 +0100, Alexander van Heukelum wrote: > This patch adds explicit detection of the EBDA and reservation > of the rom and adapter address space 0xa0000-0x100000 to the > i386 kernels. Before this patch, the EBDA size was hardcoded > as 4Kb. Also, the reservation of the adapter range was done by > modifying the e820 map which is now not necessary any longer, > and that code is removed from copy_e820_map. > > The amount of conventional memory and the start of the EBDA are > detected by reading the BIOS data area directly. Paravirtual > environments do not provide this area, so we bail out early > in that case. They will just have to set up a correct memory > map to start with. > > Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> > > --- > > Hi Ingo, > > This is the second attempt at a i386-version of the ebda > patch. I hope that one of the Xen people will be able to > check that this does not break their setups, but I think > it will be fine after their patch to exclude the 0x9f000- > 0x100000 area explicitly in their setup. Confirmed that with Ian's e820 map patch and your patch, Xen DomU boots fine. > diff --git a/arch/x86/kernel/setup_32.c b/arch/x86/kernel/setup_32.c > index a1d7071..20e537b 100644 > --- a/arch/x86/kernel/setup_32.c > +++ b/arch/x86/kernel/setup_32.c > @@ -385,15 +385,60 @@ unsigned long __init find_max_low_pfn(void) > return max_low_pfn; > } > > +#define BIOS_EBDA_SEGMENT 0x40E > +#define BIOS_LOWMEM_KILOBYTES 0x413 > + > /* > - * workaround for Dell systems that neglect to reserve EBDA > + * The BIOS places the EBDA/XBDA at the top of conventional > + * memory, and usually decreases the reported amount of > + * conventional memory (int 0x12) too. This also contains a > + * workaround for Dell systems that neglect to reserve EBDA. > + * The same workaround also avoids a problem with the AMD768MPX > + * chipset: reserve a page before VGA to prevent PCI prefetch > + * into it (errata #56). Usually the page is reserved anyways, > + * unless you have no PS/2 mouse plugged in. > */ > static void __init reserve_ebda_region(void) > { > - unsigned int addr; > - addr = get_bios_ebda(); > - if (addr) > - reserve_bootmem(addr, PAGE_SIZE, BOOTMEM_DEFAULT); > + unsigned int lowmem, ebda_addr; > + > + /* To determine the position of the EBDA and the */ > + /* end of conventional memory, we need to look at */ > + /* the BIOS data area. In a paravirtual environment */ > + /* that area is absent. We'll just have to assume */ > + /* that the paravirt case can handle memory setup */ > + /* correctly, without our help. */ > +#ifdef CONFIG_PARAVIRT > + if ((boot_params.hdr.version >= 0x207) && > + (boot_params.hdr.hardware_subarch != 0)) { > + return; > + } > +#endif This is a bit magic, is it worth splitting it out as something like is_paravirt_environment() ? Cheers, Mark. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit v2 2008-03-04 11:44 ` [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit v2 Mark McLoughlin @ 2008-03-04 13:31 ` Alexander van Heukelum 2008-03-04 14:49 ` Ingo Molnar 2008-03-04 15:18 ` Jeremy Fitzhardinge 2 siblings, 0 replies; 62+ messages in thread From: Alexander van Heukelum @ 2008-03-04 13:31 UTC (permalink / raw) To: Mark McLoughlin, H. Peter Anvin Cc: Ingo Molnar, Ian Campbell, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, LKML, Alexander van Heukelum On Tue, 04 Mar 2008 11:44:42 +0000, "Mark McLoughlin" <markmc@redhat.com> said: > On Sat, 2008-03-01 at 17:09 +0100, Alexander van Heukelum wrote: > > This patch adds explicit detection of the EBDA and reservation > > of the rom and adapter address space 0xa0000-0x100000 to the > > i386 kernels. Before this patch, the EBDA size was hardcoded > > as 4Kb. Also, the reservation of the adapter range was done by > > modifying the e820 map which is now not necessary any longer, > > and that code is removed from copy_e820_map. > > > > The amount of conventional memory and the start of the EBDA are > > detected by reading the BIOS data area directly. Paravirtual > > environments do not provide this area, so we bail out early > > in that case. They will just have to set up a correct memory > > map to start with. > > > > Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> > > > > --- > > > > Hi Ingo, > > > > This is the second attempt at a i386-version of the ebda > > patch. I hope that one of the Xen people will be able to > > check that this does not break their setups, but I think > > it will be fine after their patch to exclude the 0x9f000- > > 0x100000 area explicitly in their setup. > > Confirmed that with Ian's e820 map patch and your patch, Xen DomU boots > fine. Thanks for testing that! > > diff --git a/arch/x86/kernel/setup_32.c b/arch/x86/kernel/setup_32.c > > index a1d7071..20e537b 100644 > > --- a/arch/x86/kernel/setup_32.c > > +++ b/arch/x86/kernel/setup_32.c > > @@ -385,15 +385,60 @@ unsigned long __init find_max_low_pfn(void) > > return max_low_pfn; > > } > > > > +#define BIOS_EBDA_SEGMENT 0x40E > > +#define BIOS_LOWMEM_KILOBYTES 0x413 > > + > > /* > > - * workaround for Dell systems that neglect to reserve EBDA > > + * The BIOS places the EBDA/XBDA at the top of conventional > > + * memory, and usually decreases the reported amount of > > + * conventional memory (int 0x12) too. This also contains a > > + * workaround for Dell systems that neglect to reserve EBDA. > > + * The same workaround also avoids a problem with the AMD768MPX > > + * chipset: reserve a page before VGA to prevent PCI prefetch > > + * into it (errata #56). Usually the page is reserved anyways, > > + * unless you have no PS/2 mouse plugged in. > > */ > > static void __init reserve_ebda_region(void) > > { > > - unsigned int addr; > > - addr = get_bios_ebda(); > > - if (addr) > > - reserve_bootmem(addr, PAGE_SIZE, BOOTMEM_DEFAULT); > > + unsigned int lowmem, ebda_addr; > > + > > + /* To determine the position of the EBDA and the */ > > + /* end of conventional memory, we need to look at */ > > + /* the BIOS data area. In a paravirtual environment */ > > + /* that area is absent. We'll just have to assume */ > > + /* that the paravirt case can handle memory setup */ > > + /* correctly, without our help. */ > > +#ifdef CONFIG_PARAVIRT > > + if ((boot_params.hdr.version >= 0x207) && > > + (boot_params.hdr.hardware_subarch != 0)) { > > + return; > > + } > > +#endif > > This is a bit magic, is it worth splitting it out as something like > is_paravirt_environment() ? Yes, I guess it would make sense, but do we really want to start creating accessor functions to the boot_params struct? If we do, then maybe put this as an inline function in <asm-x86/bootparam.h>? I don't think <asm-x86/paravirt.h> is the right place for this. Should is_paravirt_environment then always return 0 if CONFIG_PARAVIRT is not set? Maybe a function like get_subarch would be preferable? or has_legacy_bios_areas? I don't know. If someone makes a reasonable suggestion, I'll code it up. Greetings, Alexander > Cheers, > Mark. -- Alexander van Heukelum heukelum@fastmail.fm -- http://www.fastmail.fm - And now for something completely different ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit v2 2008-03-04 11:44 ` [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit v2 Mark McLoughlin 2008-03-04 13:31 ` Alexander van Heukelum @ 2008-03-04 14:49 ` Ingo Molnar 2008-03-04 15:16 ` Mark McLoughlin 2008-03-04 15:18 ` Jeremy Fitzhardinge 2 siblings, 1 reply; 62+ messages in thread From: Ingo Molnar @ 2008-03-04 14:49 UTC (permalink / raw) To: Mark McLoughlin Cc: Alexander van Heukelum, Ian Campbell, Alexander van Heukelum, H. Peter Anvin, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, LKML * Mark McLoughlin <markmc@redhat.com> wrote: > > This is the second attempt at a i386-version of the ebda patch. I > > hope that one of the Xen people will be able to check that this does > > not break their setups, but I think it will be fine after their > > patch to exclude the 0x9f000- 0x100000 area explicitly in their > > setup. > > Confirmed that with Ian's e820 map patch and your patch, Xen DomU > boots fine. hm, for now i've only got the patch below queued up for v2.6.25. Could you check whether just the patch below ontop of -rc3-ish upstream solves the problem too? The EBDA patch would be a bit risky now - it's queued up for v2.6.26 at the moment. Ingo ---------------> Subject: x86/xen: fix DomU boot problem From: Ian Campbell <ijc@hellion.org.uk> Date: Thu, 28 Feb 2008 23:16:49 +0000 Construct Xen guest e820 map with a hole between 640K-1M. It's pure luck that Xen kernels have gotten away with it in the past. The patch below seems like the right thing to do. It certainly boots in a domU without the DMI problem (without any of the other related patches such as Alexander's). Signed-off-by: Ian Campbell <ijc@hellion.org.uk> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Tested-by: Mark McLoughlin <markmc@redhat.com> Acked-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> --- arch/x86/xen/setup.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) Index: linux-x86.q/arch/x86/xen/setup.c =================================================================== --- linux-x86.q.orig/arch/x86/xen/setup.c +++ linux-x86.q/arch/x86/xen/setup.c @@ -38,7 +38,8 @@ char * __init xen_memory_setup(void) unsigned long max_pfn = xen_start_info->nr_pages; e820.nr_map = 0; - add_memory_region(0, PFN_PHYS(max_pfn), E820_RAM); + add_memory_region(0, LOWMEMSIZE(), E820_RAM); + add_memory_region(HIGH_MEMORY, PFN_PHYS(max_pfn)-HIGH_MEMORY, E820_RAM); return "Xen"; } ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit v2 2008-03-04 14:49 ` Ingo Molnar @ 2008-03-04 15:16 ` Mark McLoughlin 2008-03-04 15:24 ` Ingo Molnar 0 siblings, 1 reply; 62+ messages in thread From: Mark McLoughlin @ 2008-03-04 15:16 UTC (permalink / raw) To: Ingo Molnar Cc: Alexander van Heukelum, Ian Campbell, Alexander van Heukelum, H. Peter Anvin, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, LKML On Tue, 2008-03-04 at 15:49 +0100, Ingo Molnar wrote: > * Mark McLoughlin <markmc@redhat.com> wrote: > > > > This is the second attempt at a i386-version of the ebda patch. I > > > hope that one of the Xen people will be able to check that this does > > > not break their setups, but I think it will be fine after their > > > patch to exclude the 0x9f000- 0x100000 area explicitly in their > > > setup. > > > > Confirmed that with Ian's e820 map patch and your patch, Xen DomU > > boots fine. > > hm, for now i've only got the patch below queued up for v2.6.25. > > Could you check whether just the patch below ontop of -rc3-ish upstream > solves the problem too? The EBDA patch would be a bit risky now - it's > queued up for v2.6.26 at the moment. Sorry, perhaps I wasn't clear. Ian's patch to add a hole in the e820 map fixes the problem without any other patch. I was just confirming to Alexander that his EBDA patch didn't break anything further. Thanks, Mark. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit v2 2008-03-04 15:16 ` Mark McLoughlin @ 2008-03-04 15:24 ` Ingo Molnar 0 siblings, 0 replies; 62+ messages in thread From: Ingo Molnar @ 2008-03-04 15:24 UTC (permalink / raw) To: Mark McLoughlin Cc: Alexander van Heukelum, Ian Campbell, Alexander van Heukelum, H. Peter Anvin, Andi Kleen, Thomas Gleixner, Jeremy Fitzhardinge, LKML * Mark McLoughlin <markmc@redhat.com> wrote: > > hm, for now i've only got the patch below queued up for v2.6.25. > > > > Could you check whether just the patch below ontop of -rc3-ish > > upstream solves the problem too? The EBDA patch would be a bit risky > > now - it's queued up for v2.6.26 at the moment. > > Sorry, perhaps I wasn't clear. > > Ian's patch to add a hole in the e820 map fixes the problem without > any other patch. > > I was just confirming to Alexander that his EBDA patch didn't break > anything further. ah, great. You got me worried for a minute :) Ingo ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit v2 2008-03-04 11:44 ` [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit v2 Mark McLoughlin 2008-03-04 13:31 ` Alexander van Heukelum 2008-03-04 14:49 ` Ingo Molnar @ 2008-03-04 15:18 ` Jeremy Fitzhardinge 2008-03-04 16:51 ` Alexander van Heukelum 2 siblings, 1 reply; 62+ messages in thread From: Jeremy Fitzhardinge @ 2008-03-04 15:18 UTC (permalink / raw) To: Mark McLoughlin Cc: Alexander van Heukelum, Ingo Molnar, Ian Campbell, Alexander van Heukelum, H. Peter Anvin, Andi Kleen, Thomas Gleixner, LKML Mark McLoughlin wrote: > This is a bit magic, is it worth splitting it out as something like > is_paravirt_environment() ? > Yes, is_paravirt() already exists for this purpose. The code looking at the boot params will only work if we actually booted via the paravirt Linux boot protocol, which Xen doesn't at present. J ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit v2 2008-03-04 15:18 ` Jeremy Fitzhardinge @ 2008-03-04 16:51 ` Alexander van Heukelum 2008-03-04 17:05 ` H. Peter Anvin 2008-03-04 17:11 ` Jeremy Fitzhardinge 0 siblings, 2 replies; 62+ messages in thread From: Alexander van Heukelum @ 2008-03-04 16:51 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Mark McLoughlin, Alexander van Heukelum, Ingo Molnar, Ian Campbell, H. Peter Anvin, Andi Kleen, Thomas Gleixner, LKML On Tue, 04 Mar 2008 07:18:48 -0800, "Jeremy Fitzhardinge" <jeremy@goop.org> said: > Mark McLoughlin wrote: > > This is a bit magic, is it worth splitting it out as something like > > is_paravirt_environment() ? > > > > Yes, is_paravirt() already exists for this purpose. Hi, If it exists, it is well-hidden. A grep for is_paravirt on the testing tree turns up nothing. Did you get the name right? > The code looking at the boot params will only work if we actually booted > via the paravirt Linux boot protocol, which Xen doesn't at present. I chose to code it exactly this way, because it is what is used in head_32.S to choose how to start the kernel. Or is this code not executed at all? [excerpt form head_32.S] cmpw $0x207, pa(boot_params + BP_version) jb default_entry /* Paravirt-compatible boot parameters. Look to see what architecture we're booting under. */ movl pa(boot_params + BP_hardware_subarch), %eax cmpl $num_subarch_entries, %eax jae bad_subarch movl pa(subarch_entries)(,%eax,4), %eax subl $__PAGE_OFFSET, %eax jmp *%eax [and] subarch_entries: .long default_entry /* normal x86/PC */ .long lguest_entry /* lguest hypervisor */ .long xen_entry /* Xen hypervisor */ num_subarch_entries = (. - subarch_entries) / 4 [end] If this is indeed not executed, is there a way to detect whether we can expect the environment to behave like a normal pc in terms of magic addresses, bios areas, isa reserved address space and so on? Greetings, Alexander -- Alexander van Heukelum heukelum@fastmail.fm -- http://www.fastmail.fm - mmm... Fastmail... ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit v2 2008-03-04 16:51 ` Alexander van Heukelum @ 2008-03-04 17:05 ` H. Peter Anvin 2008-03-04 17:11 ` Jeremy Fitzhardinge 1 sibling, 0 replies; 62+ messages in thread From: H. Peter Anvin @ 2008-03-04 17:05 UTC (permalink / raw) To: Alexander van Heukelum Cc: Jeremy Fitzhardinge, Mark McLoughlin, Alexander van Heukelum, Ingo Molnar, Ian Campbell, Andi Kleen, Thomas Gleixner, LKML Alexander van Heukelum wrote: > > If this is indeed not executed, is there a way to detect whether > we can expect the environment to behave like a normal pc in terms > of magic addresses, bios areas, isa reserved address space and so > on? > The subarch stuff (boot_params.hdr.hardware_subarch) is indeed executed, at least with newer PV guests. However, as far as this kind of stuff, one really have to wonder to what extent the PV users really care about how much they perturb the guests. The canonical view of virtualization is that you should perturb your guests as little as possible, but that doesn't really seem to be considered in the observable bits of the PV universe as far as I can tell. A *massive* issue with hooks -- including paravirt_ops -- is that they are largely undocumented both in code and in specification, and usually hard-code the underlying implemenentation at a specific point in time: I have yet to see any sort of specification document for paravirt_ops, and most of the hooks are simply empty on hardware. This means that the burden has shifted onto the kernel maintainers to test every possible paravirt_ops client, because it is quite literally the only way to know when it's broken. I'm starting to feel that the PV clients need to document their environments and their constraints better for the benefit of the core maintainers. -hpa ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit v2 2008-03-04 16:51 ` Alexander van Heukelum 2008-03-04 17:05 ` H. Peter Anvin @ 2008-03-04 17:11 ` Jeremy Fitzhardinge 2008-03-04 18:57 ` [PATCH] reserve end-of-conventional-memory to 1MB, 32-bit, use paravirt_enabled Alexander van Heukelum 1 sibling, 1 reply; 62+ messages in thread From: Jeremy Fitzhardinge @ 2008-03-04 17:11 UTC (permalink / raw) To: Alexander van Heukelum Cc: Mark McLoughlin, Alexander van Heukelum, Ingo Molnar, Ian Campbell, H. Peter Anvin, Andi Kleen, Thomas Gleixner, LKML Alexander van Heukelum wrote: > On Tue, 04 Mar 2008 07:18:48 -0800, "Jeremy Fitzhardinge" > <jeremy@goop.org> said: > >> Mark McLoughlin wrote: >> >>> This is a bit magic, is it worth splitting it out as something like >>> is_paravirt_environment() ? >>> >>> >> Yes, is_paravirt() already exists for this purpose. >> > > Hi, > > If it exists, it is well-hidden. A grep for is_paravirt on the testing > tree turns up nothing. Did you get the name right? > Nope. paravirt_enabled(). >> The code looking at the boot params will only work if we actually booted >> via the paravirt Linux boot protocol, which Xen doesn't at present. >> > > I chose to code it exactly this way, because it is what is used in > head_32.S to choose how to start the kernel. Or is this code not > executed at all? > No, this isn't executed when booting under Xen. Xen is a bit magic, and has its own kernel entrypoint which the domain builder finds via Xen-specific ELF notes on the vmlinux. We plan to move to booting via this path at some point. > [excerpt form head_32.S] > cmpw $0x207, pa(boot_params + BP_version) > jb default_entry > > /* Paravirt-compatible boot parameters. Look to see what > architecture > we're booting under. */ > movl pa(boot_params + BP_hardware_subarch), %eax > cmpl $num_subarch_entries, %eax > jae bad_subarch > > movl pa(subarch_entries)(,%eax,4), %eax > subl $__PAGE_OFFSET, %eax > jmp *%eax > [and] > subarch_entries: > .long default_entry /* normal x86/PC */ > .long lguest_entry /* lguest hypervisor */ > .long xen_entry /* Xen hypervisor */ > num_subarch_entries = (. - subarch_entries) / 4 > [end] > > If this is indeed not executed, is there a way to detect whether > we can expect the environment to behave like a normal pc in terms > of magic addresses, bios areas, isa reserved address space and so > on? > Just because something is paravirtualized and uses a non-PC hardware_subarch doesn't mean this stuff isn't present. Even the paravirt_enabled() test isn't accurate, because the environment may still choose to emulate these things (or in the Xen dom0 case, it may directly expose the real hardware). J ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve end-of-conventional-memory to 1MB, 32-bit, use paravirt_enabled 2008-03-04 17:11 ` Jeremy Fitzhardinge @ 2008-03-04 18:57 ` Alexander van Heukelum 2008-03-04 19:12 ` [PATCH] reserve end-of-conventional-memory to 1MB, 64-bit, " Alexander van Heukelum 0 siblings, 1 reply; 62+ messages in thread From: Alexander van Heukelum @ 2008-03-04 18:57 UTC (permalink / raw) To: Jeremy Fitzhardinge, Ingo Molnar Cc: Alexander van Heukelum, Mark McLoughlin, Ian Campbell, H. Peter Anvin, Andi Kleen, Thomas Gleixner, LKML Jeremy Fitzhardinge pointed out that looking at the boot_params struct to determine if the system is running in a paravirtual environment is not reliable for the Xen case, currently. He also points out that there already exists a function to determine if the system is running in a paravirtual environment. So let's use that instead. This gets rid of the preprocessor test too. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> --- On Tue, Mar 04, 2008 at 09:11:33AM -0800, Jeremy Fitzhardinge wrote: > Alexander van Heukelum wrote: > >On Tue, 04 Mar 2008 07:18:48 -0800, "Jeremy Fitzhardinge" > ><jeremy@goop.org> said: > >>Yes, is_paravirt() already exists for this purpose. > > > >Hi, > > > >If it exists, it is well-hidden. A grep for is_paravirt on the testing > >tree turns up nothing. Did you get the name right? > > > > Nope. paravirt_enabled(). Duh, I should have been able to find that :-/. Thanks for the pointer. > Just because something is paravirtualized and uses a non-PC > hardware_subarch doesn't mean this stuff isn't present. Even the > paravirt_enabled() test isn't accurate, because the environment may > still choose to emulate these things (or in the Xen dom0 case, it may > directly expose the real hardware). paravirt_enabled() returns 0 if CONFIG_PARAVIRT is not set, or if a paravirt-enabled kernel is run on bare hardware: dom0. If that is right, then it is exactly what is needed. Greetings, Alexander diff --git a/arch/x86/kernel/setup_32.c b/arch/x86/kernel/setup_32.c index 20e537b..f595d7b 100644 --- a/arch/x86/kernel/setup_32.c +++ b/arch/x86/kernel/setup_32.c @@ -64,6 +64,7 @@ #include <setup_arch.h> #include <bios_ebda.h> #include <asm/cacheflush.h> +#include <asm/processor.h> /* This value is set up by the early boot code to point to the value immediately after the boot time page tables. It contains a *physical* @@ -408,12 +409,8 @@ static void __init reserve_ebda_region(void) /* that area is absent. We'll just have to assume */ /* that the paravirt case can handle memory setup */ /* correctly, without our help. */ -#ifdef CONFIG_PARAVIRT - if ((boot_params.hdr.version >= 0x207) && - (boot_params.hdr.hardware_subarch != 0)) { + if (paravirt_enabled()) return; - } -#endif /* end of low (conventional) memory */ lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES); ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] reserve end-of-conventional-memory to 1MB, 64-bit, use paravirt_enabled 2008-03-04 18:57 ` [PATCH] reserve end-of-conventional-memory to 1MB, 32-bit, use paravirt_enabled Alexander van Heukelum @ 2008-03-04 19:12 ` Alexander van Heukelum 0 siblings, 0 replies; 62+ messages in thread From: Alexander van Heukelum @ 2008-03-04 19:12 UTC (permalink / raw) To: Jeremy Fitzhardinge, Ingo Molnar Cc: Alexander van Heukelum, Mark McLoughlin, Ian Campbell, H. Peter Anvin, Andi Kleen, Thomas Gleixner, LKML Jeremy Fitzhardinge pointed out that looking at the boot_params struct to determine if the system is running in a paravirtual environment is not reliable for the Xen case, currently. He also points out that there already exists a function to determine if the system is running in a paravirtual environment. So let's use that instead. This gets rid of the preprocessor test too. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> --- Same, for 64 bit. Greetings, Alexander diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c index 269a6b4..48be76c 100644 --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -72,12 +72,8 @@ static void __init reserve_ebda_region(void) /* that area is absent. We'll just have to assume */ /* that the paravirt case can handle memory setup */ /* correctly, without our help. */ -#ifdef CONFIG_PARAVIRT - if ((boot_params.hdr.version >= 0x207) && - (boot_params.hdr.hardware_subarch != 0)) { + if (paravirt_enabled()) return; - } -#endif /* end of low (conventional) memory */ lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES); ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [PATCH] Fix alignment of early reservation for EBDA 2008-02-25 16:54 ` Alexander van Heukelum 2008-02-25 17:01 ` Ingo Molnar @ 2008-02-27 14:25 ` Andi Kleen 1 sibling, 0 replies; 62+ messages in thread From: Andi Kleen @ 2008-02-27 14:25 UTC (permalink / raw) To: Alexander van Heukelum Cc: H. Peter Anvin, Ingo Molnar, Thomas Gleixner, LKML, Alexander van Heukelum Alexander van Heukelum wrote: > On Sun, 24 Feb 2008 18:18:16 -0800, "H. Peter Anvin" <hpa@zytor.com> > said: >> Alexander van Heukelum wrote: >>> early res: 3 [9f000-9ffff] EBDA >>> >>> Is it really necessary to force the allocation to a page boundary? >> It is, but that rounding gets done in reserve_bootmem() anyway, so there >> is no need for the arch-specific code to do it. >> >> The 32-bit EBDA code hard-codes a size of 4K, which is probably equally >> wrong; my gut feel is that the right thing to do is to reserve from the >> EBDA up to the 640K mark (some BIOSes use an area like that for SMM >> stuff), possibly with some sanity checking. It's not needed, the e820 maps are always correct for modern systems in this case as far as I know. > > /* reserve all memory between lowmem and the 1MB mark */ > reserve_early(lowmem, 0x100000, "BIOS reserved"); The i386 kernel did this always, but I intentionally removed it from the 64bit kernel because all the modern BIOS seem to correctly report holes in this area. Only didn't do it on i386 because there were some concerns of very old systems not doing this correctly. My suspicion is that modern Windows systems rely on this, that is why BIOSes typically get it correct now. I think it should be only undone if you have a concrete case where it breaks not just based on someone's gut feel. Sure it's not a lot of memory, but why waste memory unnecessarily? -Andi ^ permalink raw reply [flat|nested] 62+ messages in thread
end of thread, other threads:[~2008-03-05 17:43 UTC | newest] Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2008-02-24 17:46 [PATCH] Fix alignment of early reservation for EBDA Alexander van Heukelum 2008-02-24 19:27 ` Andi Kleen 2008-02-24 19:41 ` Ingo Molnar 2008-02-24 20:53 ` Alexander van Heukelum 2008-02-25 2:18 ` H. Peter Anvin 2008-02-25 16:54 ` Alexander van Heukelum 2008-02-25 17:01 ` Ingo Molnar 2008-02-25 18:07 ` [PATCH] reserve_early end-of-conventional-memory to 1MB Alexander van Heukelum 2008-02-25 18:13 ` H. Peter Anvin 2008-02-25 19:46 ` Alexander van Heukelum 2008-02-25 21:17 ` H. Peter Anvin 2008-02-26 9:30 ` Ingo Molnar 2008-02-27 14:26 ` Andi Kleen 2008-02-27 14:38 ` [PATCH] reserve_early end-of-conventional-memory to 1MB II - some numbers to put it into perspective Andi Kleen 2008-02-27 16:44 ` H. Peter Anvin 2008-02-27 20:01 ` Alexander van Heukelum 2008-02-28 13:13 ` [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit Alexander van Heukelum 2008-02-28 13:28 ` [RFC] use realmode code to reserve end-of-conventional-memory to 1MB Alexander van Heukelum 2008-02-28 21:12 ` Ian Campbell 2008-02-28 21:14 ` H. Peter Anvin 2008-02-28 23:16 ` Ian Campbell 2008-02-29 20:00 ` Ingo Molnar 2008-03-04 11:41 ` Mark McLoughlin 2008-03-04 14:33 ` Ingo Molnar 2008-03-04 15:12 ` Ian Campbell 2008-03-04 15:13 ` Jeremy Fitzhardinge 2008-03-04 15:25 ` Ingo Molnar 2008-03-04 16:02 ` Jeremy Fitzhardinge 2008-03-04 16:15 ` Ingo Molnar 2008-03-04 16:15 ` Mark McLoughlin 2008-03-04 16:23 ` Ingo Molnar 2008-03-04 17:44 ` Jeremy Fitzhardinge 2008-03-05 15:59 ` Eduardo Habkost 2008-03-05 16:08 ` H. Peter Anvin 2008-03-05 16:53 ` Eduardo Habkost 2008-03-05 17:28 ` H. Peter Anvin 2008-03-05 17:28 ` Jeremy Fitzhardinge 2008-03-05 17:38 ` H. Peter Anvin 2008-03-05 16:38 ` Jeremy Fitzhardinge 2008-03-05 17:27 ` H. Peter Anvin 2008-02-28 21:09 ` [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit Ian Campbell 2008-02-29 11:49 ` Alexander van Heukelum 2008-02-29 17:14 ` Mark McLoughlin 2008-02-29 18:38 ` Alexander van Heukelum 2008-02-29 18:44 ` H. Peter Anvin 2008-02-29 18:56 ` Alexander van Heukelum 2008-02-29 22:06 ` Mark McLoughlin 2008-02-29 22:26 ` Alexander van Heukelum 2008-03-01 16:09 ` [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit v2 Alexander van Heukelum 2008-03-01 16:12 ` [PATCH] reserve end-of-conventional-memory to 1MB on 64-bit add-on Alexander van Heukelum 2008-03-04 11:44 ` [PATCH] reserve end-of-conventional-memory to 1MB on 32-bit v2 Mark McLoughlin 2008-03-04 13:31 ` Alexander van Heukelum 2008-03-04 14:49 ` Ingo Molnar 2008-03-04 15:16 ` Mark McLoughlin 2008-03-04 15:24 ` Ingo Molnar 2008-03-04 15:18 ` Jeremy Fitzhardinge 2008-03-04 16:51 ` Alexander van Heukelum 2008-03-04 17:05 ` H. Peter Anvin 2008-03-04 17:11 ` Jeremy Fitzhardinge 2008-03-04 18:57 ` [PATCH] reserve end-of-conventional-memory to 1MB, 32-bit, use paravirt_enabled Alexander van Heukelum 2008-03-04 19:12 ` [PATCH] reserve end-of-conventional-memory to 1MB, 64-bit, " Alexander van Heukelum 2008-02-27 14:25 ` [PATCH] Fix alignment of early reservation for EBDA Andi Kleen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).