LKML Archive on lore.kernel.org
From: Baoquan He <bhe@redhat.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org, lcapitulino@redhat.com,
	keescook@chromium.org, tglx@linutronix.de, x86@kernel.org,
	hpa@zytor.com, fanc.fnst@cn.fujitsu.com, yasu.isimatu@gmail.com,
	indou.takao@jp.fujitsu.com, douly.fnst@cn.fujitsu.com
Subject: Re: [PATCH 0/2] x86/boot/KASLR: Skip specified number of 1GB huge pages when do physical randomization
Date: Fri, 18 May 2018 20:14:55 +0800
Message-ID: <20180518121455.GT24627@MiWiFi-R3L-srv>
In-Reply-To: <20180518112836.GS24627@MiWiFi-R3L-srv>

On 05/18/18 at 07:28pm, Baoquan He wrote:
> On 05/18/18 at 10:19am, Ingo Molnar wrote:
> > 
> > * Baoquan He <bhe@redhat.com> wrote:
> > 
> > > OK, I realized my statement above was misleading because I didn't
> > > explain the background clearly. Let me add it:
> > > 
> > > Previously, FJ (Fujitsu) reported the movable_node issue: KASLR may put
> > > the kernel into a movable node, after which that node can no longer be
> > > hot-removed. So we finally planned to solve it by adding a new kernel
> > > parameter:
> > > 
> > > 	kaslr_boot_mem=nn[KMG]@ss[KMG]
> > > 
> > > We want the customer to specify the memory regions that KASLR may use
> > > when randomizing the kernel's physical location.
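To make the proposed syntax concrete, here is a minimal sketch of parsing one kaslr_boot_mem=nn[KMG]@ss[KMG] value. The parameter was only a proposal, so `parse_kaslr_boot_mem()`, `parse_size()` and `struct boot_mem_region` are hypothetical names; `parse_size()` only imitates the K/M/G suffix handling of the kernel's memparse().

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical region type for the proposed parameter. */
struct boot_mem_region {
	uint64_t start;
	uint64_t size;
};

/* Parse a number with an optional K/M/G suffix; *end is left pointing
 * just past what was consumed. */
static uint64_t parse_size(const char *s, char **end)
{
	uint64_t val = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		val <<= 10;
		/* fall through */
	case 'M': case 'm':
		val <<= 10;
		/* fall through */
	case 'K': case 'k':
		val <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return val;
}

/* Parse one "nn[KMG]@ss[KMG]" value (hypothetical kaslr_boot_mem=).
 * Returns false on malformed input. */
static bool parse_kaslr_boot_mem(const char *val, struct boot_mem_region *r)
{
	char *end;

	r->size = parse_size(val, &end);
	if (end == val || *end != '@')
		return false;
	val = end + 1;
	r->start = parse_size(val, &end);
	return end != val && *end == '\0';
}
```

For example, "3G@2G" yields a region starting at 2GB with a size of 3GB, while "3G" alone is rejected because the '@' is missing.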
> > 
> > *WHY* should the "customer" care?
> > 
> > This is a _bug_: movable, hotpluggable zones of physical memory should not be 
> > randomized into.
> 
> Yes, for movable zones, agreed.
> 
> But for huge pages, it's only related to memory layout.
> 
> > 
> > > [...] The kernel must not be put outside of the specified regions, even
> > > though those other regions are also available RAM. For the movable_node
> > > issue, we can list the immovable regions in kaslr_boot_mem=nn[KMG]@ss[KMG].
> > > 
> > > While this hotplug issue was being reviewed, Luiz's team reported this
> > > 1GB huge page regression. I reproduced the bug, found the root cause,
> > > and realized that the kaslr_boot_mem=nn[KMG]@ss[KMG] parameter could fix
> > > it too. E.g. a KVM guest with 4GB RAM has a good 1GB huge page; we can
> > > add "kaslr_boot_mem=1G@0, kaslr_boot_mem=3G@2G" to the kernel command
> > > line, and then the good 1GB region [1G, 2G) won't be considered for
> > > kernel physical randomization.
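Assuming the two regions from that example, a hypothetical check of whether KASLR may place an image of a given size at a given physical address could look like the sketch below. The image must fit entirely inside one allowed region, so nothing can land in [1G, 2G); `kaslr_placement_allowed()` and `allowed[]` are illustrative names only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_1G (1ULL << 30)

struct boot_mem_region {
	uint64_t start;
	uint64_t size;
};

/* kaslr_boot_mem=1G@0 and kaslr_boot_mem=3G@2G; [1G, 2G) is left out. */
static const struct boot_mem_region allowed[] = {
	{ .start = 0,         .size = 1 * SZ_1G },
	{ .start = 2 * SZ_1G, .size = 3 * SZ_1G },
};

/* True if [addr, addr + size) lies completely inside one allowed region. */
static bool kaslr_placement_allowed(uint64_t addr, uint64_t size,
				    const struct boot_mem_region *r, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (addr >= r[i].start &&
		    addr + size <= r[i].start + r[i].size)
			return true;
	}
	return false;
}
```

A 64MB image at 1G + 16M is refused, and so is one starting at 1G - 32M, since it would straddle into the protected gigabyte.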
> > > 
> > > Later, you pointed out that the 'kaslr_boot_mem=' approach requires the
> > > user to specify memory regions manually, which is not good, and
> > > suggested gathering the information and solving both problems in the
> > > KASLR boot code itself. So there are now two separate issues: for the
> > > movable_node issue, we need to get the hotplug information from the
> > > SRAT table and then avoid those regions; for this 1GB huge page issue,
> > > we need to get the information from the kernel command line and then
> > > avoid those pages.
> > > 
> > > This patchset is for the huge page issue only. Since FJ reported the
> > > hotplug issue and has assigned engineers to work on it, I would like to
> > > wait for them to post patches following your suggestion.
> > 
> > All of this is handling it the wrong way about. This is *not* primarily about 
> > KASLR at all, and the user should not be required to specify some weird KASLR 
> > parameters.
> > 
> > This is a basic _memory map enumeration_ problem in both cases:
> > 
> >  - in the hotplug case KASLR doesn't know that it's a movable zone and relocates 
> >    into it,
> 
> Yes, boot-time KASLR doesn't parse the ACPI tables to get hotplug
> information. If we decide to read the SRAT table, we can tell whether a
> memory region is hotpluggable and then avoid it. This would be consistent
> with the code that runs later, after entering the kernel.
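A rough sketch of that SRAT-based filtering, assuming the Memory Affinity Structures have already been located and decoded: the ACPI specification defines bit 0 of their flags as Enabled and bit 1 as Hot Pluggable. `struct srat_mem_entry`, the `SRAT_MEM_*` macros and `immovable_regions()` are illustrative names, not the kernel's ACPI definitions, and the struct is not the raw on-table layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag bits of the ACPI SRAT Memory Affinity Structure. */
#define SRAT_MEM_ENABLED       (1u << 0)
#define SRAT_MEM_HOT_PLUGGABLE (1u << 1)

/* Simplified, already-decoded view of one memory affinity entry. */
struct srat_mem_entry {
	uint64_t base;
	uint64_t length;
	uint32_t flags;
};

/* Copy the enabled, non-hotpluggable entries into out[]; only these would
 * be candidates for boot-time KASLR. Returns the number copied. */
static size_t immovable_regions(const struct srat_mem_entry *in, size_t n,
				struct srat_mem_entry *out)
{
	size_t k = 0;

	for (size_t i = 0; i < n; i++) {
		if (!(in[i].flags & SRAT_MEM_ENABLED))
			continue;	/* entry not in use */
		if (in[i].flags & SRAT_MEM_HOT_PLUGGABLE)
			continue;	/* may be hot-removed later */
		out[k++] = in[i];
	}
	return k;
}
```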
> 
> > 
> >  - and in the KVM case KASLR doesn't know that it's a valuable 1GB page that
> >    shouldn't be broken up.
> > 
> > Note that it's not KASLR specific: if we had some other kernel feature that tried 
> > to allocate a piece of memory from what appears to be perfectly usable generic RAM 
> > we'd have the same problems!
> 
> Hmm, this may not be the case for 1GB huge pages. For 1GB huge pages, the
> bug is this: on a KVM guest with 4GB RAM, when the user adds
> 'default_hugepagesz=1G hugepagesz=1G hugepages=1' to the kernel command
> line, the 1GB huge page is allocated successfully if 'nokaslr' is also
> specified. If 'nokaslr' is removed, i.e. KASLR is enabled, the 1GB huge
> page allocation fails.
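For the boot code to avoid those pages, it would first have to work out from the command line how many 1GB pages were requested. The sketch below assumes a simplified rule: a hugepages= value belongs to the most recent hugepagesz=, or to default_hugepagesz= if none came before it, and a size is recognized only when written literally as "1G". `count_gb_huge_pages()` is a hypothetical helper and does not reproduce hugetlb_nrpages_setup().

```c
#include <stdlib.h>
#include <string.h>

/* Count the 1GB huge pages requested on a (simplified) command line.
 * Lines longer than 511 characters are truncated. */
static unsigned long count_gb_huge_pages(const char *cmdline)
{
	char buf[512];
	char *tok;
	int cur_is_1g = 0, default_is_1g = 0, seen_sz = 0;
	unsigned long total = 0;

	strncpy(buf, cmdline, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';

	for (tok = strtok(buf, " "); tok; tok = strtok(NULL, " ")) {
		if (!strncmp(tok, "default_hugepagesz=", 19)) {
			default_is_1g = !strcmp(tok + 19, "1G");
		} else if (!strncmp(tok, "hugepagesz=", 11)) {
			cur_is_1g = !strcmp(tok + 11, "1G");
			seen_sz = 1;
		} else if (!strncmp(tok, "hugepages=", 10)) {
			/* Attribute the count to the size in effect. */
			int is_1g = seen_sz ? cur_is_1g : default_is_1g;

			if (is_1g)
				total += strtoul(tok + 10, NULL, 10);
		}
	}
	return total;
}
```

With the command line from the bug report, 'default_hugepagesz=1G hugepagesz=1G hugepages=1', this returns 1.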
> 
> In hugetlb_nrpages_setup(), you can see that the current huge page code
> relies on memblock to get 1GB huge pages. Below is the e820 memory map
> from Luiz's bug report. There are in fact two good 1GB huge pages: one is
> [0x40000000, 0x7fffffff], the second is [0x100000000, 0x13fffffff]. By
> default memblock will allocate top-down
> if movable_node is set, then [0x100000000, 0x13fffffff] will be broken
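Sorry, I missed a 'not' in the sentence above. It should read: "By default
memblock will allocate top-down if movable_node is not set". Bottom-up
allocation is only switched on when movable_node is enabled, in setup_arch():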

void __init setup_arch(char **cmdline_p)
{
	/* ... */
#ifdef CONFIG_MEMORY_HOTPLUG
	if (movable_node_is_enabled())
		memblock_set_bottom_up(true);
#endif
	/* ... */
}

> when system initialization reaches hugetlb_nrpages_setup(). So normally
> the huge page code can only get one good 1GB huge page, whether KASLR is
> enabled or not. This is not a bug; it follows from the current huge page
> implementation. In this case, the KASLR boot code can see two good 1GB
> huge pages and try to avoid both. Besides, whether a region is a good 1GB
> huge page is neither recorded in the memory map nor a memory attribute.
> It is determined only by the memory layout and by how memory is being
> used in the running system. If we want to keep all good 1GB huge pages
> untouched, we may need to adjust the current memblock allocation code to
> avoid any possibility of stepping into good 1GB huge pages before huge
> page allocation. However, that is an improvement to the huge page
> implementation and is not related to KASLR.
> 
> [  +0.000000] e820: BIOS-provided physical RAM map:
> [  +0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> [  +0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> [  +0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
> [  +0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
> [  +0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
> [  +0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
> [  +0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
> [  +0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000013fffffff] usable
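Finding the good 1GB pages in that map is plain arithmetic: round each usable range's start up to a 1GB boundary and count the whole 1GB blocks that still end inside the range. A minimal sketch, with the entries transcribed from the log above; `struct e820_entry_sketch` and `find_good_gb_pages()` are hypothetical, and adjacent usable entries are assumed to be already merged.

```c
#include <stddef.h>
#include <stdint.h>

#define SZ_1G (1ULL << 30)

struct e820_entry_sketch {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* inclusive, as printed in the log */
	int usable;
};

static const struct e820_entry_sketch bug_report_map[] = {
	{ 0x0000000000000000ULL, 0x000000000009fbffULL, 1 },
	{ 0x000000000009fc00ULL, 0x000000000009ffffULL, 0 },
	{ 0x00000000000f0000ULL, 0x00000000000fffffULL, 0 },
	{ 0x0000000000100000ULL, 0x00000000bffdffffULL, 1 },
	{ 0x00000000bffe0000ULL, 0x00000000bfffffffULL, 0 },
	{ 0x00000000feffc000ULL, 0x00000000feffffffULL, 0 },
	{ 0x00000000fffc0000ULL, 0x00000000ffffffffULL, 0 },
	{ 0x0000000100000000ULL, 0x000000013fffffffULL, 1 },
};

/* Store the start of every 1GB-aligned 1GB block lying completely inside
 * one usable entry into out[] (at most max). Returns the count. */
static size_t find_good_gb_pages(const struct e820_entry_sketch *map, size_t n,
				 uint64_t *out, size_t max)
{
	size_t found = 0;

	for (size_t i = 0; i < n; i++) {
		if (!map[i].usable)
			continue;
		/* Round the start up to the next 1GB boundary. */
		uint64_t addr = (map[i].start + SZ_1G - 1) & ~(SZ_1G - 1);

		while (addr + SZ_1G - 1 <= map[i].end && found < max) {
			out[found++] = addr;
			addr += SZ_1G;
		}
	}
	return found;
}
```

Run against bug_report_map, this finds exactly the two pages named above, starting at 0x40000000 and 0x100000000; [0x80000000, 0xbfffffff] misses because usable RAM stops at 0xbffdffff.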
> 
> Furthermore, on bare metal with a large amount of memory, e.g. 100GB, if
> the user specifies 'default_hugepagesz=1G hugepagesz=1G hugepages=2' and
> only expects two 1GB huge pages to be reserved, keeping all of those
> dozens of good 1GB huge pages untouched seems like an overreaction.
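One way to avoid only what was asked for, in the spirit of the patchset title, is to carve just the requested number of 1GB-aligned pages out of a usable region and leave the rest to KASLR. This is a hypothetical helper (`carve_gb_pages()`, `struct range`), not the code from the patches; region ends are exclusive here.

```c
#include <stdint.h>

#define SZ_1G (1ULL << 30)

/* Half-open physical address range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Withhold up to *budget 1GB-aligned pages from r, starting at its first
 * 1GB boundary, and write the pieces KASLR may still use to out[]: the
 * head below that boundary and the tail after the withheld pages.
 * Decrements *budget per withheld page. Returns the number of pieces.
 */
static int carve_gb_pages(struct range r, unsigned long *budget,
			  struct range out[2])
{
	uint64_t addr = (r.start + SZ_1G - 1) & ~(SZ_1G - 1);
	int n = 0;

	/* Nothing requested, or no whole aligned 1GB page fits: keep r. */
	if (*budget == 0 || addr + SZ_1G > r.end) {
		out[n++] = r;
		return n;
	}
	if (addr > r.start)
		out[n++] = (struct range){ r.start, addr };
	while (*budget > 0 && addr + SZ_1G <= r.end) {
		addr += SZ_1G;
		(*budget)--;
	}
	if (addr < r.end)
		out[n++] = (struct range){ addr, r.end };
	return n;
}
```

With hugepages=2 and a usable range [4G, 104G), only [4G, 6G) is withheld and KASLR still sees [6G, 104G), instead of losing every good 1GB page on the machine.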
> 
> I'm not sure I understood your point correctly. These are my thoughts on
> the huge page issue; please point out anything that's wrong.
> 
> Thanks
> Baoquan
> > 
> > We need to fix the real root problem, which is lack of knowledge about crucial 
> > attributes of physical memory. Once that knowledge is properly represented at this 
> > early boot stage both KASLR and other memory allocators can make use of it to 
> > avoid those regions.
> > 
> > Thanks,
> > 
> > 	Ingo

Thread overview: 20+ messages
2018-05-16 10:05 [PATCH 0/2] x86/boot/KASLR: Skip specified number of 1GB huge pages when do physical randomization Baoquan He
2018-05-16 10:05 ` [PATCH 1/2] x86/boot/KASLR: Add two functions for 1GB huge pages handling Baoquan He
2018-05-17  3:27   ` Chao Fan
2018-05-17  4:03     ` Baoquan He
2018-05-17  5:53       ` Chao Fan
2018-05-17  6:13         ` Baoquan He
2018-05-17  5:12   ` damian
2018-05-17  5:38     ` Baoquan He
2018-06-21 15:01   ` Ingo Molnar
2018-06-22 12:14     ` Baoquan He
2018-06-24  7:13       ` Ingo Molnar
2018-05-16 10:05 ` [PATCH 2/2] x86/boot/KASLR: Skip specified number of 1GB huge pages when do physical randomization Baoquan He
2018-05-18  7:00 ` [PATCH 0/2] " Ingo Molnar
2018-05-18  7:43   ` Baoquan He
2018-05-18  8:19     ` Ingo Molnar
2018-05-18 11:28       ` Baoquan He
2018-05-18 12:14         ` Baoquan He [this message]
2018-05-23 19:10         ` Luiz Capitulino
2018-05-28  9:54           ` Baoquan He
2018-05-29 13:27             ` Luiz Capitulino
