LKML Archive on lore.kernel.org
From: Baoquan He <bhe@redhat.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org, lcapitulino@redhat.com,
    keescook@chromium.org, tglx@linutronix.de, x86@kernel.org,
    hpa@zytor.com, fanc.fnst@cn.fujitsu.com, yasu.isimatu@gmail.com,
    indou.takao@jp.fujitsu.com, douly.fnst@cn.fujitsu.com
Subject: Re: [PATCH 0/2] x86/boot/KASLR: Skip specified number of 1GB huge pages when do physical randomization
Date: Fri, 18 May 2018 20:14:55 +0800
Message-ID: <20180518121455.GT24627@MiWiFi-R3L-srv>
In-Reply-To: <20180518112836.GS24627@MiWiFi-R3L-srv>

On 05/18/18 at 07:28pm, Baoquan He wrote:
> On 05/18/18 at 10:19am, Ingo Molnar wrote:
> >
> > * Baoquan He <bhe@redhat.com> wrote:
> >
> > > OK, I realized that what I said above was misleading because I didn't
> > > explain the background clearly. Let me add it:
> > >
> > > Previously, FJ reported the movable_node issue: KASLR may put the
> > > kernel into a movable node, and then that node can no longer be
> > > hot-removed. So we finally planned to solve it by adding a new
> > > kernel parameter:
> > >
> > >     kaslr_boot_mem=nn[KMG]@ss[KMG]
> > >
> > > We want the customer to specify the memory regions that KASLR may
> > > randomize the kernel into.
> >
> > *WHY* should the "customer" care?
> >
> > This is a _bug_: movable, hotpluggable zones of physical memory should not be
> > randomized into.
>
> Yes, for movable zones, agreed.
>
> But for huge pages, it's only related to memory layout.
>
> >
> > > [...] The kernel must not be put outside of the specified regions,
> > > even though that memory is also available RAM. For the movable_node
> > > issue, we can add the immovable regions to
> > > kaslr_boot_mem=nn[KMG]@ss[KMG].
> > >
> > > While this hotplug issue was being reviewed, Luiz's team reported this
> > > 1GB hugepages regression bug. I reproduced the bug, found the root
> > > cause, and then realized that I can use the
> > > kaslr_boot_mem=nn[KMG]@ss[KMG] parameter to fix it too.
> > > E.g. on the KVM guest with 4GB RAM, there is a good 1GB huge page, so
> > > we can add "kaslr_boot_mem=1G@0 kaslr_boot_mem=3G@2G" to the kernel
> > > command line; then the good 1GB region [1G, 2G) won't be considered
> > > for kernel physical randomization.
> > >
> > > Later, you pointed out that the 'kaslr_boot_mem=' approach requires
> > > the user to specify memory regions manually, which is not good, and
> > > suggested getting the needed information and solving both issues in
> > > the KASLR boot code. So there are two issues now: for the movable_node
> > > issue, we need to get hotplug information from the SRAT table and then
> > > avoid those regions; for this 1GB hugepage issue, we need to get
> > > information from the kernel command line and then avoid those regions.
> > >
> > > This patch is for the hugepage issue only. Since FJ reported the hotplug
> > > issue and has assigned engineers to work on it, I would like to wait
> > > for them to post a fix based on your suggestion.
> >
> > All of this is handling it the wrong way about. This is *not* primarily about
> > KASLR at all, and the user should not be required to specify some weird KASLR
> > parameters.
> >
> > This is a basic _memory map enumeration_ problem in both cases:
> >
> >  - in the hotplug case KASLR doesn't know that it's a movable zone and relocates
> >    into it,
>
> Yes, the boot-time KASLR code doesn't parse the ACPI tables to get
> hotplug information yet. If we decide to read the SRAT table, we can
> tell whether a memory region is hotpluggable and then avoid it. This
> would be consistent with what the kernel does later, once the kernel
> proper is running.
>
> >
> >  - and in the KVM case KASLR doesn't know that it's a valuable 1GB page that
> >    shouldn't be broken up.
> >
> > Note that it's not KASLR specific: if we had some other kernel feature that tried
> > to allocate a piece of memory from what appears to be perfectly usable generic RAM
> > we'd have the same problems!
>
> Hmm, this may not be the situation for 1GB huge pages.
> For 1GB huge pages, the bug is: on a KVM guest with 4GB RAM, when the
> user adds 'default_hugepagesz=1G hugepagesz=1G hugepages=1' to the
> kernel command line, the 1GB huge page is allocated successfully if
> 'nokaslr' is also specified. If 'nokaslr' is removed, i.e. KASLR is
> enabled, the 1GB huge page allocation fails.
>
> In hugetlb_nrpages_setup(), you can see that the current huge page code
> relies on memblock to get 1GB huge pages. Below is the e820 memory
> map from Luiz's bug report. In fact there are two good 1GB huge pages:
> one is [0x40000000, 0x7fffffff], the second one is
> [0x100000000, 0x13fffffff]. By default memblock will allocate top-down
> if movable_node is set, then [0x100000000, 0x13fffffff] will be broken
                    ~not

Sorry, I missed a 'not' above: it should read "if movable_node is not
set". memblock is switched to bottom-up allocation only when
movable_node is enabled:

void __init setup_arch(char **cmdline_p)
{
	...
#ifdef CONFIG_MEMORY_HOTPLUG
	if (movable_node_is_enabled())
		memblock_set_bottom_up(true);
#endif
	...
}

> when system initialization reaches hugetlb_nrpages_setup().
> So normally the huge page code can get only one good 1GB huge page,
> whether KASLR is enabled or not. This is not a bug; it is decided by the
> current huge page implementation. In this case, the KASLR boot code can
> see two good 1GB huge pages and try to avoid them. Besides, whether a
> region is a good 1GB huge page is neither described in the memory map
> nor a memory attribute. It is decided only by the memory layout and by
> how memory is used in the running system. If we want to keep all good
> 1GB huge pages untouched, we may need to adjust the current memblock
> allocation code to avoid any chance of allocating from a good 1GB huge
> page before the huge pages are allocated. However, that belongs to
> improving the huge page implementation and is not related to KASLR.
>
> [ +0.000000] e820: BIOS-provided physical RAM map:
> [ +0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> [ +0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> [ +0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
> [ +0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
> [ +0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
> [ +0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
> [ +0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
> [ +0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000013fffffff] usable
>
> Furthermore, on bare metal with a large amount of memory, e.g. 100GB, if
> the user specifies 'default_hugepagesz=1G hugepagesz=1G hugepages=2' and
> only expects two 1GB huge pages to be reserved, keeping all of the tens
> of good 1GB huge pages untouched seems to be an overreaction.
>
> I'm not sure I understand your point correctly; this is my thinking about
> the huge page issue. Please point out anything that is wrong.
>
> Thanks
> Baoquan
>
> >
> > We need to fix the real root problem, which is lack of knowledge about crucial
> > attributes of physical memory. Once that knowledge is properly represented at this
> > early boot stage both KASLR and other memory allocators can make use of it to
> > avoid those regions.
> >
> > Thanks,
> >
> > 	Ingo
Thread overview: 20+ messages
2018-05-16 10:05 [PATCH 0/2] x86/boot/KASLR: Skip specified number of 1GB huge pages when do physical randomization  Baoquan He
2018-05-16 10:05 [PATCH 1/2] x86/boot/KASLR: Add two functions for 1GB huge pages handling  Baoquan He
2018-05-17  3:27 Chao Fan
2018-05-17  4:03 Baoquan He
2018-05-17  5:53 Chao Fan
2018-05-17  6:13 Baoquan He
2018-05-17  5:12 damian
2018-05-17  5:38 Baoquan He
2018-06-21 15:01 Ingo Molnar
2018-06-22 12:14 Baoquan He
2018-06-24  7:13 Ingo Molnar
2018-05-16 10:05 [PATCH 2/2] x86/boot/KASLR: Skip specified number of 1GB huge pages when do physical randomization  Baoquan He
2018-05-18  7:00 [PATCH 0/2] "  Ingo Molnar
2018-05-18  7:43 Baoquan He
2018-05-18  8:19 Ingo Molnar
2018-05-18 11:28 Baoquan He
2018-05-18 12:14 Baoquan He [this message]
2018-05-23 19:10 Luiz Capitulino
2018-05-28  9:54 Baoquan He
2018-05-29 13:27 Luiz Capitulino