LKML Archive on lore.kernel.org
From: Mike Rapoport <rppt@kernel.org>
To: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"keescook@chromium.org" <keescook@chromium.org>,
	"Weiny, Ira" <ira.weiny@intel.com>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	"vbabka@suse.cz" <vbabka@suse.cz>,
	"x86@kernel.org" <x86@kernel.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"rppt@linux.ibm.com" <rppt@linux.ibm.com>,
	"Lutomirski, Andy" <luto@kernel.org>
Subject: Re: [RFC PATCH 0/4] mm/page_alloc: cache pte-mapped allocations
Date: Tue, 24 Aug 2021 16:03:26 +0300
Message-ID: <YSTuHr7d//L3Ysjx@kernel.org> (raw)
In-Reply-To: <1b49324674fd75294625f725c7f074efd8480efc.camel@intel.com>

On Mon, Aug 23, 2021 at 08:02:55PM +0000, Edgecombe, Rick P wrote:
> On Mon, 2021-08-23 at 16:25 +0300, Mike Rapoport wrote:
> >
> > There are use cases that need to remove pages from the direct map, or
> > at least map them with 4K granularity. Whenever this is done, e.g. with
> > the set_memory/set_direct_map APIs, the PUD- and PMD-sized mappings in
> > the direct map are split into smaller pages.
> >
> > To reduce the performance hit caused by the fragmentation of the
> > direct map, it makes sense to group and/or cache the 4K pages removed
> > from the direct map so that the split large pages won't be all over
> > the place.
>
> If you tied this into debug page alloc, you shouldn't need to group the
> pages. Are you thinking this PKS-less page table usage would be a
> security feature or debug time thing?

I consider the PKS-less page table protection an example user of the
grouped pages/pte-mapped cache rather than an actual security feature or
even a debug thing.
With PKS we still have the same trade-off of allocation flexibility vs
direct map fragmentation, and I hoped to focus the discussion on the mm
part of the series rather than on page table protection. Apparently it
didn't work :)

> > == TODOs ==
> >
> > Whenever the pte-mapped cache is being shrunk, it is possible to add
> > some kind of compaction to move all the free pages into PMD-sized
> > chunks, free these chunks at once and restore the large pages in the
> > direct map.
>
> I had made a POC to do this a while back that hooked into the buddy
> code in the page allocator where this coalescing is already happening
> for freed pages. The problem was that most pages that get their direct
> map alias broken end up using a page from the same 2MB page for the
> page table in the split. But then the direct map page table never gets
> freed, so it never can restore the large page when checking the
> allocation page getting freed. Grouping permissioned pages OR page
> tables would resolve that, and it was my plan to try again after
> something like this happened.

This suggests that one global cache won't be good enough, at least for
the case when page tables are taken from that cache.

> Was just an experiment, but can share if you are interested.

Yes, please.

> > == Alternatives ==
> >
> > The current implementation uses a single global cache.
> >
> > Another option is to have per-user caches, e.g. one for the page
> > tables, another for vmalloc etc. This approach provides better control
> > of the permissions of the pages allocated from these caches and allows
> > the user to decide when (if at all) the pages can be accessed, e.g.
> > for cache compaction. The downside of this approach is that it
> > complicates the freeing path.
> > A page allocated from a dedicated cache cannot be freed with
> > put_page()/free_page() etc. but has to be freed with a dedicated API,
> > or there should be some back pointer in struct page so that the page
> > allocator will know what cache this page came from.
>
> This needs to reset the permissions before freeing, so doesn't seem too
> different than freeing them a special way.

Not quite. For instance, when freeing page table pages with mmu_gather,
we can reset the permissions at or near pgtable_pxy_page_dtor() and
continue to the batched free.

> > Yet another possibility is to make the pte-mapped cache a migratetype
> > of its own. Creating a new migratetype would allow higher order
> > allocations of pte-mapped pages, but I don't have enough understanding
> > of page allocator and reclaim internals to estimate the complexity
> > associated with this approach.
>
> I've been thinking about two categories of direct map permission
> usages.
>
> One is limiting the use of the direct map alias when it's not in use
> and the primary alias is getting some other permission. Examples are
> modules, secretmem, xpfo, KVM guest memory unmapping stuff, etc. In
> this case re-allocations can share unmapped pages without doing any
> expensive maintenance and it helps to have one big cache. If you are
> going to convert pages to 4k and cache them, you might as well convert
> them to NP at the time, since it's cheap to restore them or set their
> permission from that state.
>
> Two is setting permissions on the direct map as the only alias to be
> used. This includes this RFC, some PKS usages, but also possibly some
> set_pages_uc() callers and the like. It seems that this category could
> still make use of a big unmapped cache of pages. Just ask for unmapped
> pages and convert them without a flush.
>
> So like something would have a big cache of grouped unmapped pages
> that category one usages could share.
> And then little category two allocators could have their own caches
> that feed on it too. What do you think? This is regardless if they live
> in the page allocator or not.

I can see how a category two cache would use a global unmapped cache. I
would envision that these caches can share the implementation, but there
will be different instances - one for the global cache and another one
(or several) for users that cannot share the global cache for some
reason.

--
Sincerely yours,
Mike.
Thread overview: 27+ messages

2021-08-23 13:25 [RFC PATCH 0/4] mm/page_alloc: cache pte-mapped allocations Mike Rapoport
2021-08-23 13:25 ` [RFC PATCH 1/4] list: Support getting most recent element in list_lru Mike Rapoport
2021-08-23 13:25 ` [RFC PATCH 2/4] list: Support list head not in object for list_lru Mike Rapoport
2021-08-23 13:25 ` [RFC PATCH 3/4] mm/page_alloc: introduce __GFP_PTE_MAPPED flag to allocate pte-mapped pages Mike Rapoport
2021-08-23 20:29 ` Edgecombe, Rick P
2021-08-24 13:02 ` Mike Rapoport
2021-08-24 16:38 ` Edgecombe, Rick P
2021-08-24 16:54 ` Mike Rapoport
2021-08-24 17:23 ` Edgecombe, Rick P
2021-08-24 17:37 ` Mike Rapoport
2021-08-24 16:12 ` Vlastimil Babka
2021-08-25  8:43 ` David Hildenbrand
2021-08-23 13:25 ` [RFC PATCH 4/4] x86/mm: write protect (most) page tables Mike Rapoport
2021-08-23 20:08 ` Edgecombe, Rick P
2021-08-23 23:50 ` Dave Hansen
2021-08-24  3:34 ` Andy Lutomirski
2021-08-25 14:59 ` Dave Hansen
2021-08-24 13:32 ` Mike Rapoport
2021-08-25  8:38 ` David Hildenbrand
2021-08-26  8:02 ` Mike Rapoport
2021-08-26  9:01 ` Vlastimil Babka
[not found] ` <FB6C09CD-9CEA-4FE8-B179-98DB63EBDD68@gmail.com>
2021-08-24  5:34 ` Nadav Amit
2021-08-24 13:36 ` Mike Rapoport
2021-08-23 20:02 ` [RFC PATCH 0/4] mm/page_alloc: cache pte-mapped allocations Edgecombe, Rick P
2021-08-24 13:03 ` Mike Rapoport [this message]
2021-08-24 16:09 ` Vlastimil Babka
2021-08-29  7:06 ` Mike Rapoport