LKML Archive on lore.kernel.org help / color / mirror / Atom feed
From: Evan Green <evgreen@chromium.org> To: David Hildenbrand <david@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org>, Michal Hocko <mhocko@suse.com>, Pavel Machek <pavel@ucw.cz>, linux-api@vger.kernel.org, Alex Shi <alexs@kernel.org>, Alistair Popple <apopple@nvidia.com>, Johannes Weiner <hannes@cmpxchg.org>, Joonsoo Kim <iamjoonsoo.kim@lge.com>, "Matthew Wilcox (Oracle)" <willy@infradead.org>, Miaohe Lin <linmiaohe@huawei.com>, Minchan Kim <minchan@kernel.org>, Suren Baghdasaryan <surenb@google.com>, Vlastimil Babka <vbabka@suse.cz>, LKML <linux-kernel@vger.kernel.org>, linux-mm@kvack.org Subject: Re: [PATCH v4] mm: Enable suspend-only swap spaces Date: Tue, 27 Jul 2021 09:31:33 -0700 [thread overview] Message-ID: <CAE=gft7567-2Lq7raJKrOpQ8UAvXTFWwPci=_GCRPET3nS=9SA@mail.gmail.com> (raw) In-Reply-To: <6ff28cfe-1107-347b-0327-ad36e256141b@redhat.com> On Tue, Jul 27, 2021 at 5:21 AM David Hildenbrand <david@redhat.com> wrote: > > On 27.07.21 11:48, David Hildenbrand wrote: > > On 27.07.21 02:12, Evan Green wrote: > >> Add a new SWAP_FLAG_HIBERNATE_ONLY that adds a swap region but refuses > >> to allow generic swapping to it. This region can still be wired up for > >> use in suspend-to-disk activities, but will never have regular pages > >> swapped to it. This flag will be passed in by utilities like swapon(8), > >> usage would probably look something like: swapon -o hibernate /dev/sda2. > >> > >> Currently it's not possible to enable hibernation without also enabling > >> generic swap for a given area. One semi-workaround for this is to delay > >> the call to swapon() until just before attempting to hibernate, and then > >> call swapoff() just after hibernate completes. This is somewhat kludgy, > >> and also doesn't really work to keep swap out of the hibernate region. > >> When hibernate begins, it starts by allocating a large chunk of memory > >> for itself. This often ends up forcing a lot of data out into swap. By > >> this time the hibernate region is eligible for generic swap, so swap > >> ends up leaking into the hibernate region even with the workaround. > >> > >> There are a few reasons why usermode might want to be able to > >> exclusively steer swap and hibernate. One reason relates to SSD wearing. > >> Hibernate's endurance and speed requirements are different from swap. > >> It may for instance be advantageous to keep hibernate in primary > >> storage, but put swap in an SLC namespace. These namespaces are faster > >> and have better endurance, but cost 3-4x in terms of capacity. > >> Exclusively steering hibernate and swap enables system designers to > >> accurately partition their storage without either wearing out their > >> primary storage, or overprovisioning their fast swap area. > >> > >> Another reason to allow exclusive steering has to do with security. > >> The requirements for designing systems with resilience against > >> offline attacks are different between swap and hibernate. Swap > >> effectively requires a dictionary of hashes, as pages can be added and > >> removed arbitrarily, whereas hibernate only needs a single hash for the > >> entire image. If you've set up block-level integrity for swap and > >> image-level integrity for hibernate, then allowing swap blocks to > >> possibly leak out to the hibernate region is problematic, since it > >> creates swap pages not protected by any integrity. > >> > >> Swap regions with SWAP_FLAG_HIBERNATE_ONLY set will not appear in > >> /proc/meminfo under SwapTotal and SwapFree, since they are not usable as > >> general swap. These regions do still appear in /proc/swaps. > > > > Right, and they also don't account towards the memory overcommit > > calculations. > > > > Thanks for extending the patch description! No problem, thanks for all the brainwaves directed at this. > > > > [...] > > > >> + if (swap_flags & SWAP_FLAG_HIBERNATE_ONLY) { > >> + if (IS_ENABLED(CONFIG_HIBERNATION)) { > >> + if (swap_flags & ~SWAP_HIBERNATE_ONLY_VALID_FLAGS) > >> + return -EINVAL; > >> + > >> + } else { > >> + return -EINVAL; > >> + } > >> + } > > > > We could do short > > > > if ((swap_flags & SWAP_FLAG_HIBERNATE_ONLY) && > > (!IS_ENABLED(CONFIG_HIBERNATION) || > > (swap_flags & ~SWAP_HIBERNATE_ONLY_VALID_FLAGS))) > > return -EINVAL; > > > > or > > > > if (swap_flags & SWAP_FLAG_HIBERNATE_ONLY)) > > if (!IS_ENABLED(CONFIG_HIBERNATION) || > > (swap_flags & ~SWAP_HIBERNATE_ONLY_VALID_FLAGS)) > > return -EINVAL; > > > >> + > >> if (!capable(CAP_SYS_ADMIN)) > >> return -EPERM; > >> > >> @@ -3335,16 +3366,20 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) > >> if (swap_flags & SWAP_FLAG_PREFER) > >> prio = > >> (swap_flags & SWAP_FLAG_PRIO_MASK) >> SWAP_FLAG_PRIO_SHIFT; > >> + > >> + if (swap_flags & SWAP_FLAG_HIBERNATE_ONLY) > >> + p->flags |= SWP_HIBERNATE_ONLY; > >> enable_swap_info(p, prio, swap_map, cluster_info, frontswap_map); > >> > >> - pr_info("Adding %uk swap on %s. Priority:%d extents:%d across:%lluk %s%s%s%s%s\n", > >> + pr_info("Adding %uk swap on %s. Priority:%d extents:%d across:%lluk %s%s%s%s%s%s\n", > >> p->pages<<(PAGE_SHIFT-10), name->name, p->prio, > >> nr_extents, (unsigned long long)span<<(PAGE_SHIFT-10), > >> (p->flags & SWP_SOLIDSTATE) ? "SS" : "", > >> (p->flags & SWP_DISCARDABLE) ? "D" : "", > >> (p->flags & SWP_AREA_DISCARD) ? "s" : "", > >> (p->flags & SWP_PAGE_DISCARD) ? "c" : "", > >> - (frontswap_map) ? "FS" : ""); > >> + (frontswap_map) ? "FS" : "", > >> + (p->flags & SWP_HIBERNATE_ONLY) ? "H" : ""); > >> > >> mutex_unlock(&swapon_mutex); > >> atomic_inc(&proc_poll_event); > >> > > > > Looks like the cleanest alternative to me, as long as we don't want to > > invent new interfaces. > > > > Acked-by: David Hildenbrand <david@redhat.com> > > > > Pavel just mentioned uswsusp, and I wonder if it would be a possible > alternative to this patch. I think you're right that it would be possible to isolate the hibernate image with uswsusp if you avoid using the SNAPSHOT_*SWAP* ioctls. But I'd expect performance to suffer noticeably, since now every page is making a round trip out to usermode and back. I'd still very much use the HIBERNATE_ONLY flag if it were accepted, I think there's value to it. -Evan
next prev parent reply other threads:[~2021-07-27 16:40 UTC|newest] Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-07-27 0:12 [PATCH v4] mm: Enable suspend-only swap spaces Evan Green 2021-07-27 9:48 ` David Hildenbrand 2021-07-27 12:21 ` David Hildenbrand 2021-07-27 16:31 ` Evan Green [this message] 2021-07-27 21:18 ` Andrew Morton 2021-08-03 18:00 ` Evan Green 2021-08-17 18:32 ` Pavel Machek 2021-07-30 13:15 ` Karel Zak
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to='CAE=gft7567-2Lq7raJKrOpQ8UAvXTFWwPci=_GCRPET3nS=9SA@mail.gmail.com' \ --to=evgreen@chromium.org \ --cc=akpm@linux-foundation.org \ --cc=alexs@kernel.org \ --cc=apopple@nvidia.com \ --cc=david@redhat.com \ --cc=hannes@cmpxchg.org \ --cc=iamjoonsoo.kim@lge.com \ --cc=linmiaohe@huawei.com \ --cc=linux-api@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-mm@kvack.org \ --cc=mhocko@suse.com \ --cc=minchan@kernel.org \ --cc=pavel@ucw.cz \ --cc=surenb@google.com \ --cc=vbabka@suse.cz \ --cc=willy@infradead.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).