LKML Archive on lore.kernel.org
From: Yang Shi <shy828301@gmail.com>
To: David Hildenbrand <david@redhat.com>
Cc: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>,
	"Oscar Salvador" <osalvador@suse.de>,
	tdmackey@twitter.com, "Andrew Morton" <akpm@linux-foundation.org>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Linux MM" <linux-mm@kvack.org>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/2] mm: hwpoison: don't drop slab caches for offlining non-LRU page
Date: Mon, 16 Aug 2021 12:37:27 -0700
Message-ID: <CAHbLzkoyYwvGPaoxPKU1dG_riPPqvP+L5QUz38AVvXbD1y3c8g@mail.gmail.com>
In-Reply-To: <08a5ad43-7922-8cf8-31ed-4f6e0c346516@redhat.com>

On Mon, Aug 16, 2021 at 12:15 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 16.08.21 20:09, Yang Shi wrote:
> > In the current implementation of soft offline, if a non-LRU page is
> > encountered, all slab caches are dropped in an attempt to free the
> > page before offlining it.  But if the page is not a slab page, all
> > that effort is wasted.  Even if it is a slab page, there is no
> > guarantee that the page can be freed at all.
>
> ... but there is a chance it could be and the current behavior is
> actually helpful in some setups.

I don't disagree that it is helpful in some cases, but the question
is how likely it is to help and whether it is worth the cost. For a
non-slab page (which is, of course, also non-LRU), dropping slab caches
makes no sense. Even for a slab page, it only helps if the page belongs
to a reclaimable slab cache. And even for a reclaimable slab, dropping
caches can't guarantee that all objects on that page are freed.

IMHO the chance of success is not worth the cost and the side effects,
for example an unusable system.
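The case analysis above can be summarized as a small decision table. The C sketch below is a hypothetical model, not kernel code: `enum page_kind`, `enum drop_slab_outcome` and `drop_slab_outcome_for()` are illustrative names assumed for this example only. It shows that just one of the three cases can benefit from dropping slab caches at all, and even that case is only a "maybe".

```c
#include <assert.h>

/* Hypothetical classification of a page that soft offline could not
 * handle as an LRU page. Illustrative names only, not kernel types. */
enum page_kind {
    PAGE_KIND_NON_SLAB,           /* neither LRU nor slab */
    PAGE_KIND_SLAB_UNRECLAIMABLE, /* slab page that no shrinker frees */
    PAGE_KIND_SLAB_RECLAIMABLE,   /* e.g. a dentry or inode slab page */
};

enum drop_slab_outcome {
    DROP_SLAB_CANNOT_HELP, /* dropping caches cannot free this page */
    DROP_SLAB_MIGHT_HELP,  /* only if every object on the page is freed */
};

static enum drop_slab_outcome drop_slab_outcome_for(enum page_kind kind)
{
    assert(kind <= PAGE_KIND_SLAB_RECLAIMABLE);

    /* Only a reclaimable slab page has any chance, and even then all
     * objects sharing the page must be dropped, which is not guaranteed. */
    return kind == PAGE_KIND_SLAB_RECLAIMABLE ? DROP_SLAB_MIGHT_HELP
                                              : DROP_SLAB_CANNOT_HELP;
}
```

The cost side of the trade-off is the same in every case (all slab caches are dropped), while the benefit is zero in two of the three cases.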

>
> [...]
>
> > The lockup made the machine quite unusable.  It also wiped out most
> > of the working set: reclaimable slab caches were reduced from 12G
> > to 300MB, and page caches decreased from 17G to 4G.
> >
> > But the most disappointing thing is that all this effort doesn't get
> > the page offlined; it just returns:
> >
> > soft_offline: 0x1469f2: unknown non LRU page type 5ffff0000000000 ()
> >
>
> In your example, yes. I had a look at the introducing commit:
> facb6011f399 ("HWPOISON: Add soft page offline support")
>
> "
>      When the page is not free or LRU we try to free pages
>      from slab and other caches. The slab freeing is currently
>      quite dumb and does not try to focus on the specific slab
>      cache which might own the page. This could be potentially
>      improved later.
> "
>
> I wonder, if instead of removing it altogether, we could actually
> improve it as envisioned.
>
> To be precise, for alloc_contig_range() it would also make sense to be
> able to shrink only in a specific physical memory range; this here seems
> to be a similar thing. (actually, alloc_contig_range(), actual memory
> offlining and hw poisoning/soft-offlining have a lot in common)
>
> Unfortunately, the last time I took a brief look at teaching shrinkers
> to be range-aware, it turned out to be a lot of work ... so maybe this
> is really a long term goal to be mitigated in the meantime by disabling
> it, if it turns out to be more of a problem than actually help.

Do you mean a physical page range? Yes, it would need a lot of work.
TBH, I don't think it is feasible for the time being.

The problem is that shrinkers manage slab caches by object rather than
by page. For example, dentry and inode objects (the most heavily used
reclaimable slabs) are linked onto LRU lists, and shrinkers traverse
those lists to free objects. Objects that are close together on an LRU
list are not guaranteed to reside in the same range of physical pages.
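To make the object-versus-page mismatch concrete, here is a toy C model. Everything in it is hypothetical (`struct obj`, `shrink_lru()`, `page_is_free()` and `shrink_range()` are not kernel APIs): reclaimable objects sit on an LRU in access order but live on arbitrary physical pages. A count-based shrink frees objects from the cold end without regard to their page, so a target page survives unless every object on it happens to be freed. The assumed range-aware variant must walk the whole LRU because the list is not ordered by address, and it still fails if any object on the page is in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one reclaimable slab object (e.g. a dentry).
 * lru[0] is the coldest object, which a shrinker scans first. */
struct obj {
    unsigned long pfn; /* physical page the object lives on */
    bool in_use;       /* still referenced, so it cannot be freed */
    bool freed;
};

/* Count-based shrink: scan the first nr_to_scan objects in LRU order
 * and free the unused ones, ignoring which page each one sits on. */
static size_t shrink_lru(struct obj *lru, size_t n, size_t nr_to_scan)
{
    size_t freed = 0;

    for (size_t i = 0; i < n && i < nr_to_scan; i++) {
        if (!lru[i].freed && !lru[i].in_use) {
            lru[i].freed = true;
            freed++;
        }
    }
    return freed;
}

/* A slab page can only be released once every object on it is gone. */
static bool page_is_free(const struct obj *lru, size_t n, unsigned long pfn)
{
    for (size_t i = 0; i < n; i++)
        if (lru[i].pfn == pfn && !lru[i].freed)
            return false;
    return true;
}

/* Hypothetical range-aware shrink: free only unused objects whose page
 * lies in [start_pfn, end_pfn). Because the LRU is not sorted by
 * address, it has to visit every object on the list. */
static size_t shrink_range(struct obj *lru, size_t n,
                           unsigned long start_pfn, unsigned long end_pfn)
{
    size_t freed = 0;

    assert(start_pfn <= end_pfn);
    for (size_t i = 0; i < n; i++) {
        if (lru[i].pfn >= start_pfn && lru[i].pfn < end_pfn &&
            !lru[i].freed && !lru[i].in_use) {
            lru[i].freed = true;
            freed++;
        }
    }
    return freed;
}
```

With objects interleaved across pages 10, 11 and 12, shrinking the three coldest objects frees three objects but no whole page; a range-aware pass can then free page 10, yet page 12 stays pinned by an in-use object. A real implementation would additionally need every cache to map objects back to pages, which is part of why this is a lot of work.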

>
> --
> Thanks,
>
> David / dhildenb
>
