LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
To: Yang Shi <shy828301@gmail.com>
Cc: "osalvador@suse.de" <osalvador@suse.de>,
	"hughd@google.com" <hughd@google.com>,
	"kirill.shutemov@linux.intel.com"
	<kirill.shutemov@linux.intel.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm: hwpoison: deal with page cache THP
Date: Thu, 26 Aug 2021 06:17:25 +0000	[thread overview]
Message-ID: <20210826061724.GA2864786@hori.linux.bs1.fc.nec.co.jp> (raw)
In-Reply-To: <20210824221322.7663-1-shy828301@gmail.com>

On Tue, Aug 24, 2021 at 03:13:22PM -0700, Yang Shi wrote:
> Currently hwpoison doesn't handle page cache THP, just give up and return
> error.  It is just because the hwpoison THP support was added before
> page cache THP was supported.
> 
> Handling page cache THP is simple, they could be offlined by splitting THP,
> just like anonymous THP.

I think that this patch is not enough to contain an error because page table
entries pointing to subpages in shmem thp are removed during thp splitting.
Then the processes using the file newly allocates another (zeroed) page for
the poisoned address, which results in slient data lost.

According to the comment in unmap_page() at mm/huge_memory.c, file pages are
supposed to be faulted back on demand:

  static void unmap_page(struct page *page)
  ...
          /*
           * Anon pages need migration entries to preserve them, but file
           * pages can simply be left unmapped, then faulted back on demand.
           * If that is ever changed (perhaps for mlock), update remap_page().
           */
          if (PageAnon(page))
                  try_to_migrate(page, ttu_flags);
          else
                  try_to_unmap(page, ttu_flags | TTU_IGNORE_MLOCK);

, but I don't think that faulting back during memory error handling might be
hard because it does not have any direct information about mapping processes
of an error page.  memory_failure() uses try_to_unmap() to find them but
splitting of shmem thp makes this impossible, because it removes the related
page table entries.

There was a discussion about another approach of keeping error pages in page
cache for filesystem without backend storage.
https://lore.kernel.org/lkml/alpine.LSU.2.11.2103111312310.7859@eggly.anvils/
This approach seems to me less complicated, but one concern is that this
change affects user-visible behavior of memory errors.  Keeping error pages
in page cache means that the errors are persistent until next system reboot,
so we might need to define the way to clear the errors to continue to use
the error file.  Current implementation is just to send SIGBUS to the
mapping processes (at least once), then forget about the error, so there is
no such issue.

Another thought of possible solution might be to send SIGBUS immediately when
a memory error happens on a shmem thp. We can find all the mapping processes
before splitting shmem thp, so send SIGBUS first, then split it and contain
the error page.  This is not elegant (giving up any optional actions) but
anyway we can avoid the silent data lost.

- Naoya Horiguchi

> 
> The question is how to distinguish them with allocating and freeing THP
> which can't be handled by hwpoison properly.  It seems page->mapping is a
> good indicator, both anonymous page and file page have it populated, but
> it won't be populated until the page is added to rmap or page cache, in
> other word, instantiated.  If page->mapping is populated it is
> definitely not in allocating or freeing.
> 
> The later get_page_unless_zero() could serialize against page free
> paths.
> 
> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Signed-off-by: Yang Shi <shy828301@gmail.com>
> ---
>  mm/memory-failure.c | 21 ++++++++++++---------
>  1 file changed, 12 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 60df8fcd0444..caa0b0c1f5b8 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1149,13 +1149,16 @@ static int __get_hwpoison_page(struct page *page)
>  
>  	if (PageTransHuge(head)) {
>  		/*
> -		 * Non anonymous thp exists only in allocation/free time. We
> -		 * can't handle such a case correctly, so let's give it up.
> -		 * This should be better than triggering BUG_ON when kernel
> -		 * tries to touch the "partially handled" page.
> +		 * We can't handle allocating or freeing THPs, so let's give
> +		 * it up. This should be better than triggering BUG_ON when
> +		 * kernel tries to touch the "partially handled" page.
> +		 *
> +		 * page->mapping won't be initialized until the page is added
> +		 * to rmap or page cache.  Use this as an indicator for if
> +		 * this is an instantiated page.
>  		 */
> -		if (!PageAnon(head)) {
> -			pr_err("Memory failure: %#lx: non anonymous thp\n",
> +		if (!head->mapping) {
> +			pr_err("Memory failure: %#lx: non instantiated thp\n",
>  				page_to_pfn(page));
>  			return 0;
>  		}
> @@ -1414,12 +1417,12 @@ static int identify_page_state(unsigned long pfn, struct page *p,
>  static int try_to_split_thp_page(struct page *page, const char *msg)
>  {
>  	lock_page(page);
> -	if (!PageAnon(page) || unlikely(split_huge_page(page))) {
> +	if (!page->mapping || unlikely(split_huge_page(page))) {
>  		unsigned long pfn = page_to_pfn(page);
>  
>  		unlock_page(page);
> -		if (!PageAnon(page))
> -			pr_info("%s: %#lx: non anonymous thp\n", msg, pfn);
> +		if (!page->mapping)
> +			pr_info("%s: %#lx: not instantiated thp\n", msg, pfn);
>  		else
>  			pr_info("%s: %#lx: thp split failed\n", msg, pfn);
>  		put_page(page);
> -- 
> 2.26.2
> 

  reply	other threads:[~2021-08-26  6:17 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-24 22:13 Yang Shi
2021-08-26  6:17 ` HORIGUCHI NAOYA(堀口 直也) [this message]
2021-08-26 20:03   ` Yang Shi
2021-08-26 22:03     ` Yang Shi
2021-08-27  3:57       ` HORIGUCHI NAOYA(堀口 直也)
2021-08-27  5:02         ` Yang Shi
2021-08-30 23:44           ` Yang Shi
2021-09-02  3:07             ` HORIGUCHI NAOYA(堀口 直也)
2021-09-02 18:32               ` Yang Shi
2021-09-03 11:53                 ` HORIGUCHI NAOYA(堀口 直也)
2021-09-03 18:01                   ` Yang Shi
2021-09-04  0:03                     ` Yang Shi
2021-09-07 21:34                       ` Yang Shi
2021-09-08  2:50                         ` ##freemail## " HORIGUCHI NAOYA(堀口 直也)
2021-09-08  3:14                           ` Yang Shi
2021-09-08  4:25                             ` HORIGUCHI NAOYA(堀口 直也)
2021-09-09 23:07                               ` Yang Shi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210826061724.GA2864786@hori.linux.bs1.fc.nec.co.jp \
    --to=naoya.horiguchi@nec.com \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=osalvador@suse.de \
    --cc=shy828301@gmail.com \
    --subject='Re: [PATCH] mm: hwpoison: deal with page cache THP' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).