From: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
To: Yang Shi <shy828301@gmail.com>
Cc: "osalvador@suse.de" <osalvador@suse.de>,
	"hughd@google.com" <hughd@google.com>,
	"kirill.shutemov@linux.intel.com"
	<kirill.shutemov@linux.intel.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm: hwpoison: deal with page cache THP
Date: Fri, 27 Aug 2021 03:57:39 +0000
Message-ID: <20210827035739.GA3247360@hori.linux.bs1.fc.nec.co.jp>
In-Reply-To: <CAHbLzkpvR77xfs+ip1T8G09=ixz4Ko3E-6iKTEZkFCfGTxi6Aw@mail.gmail.com>

On Thu, Aug 26, 2021 at 03:03:57PM -0700, Yang Shi wrote:
> On Thu, Aug 26, 2021 at 1:03 PM Yang Shi <shy828301@gmail.com> wrote:
> >
> > On Wed, Aug 25, 2021 at 11:17 PM HORIGUCHI NAOYA(堀口 直也)
> > <naoya.horiguchi@nec.com> wrote:
> > >
> > > On Tue, Aug 24, 2021 at 03:13:22PM -0700, Yang Shi wrote:
...
> > >
> > > There was a discussion about another approach: keeping error pages in the
> > > page cache for filesystems without backing storage.
> > > https://lore.kernel.org/lkml/alpine.LSU.2.11.2103111312310.7859@eggly.anvils/
> > > This approach seems less complicated to me, but one concern is that this
> > > change affects the user-visible behavior of memory errors.  Keeping error
> > > pages in the page cache means that the errors persist until the next system
> > > reboot, so we might need to define a way to clear the errors in order to
> > > keep using the affected file.  The current implementation just sends SIGBUS
> > > to the mapping processes (at least once) and then forgets about the error,
> > > so there is no such issue.
> > >
> > > Another possible solution might be to send SIGBUS immediately when a memory
> > > error happens on a shmem thp. We can find all the mapping processes before
> > > splitting the shmem thp, so we send SIGBUS first, then split it and contain
> > > the error page.  This is not elegant (it gives up any optional actions), but
> > > at least we can avoid the silent data loss.
> >
> > Thanks a lot. I apologize that I didn't notice you had already posted a
> > similar patch before.
> >
> > Yes, I think I focused too much on the soft offline part and missed
> > the uncorrected error part, and I admit I underestimated the
> > problem.
> >
> > I think Hugh's suggestion makes sense if we treat tmpfs as a regular
> > filesystem (just memory backed). AFAIK, some filesystems, e.g. btrfs,
> > may verify a checksum after reading a block from storage and return an
> > error if the checksum does not match, since that may indicate a hardware
> > failure on disk. The syscall then returns an error, or the page fault
> > raises SIGBUS.
> >
> > So in the shmem/tmpfs case, if a hwpoisoned page is encountered, just
> > return an error (-EIO or whatever) for a syscall or SIGBUS for a page
> > fault. That aligns with the behavior of other filesystems. It is definitely
> > the application's responsibility to check the return value of read/write
> > syscalls.
> 
> BTW, IIUC dirty regular page cache pages (storage backed) would be left
> in the page cache too, while clean page cache pages would be truncated
> since they can simply be reread from storage, right?

A dirty page cache page is also removed on error (me_pagecache_dirty() falls
through to me_pagecache_clean(), which then calls truncate_error_page()).
The main purpose of this is to separate the error page from existing
data structures to minimize the risk of later accesses (maybe by race or bug).
But we can change this behavior for specific file systems by updating the
error_remove_page() callback in address_space_operations.
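
To make the call chain above easier to follow, here is a simplified
user-space model of it. This is only a sketch, not the kernel code: every
model_* name is hypothetical, the recorded-error flag is an assumption about
what the dirty path does before falling through, and treating a missing
callback as a failure is a simplification of this model.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-ins; none of these are kernel types. */
struct model_page;

struct model_aops {
	/* Models the per-filesystem error_remove_page() callback. */
	int (*error_remove_page)(struct model_page *page);
};

struct model_page {
	bool in_cache;           /* page is still reachable via the page cache */
	bool io_error_recorded;  /* assumption: dirty path records an error */
	const struct model_aops *aops;
};

enum model_result { MODEL_FAILED, MODEL_RECOVERED };

/* Default policy in this model: drop the page from the cache. */
int model_generic_remove(struct model_page *page)
{
	page->in_cache = false;
	return 0;
}

/* Alternative policy a filesystem could plug in: keep the page cached. */
int model_keep_in_cache(struct model_page *page)
{
	(void)page;
	return 0;
}

/* Models truncate_error_page(): the outcome is decided by the callback. */
enum model_result model_truncate_error_page(struct model_page *page)
{
	/* Simplification of this sketch: no callback means failure. */
	if (!page->aops || !page->aops->error_remove_page)
		return MODEL_FAILED;
	return page->aops->error_remove_page(page) == 0 ?
		MODEL_RECOVERED : MODEL_FAILED;
}

/* Models me_pagecache_clean(): hand the page to the truncation step. */
enum model_result model_pagecache_clean(struct model_page *page)
{
	return model_truncate_error_page(page);
}

/* Models me_pagecache_dirty(): note the error, then reuse the clean path. */
enum model_result model_pagecache_dirty(struct model_page *page)
{
	page->io_error_recorded = true;
	return model_pagecache_clean(page);
}
```

With model_generic_remove as the callback, a dirty page leaves the cache
just like a clean one; with model_keep_in_cache it stays, which is the kind
of per-filesystem change mentioned above.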

Honestly, it seems to me that how dirty data is lost does not depend on the
file system, and I'm still not sure that this is really the right approach
for the current issue.

Thanks,
Naoya Horiguchi
