From: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>,
	Oscar Salvador <osalvador@suse.de>,
	Muchun Song <songmuchun@bytedance.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v6 1/2] mm,hwpoison: fix race with hugetlb page allocation
Date: Fri, 13 Aug 2021 06:29:51 +0000
Message-ID: <20210813062951.GA203438@hori.linux.bs1.fc.nec.co.jp>
In-Reply-To: <20210812152548.GA1579021@agluck-desk2.amr.corp.intel.com>

On Thu, Aug 12, 2021 at 08:25:48AM -0700, Luck, Tony wrote:
> On Thu, Aug 12, 2021 at 09:03:04AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> > Sorry for the failures. I think that the following patch (and dependencies)
> > should solve the issue.
> > https://lore.kernel.org/linux-mm/20210614021212.223326-6-nao.horiguchi@gmail.com/.
> > I'll submit the update (maybe the patchset will be smaller by feedbacks)
> > later soon.
>
> I was uncertain about which dependencies you meant. So I followed
> the advice in the cover letter for the patch series containing that
> patch and did:
>
> $ git fetch https://github.com/nhoriguchi/linux hwpoison
>
> This kernel still has some odd issues and the poison page
> did not get unmapped from my test user application.
>
> See git://git.kernel.org/pub/scm/linux/kernel/git/aegl/ras-tools.git
> for my test program. In this case I was just running with default settings
> to inject an error into a user data page and then consume it.
>
> Here's the dmesg output. There are multiple calls to memory_failure()
> because the poison address is signalled both by the memory controller
> (CMCI with UCNA signature) and the DCU (#MC with SRAR signature).

Thanks for the details.

The following dmesg implies that the failure happened in einj_mem_uc,
which has 13 sub-testcases (specified by "testname"). In which one
was the "unknown page" event triggered?

>
> Note that the first message says: "recovery action for unknown page: Ignored"
>
> [   70.331253] EINJ: Error INJection is initialized.
> [   76.949490] process '/aegl/ras-tools/einj_mem_uc' started with executable stack
> [   77.481846] Disabling lock debugging due to kernel taint
> [   77.482004] mce: [Hardware Error]: Machine check events logged
> [   77.487176] mce: Uncorrected hardware memory error in user-access at 7e025e400
> [   77.493225] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
> [   77.508704] {1}[Hardware Error]: event severity: recoverable
> [   77.514361] {1}[Hardware Error]:  Error 0, type: recoverable
> [   77.520011] {1}[Hardware Error]:  fru_text: Card01, ChnG, DIMM0
> [   77.525921] {1}[Hardware Error]:   section_type: memory error
> [   77.531659] {1}[Hardware Error]:   error_status: 0x0000000000000400

According to https://www.kernel.org/doc/html/latest/firmware-guide/acpi/apei/einj.html
this error status means "Platform Uncorrectable non-fatal", so is memory_failure()
called with MF_ACTION_REQUIRED set?

> [   77.537914] {1}[Hardware Error]:   physical_address: 0x00000007e025e400
> [   77.544518] {1}[Hardware Error]:   node: 0 card: 6 module: 0 rank: 1 bank: 8 device: 0 row: 15105 column: 896
> [   77.554503] {1}[Hardware Error]:   error_type: 4, single-symbol chipkill ECC
> [   77.561548] {1}[Hardware Error]:   DIMM location: NODE 3 CPU0_DIMM_G1
> [   77.568135] Memory failure: 0x7e025e: recovery action for unknown page: Ignored
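
For reference, the pfn in the message above follows from the reported
physical address. The snippet below is only a userspace sketch of that
arithmetic, assuming 4 KiB base pages (a page shift of 12, as on x86_64);
the sketch_* names are made up for illustration and are not kernel symbols.

```c
#include <stdint.h>

/* Assumption: 4 KiB base pages, i.e. a page shift of 12 as on x86_64. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Page frame number: the physical address with the in-page offset dropped. */
static inline uint64_t sketch_phys_to_pfn(uint64_t phys)
{
        return phys >> SKETCH_PAGE_SHIFT;
}

/* Byte offset of the address within its page. */
static inline uint64_t sketch_page_offset(uint64_t phys)
{
        return phys & (SKETCH_PAGE_SIZE - 1);
}
```

With physical_address 0x7e025e400 this gives pfn 0x7e025e and offset 0x400,
which agrees with "page:0x7e025e offset:0x400" in the EDAC lines further down.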

This "unknown page" should come from the following line in memory_failure():

   int memory_failure(unsigned long pfn, int flags)
   ...
           if (!(flags & MF_COUNT_INCREASED)) {
                   res = get_hwpoison_page(p, flags);
                   if (!res) {
                           ...  // code for HWPoisonHandlable pages.
                   } else if (res < 0) {
                           action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED); /// HERE
                           res = -EBUSY;
                           goto unlock_mutex;
                   }
           }

This path is chosen when HWPoisonHandlable() returns false in get_hwpoison_page(),
and HWPoisonHandlable() looks like this in the current version:

    static inline bool HWPoisonHandlable(struct page *page)
    {
            return PageLRU(page) || __PageMovable(page) ||
                    PageSlab(page) || PageTable(page) || PageReserved(page);
    }

So I wonder why HWPoisonHandlable() returned false in your case ("to inject
an error into a user data page and then consume it" sounds like a basic testcase,
so the page is expected to be detected by PageLRU() or __PageMovable()).
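
To make the question concrete, here is a rough userspace model of the check
above. It is only a sketch: the SK_* bits are hypothetical stand-ins for the
kernel's page-type tests (they are not real page flag values), and the real
get_hwpoison_page() does more than this single test.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for PageLRU(), __PageMovable(), and so on. */
enum sketch_page_state {
        SK_LRU      = 1u << 0,
        SK_MOVABLE  = 1u << 1,
        SK_SLAB     = 1u << 2,
        SK_TABLE    = 1u << 3,
        SK_RESERVED = 1u << 4,
};

enum sketch_outcome {
        SK_HANDLE,              /* the page type is recognized */
        SK_UNKNOWN_IGNORED,     /* models "unknown page: Ignored" */
};

/* Mirrors the shape of HWPoisonHandlable(): true if any known type matches. */
static inline bool sketch_hwpoison_handlable(unsigned int state)
{
        return (state & (SK_LRU | SK_MOVABLE | SK_SLAB |
                         SK_TABLE | SK_RESERVED)) != 0;
}

/* A page matching none of the tests ends up on the "unknown page" path. */
static inline enum sketch_outcome sketch_classify(unsigned int state)
{
        return sketch_hwpoison_handlable(state) ? SK_HANDLE
                                                : SK_UNKNOWN_IGNORED;
}
```

In this model, sketch_classify(SK_LRU) is SK_HANDLE, while sketch_classify(0)
(a page that matches none of the tests at the moment of the check) is
SK_UNKNOWN_IGNORED, which corresponds to the message in your log.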

I think that we are moving in the direction of no longer taking a refcount on
error pages blindly, to reduce the risk of critical races.  In order to
improve the HWPoisonHandlable() check instead of reverting it, I'd like to
understand the error page's state in the failure case.  Could you try testing
again with a dump_page() call inserted as below?  (I'll try to reproduce it
myself somehow, but it might be hard because I have no machine with firmware
supporting EINJ.)

    @@ -1703,6 +1703,7 @@ int memory_failure(unsigned long pfn, int flags)
                            goto unlock_mutex;
                    } else if (res < 0) {
                            action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED);
    +                       dump_page(p, "hwpoison unknown page");
                            res = -EBUSY;
                            goto unlock_mutex;
                    }

Thanks,
Naoya Horiguchi

> [   77.575445] Memory failure: 0x7e025e: already hardware poisoned
> [   77.627894] EDAC skx MC3: HANDLING MCE MEMORY ERROR
> [   77.633600] EDAC skx MC3: CPU 0: Machine Check Event: 0x0 Bank 25: 0xac00000200a00090
> [   77.633601] EDAC skx MC3: TSC 0x18d57ce28f4f25
> [   77.633602] EDAC skx MC3: ADDR 0x7e025e400
> [   77.633603] EDAC skx MC3: MISC 0x9000201d809c086
> [   77.633603] EDAC skx MC3: PROCESSOR 0:0x606a6 TIME 1628780833 SOCKET 0 APIC 0x0
> [   77.633608] EDAC MC3: 1 UE memory read error on CPU_SrcID#0_MC#3_Chan#0_DIMM#0 (channel:0 slot:0 page:0x7e025e offset:0x400 grain:32 -  err_code:0x00a0:0x0090  SystemAddress:0x7e025e400 ProcessorSocketId:0x0 MemoryControllerId:0x3 ChannelAddress:0xec04bc00 ChannelId:0x0 RankAddress:0x76025c00 PhysicalRankId:0x1 DimmSlotId:0x0 Row:0x3b01 Column:0x380 Bank:0x0 BankGroup:0x2 ChipSelect:0x1 ChipId:0x0)
> [   77.633611] Memory failure: 0x7e025e: Sending SIGBUS to einj_mem_uc:12283 due to hardware memory corruption
> [   77.668827] mce: [Hardware Error]: Machine check events logged
> [   77.678605] Memory failure: 0x7e025e: already hardware poisoned
> [   77.736685] EDAC skx MC3: HANDLING MCE MEMORY ERROR
> [   77.742392] EDAC skx MC3: CPU 0: Machine Check Event: 0x0 Bank 255: 0xb40000000000009f
> [   77.742394] EDAC skx MC3: TSC 0x0
> [   77.742394] EDAC skx MC3: ADDR 0x7e025e400
> [   77.742395] EDAC skx MC3: MISC 0x0
> [   77.742395] EDAC skx MC3: PROCESSOR 0:0x606a6 TIME 1628780833 SOCKET 0 APIC 0x0
> [   77.742397] EDAC MC3: 1 UE memory read error on CPU_SrcID#0_MC#3_Chan#0_DIMM#0 (channel:0 slot:0 page:0x7e025e offset:0x400 grain:32 -  err_code:0x0000:0x009f  SystemAddress:0x7e025e400 ProcessorSocketId:0x0 MemoryControllerId:0x3 ChannelAddress:0xec04bc00 ChannelId:0x0 RankAddress:0x76025c00 PhysicalRankId:0x1 DimmSlotId:0x0 Row:0x3b01 Column:0x380 Bank:0x0 BankGroup:0x2 ChipSelect:0x1 ChipId:0x0)
> [   77.777612] Memory failure: 0x7e025e: already hardware poisoned
>
> -Tony
>
