From: Vlastimil Babka <vbabka@suse.cz>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Mike Rapoport <rppt@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [GIT PULL] tracing: Fixes to bootconfig memory management
Date: Wed, 15 Sep 2021 11:28:26 +0200
Message-ID: <8a32b437-4cea-f265-b26e-509466d5290b@suse.cz>
In-Reply-To: <CAHk-=wimTmUcYC_BPvwv-48OFwpzJhzrX-_9afk--ND6en81Xg@mail.gmail.com>

On 9/15/21 01:29, Linus Torvalds wrote:
> On Tue, Sep 14, 2021 at 3:48 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> Well, looks like I can't. Commit 77e02cf57b6cf does boot fine for me,
>> multiple times. But so now does the parent commit 6a4746ba06191. Looks like
>> the magic is gone. I'm now surprised how deterministic it was during the
>> bisect (most bad cases manifested on the first boot, only a few on the
>> second).
> 
> Well, your report was clearly memory corruption by the invalid
> memblock_free() just ending up causing random problems later on.
>
> So it could easily be 100% deterministic with a certain memory layout
> at a particular commit. And then enough other changes later, and it's
> all gone, because the memory corruption now hits something else that
> didn't even care.
> 
> The code for your oops was
> 
>    0: 48 8b 17       mov    (%rdi),%rdx
>    3: 48 39 d7       cmp    %rdx,%rdi
>    6: 74 43          je     0x4b
>    8: 48 8b 47 08    mov    0x8(%rdi),%rax
>    c: 48 85 c0       test   %rax,%rax
>    f: 74 23          je     0x34
>   11: 49 89 c0       mov    %rax,%r8
>   14:* 48 8b 40 10   mov    0x10(%rax),%rax <-- trapping instruction
> 
> and that's the start of rb_next(), so what's going on is that
> "rb->rb_right" (the second word of 'struct rb_node') ends up having
> that value in %rax:
> 
>   RAX: 343479726f6d656d
> 
> which is ASCII "44yromem" rather than a valid pointer if I looked that up right.
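
For reference, the offsets in the disassembly above can be checked against the
layout of 'struct rb_node'. The following is only a sketch, assuming the usual
x86-64 layout of three 8-byte words; the Python class and the field name
rb_parent_color are illustrative stand-ins, not kernel code.

```python
import ctypes


class RbNode(ctypes.Structure):
    """Illustrative model of 'struct rb_node' on x86-64 (an assumption).

    Pointers are modeled as plain 64-bit integers so the offsets do not
    depend on the machine running this script.
    """
    _fields_ = [
        ("rb_parent_color", ctypes.c_uint64),  # first word: parent pointer plus color
        ("rb_right", ctypes.c_uint64),         # second word: read by mov 0x8(%rdi),%rax
        ("rb_left", ctypes.c_uint64),          # third word: read by mov 0x10(%rax),%rax
    ]


# Byte offset of each field inside the structure.
offsets = {name: getattr(RbNode, name).offset for name, _ in RbNode._fields_}
print(offsets)
```

Under that assumption, 0x8(%rdi) is node->rb_right, and the trapping load at
0x10(%rax) is the read of ->rb_left through that (corrupted) rb_right value.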

Yep, I was pretty sure it was related to the
"/sys/bus/memory/devices/memory44" sysfs object and that bisection would lead
to kobject/sysfs or some memory-hotplug-related changes. So the result was a
surprise.
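
The link between the RAX value and "memory44" can be reproduced with a short
script. This is a minimal sketch: the helper names are made up for
illustration, and the canonical-address check assumes 48-bit virtual
addressing (4-level page tables).

```python
def register_as_ascii(value, width=8):
    """Return (as_printed, in_memory) ASCII views of a register value.

    as_printed reads the hex digits left to right, as in the oops dump;
    in_memory is the little-endian byte order the string had in RAM.
    """
    as_printed = value.to_bytes(width, "big").decode("ascii", errors="replace")
    in_memory = value.to_bytes(width, "little").decode("ascii", errors="replace")
    return as_printed, in_memory


def is_canonical_x86_64(addr):
    """True if bits 63..47 are all equal (assumes 48-bit virtual addresses)."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


rax = 0x343479726F6D656D  # RAX value from the oops
printed, memory = register_as_ascii(rax)
print(printed, memory, is_canonical_x86_64(rax))
```

Read in memory order the eight bytes spell "memory44", i.e. string data was
sitting where the rb_right pointer was expected.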

> And just _slightly_ different allocation patterns, and your 'struct
> rb_node' gets allocated somewhere else, and you don't see the oops at
> all, or you get it later in some different place.
> 
> Most memory corruption doesn't cause oopses, because most memory isn't
> used as pointers etc.
> 
> What you _could_ try if you care enough is
> 
>  - go back to the thing you bisected to where you can still hopefully
> recreate the problem
> 
>  - apply that patch at that point with no other changes
> 
> and then the test would hopefully be closer to the state where you
> could re-create the problem.
> 
> And hopefully it would still not reproduce, just because the bug is
> fixed, of course ;)

Yeah, that worked! Commit 40caa127f3c7 was still broken, and cherry-picking
77e02cf57b6cf on top fixed it. Thanks!

> The very unlikely alternative is that your bisect was just pure random
> bad luck and hit the wrong commit entirely, and the oops was due to
> some other problem.
> 
> But it does seem unlikely to be something else. Usually when bisects
> go off into the weeds due to not being reproducible, they go very
> obviously off into the weeds rather than point to something that ends
> up having a very similar bug.
> 
>            Linus
> 

