LKML Archive on lore.kernel.org
From: Vlastimil Babka <vbabka@suse.cz>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
Mike Rapoport <rppt@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@kernel.org>,
Masami Hiramatsu <mhiramat@kernel.org>,
Linux-MM <linux-mm@kvack.org>
Subject: Re: [GIT PULL] tracing: Fixes to bootconfig memory management
Date: Wed, 15 Sep 2021 11:28:26 +0200
Message-ID: <8a32b437-4cea-f265-b26e-509466d5290b@suse.cz>
In-Reply-To: <CAHk-=wimTmUcYC_BPvwv-48OFwpzJhzrX-_9afk--ND6en81Xg@mail.gmail.com>
On 9/15/21 01:29, Linus Torvalds wrote:
> On Tue, Sep 14, 2021 at 3:48 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> Well, looks like I can't. Commit 77e02cf57b6cf does boot fine for me,
>> multiple times. But so now does the parent commit 6a4746ba06191. Looks like
>> the magic is gone. I'm now surprised how deterministic it was during the
>> bisect (most bad cases manifested on the first boot, only a few on the second).
>
> Well, your report was clearly memory corruption by the invalid
> memblock_free() just ending up causing random problems later on.
> So it could easily be 100% deterministic with a certain memory layout
> at a particular commit. And then enough other changes later, and it's
> all gone, because the memory corruption now hits something else that
> didn't even care.
>
> The code for your oops was
>
>    0:   48 8b 17                mov    (%rdi),%rdx
>    3:   48 39 d7                cmp    %rdx,%rdi
>    6:   74 43                   je     0x4b
>    8:   48 8b 47 08             mov    0x8(%rdi),%rax
>    c:   48 85 c0                test   %rax,%rax
>    f:   74 23                   je     0x34
>   11:   49 89 c0                mov    %rax,%r8
>   14:*  48 8b 40 10             mov    0x10(%rax),%rax     <-- trapping instruction
>
> and that's the start of rb_next(), so what's going on is that
> "rb->rb_right" (the second word of 'struct rb_node') ends up having
> that value in %rax:
>
> RAX: 343479726f6d656d
>
> which is ASCII "44yromem" rather than a valid pointer if I looked that up right.
Yep, I was pretty sure it was related to the
"/sys/bus/memory/devices/memory44" sysfs object and that bisection would lead
to kobject/sysfs or some memory-hotplug-related changes. So the result was a
surprise.
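As an illustration of the decoding discussed above, here is a minimal Python
sketch. It assumes a little-endian x86-64 machine and models 'struct rb_node'
as three 8-byte words; the class RbNode and the helpers as_written() and
as_stored() are hypothetical names made up for this example, not kernel code.
It shows why the hex digits read as "44yromem" while the bytes in memory spell
"memory44", and where the 0x8 and 0x10 offsets in the disassembly point.

```python
import ctypes

RAX = 0x343479726F6D656D  # register value from the oops report


class RbNode(ctypes.Structure):
    """Model (an assumption for illustration) of 'struct rb_node' on x86-64."""
    _fields_ = [
        ("rb_parent_color", ctypes.c_uint64),  # first word, offset 0x00
        ("rb_right", ctypes.c_uint64),         # second word, offset 0x08
        ("rb_left", ctypes.c_uint64),          # third word, offset 0x10
    ]


def as_written(value: int) -> str:
    """Decode the hex digits left to right (most significant byte first)."""
    return value.to_bytes(8, "big").decode("ascii")


def as_stored(value: int) -> str:
    """Decode the bytes in the order a little-endian CPU keeps them in memory."""
    return value.to_bytes(8, "little").decode("ascii")


if __name__ == "__main__":
    print(as_written(RAX))  # the string as read off the register dump
    print(as_stored(RAX))   # the string as it sat in the corrupted memory
    print(hex(RbNode.rb_right.offset), hex(RbNode.rb_left.offset))
```

Under these assumptions, the load from 0x8(%rdi) fetches rb_right, and the
trapping load from 0x10(%rax) tries to read rb_left through a "pointer" that
is really the tail of a sysfs name.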
> And just _slightly_ different allocation patterns, and your 'struct
> rb_node' gets allocated somewhere else, and you don't see the oops at
> all, or you get it later in some different place.
>
> Most memory corruption doesn't cause oopses, because most memory isn't
> used as pointers etc.
>
> What you _could_ try if you care enough is
>
> - go back to the commit you bisected to, where you can still hopefully
> recreate the problem
>
> - apply that patch at that point with no other changes
>
> and then the test would hopefully be closer to the state in which you
> could re-create the problem.
>
> And hopefully it would still not reproduce, just because the bug is
> fixed, of course ;)
Yeah, that worked! Commit 40caa127f3c7 was still broken, and cherry-picking
77e02cf57b6cf on top fixed it. Thanks!
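The check described above can be sketched roughly as follows. This is a
hedged illustration rather than the exact procedure used: the helper names
fix_check_commands() and apply_in_tree() are hypothetical, and the sketch
assumes a local kernel git tree in which both abbreviated commit IDs resolve.
Building and booting the resulting kernel is left out.

```python
import subprocess
from typing import List

BROKEN_COMMIT = "40caa127f3c7"  # commit reported above as still broken
FIX_COMMIT = "77e02cf57b6cf"    # fix that was cherry-picked on top of it


def fix_check_commands(broken: str, fix: str) -> List[List[str]]:
    """Return git commands that place only the fix on top of the broken commit."""
    return [
        ["git", "checkout", broken],   # detached HEAD at the broken state
        ["git", "cherry-pick", fix],   # apply the fix with no other changes
    ]


def apply_in_tree(repo_dir: str, commands: List[List[str]]) -> None:
    """Run each command inside repo_dir, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Only print the commands; call apply_in_tree() to actually run them.
    for argv in fix_check_commands(BROKEN_COMMIT, FIX_COMMIT):
        print(" ".join(argv))
```

Keeping everything except the fix identical to the broken state is the point:
the memory layout stays as close as possible to the one that triggered the
oops, so a clean boot is meaningful evidence.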
> The very unlikely alternative is that your bisect was just pure random
> bad luck and hit the wrong commit entirely, and the oops was due to
> some other problem.
>
> But it does seem unlikely to be something else. Usually when bisects
> go off into the weeds due to not being reproducible, they go very
> obviously off into the weeds rather than point to something that ends
> up having a very similar bug.
>
> Linus
>