LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Greg KH <gregkh@suse.de>
To: Ingo Molnar <mingo@elte.hu>
Cc: linux-kernel@vger.kernel.org, Yinghai Lu <yhlu.kernel@gmail.com>,
	Jacob Shin <jacob.shin@amd.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Alexander Viro <viro@ftp.linux.org.uk>
Subject: Re: [PATCH 2/5] x86: fix runtime error in arch/x86/kernel/cpu/mcheck/mce_amd_64.c
Date: Mon, 28 Jan 2008 09:37:06 -0800	[thread overview]
Message-ID: <20080128173706.GA17495@suse.de> (raw)
In-Reply-To: <20080128122434.GA24757@elte.hu>

On Mon, Jan 28, 2008 at 01:24:34PM +0100, Ingo Molnar wrote:
> 
> * Greg Kroah-Hartman <gregkh@suse.de> wrote:
> 
> > This problem is due to the kobject rework recently done in this file.
> > 
> > The mce_amd_64.c code uses some wierd forward calls to back out of the 
> > recursive way the code creates kobjects.  Because of this, we need to 
> > verify that we have really created a kobject before calling 
> > kobject_uevent().
> > 
> > Many thanks to Yinghai Lu <yhlu.kernel@gmail.com> for reporting the 
> > problem and testing.
> 
> > -	kobject_uevent(&b->kobj, KOBJ_ADD);
> > +	if (b)
> > +		kobject_uevent(&b->kobj, KOBJ_ADD);
> 
> Acked-by: Ingo Molnar <mingo@elte.hu>
> 
> and please let me insert a minor kobject rant here: i do think it's way 
> too hard to figure out relatively minor-looking kobj bugs like this. It 
> took serious dedication from Yinghai Lu and yourself to nail this down, 
> and we wont always have that much luck in the future.
> 
> CONFIG_DEBUG_KOBJECT is way too feeble and does not actually attempt to 
> catch bugs like this. The effects of the bug were quite serious, and 
> this hasnt been the first time such hard-to-find kobj bugs occured - 
> every one of those incidents was in essence unnecessary.
> 
> Couldnt DEBUG_KOBJECT do better than this - just like we have list 
> debugging, PAGEALLOC and all the other debug checks?

In fact, it is exactly what enabled me to catch this bug.  The fact that
the kobject_uevent() debugging statment did not print out anything, and
oopsed, showed me that this was a simpler problem than I was originally
thinking it was.

Problem was we were passing in 0x18 for a kobject pointer to the
kobject_uevent() function.  Even checking for NULL there would not have
caught this.

I have added a lot more "robustness" checks to the kobject core now, see
the lkml messages about the maple bus for examples of where it is
catching real problems already.  And the kobject debugging code is now
"unified", printing out everything in a standard, easy to understand
manner.

I really don't want to get into adding a "magic value" to a kobject
field, and then checking it for every call, that too would have died on
this kind of bug, just like the debug printk did :)

But if there's anything that you think I can add to make it easier to
understand, please let me know.

The root problem of this bug was us using a goto to call forward into a
major code block within a function.  Not just using a goto for error
cleanup, like the kernel code "normally" does.  Because of that, I
totally missed this code path when reading the function many times.
It's nasty code complexity for no reason at all that causes more
problems than is needed, combined with a total lack of documentation for
how this kobject userspace interface is supposed to be used, that needs
to be fixed.

thanks,

greg k-h

  reply	other threads:[~2008-01-28 17:38 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-27 23:37 [GIT PATCH] driver core fixes against 2.6.24 Greg KH
2008-01-27 23:38 ` [PATCH 1/5] Driver core: Fix up build when CONFIG_BLOCK=N Greg Kroah-Hartman
2008-01-27 23:38 ` [PATCH 2/5] x86: fix runtime error in arch/x86/kernel/cpu/mcheck/mce_amd_64.c Greg Kroah-Hartman
2008-01-28 12:24   ` Ingo Molnar
2008-01-28 17:37     ` Greg KH [this message]
2008-01-28 17:57       ` Ingo Molnar
2008-01-28 18:32         ` Greg KH
2008-01-28 19:01           ` Ingo Molnar
2008-01-28 19:29             ` Cyrill Gorcunov
2008-01-28 19:42             ` Cyrill Gorcunov
2008-01-27 23:38 ` [PATCH 3/5] Module: check to see if we have a built in module with the same name Greg Kroah-Hartman
2008-01-28 23:54   ` Jan Engelhardt
2008-01-29  6:20   ` Rusty Russell
2008-01-27 23:38 ` [PATCH 4/5] Driver core: add bus_find_device_by_name function Greg Kroah-Hartman
2008-01-27 23:38 ` [PATCH 5/5] PPC: Fix powerpc vio_find_name to not use devices_subsys Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080128173706.GA17495@suse.de \
    --to=gregkh@suse.de \
    --cc=jacob.shin@amd.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@ftp.linux.org.uk \
    --cc=yhlu.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).