Netdev Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Luis Chamberlain <firstname.lastname@example.org>
To: "email@example.com" <firstname.lastname@example.org>
Cc: David Laight <David.Laight@aculab.com>,
Douglas Gilbert <email@example.com>,
Hannes Reinecke <firstname.lastname@example.org>,
Subject: Re: [PATCH] kernel/module: add documentation for try_module_get()
Date: Tue, 27 Jul 2021 11:18:03 -0700 [thread overview]
Message-ID: <YQBN2/K4Ne5orgzS@bombadil.infradead.org> (raw)
On Tue, Jul 27, 2021 at 07:46:34PM +0200, email@example.com wrote:
> On Tue, Jul 27, 2021 at 10:30:36AM -0700, Luis Chamberlain wrote:
> > On Sat, Jul 24, 2021 at 12:15:10PM +0000, David Laight wrote:
> > > From: Luis Chamberlain
> > > > Sent: 22 July 2021 23:19
> > > >
> > > > There is quite a bit of tribal knowledge around proper use of
> > > > try_module_get() and that it must be used only in a context which
> > > > can ensure the module won't be gone during the operation. Document
> > > > this little bit of tribal knowledge.
> > > >
> > > ...
> > >
> > > Some typos.
> > >
> > > > +/**
> > > > + * try_module_get - yields to module removal and bumps reference count otherwise
> > > > + * @module: the module we should check for
> > > > + *
> > > > + * This can be used to check if userspace has requested to remove a module,
> > > a module be removed
> > > > + * and if so let the caller give up. Otherwise it takes a reference count to
> > > > + * ensure a request from userspace to remove the module cannot happen.
> > > > + *
> > > > + * Care must be taken to ensure the module cannot be removed during
> > > > + * try_module_get(). This can be done by having another entity other than the
> > > > + * module itself increment the module reference count, or through some other
> > > > + * means which gaurantees the module could not be removed during an operation.
> > > guarantees
> > > > + * An example of this later case is using this call in a sysfs file which the
> > > > + * module created. The sysfs store / read file operation is ensured to exist
> > > ^^^^^^^^^^^^^^^^^^^
> > > Not sure what that is supposed to mean.
> > I'll clarify further. How about:
> > The sysfs store / read file operations are gauranteed to exist using
> > kernfs's active reference (see kernfs_active()).
> But that has nothing to do with module reference counts. kernfs knows
> nothing about modules.
Yes but we are talking about sysfs files which the module creates. So
but inference again, an active reference protects a module.
> > > So there is a potentially horrid race:
> > > The module unload is going to do:
> > > driver_data->module_ref = 0;
> > > and elsewhere there'll be:
> > > ref = driver_data->module_ref;
> > > if (!ref || !try_module_get(ref))
> > > return -error;
> > >
> > > You have to have try_module_get() to allow the module unload
> > > function to sleep.
> > > But the above code still needs a driver lock to ensure the
> > > unload code doesn't race with the try_module_get() and the
> > > 'ref' be invalidated before try_module_get() looks at it.
> > > (eg if an interrupt defers processing.)
> > >
> > > So there can be no 'yielding'.
> > Oh but there is. Consider access to a random sysfs file 'add_new_device'
> > which takes as input a name, for driver foo, and so foo's
> > add_new_foobar_device(name="bar") is called. Unless sysfs file
> > "yields" by using try_module_get() before trying to add a new
> > foo device called "bar", it will essentially be racing with the
> > exit routine of module foo, and depending on how locking is implemented
> > (most drivers get it wrong), this easily leads to crashes.
> > In fact, this documentation patch was motivated by my own solution to a
> > possible deadlock when sysfs is used. Using the same example above, if
> > the same sysfs file uses *any* lock, which is *also* used on the exit
> > routine, you can easily trigger a deadlock. This can happen for example
> > by the lock being obtained by the removal routine, then the sysfs file
> > gets called, waits for the lock to complete, then the module's exit
> > routine starts cleaning up and removing sysfs files, but we won't be
> > able to remove the sysfs file (due to kernefs active reference) until
> > the sysfs file complets, but it cannot complete because the lock is
> > already held.
> > Yes, this is a generic problem. Yes I have proof . Yes, a generic
> > solution has been proposed , and because Greg is not convinced and I
> > need to move on with life, I am suggesting a temporary driver specific
> > solution (to which Greg is still NACK'ing, without even proposing any
> > alternatives) .
> >  https://firstname.lastname@example.org
> >  https://lkml.kernel.org/r/20210401235925.GR4332@42.do-not-panic.com
> >  https://lkml.kernel.org/r/20210723174919.ka3tzyre432uilf7@garbanzo
> My problem with your proposed solution is that it is still racy, you can
> not increment your own module reference count from 0 -> 1 and expect it
> to work properly. You need external code to do that somewhere.
You are not providing *any* proof for this. And even so, I believe I
have clarified as best as possible how a kernfs active reference
implicitly protects the module when we are talking about sysfs files.
> Now trying to tie sysfs files to the modules that own them would be
> nice, but as we have seen, that way lies way too many kernel changes,
It's not a one-liner fix. Yes.
> Hm, maybe. Did we think about this from the kobj_attribute level? If
> we use the "wrapper" logic there and the use of the macros we already
> have for attributes, we might be able to get the module pointer directly
> "for free".
> Did we try that?
That was my hope. I tried that first. Last year in November I determined
kernfs is kobject stupid. But more importantly *neither* are struct device
specific, so neither of them have semantics for modules or even devices.
> this thread has been going on for so long I can't
> remember anymore...
next prev parent reply other threads:[~2021-07-27 18:18 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-22 22:19 Luis Chamberlain
2021-07-22 22:39 ` Stephen Hemminger
2021-07-23 2:33 ` Bart Van Assche
2021-07-24 12:15 ` David Laight
2021-07-27 17:30 ` Luis Chamberlain
2021-07-27 17:46 ` gregkh
2021-07-27 18:18 ` Luis Chamberlain [this message]
2021-07-27 18:38 ` gregkh
2021-07-27 20:54 ` Luis Chamberlain
2021-07-28 8:28 ` David Laight
2021-07-28 13:49 ` Luis Chamberlain
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--subject='Re: [PATCH] kernel/module: add documentation for try_module_get()' \
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).