LKML Archive on lore.kernel.org
* [RFC PATCH 0/2] driver core: kick deferred probe from delayed context
@ 2021-08-17 19:00 Pierre-Louis Bossart
  2021-08-17 19:00 ` [RFC PATCH 1/2] driver core: export driver_deferred_probe_trigger() Pierre-Louis Bossart
  2021-08-17 19:00 ` [RFC PATCH 2/2] ASoC: SOF: trigger re-probing of deferred devices from workqueue Pierre-Louis Bossart
  0 siblings, 2 replies; 17+ messages in thread
From: Pierre-Louis Bossart @ 2021-08-17 19:00 UTC (permalink / raw)
  To: alsa-devel
  Cc: tiwai, broonie, vkoul, liam.r.girdwood, Andy Shevchenko,
	Dan Williams, Jason Gunthorpe, Christoph Hellwig,
	Greg Kroah-Hartman, Rafael J . Wysocki, linux-kernel,
	Pierre-Louis Bossart

The deferred probe mechanism uses a successful driver probe/attach as
a trigger to revisit the list of deferred probe devices. This works in
most cases, except when the probe success is not a valid indicator of
resources being available.

In that case, a race condition may occur, where the device/driver core
framework will attempt to probe a device that depends on resources
before those resources are available, resulting in a -EPROBE_DEFER
error and a deferred probe device that will never be initialized.

The example provided in this RFC relies on the probe workqueue used
for the HDaudio support where we simultaneously:
a) need to use request_module()
b) cannot use an async probe due to the use of request_module()
c) cannot block the probe of other drivers
In this example, the deferred probe can be kicked when the workqueue
completes.

The use of request_firmware_nowait() is another conceptual example,
where a domain-specific callback can enable resources *after* the
probe returns, for example by downloading the firmware, booting a
processor and waiting for the processor to be ready for interaction
with the Linux host. In this second example, the deferred probe could
be kicked when the 'cont' callback completes.

This patchset suggests a 7-line change to solve race conditions in
these examples with delayed work.

Discussion:

a) During Intel internal reviews, Andy Shevchenko pointed out another
known issue with deferred probe [1]. This patchset is unrelated and
does not claim to solve the problem raised by Andy.

b) One possible objection is that this patchset does not suppress a
possibly unnecessary round of evaluation of deferred probe devices.
None of us felt it necessary to minimize the occurrences of
-EPROBE_DEFER; the goal is instead to make sure that the device waiting
for resources successfully probes in the end.

c) Another objection might be that the driver core should know about
such dependencies. That would be desirable, but in the cases we've
encountered such dependencies are highly domain-specific and not
necessarily straightforward to describe. There have been multiple
endeavors to improve the description of dependencies; this patchset
focuses only on the deferred probe framework, with an improvement for
the case where the provider of resources makes these resources
available after its probe returns.

[1] https://lore.kernel.org/lkml/20200324175719.62496-1-andriy.shevchenko@linux.intel.com/T/#u

Pierre-Louis Bossart (2):
  driver core: export driver_deferred_probe_trigger()
  ASoC: SOF: trigger re-probing of deferred devices from workqueue

 drivers/base/dd.c             | 3 ++-
 include/linux/device/driver.h | 1 +
 sound/soc/sof/core.c          | 3 +++
 3 files changed, 6 insertions(+), 1 deletion(-)


base-commit: 8d1998893cd5e3488cd95529f60a187e3009d14b
-- 
2.25.1



* [RFC PATCH 1/2] driver core: export driver_deferred_probe_trigger()
  2021-08-17 19:00 [RFC PATCH 0/2] driver core: kick deferred probe from delayed context Pierre-Louis Bossart
@ 2021-08-17 19:00 ` Pierre-Louis Bossart
  2021-08-18  5:44   ` Greg Kroah-Hartman
  2021-08-17 19:00 ` [RFC PATCH 2/2] ASoC: SOF: trigger re-probing of deferred devices from workqueue Pierre-Louis Bossart
  1 sibling, 1 reply; 17+ messages in thread
From: Pierre-Louis Bossart @ 2021-08-17 19:00 UTC (permalink / raw)
  To: alsa-devel
  Cc: tiwai, broonie, vkoul, liam.r.girdwood, Andy Shevchenko,
	Dan Williams, Jason Gunthorpe, Christoph Hellwig,
	Greg Kroah-Hartman, Rafael J . Wysocki, linux-kernel,
	Pierre-Louis Bossart, Geert Uytterhoeven

The premise of the deferred probe implementation is that a successful
driver binding is a proxy for the resources provided by this driver
becoming available. While this is a correct assumption in most of the
cases, there are exceptions to the rule such as

a) the use of request_firmware_nowait(). In this case, the resources
may become available when the 'cont' callback completes, for example
when the firmware needs to be downloaded and executed on a SoC
core or DSP.

b) a split implementation of the probe with a workqueue when one or
more request_module() calls are required: a synchronous probe prevents
other drivers from probing, impacting boot time, and an async probe is
not allowed to avoid a deadlock. This is the case on all Intel audio
platforms, with request_module() being required for the i915 display
audio and HDaudio external codecs.

In these cases, there is no way to notify the deferred probe
infrastructure of the enablement of resources after the driver
binding.

The driver_deferred_probe_trigger() function is currently used
'anytime a driver is successfully bound to a device'. This patch
suggests exporting it so that drivers can kick off re-probing of
deferred devices at the end of deferred processing.

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
---
 drivers/base/dd.c             | 3 ++-
 include/linux/device/driver.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 437cd61343b2..33eca45aa65a 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -171,7 +171,7 @@ static bool driver_deferred_probe_enable = false;
  * changes in the midst of a probe, then deferred processing should be triggered
  * again.
  */
-static void driver_deferred_probe_trigger(void)
+void driver_deferred_probe_trigger(void)
 {
 	if (!driver_deferred_probe_enable)
 		return;
@@ -193,6 +193,7 @@ static void driver_deferred_probe_trigger(void)
 	 */
 	queue_work(system_unbound_wq, &deferred_probe_work);
 }
+EXPORT_SYMBOL_GPL(driver_deferred_probe_trigger);
 
 /**
  * device_block_probing() - Block/defer device's probes
diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
index a498ebcf4993..2eec79d752a9 100644
--- a/include/linux/device/driver.h
+++ b/include/linux/device/driver.h
@@ -240,6 +240,7 @@ extern int driver_deferred_probe_timeout;
 void driver_deferred_probe_add(struct device *dev);
 int driver_deferred_probe_check_state(struct device *dev);
 void driver_init(void);
+void driver_deferred_probe_trigger(void);
 
 /**
  * module_driver() - Helper macro for drivers that don't do anything
-- 
2.25.1



* [RFC PATCH 2/2] ASoC: SOF: trigger re-probing of deferred devices from workqueue
  2021-08-17 19:00 [RFC PATCH 0/2] driver core: kick deferred probe from delayed context Pierre-Louis Bossart
  2021-08-17 19:00 ` [RFC PATCH 1/2] driver core: export driver_deferred_probe_trigger() Pierre-Louis Bossart
@ 2021-08-17 19:00 ` Pierre-Louis Bossart
  2021-08-18 12:07   ` Mark Brown
  1 sibling, 1 reply; 17+ messages in thread
From: Pierre-Louis Bossart @ 2021-08-17 19:00 UTC (permalink / raw)
  To: alsa-devel
  Cc: tiwai, broonie, vkoul, liam.r.girdwood, Andy Shevchenko,
	Dan Williams, Jason Gunthorpe, Christoph Hellwig,
	Greg Kroah-Hartman, Rafael J . Wysocki, linux-kernel,
	Pierre-Louis Bossart, Liam Girdwood, Ranjani Sridharan,
	Kai Vehmanen, Daniel Baluta, Jaroslav Kysela, Takashi Iwai,
	moderated list:SOUND - SOUND OPEN FIRMWARE (SOF) DRIVERS

Audio drivers such as HDaudio legacy and SOF rely on a workqueue to
split the probe into two, with a first pass returning success
immediately, and the second pass taking a lot more time due to the use
of request_module() and the DSP initializations.

This workqueue-based solution helps deal with conflicting requirements
a) other drivers should not be blocked by a long probe
b) a PROBE_PREFER_ASYNCHRONOUS probe_type is explicitly not allowed
to avoid a deadlock when request_module() is used.

This patch makes sure the deferred probe framework is triggered when
the provider of resources successfully completes its initialization.

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
---
 sound/soc/sof/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/sound/soc/sof/core.c b/sound/soc/sof/core.c
index 3e4dd4a86363..cecc0e914807 100644
--- a/sound/soc/sof/core.c
+++ b/sound/soc/sof/core.c
@@ -251,6 +251,9 @@ static int sof_probe_continue(struct snd_sof_dev *sdev)
 
 	sdev->probe_completed = true;
 
+	/* kick-off re-probing of deferred devices */
+	driver_deferred_probe_trigger();
+
 	return 0;
 
 fw_trace_err:
-- 
2.25.1



* Re: [RFC PATCH 1/2] driver core: export driver_deferred_probe_trigger()
  2021-08-17 19:00 ` [RFC PATCH 1/2] driver core: export driver_deferred_probe_trigger() Pierre-Louis Bossart
@ 2021-08-18  5:44   ` Greg Kroah-Hartman
  2021-08-18 11:57     ` Mark Brown
  0 siblings, 1 reply; 17+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-18  5:44 UTC (permalink / raw)
  To: Pierre-Louis Bossart
  Cc: alsa-devel, tiwai, broonie, vkoul, liam.r.girdwood,
	Andy Shevchenko, Dan Williams, Jason Gunthorpe,
	Christoph Hellwig, Rafael J . Wysocki, linux-kernel,
	Geert Uytterhoeven

On Tue, Aug 17, 2021 at 02:00:56PM -0500, Pierre-Louis Bossart wrote:
> The premise of the deferred probe implementation is that a successful
> driver binding is a proxy for the resources provided by this driver
> becoming available. While this is a correct assumption in most of the
> cases, there are exceptions to the rule such as
> 
> a) the use of request_firmware_nowait(). In this case, the resources
> may become available when the 'cont' callback completes, for example
> when the firmware needs to be downloaded and executed on a SoC
> core or DSP.
> 
> b) a split implementation of the probe with a workqueue when one or
> more request_module() calls are required: a synchronous probe prevents
> other drivers from probing, impacting boot time, and an async probe is
> not allowed to avoid a deadlock. This is the case on all Intel audio
> platforms, with request_module() being required for the i915 display
> audio and HDaudio external codecs.
> 
> In these cases, there is no way to notify the deferred probe
> infrastructure of the enablement of resources after the driver
> binding.

Then just wait for it to happen naturally?

> The driver_deferred_probe_trigger() function is currently used
> 'anytime a driver is successfully bound to a device'. This patch
> suggests exporting it so that drivers can kick off re-probing of
> deferred devices at the end of deferred processing.

I really do not want to export this as it will get really messy very
quickly with different drivers/busses attempting to call this.

Either handle it in your driver (why do you have to defer probe at all,
just succeed and move on to register the needed stuff after you are
initialized) or rely on the driver core here.

thanks,

greg k-h


* Re: [RFC PATCH 1/2] driver core: export driver_deferred_probe_trigger()
  2021-08-18  5:44   ` Greg Kroah-Hartman
@ 2021-08-18 11:57     ` Mark Brown
  2021-08-18 13:22       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 17+ messages in thread
From: Mark Brown @ 2021-08-18 11:57 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Pierre-Louis Bossart, alsa-devel, tiwai, vkoul, liam.r.girdwood,
	Andy Shevchenko, Dan Williams, Jason Gunthorpe,
	Christoph Hellwig, Rafael J . Wysocki, linux-kernel,
	Geert Uytterhoeven

On Wed, Aug 18, 2021 at 07:44:39AM +0200, Greg Kroah-Hartman wrote:
> On Tue, Aug 17, 2021 at 02:00:56PM -0500, Pierre-Louis Bossart wrote:

> > In these cases, there is no way to notify the deferred probe
> > infrastructure of the enablement of resources after the driver
> > binding.

> Then just wait for it to happen naturally?

Through what mechanism will it happen naturally?  Deferred probe
currently only does things if things are being registered or if probes
complete.

> > The driver_deferred_probe_trigger() function is currently used
> > 'anytime a driver is successfully bound to a device'. This patch
> > suggests exporting it so that drivers can kick off re-probing of
> > deferred devices at the end of deferred processing.

> I really do not want to export this as it will get really messy very
> quickly with different drivers/busses attempting to call this.

I'm not sure I see the mess here - it's just queueing some work, one of
the things that the workqueue stuff does well is handle things getting
scheduled while they're already queued.  Honestly having understood
their problem I think we need to be adding these calls into all the
resource provider APIs.
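
The point about repeated scheduling can be illustrated with a minimal
userspace sketch. It is not the kernel workqueue implementation: the
model_* names are hypothetical, and the only behaviour borrowed from
the real API is that queue_work() returns false when the work item is
already pending. Under that assumption, a burst of triggers from
several providers collapses into a single pass over the deferred list.

```c
#include <stdbool.h>

/* Userspace model only; the model_* names are hypothetical. */
struct model_work {
	bool pending;       /* set while the item sits on the queue */
	unsigned int runs;  /* how many times the handler has executed */
};

/* Mirrors the queue_work() contract: false if already pending. */
bool model_queue_work(struct model_work *w)
{
	if (w->pending)
		return false;
	w->pending = true;
	return true;
}

/* The worker clears the pending flag, then runs the handler once. */
void model_process_one(struct model_work *w)
{
	if (!w->pending)
		return;
	w->pending = false;
	w->runs++;
}

/* Several callers trigger before the worker gets a chance to run. */
unsigned int model_burst(unsigned int callers)
{
	struct model_work w = { false, 0 };
	unsigned int i;

	for (i = 0; i < callers; i++)
		model_queue_work(&w);
	model_process_one(&w);
	return w.runs;
}
```

In this model, model_burst(5) executes the handler once, which is why
extra callers of the trigger cost at most one redundant evaluation.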

> Either handle it in your driver (why do you have to defer probe at all,
> just succeed and move on to register the needed stuff after you are
> initialized) or rely on the driver core here.

That's exactly what they're doing currently and the driver core isn't
delivering.

Driver A is slow to start up and providing a resource to driver B, this
gets handled in driver A by succeeding immediately and then registering
the resource once the startup has completed.  Unfortunately while that
was happening not only has driver B registered and deferred but the rest
of the probes/defers in the system have completed so the deferred probe
mechanism is idle.  Nothing currently tells the deferred probe mechanism
that a new resource is now available so it never retries the probe of
driver B.  The only way I can see to fix this without modifying the
driver core is to make driver A block during probe but that would at
best slow down boot.

The issue is that the driver core is using drivers completing probe as a
proxy for resources becoming available.  That works most of the time
because most probes are fully synchronous but it breaks down if a
resource provider registers resources outside of probe, we might still
be fine if system boot is still happening and something else probes but
only through luck.
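
The sequence described above can be sketched as a small userspace
model. This is an illustration under stated assumptions, not kernel
code: every model_* name, the fixed-size deferred list and
run_scenario() are hypothetical, and the real driver core schedules
the retry pass on a workqueue instead of running it inline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; none of the model_* names are kernel API. */
#define MODEL_EPROBE_DEFER 517	/* same value as the kernel's EPROBE_DEFER */
#define MAX_DEFERRED 8

struct model_dev {
	const char *name;
	int (*probe)(struct model_dev *dev);
	bool bound;
};

static struct model_dev *deferred[MAX_DEFERRED];
static size_t n_deferred;
static bool resource_ready;	/* set by the provider's delayed work */

/* Driver A: returns success at once, real init happens later. */
static int provider_probe(struct model_dev *dev)
{
	(void)dev;
	return 0;
}

/* Driver B: needs the resource that driver A registers later. */
static int consumer_probe(struct model_dev *dev)
{
	(void)dev;
	return resource_ready ? 0 : -MODEL_EPROBE_DEFER;
}

/* Models the retry pass over the deferred list. */
static void model_deferred_probe_trigger(void)
{
	size_t i, kept = 0;

	for (i = 0; i < n_deferred; i++) {
		struct model_dev *dev = deferred[i];

		if (dev->probe(dev) == 0)
			dev->bound = true;
		else
			deferred[kept++] = dev;	/* still waiting */
	}
	n_deferred = kept;
}

static void model_device_attach(struct model_dev *dev)
{
	int ret = dev->probe(dev);

	if (ret == 0) {
		dev->bound = true;
		model_deferred_probe_trigger();	/* a bind kicks the list */
	} else if (ret == -MODEL_EPROBE_DEFER) {
		assert(n_deferred < MAX_DEFERRED);
		deferred[n_deferred++] = dev;
	}
}

/* Driver A's delayed work; 'kick' selects the proposed behaviour. */
static void provider_delayed_work(bool kick)
{
	resource_ready = true;
	if (kick)
		model_deferred_probe_trigger();
}

/* Returns whether driver B ends up bound. */
bool run_scenario(bool kick)
{
	struct model_dev a = { "provider", provider_probe, false };
	struct model_dev b = { "consumer", consumer_probe, false };

	n_deferred = 0;
	resource_ready = false;
	model_device_attach(&a);	/* binds; deferred list is empty */
	model_device_attach(&b);	/* defers; resource not ready yet */
	provider_delayed_work(kick);	/* no further probe follows */
	return b.bound;
}
```

In this model, run_scenario(false) leaves the consumer unbound because
nothing revisits the deferred list once the provider's delayed work
finishes, while run_scenario(true) binds it.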



* Re: [RFC PATCH 2/2] ASoC: SOF: trigger re-probing of deferred devices from workqueue
  2021-08-17 19:00 ` [RFC PATCH 2/2] ASoC: SOF: trigger re-probing of deferred devices from workqueue Pierre-Louis Bossart
@ 2021-08-18 12:07   ` Mark Brown
  2021-08-18 15:25     ` Pierre-Louis Bossart
  0 siblings, 1 reply; 17+ messages in thread
From: Mark Brown @ 2021-08-18 12:07 UTC (permalink / raw)
  To: Pierre-Louis Bossart
  Cc: alsa-devel, tiwai, vkoul, liam.r.girdwood, Andy Shevchenko,
	Dan Williams, Jason Gunthorpe, Christoph Hellwig,
	Greg Kroah-Hartman, Rafael J . Wysocki, linux-kernel,
	Liam Girdwood, Ranjani Sridharan, Kai Vehmanen, Daniel Baluta,
	Jaroslav Kysela, Takashi Iwai,
	moderated list:SOUND - SOUND OPEN FIRMWARE (SOF) DRIVERS

On Tue, Aug 17, 2021 at 02:00:57PM -0500, Pierre-Louis Bossart wrote:

> +++ b/sound/soc/sof/core.c
> @@ -251,6 +251,9 @@ static int sof_probe_continue(struct snd_sof_dev *sdev)
>  
>  	sdev->probe_completed = true;
>  
> +	/* kick-off re-probing of deferred devices */
> +	driver_deferred_probe_trigger();
> +

I think we should move this into snd_soc_register_component() - the same
issue could occur with any other component, the only other thing I can
see kicking in here is the machine driver registration but that ought to
kick probe itself anyway.  Or is there some other case here?



* Re: [RFC PATCH 1/2] driver core: export driver_deferred_probe_trigger()
  2021-08-18 11:57     ` Mark Brown
@ 2021-08-18 13:22       ` Greg Kroah-Hartman
  2021-08-18 13:48         ` Mark Brown
  0 siblings, 1 reply; 17+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-18 13:22 UTC (permalink / raw)
  To: Mark Brown
  Cc: Pierre-Louis Bossart, alsa-devel, tiwai, vkoul, liam.r.girdwood,
	Andy Shevchenko, Dan Williams, Jason Gunthorpe,
	Christoph Hellwig, Rafael J . Wysocki, linux-kernel,
	Geert Uytterhoeven

On Wed, Aug 18, 2021 at 12:57:36PM +0100, Mark Brown wrote:
> On Wed, Aug 18, 2021 at 07:44:39AM +0200, Greg Kroah-Hartman wrote:
> > On Tue, Aug 17, 2021 at 02:00:56PM -0500, Pierre-Louis Bossart wrote:
> 
> > > In these cases, there is no way to notify the deferred probe
> > > infrastructure of the enablement of resources after the driver
> > > binding.
> 
> > Then just wait for it to happen naturally?
> 
> Through what mechanism will it happen naturally?  Deferred probe
> currently only does things if things are being registered or if probes
> complete.
> 
> > > The driver_deferred_probe_trigger() function is currently used
> > > 'anytime a driver is successfully bound to a device'. This patch
> > > suggests exporting it so that drivers can kick off re-probing of
> > > deferred devices at the end of deferred processing.
> 
> > I really do not want to export this as it will get really messy very
> > quickly with different drivers/busses attempting to call this.
> 
> I'm not sure I see the mess here - it's just queueing some work, one of
> the things that the workqueue stuff does well is handle things getting
> scheduled while they're already queued.  Honestly having understood
> their problem I think we need to be adding these calls into all the
> resource provider APIs.
> 
> > Either handle it in your driver (why do you have to defer probe at all,
> > just succeed and move on to register the needed stuff after you are
> > initialized) or rely on the driver core here.
> 
> That's exactly what they're doing currently and the driver core isn't
> delivering.
> 
> Driver A is slow to start up and providing a resource to driver B, this
> gets handled in driver A by succeeding immediately and then registering
> the resource once the startup has completed.  Unfortunately while that
> was happening not only has driver B registered and deferred but the rest
> of the probes/defers in the system have completed so the deferred probe
> mechanism is idle.  Nothing currently tells the deferred probe mechanism
> that a new resource is now available so it never retries the probe of
> driver B.  The only way I can see to fix this without modifying the
> driver core is to make driver A block during probe but that would at
> best slow down boot.
> 
> The issue is that the driver core is using drivers completing probe as a
> proxy for resources becoming available.  That works most of the time
> because most probes are fully synchronous but it breaks down if a
> resource provider registers resources outside of probe, we might still
> be fine if system boot is still happening and something else probes but
> only through luck.

The driver core is not using that as a proxy, that is up to the driver
itself or not.  All probe means is "yes, this driver binds to this
device, thank you!" for that specific bus/class type.  That's all, if
the driver needs to go off and do real work before it can properly
control the device, wonderful, have it go and do that async.

So if you know you should be binding to the device, great, kick off some
other work and return success from probe.  There's no reason you have to
delay or defer for no good reason, right?

But yes, if you do get new resources, the probe should be called again,
that's what the deferred logic is for (or is that the link logic, I
can't recall).  This shouldn't be a new thing, no need to call the
driver core directly like this at all, it should "just happen", right?

thanks,

greg k-h


* Re: [RFC PATCH 1/2] driver core: export driver_deferred_probe_trigger()
  2021-08-18 13:22       ` Greg Kroah-Hartman
@ 2021-08-18 13:48         ` Mark Brown
  2021-08-18 14:51           ` Pierre-Louis Bossart
  0 siblings, 1 reply; 17+ messages in thread
From: Mark Brown @ 2021-08-18 13:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Pierre-Louis Bossart, alsa-devel, tiwai, vkoul, liam.r.girdwood,
	Andy Shevchenko, Dan Williams, Jason Gunthorpe,
	Christoph Hellwig, Rafael J . Wysocki, linux-kernel,
	Geert Uytterhoeven

On Wed, Aug 18, 2021 at 03:22:19PM +0200, Greg Kroah-Hartman wrote:
> On Wed, Aug 18, 2021 at 12:57:36PM +0100, Mark Brown wrote:

> > The issue is that the driver core is using drivers completing probe as a
> > proxy for resources becoming available.  That works most of the time
> > because most probes are fully synchronous but it breaks down if a
> > resource provider registers resources outside of probe, we might still
> > be fine if system boot is still happening and something else probes but
> > only through luck.

> The driver core is not using that as a proxy, that is up to the driver
> itself or not.  All probe means is "yes, this driver binds to this
> device, thank you!" for that specific bus/class type.  That's all, if
> the driver needs to go off and do real work before it can properly
> control the device, wonderful, have it go and do that async.

Right, which is what is happening here - but the deferred probe
machinery in the core is reading more into the probe succeeding than it
should.

> So if you know you should be binding to the device, great, kick off some
> other work and return success from probe.  There's no reason you have to
> delay or defer for no good reason, right?

The driver that's deferring isn't the one that takes a long time to
probe - the driver that's deferring depends on the driver that takes a
long time to probe, it defers because the resource it needs isn't
available when it tries to probe as the slow device is still doing its
thing asynchronously.  The problem is that the driver core isn't going
back and attempting to probe the deferred device again once the driver
that took a long time has provided resources.

> But yes, if you do get new resources, the probe should be called again,
> that's what the deferred logic is for (or is that the link logic, I
> can't recall).  This shouldn't be a new thing, no need to call the
> driver core directly like this at all, it should "just happen", right?

How specifically does new resources becoming available directly cause
a new probe deferral run at the moment?  I can't see anything that
resource provider APIs are doing to say that a new resource has become
available, this patch is trying to provide something they can do.



* Re: [RFC PATCH 1/2] driver core: export driver_deferred_probe_trigger()
  2021-08-18 13:48         ` Mark Brown
@ 2021-08-18 14:51           ` Pierre-Louis Bossart
  2021-08-18 14:59             ` Dan Williams
  2021-08-18 15:28             ` Greg Kroah-Hartman
  0 siblings, 2 replies; 17+ messages in thread
From: Pierre-Louis Bossart @ 2021-08-18 14:51 UTC (permalink / raw)
  To: Mark Brown, Greg Kroah-Hartman
  Cc: alsa-devel, Rafael J . Wysocki, tiwai, linux-kernel,
	liam.r.girdwood, vkoul, Geert Uytterhoeven, Jason Gunthorpe,
	Dan Williams, Andy Shevchenko, Christoph Hellwig



>>> The issue is that the driver core is using drivers completing probe as a
>>> proxy for resources becoming available.  That works most of the time
>>> because most probes are fully synchronous but it breaks down if a
>>> resource provider registers resources outside of probe, we might still
>>> be fine if system boot is still happening and something else probes but
>>> only through luck.
> 
>> The driver core is not using that as a proxy, that is up to the driver
>> itself or not.  All probe means is "yes, this driver binds to this
>> device, thank you!" for that specific bus/class type.  That's all, if
>> the driver needs to go off and do real work before it can properly
>> control the device, wonderful, have it go and do that async.
> 
> Right, which is what is happening here - but the deferred probe
> machinery in the core is reading more into the probe succeeding than it
> should.

I think Greg was referring to the use of the PROBE_PREFER_ASYNCHRONOUS
probe type. We tried just that and got a nice WARN_ON because we are
using request_module() to deal with HDaudio codecs. The details are in
[1] but the kernel code is unambiguous...

	/*
	 * We don't allow synchronous module loading from async.  Module
	 * init may invoke async_synchronize_full() which will end up
	 * waiting for this task which already is waiting for the module
	 * loading to complete, leading to a deadlock.
	 */
	WARN_ON_ONCE(wait && current_is_async());


The reason we use a workqueue is that we are otherwise painted into
a corner by conflicting requirements.

a) we have to use request_module()
b) we cannot use the async probe because of the request_module()
c) we have to avoid blocking on boot

I understand the resistance to exporting this function, no one in our
team was really happy about it, but no one could find an alternate
solution. If there is something better, I am all ears.

Thanks
-Pierre

[1] https://github.com/thesofproject/linux/pull/3079


* Re: [RFC PATCH 1/2] driver core: export driver_deferred_probe_trigger()
  2021-08-18 14:51           ` Pierre-Louis Bossart
@ 2021-08-18 14:59             ` Dan Williams
  2021-08-18 15:28             ` Greg Kroah-Hartman
  1 sibling, 0 replies; 17+ messages in thread
From: Dan Williams @ 2021-08-18 14:59 UTC (permalink / raw)
  To: Pierre-Louis Bossart
  Cc: Mark Brown, Greg Kroah-Hartman, alsa-devel, Rafael J . Wysocki,
	Takashi Iwai, Linux Kernel Mailing List, Liam Girdwood,
	Vinod Koul, Geert Uytterhoeven, Jason Gunthorpe, Andy Shevchenko,
	Christoph Hellwig

On Wed, Aug 18, 2021 at 7:52 AM Pierre-Louis Bossart
<pierre-louis.bossart@linux.intel.com> wrote:
>
>
>
> >>> The issue is that the driver core is using drivers completing probe as a
> >>> proxy for resources becoming available.  That works most of the time
> >>> because most probes are fully synchronous but it breaks down if a
> >>> resource provider registers resources outside of probe, we might still
> >>> be fine if system boot is still happening and something else probes but
> >>> only through luck.
> >
> >> The driver core is not using that as a proxy, that is up to the driver
> >> itself or not.  All probe means is "yes, this driver binds to this
> >> device, thank you!" for that specific bus/class type.  That's all, if
> >> the driver needs to go off and do real work before it can properly
> >> control the device, wonderful, have it go and do that async.
> >
> > Right, which is what is happening here - but the deferred probe
> > machinery in the core is reading more into the probe succeeding than it
> > should.
>
> I think Greg was referring to the use of the PROBE_PREFER_ASYNCHRONOUS
> probe type. We tried just that and got a nice WARN_ON because we are
> using request_module() to deal with HDaudio codecs. The details are in
> [1] but the kernel code is unambiguous...
>
>         /*
>          * We don't allow synchronous module loading from async.  Module
>          * init may invoke async_synchronize_full() which will end up
>          * waiting for this task which already is waiting for the module
>          * loading to complete, leading to a deadlock.
>          */
>         WARN_ON_ONCE(wait && current_is_async());
>
>
> The reason we use a workqueue is that we are otherwise painted into
> a corner by conflicting requirements.
>
> a) we have to use request_module()
> b) we cannot use the async probe because of the request_module()
> c) we have to avoid blocking on boot
>
> I understand the resistance to exporting this function, no one in our
> team was really happy about it, but no one could find an alternate
> solution. If there is something better, I am all ears.

Additionally you mentioned that the consumer is unknown to the
producer, so you are not able, for example, to use the newly exported
device_driver_attach() to directly trigger the unblocked dependency.


* Re: [RFC PATCH 2/2] ASoC: SOF: trigger re-probing of deferred devices from workqueue
  2021-08-18 12:07   ` Mark Brown
@ 2021-08-18 15:25     ` Pierre-Louis Bossart
  0 siblings, 0 replies; 17+ messages in thread
From: Pierre-Louis Bossart @ 2021-08-18 15:25 UTC (permalink / raw)
  To: Mark Brown
  Cc: alsa-devel, Kai Vehmanen, Rafael J . Wysocki, tiwai,
	Greg Kroah-Hartman, Takashi Iwai, linux-kernel, Liam Girdwood,
	liam.r.girdwood, vkoul, Ranjani Sridharan, Jason Gunthorpe,
	Dan Williams, Andy Shevchenko, Daniel Baluta, Christoph Hellwig,
	moderated list:SOUND - SOUND OPEN FIRMWARE (SOF) DRIVERS



On 8/18/21 7:07 AM, Mark Brown wrote:
> On Tue, Aug 17, 2021 at 02:00:57PM -0500, Pierre-Louis Bossart wrote:
> 
>> +++ b/sound/soc/sof/core.c
>> @@ -251,6 +251,9 @@ static int sof_probe_continue(struct snd_sof_dev *sdev)
>>  
>>  	sdev->probe_completed = true;
>>  
>> +	/* kick-off re-probing of deferred devices */
>> +	driver_deferred_probe_trigger();
>> +
> 
> I think we should move this into snd_soc_register_component() - the same
> issue could occur with any other component, the only other thing I can
> see kicking in here is the machine driver registration but that ought to
> kick probe itself anyway.  Or is there some other case here?

Thanks for the suggestion Mark, it would be more consistent indeed to
kick a re-evaluation of the deferred probe list when ASoC components are
successfully registered with something like this:

diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
index c830e96afba2..9d6feea7719c 100644
--- a/sound/soc/soc-core.c
+++ b/sound/soc/soc-core.c
@@ -2677,7 +2677,14 @@ int snd_soc_register_component(struct device *dev,
        if (ret < 0)
                return ret;

-       return snd_soc_add_component(component, dai_drv, num_dai);
+       ret = snd_soc_add_component(component, dai_drv, num_dai);
+       if (ret < 0)
+               return ret;
+
+       /* kick-off re-probing of deferred devices */
+       driver_deferred_probe_trigger();
+
+       return 0;
 }
 EXPORT_SYMBOL_GPL(snd_soc_register_component);

In the case of this SOF driver, it'd be completely equivalent to what
this patch suggested, since snd_soc_register_component() is the last
thing we do in the workqueue.

In the case of 'regular' drivers, the component registration is
typically done last as well before the end of the probe. This would
result in 2 evaluations (one on successful ASoC component registration
and one on successful probe), and maybe on the second evaluation there's
nothing to do.

I can't think of any negative side-effects.


* Re: [RFC PATCH 1/2] driver core: export driver_deferred_probe_trigger()
  2021-08-18 14:51           ` Pierre-Louis Bossart
  2021-08-18 14:59             ` Dan Williams
@ 2021-08-18 15:28             ` Greg Kroah-Hartman
  2021-08-18 15:53               ` Pierre-Louis Bossart
  1 sibling, 1 reply; 17+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-18 15:28 UTC (permalink / raw)
  To: Pierre-Louis Bossart
  Cc: Mark Brown, alsa-devel, Rafael J . Wysocki, tiwai, linux-kernel,
	liam.r.girdwood, vkoul, Geert Uytterhoeven, Jason Gunthorpe,
	Dan Williams, Andy Shevchenko, Christoph Hellwig

On Wed, Aug 18, 2021 at 09:51:51AM -0500, Pierre-Louis Bossart wrote:
> 
> 
> >>> The issue is that the driver core is using drivers completing probe as a
> >>> proxy for resources becoming available.  That works most of the time
> >>> because most probes are fully synchronous but it breaks down if a
> >>> resource provider registers resources outside of probe, we might still
> >>> be fine if system boot is still happening and something else probes but
> >>> only through luck.
> > 
> >> The driver core is not using that as a proxy, that is up to the driver
> >> itself or not.  All probe means is "yes, this driver binds to this
> >> device, thank you!" for that specific bus/class type.  That's all, if
> >> the driver needs to go off and do real work before it can properly
> >> control the device, wonderful, have it go and do that async.
> > 
> > Right, which is what is happening here - but the deferred probe
> > machinery in the core is reading more into the probe succeeding than it
> > should.
> 
> I think Greg was referring to the use of the PROBE_PREFER_ASYNCHRONOUS
> probe type. We tried just that and got a nice WARN_ON because we are
> using request_module() to deal with HDaudio codecs. The details are in
> [1] but the kernel code is unambiguous...
> 
>         /*
> 	 * We don't allow synchronous module loading from async.  Module
> 	 * init may invoke async_synchronize_full() which will end up
> 	 * waiting for this task which already is waiting for the module
> 	 * loading to complete, leading to a deadlock.
> 	 */
> 	WARN_ON_ONCE(wait && current_is_async());
> 
> 
> The reason why we use a workqueue is because we are otherwise painted in
> a corner by conflicting requirements.
> 
> a) we have to use request_module()

Wait, why?

module loading is async, use auto-loading when the hardware/device is
found and reported to userspace.  Forcing a module to load by the kernel
is not always wise as the module is not always present in the filesystem
at that point in time at boot (think modules on the filesystem, not in
the initramfs).

Try fixing this issue and maybe it will resolve itself as you should be
working async.

thanks,

greg k-h


* Re: [RFC PATCH 1/2] driver core: export driver_deferred_probe_trigger()
  2021-08-18 15:28             ` Greg Kroah-Hartman
@ 2021-08-18 15:53               ` Pierre-Louis Bossart
  2021-08-18 16:49                 ` Greg Kroah-Hartman
  0 siblings, 1 reply; 17+ messages in thread
From: Pierre-Louis Bossart @ 2021-08-18 15:53 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Mark Brown, alsa-devel, Rafael J . Wysocki, tiwai, linux-kernel,
	liam.r.girdwood, vkoul, Geert Uytterhoeven, Jason Gunthorpe,
	Dan Williams, Andy Shevchenko, Christoph Hellwig




>> a) we have to use request_module()
> 
> Wait, why?
> 
> module loading is async, use auto-loading when the hardware/device is
> found and reported to userspace.  Forcing a module to load by the kernel
> is not always wise as the module is not always present in the filesystem
> at that point in time at boot (think modules on the filesystem, not in
> the initramfs).
> 
> Try fixing this issue and maybe it will resolve itself as you should be
> working async.

It's been that way for a very long time (2015?) for HDAudio support, see
sound/pci/hda/hda_bind.c. It's my understanding that it was a conscious
design decision to use vendor-specific modules, if available, and
fallback to generic modules if the first pass failed.

Takashi, you may want to chime in...
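
As a rough illustration of the two-pass binding described above, here is
a user-space sketch. It is not the code in sound/pci/hda/hda_bind.c: the
module names, the table of available modules and the sim_* helpers are
all made up for the example, and sim_request_module() merely stands in
for a synchronous request_module() call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical set of modules present on the system. */
static const char *const sim_available[] = { "codec-vendor-a", "codec-generic" };

/* Stand-in for a synchronous module load: 0 on success, -1 if not found. */
static int sim_request_module(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < sizeof(sim_available) / sizeof(sim_available[0]); i++)
		if (strcmp(sim_available[i], name) == 0)
			return 0;
	return -1;
}

/*
 * First pass: try the vendor-specific module, if one is known.
 * Second pass: fall back to the generic module.
 * Returns the name of the module that was bound, or NULL if both passes fail.
 */
static const char *sim_bind_codec(const char *vendor_module)
{
	if (vendor_module && sim_request_module(vendor_module) == 0)
		return vendor_module;
	if (sim_request_module("codec-generic") == 0)
		return "codec-generic";
	return NULL;
}
```

The caller needs to know whether the first pass failed before starting
the second one, which is why the load has to be synchronous in this
scheme.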






* Re: [RFC PATCH 1/2] driver core: export driver_deferred_probe_trigger()
  2021-08-18 15:53               ` Pierre-Louis Bossart
@ 2021-08-18 16:49                 ` Greg Kroah-Hartman
  2021-08-18 17:52                   ` Mark Brown
  2021-08-18 18:09                   ` Pierre-Louis Bossart
  0 siblings, 2 replies; 17+ messages in thread
From: Greg Kroah-Hartman @ 2021-08-18 16:49 UTC (permalink / raw)
  To: Pierre-Louis Bossart
  Cc: Mark Brown, alsa-devel, Rafael J . Wysocki, tiwai, linux-kernel,
	liam.r.girdwood, vkoul, Geert Uytterhoeven, Jason Gunthorpe,
	Dan Williams, Andy Shevchenko, Christoph Hellwig

On Wed, Aug 18, 2021 at 10:53:07AM -0500, Pierre-Louis Bossart wrote:
> 
> 
> 
> >> a) we have to use request_module()
> > 
> > Wait, why?
> > 
> > module loading is async, use auto-loading when the hardware/device is
> > found and reported to userspace.  Forcing a module to load by the kernel
> > is not always wise as the module is not always present in the filesystem
> > at that point in time at boot (think modules on the filesystem, not in
> > the initramfs).
> > 
> > Try fixing this issue and maybe it will resolve itself as you should be
> > working async.
> 
> It's been that way for a very long time (2015?) for HDAudio support, see
> sound/pci/hda/hda_bind.c. It's my understanding that it was a conscious
> design decision to use vendor-specific modules, if available, and
> fallback to generic modules if the first pass failed.

If it has been this way for so long, what has caused the sudden change
to need to export this and call this function?



* Re: [RFC PATCH 1/2] driver core: export driver_deferred_probe_trigger()
  2021-08-18 16:49                 ` Greg Kroah-Hartman
@ 2021-08-18 17:52                   ` Mark Brown
  2021-08-18 18:09                   ` Pierre-Louis Bossart
  1 sibling, 0 replies; 17+ messages in thread
From: Mark Brown @ 2021-08-18 17:52 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Pierre-Louis Bossart, alsa-devel, Rafael J . Wysocki, tiwai,
	linux-kernel, liam.r.girdwood, vkoul, Geert Uytterhoeven,
	Jason Gunthorpe, Dan Williams, Andy Shevchenko,
	Christoph Hellwig

On Wed, Aug 18, 2021 at 06:49:51PM +0200, Greg Kroah-Hartman wrote:
> On Wed, Aug 18, 2021 at 10:53:07AM -0500, Pierre-Louis Bossart wrote:

> > It's been that way for a very long time (2015?) for HDAudio support, see
> > sound/pci/hda/hda_bind.c. It's my understanding that it was a conscious
> > design decision to use vendor-specific modules, if available, and
> > fallback to generic modules if the first pass failed.

> If it has been this way for so long, what has caused the sudden change
> to need to export this and call this function?

The usage predates the hardware that requires firmware downloads -
that's very new.


* Re: [RFC PATCH 1/2] driver core: export driver_deferred_probe_trigger()
  2021-08-18 16:49                 ` Greg Kroah-Hartman
  2021-08-18 17:52                   ` Mark Brown
@ 2021-08-18 18:09                   ` Pierre-Louis Bossart
  2021-08-18 18:28                     ` Mark Brown
  1 sibling, 1 reply; 17+ messages in thread
From: Pierre-Louis Bossart @ 2021-08-18 18:09 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: alsa-devel, Rafael J . Wysocki, tiwai, linux-kernel,
	liam.r.girdwood, vkoul, Mark Brown, Geert Uytterhoeven,
	Jason Gunthorpe, Dan Williams, Andy Shevchenko,
	Christoph Hellwig


>>>> a) we have to use request_module()
>>>
>>> Wait, why?
>>>
>>> module loading is async, use auto-loading when the hardware/device is
>>> found and reported to userspace.  Forcing a module to load by the kernel
>>> is not always wise as the module is not always present in the filesystem
>>> at that point in time at boot (think modules on the filesystem, not in
>>> the initramfs).
>>>
>>> Try fixing this issue and maybe it will resolve itself as you should be
>>> working async.
>>
>> It's been that way for a very long time (2015?) for HDAudio support, see
>> sound/pci/hda/hda_bind.c. It's my understanding that it was a conscious
>> design decision to use vendor-specific modules, if available, and
>> fallback to generic modules if the first pass failed.
> 
> If it has been this way for so long, what has caused the sudden change
> to need to export this and call this function?

Fair question, I did not provide all the context with a cover letter
that was already quite long. Here are more details:

In the existing Intel audio drivers, we have a PCI device that first gets
probed. The PCI driver initializes the DSP and exposes what the audio
DSP can do, but the platform-specific configuration for a given board is
handled by a child device [1]. We have all kinds of hard-coded lookup
tables to figure out what the board is and what machine driver should be
used based on the presence of other ACPI devices and/or DMI quirks
[2][3]. We must have used this solution since 2010, mainly because 'the
other OS' does not rely on platform firmware for a description of the
audio capabilities.

In the 'soon' future, that machine driver will be probed with its own ACPI
ID and become generic, with all the information related to the board
described in platform firmware and parsed by the driver. This is how the
'simple card' works today in Device Tree environments, platform firmware
describes how host-provided components are connected to 3rd-party
components. I cannot provide more details at this time since this is
not yet a publicly-available specification (this specification work does
take place in a standardization body).

That change in how the machine driver gets probed creates a new problem
we didn't have before: this generic machine driver will probe in the
early stages of the boot, long before the DSP and audio codecs are
initialized/available.

I initially looked at the component framework to try to express
dependencies. It's really not clear to me if this is the 'right'
direction, for ASoC-based solutions we already have components that
register with a core.

I also started looking at other proposals that were made over the years,
this problem of expressing dependencies is not new. No real luck.

In the end, since the DeviceTree-based solutions based on deferred
probes work fine for the same type of usages, I tried to reuse the same
deferred probe mechanism. The only reason why I needed to export this
function is to work around the request_module() use.

I am not claiming any award for architecture, this is clearly a
domain-specific corner case. I did try the async probe, I consulted with
Mark Brown, had an internal review with Dan Williams and Andy
Shevchenko. While nobody cheered, it seemed like this export was
'reasonable' compared to a re-architecture of the HDaudio/HDMI support -
which is a really scary proposition.

There is no immediate rush to make this change in this kernel cycle or
the next, I am open to alternatives, but I wanted to make sure we don't
have any Linux plumbing issues by the time the specification becomes
public and is used by 'the other OS'.

Does this help get more context?

[1] https://elixir.bootlin.com/linux/latest/source/sound/soc/sof/core.c#L234

[2]
https://elixir.bootlin.com/linux/latest/source/sound/soc/intel/common/soc-acpi-intel-tgl-match.c#L323

[3]
https://elixir.bootlin.com/linux/latest/source/sound/soc/intel/boards/sof_sdw.c#L50






* Re: [RFC PATCH 1/2] driver core: export driver_deferred_probe_trigger()
  2021-08-18 18:09                   ` Pierre-Louis Bossart
@ 2021-08-18 18:28                     ` Mark Brown
  0 siblings, 0 replies; 17+ messages in thread
From: Mark Brown @ 2021-08-18 18:28 UTC (permalink / raw)
  To: Pierre-Louis Bossart
  Cc: Greg Kroah-Hartman, alsa-devel, Rafael J . Wysocki, tiwai,
	linux-kernel, liam.r.girdwood, vkoul, Geert Uytterhoeven,
	Jason Gunthorpe, Dan Williams, Andy Shevchenko,
	Christoph Hellwig

On Wed, Aug 18, 2021 at 01:09:44PM -0500, Pierre-Louis Bossart wrote:

> I initially looked at the component framework to try to express
> dependencies. It's really not clear to me if this is the 'right'
> direction, for ASoC-based solutions we already have components that
> register with a core.

Historically (long before both deferred probe and the component
framework) ASoC used to implement a mechanism that essentially did
deferred probe for the dependencies - it'd maintain its own lists of
dependencies and then tell the machine driver and all the components
when the card was ready.  Once deferred probe was there we dropped all
the open coded deferral stuff since it was just reimplementing what
deferred probe does in a slightly more complicated fashion (it tracked
the dependencies in a finer grained manner, though the result wasn't any
different).  See b19e6e7b76 (ASoC: core: Use driver core probe deferral)
for the conversion.
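
For readers who never saw the old scheme, the open-coded tracking can be
pictured roughly like the user-space sketch below. This is only an
illustration of the idea, not the code that was removed: the sim_*
names, the fixed-size table and the component names used with it are all
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SIM_MAX_COMPONENTS 8

/* Hypothetical card: a list of component names it depends on. */
struct sim_card {
	const char *const *needed;
	size_t num_needed;
	bool instantiated;
};

/* Components registered so far, in registration order. */
static const char *sim_registered[SIM_MAX_COMPONENTS];
static size_t sim_num_registered;

static bool sim_is_registered(const char *name)
{
	for (size_t i = 0; i < sim_num_registered; i++)
		if (strcmp(sim_registered[i], name) == 0)
			return true;
	return false;
}

/* The card becomes ready only once every dependency has been registered. */
static void sim_try_instantiate(struct sim_card *card)
{
	for (size_t i = 0; i < card->num_needed; i++)
		if (!sim_is_registered(card->needed[i]))
			return;
	card->instantiated = true;
}

/* Each registration re-checks the card, much like a deferred probe retry. */
static void sim_register_component(struct sim_card *card, const char *name)
{
	assert(sim_num_registered < SIM_MAX_COMPONENTS);
	sim_registered[sim_num_registered++] = name;
	sim_try_instantiate(card);
}
```

The re-check on every registration is the part that driver core probe
deferral replaced.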

What ASoC is doing with the cards is fundamentally the same thing as
what the component helpers are doing, we could in theory convert to
using that but unlike with probe deferral it doesn't really save us any
work and we'd still need all the card level tracking we've got to
connect the various bits of the card together and order things.  If we
were starting from scratch we would probably use components but there's
far more pressing things to be getting on with otherwise.


end of thread, other threads:[~2021-08-18 18:29 UTC | newest]

Thread overview: 17+ messages
2021-08-17 19:00 [RFC PATCH 0/2] driver core: kick deferred probe from delayed context Pierre-Louis Bossart
2021-08-17 19:00 ` [RFC PATCH 1/2] driver core: export driver_deferred_probe_trigger() Pierre-Louis Bossart
2021-08-18  5:44   ` Greg Kroah-Hartman
2021-08-18 11:57     ` Mark Brown
2021-08-18 13:22       ` Greg Kroah-Hartman
2021-08-18 13:48         ` Mark Brown
2021-08-18 14:51           ` Pierre-Louis Bossart
2021-08-18 14:59             ` Dan Williams
2021-08-18 15:28             ` Greg Kroah-Hartman
2021-08-18 15:53               ` Pierre-Louis Bossart
2021-08-18 16:49                 ` Greg Kroah-Hartman
2021-08-18 17:52                   ` Mark Brown
2021-08-18 18:09                   ` Pierre-Louis Bossart
2021-08-18 18:28                     ` Mark Brown
2021-08-17 19:00 ` [RFC PATCH 2/2] ASoC: SOF: trigger re-probing of deferred devices from workqueue Pierre-Louis Bossart
2021-08-18 12:07   ` Mark Brown
2021-08-18 15:25     ` Pierre-Louis Bossart
