LKML Archive on lore.kernel.org
* [RFC 0/2] Allow client to recover crashed processor
@ 2020-03-11 10:54 Loic Pallardy
  2020-03-11 10:54 ` [RFC 1/2] remoteproc: sysfs: authorize rproc shutdown when rproc is crashed Loic Pallardy
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Loic Pallardy @ 2020-03-11 10:54 UTC (permalink / raw)
  To: bjorn.andersson, ohad, mathieu.poirier
  Cc: linux-remoteproc, linux-kernel, arnaud.pouliquen,
	benjamin.gaignard, fabien.dessenne, s-anna, Loic Pallardy

The following 2 patches propose changes that allow a user space
client to shut down and restart a crashed co-processor.
This is required when auto recovery is disabled at framework level or
when the auto recovery procedure has failed.

Sent as an RFC since this may become part of Mathieu's proposal for early
boot/late attach support.

Loic Pallardy (2):
  remoteproc: sysfs: authorize rproc shutdown when rproc is crashed
  remoteproc: core: keep rproc in crash state in case of recovery
    failure

 drivers/remoteproc/remoteproc_core.c  | 8 +++++++-
 drivers/remoteproc/remoteproc_sysfs.c | 2 +-
 2 files changed, 8 insertions(+), 2 deletions(-)

-- 
2.7.4



* [RFC 1/2] remoteproc: sysfs: authorize rproc shutdown when rproc is crashed
  2020-03-11 10:54 [RFC 0/2] Allow client to recover crashed processor Loic Pallardy
@ 2020-03-11 10:54 ` Loic Pallardy
  2020-03-11 21:45   ` Mathieu Poirier
                     ` (2 more replies)
  2020-03-11 10:54 ` [RFC 2/2] remoteproc: core: keep rproc in crash state in case of recovery failure Loic Pallardy
  2020-03-11 14:56 ` [RFC 0/2] Allow client to recover crashed processor Mathieu Poirier
  2 siblings, 3 replies; 12+ messages in thread
From: Loic Pallardy @ 2020-03-11 10:54 UTC (permalink / raw)
  To: bjorn.andersson, ohad, mathieu.poirier
  Cc: linux-remoteproc, linux-kernel, arnaud.pouliquen,
	benjamin.gaignard, fabien.dessenne, s-anna, Loic Pallardy

When remoteproc recovery is disabled and the rproc has crashed, a user space
client has no way to reboot the co-processor except by a complete platform
reboot.
Indeed, rproc_shutdown() is called by sysfs state_store() only if the rproc
state is RPROC_RUNNING.

This patch makes it possible to shut down the co-processor when
it is in RPROC_CRASHED state, and so to restart it properly
from the sysfs interface.

Signed-off-by: Loic Pallardy <loic.pallardy@st.com>
---
 drivers/remoteproc/remoteproc_core.c  | 2 +-
 drivers/remoteproc/remoteproc_sysfs.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index 097f33e4f1f3..7ac87a75cd1b 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1812,7 +1812,7 @@ void rproc_shutdown(struct rproc *rproc)
 	if (!atomic_dec_and_test(&rproc->power))
 		goto out;
 
-	ret = rproc_stop(rproc, false);
+	ret = rproc_stop(rproc, rproc->state == RPROC_CRASHED);
 	if (ret) {
 		atomic_inc(&rproc->power);
 		goto out;
diff --git a/drivers/remoteproc/remoteproc_sysfs.c b/drivers/remoteproc/remoteproc_sysfs.c
index 7f8536b73295..1029458a4678 100644
--- a/drivers/remoteproc/remoteproc_sysfs.c
+++ b/drivers/remoteproc/remoteproc_sysfs.c
@@ -101,7 +101,7 @@ static ssize_t state_store(struct device *dev,
 		if (ret)
 			dev_err(&rproc->dev, "Boot failed: %d\n", ret);
 	} else if (sysfs_streq(buf, "stop")) {
-		if (rproc->state != RPROC_RUNNING)
+		if (rproc->state != RPROC_RUNNING && rproc->state != RPROC_CRASHED)
 			return -EINVAL;
 
 		rproc_shutdown(rproc);
-- 
2.7.4
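
The effect of the state_store() hunk can be modelled in user space. The
sketch below is not kernel code: the enum covers only the three states
named in this thread (the real enum rproc_state has more members), and
stop_allowed_before(), stop_allowed_after() and model_store_stop() are
hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Reduced model: only the states discussed in this thread. */
enum model_state {
	MODEL_OFFLINE,
	MODEL_RUNNING,
	MODEL_CRASHED,
};

/* Check applied to "stop" before this patch: running only. */
bool stop_allowed_before(enum model_state state)
{
	return state == MODEL_RUNNING;
}

/* Check applied to "stop" with this patch: running or crashed. */
bool stop_allowed_after(enum model_state state)
{
	return state == MODEL_RUNNING || state == MODEL_CRASHED;
}

/*
 * Models the "stop" branch with the patch applied: returns -EINVAL
 * when the request is refused, 0 when a shutdown would be issued.
 */
int model_store_stop(enum model_state state)
{
	assert(state <= MODEL_CRASHED);
	return stop_allowed_after(state) ? 0 : -EINVAL;
}
```

In this model a crashed processor is refused by the old check and
accepted by the new one, while an offline processor is refused by both.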



* [RFC 2/2] remoteproc: core: keep rproc in crash state in case of recovery failure
  2020-03-11 10:54 [RFC 0/2] Allow client to recover crashed processor Loic Pallardy
  2020-03-11 10:54 ` [RFC 1/2] remoteproc: sysfs: authorize rproc shutdown when rproc is crashed Loic Pallardy
@ 2020-03-11 10:54 ` Loic Pallardy
  2020-05-06  2:05   ` Bjorn Andersson
  2020-03-11 14:56 ` [RFC 0/2] Allow client to recover crashed processor Mathieu Poirier
  2 siblings, 1 reply; 12+ messages in thread
From: Loic Pallardy @ 2020-03-11 10:54 UTC (permalink / raw)
  To: bjorn.andersson, ohad, mathieu.poirier
  Cc: linux-remoteproc, linux-kernel, arnaud.pouliquen,
	benjamin.gaignard, fabien.dessenne, s-anna, Loic Pallardy

When an error occurs during the recovery procedure, internal rproc
variables may be left inconsistent:
- state is set to RPROC_OFFLINE
- the power atomic is not equal to 0
which is expected, as only rproc_stop() has been executed and not
rproc_shutdown().

In such a case, rproc_boot() can be re-executed by the client to
reboot the co-processor.

This patch proposes to keep the rproc in RPROC_CRASHED state when
recovery fails, to be consistent with the recovery-disabled mode.

Signed-off-by: Loic Pallardy <loic.pallardy@st.com>
---
 drivers/remoteproc/remoteproc_core.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index 7ac87a75cd1b..def4f9fc881d 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1679,6 +1679,12 @@ int rproc_trigger_recovery(struct rproc *rproc)
 	release_firmware(firmware_p);
 
 unlock_mutex:
+	/*
+	 * In case of error during recovery sequence restore rproc
+	 * state in CRASHED
+	 */
+	if (ret)
+		rproc->state = RPROC_CRASHED;
 	mutex_unlock(&rproc->lock);
 	return ret;
 }
-- 
2.7.4



* Re: [RFC 0/2] Allow client to recover crashed processor
  2020-03-11 10:54 [RFC 0/2] Allow client to recover crashed processor Loic Pallardy
  2020-03-11 10:54 ` [RFC 1/2] remoteproc: sysfs: authorize rproc shutdown when rproc is crashed Loic Pallardy
  2020-03-11 10:54 ` [RFC 2/2] remoteproc: core: keep rproc in crash state in case of recovery failure Loic Pallardy
@ 2020-03-11 14:56 ` Mathieu Poirier
  2 siblings, 0 replies; 12+ messages in thread
From: Mathieu Poirier @ 2020-03-11 14:56 UTC (permalink / raw)
  To: Loic Pallardy
  Cc: Bjorn Andersson, Ohad Ben-Cohen, linux-remoteproc,
	Linux Kernel Mailing List, Arnaud POULIQUEN, Benjamin Gaignard,
	Fabien DESSENNE, Suman Anna

On Wed, 11 Mar 2020 at 04:54, Loic Pallardy <loic.pallardy@st.com> wrote:
>
> The following 2 patches propose some changes to allow user space
> client to shutdown and restart a crashed co-processor.
> This is required when auto recovery is disabled at framework level or
> when auto recovery procedure failed.
>
> Sent as RFC as may be part of Mathieu's proposal for early boot/late
> attach support

Perfect timing - thanks for sending those out.

Mathieu

>
> Loic Pallardy (2):
>   remoteproc: sysfs: authorize rproc shutdown when rproc is crashed
>   remoteproc: core: keep rproc in crash state in case of recovery
>     failure
>
>  drivers/remoteproc/remoteproc_core.c  | 8 +++++++-
>  drivers/remoteproc/remoteproc_sysfs.c | 2 +-
>  2 files changed, 8 insertions(+), 2 deletions(-)
>
> --
> 2.7.4
>


* Re: [RFC 1/2] remoteproc: sysfs: authorize rproc shutdown when rproc is crashed
  2020-03-11 10:54 ` [RFC 1/2] remoteproc: sysfs: authorize rproc shutdown when rproc is crashed Loic Pallardy
@ 2020-03-11 21:45   ` Mathieu Poirier
  2020-03-12  8:00     ` Loic PALLARDY
  2020-03-11 23:27   ` Bjorn Andersson
  2020-03-25 17:57   ` Mathieu Poirier
  2 siblings, 1 reply; 12+ messages in thread
From: Mathieu Poirier @ 2020-03-11 21:45 UTC (permalink / raw)
  To: Loic Pallardy
  Cc: bjorn.andersson, ohad, linux-remoteproc, linux-kernel,
	arnaud.pouliquen, benjamin.gaignard, fabien.dessenne, s-anna

Hi Loic,

On Wed, Mar 11, 2020 at 11:54:31AM +0100, Loic Pallardy wrote:
> When remoteproc recovery is disabled and rproc crashed, user space
> client has no way to reboot co-processor except by a complete platform
> reboot.
> Indeed rproc_shutdown() is called by sysfs state_store() only if rproc
> state is RPROC_RUNNING.
> 
> This patch offers the possibility to shutdown the co-processor if
> it is in RPROC_CRASHED state and so to restart properly co-processor
> from sysfs interface.

And it is not possible to use the debugfs interface [1] to restart the MCU?

[1]. https://elixir.bootlin.com/linux/v5.6-rc2/source/drivers/remoteproc/remoteproc_debugfs.c#L147


> 
> Signed-off-by: Loic Pallardy <loic.pallardy@st.com>
> ---
>  drivers/remoteproc/remoteproc_core.c  | 2 +-
>  drivers/remoteproc/remoteproc_sysfs.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index 097f33e4f1f3..7ac87a75cd1b 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -1812,7 +1812,7 @@ void rproc_shutdown(struct rproc *rproc)
>  	if (!atomic_dec_and_test(&rproc->power))
>  		goto out;
>  
> -	ret = rproc_stop(rproc, false);
> +	ret = rproc_stop(rproc, rproc->state == RPROC_CRASHED);
>  	if (ret) {
>  		atomic_inc(&rproc->power);
>  		goto out;
> diff --git a/drivers/remoteproc/remoteproc_sysfs.c b/drivers/remoteproc/remoteproc_sysfs.c
> index 7f8536b73295..1029458a4678 100644
> --- a/drivers/remoteproc/remoteproc_sysfs.c
> +++ b/drivers/remoteproc/remoteproc_sysfs.c
> @@ -101,7 +101,7 @@ static ssize_t state_store(struct device *dev,
>  		if (ret)
>  			dev_err(&rproc->dev, "Boot failed: %d\n", ret);
>  	} else if (sysfs_streq(buf, "stop")) {
> -		if (rproc->state != RPROC_RUNNING)
> +		if (rproc->state != RPROC_RUNNING && rproc->state != RPROC_CRASHED)
>  			return -EINVAL;
>  
>  		rproc_shutdown(rproc);
> -- 
> 2.7.4
> 


* Re: [RFC 1/2] remoteproc: sysfs: authorize rproc shutdown when rproc is crashed
  2020-03-11 10:54 ` [RFC 1/2] remoteproc: sysfs: authorize rproc shutdown when rproc is crashed Loic Pallardy
  2020-03-11 21:45   ` Mathieu Poirier
@ 2020-03-11 23:27   ` Bjorn Andersson
  2020-03-12  8:12     ` Loic PALLARDY
  2020-03-25 17:57   ` Mathieu Poirier
  2 siblings, 1 reply; 12+ messages in thread
From: Bjorn Andersson @ 2020-03-11 23:27 UTC (permalink / raw)
  To: Loic Pallardy
  Cc: ohad, mathieu.poirier, linux-remoteproc, linux-kernel,
	arnaud.pouliquen, benjamin.gaignard, fabien.dessenne, s-anna

On Wed 11 Mar 03:54 PDT 2020, Loic Pallardy wrote:

> When remoteproc recovery is disabled and rproc crashed, user space
> client has no way to reboot co-processor except by a complete platform
> reboot.
> Indeed rproc_shutdown() is called by sysfs state_store() only if rproc
> state is RPROC_RUNNING.
> 
> This patch offers the possibility to shutdown the co-processor if
> it is in RPROC_CRASHED state and so to restart properly co-processor
> from sysfs interface.
> 

I did recently run into a similar problem when I fed my remoteproc
faulty firmware, which led to it recovering immediately upon boot. The
amount of time spent in !CRASHED state was minimal, so I didn't have any
way to stop the remoteproc.

> Signed-off-by: Loic Pallardy <loic.pallardy@st.com>
> ---
>  drivers/remoteproc/remoteproc_core.c  | 2 +-
>  drivers/remoteproc/remoteproc_sysfs.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index 097f33e4f1f3..7ac87a75cd1b 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -1812,7 +1812,7 @@ void rproc_shutdown(struct rproc *rproc)
>  	if (!atomic_dec_and_test(&rproc->power))
>  		goto out;
>  
> -	ret = rproc_stop(rproc, false);
> +	ret = rproc_stop(rproc, rproc->state == RPROC_CRASHED);

Afaict this is unrelated to the problem you're describing in the commit
message.

>  	if (ret) {
>  		atomic_inc(&rproc->power);
>  		goto out;
> diff --git a/drivers/remoteproc/remoteproc_sysfs.c b/drivers/remoteproc/remoteproc_sysfs.c
> index 7f8536b73295..1029458a4678 100644
> --- a/drivers/remoteproc/remoteproc_sysfs.c
> +++ b/drivers/remoteproc/remoteproc_sysfs.c
> @@ -101,7 +101,7 @@ static ssize_t state_store(struct device *dev,
>  		if (ret)
>  			dev_err(&rproc->dev, "Boot failed: %d\n", ret);
>  	} else if (sysfs_streq(buf, "stop")) {
> -		if (rproc->state != RPROC_RUNNING)
> +		if (rproc->state != RPROC_RUNNING && rproc->state != RPROC_CRASHED)

Analogous to the problem reported by Alex here
https://patchwork.kernel.org/patch/11413161/ the handling of stop seems
racy.

In particular, I believe you're failing to protect against a race
with a just scheduled rproc_crash_handler_work() being executed after
the mutex_unlock() in rproc_shutdown()...

With Alex's fix that should be less of a problem though...

Regards,
Bjorn

>  			return -EINVAL;
>  
>  		rproc_shutdown(rproc);
> -- 
> 2.7.4
> 


* RE: [RFC 1/2] remoteproc: sysfs: authorize rproc shutdown when rproc is crashed
  2020-03-11 21:45   ` Mathieu Poirier
@ 2020-03-12  8:00     ` Loic PALLARDY
  0 siblings, 0 replies; 12+ messages in thread
From: Loic PALLARDY @ 2020-03-12  8:00 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: bjorn.andersson, ohad, linux-remoteproc, linux-kernel,
	Arnaud POULIQUEN, benjamin.gaignard, Fabien DESSENNE, s-anna

Hi Mathieu,

> -----Original Message-----
> From: Mathieu Poirier <mathieu.poirier@linaro.org>
> Sent: Wednesday 11 March 2020 22:45
> To: Loic PALLARDY <loic.pallardy@st.com>
> Cc: bjorn.andersson@linaro.org; ohad@wizery.com; linux-
> remoteproc@vger.kernel.org; linux-kernel@vger.kernel.org; Arnaud
> POULIQUEN <arnaud.pouliquen@st.com>; benjamin.gaignard@linaro.org;
> Fabien DESSENNE <fabien.dessenne@st.com>; s-anna@ti.com
> Subject: Re: [RFC 1/2] remoteproc: sysfs: authorize rproc shutdown when
> rproc is crashed
> 
> Hi Loic,
> 
> On Wed, Mar 11, 2020 at 11:54:31AM +0100, Loic Pallardy wrote:
> > When remoteproc recovery is disabled and rproc crashed, user space
> > client has no way to reboot co-processor except by a complete platform
> > reboot.
> > Indeed rproc_shutdown() is called by sysfs state_store() only if rproc
> > state is RPROC_RUNNING.
> >
> > This patch offers the possibility to shutdown the co-processor if
> > it is in RPROC_CRASHED state and so to restart properly co-processor
> > from sysfs interface.
> 
> And it is not possible to use the debugfs interface [1] to restart the MCU?
> 
> [1]. https://elixir.bootlin.com/linux/v5.6-rc2/source/drivers/remoteproc/remoteproc_debugfs.c#L147

The debugfs interface is optional and is often disabled on final products.
The control interfaces actually used are the in-kernel API and the sysfs one.

Regards,
Loic
> 
> 
> >
> > Signed-off-by: Loic Pallardy <loic.pallardy@st.com>
> > ---
> >  drivers/remoteproc/remoteproc_core.c  | 2 +-
> >  drivers/remoteproc/remoteproc_sysfs.c | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> > index 097f33e4f1f3..7ac87a75cd1b 100644
> > --- a/drivers/remoteproc/remoteproc_core.c
> > +++ b/drivers/remoteproc/remoteproc_core.c
> > @@ -1812,7 +1812,7 @@ void rproc_shutdown(struct rproc *rproc)
> >  	if (!atomic_dec_and_test(&rproc->power))
> >  		goto out;
> >
> > -	ret = rproc_stop(rproc, false);
> > +	ret = rproc_stop(rproc, rproc->state == RPROC_CRASHED);
> >  	if (ret) {
> >  		atomic_inc(&rproc->power);
> >  		goto out;
> > diff --git a/drivers/remoteproc/remoteproc_sysfs.c b/drivers/remoteproc/remoteproc_sysfs.c
> > index 7f8536b73295..1029458a4678 100644
> > --- a/drivers/remoteproc/remoteproc_sysfs.c
> > +++ b/drivers/remoteproc/remoteproc_sysfs.c
> > @@ -101,7 +101,7 @@ static ssize_t state_store(struct device *dev,
> >  		if (ret)
> >  			dev_err(&rproc->dev, "Boot failed: %d\n", ret);
> >  	} else if (sysfs_streq(buf, "stop")) {
> > -		if (rproc->state != RPROC_RUNNING)
> > +		if (rproc->state != RPROC_RUNNING && rproc->state != RPROC_CRASHED)
> >  			return -EINVAL;
> >
> >  		rproc_shutdown(rproc);
> > --
> > 2.7.4
> >
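
For reference, controlling a co-processor through the sysfs interface
discussed in this reply amounts to writing "start" or "stop" to the
state attribute of the remoteproc instance and reading the same file to
get the current state. A minimal user-space sketch follows. The helper
names rproc_write_state() and rproc_read_state() are hypothetical, and
the instance name in the usage note is an assumption about the target
system.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write a command ("start" or "stop") to a remoteproc state attribute.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 * A refusal by the kernel is reported at flush time, so the result of
 * fclose() is checked as well.
 */
int rproc_write_state(const char *path, const char *cmd)
{
	FILE *f;
	int ret = 0;

	assert(path != NULL && cmd != NULL);

	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(cmd, f) == EOF)
		ret = -1;
	if (fclose(f) == EOF)
		ret = -1;
	return ret;
}

/*
 * Read the current state string (for example "running" or "crashed")
 * into buf, stripping the trailing newline. Returns 0 on success.
 */
int rproc_read_state(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(path != NULL && buf != NULL && len > 0);

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

With the patch applied, a client that reads "crashed" from, say,
/sys/class/remoteproc/remoteproc0/state could write "stop" and then
"start" to the same file. Without the patch, the "stop" write is
rejected with -EINVAL in that state.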


* RE: [RFC 1/2] remoteproc: sysfs: authorize rproc shutdown when rproc is crashed
  2020-03-11 23:27   ` Bjorn Andersson
@ 2020-03-12  8:12     ` Loic PALLARDY
  0 siblings, 0 replies; 12+ messages in thread
From: Loic PALLARDY @ 2020-03-12  8:12 UTC (permalink / raw)
  To: Bjorn Andersson
  Cc: ohad, mathieu.poirier, linux-remoteproc, linux-kernel,
	Arnaud POULIQUEN, benjamin.gaignard, Fabien DESSENNE, s-anna

Hi Bjorn,

> -----Original Message-----
> From: Bjorn Andersson <bjorn.andersson@linaro.org>
> Sent: Thursday 12 March 2020 00:27
> To: Loic PALLARDY <loic.pallardy@st.com>
> Cc: ohad@wizery.com; mathieu.poirier@linaro.org; linux-
> remoteproc@vger.kernel.org; linux-kernel@vger.kernel.org; Arnaud
> POULIQUEN <arnaud.pouliquen@st.com>; benjamin.gaignard@linaro.org;
> Fabien DESSENNE <fabien.dessenne@st.com>; s-anna@ti.com
> Subject: Re: [RFC 1/2] remoteproc: sysfs: authorize rproc shutdown when
> rproc is crashed
> 
> On Wed 11 Mar 03:54 PDT 2020, Loic Pallardy wrote:
> 
> > When remoteproc recovery is disabled and rproc crashed, user space
> > client has no way to reboot co-processor except by a complete platform
> > reboot.
> > Indeed rproc_shutdown() is called by sysfs state_store() only if rproc
> > state is RPROC_RUNNING.
> >
> > This patch offers the possibility to shutdown the co-processor if
> > it is in RPROC_CRASHED state and so to restart properly co-processor
> > from sysfs interface.
> >
> 
> I did recently run into a similar problem when I fed my remoteproc
> faulty firmware, which led to it recovering immediately upon boot. The
> amount of time spent in !CRASHED state was minimal, so I didn't have any
> way to stop the remoteproc.
> 
> > Signed-off-by: Loic Pallardy <loic.pallardy@st.com>
> > ---
> >  drivers/remoteproc/remoteproc_core.c  | 2 +-
> >  drivers/remoteproc/remoteproc_sysfs.c | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> > index 097f33e4f1f3..7ac87a75cd1b 100644
> > --- a/drivers/remoteproc/remoteproc_core.c
> > +++ b/drivers/remoteproc/remoteproc_core.c
> > @@ -1812,7 +1812,7 @@ void rproc_shutdown(struct rproc *rproc)
> >  	if (!atomic_dec_and_test(&rproc->power))
> >  		goto out;
> >
> > -	ret = rproc_stop(rproc, false);
> > +	ret = rproc_stop(rproc, rproc->state == RPROC_CRASHED);
> 
> Afaict this is unrelated to the problem you're describing in the commit
> message.
Right, it is because rproc_shutdown() can now be called in a context where rproc is in RPROC_CRASHED state, so false is no longer always the right value.
This could be split into another patch.

> 
> >  	if (ret) {
> >  		atomic_inc(&rproc->power);
> >  		goto out;
> > diff --git a/drivers/remoteproc/remoteproc_sysfs.c b/drivers/remoteproc/remoteproc_sysfs.c
> > index 7f8536b73295..1029458a4678 100644
> > --- a/drivers/remoteproc/remoteproc_sysfs.c
> > +++ b/drivers/remoteproc/remoteproc_sysfs.c
> > @@ -101,7 +101,7 @@ static ssize_t state_store(struct device *dev,
> >  		if (ret)
> >  			dev_err(&rproc->dev, "Boot failed: %d\n", ret);
> >  	} else if (sysfs_streq(buf, "stop")) {
> > -		if (rproc->state != RPROC_RUNNING)
> > +		if (rproc->state != RPROC_RUNNING && rproc->state != RPROC_CRASHED)
> 
> Analogous to the problem reported by Alex here
> https://patchwork.kernel.org/patch/11413161/ the handling of stop seems
> racy.
> 
> In particular, I believe you're failing to protect against a race
> with a just scheduled rproc_crash_handler_work() being executed after
> the mutex_unlock() in rproc_shutdown()...
> 
> With Alex's fix that should be less of a problem though...
Thanks for pointing me to Alex's patch. But I don't think it is exactly the same issue, as that one concerns the recovery procedure itself.
In my case, recovery is disabled. When a crash is detected, rproc->state is simply set to RPROC_CRASHED
and recovery is not triggered.
Without client action, the rproc will stay in RPROC_CRASHED state forever.
Today, without this modification, it is not possible to shut down the rproc properly (putting the coprocessor under reset, disabling clocks...).

Regards,
Loic

> 
> Regards,
> Bjorn
> 
> >  			return -EINVAL;
> >
> >  		rproc_shutdown(rproc);
> > --
> > 2.7.4
> >


* Re: [RFC 1/2] remoteproc: sysfs: authorize rproc shutdown when rproc is crashed
  2020-03-11 10:54 ` [RFC 1/2] remoteproc: sysfs: authorize rproc shutdown when rproc is crashed Loic Pallardy
  2020-03-11 21:45   ` Mathieu Poirier
  2020-03-11 23:27   ` Bjorn Andersson
@ 2020-03-25 17:57   ` Mathieu Poirier
  2020-03-25 18:30     ` Loic PALLARDY
  2 siblings, 1 reply; 12+ messages in thread
From: Mathieu Poirier @ 2020-03-25 17:57 UTC (permalink / raw)
  To: Loic Pallardy
  Cc: bjorn.andersson, ohad, linux-remoteproc, linux-kernel,
	arnaud.pouliquen, benjamin.gaignard, fabien.dessenne, s-anna

Hi Loic,

On Wed, Mar 11, 2020 at 11:54:31AM +0100, Loic Pallardy wrote:
> When remoteproc recovery is disabled and rproc crashed, user space
> client has no way to reboot co-processor except by a complete platform
> reboot.
> Indeed rproc_shutdown() is called by sysfs state_store() only if rproc
> state is RPROC_RUNNING.
> 
> This patch offers the possibility to shutdown the co-processor if
> it is in RPROC_CRASHED state and so to restart properly co-processor
> from sysfs interface.

If recovery is disabled on an rproc the platform likely intended to have a hard
reboot and as such we should not be concerned about this case.

Where I think we have a problem, something that is asserted by looking at your 2
patches, is cases where rproc_trigger_recovery() fails.  That leaves the system
in a state where it can't be recovered, something the remoteproc core should not
allow. 

> 
> Signed-off-by: Loic Pallardy <loic.pallardy@st.com>
> ---
>  drivers/remoteproc/remoteproc_core.c  | 2 +-
>  drivers/remoteproc/remoteproc_sysfs.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index 097f33e4f1f3..7ac87a75cd1b 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -1812,7 +1812,7 @@ void rproc_shutdown(struct rproc *rproc)
>  	if (!atomic_dec_and_test(&rproc->power))
>  		goto out;
>  
> -	ret = rproc_stop(rproc, false);
> +	ret = rproc_stop(rproc, rproc->state == RPROC_CRASHED);

Please add a comment that explains how we can be in rproc_shutdown() when the
processor has crashed and point to rproc_trigger_recovery().  See below for more
details. 

>  	if (ret) {
>  		atomic_inc(&rproc->power);
>  		goto out;
> diff --git a/drivers/remoteproc/remoteproc_sysfs.c b/drivers/remoteproc/remoteproc_sysfs.c
> index 7f8536b73295..1029458a4678 100644
> --- a/drivers/remoteproc/remoteproc_sysfs.c
> +++ b/drivers/remoteproc/remoteproc_sysfs.c
> @@ -101,7 +101,7 @@ static ssize_t state_store(struct device *dev,
>  		if (ret)
>  			dev_err(&rproc->dev, "Boot failed: %d\n", ret);
>  	} else if (sysfs_streq(buf, "stop")) {
> -		if (rproc->state != RPROC_RUNNING)
> +		if (rproc->state != RPROC_RUNNING && rproc->state != RPROC_CRASHED)
>  			return -EINVAL;

Wouldn't it be better to just prevent the MCU from staying in a crashed state
(when recovery is not disabled)?

I like what you did in the next patch where the state of the MCU is set to
RPROC_CRASHED in case of failure, so that we keep.  I also think the hunk
above is correct.  All that is left is to call rproc_shutdown() directly in
rproc_trigger_recovery() when something goes wrong.  I would also add a
dev_err() so that users have a clue of what happened.

That would leave the system in a stable state without having to add intelligence
to state_store().

Let me know what you think...

Mathieu

>  
>  		rproc_shutdown(rproc);
> -- 
> 2.7.4
> 
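
The alternative suggested in this reply (mark the processor crashed on
recovery failure, log an error, then shut it down from the recovery path)
can be sketched as a user-space model. All names below are hypothetical,
fprintf() stands in for dev_err(), and the locking that the real
rproc_trigger_recovery() and rproc_shutdown() would need is deliberately
not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

enum model_state { MODEL_OFFLINE, MODEL_RUNNING, MODEL_CRASHED };

struct rproc_model {
	enum model_state state;
	int power;	/* stands in for the atomic power refcount */
};

/* Models a shutdown: drop one power reference, stop at zero. */
void model_shutdown(struct rproc_model *r)
{
	assert(r != NULL);

	if (--r->power > 0)
		return;
	r->state = MODEL_OFFLINE;
}

/*
 * Models a recovery attempt whose restart step may fail. On failure
 * the state is first marked crashed (as in patch 2/2), an error is
 * logged, and the processor is shut down from the recovery path.
 * Returns 0 on success, -1 on failure.
 */
int model_trigger_recovery(struct rproc_model *r, bool restart_ok)
{
	assert(r != NULL);

	r->state = MODEL_OFFLINE;	/* effect of the stop step */
	if (restart_ok) {
		r->state = MODEL_RUNNING;
		return 0;
	}

	r->state = MODEL_CRASHED;
	fprintf(stderr, "recovery failed, shutting down\n");
	model_shutdown(r);
	return -1;
}
```

With a single power reference, a failed attempt ends with the model
offline and power at 0, so a later "start" needs no special handling in
state_store().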


* RE: [RFC 1/2] remoteproc: sysfs: authorize rproc shutdown when rproc is crashed
  2020-03-25 17:57   ` Mathieu Poirier
@ 2020-03-25 18:30     ` Loic PALLARDY
  2020-03-25 21:42       ` Mathieu Poirier
  0 siblings, 1 reply; 12+ messages in thread
From: Loic PALLARDY @ 2020-03-25 18:30 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: bjorn.andersson, ohad, linux-remoteproc, linux-kernel,
	Arnaud POULIQUEN, benjamin.gaignard, Fabien DESSENNE, s-anna

Hi Mathieu,

> -----Original Message-----
> From: Mathieu Poirier <mathieu.poirier@linaro.org>
> Sent: Wednesday 25 March 2020 18:58
> To: Loic PALLARDY <loic.pallardy@st.com>
> Cc: bjorn.andersson@linaro.org; ohad@wizery.com; linux-
> remoteproc@vger.kernel.org; linux-kernel@vger.kernel.org; Arnaud
> POULIQUEN <arnaud.pouliquen@st.com>; benjamin.gaignard@linaro.org;
> Fabien DESSENNE <fabien.dessenne@st.com>; s-anna@ti.com
> Subject: Re: [RFC 1/2] remoteproc: sysfs: authorize rproc shutdown when
> rproc is crashed
> 
> Hi Loic,
> 
> On Wed, Mar 11, 2020 at 11:54:31AM +0100, Loic Pallardy wrote:
> > When remoteproc recovery is disabled and rproc crashed, user space
> > client has no way to reboot co-processor except by a complete platform
> > reboot.
> > Indeed rproc_shutdown() is called by sysfs state_store() only if rproc
> > state is RPROC_RUNNING.
> >
> > This patch offers the possibility to shutdown the co-processor if
> > it is in RPROC_CRASHED state and so to restart properly co-processor
> > from sysfs interface.
> 
> If recovery is disabled on an rproc the platform likely intended to have a hard
> reboot and as such we should not be concerned about this case.
I disagree with this view. There are configurations in which
we do not want a silent recovery. The application layer may need to stop and
restart some services because that is the simplest way to resync with the coprocessor.
What is missing today is an event notifying user space applications that the coprocessor
state has changed (even if we can rely on rpmsg services closure).

> 
> Where I think we have a problem, something that is asserted by looking at your 2
> patches, is cases where rproc_trigger_recovery() fails.  That leaves the system
> in a state where it can't be recovered, something the remoteproc core should not
> allow.
> 
Right, this is a second use case we faced: the user space application that provided
the firmware file crashed before the coprocessor did. In that case the firmware file may
have been removed from the /lib/firmware directory, so coprocessor recovery fails.
When the application restarts, it can no longer control the coprocessor.

Regards,
Loic

> >
> > Signed-off-by: Loic Pallardy <loic.pallardy@st.com>
> > ---
> >  drivers/remoteproc/remoteproc_core.c  | 2 +-
> >  drivers/remoteproc/remoteproc_sysfs.c | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> > index 097f33e4f1f3..7ac87a75cd1b 100644
> > --- a/drivers/remoteproc/remoteproc_core.c
> > +++ b/drivers/remoteproc/remoteproc_core.c
> > @@ -1812,7 +1812,7 @@ void rproc_shutdown(struct rproc *rproc)
> >  	if (!atomic_dec_and_test(&rproc->power))
> >  		goto out;
> >
> > -	ret = rproc_stop(rproc, false);
> > +	ret = rproc_stop(rproc, rproc->state == RPROC_CRASHED);
> 
> Please add a comment that explains how we can be in rproc_shutdown() when the
> processor has crashed and point to rproc_trigger_recovery().  See below for more
> details.
> 
> >  	if (ret) {
> >  		atomic_inc(&rproc->power);
> >  		goto out;
> > diff --git a/drivers/remoteproc/remoteproc_sysfs.c b/drivers/remoteproc/remoteproc_sysfs.c
> > index 7f8536b73295..1029458a4678 100644
> > --- a/drivers/remoteproc/remoteproc_sysfs.c
> > +++ b/drivers/remoteproc/remoteproc_sysfs.c
> > @@ -101,7 +101,7 @@ static ssize_t state_store(struct device *dev,
> >  		if (ret)
> >  			dev_err(&rproc->dev, "Boot failed: %d\n", ret);
> >  	} else if (sysfs_streq(buf, "stop")) {
> > -		if (rproc->state != RPROC_RUNNING)
> > +		if (rproc->state != RPROC_RUNNING && rproc->state != RPROC_CRASHED)
> >  			return -EINVAL;
> 
> Wouldn't it be better to just prevent the MCU from staying in a crashed state
> (when recovery is not disabled)?
> 
> I like what you did in the next patch where the state of the MCU is set to
> RPROC_CRASHED in case of failure, so that we keep.  I also think the hunk
> above is correct.  All that is left is to call rproc_shutdown() directly in
> rproc_trigger_recovery() when something goes wrong.  I would also add a
> dev_err() so that users have a clue of what happened.
> 
> That would leave the system in a stable state without having to add intelligence
> to state_store().
This is a solution we debated internally: should rproc_shutdown() be called directly in
rproc_trigger_recovery() or not? Going in that direction clearly simplifies
coprocessor control, as it will always be in a "stable" state. But it means the user
loses the information that the coprocessor crashed (mainly when recovery is disabled).
We only know that the coprocessor is stopped, not why: crash or client action?
For debug purposes, that could be an issue from my point of view.

Regards,
Loic
> 
> Let me know what you think...
> 
> Mathieu
> 
> >
> >  		rproc_shutdown(rproc);
> > --
> > 2.7.4
> >


* Re: [RFC 1/2] remoteproc: sysfs: authorize rproc shutdown when rproc is crashed
  2020-03-25 18:30     ` Loic PALLARDY
@ 2020-03-25 21:42       ` Mathieu Poirier
  0 siblings, 0 replies; 12+ messages in thread
From: Mathieu Poirier @ 2020-03-25 21:42 UTC (permalink / raw)
  To: Loic PALLARDY
  Cc: bjorn.andersson, ohad, linux-remoteproc, linux-kernel,
	Arnaud POULIQUEN, benjamin.gaignard, Fabien DESSENNE, s-anna

On Wed, 25 Mar 2020 at 12:30, Loic PALLARDY <loic.pallardy@st.com> wrote:
>
> Hi Mathieu,
>
> > -----Original Message-----
> > From: Mathieu Poirier <mathieu.poirier@linaro.org>
> > Sent: Wednesday 25 March 2020 18:58
> > To: Loic PALLARDY <loic.pallardy@st.com>
> > Cc: bjorn.andersson@linaro.org; ohad@wizery.com; linux-
> > remoteproc@vger.kernel.org; linux-kernel@vger.kernel.org; Arnaud
> > POULIQUEN <arnaud.pouliquen@st.com>; benjamin.gaignard@linaro.org;
> > Fabien DESSENNE <fabien.dessenne@st.com>; s-anna@ti.com
> > Subject: Re: [RFC 1/2] remoteproc: sysfs: authorize rproc shutdown when
> > rproc is crashed
> >
> > Hi Loic,
> >
> > On Wed, Mar 11, 2020 at 11:54:31AM +0100, Loic Pallardy wrote:
> > > When remoteproc recovery is disabled and rproc crashed, user space
> > > client has no way to reboot co-processor except by a complete platform
> > > reboot.
> > > Indeed rproc_shutdown() is called by sysfs state_store() only if rproc
> > > state is RPROC_RUNNING.
> > >
> > > This patch offers the possibility to shutdown the co-processor if
> > > it is in RPROC_CRASHED state and so to restart properly co-processor
> > > from sysfs interface.
> >
> > If recovery is disabled on an rproc the platform likely intended to have a hard
> > reboot and as such we should not be concerned about this case.
> I disagree with your view. In fact, we can have a configuration for which
> we don't want a silent recovery. Application layer can be involved to stop and
> restart some services because it is the simplest way to resync with the coprocessor.
> What's missing today is an event to notify user space application that coprocessor state
> has changed. (even if we can rely on rpmsg services closure)

I have a better understanding of the scenario now.

>
> >
> > Where I think we have a problem, something that is asserted by looking at your 2
> > patches, is cases where rproc_trigger_recovery() fails.  That leaves the system
> > in a state where it can't be recovered, something the remoteproc core should not
> > allow.
> >
> Right, this is a second use case we faced: the user space application which
> provided the firmware file crashed before the coprocessor. In that case the
> firmware file may have been removed from the /lib/firmware directory, so
> coprocessor recovery fails. The application, when restarting, can no longer
> control the coprocessor.

This is a very specific use case.  It seems to me that fixing the
problem with the availability of files under /lib/firmware is where
the solution really lies.

>
> Regards,
> Loic
>
> > >
> > > Signed-off-by: Loic Pallardy <loic.pallardy@st.com>
> > > ---
> > >  drivers/remoteproc/remoteproc_core.c  | 2 +-
> > >  drivers/remoteproc/remoteproc_sysfs.c | 2 +-
> > >  2 files changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> > > index 097f33e4f1f3..7ac87a75cd1b 100644
> > > --- a/drivers/remoteproc/remoteproc_core.c
> > > +++ b/drivers/remoteproc/remoteproc_core.c
> > > @@ -1812,7 +1812,7 @@ void rproc_shutdown(struct rproc *rproc)
> > >     if (!atomic_dec_and_test(&rproc->power))
> > >             goto out;
> > >
> > > -   ret = rproc_stop(rproc, false);
> > > +   ret = rproc_stop(rproc, rproc->state == RPROC_CRASHED);
> >
> > Please add a comment that explains how we can be in rproc_shutdown() when
> > the processor has crashed and point to rproc_trigger_recovery().  See below
> > for more details.
> >
> > >     if (ret) {
> > >             atomic_inc(&rproc->power);
> > >             goto out;
> > > diff --git a/drivers/remoteproc/remoteproc_sysfs.c b/drivers/remoteproc/remoteproc_sysfs.c
> > > index 7f8536b73295..1029458a4678 100644
> > > --- a/drivers/remoteproc/remoteproc_sysfs.c
> > > +++ b/drivers/remoteproc/remoteproc_sysfs.c
> > > @@ -101,7 +101,7 @@ static ssize_t state_store(struct device *dev,
> > >             if (ret)
> > >                     dev_err(&rproc->dev, "Boot failed: %d\n", ret);
> > >     } else if (sysfs_streq(buf, "stop")) {
> > > -           if (rproc->state != RPROC_RUNNING)
> > > +           if (rproc->state != RPROC_RUNNING && rproc->state != RPROC_CRASHED)
> > >                     return -EINVAL;
> >
> > Wouldn't it be better to just prevent the MCU from staying in a crashed
> > state (when recovery is not disabled)?
> >
> > I like what you did in the next patch where the state of the MCU is set to
> > RPROC_CRASHED in case of failure, so that we keep.  I also think the hunk
> > above is correct.  All that is left is to call rproc_shutdown() directly in
> > rproc_trigger_recovery() when something goes wrong.  I would also add a
> > dev_err() so that users have a clue of what happened.
> >
> > That would leave the system in a stable state without having to add
> > intelligence to state_store().
> It is a solution we debated internally: should rproc_shutdown() be called
> directly in rproc_trigger_recovery() or not? If we go in that direction, it
> clearly simplifies coprocessor control, as it will always be in a "stable"
> state. But that means the user will lose the information that the coprocessor
> crashed (mainly when recovery is disabled). We just know that the coprocessor
> is stopped but not why: crash or client action?
> For debug purposes, it could be an issue from my point of view.

That is why I suggested to add a dev_err() so that users know recovery
of the MCU has failed.  Moreover I expect users to be aware of what is
happening on their platform, i.e. if the application did not switch off the
MCU and it is in the offline state, then it is fair to assume it
crashed.

>
> Regards,
> Loic
> >
> > Let me know what you think...
> >
> > Mathieu
> >
> > >
> > >             rproc_shutdown(rproc);
> > > --
> > > 2.7.4
> > >
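
The recovery-failure handling discussed above can be pictured with a small
user-space model. This is only a sketch under stated assumptions, not kernel
code: the model_* names, the firmware_ok flag and the plain int refcount are
hypothetical stand-ins for struct rproc, request_firmware() and the atomic
power counter, and locking is ignored entirely. It combines the RFC 2/2 idea
(mark the processor as crashed when recovery fails) with the suggestion to
log the failure and shut down immediately, so that a later boot starts from a
consistent offline state.

```c
#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-ins for the remoteproc states (model only). */
enum model_state { MODEL_OFFLINE, MODEL_RUNNING, MODEL_CRASHED };

struct model_rproc {
	enum model_state state;
	int power;        /* models the power refcount */
	bool firmware_ok; /* hypothetical: is the firmware file still available? */
};

/* Models rproc_stop(): halt the remote core and mark it offline. */
int model_stop(struct model_rproc *r, bool crashed)
{
	(void)crashed; /* the real flag tells subdevices how to tear down */
	r->state = MODEL_OFFLINE;
	return 0;
}

/* Models rproc_shutdown(): drop one power reference, stop on the last one. */
void model_shutdown(struct model_rproc *r)
{
	if (r->power == 0)
		return;
	if (--r->power > 0)
		return;
	/* After a failed recovery the state is CRASHED, so pass that on. */
	model_stop(r, r->state == MODEL_CRASHED);
}

/* Models rproc_boot(): take a power reference and start the core. */
int model_boot(struct model_rproc *r)
{
	if (++r->power > 1)
		return 0; /* already powered by another client */
	if (!r->firmware_ok) {
		r->power--;
		return -1;
	}
	r->state = MODEL_RUNNING;
	return 0;
}

/* Models rproc_trigger_recovery() with the behaviour proposed in the thread. */
int model_trigger_recovery(struct model_rproc *r)
{
	model_stop(r, true); /* state is OFFLINE here, power is untouched */

	if (r->firmware_ok) {
		r->state = MODEL_RUNNING;
		return 0;
	}

	/* RFC 2/2: remember that the processor crashed. */
	r->state = MODEL_CRASHED;
	/* Suggestion: tell the user, then bring state and refcount back in sync. */
	fprintf(stderr, "model: recovery failed, shutting down\n");
	model_shutdown(r);
	return -1;
}
```

In this model a failed recovery ends with the state offline and the refcount
at zero, so a client can boot again once the firmware is back. The cost raised
in the discussion is visible too: apart from the log line, nothing records
that the stop was caused by a crash.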


* Re: [RFC 2/2] remoteproc: core: keep rproc in crash state in case of recovery failure
  2020-03-11 10:54 ` [RFC 2/2] remoteproc: core: keep rproc in crash state in case of recovery failure Loic Pallardy
@ 2020-05-06  2:05   ` Bjorn Andersson
  0 siblings, 0 replies; 12+ messages in thread
From: Bjorn Andersson @ 2020-05-06  2:05 UTC (permalink / raw)
  To: Loic Pallardy
  Cc: ohad, mathieu.poirier, linux-remoteproc, linux-kernel,
	arnaud.pouliquen, benjamin.gaignard, fabien.dessenne, s-anna

On Wed 11 Mar 03:54 PDT 2020, Loic Pallardy wrote:

> When an error occurs during recovery procedure, internal rproc
> variables may be unaligned:
> - state is set to RPROC_OFFLINE
> - power atomic not equal to 0
> which is normal as only rproc_stop() has been executed and not
> rproc_shutdown()
> 
> In such case, rproc_boot() can be re-executed by client to
> reboot co-processor.
> 
> This patch proposes to keep rproc in RPROC_CRASHED state in case
> of recovery failure to be coherent with recovery disabled mode.
> 
> Signed-off-by: Loic Pallardy <loic.pallardy@st.com>
> ---
>  drivers/remoteproc/remoteproc_core.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index 7ac87a75cd1b..def4f9fc881d 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -1679,6 +1679,12 @@ int rproc_trigger_recovery(struct rproc *rproc)
>  	release_firmware(firmware_p);
>  
>  unlock_mutex:
> +	/*
> +	 * In case of error during recovery sequence restore rproc
> +	 * state in CRASHED
> +	 */
> +	if (ret)
> +		rproc->state = RPROC_CRASHED;

Got back to this after looking at Mathieu's synchronization series, I
think it would be cleaner if we move the rproc->state update out of
rproc_start() and rproc_stop().

That way we would keep the rproc in the CRASHED state throughout the
recovery process, which I think makes it easier to reason about the
various states and their transitions.

Regards,
Bjorn

>  	mutex_unlock(&rproc->lock);
>  	return ret;
>  }
> -- 
> 2.7.4
> 
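
The alternative suggested in this reply, moving the state update out of
rproc_start() and rproc_stop(), can be sketched with a user-space model. The
model_* names, the firmware_ok flag and the seen_during_recovery field are
hypothetical and exist only for illustration; this is not the kernel
implementation. The point is that when start and stop no longer write the
state, the callers own every transition, and a failed recovery stays in the
crashed state without any fix-up at the end.

```c
#include <stdbool.h>

enum model_state { MODEL_OFFLINE, MODEL_RUNNING, MODEL_CRASHED };

struct model_rproc {
	enum model_state state;
	bool firmware_ok;                      /* hypothetical: can we start? */
	enum model_state seen_during_recovery; /* state observed mid-recovery */
};

/* start/stop only drive the (modelled) hardware; they never touch state. */
int model_start(struct model_rproc *r)
{
	return r->firmware_ok ? 0 : -1;
}

int model_stop(struct model_rproc *r)
{
	(void)r;
	return 0;
}

/* Callers own the state transitions. */
int model_boot(struct model_rproc *r)
{
	int ret = model_start(r);

	if (ret == 0)
		r->state = MODEL_RUNNING;
	return ret;
}

void model_shutdown(struct model_rproc *r)
{
	if (model_stop(r) == 0)
		r->state = MODEL_OFFLINE;
}

void model_report_crash(struct model_rproc *r)
{
	r->state = MODEL_CRASHED;
}

int model_trigger_recovery(struct model_rproc *r)
{
	int ret = model_stop(r);

	if (ret)
		return ret;

	/* Between stop and start the state is still CRASHED, not OFFLINE. */
	r->seen_during_recovery = r->state;

	ret = model_start(r);
	if (ret == 0)
		r->state = MODEL_RUNNING;

	/* On failure nothing needs restoring: the state was never changed. */
	return ret;
}
```

Compared with restoring the state in the error path, this keeps a single
writer per transition, which is the property the reply argues makes the states
easier to reason about.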


Thread overview: 12+ messages
2020-03-11 10:54 [RFC 0/2] Allow client to recover crashed processor Loic Pallardy
2020-03-11 10:54 ` [RFC 1/2] remoteproc: sysfs: authorize rproc shutdown when rproc is crashed Loic Pallardy
2020-03-11 21:45   ` Mathieu Poirier
2020-03-12  8:00     ` Loic PALLARDY
2020-03-11 23:27   ` Bjorn Andersson
2020-03-12  8:12     ` Loic PALLARDY
2020-03-25 17:57   ` Mathieu Poirier
2020-03-25 18:30     ` Loic PALLARDY
2020-03-25 21:42       ` Mathieu Poirier
2020-03-11 10:54 ` [RFC 2/2] remoteproc: core: keep rproc in crash state in case of recovery failure Loic Pallardy
2020-05-06  2:05   ` Bjorn Andersson
2020-03-11 14:56 ` [RFC 0/2] Allow client to recover crashed processor Mathieu Poirier
