LKML Archive on
help / color / mirror / Atom feed
To: Cristian Marussi <>
Subject: Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails
Date: Mon, 01 Nov 2021 09:35:42 -0700	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <20210901093558.GL13160@e120937-lin>

On 2021-09-01 02:35, Cristian Marussi wrote:
> On Tue, Aug 31, 2021 at 06:48:35AM +0100, Cristian Marussi wrote:
>> On Mon, Aug 30, 2021 at 02:09:37PM -0700, 
>> wrote:
>> > Hi Christian
>> Hi Rishabh,
>> thanks for looking into this kind of bad interactions.
>> > There seems to be another issue here. The response from agent can be delayed
>> > causing a timeout during base protocol acquire,
>> > which leads to the probe failure. What I have observed is sometimes the
>> > failure of probe and rx_callback (due to a delayed message)
>> > happens at the same time on different cpus.
>> > Because of this race, the device memory may be cleared while the
>> > interrupt(rx_callback) is executing on another cpu.
>> You are right that concurrency was not handled properly in this kind 
>> of
>> context and moreover, if you think about it, even the case of out of
>> order reception of responses and delayed_responses (type2 SCMI 
>> messages)
>> for asynchronous SCMI commands was not handled properly.
>> > How do you propose we solve this? Do you think it is better to take the
>> > setting up of base and other protocols out of probe and
>> > in some delayed work? That would imply the device memory is not released
>> > until remove is called. Or should we add locking to
>> > the interrupt handler(scmi_rx_callback) and the cleanup in probe to avoid
>> > the race?
>> >
>> These issues were more easily exposed by SCMI Virtio transport, so in
>> the series where I introduced scmi-virtio:
>> (which is now queued for v5.15 ...  now on -next I think...finger 
>> crossed)
>> I took the chance to rectify a couple of other things in the SCMI core
>> in the initial commits.
>> As an example, in the above series
>>  [PATCH v7 05/15] firmware: arm_scmi: Handle concurrent and 
>> out-of-order messages
>> cares to add a refcount to xfers and some locking on xfers between TX
>> and RX path to avoid that a timed out xfer can vanish while the rx 
>> path
>> is concurrently working on it (as you said); moreover I handle the
>> condition (rare if not unplausible anyway) in which a transport 
>> delivers
>> out of order responses and delayed responses.
>> I tested this scenarios on some fake emulated SCMI Virtio transport
>> where I could play any sort of mess and tricks to stress this limit
>> conditions, but you're more than welcome to verify if the race you are
>> seeing on Base protocol time out is solved (as I would hope :D) by 
>> this
>> series of mine.
>> Let me know, any feedback is welcome.
>> Btw, in the series above there are also other minor changes, but there
>> is also another more radical change needed to ensure correctness and
>> protection against stale old messages which maybe could interest you
>> in general if you are looking into SCMI:
>> [PATCH v7 04/15] firmware: arm_scmi: Introduce monotonically 
>> increasing tokens
>> Let me know if yo have other concerns.
> Hi Rishabhb,
> just a quick remark, thinking again about your fail @probe scenario 
> above
> I realized that while the concurrency patch I mentioned above could 
> help on
> races against vanishing xfers when late timed-out responses are 
> delivered,
> here we really are then also shutting down everything on failure, so 
> there
> could be further issues between a very late invokation of 
> scmi_rx_callback
> and the core devm_ helpers freeing the underlying xfer/cinfo/etc.. 
> structs
> used by scmi-rx-callback itself (maybe this was already what you meant 
> and
> I didn't get it,...sorry)
> On the other side, I don't feel that delaying Base init to a deferred
> worker is a viable solution since we need Base protocol init to be
> initialized and we need to just give up if we cannot communicate with
> the SCMI platform fw in such early stages. (Base protocol is really the
> only mandatory proto is I remember correctly the spec)
> Currenly I'm off and only glancing at mails but I'll have a thought 
> about
> these issues once back in a few weeks time.
> Thanks,
> Cristian
Hi Cristian
I hope you enjoyed your vacation. Did you get a chance to look at the 
issue stated above
and have some idea as to how to solve this?
> _______________________________________________
> linux-arm-kernel mailing list

  reply	other threads:[~2021-11-01 16:37 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-04 21:19 Rishabh Bhatnagar
2021-08-05 10:54 ` Cristian Marussi
2021-08-30 21:09   ` rishabhb
2021-08-31  5:48     ` Cristian Marussi
2021-09-01  9:35       ` Cristian Marussi
2021-11-01 16:35         ` rishabhb [this message]
2021-11-02 11:32           ` Sudeep Holla
2021-11-04 23:40             ` rishabhb
2021-11-05  9:43               ` Cristian Marussi
2021-11-05 17:40                 ` rishabhb
2021-11-07 10:34                   ` Cristian Marussi
2021-11-07 18:22                     ` Cristian Marussi
2021-11-08 17:58                       ` rishabhb
2021-11-09 14:38                         ` Cristian Marussi
2021-08-09  5:00 ` Sudeep Holla

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \
    --subject='Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails' \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).