LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH v2] usb: dwc3: gadget: Replace list_for_each_entry_safe() if using giveback
@ 2021-05-11  7:03 Wesley Cheng
  2021-05-11  8:13 ` Felipe Balbi
  0 siblings, 1 reply; 8+ messages in thread
From: Wesley Cheng @ 2021-05-11  7:03 UTC (permalink / raw)
  To: balbi, gregkh, peter.chen; +Cc: linux-usb, linux-kernel, jackp, Wesley Cheng

The list_for_each_entry_safe() macro saves the current item (n) and
the item after (n+1), so that n can be safely removed without
corrupting the list.  However, when traversing the list and removing
items using gadget giveback, the DWC3 lock is briefly released,
allowing other routines to execute.  There is a situation where, while
items are being removed from the cancelled_list using
dwc3_gadget_ep_cleanup_cancelled_requests(), the pullup disable
routine is running in parallel (due to UDC unbind).  As the cleanup
routine removes n, and the pullup disable removes n+1, once the
cleanup retakes the DWC3 lock, it references a request who was already
removed/handled.  With list debug enabled, this leads to a panic.
Ensure all instances of the macro are replaced where gadget giveback
is used.

Example call stack:

Thread#1:
__dwc3_gadget_ep_set_halt() - CLEAR HALT
  -> dwc3_gadget_ep_cleanup_cancelled_requests()
    ->list_for_each_entry_safe()
    ->dwc3_gadget_giveback(n)
      ->dwc3_gadget_del_and_unmap_request()- n deleted[cancelled_list]
      ->spin_unlock
      ->Thread#2 executes
      ...
    ->dwc3_gadget_giveback(n+1)
      ->Already removed!

Thread#2:
dwc3_gadget_pullup()
  ->waiting for dwc3 spin_lock
  ...
  ->Thread#1 released lock
  ->dwc3_stop_active_transfers()
    ->dwc3_remove_requests()
      ->fetches n+1 item from cancelled_list (n removed by Thread#1)
      ->dwc3_gadget_giveback()
        ->dwc3_gadget_del_and_unmap_request()- n+1 deleted[cancelled_list]
        ->spin_unlock

Fixes: d4f1afe5e896 ("usb: dwc3: gadget: move requests to cancelled_list")
Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
Reviewed-by: Peter Chen <peter.chen@kernel.org>
---
Changes in v2:
 - Updated commit message with context call stack of an example scenario
   seen on device.

 drivers/usb/dwc3/gadget.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index dd80e5c..efa939b 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1737,10 +1737,10 @@ static void dwc3_gadget_ep_skip_trbs(struct dwc3_ep *dep, struct dwc3_request *r
 static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
 {
 	struct dwc3_request		*req;
-	struct dwc3_request		*tmp;
 	struct dwc3			*dwc = dep->dwc;
 
-	list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) {
+	while (!list_empty(&dep->cancelled_list)) {
+		req = next_request(&dep->cancelled_list);
 		dwc3_gadget_ep_skip_trbs(dep, req);
 		switch (req->status) {
 		case DWC3_REQUEST_STATUS_DISCONNECTED:
@@ -2935,11 +2935,11 @@ static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
 		const struct dwc3_event_depevt *event, int status)
 {
 	struct dwc3_request	*req;
-	struct dwc3_request	*tmp;
 
-	list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
+	while (!list_empty(&dep->started_list)) {
 		int ret;
 
+		req = next_request(&dep->started_list);
 		ret = dwc3_gadget_ep_cleanup_completed_request(dep, event,
 				req, status);
 		if (ret)
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] usb: dwc3: gadget: Replace list_for_each_entry_safe() if using giveback
  2021-05-11  7:03 [PATCH v2] usb: dwc3: gadget: Replace list_for_each_entry_safe() if using giveback Wesley Cheng
@ 2021-05-11  8:13 ` Felipe Balbi
  2021-05-19  8:52   ` Wesley Cheng
  0 siblings, 1 reply; 8+ messages in thread
From: Felipe Balbi @ 2021-05-11  8:13 UTC (permalink / raw)
  To: Wesley Cheng, gregkh, peter.chen
  Cc: linux-usb, linux-kernel, jackp, Wesley Cheng

[-- Attachment #1: Type: text/plain, Size: 2928 bytes --]


Hi,

Wesley Cheng <wcheng@codeaurora.org> writes:
> The list_for_each_entry_safe() macro saves the current item (n) and
> the item after (n+1), so that n can be safely removed without
> corrupting the list.  However, when traversing the list and removing
> items using gadget giveback, the DWC3 lock is briefly released,
> allowing other routines to execute.  There is a situation where, while
> items are being removed from the cancelled_list using
> dwc3_gadget_ep_cleanup_cancelled_requests(), the pullup disable
> routine is running in parallel (due to UDC unbind).  As the cleanup
> routine removes n, and the pullup disable removes n+1, once the
> cleanup retakes the DWC3 lock, it references a request who was already
> removed/handled.  With list debug enabled, this leads to a panic.
> Ensure all instances of the macro are replaced where gadget giveback
> is used.
>
> Example call stack:
>
> Thread#1:
> __dwc3_gadget_ep_set_halt() - CLEAR HALT
>   -> dwc3_gadget_ep_cleanup_cancelled_requests()
>     ->list_for_each_entry_safe()
>     ->dwc3_gadget_giveback(n)
>       ->dwc3_gadget_del_and_unmap_request()- n deleted[cancelled_list]
>       ->spin_unlock
>       ->Thread#2 executes
>       ...
>     ->dwc3_gadget_giveback(n+1)
>       ->Already removed!
>
> Thread#2:
> dwc3_gadget_pullup()
>   ->waiting for dwc3 spin_lock
>   ...
>   ->Thread#1 released lock
>   ->dwc3_stop_active_transfers()
>     ->dwc3_remove_requests()
>       ->fetches n+1 item from cancelled_list (n removed by Thread#1)
>       ->dwc3_gadget_giveback()
>         ->dwc3_gadget_del_and_unmap_request()- n+1 deleted[cancelled_list]
>         ->spin_unlock
>
> Fixes: d4f1afe5e896 ("usb: dwc3: gadget: move requests to cancelled_list")
> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
> Reviewed-by: Peter Chen <peter.chen@kernel.org>
> ---
> Changes in v2:
>  - Updated commit message with context call stack of an example scenario
>    seen on device.
>
>  drivers/usb/dwc3/gadget.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index dd80e5c..efa939b 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -1737,10 +1737,10 @@ static void dwc3_gadget_ep_skip_trbs(struct dwc3_ep *dep, struct dwc3_request *r
>  static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
>  {
>  	struct dwc3_request		*req;
> -	struct dwc3_request		*tmp;
>  	struct dwc3			*dwc = dep->dwc;
>  
> -	list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) {
> +	while (!list_empty(&dep->cancelled_list)) {
> +		req = next_request(&dep->cancelled_list);

couldn't this be solved list_replace_init() instead? Then we can keep
using the regular list_for_each_entry_safe() which has an added semantic
meaning due to its name.

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] usb: dwc3: gadget: Replace list_for_each_entry_safe() if using giveback
  2021-05-11  8:13 ` Felipe Balbi
@ 2021-05-19  8:52   ` Wesley Cheng
  2021-06-09 20:57     ` Wesley Cheng
  0 siblings, 1 reply; 8+ messages in thread
From: Wesley Cheng @ 2021-05-19  8:52 UTC (permalink / raw)
  To: Felipe Balbi, gregkh, peter.chen; +Cc: linux-usb, linux-kernel, jackp



On 5/11/2021 1:13 AM, Felipe Balbi wrote:
> 
> Hi,
> 
> Wesley Cheng <wcheng@codeaurora.org> writes:
>> The list_for_each_entry_safe() macro saves the current item (n) and
>> the item after (n+1), so that n can be safely removed without
>> corrupting the list.  However, when traversing the list and removing
>> items using gadget giveback, the DWC3 lock is briefly released,
>> allowing other routines to execute.  There is a situation where, while
>> items are being removed from the cancelled_list using
>> dwc3_gadget_ep_cleanup_cancelled_requests(), the pullup disable
>> routine is running in parallel (due to UDC unbind).  As the cleanup
>> routine removes n, and the pullup disable removes n+1, once the
>> cleanup retakes the DWC3 lock, it references a request who was already
>> removed/handled.  With list debug enabled, this leads to a panic.
>> Ensure all instances of the macro are replaced where gadget giveback
>> is used.
>>
>> Example call stack:
>>
>> Thread#1:
>> __dwc3_gadget_ep_set_halt() - CLEAR HALT
>>   -> dwc3_gadget_ep_cleanup_cancelled_requests()
>>     ->list_for_each_entry_safe()
>>     ->dwc3_gadget_giveback(n)
>>       ->dwc3_gadget_del_and_unmap_request()- n deleted[cancelled_list]
>>       ->spin_unlock
>>       ->Thread#2 executes
>>       ...
>>     ->dwc3_gadget_giveback(n+1)
>>       ->Already removed!
>>
>> Thread#2:
>> dwc3_gadget_pullup()
>>   ->waiting for dwc3 spin_lock
>>   ...
>>   ->Thread#1 released lock
>>   ->dwc3_stop_active_transfers()
>>     ->dwc3_remove_requests()
>>       ->fetches n+1 item from cancelled_list (n removed by Thread#1)
>>       ->dwc3_gadget_giveback()
>>         ->dwc3_gadget_del_and_unmap_request()- n+1 deleted[cancelled_list]
>>         ->spin_unlock
>>
>> Fixes: d4f1afe5e896 ("usb: dwc3: gadget: move requests to cancelled_list")
>> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
>> Reviewed-by: Peter Chen <peter.chen@kernel.org>
>> ---
>> Changes in v2:
>>  - Updated commit message with context call stack of an example scenario
>>    seen on device.
>>
>>  drivers/usb/dwc3/gadget.c | 8 ++++----
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>> index dd80e5c..efa939b 100644
>> --- a/drivers/usb/dwc3/gadget.c
>> +++ b/drivers/usb/dwc3/gadget.c
>> @@ -1737,10 +1737,10 @@ static void dwc3_gadget_ep_skip_trbs(struct dwc3_ep *dep, struct dwc3_request *r
>>  static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
>>  {
>>  	struct dwc3_request		*req;
>> -	struct dwc3_request		*tmp;
>>  	struct dwc3			*dwc = dep->dwc;
>>  
>> -	list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) {
>> +	while (!list_empty(&dep->cancelled_list)) {
>> +		req = next_request(&dep->cancelled_list);
> 
> couldn't this be solved list_replace_init() instead? Then we can keep
> using the regular list_for_each_entry_safe() which has an added semantic
> meaning due to its name.
> 

Hi Felipe,

Sorry for the late response.  So I tried with a list_replace_init() to
within the list_for_each_entry_safe() loop to update tmp w/ the
cancelled_list list head, but the issue was still observed.  This is
because we can't replace the reference the loop already has stored in
tmp, which is simply updated as the current item on the next iteration.

I believe this is what you were trying to achieve?

Thanks
Wesley Cheng

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] usb: dwc3: gadget: Replace list_for_each_entry_safe() if using giveback
  2021-05-19  8:52   ` Wesley Cheng
@ 2021-06-09 20:57     ` Wesley Cheng
  2021-07-21  6:14       ` Wesley Cheng
  0 siblings, 1 reply; 8+ messages in thread
From: Wesley Cheng @ 2021-06-09 20:57 UTC (permalink / raw)
  To: Felipe Balbi, gregkh, peter.chen; +Cc: linux-usb, linux-kernel, jackp

Hi Felipe,

On 5/19/2021 1:52 AM, Wesley Cheng wrote:
> 
> 
> On 5/11/2021 1:13 AM, Felipe Balbi wrote:
>>
>> Hi,
>>
>> Wesley Cheng <wcheng@codeaurora.org> writes:
>>> The list_for_each_entry_safe() macro saves the current item (n) and
>>> the item after (n+1), so that n can be safely removed without
>>> corrupting the list.  However, when traversing the list and removing
>>> items using gadget giveback, the DWC3 lock is briefly released,
>>> allowing other routines to execute.  There is a situation where, while
>>> items are being removed from the cancelled_list using
>>> dwc3_gadget_ep_cleanup_cancelled_requests(), the pullup disable
>>> routine is running in parallel (due to UDC unbind).  As the cleanup
>>> routine removes n, and the pullup disable removes n+1, once the
>>> cleanup retakes the DWC3 lock, it references a request who was already
>>> removed/handled.  With list debug enabled, this leads to a panic.
>>> Ensure all instances of the macro are replaced where gadget giveback
>>> is used.
>>>
>>> Example call stack:
>>>
>>> Thread#1:
>>> __dwc3_gadget_ep_set_halt() - CLEAR HALT
>>>   -> dwc3_gadget_ep_cleanup_cancelled_requests()
>>>     ->list_for_each_entry_safe()
>>>     ->dwc3_gadget_giveback(n)
>>>       ->dwc3_gadget_del_and_unmap_request()- n deleted[cancelled_list]
>>>       ->spin_unlock
>>>       ->Thread#2 executes
>>>       ...
>>>     ->dwc3_gadget_giveback(n+1)
>>>       ->Already removed!
>>>
>>> Thread#2:
>>> dwc3_gadget_pullup()
>>>   ->waiting for dwc3 spin_lock
>>>   ...
>>>   ->Thread#1 released lock
>>>   ->dwc3_stop_active_transfers()
>>>     ->dwc3_remove_requests()
>>>       ->fetches n+1 item from cancelled_list (n removed by Thread#1)
>>>       ->dwc3_gadget_giveback()
>>>         ->dwc3_gadget_del_and_unmap_request()- n+1 deleted[cancelled_list]
>>>         ->spin_unlock
>>>
>>> Fixes: d4f1afe5e896 ("usb: dwc3: gadget: move requests to cancelled_list")
>>> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
>>> Reviewed-by: Peter Chen <peter.chen@kernel.org>
>>> ---
>>> Changes in v2:
>>>  - Updated commit message with context call stack of an example scenario
>>>    seen on device.
>>>
>>>  drivers/usb/dwc3/gadget.c | 8 ++++----
>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>> index dd80e5c..efa939b 100644
>>> --- a/drivers/usb/dwc3/gadget.c
>>> +++ b/drivers/usb/dwc3/gadget.c
>>> @@ -1737,10 +1737,10 @@ static void dwc3_gadget_ep_skip_trbs(struct dwc3_ep *dep, struct dwc3_request *r
>>>  static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
>>>  {
>>>  	struct dwc3_request		*req;
>>> -	struct dwc3_request		*tmp;
>>>  	struct dwc3			*dwc = dep->dwc;
>>>  
>>> -	list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) {
>>> +	while (!list_empty(&dep->cancelled_list)) {
>>> +		req = next_request(&dep->cancelled_list);
>>
>> couldn't this be solved list_replace_init() instead? Then we can keep
>> using the regular list_for_each_entry_safe() which has an added semantic
>> meaning due to its name.
>>
> 
> Hi Felipe,
> 
> Sorry for the late response.  So I tried with a list_replace_init() to
> within the list_for_each_entry_safe() loop to update tmp w/ the
> cancelled_list list head, but the issue was still observed.  This is
> because we can't replace the reference the loop already has stored in
> tmp, which is simply updated as the current item on the next iteration.
> 
> I believe this is what you were trying to achieve?
> 
Was wondering if you had any further inputs on this change?  As
mentioned, I tried a few things with list_replace_init(), which did not
work.

Thanks
Wesley Cheng

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] usb: dwc3: gadget: Replace list_for_each_entry_safe() if using giveback
  2021-06-09 20:57     ` Wesley Cheng
@ 2021-07-21  6:14       ` Wesley Cheng
  2021-07-21  6:30         ` Felipe Balbi
  0 siblings, 1 reply; 8+ messages in thread
From: Wesley Cheng @ 2021-07-21  6:14 UTC (permalink / raw)
  To: Felipe Balbi, gregkh, peter.chen; +Cc: linux-usb, linux-kernel, jackp

Hi Felipe,

On 6/9/2021 1:57 PM, Wesley Cheng wrote:
> Hi Felipe,
> 
> On 5/19/2021 1:52 AM, Wesley Cheng wrote:
>>
>>
>> On 5/11/2021 1:13 AM, Felipe Balbi wrote:
>>>
>>> Hi,
>>>
>>> Wesley Cheng <wcheng@codeaurora.org> writes:
>>>> The list_for_each_entry_safe() macro saves the current item (n) and
>>>> the item after (n+1), so that n can be safely removed without
>>>> corrupting the list.  However, when traversing the list and removing
>>>> items using gadget giveback, the DWC3 lock is briefly released,
>>>> allowing other routines to execute.  There is a situation where, while
>>>> items are being removed from the cancelled_list using
>>>> dwc3_gadget_ep_cleanup_cancelled_requests(), the pullup disable
>>>> routine is running in parallel (due to UDC unbind).  As the cleanup
>>>> routine removes n, and the pullup disable removes n+1, once the
>>>> cleanup retakes the DWC3 lock, it references a request who was already
>>>> removed/handled.  With list debug enabled, this leads to a panic.
>>>> Ensure all instances of the macro are replaced where gadget giveback
>>>> is used.
>>>>
>>>> Example call stack:
>>>>
>>>> Thread#1:
>>>> __dwc3_gadget_ep_set_halt() - CLEAR HALT
>>>>   -> dwc3_gadget_ep_cleanup_cancelled_requests()
>>>>     ->list_for_each_entry_safe()
>>>>     ->dwc3_gadget_giveback(n)
>>>>       ->dwc3_gadget_del_and_unmap_request()- n deleted[cancelled_list]
>>>>       ->spin_unlock
>>>>       ->Thread#2 executes
>>>>       ...
>>>>     ->dwc3_gadget_giveback(n+1)
>>>>       ->Already removed!
>>>>
>>>> Thread#2:
>>>> dwc3_gadget_pullup()
>>>>   ->waiting for dwc3 spin_lock
>>>>   ...
>>>>   ->Thread#1 released lock
>>>>   ->dwc3_stop_active_transfers()
>>>>     ->dwc3_remove_requests()
>>>>       ->fetches n+1 item from cancelled_list (n removed by Thread#1)
>>>>       ->dwc3_gadget_giveback()
>>>>         ->dwc3_gadget_del_and_unmap_request()- n+1 deleted[cancelled_list]
>>>>         ->spin_unlock
>>>>
>>>> Fixes: d4f1afe5e896 ("usb: dwc3: gadget: move requests to cancelled_list")
>>>> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
>>>> Reviewed-by: Peter Chen <peter.chen@kernel.org>
>>>> ---
>>>> Changes in v2:
>>>>  - Updated commit message with context call stack of an example scenario
>>>>    seen on device.
>>>>
>>>>  drivers/usb/dwc3/gadget.c | 8 ++++----
>>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>> index dd80e5c..efa939b 100644
>>>> --- a/drivers/usb/dwc3/gadget.c
>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>> @@ -1737,10 +1737,10 @@ static void dwc3_gadget_ep_skip_trbs(struct dwc3_ep *dep, struct dwc3_request *r
>>>>  static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
>>>>  {
>>>>  	struct dwc3_request		*req;
>>>> -	struct dwc3_request		*tmp;
>>>>  	struct dwc3			*dwc = dep->dwc;
>>>>  
>>>> -	list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) {
>>>> +	while (!list_empty(&dep->cancelled_list)) {
>>>> +		req = next_request(&dep->cancelled_list);
>>>
>>> couldn't this be solved list_replace_init() instead? Then we can keep
>>> using the regular list_for_each_entry_safe() which has an added semantic
>>> meaning due to its name.
>>>
>>
>> Hi Felipe,
>>
>> Sorry for the late response.  So I tried with a list_replace_init() to
>> within the list_for_each_entry_safe() loop to update tmp w/ the
>> cancelled_list list head, but the issue was still observed.  This is
>> because we can't replace the reference the loop already has stored in
>> tmp, which is simply updated as the current item on the next iteration.
>>
>> I believe this is what you were trying to achieve?
>>
> Was wondering if you had any further inputs on this change?  As
> mentioned, I tried a few things with list_replace_init(), which did not
> work.
> 

Sorry for the ping.  Is this change OK to add as is?  We've been running
into this instance pretty frequently during our testing, so just wanted
to close on the proper changes being merged upstream.

Thanks
Wesley Cheng


-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] usb: dwc3: gadget: Replace list_for_each_entry_safe() if using giveback
  2021-07-21  6:14       ` Wesley Cheng
@ 2021-07-21  6:30         ` Felipe Balbi
  2021-07-27 21:00           ` Wesley Cheng
  0 siblings, 1 reply; 8+ messages in thread
From: Felipe Balbi @ 2021-07-21  6:30 UTC (permalink / raw)
  To: Wesley Cheng, gregkh, peter.chen; +Cc: linux-usb, linux-kernel, jackp

[-- Attachment #1: Type: text/plain, Size: 4897 bytes --]


Hi Wesley,

(first of all, sorry for the super long delay. This really fell through
the cracks)

Wesley Cheng <wcheng@codeaurora.org> writes:
> Hi Felipe,
>
> On 6/9/2021 1:57 PM, Wesley Cheng wrote:
>> Hi Felipe,
>> 
>> On 5/19/2021 1:52 AM, Wesley Cheng wrote:
>>>
>>>
>>> On 5/11/2021 1:13 AM, Felipe Balbi wrote:
>>>>
>>>> Hi,
>>>>
>>>> Wesley Cheng <wcheng@codeaurora.org> writes:
>>>>> The list_for_each_entry_safe() macro saves the current item (n) and
>>>>> the item after (n+1), so that n can be safely removed without
>>>>> corrupting the list.  However, when traversing the list and removing
>>>>> items using gadget giveback, the DWC3 lock is briefly released,
>>>>> allowing other routines to execute.  There is a situation where, while
>>>>> items are being removed from the cancelled_list using
>>>>> dwc3_gadget_ep_cleanup_cancelled_requests(), the pullup disable
>>>>> routine is running in parallel (due to UDC unbind).  As the cleanup
>>>>> routine removes n, and the pullup disable removes n+1, once the
>>>>> cleanup retakes the DWC3 lock, it references a request who was already
>>>>> removed/handled.  With list debug enabled, this leads to a panic.
>>>>> Ensure all instances of the macro are replaced where gadget giveback
>>>>> is used.
>>>>>
>>>>> Example call stack:
>>>>>
>>>>> Thread#1:
>>>>> __dwc3_gadget_ep_set_halt() - CLEAR HALT
>>>>>   -> dwc3_gadget_ep_cleanup_cancelled_requests()
>>>>>     ->list_for_each_entry_safe()
>>>>>     ->dwc3_gadget_giveback(n)
>>>>>       ->dwc3_gadget_del_and_unmap_request()- n deleted[cancelled_list]
>>>>>       ->spin_unlock
>>>>>       ->Thread#2 executes
>>>>>       ...
>>>>>     ->dwc3_gadget_giveback(n+1)
>>>>>       ->Already removed!
>>>>>
>>>>> Thread#2:
>>>>> dwc3_gadget_pullup()
>>>>>   ->waiting for dwc3 spin_lock
>>>>>   ...
>>>>>   ->Thread#1 released lock
>>>>>   ->dwc3_stop_active_transfers()
>>>>>     ->dwc3_remove_requests()
>>>>>       ->fetches n+1 item from cancelled_list (n removed by Thread#1)
>>>>>       ->dwc3_gadget_giveback()
>>>>>         ->dwc3_gadget_del_and_unmap_request()- n+1 deleted[cancelled_list]
>>>>>         ->spin_unlock
>>>>>
>>>>> Fixes: d4f1afe5e896 ("usb: dwc3: gadget: move requests to cancelled_list")
>>>>> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
>>>>> Reviewed-by: Peter Chen <peter.chen@kernel.org>
>>>>> ---
>>>>> Changes in v2:
>>>>>  - Updated commit message with context call stack of an example scenario
>>>>>    seen on device.
>>>>>
>>>>>  drivers/usb/dwc3/gadget.c | 8 ++++----
>>>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>>> index dd80e5c..efa939b 100644
>>>>> --- a/drivers/usb/dwc3/gadget.c
>>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>>> @@ -1737,10 +1737,10 @@ static void dwc3_gadget_ep_skip_trbs(struct dwc3_ep *dep, struct dwc3_request *r
>>>>>  static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
>>>>>  {
>>>>>  	struct dwc3_request		*req;
>>>>> -	struct dwc3_request		*tmp;
>>>>>  	struct dwc3			*dwc = dep->dwc;
>>>>>  
>>>>> -	list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) {
>>>>> +	while (!list_empty(&dep->cancelled_list)) {
>>>>> +		req = next_request(&dep->cancelled_list);
>>>>
>>>> couldn't this be solved list_replace_init() instead? Then we can keep
>>>> using the regular list_for_each_entry_safe() which has an added semantic
>>>> meaning due to its name.
>>>>
>>>
>>> Hi Felipe,
>>>
>>> Sorry for the late response.  So I tried with a list_replace_init() to
>>> within the list_for_each_entry_safe() loop to update tmp w/ the
>>> cancelled_list list head, but the issue was still observed.  This is
>>> because we can't replace the reference the loop already has stored in
>>> tmp, which is simply updated as the current item on the next iteration.
>>>
>>> I believe this is what you were trying to achieve?
>>>
>> Was wondering if you had any further inputs on this change?  As
>> mentioned, I tried a few things with list_replace_init(), which did not
>> work.
>> 
>
> Sorry for the ping.  Is this change OK to add as is?  We've been running
> into this instance pretty frequently during our testing, so just wanted
> to close on the proper changes being merged upstream.

The idea is this:

	struct list_head	local;

        spin_lock_irq(&lock);
        list_replace_init(&dwc->cancelled_list, &local);
        spin_unlock_irq(&lock);

	list_for_each_entry_safe(req, tmp, &local, list) {
        	/* ... */
	}

It looks to me this should work fine, no? You can also follow what
drivers/usb/core/hcd.c is doing in usb_giveback_urb_bh() and restarting
if dwc->cancelled_list is not empty after list_for_each_entry_safe().

Can you give that one a shot?

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] usb: dwc3: gadget: Replace list_for_each_entry_safe() if using giveback
  2021-07-21  6:30         ` Felipe Balbi
@ 2021-07-27 21:00           ` Wesley Cheng
  2021-07-28  5:53             ` Felipe Balbi
  0 siblings, 1 reply; 8+ messages in thread
From: Wesley Cheng @ 2021-07-27 21:00 UTC (permalink / raw)
  To: Felipe Balbi, gregkh, peter.chen; +Cc: linux-usb, linux-kernel, jackp

Hi Felipe,

On 7/20/2021 11:30 PM, Felipe Balbi wrote:
> 
> Hi Wesley,
> 
> (first of all, sorry for the super long delay. This really fell through
> the cracks)
> 

No problem, I understand that you probably get a whole bunch of things
all at once, so its hard to keep track of each one :).

> Wesley Cheng <wcheng@codeaurora.org> writes:
>> Hi Felipe,
>>
>> On 6/9/2021 1:57 PM, Wesley Cheng wrote:
>>> Hi Felipe,
>>>
>>> On 5/19/2021 1:52 AM, Wesley Cheng wrote:
>>>>
>>>>
>>>> On 5/11/2021 1:13 AM, Felipe Balbi wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Wesley Cheng <wcheng@codeaurora.org> writes:
>>>>>> The list_for_each_entry_safe() macro saves the current item (n) and
>>>>>> the item after (n+1), so that n can be safely removed without
>>>>>> corrupting the list.  However, when traversing the list and removing
>>>>>> items using gadget giveback, the DWC3 lock is briefly released,
>>>>>> allowing other routines to execute.  There is a situation where, while
>>>>>> items are being removed from the cancelled_list using
>>>>>> dwc3_gadget_ep_cleanup_cancelled_requests(), the pullup disable
>>>>>> routine is running in parallel (due to UDC unbind).  As the cleanup
>>>>>> routine removes n, and the pullup disable removes n+1, once the
>>>>>> cleanup retakes the DWC3 lock, it references a request who was already
>>>>>> removed/handled.  With list debug enabled, this leads to a panic.
>>>>>> Ensure all instances of the macro are replaced where gadget giveback
>>>>>> is used.
>>>>>>
>>>>>> Example call stack:
>>>>>>
>>>>>> Thread#1:
>>>>>> __dwc3_gadget_ep_set_halt() - CLEAR HALT
>>>>>>   -> dwc3_gadget_ep_cleanup_cancelled_requests()
>>>>>>     ->list_for_each_entry_safe()
>>>>>>     ->dwc3_gadget_giveback(n)
>>>>>>       ->dwc3_gadget_del_and_unmap_request()- n deleted[cancelled_list]
>>>>>>       ->spin_unlock
>>>>>>       ->Thread#2 executes
>>>>>>       ...
>>>>>>     ->dwc3_gadget_giveback(n+1)
>>>>>>       ->Already removed!
>>>>>>
>>>>>> Thread#2:
>>>>>> dwc3_gadget_pullup()
>>>>>>   ->waiting for dwc3 spin_lock
>>>>>>   ...
>>>>>>   ->Thread#1 released lock
>>>>>>   ->dwc3_stop_active_transfers()
>>>>>>     ->dwc3_remove_requests()
>>>>>>       ->fetches n+1 item from cancelled_list (n removed by Thread#1)
>>>>>>       ->dwc3_gadget_giveback()
>>>>>>         ->dwc3_gadget_del_and_unmap_request()- n+1 deleted[cancelled_list]
>>>>>>         ->spin_unlock
>>>>>>
>>>>>> Fixes: d4f1afe5e896 ("usb: dwc3: gadget: move requests to cancelled_list")
>>>>>> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
>>>>>> Reviewed-by: Peter Chen <peter.chen@kernel.org>
>>>>>> ---
>>>>>> Changes in v2:
>>>>>>  - Updated commit message with context call stack of an example scenario
>>>>>>    seen on device.
>>>>>>
>>>>>>  drivers/usb/dwc3/gadget.c | 8 ++++----
>>>>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>>>> index dd80e5c..efa939b 100644
>>>>>> --- a/drivers/usb/dwc3/gadget.c
>>>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>>>> @@ -1737,10 +1737,10 @@ static void dwc3_gadget_ep_skip_trbs(struct dwc3_ep *dep, struct dwc3_request *r
>>>>>>  static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
>>>>>>  {
>>>>>>  	struct dwc3_request		*req;
>>>>>> -	struct dwc3_request		*tmp;
>>>>>>  	struct dwc3			*dwc = dep->dwc;
>>>>>>  
>>>>>> -	list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) {
>>>>>> +	while (!list_empty(&dep->cancelled_list)) {
>>>>>> +		req = next_request(&dep->cancelled_list);
>>>>>
>>>>> couldn't this be solved list_replace_init() instead? Then we can keep
>>>>> using the regular list_for_each_entry_safe() which has an added semantic
>>>>> meaning due to its name.
>>>>>
>>>>
>>>> Hi Felipe,
>>>>
>>>> Sorry for the late response.  So I tried with a list_replace_init() to
>>>> within the list_for_each_entry_safe() loop to update tmp w/ the
>>>> cancelled_list list head, but the issue was still observed.  This is
>>>> because we can't replace the reference the loop already has stored in
>>>> tmp, which is simply updated as the current item on the next iteration.
>>>>
>>>> I believe this is what you were trying to achieve?
>>>>
>>> Was wondering if you had any further inputs on this change?  As
>>> mentioned, I tried a few things with list_replace_init(), which did not
>>> work.
>>>
>>
>> Sorry for the ping.  Is this change OK to add as is?  We've been running
>> into this instance pretty frequently during our testing, so just wanted
>> to close on the proper changes being merged upstream.
> 
> The idea is this:
> 
> 	struct list_head	local;
> 
>         spin_lock_irq(&lock);
>         list_replace_init(&dwc->cancelled_list, &local);
>         spin_unlock_irq(&lock);
> 
> 	list_for_each_entry_safe(req, tmp, &local, list) {
>         	/* ... */
> 	}
> 
> It looks to me this should work fine, no? You can also follow what
> drivers/usb/core/hcd.c is doing in usb_giveback_urb_bh() and restarting
> if dwc->cancelled_list is not empty after list_for_each_entry_safe().
> 
> Can you give that one a shot?
> 

Great, thanks for this suggestion!  Now I understand what you were
referring to.  I gave this a try and it works well.  Will prepare a
change to replace both places with list_replace_init()

Thanks
Wesley Cheng

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] usb: dwc3: gadget: Replace list_for_each_entry_safe() if using giveback
  2021-07-27 21:00           ` Wesley Cheng
@ 2021-07-28  5:53             ` Felipe Balbi
  0 siblings, 0 replies; 8+ messages in thread
From: Felipe Balbi @ 2021-07-28  5:53 UTC (permalink / raw)
  To: Wesley Cheng; +Cc: gregkh, peter.chen, linux-usb, linux-kernel, jackp


Hi Wesley,

Wesley Cheng <wcheng@codeaurora.org> writes:
>>>>> Sorry for the late response.  So I tried with a list_replace_init() to
>>>>> within the list_for_each_entry_safe() loop to update tmp w/ the
>>>>> cancelled_list list head, but the issue was still observed.  This is
>>>>> because we can't replace the reference the loop already has stored in
>>>>> tmp, which is simply updated as the current item on the next iteration.
>>>>>
>>>>> I believe this is what you were trying to achieve?
>>>>>
>>>> Was wondering if you had any further inputs on this change?  As
>>>> mentioned, I tried a few things with list_replace_init(), which did not
>>>> work.
>>>>
>>>
>>> Sorry for the ping.  Is this change OK to add as is?  We've been running
>>> into this instance pretty frequently during our testing, so just wanted
>>> to close on the proper changes being merged upstream.
>> 
>> The idea is this:
>> 
>> 	struct list_head	local;
>> 
>>         spin_lock_irq(&lock);
>>         list_replace_init(&dwc->cancelled_list, &local);
>>         spin_unlock_irq(&lock);
>> 
>> 	list_for_each_entry_safe(req, tmp, &local, list) {
>>         	/* ... */
>> 	}
>> 
>> It looks to me this should work fine, no? You can also follow what
>> drivers/usb/core/hcd.c is doing in usb_giveback_urb_bh() and restarting
>> if dwc->cancelled_list is not empty after list_for_each_entry_safe().
>> 
>> Can you give that one a shot?
>> 
>
> Great, thanks for this suggestion!  Now I understand what you were
> referring to.  I gave this a try and it works well.  Will prepare a
> change to replace both places with list_replace_init()

this is great news :-) Thanks for trying it out

-- 
balbi

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-07-28  5:53 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-11  7:03 [PATCH v2] usb: dwc3: gadget: Replace list_for_each_entry_safe() if using giveback Wesley Cheng
2021-05-11  8:13 ` Felipe Balbi
2021-05-19  8:52   ` Wesley Cheng
2021-06-09 20:57     ` Wesley Cheng
2021-07-21  6:14       ` Wesley Cheng
2021-07-21  6:30         ` Felipe Balbi
2021-07-27 21:00           ` Wesley Cheng
2021-07-28  5:53             ` Felipe Balbi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).