From: Yunsheng Lin <linyunsheng@huawei.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>,
	Jiri Pirko <jiri@resnulli.us>,
	"David Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	"Linux Kernel Network Developers" <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>, <linuxarm@huawei.com>
Subject: Re: [PATCH net-next] net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc
Date: Wed, 2 Sep 2020 09:42:10 +0800
Message-ID: <511bcb5c-b089-ab4e-4424-a83c6e718bfa@huawei.com>
In-Reply-To: <CAM_iQpVtb3Cks-LacZ865=C8r-_8ek1cy=n3SxELYGxvNgkPtw@mail.gmail.com>

On 2020/9/2 2:24, Cong Wang wrote:
> On Mon, Aug 31, 2020 at 5:59 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>
>> Currently there is concurrent reset and enqueue operation for the
>> same lockless qdisc when there is no lock to synchronize the
>> q->enqueue() in __dev_xmit_skb() with the qdisc reset operation in
>> qdisc_deactivate() called by dev_deactivate_queue(), which may cause
>> out-of-bounds access for priv->ring[] in hns3 driver if user has
>> requested a smaller queue num when __dev_xmit_skb() still enqueue a
>> skb with a larger queue_mapping after the corresponding qdisc is
>> reset, and call hns3_nic_net_xmit() with that skb later.
> 
> Can you be more specific here? Which call path requests a smaller
> tx queue num? If you mean netif_set_real_num_tx_queues(), clearly
> we already have a synchronize_net() there.

When the netdevice is in the active state, the synchronize_net() seems to
do the right thing, as shown below:

CPU 0:                                  CPU 1:
__dev_queue_xmit()                      netif_set_real_num_tx_queues()
rcu_read_lock_bh();
netdev_core_pick_tx(dev, skb, sb_dev);
        .
        .                               dev->real_num_tx_queues = txq;
        .                                       .
        .                                       .
        .                               synchronize_net();
        .                                       .
q->enqueue()                                    .
        .                                       .
rcu_read_unlock_bh()                            .
                                        qdisc_reset_all_tx_gt()

However, dev->real_num_tx_queues is not RCU-protected, so that may be a
problem too.

The problem we hit is as follows:
In hns3_set_channels(), hns3_reset_notify(h, HNAE3_DOWN_CLIENT) is called
to deactivate the netdevice when the user requests a smaller queue num, so
txq->qdisc has already been changed to noop_qdisc by the time
netif_set_real_num_tx_queues() is called. The synchronize_net() in
netif_set_real_num_tx_queues() therefore does not help here.
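
To make the failure mode concrete, here is a minimal user-space sketch of
the index check only. It is not the kernel or hns3 code: the model_* names,
the ring size and the helper functions are all hypothetical, and it only
shows why an skb whose queue_mapping was picked before the shrink no longer
fits ring[] afterwards.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_QUEUES 16

/* Hypothetical stand-in for an skb; only the field relevant here. */
struct model_skb {
	unsigned int queue_mapping;	/* TX queue chosen at pick time */
};

/* Hypothetical stand-in for the driver private data. */
struct model_priv {
	unsigned int num_rings;		/* number of valid entries in ring[] */
	int ring[MODEL_MAX_QUEUES];	/* models priv->ring[] */
};

/* Shrink (or grow) the number of rings, as a channel change would. */
static void model_set_channels(struct model_priv *priv, unsigned int new_num)
{
	assert(new_num > 0 && new_num <= MODEL_MAX_QUEUES);
	priv->num_rings = new_num;
}

/*
 * Returns true if transmitting this skb would index past the valid part
 * of ring[], i.e. its queue_mapping is stale with respect to num_rings.
 */
static bool model_xmit_is_oob(const struct model_priv *priv,
			      const struct model_skb *skb)
{
	return skb->queue_mapping >= priv->num_rings;
}
```

With 8 rings, an skb mapped to queue 7 is fine; after
model_set_channels(&priv, 4) the same skb is out of range, which is the
situation an skb left behind in the old qdisc ends up in.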

> 
>>
>> Avoid the above concurrent op by calling synchronize_rcu_tasks()
>> after assigning new qdisc to dev_queue->qdisc and before calling
>> qdisc_deactivate() to make sure skb with larger queue_mapping
>> enqueued to old qdisc will always be reset when qdisc_deactivate()
>> is called.
> 
> Like Eric said, it is not nice to call such a blocking function when
> we have a large number of TX queues. Possibly we just need to
> add a synchronize_net() as in netif_set_real_num_tx_queues(),
> if it is missing.

As above, the synchronize_net() in netif_set_real_num_tx_queues() seems
to work when the netdevice is in the active state, but does not help when
the netdevice has been deactivated.

And we do not want skbs left in the old qdisc when the netdevice is
deactivated, right?

As I replied to Eric, maybe the existing synchronize_net() in
dev_deactivate_many() can be reused to order the qdisc assignment and the
qdisc reset?
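
The ordering I have in mind can be modelled in user space as below. This is
only a single-threaded sketch under my own assumptions: the model_* types
and functions are hypothetical, and model_wait_for_senders() stands in for a
grace period by letting a sender that already loaded the old qdisc pointer
finish its enqueue. It does not show the real dev_deactivate_many() code,
and whether the existing synchronize_net() there can really be reused is
still the open question.

```c
#include <stddef.h>

/* Hypothetical qdisc: only counts queued skbs. */
struct model_qdisc {
	unsigned int qlen;
};

/* Hypothetical TX queue holding the current qdisc pointer. */
struct model_txq {
	struct model_qdisc *qdisc;
};

/* A sender that loaded the qdisc pointer but has not enqueued yet. */
struct model_sender {
	struct model_qdisc *q;
	int pending;
};

static void model_sender_begin(struct model_sender *s, struct model_txq *txq)
{
	s->q = txq->qdisc;	/* pointer loaded inside the read-side section */
	s->pending = 1;
}

/* Models a grace period: senders that began earlier run to completion. */
static void model_wait_for_senders(struct model_sender *s)
{
	if (s->pending) {
		s->q->qlen++;	/* the delayed enqueue lands on the old qdisc */
		s->pending = 0;
	}
}

static void model_qdisc_reset(struct model_qdisc *q)
{
	q->qlen = 0;
}

/* Returns how many skbs are left in the old qdisc after deactivation. */
static unsigned int model_deactivate(int wait_before_reset)
{
	struct model_qdisc old = { 0 }, noop = { 0 };
	struct model_txq txq = { &old };
	struct model_sender s = { NULL, 0 };

	model_sender_begin(&s, &txq);	/* sender still sees the old qdisc */
	txq.qdisc = &noop;		/* later senders see the noop qdisc */

	if (wait_before_reset) {
		model_wait_for_senders(&s);
		model_qdisc_reset(&old);
	} else {
		model_qdisc_reset(&old);
		model_wait_for_senders(&s);
	}
	return old.qlen;
}
```

In this model, model_deactivate(0) (reset first) leaves one skb in the old
qdisc, while model_deactivate(1) (wait, then reset) leaves none.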

> 
> Thanks.
> .
> 
