From: Yunsheng Lin <linyunsheng@huawei.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: David Miller <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Russell King - ARM Linux <linux@armlinux.org.uk>,
	Marcin Wojtas <mw@semihalf.com>, <linuxarm@openeuler.org>,
	<yisen.zhuang@huawei.com>, "Salil Mehta" <salil.mehta@huawei.com>,
	<thomas.petazzoni@bootlin.com>, <hawk@kernel.org>,
	Ilias Apalodimas <ilias.apalodimas@linaro.org>,
	"Alexei Starovoitov" <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"John Fastabend" <john.fastabend@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	"Will Deacon" <will@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	"Vlastimil Babka" <vbabka@suse.cz>, <fenghua.yu@intel.com>,
	<guro@fb.com>, Peter Xu <peterx@redhat.com>,
	Feng Tang <feng.tang@intel.com>, Jason Gunthorpe <jgg@ziepe.ca>,
	Matteo Croce <mcroce@microsoft.com>,
	Hugh Dickins <hughd@google.com>,
	Jonathan Lemon <jonathan.lemon@gmail.com>,
	"Alexander Lobakin" <alobakin@pm.me>,
	Willem de Bruijn <willemb@google.com>, <wenxu@ucloud.cn>,
	Cong Wang <cong.wang@bytedance.com>,
	Kevin Hao <haokexin@gmail.com>, <nogikh@google.com>,
	Marco Elver <elver@google.com>, Netdev <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>, bpf <bpf@vger.kernel.org>
Subject: Re: [PATCH rfc v2 3/5] page_pool: add page recycling support based on elevated refcnt
Date: Mon, 12 Jul 2021 10:06:01 +0800
Message-ID: <d45619d1-5235-9dd4-77aa-497fa1115c80@huawei.com>
In-Reply-To: <CAKgT0Udnb_KgHyWiwxDF+r+DkytgZd4CQJz4QR85JpinhZAJzw@mail.gmail.com>

On 2021/7/11 1:31, Alexander Duyck wrote:
> On Sat, Jul 10, 2021 at 12:44 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
> <snip>
>> @@ -419,6 +471,20 @@ static __always_inline struct page *
>>  __page_pool_put_page(struct page_pool *pool, struct page *page,
>>                      unsigned int dma_sync_size, bool allow_direct)
>>  {
>> +       int bias = page_pool_get_pagecnt_bias(page);
>> +
>> +       /* Handle the elevated refcnt case first */
>> +       if (bias) {
>> +               /* It is not the last user yet */
>> +               if (!page_pool_bias_page_recyclable(page, bias))
>> +                       return NULL;
>> +
>> +               if (likely(!page_is_pfmemalloc(page)))
>> +                       goto recyclable;
>> +               else
>> +                       goto unrecyclable;
>> +       }
>> +
> 
> So this part is still broken. Anything that takes a reference to the
> page and holds it while this is called will cause it to break. For
> example with the recent fixes we put in place all it would take is a
> skb_clone followed by pskb_expand_head and this starts leaking memory.

OK, it seems the fix conflicts with the expectation this patch is based
on, which is "the last user will always call page_pool_put_full_page()
in order to do the recycling, or to do the resource cleanup (DMA
unmapping, etc.) and freeing".

Is that because the user of the new skb after skb_clone() and
pskb_expand_head() is not aware that its frag page may still be in the
page pool after the fix?
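
To make the failure mode concrete, here is a user-space toy model of it.
It is only a sketch: struct toy_page and every toy_* function are
hypothetical stand-ins, not the code from this patch, and skb_clone()
followed by pskb_expand_head() is reduced to "someone takes one plain
page reference that the pool does not know about".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page; not the kernel's layout. */
struct toy_page {
	int refcount;	/* models page->_refcount */
	int bias;	/* references held up front by the pool */
	bool recycled;	/* set when the pool takes the page back */
	bool freed;	/* set when the refcount reaches zero */
};

/* The pool hands the page to one user and keeps 'bias' extra references. */
static void toy_pool_alloc(struct toy_page *p, int bias)
{
	assert(bias > 0);
	p->refcount = 1 + bias;
	p->bias = bias;
	p->recycled = false;
	p->freed = false;
}

/* Models get_page(): a reference the pool knows nothing about. */
static void toy_get_page(struct toy_page *p)
{
	p->refcount++;
}

/* Models put_page(): frees only when the count reaches zero. */
static void toy_put_page(struct toy_page *p)
{
	if (--p->refcount == 0)
		p->freed = true;
}

/* Models the elevated-refcnt branch: recycle only if just the bias is left. */
static bool toy_pool_put(struct toy_page *p)
{
	if (--p->refcount != p->bias)
		return false;	/* "not the last user yet" */
	p->recycled = true;
	return true;
}

/* skb_clone() + pskb_expand_head() reduced to one extra plain reference. */
static struct toy_page toy_leak_scenario(void)
{
	struct toy_page p;

	toy_pool_alloc(&p, 3);
	toy_get_page(&p);	/* new skb head takes a plain page reference */
	toy_pool_put(&p);	/* old skb freed via the pool: not the last user */
	toy_put_page(&p);	/* new skb freed via put_page(): stops at bias */
	return p;
}
```

In this model the page ends with its refcount equal to the bias, neither
recycled nor freed, because nobody is left to deduct the bias.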

> 
> One of the key bits in order for pagecnt_bias to work is that you have
> to deduct the bias once there are no more parties using it. Otherwise
> you leave the reference count artificially inflated and the page will
> never be freed. It works fine for the single producer single consumer
> case but once you introduce multiple consumers this is going to fall
> apart.

It seems we have a different understanding of "consumer". I treat the
user of the new skb after skb_clone() and pskb_expand_head() above as a
consumer of the page pool too, so the new skb should keep the recycle
bit in order for that to happen.

If the semantic is "a new user of a page should not be handled by the
page pool if the page pool is not aware of that user (the user was added
by calling the page allocator API instead of the page pool API, as with
skb_clone() and pskb_expand_head() above)", I suppose I am OK with that
too, as long as the people involved agree on it.

Also, it seems _refcount and dma_addr in "struct page" are in the same
cache line, which means there is already cache line bouncing between
_refcount and dma_addr updates. So it may make sense to use only the bias
to indicate the number of page pool users of a page, instead of using
"bias - page_ref_count", as page_ref_count is not reliable if somebody is
using the page allocator API directly.
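
Roughly, the bias-only idea could look like the user-space sketch below.
All names (toy_pp_page, pp_users, toy_pp_get(), toy_pp_put()) are
hypothetical, and two things are assumptions of mine rather than part of
the patch: the pool itself holds exactly one ordinary page reference, and
the refcount is only consulted once the last page pool user is gone, to
choose between recycling and releasing the page.

```c
#include <assert.h>

enum toy_pp_result {
	TOY_PP_IN_USE,	/* other page pool users remain */
	TOY_PP_RECYCLE,	/* last pool user, no outside reference held */
	TOY_PP_RELEASE,	/* last pool user, outside reference still held */
};

struct toy_pp_page {
	int refcount;	/* models page->_refcount; the pool holds one */
	int pp_users;	/* the "bias": users that came via the page pool */
};

static void toy_pp_init(struct toy_pp_page *p)
{
	p->refcount = 1;
	p->pp_users = 0;
}

/* A user obtained through the page pool API: only the bias changes. */
static void toy_pp_get(struct toy_pp_page *p)
{
	p->pp_users++;
}

/* A user added through the page allocator API, e.g. get_page(). */
static void toy_get_page(struct toy_pp_page *p)
{
	p->refcount++;
}

/* Last-user detection depends on pp_users alone, never on refcount. */
static enum toy_pp_result toy_pp_put(struct toy_pp_page *p)
{
	assert(p->pp_users > 0);
	if (--p->pp_users > 0)
		return TOY_PP_IN_USE;
	/* Last pool user: the refcount only decides recycle vs. release. */
	if (p->refcount == 1)
		return TOY_PP_RECYCLE;
	p->refcount--;	/* pool gives up its own reference */
	return TOY_PP_RELEASE;
}
```

With this split, an outside get_page() can no longer hide the last page
pool user; it only turns the final put from a recycle into a release.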

And the tricky part seems to be how to make the bias atomic for allocating
and freeing.
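
One straightforward (if not the cheapest) option is a plain atomic user
count. The sketch below uses C11 atomics in user space; struct toy_bias
and the toy_bias_* helpers are hypothetical names, and in the kernel this
would presumably map onto atomic_t with something like
atomic_dec_and_test().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_bias {
	atomic_int users;	/* number of page pool users of one page */
};

static void toy_bias_init(struct toy_bias *b, int users)
{
	atomic_init(&b->users, users);
}

/* Allocation side: hand the page to one more page pool user. */
static void toy_bias_get(struct toy_bias *b)
{
	atomic_fetch_add_explicit(&b->users, 1, memory_order_relaxed);
}

/* Freeing side: returns true for exactly one caller, the last user. */
static bool toy_bias_put(struct toy_bias *b)
{
	int old = atomic_fetch_sub_explicit(&b->users, 1,
					    memory_order_acq_rel);

	assert(old > 0);
	return old == 1;
}
```

The obvious cost is one atomic read-modify-write per allocation and per
free, which is the overhead a pagecnt_bias scheme normally tries to avoid.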

Any better idea?

> .
> 

