Netdev Archive on lore.kernel.org
* [PATCH net] page_pool: mask the page->signature before the checking
From: Yunsheng Lin @ 2021-08-05 1:06 UTC
To: davem, kuba
Cc: hawk, ilias.apalodimas, mcroce, willy, alexander.duyck, netdev, linux-kernel, linuxarm, chenhao288

As mentioned in commit c07aea3ef4d4 ("mm: add a signature
in struct page"):
"The page->signature field is aliased to page->lru.next and
page->compound_head."

And as the comment in page_is_pfmemalloc():
"lru.next has bit 1 set if the page is allocated from the
pfmemalloc reserves. Callers may simply overwrite it if they
do not need to preserve that information."

The page->signature is or’ed with PP_SIGNATURE when a page is
allocated in page pool, see __page_pool_alloc_pages_slow(),
and page->signature is checked directly with PP_SIGNATURE in
page_pool_return_skb_page(), which might cause a resource leak
for a page from page pool if bit 1 of lru.next is set for
a pfmemalloc page.

As bit 0 is page->compound_head, mask both bit 0 and 1 before
the check in page_pool_return_skb_page().

Fixes: 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 net/core/page_pool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 5e4eb45..33b7dd7 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -634,7 +634,7 @@ bool page_pool_return_skb_page(struct page *page)
 	struct page_pool *pp;
 
 	page = compound_head(page);
-	if (unlikely(page->pp_magic != PP_SIGNATURE))
+	if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE))
 		return false;
 
 	pp = page->pp;
-- 
2.7.4
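To make the bit arithmetic in the patch concrete, here is a minimal userspace sketch. It is not kernel code: DEMO_PP_SIGNATURE (0x40) is only an illustrative stand-in for PP_SIGNATURE, DEMO_PFMEMALLOC models bit 1 of lru.next as described in the page_is_pfmemalloc() comment quoted above, and the demo_* helpers are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative constants: assumptions for this sketch, not kernel values. */
#define DEMO_PP_SIGNATURE 0x40UL /* stands in for PP_SIGNATURE */
#define DEMO_PFMEMALLOC   0x2UL  /* bit 1 of lru.next (pfmemalloc) */
#define DEMO_LOW_BITS     0x3UL  /* bit 0 (compound_head) and bit 1 */

/* Allocation side: OR the signature in, preserving any existing low bits. */
static unsigned long demo_set_magic(unsigned long lru_next_bits)
{
	/* Masking only works if the signature does not use the low bits. */
	assert((DEMO_PP_SIGNATURE & DEMO_LOW_BITS) == 0);
	return lru_next_bits | DEMO_PP_SIGNATURE;
}

/* The check before the patch: an exact comparison. */
static bool demo_check_unmasked(unsigned long pp_magic)
{
	return pp_magic == DEMO_PP_SIGNATURE;
}

/* The check after the patch: ignore bit 0 and bit 1 before comparing. */
static bool demo_check_masked(unsigned long pp_magic)
{
	return (pp_magic & ~DEMO_LOW_BITS) == DEMO_PP_SIGNATURE;
}
```

With these helpers, demo_check_unmasked(demo_set_magic(DEMO_PFMEMALLOC)) is false while demo_check_masked() on the same value is true, which is the difference the one-line change makes.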
* Re: [PATCH net] page_pool: mask the page->signature before the checking
From: Matthew Wilcox @ 2021-08-05 1:50 UTC
To: Yunsheng Lin
Cc: davem, kuba, hawk, ilias.apalodimas, mcroce, alexander.duyck, netdev, linux-kernel, linuxarm, chenhao288

On Thu, Aug 05, 2021 at 09:06:57AM +0800, Yunsheng Lin wrote:
> As mentioned in commit c07aea3ef4d4 ("mm: add a signature
> in struct page"):
> "The page->signature field is aliased to page->lru.next and
> page->compound_head."
>
> And as the comment in page_is_pfmemalloc():
> "lru.next has bit 1 set if the page is allocated from the
> pfmemalloc reserves. Callers may simply overwrite it if they
> do not need to preserve that information."
>
> The page->signature is or’ed with PP_SIGNATURE when a page is
> allocated in page pool, see __page_pool_alloc_pages_slow(),
> and page->signature is checked directly with PP_SIGNATURE in
> page_pool_return_skb_page(), which might cause a resource leak
> for a page from page pool if bit 1 of lru.next is set for
> a pfmemalloc page.
>
> As bit 0 is page->compound_head, mask both bit 0 and 1 before
> the check in page_pool_return_skb_page().

No, you don't understand. We *want* the check to fail if we were low
on memory so we return the emergency allocation.
* Re: [PATCH net] page_pool: mask the page->signature before the checking
From: Yunsheng Lin @ 2021-08-05 2:14 UTC
To: Matthew Wilcox
Cc: davem, kuba, hawk, ilias.apalodimas, mcroce, alexander.duyck, netdev, linux-kernel, linuxarm, chenhao288

On 2021/8/5 9:50, Matthew Wilcox wrote:
> On Thu, Aug 05, 2021 at 09:06:57AM +0800, Yunsheng Lin wrote:
> > As mentioned in commit c07aea3ef4d4 ("mm: add a signature
> > in struct page"):
> > "The page->signature field is aliased to page->lru.next and
> > page->compound_head."
> >
> > And as the comment in page_is_pfmemalloc():
> > "lru.next has bit 1 set if the page is allocated from the
> > pfmemalloc reserves. Callers may simply overwrite it if they
> > do not need to preserve that information."
> >
> > The page->signature is or’ed with PP_SIGNATURE when a page is
> > allocated in page pool, see __page_pool_alloc_pages_slow(),
> > and page->signature is checked directly with PP_SIGNATURE in
> > page_pool_return_skb_page(), which might cause a resource leak
> > for a page from page pool if bit 1 of lru.next is set for
> > a pfmemalloc page.
> >
> > As bit 0 is page->compound_head, mask both bit 0 and 1 before
> > the check in page_pool_return_skb_page().
>
> No, you don't understand. We *want* the check to fail if we were low
> on memory so we return the emergency allocation.

If the check fails, the page pool assumes the page is not from the page
pool and skips the resource cleanup (such as DMA unmapping), even though
the page pool still uses the page with pfmemalloc set and DMA-maps it in
__page_pool_alloc_pages_slow() if pp_flags & PP_FLAG_DMA_MAP is true.

Returning the emergency allocation, as you mentioned, seems to be handled
in __page_pool_put_page(), see:

https://elixir.bootlin.com/linux/latest/source/net/core/page_pool.c#L411

We use the page with pfmemalloc set only one time and do the resource
cleanup before returning the page to the page allocator. Or did I miss
something here?
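The argument above can be sketched as a small userspace model. This is a simplification under stated assumptions, not the kernel implementation: struct demo_page, the demo_* functions and the constants are hypothetical, the pfmemalloc state is read from the low bits of the modeled pp_magic because the fields alias, and the refcount condition of the real put path is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_PP_SIGNATURE 0x40UL /* illustrative stand-in for PP_SIGNATURE */
#define DEMO_PFMEMALLOC   0x2UL  /* bit 1 of lru.next (pfmemalloc) */

/* Hypothetical model of the state that matters in this discussion. */
struct demo_page {
	unsigned long pp_magic;
	bool dma_mapped; /* pool set up a DMA mapping at allocation */
	bool recycled;   /* page went back into the pool */
	bool freed;      /* page went back to the page allocator */
};

/* Allocation: signature OR'ed in, page DMA-mapped by the pool. */
static struct demo_page demo_alloc(bool from_reserves)
{
	struct demo_page p = {
		.pp_magic = (from_reserves ? DEMO_PFMEMALLOC : 0UL) | DEMO_PP_SIGNATURE,
		.dma_mapped = true,
	};
	return p;
}

/* Pool put path: recycle normal pages, unmap and release pfmemalloc ones. */
static void demo_put(struct demo_page *p)
{
	if (!(p->pp_magic & DEMO_PFMEMALLOC)) {
		p->recycled = true;
		return;
	}
	p->dma_mapped = false;
	p->freed = true;
}

/* skb free path: hand the page to the pool only if the signature matches. */
static void demo_skb_free(struct demo_page *p, bool mask_low_bits)
{
	unsigned long magic = mask_low_bits ? (p->pp_magic & ~0x3UL) : p->pp_magic;

	assert(!p->freed);
	if (magic != DEMO_PP_SIGNATURE) {
		/* Treated as a non-pool page: freed without any DMA unmapping. */
		p->freed = true;
		return;
	}
	demo_put(p);
}
```

In this model a pfmemalloc page freed with mask_low_bits set to false ends up freed but still DMA-mapped, which is the stale mapping being described, while the masked check sends it through demo_put(), which unmaps and releases it without recycling.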
* Re: [PATCH net] page_pool: mask the page->signature before the checking
From: Ilias Apalodimas @ 2021-08-05 8:54 UTC
To: Yunsheng Lin
Cc: Matthew Wilcox, davem, kuba, hawk, mcroce, alexander.duyck, netdev, linux-kernel, linuxarm, chenhao288

On Thu, Aug 05, 2021 at 10:14:39AM +0800, Yunsheng Lin wrote:
> On 2021/8/5 9:50, Matthew Wilcox wrote:
> > On Thu, Aug 05, 2021 at 09:06:57AM +0800, Yunsheng Lin wrote:
> > > As mentioned in commit c07aea3ef4d4 ("mm: add a signature
> > > in struct page"):
> > > "The page->signature field is aliased to page->lru.next and
> > > page->compound_head."
> > >
> > > And as the comment in page_is_pfmemalloc():
> > > "lru.next has bit 1 set if the page is allocated from the
> > > pfmemalloc reserves. Callers may simply overwrite it if they
> > > do not need to preserve that information."
> > >
> > > The page->signature is or’ed with PP_SIGNATURE when a page is
> > > allocated in page pool, see __page_pool_alloc_pages_slow(),
> > > and page->signature is checked directly with PP_SIGNATURE in
> > > page_pool_return_skb_page(), which might cause a resource leak
> > > for a page from page pool if bit 1 of lru.next is set for
> > > a pfmemalloc page.
> > >
> > > As bit 0 is page->compound_head, mask both bit 0 and 1 before
> > > the check in page_pool_return_skb_page().
> >
> > No, you don't understand. We *want* the check to fail if we were low
> > on memory so we return the emergency allocation.
>
> If the check fails, the page pool assumes the page is not from the page
> pool and skips the resource cleanup (such as DMA unmapping), even though
> the page pool still uses the page with pfmemalloc set and DMA-maps it in
> __page_pool_alloc_pages_slow() if pp_flags & PP_FLAG_DMA_MAP is true.
>
> Returning the emergency allocation, as you mentioned, seems to be handled
> in __page_pool_put_page(), see:
>
> https://elixir.bootlin.com/linux/latest/source/net/core/page_pool.c#L411
>
> We use the page with pfmemalloc set only one time and do the resource
> cleanup before returning the page to the page allocator. Or did I miss
> something here?

I think you are right here. What happens is that PP_SIGNATURE is OR'ed
into page->signature after the allocation in order to preserve any
existing bits. When those bits are present, though, the 'if' that triggers
the recycling fails and the DMA mappings are left stale.

If we mask the bits during the check (as your patch does), we'll end up not
recycling the page anyway since it has the pfmemalloc bit set. The page
pool recycle function will end up releasing the page and the DMA mappings,
right?

Regards
/Ilias
* Re: [PATCH net] page_pool: mask the page->signature before the checking
From: Yunsheng Lin @ 2021-08-05 9:31 UTC
To: Ilias Apalodimas
Cc: Matthew Wilcox, davem, kuba, hawk, mcroce, alexander.duyck, netdev, linux-kernel, linuxarm, chenhao288

On 2021/8/5 16:54, Ilias Apalodimas wrote:
> On Thu, Aug 05, 2021 at 10:14:39AM +0800, Yunsheng Lin wrote:
> > On 2021/8/5 9:50, Matthew Wilcox wrote:
> > > On Thu, Aug 05, 2021 at 09:06:57AM +0800, Yunsheng Lin wrote:
> > > > As mentioned in commit c07aea3ef4d4 ("mm: add a signature
> > > > in struct page"):
> > > > "The page->signature field is aliased to page->lru.next and
> > > > page->compound_head."
> > > >
> > > > And as the comment in page_is_pfmemalloc():
> > > > "lru.next has bit 1 set if the page is allocated from the
> > > > pfmemalloc reserves. Callers may simply overwrite it if they
> > > > do not need to preserve that information."
> > > >
> > > > The page->signature is or’ed with PP_SIGNATURE when a page is
> > > > allocated in page pool, see __page_pool_alloc_pages_slow(),
> > > > and page->signature is checked directly with PP_SIGNATURE in
> > > > page_pool_return_skb_page(), which might cause a resource leak
> > > > for a page from page pool if bit 1 of lru.next is set for
> > > > a pfmemalloc page.
> > > >
> > > > As bit 0 is page->compound_head, mask both bit 0 and 1 before
> > > > the check in page_pool_return_skb_page().
> > >
> > > No, you don't understand. We *want* the check to fail if we were low
> > > on memory so we return the emergency allocation.
> >
> > If the check fails, the page pool assumes the page is not from the page
> > pool and skips the resource cleanup (such as DMA unmapping), even though
> > the page pool still uses the page with pfmemalloc set and DMA-maps it in
> > __page_pool_alloc_pages_slow() if pp_flags & PP_FLAG_DMA_MAP is true.
> >
> > Returning the emergency allocation, as you mentioned, seems to be handled
> > in __page_pool_put_page(), see:
> >
> > https://elixir.bootlin.com/linux/latest/source/net/core/page_pool.c#L411
> >
> > We use the page with pfmemalloc set only one time and do the resource
> > cleanup before returning the page to the page allocator. Or did I miss
> > something here?
>
> I think you are right here. What happens is that PP_SIGNATURE is OR'ed
> into page->signature after the allocation in order to preserve any
> existing bits. When those bits are present, though, the 'if' that triggers
> the recycling fails and the DMA mappings are left stale.
>
> If we mask the bits during the check (as your patch does), we'll end up not
> recycling the page anyway since it has the pfmemalloc bit set. The page
> pool recycle function will end up releasing the page and the DMA mappings,
> right?

Yes.
The problem might be magnified once frag page support is added to page
pool, because the page pool holds only one ref of the page, and
page_pool_return_skb_page() might decrement the page ref twice if the frag
page has two users, supposing the above check fails for the pfmemalloc
page, leading to the log below:

[ 49.584990] BUG: Bad page state in process iperf pfn:20af242
[ 49.584992] page:(____ptrval____) refcount:-1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x20af242

>
> Regards
> /Ilias
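One way to read the refcount scenario above is the following userspace sketch. Everything in it is an assumption made for illustration: struct demo_frag_page, the frag_users counter and demo_frag_user_done() are hypothetical and only model the idea that the pool holds a single page reference while two frag users each release the page; the thread does not show the actual frag implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_PP_SIGNATURE 0x40UL /* illustrative stand-in for PP_SIGNATURE */
#define DEMO_PFMEMALLOC   0x2UL  /* bit 1 of lru.next (pfmemalloc) */

/* Hypothetical model of a page shared by several frag users. */
struct demo_frag_page {
	unsigned long pp_magic;
	int refcount;   /* page references; the pool holds exactly one */
	int frag_users; /* outstanding frag users (assumed bookkeeping) */
};

/* Called once per frag user when its skb is freed. */
static void demo_frag_user_done(struct demo_frag_page *p, bool mask_low_bits)
{
	unsigned long magic = mask_low_bits ? (p->pp_magic & ~0x3UL) : p->pp_magic;

	assert(p->frag_users > 0);
	p->frag_users--;

	if (magic != DEMO_PP_SIGNATURE) {
		/* Not recognized as a pool page: every user drops a page ref. */
		p->refcount--;
		return;
	}
	/* Recognized: the pool drops its single ref after the last user. */
	if (p->frag_users == 0)
		p->refcount--;
}
```

Starting from refcount 1 and two users on a pfmemalloc page, two calls with the unmasked check leave refcount at -1, matching the value in the log, while the masked check leaves it at 0.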
* Re: [PATCH net] page_pool: mask the page->signature before the checking
From: Ilias Apalodimas @ 2021-08-05 14:47 UTC
To: Yunsheng Lin
Cc: Matthew Wilcox, David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer, Matteo Croce, Alexander Duyck, Networking, open list, linuxarm, chenhao288

Right, mind sending a v2 with a comment explaining why we need to mask?

Other than that
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

On Thu, 5 Aug 2021 at 12:31, Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2021/8/5 16:54, Ilias Apalodimas wrote:
> > On Thu, Aug 05, 2021 at 10:14:39AM +0800, Yunsheng Lin wrote:
> > > On 2021/8/5 9:50, Matthew Wilcox wrote:
> > > > On Thu, Aug 05, 2021 at 09:06:57AM +0800, Yunsheng Lin wrote:
> > > > > As mentioned in commit c07aea3ef4d4 ("mm: add a signature
> > > > > in struct page"):
> > > > > "The page->signature field is aliased to page->lru.next and
> > > > > page->compound_head."
> > > > >
> > > > > And as the comment in page_is_pfmemalloc():
> > > > > "lru.next has bit 1 set if the page is allocated from the
> > > > > pfmemalloc reserves. Callers may simply overwrite it if they
> > > > > do not need to preserve that information."
> > > > >
> > > > > The page->signature is or’ed with PP_SIGNATURE when a page is
> > > > > allocated in page pool, see __page_pool_alloc_pages_slow(),
> > > > > and page->signature is checked directly with PP_SIGNATURE in
> > > > > page_pool_return_skb_page(), which might cause a resource leak
> > > > > for a page from page pool if bit 1 of lru.next is set for
> > > > > a pfmemalloc page.
> > > > >
> > > > > As bit 0 is page->compound_head, mask both bit 0 and 1 before
> > > > > the check in page_pool_return_skb_page().
> > > >
> > > > No, you don't understand. We *want* the check to fail if we were low
> > > > on memory so we return the emergency allocation.
> > >
> > > If the check fails, the page pool assumes the page is not from the page
> > > pool and skips the resource cleanup (such as DMA unmapping), even though
> > > the page pool still uses the page with pfmemalloc set and DMA-maps it in
> > > __page_pool_alloc_pages_slow() if pp_flags & PP_FLAG_DMA_MAP is true.
> > >
> > > Returning the emergency allocation, as you mentioned, seems to be handled
> > > in __page_pool_put_page(), see:
> > >
> > > https://elixir.bootlin.com/linux/latest/source/net/core/page_pool.c#L411
> > >
> > > We use the page with pfmemalloc set only one time and do the resource
> > > cleanup before returning the page to the page allocator. Or did I miss
> > > something here?
> >
> > I think you are right here. What happens is that PP_SIGNATURE is OR'ed
> > into page->signature after the allocation in order to preserve any
> > existing bits. When those bits are present, though, the 'if' that triggers
> > the recycling fails and the DMA mappings are left stale.
> >
> > If we mask the bits during the check (as your patch does), we'll end up not
> > recycling the page anyway since it has the pfmemalloc bit set. The page
> > pool recycle function will end up releasing the page and the DMA mappings,
> > right?
>
> Yes.
> The problem might be magnified once frag page support is added to page
> pool, because the page pool holds only one ref of the page, and
> page_pool_return_skb_page() might decrement the page ref twice if the frag
> page has two users, supposing the above check fails for the pfmemalloc
> page, leading to the log below:
>
> [ 49.584990] BUG: Bad page state in process iperf pfn:20af242
> [ 49.584992] page:(____ptrval____) refcount:-1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x20af242
>
> >
> > Regards
> > /Ilias
Thread overview: 6+ messages
2021-08-05  1:06 [PATCH net] page_pool: mask the page->signature before the checking Yunsheng Lin
2021-08-05  1:50 ` Matthew Wilcox
2021-08-05  2:14   ` Yunsheng Lin
2021-08-05  8:54     ` Ilias Apalodimas
2021-08-05  9:31       ` Yunsheng Lin
2021-08-05 14:47         ` Ilias Apalodimas