Netdev Archive on lore.kernel.org
* [PATCH net-next v2 0/4] add frag page support in page pool
@ 2021-08-06  2:46 Yunsheng Lin
  2021-08-06  2:46 ` [PATCH net-next v2 1/4] page_pool: keep pp info as long as page pool owns the page Yunsheng Lin
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Yunsheng Lin @ 2021-08-06  2:46 UTC (permalink / raw)
  To: davem, kuba
  Cc: alexander.duyck, linux, mw, linuxarm, yisen.zhuang, salil.mehta,
	thomas.petazzoni, hawk, ilias.apalodimas, ast, daniel,
	john.fastabend, akpm, peterz, will, willy, vbabka, fenghua.yu,
	guro, peterx, feng.tang, jgg, mcroce, hughd, jonathan.lemon,
	alobakin, willemb, wenxu, cong.wang, haokexin, nogikh, elver,
	yhs, kpsingh, andrii, kafai, songliubraving, netdev,
	linux-kernel, bpf, chenhao288

This patchset adds frag page support to the page pool and
enables skb page frag recycling based on the page pool in
the hns3 driver.

V2:
1. resend based on the latest net-next.

V1:
1. avoid atomic_long_read() in case of freeing or draining
   page frag, and drop RFC tag.

RFC v6:
1. Disable frag page support on systems with a 32-bit arch
   and 64-bit DMA.

RFC v5:
1. Rename dma_addr[0] to pp_frag_count and adjust the code
   accordingly.

RFC v4:
1. Use the dma_addr[1] to store bias.
2. Default to a pagecnt_bias of PAGE_SIZE - 1.
3. Other minor changes suggested by Alexander.

RFC v3:
1. Implement the semantics of "page recycling only waits for the
   page pool user instead of all users of a page".
2. Support the frag allocation of different sizes
3. Merge patch 4 & 5 to one patch as it does not make sense to
   use page_pool_dev_alloc_pages() API directly with elevated
   refcnt.
4. Other minor changes suggested by Alexander.

RFC v2:
1. Split patch 1 to make it more reviewable.
2. Repurpose the lower 12 bits of the dma address to store the
   pagecnt_bias as suggested by Alexander.
3. Support recycling to pool->alloc for the elevated refcnt
   case too.

Yunsheng Lin (4):
  page_pool: keep pp info as long as page pool owns the page
  page_pool: add interface to manipulate frag count in page pool
  page_pool: add frag page recycling support in page pool
  net: hns3: support skb's frag page recycling based on page pool

 drivers/net/ethernet/hisilicon/Kconfig          |   1 +
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c |  79 +++++++++++++++--
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.h |   3 +
 drivers/net/ethernet/marvell/mvneta.c           |   6 +-
 drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c |   2 +-
 drivers/net/ethernet/ti/cpsw.c                  |   2 +-
 drivers/net/ethernet/ti/cpsw_new.c              |   2 +-
 include/linux/mm_types.h                        |  18 ++--
 include/linux/skbuff.h                          |   4 +-
 include/net/page_pool.h                         |  68 +++++++++++---
 net/core/page_pool.c                            | 112 +++++++++++++++++++++++-
 11 files changed, 258 insertions(+), 39 deletions(-)

-- 
2.7.4



* [PATCH net-next v2 1/4] page_pool: keep pp info as long as page pool owns the page
  2021-08-06  2:46 [PATCH net-next v2 0/4] add frag page support in page pool Yunsheng Lin
@ 2021-08-06  2:46 ` Yunsheng Lin
  2021-08-06  2:46 ` [PATCH net-next v2 2/4] page_pool: add interface to manipulate frag count in page pool Yunsheng Lin
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 19+ messages in thread
From: Yunsheng Lin @ 2021-08-06  2:46 UTC (permalink / raw)
  To: davem, kuba
  Cc: alexander.duyck, linux, mw, linuxarm, yisen.zhuang, salil.mehta,
	thomas.petazzoni, hawk, ilias.apalodimas, ast, daniel,
	john.fastabend, akpm, peterz, will, willy, vbabka, fenghua.yu,
	guro, peterx, feng.tang, jgg, mcroce, hughd, jonathan.lemon,
	alobakin, willemb, wenxu, cong.wang, haokexin, nogikh, elver,
	yhs, kpsingh, andrii, kafai, songliubraving, netdev,
	linux-kernel, bpf, chenhao288

Currently, page->pp is cleared and set every time the page
is recycled, which is unnecessary.

So only set the page->pp when the page is added to the page
pool and only clear it when the page is released from the
page pool.

This is also preparation for supporting frag page allocation
in the page pool.
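
The ownership rule described above can be illustrated with a small
user-space model. This is only a sketch: the `*_model` types and the
`PP_SIGNATURE_MODEL` value are hypothetical stand-ins, not the kernel
definitions. It shows the pool info being set once when the pool takes
the page and cleared once when the page is released, with nothing
touched in between.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's PP_SIGNATURE value. */
#define PP_SIGNATURE_MODEL 0x40UL

struct pool_model { int id; };

struct page_model {
	unsigned long pp_magic;
	struct pool_model *pp;
};

/* Called once, when the pool first takes ownership of the page. */
static void model_set_pp_info(struct pool_model *pool, struct page_model *page)
{
	page->pp = pool;
	page->pp_magic |= PP_SIGNATURE_MODEL;
}

/* Called once, when the page leaves the pool for good. */
static void model_clear_pp_info(struct page_model *page)
{
	page->pp_magic = 0;
	page->pp = NULL;
}

/* Recycling only needs to test ownership; it never rewrites it. */
static int model_page_is_pool_owned(const struct page_model *page)
{
	return (page->pp_magic & PP_SIGNATURE_MODEL) && page->pp != NULL;
}
```

In this model a recycle path would call only `model_page_is_pool_owned()`,
which is why the per-skb `page_pool_store_mem_info()` call is no longer
needed by the drivers in the diff below.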

Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 drivers/net/ethernet/marvell/mvneta.c           |  6 +-----
 drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c |  2 +-
 drivers/net/ethernet/ti/cpsw.c                  |  2 +-
 drivers/net/ethernet/ti/cpsw_new.c              |  2 +-
 include/linux/skbuff.h                          |  4 +---
 include/net/page_pool.h                         |  7 -------
 net/core/page_pool.c                            | 21 +++++++++++++++++----
 7 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index ff8db31..5d1007e 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2327,7 +2327,7 @@ mvneta_swbm_build_skb(struct mvneta_port *pp, struct page_pool *pool,
 	if (!skb)
 		return ERR_PTR(-ENOMEM);
 
-	skb_mark_for_recycle(skb, virt_to_page(xdp->data), pool);
+	skb_mark_for_recycle(skb);
 
 	skb_reserve(skb, xdp->data - xdp->data_hard_start);
 	skb_put(skb, xdp->data_end - xdp->data);
@@ -2339,10 +2339,6 @@ mvneta_swbm_build_skb(struct mvneta_port *pp, struct page_pool *pool,
 		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
 				skb_frag_page(frag), skb_frag_off(frag),
 				skb_frag_size(frag), PAGE_SIZE);
-		/* We don't need to reset pp_recycle here. It's already set, so
-		 * just mark fragments for recycling.
-		 */
-		page_pool_store_mem_info(skb_frag_page(frag), pool);
 	}
 
 	return skb;
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index 99bd8b8..744f58f 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -3995,7 +3995,7 @@ static int mvpp2_rx(struct mvpp2_port *port, struct napi_struct *napi,
 		}
 
 		if (pp)
-			skb_mark_for_recycle(skb, page, pp);
+			skb_mark_for_recycle(skb);
 		else
 			dma_unmap_single_attrs(dev->dev.parent, dma_addr,
 					       bm_pool->buf_size, DMA_FROM_DEVICE,
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index abf9a2a..c451eaa 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -431,7 +431,7 @@ static void cpsw_rx_handler(void *token, int len, int status)
 	skb->protocol = eth_type_trans(skb, ndev);
 
 	/* mark skb for recycling */
-	skb_mark_for_recycle(skb, page, pool);
+	skb_mark_for_recycle(skb);
 	netif_receive_skb(skb);
 
 	ndev->stats.rx_bytes += len;
diff --git a/drivers/net/ethernet/ti/cpsw_new.c b/drivers/net/ethernet/ti/cpsw_new.c
index ae16722..d197623 100644
--- a/drivers/net/ethernet/ti/cpsw_new.c
+++ b/drivers/net/ethernet/ti/cpsw_new.c
@@ -375,7 +375,7 @@ static void cpsw_rx_handler(void *token, int len, int status)
 	skb->protocol = eth_type_trans(skb, ndev);
 
 	/* mark skb for recycling */
-	skb_mark_for_recycle(skb, page, pool);
+	skb_mark_for_recycle(skb);
 	netif_receive_skb(skb);
 
 	ndev->stats.rx_bytes += len;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 783cc23..6bdb0db 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4712,11 +4712,9 @@ static inline u64 skb_get_kcov_handle(struct sk_buff *skb)
 }
 
 #ifdef CONFIG_PAGE_POOL
-static inline void skb_mark_for_recycle(struct sk_buff *skb, struct page *page,
-					struct page_pool *pp)
+static inline void skb_mark_for_recycle(struct sk_buff *skb)
 {
 	skb->pp_recycle = 1;
-	page_pool_store_mem_info(page, pp);
 }
 #endif
 
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 3dd62dd..8d7744d 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -253,11 +253,4 @@ static inline void page_pool_ring_unlock(struct page_pool *pool)
 		spin_unlock_bh(&pool->ring.producer_lock);
 }
 
-/* Store mem_info on struct page and use it while recycling skb frags */
-static inline
-void page_pool_store_mem_info(struct page *page, struct page_pool *pp)
-{
-	page->pp = pp;
-}
-
 #endif /* _NET_PAGE_POOL_H */
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 5e4eb45..78838c6 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -206,6 +206,19 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page)
 	return true;
 }
 
+static void page_pool_set_pp_info(struct page_pool *pool,
+				  struct page *page)
+{
+	page->pp = pool;
+	page->pp_magic |= PP_SIGNATURE;
+}
+
+static void page_pool_clear_pp_info(struct page *page)
+{
+	page->pp_magic = 0;
+	page->pp = NULL;
+}
+
 static struct page *__page_pool_alloc_page_order(struct page_pool *pool,
 						 gfp_t gfp)
 {
@@ -222,7 +235,7 @@ static struct page *__page_pool_alloc_page_order(struct page_pool *pool,
 		return NULL;
 	}
 
-	page->pp_magic |= PP_SIGNATURE;
+	page_pool_set_pp_info(pool, page);
 
 	/* Track how many pages are held 'in-flight' */
 	pool->pages_state_hold_cnt++;
@@ -266,7 +279,8 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
 			put_page(page);
 			continue;
 		}
-		page->pp_magic |= PP_SIGNATURE;
+
+		page_pool_set_pp_info(pool, page);
 		pool->alloc.cache[pool->alloc.count++] = page;
 		/* Track how many pages are held 'in-flight' */
 		pool->pages_state_hold_cnt++;
@@ -345,7 +359,7 @@ void page_pool_release_page(struct page_pool *pool, struct page *page)
 			     DMA_ATTR_SKIP_CPU_SYNC);
 	page_pool_set_dma_addr(page, 0);
 skip_dma_unmap:
-	page->pp_magic = 0;
+	page_pool_clear_pp_info(page);
 
 	/* This may be the last page returned, releasing the pool, so
 	 * it is not safe to reference pool afterwards.
@@ -644,7 +658,6 @@ bool page_pool_return_skb_page(struct page *page)
 	 * The page will be returned to the pool here regardless of the
 	 * 'flipped' fragment being in use or not.
 	 */
-	page->pp = NULL;
 	page_pool_put_full_page(pp, page, false);
 
 	return true;
-- 
2.7.4



* [PATCH net-next v2 2/4] page_pool: add interface to manipulate frag count in page pool
  2021-08-06  2:46 [PATCH net-next v2 0/4] add frag page support in page pool Yunsheng Lin
  2021-08-06  2:46 ` [PATCH net-next v2 1/4] page_pool: keep pp info as long as page pool owns the page Yunsheng Lin
@ 2021-08-06  2:46 ` Yunsheng Lin
  2021-08-10 14:58   ` Jesper Dangaard Brouer
  2021-08-12 15:17   ` Jesper Dangaard Brouer
  2021-08-06  2:46 ` [PATCH net-next v2 3/4] page_pool: add frag page recycling support " Yunsheng Lin
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 19+ messages in thread
From: Yunsheng Lin @ 2021-08-06  2:46 UTC (permalink / raw)
  To: davem, kuba
  Cc: alexander.duyck, linux, mw, linuxarm, yisen.zhuang, salil.mehta,
	thomas.petazzoni, hawk, ilias.apalodimas, ast, daniel,
	john.fastabend, akpm, peterz, will, willy, vbabka, fenghua.yu,
	guro, peterx, feng.tang, jgg, mcroce, hughd, jonathan.lemon,
	alobakin, willemb, wenxu, cong.wang, haokexin, nogikh, elver,
	yhs, kpsingh, andrii, kafai, songliubraving, netdev,
	linux-kernel, bpf, chenhao288

On 32-bit systems with 64-bit DMA, dma_addr[1] is used to
store the upper 32 bits of the DMA address; such systems
should be rare these days.

On other systems, dma_addr[1] in 'struct page' is not used,
so we can reuse dma_addr[1] to store the frag count, i.e.
how many frags this page might be split into.

In order to simplify the page frag support in the page pool,
the PAGE_POOL_DMA_USE_PP_FRAG_COUNT macro is added to identify
32-bit systems with 64-bit DMA, and the page frag support
in the page pool is disabled for such systems.
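
As a rough user-space illustration of why the second word is occupied
on such systems, the sketch below models a 32-bit 'unsigned long'
with a 64-bit DMA address. The `model_*` names and typedefs are
assumptions made for this example only; the kernel uses `dma_addr_t`,
`unsigned long` and `upper_32_bits()`.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t model_dma_addr_t; /* assumed 64-bit DMA address */
typedef uint32_t model_ulong_t;    /* assumed 32-bit 'unsigned long' */

/* True when one word cannot hold a whole DMA address. */
#define MODEL_DMA_USE_PP_FRAG_COUNT \
	(sizeof(model_dma_addr_t) > sizeof(model_ulong_t))

struct page_model {
	model_ulong_t dma_addr;       /* lower bits of the address */
	model_ulong_t dma_addr_upper; /* upper bits, only if needed */
};

static void model_set_dma_addr(struct page_model *page, model_dma_addr_t addr)
{
	page->dma_addr = (model_ulong_t)addr;
	if (MODEL_DMA_USE_PP_FRAG_COUNT)
		page->dma_addr_upper = (model_ulong_t)(addr >> 32);
}

static model_dma_addr_t model_get_dma_addr(const struct page_model *page)
{
	model_dma_addr_t ret = page->dma_addr;

	if (MODEL_DMA_USE_PP_FRAG_COUNT)
		ret |= (model_dma_addr_t)page->dma_addr_upper << 32;

	return ret;
}
```

With these typedefs the macro evaluates to true, so the upper word is
in use and cannot double as a frag counter, which is the case the
patch rejects with -EINVAL in page_pool_init().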

The newly added page_pool_set_frag_count() is called to reserve
the maximum frag count before any page frag is passed to the
user. page_pool_atomic_sub_frag_count_return() is called
when a user is done with a page frag.
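
The set/subtract pairing can be sketched in user space with C11
atomics. This is a simplified model with hypothetical `model_*`
names: it leaves out the __builtin_constant_p() fast path from the
diff below and uses assert() where the kernel uses WARN_ON().

```c
#include <assert.h>
#include <stdatomic.h>

struct page_model {
	atomic_long pp_frag_count;
};

/* Reserve the frag count before any frag is handed to a user. */
static void model_set_frag_count(struct page_model *page, long nr)
{
	atomic_store(&page->pp_frag_count, nr);
}

/* Drop nr references and return how many remain.
 * A return value of 0 means the caller was the last user.
 */
static long model_sub_frag_count_return(struct page_model *page, long nr)
{
	/* atomic_fetch_sub() returns the old value, so subtract again. */
	long ret = atomic_fetch_sub(&page->pp_frag_count, nr) - nr;

	assert(ret >= 0); /* going negative would be a refcount bug */
	return ret;
}
```

Only the caller that sees 0 returned may recycle or free the page; every
other caller simply walks away.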

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 include/linux/mm_types.h | 18 +++++++++++++-----
 include/net/page_pool.h  | 46 +++++++++++++++++++++++++++++++++++++++-------
 net/core/page_pool.c     |  4 ++++
 3 files changed, 56 insertions(+), 12 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 52bbd2b..7f8ee09 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -103,11 +103,19 @@ struct page {
 			unsigned long pp_magic;
 			struct page_pool *pp;
 			unsigned long _pp_mapping_pad;
-			/**
-			 * @dma_addr: might require a 64-bit value on
-			 * 32-bit architectures.
-			 */
-			unsigned long dma_addr[2];
+			unsigned long dma_addr;
+			union {
+				/**
+				 * dma_addr_upper: might require a 64-bit
+				 * value on 32-bit architectures.
+				 */
+				unsigned long dma_addr_upper;
+				/**
+				 * For frag page support, not supported in
+				 * 32-bit architectures with 64-bit DMA.
+				 */
+				atomic_long_t pp_frag_count;
+			};
 		};
 		struct {	/* slab, slob and slub */
 			union {
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 8d7744d..42e6997 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -45,7 +45,10 @@
 					* Please note DMA-sync-for-CPU is still
 					* device driver responsibility
 					*/
-#define PP_FLAG_ALL		(PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV)
+#define PP_FLAG_PAGE_FRAG	BIT(2) /* for page frag feature */
+#define PP_FLAG_ALL		(PP_FLAG_DMA_MAP |\
+				 PP_FLAG_DMA_SYNC_DEV |\
+				 PP_FLAG_PAGE_FRAG)
 
 /*
  * Fast allocation side cache array/stack
@@ -198,19 +201,48 @@ static inline void page_pool_recycle_direct(struct page_pool *pool,
 	page_pool_put_full_page(pool, page, true);
 }
 
+#define PAGE_POOL_DMA_USE_PP_FRAG_COUNT	\
+		(sizeof(dma_addr_t) > sizeof(unsigned long))
+
 static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
 {
-	dma_addr_t ret = page->dma_addr[0];
-	if (sizeof(dma_addr_t) > sizeof(unsigned long))
-		ret |= (dma_addr_t)page->dma_addr[1] << 16 << 16;
+	dma_addr_t ret = page->dma_addr;
+
+	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
+		ret |= (dma_addr_t)page->dma_addr_upper << 16 << 16;
+
 	return ret;
 }
 
 static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
 {
-	page->dma_addr[0] = addr;
-	if (sizeof(dma_addr_t) > sizeof(unsigned long))
-		page->dma_addr[1] = upper_32_bits(addr);
+	page->dma_addr = addr;
+	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
+		page->dma_addr_upper = upper_32_bits(addr);
+}
+
+static inline void page_pool_set_frag_count(struct page *page, long nr)
+{
+	atomic_long_set(&page->pp_frag_count, nr);
+}
+
+static inline long page_pool_atomic_sub_frag_count_return(struct page *page,
+							  long nr)
+{
+	long ret;
+
+	/* As suggested by Alexander, atomic_long_read() may cover up the
+	 * reference count errors, so avoid calling atomic_long_read() in
+	 * the cases of freeing or draining the page_frags, where we would
+	 * not expect it to match or that are slowpath anyway.
+	 */
+	if (__builtin_constant_p(nr) &&
+	    atomic_long_read(&page->pp_frag_count) == nr)
+		return 0;
+
+	ret = atomic_long_sub_return(nr, &page->pp_frag_count);
+	WARN_ON(ret < 0);
+	return ret;
 }
 
 static inline bool is_page_pool_compiled_in(void)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 78838c6..68fab94 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -67,6 +67,10 @@ static int page_pool_init(struct page_pool *pool,
 		 */
 	}
 
+	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT &&
+	    pool->p.flags & PP_FLAG_PAGE_FRAG)
+		return -EINVAL;
+
 	if (ptr_ring_init(&pool->ring, ring_qsize, GFP_KERNEL) < 0)
 		return -ENOMEM;
 
-- 
2.7.4



* [PATCH net-next v2 3/4] page_pool: add frag page recycling support in page pool
  2021-08-06  2:46 [PATCH net-next v2 0/4] add frag page support in page pool Yunsheng Lin
  2021-08-06  2:46 ` [PATCH net-next v2 1/4] page_pool: keep pp info as long as page pool owns the page Yunsheng Lin
  2021-08-06  2:46 ` [PATCH net-next v2 2/4] page_pool: add interface to manipulate frag count in page pool Yunsheng Lin
@ 2021-08-06  2:46 ` Yunsheng Lin
  2021-08-06  2:46 ` [PATCH net-next v2 4/4] net: hns3: support skb's frag page recycling based on " Yunsheng Lin
  2021-08-10 14:01 ` [PATCH net-next v2 0/4] add frag page support in " Jakub Kicinski
  4 siblings, 0 replies; 19+ messages in thread
From: Yunsheng Lin @ 2021-08-06  2:46 UTC (permalink / raw)
  To: davem, kuba
  Cc: alexander.duyck, linux, mw, linuxarm, yisen.zhuang, salil.mehta,
	thomas.petazzoni, hawk, ilias.apalodimas, ast, daniel,
	john.fastabend, akpm, peterz, will, willy, vbabka, fenghua.yu,
	guro, peterx, feng.tang, jgg, mcroce, hughd, jonathan.lemon,
	alobakin, willemb, wenxu, cong.wang, haokexin, nogikh, elver,
	yhs, kpsingh, andrii, kafai, songliubraving, netdev,
	linux-kernel, bpf, chenhao288

Currently the page pool only supports page recycling when
there is a single user of the page, and the split-page reuse
implemented in most drivers cannot use the page pool, because
the ping-pong way of reusing requires multi-user support in
the page pool.

The existing reuse and recycling schemes have the following
limitations:
1. A page from the page pool can only be used by one user in
   order for page recycling to happen.
2. The ping-pong way of reusing in most drivers does not
   support multiple descs using different parts of the same
   page in order to save memory.

So add multi-user support and frag page recycling to the page
pool to overcome the above limitations.
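
The bookkeeping behind this can be modelled in user space as follows.
It is a single-threaded sketch under several assumptions: the
`model_*` names are hypothetical, pages come from a small static array
instead of the real allocator, the counter is a plain long rather than
an atomic, and the cache-line alignment of `size` is omitted. The
idea it tries to show is the one in the diff below: a page starts with
a large bias, each fragment handed out is counted in frag_users, and
the unused part of the bias is dropped once the page is full.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define MODEL_BIAS_MAX  LONG_MAX
#define MODEL_PAGE_SIZE 4096u
#define MODEL_NPAGES    4u

struct page_model { long frag_count; };

struct pool_model {
	unsigned int next_page;         /* next unused backing page */
	unsigned int frag_offset;       /* next free byte in frag_page */
	long frag_users;                /* frags handed out from frag_page */
	struct page_model *frag_page;   /* page currently being carved */
	struct page_model pages[MODEL_NPAGES];
};

/* A consumer drops one fragment; returns 1 if it was the last user. */
static int model_put_frag(struct page_model *page)
{
	page->frag_count -= 1;
	return page->frag_count == 0;
}

static struct page_model *model_alloc_frag(struct pool_model *pool,
					   unsigned int *offset,
					   unsigned int size)
{
	struct page_model *page = pool->frag_page;

	if (size > MODEL_PAGE_SIZE)
		return NULL;

	*offset = pool->frag_offset;

	if (page && *offset + size <= MODEL_PAGE_SIZE) {
		/* Room left: hand out the next slice of the current page. */
		pool->frag_users++;
		pool->frag_offset = *offset + size;
		return page;
	}

	if (page) {
		/* Page is full: give back the unused part of the bias. */
		page->frag_count -= MODEL_BIAS_MAX - pool->frag_users;
		if (page->frag_count != 0)
			page = NULL; /* frags in flight; last user frees it */
	}

	if (!page) {
		if (pool->next_page == MODEL_NPAGES) {
			pool->frag_page = NULL;
			return NULL;
		}
		page = &pool->pages[pool->next_page++];
	}

	/* Start carving a fresh (or fully drained) page from offset 0. */
	pool->frag_page = page;
	pool->frag_users = 1;
	pool->frag_offset = size;
	page->frag_count = MODEL_BIAS_MAX;
	*offset = 0;
	return page;
}
```

If every fragment of a full page has already been returned, the drain
brings the count to zero and the same page is carved again from offset
0; otherwise the pool moves on to a new page and the last consumer to
call the put function is the one that sees the count reach zero.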

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 include/net/page_pool.h | 15 +++++++++
 net/core/page_pool.c    | 87 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 102 insertions(+)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 42e6997..a408240 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -91,6 +91,9 @@ struct page_pool {
 	unsigned long defer_warn;
 
 	u32 pages_state_hold_cnt;
+	unsigned int frag_offset;
+	struct page *frag_page;
+	long frag_users;
 
 	/*
 	 * Data structure for allocation side
@@ -140,6 +143,18 @@ static inline struct page *page_pool_dev_alloc_pages(struct page_pool *pool)
 	return page_pool_alloc_pages(pool, gfp);
 }
 
+struct page *page_pool_alloc_frag(struct page_pool *pool, unsigned int *offset,
+				  unsigned int size, gfp_t gfp);
+
+static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool,
+						    unsigned int *offset,
+						    unsigned int size)
+{
+	gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN);
+
+	return page_pool_alloc_frag(pool, offset, size, gfp);
+}
+
 /* get the stored dma direction. A driver might decide to treat this locally and
  * avoid the extra cache line from page_pool to determine the direction
  */
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 68fab94..ac11604 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -24,6 +24,8 @@
 #define DEFER_TIME (msecs_to_jiffies(1000))
 #define DEFER_WARN_INTERVAL (60 * HZ)
 
+#define BIAS_MAX	LONG_MAX
+
 static int page_pool_init(struct page_pool *pool,
 			  const struct page_pool_params *params)
 {
@@ -423,6 +425,11 @@ static __always_inline struct page *
 __page_pool_put_page(struct page_pool *pool, struct page *page,
 		     unsigned int dma_sync_size, bool allow_direct)
 {
+	/* It is not the last user for the page frag case */
+	if (pool->p.flags & PP_FLAG_PAGE_FRAG &&
+	    page_pool_atomic_sub_frag_count_return(page, 1))
+		return NULL;
+
 	/* This allocator is optimized for the XDP mode that uses
 	 * one-frame-per-page, but have fallbacks that act like the
 	 * regular page allocator APIs.
@@ -515,6 +522,84 @@ void page_pool_put_page_bulk(struct page_pool *pool, void **data,
 }
 EXPORT_SYMBOL(page_pool_put_page_bulk);
 
+static struct page *page_pool_drain_frag(struct page_pool *pool,
+					 struct page *page)
+{
+	long drain_count = BIAS_MAX - pool->frag_users;
+
+	/* Some user is still using the page frag */
+	if (likely(page_pool_atomic_sub_frag_count_return(page,
+							  drain_count)))
+		return NULL;
+
+	if (page_ref_count(page) == 1 && !page_is_pfmemalloc(page)) {
+		if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
+			page_pool_dma_sync_for_device(pool, page, -1);
+
+		return page;
+	}
+
+	page_pool_return_page(pool, page);
+	return NULL;
+}
+
+static void page_pool_free_frag(struct page_pool *pool)
+{
+	long drain_count = BIAS_MAX - pool->frag_users;
+	struct page *page = pool->frag_page;
+
+	pool->frag_page = NULL;
+
+	if (!page ||
+	    page_pool_atomic_sub_frag_count_return(page, drain_count))
+		return;
+
+	page_pool_return_page(pool, page);
+}
+
+struct page *page_pool_alloc_frag(struct page_pool *pool,
+				  unsigned int *offset,
+				  unsigned int size, gfp_t gfp)
+{
+	unsigned int max_size = PAGE_SIZE << pool->p.order;
+	struct page *page = pool->frag_page;
+
+	if (WARN_ON(!(pool->p.flags & PP_FLAG_PAGE_FRAG) ||
+		    size > max_size))
+		return NULL;
+
+	size = ALIGN(size, dma_get_cache_alignment());
+	*offset = pool->frag_offset;
+
+	if (page && *offset + size > max_size) {
+		page = page_pool_drain_frag(pool, page);
+		if (page)
+			goto frag_reset;
+	}
+
+	if (!page) {
+		page = page_pool_alloc_pages(pool, gfp);
+		if (unlikely(!page)) {
+			pool->frag_page = NULL;
+			return NULL;
+		}
+
+		pool->frag_page = page;
+
+frag_reset:
+		pool->frag_users = 1;
+		*offset = 0;
+		pool->frag_offset = size;
+		page_pool_set_frag_count(page, BIAS_MAX);
+		return page;
+	}
+
+	pool->frag_users++;
+	pool->frag_offset = *offset + size;
+	return page;
+}
+EXPORT_SYMBOL(page_pool_alloc_frag);
+
 static void page_pool_empty_ring(struct page_pool *pool)
 {
 	struct page *page;
@@ -620,6 +705,8 @@ void page_pool_destroy(struct page_pool *pool)
 	if (!page_pool_put(pool))
 		return;
 
+	page_pool_free_frag(pool);
+
 	if (!page_pool_release(pool))
 		return;
 
-- 
2.7.4



* [PATCH net-next v2 4/4] net: hns3: support skb's frag page recycling based on page pool
  2021-08-06  2:46 [PATCH net-next v2 0/4] add frag page support in page pool Yunsheng Lin
                   ` (2 preceding siblings ...)
  2021-08-06  2:46 ` [PATCH net-next v2 3/4] page_pool: add frag page recycling support " Yunsheng Lin
@ 2021-08-06  2:46 ` Yunsheng Lin
  2021-09-08  8:31   ` moyufeng
  2021-08-10 14:01 ` [PATCH net-next v2 0/4] add frag page support in " Jakub Kicinski
  4 siblings, 1 reply; 19+ messages in thread
From: Yunsheng Lin @ 2021-08-06  2:46 UTC (permalink / raw)
  To: davem, kuba
  Cc: alexander.duyck, linux, mw, linuxarm, yisen.zhuang, salil.mehta,
	thomas.petazzoni, hawk, ilias.apalodimas, ast, daniel,
	john.fastabend, akpm, peterz, will, willy, vbabka, fenghua.yu,
	guro, peterx, feng.tang, jgg, mcroce, hughd, jonathan.lemon,
	alobakin, willemb, wenxu, cong.wang, haokexin, nogikh, elver,
	yhs, kpsingh, andrii, kafai, songliubraving, netdev,
	linux-kernel, bpf, chenhao288

This patch adds skb frag page recycling support based on
the frag page support in the page pool.

Performance improves by about 10~20% for a single-thread iperf
TCP flow with the IOMMU disabled when the iperf server and
irq/NAPI run on different CPUs.

Performance improves by about 135% (14Gbit to 33Gbit) for a
single-thread iperf TCP flow when the IOMMU is in strict mode
and the iperf server shares the same CPU with irq/NAPI.
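
Two pieces of arithmetic in the diff below may be easier to follow in
isolation: how many pages the pool is sized for, and which address is
written into an RX descriptor when a buffer is a fragment of a page.
The helpers here are hypothetical user-space restatements of those
expressions, not driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Pages needed so that each descriptor can own one buffer-sized frag:
 * desc_num * buf_size / (PAGE_SIZE << order).
 */
static unsigned int model_pool_size(unsigned int desc_num,
				    unsigned int buf_size,
				    unsigned int page_size,
				    unsigned int order)
{
	return desc_num * buf_size / (page_size << order);
}

/* A frag shares its page's DMA mapping, so the descriptor address is
 * the page's DMA address plus the frag's offset within the page.
 */
static uint64_t model_desc_addr(uint64_t page_dma, unsigned int page_offset)
{
	return page_dma + page_offset;
}
```

For instance, 1024 descriptors with 2048-byte buffers on order-0 4096-byte
pages give a pool of 512 pages, i.e. two buffers per page.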

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 drivers/net/ethernet/hisilicon/Kconfig          |  1 +
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 79 +++++++++++++++++++++++--
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.h |  3 +
 3 files changed, 78 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/Kconfig b/drivers/net/ethernet/hisilicon/Kconfig
index 094e4a3..2ba0e7b 100644
--- a/drivers/net/ethernet/hisilicon/Kconfig
+++ b/drivers/net/ethernet/hisilicon/Kconfig
@@ -91,6 +91,7 @@ config HNS3
 	tristate "Hisilicon Network Subsystem Support HNS3 (Framework)"
 	depends on PCI
 	select NET_DEVLINK
+	select PAGE_POOL
 	help
 	  This selects the framework support for Hisilicon Network Subsystem 3.
 	  This layer facilitates clients like ENET, RoCE and user-space ethernet
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index cb8d5da..fcbeb1f 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -3205,6 +3205,21 @@ static int hns3_alloc_buffer(struct hns3_enet_ring *ring,
 	unsigned int order = hns3_page_order(ring);
 	struct page *p;
 
+	if (ring->page_pool) {
+		p = page_pool_dev_alloc_frag(ring->page_pool,
+					     &cb->page_offset,
+					     hns3_buf_size(ring));
+		if (unlikely(!p))
+			return -ENOMEM;
+
+		cb->priv = p;
+		cb->buf = page_address(p);
+		cb->dma = page_pool_get_dma_addr(p);
+		cb->type = DESC_TYPE_PP_FRAG;
+		cb->reuse_flag = 0;
+		return 0;
+	}
+
 	p = dev_alloc_pages(order);
 	if (!p)
 		return -ENOMEM;
@@ -3227,8 +3242,13 @@ static void hns3_free_buffer(struct hns3_enet_ring *ring,
 	if (cb->type & (DESC_TYPE_SKB | DESC_TYPE_BOUNCE_HEAD |
 			DESC_TYPE_BOUNCE_ALL | DESC_TYPE_SGL_SKB))
 		napi_consume_skb(cb->priv, budget);
-	else if (!HNAE3_IS_TX_RING(ring) && cb->pagecnt_bias)
-		__page_frag_cache_drain(cb->priv, cb->pagecnt_bias);
+	else if (!HNAE3_IS_TX_RING(ring)) {
+		if (cb->type & DESC_TYPE_PAGE && cb->pagecnt_bias)
+			__page_frag_cache_drain(cb->priv, cb->pagecnt_bias);
+		else if (cb->type & DESC_TYPE_PP_FRAG)
+			page_pool_put_full_page(ring->page_pool, cb->priv,
+						false);
+	}
 	memset(cb, 0, sizeof(*cb));
 }
 
@@ -3315,7 +3335,7 @@ static int hns3_alloc_and_map_buffer(struct hns3_enet_ring *ring,
 	int ret;
 
 	ret = hns3_alloc_buffer(ring, cb);
-	if (ret)
+	if (ret || ring->page_pool)
 		goto out;
 
 	ret = hns3_map_buffer(ring, cb);
@@ -3337,7 +3357,8 @@ static int hns3_alloc_and_attach_buffer(struct hns3_enet_ring *ring, int i)
 	if (ret)
 		return ret;
 
-	ring->desc[i].addr = cpu_to_le64(ring->desc_cb[i].dma);
+	ring->desc[i].addr = cpu_to_le64(ring->desc_cb[i].dma +
+					 ring->desc_cb[i].page_offset);
 
 	return 0;
 }
@@ -3367,7 +3388,8 @@ static void hns3_replace_buffer(struct hns3_enet_ring *ring, int i,
 {
 	hns3_unmap_buffer(ring, &ring->desc_cb[i]);
 	ring->desc_cb[i] = *res_cb;
-	ring->desc[i].addr = cpu_to_le64(ring->desc_cb[i].dma);
+	ring->desc[i].addr = cpu_to_le64(ring->desc_cb[i].dma +
+					 ring->desc_cb[i].page_offset);
 	ring->desc[i].rx.bd_base_info = 0;
 }
 
@@ -3539,6 +3561,12 @@ static void hns3_nic_reuse_page(struct sk_buff *skb, int i,
 	u32 frag_size = size - pull_len;
 	bool reused;
 
+	if (ring->page_pool) {
+		skb_add_rx_frag(skb, i, desc_cb->priv, frag_offset,
+				frag_size, truesize);
+		return;
+	}
+
 	/* Avoid re-using remote or pfmem page */
 	if (unlikely(!dev_page_is_reusable(desc_cb->priv)))
 		goto out;
@@ -3856,6 +3884,9 @@ static int hns3_alloc_skb(struct hns3_enet_ring *ring, unsigned int length,
 		/* We can reuse buffer as-is, just make sure it is reusable */
 		if (dev_page_is_reusable(desc_cb->priv))
 			desc_cb->reuse_flag = 1;
+		else if (desc_cb->type & DESC_TYPE_PP_FRAG)
+			page_pool_put_full_page(ring->page_pool, desc_cb->priv,
+						false);
 		else /* This page cannot be reused so discard it */
 			__page_frag_cache_drain(desc_cb->priv,
 						desc_cb->pagecnt_bias);
@@ -3863,6 +3894,10 @@ static int hns3_alloc_skb(struct hns3_enet_ring *ring, unsigned int length,
 		hns3_rx_ring_move_fw(ring);
 		return 0;
 	}
+
+	if (ring->page_pool)
+		skb_mark_for_recycle(skb);
+
 	u64_stats_update_begin(&ring->syncp);
 	ring->stats.seg_pkt_cnt++;
 	u64_stats_update_end(&ring->syncp);
@@ -3901,6 +3936,10 @@ static int hns3_add_frag(struct hns3_enet_ring *ring)
 					    "alloc rx fraglist skb fail\n");
 				return -ENXIO;
 			}
+
+			if (ring->page_pool)
+				skb_mark_for_recycle(new_skb);
+
 			ring->frag_num = 0;
 
 			if (ring->tail_skb) {
@@ -4705,6 +4744,29 @@ static void hns3_put_ring_config(struct hns3_nic_priv *priv)
 	priv->ring = NULL;
 }
 
+static void hns3_alloc_page_pool(struct hns3_enet_ring *ring)
+{
+	struct page_pool_params pp_params = {
+		.flags = PP_FLAG_DMA_MAP | PP_FLAG_PAGE_FRAG |
+				PP_FLAG_DMA_SYNC_DEV,
+		.order = hns3_page_order(ring),
+		.pool_size = ring->desc_num * hns3_buf_size(ring) /
+				(PAGE_SIZE << hns3_page_order(ring)),
+		.nid = dev_to_node(ring_to_dev(ring)),
+		.dev = ring_to_dev(ring),
+		.dma_dir = DMA_FROM_DEVICE,
+		.offset = 0,
+		.max_len = PAGE_SIZE << hns3_page_order(ring),
+	};
+
+	ring->page_pool = page_pool_create(&pp_params);
+	if (IS_ERR(ring->page_pool)) {
+		dev_warn(ring_to_dev(ring), "page pool creation failed: %ld\n",
+			 PTR_ERR(ring->page_pool));
+		ring->page_pool = NULL;
+	}
+}
+
 static int hns3_alloc_ring_memory(struct hns3_enet_ring *ring)
 {
 	int ret;
@@ -4724,6 +4786,8 @@ static int hns3_alloc_ring_memory(struct hns3_enet_ring *ring)
 		goto out_with_desc_cb;
 
 	if (!HNAE3_IS_TX_RING(ring)) {
+		hns3_alloc_page_pool(ring);
+
 		ret = hns3_alloc_ring_buffers(ring);
 		if (ret)
 			goto out_with_desc;
@@ -4764,6 +4828,11 @@ void hns3_fini_ring(struct hns3_enet_ring *ring)
 		devm_kfree(ring_to_dev(ring), tx_spare);
 		ring->tx_spare = NULL;
 	}
+
+	if (!HNAE3_IS_TX_RING(ring) && ring->page_pool) {
+		page_pool_destroy(ring->page_pool);
+		ring->page_pool = NULL;
+	}
 }
 
 static int hns3_buf_size2type(u32 buf_size)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
index 15af3d9..27809d6 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
@@ -6,6 +6,7 @@
 
 #include <linux/dim.h>
 #include <linux/if_vlan.h>
+#include <net/page_pool.h>
 
 #include "hnae3.h"
 
@@ -307,6 +308,7 @@ enum hns3_desc_type {
 	DESC_TYPE_BOUNCE_ALL		= 1 << 3,
 	DESC_TYPE_BOUNCE_HEAD		= 1 << 4,
 	DESC_TYPE_SGL_SKB		= 1 << 5,
+	DESC_TYPE_PP_FRAG		= 1 << 6,
 };
 
 struct hns3_desc_cb {
@@ -451,6 +453,7 @@ struct hns3_enet_ring {
 	struct hnae3_queue *tqp;
 	int queue_index;
 	struct device *dev; /* will be used for DMA mapping of descriptors */
+	struct page_pool *page_pool;
 
 	/* statistic */
 	struct ring_stats stats;
-- 
2.7.4



* Re: [PATCH net-next v2 0/4] add frag page support in page pool
  2021-08-06  2:46 [PATCH net-next v2 0/4] add frag page support in page pool Yunsheng Lin
                   ` (3 preceding siblings ...)
  2021-08-06  2:46 ` [PATCH net-next v2 4/4] net: hns3: support skb's frag page recycling based on " Yunsheng Lin
@ 2021-08-10 14:01 ` Jakub Kicinski
  2021-08-10 14:23   ` Jesper Dangaard Brouer
  4 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2021-08-10 14:01 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: davem, alexander.duyck, linux, mw, linuxarm, yisen.zhuang,
	salil.mehta, thomas.petazzoni, hawk, ilias.apalodimas, ast,
	daniel, john.fastabend, akpm, peterz, will, willy, vbabka,
	fenghua.yu, guro, peterx, feng.tang, jgg, mcroce, hughd,
	jonathan.lemon, alobakin, willemb, wenxu, cong.wang, haokexin,
	nogikh, elver, yhs, kpsingh, andrii, kafai, songliubraving,
	netdev, linux-kernel, bpf, chenhao288

On Fri, 6 Aug 2021 10:46:18 +0800 Yunsheng Lin wrote:
> enables skb page frag recycling based on the page pool in
> the hns3 driver.

Applied, thanks!


* Re: [PATCH net-next v2 0/4] add frag page support in page pool
  2021-08-10 14:01 ` [PATCH net-next v2 0/4] add frag page support in " Jakub Kicinski
@ 2021-08-10 14:23   ` Jesper Dangaard Brouer
  2021-08-10 14:43     ` Jakub Kicinski
  0 siblings, 1 reply; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2021-08-10 14:23 UTC (permalink / raw)
  To: Jakub Kicinski, Yunsheng Lin
  Cc: brouer, davem, alexander.duyck, linux, mw, linuxarm,
	yisen.zhuang, salil.mehta, thomas.petazzoni, hawk,
	ilias.apalodimas, ast, daniel, john.fastabend, akpm, peterz,
	will, willy, vbabka, fenghua.yu, guro, peterx, feng.tang, jgg,
	mcroce, hughd, jonathan.lemon, alobakin, willemb, wenxu,
	cong.wang, haokexin, nogikh, elver, yhs, kpsingh, andrii, kafai,
	songliubraving, netdev, linux-kernel, bpf, chenhao288, Linux-MM



On 10/08/2021 16.01, Jakub Kicinski wrote:
> On Fri, 6 Aug 2021 10:46:18 +0800 Yunsheng Lin wrote:
>> enable skb's page frag recycling based on page pool in
>> hns3 drvier.
> 
> Applied, thanks!

I had hoped to see more acks / reviewed-by before this got applied.
E.g. from MM-people as this patchset changes struct page and page_pool 
(that I'm marked as maintainer of).  And I would have appreciated a
reviewed-by credit to/from Alexander as he did a lot of work in the RFC
patchset for the split-page tricks.

p.s. I just returned from vacation today, and have not had time to 
review, sorry.

--Jesper

(relevant struct page changes for MM-people to review)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 52bbd2b..7f8ee09 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -103,11 +103,19 @@ struct page {
  			unsigned long pp_magic;
  			struct page_pool *pp;
  			unsigned long _pp_mapping_pad;
-			/**
-			 * @dma_addr: might require a 64-bit value on
-			 * 32-bit architectures.
-			 */
-			unsigned long dma_addr[2];
+			unsigned long dma_addr;
+			union {
+				/**
+				 * dma_addr_upper: might require a 64-bit
+				 * value on 32-bit architectures.
+				 */
+				unsigned long dma_addr_upper;
+				/**
+				 * For frag page support, not supported in
+				 * 32-bit architectures with 64-bit DMA.
+				 */
+				atomic_long_t pp_frag_count;
+			};
  		};
  		struct {	/* slab, slob and slub */
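
For readers checking the layout change above, here is a minimal userspace sketch of it. It assumes that atomic_long_t is the same size as unsigned long; the names pp_fields_old, pp_fields_new and atomic_long_model_t are hypothetical stand-ins, not kernel types. It only illustrates that replacing dma_addr[2] with one word plus a union should leave the size and the offset of the second word unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's atomic_long_t (assumption: one long wide). */
typedef struct { long counter; } atomic_long_model_t;

/* Page pool fields before the patch: two words for the DMA address. */
struct pp_fields_old {
	unsigned long pp_magic;
	void *pp;
	unsigned long _pp_mapping_pad;
	unsigned long dma_addr[2];
};

/* Page pool fields after the patch: the second word is shared by a union. */
struct pp_fields_new {
	unsigned long pp_magic;
	void *pp;
	unsigned long _pp_mapping_pad;
	unsigned long dma_addr;
	union {
		unsigned long dma_addr_upper;      /* 32-bit arch with 64-bit DMA */
		atomic_long_model_t pp_frag_count; /* everything else */
	};
};

/* The overall size must not grow. */
static_assert(sizeof(struct pp_fields_new) == sizeof(struct pp_fields_old),
	      "layout change must not grow the structure");

/* pp_frag_count must sit exactly where dma_addr[1] used to be. */
static_assert(offsetof(struct pp_fields_new, pp_frag_count) ==
	      offsetof(struct pp_fields_old, dma_addr) + sizeof(unsigned long),
	      "pp_frag_count must overlay the old dma_addr[1]");
```

If either static_assert fails to compile on a given target, the stand-in types do not match that target's word size and the sketch does not apply there.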


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2 0/4] add frag page support in page pool
  2021-08-10 14:23   ` Jesper Dangaard Brouer
@ 2021-08-10 14:43     ` Jakub Kicinski
  2021-08-10 15:09       ` Alexander Duyck
  0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2021-08-10 14:43 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Yunsheng Lin, brouer, davem, alexander.duyck, linux, mw,
	linuxarm, yisen.zhuang, salil.mehta, thomas.petazzoni, hawk,
	ilias.apalodimas, ast, daniel, john.fastabend, akpm, peterz,
	will, willy, vbabka, fenghua.yu, guro, peterx, feng.tang, jgg,
	mcroce, hughd, jonathan.lemon, alobakin, willemb, wenxu,
	cong.wang, haokexin, nogikh, elver, yhs, kpsingh, andrii, kafai,
	songliubraving, netdev, linux-kernel, bpf, chenhao288, Linux-MM

On Tue, 10 Aug 2021 16:23:52 +0200 Jesper Dangaard Brouer wrote:
> On 10/08/2021 16.01, Jakub Kicinski wrote:
> > On Fri, 6 Aug 2021 10:46:18 +0800 Yunsheng Lin wrote:  
> >> enable skb's page frag recycling based on page pool in
> >> hns3 drvier.  
> > 
> > Applied, thanks!  
> 
> I had hoped to see more acks / reviewed-by before this got applied.
> E.g. from MM-people as this patchset changes struct page and page_pool 
> (that I'm marked as maintainer of). 

Sorry, it was on the list for days and there were 7 or so prior
versions, I thought it was ripe. If possible, a note that review 
will come would be useful.

> And I would have appreciated a reviewed-by credit to/from Alexander
> as he did a lot of work in the RFC patchset for the split-page tricks.

I asked him off-list, he said something I interpreted as "code is okay,
but the review tag is not coming".

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2 2/4] page_pool: add interface to manipulate frag count in page pool
  2021-08-06  2:46 ` [PATCH net-next v2 2/4] page_pool: add interface to manipulate frag count in page pool Yunsheng Lin
@ 2021-08-10 14:58   ` Jesper Dangaard Brouer
  2021-08-11  0:48     ` Yunsheng Lin
  2021-08-12 15:17   ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2021-08-10 14:58 UTC (permalink / raw)
  To: Yunsheng Lin, davem, kuba
  Cc: brouer, alexander.duyck, linux, mw, linuxarm, yisen.zhuang,
	salil.mehta, thomas.petazzoni, hawk, ilias.apalodimas, ast,
	daniel, john.fastabend, akpm, peterz, will, willy, vbabka,
	fenghua.yu, guro, peterx, feng.tang, jgg, mcroce, hughd,
	jonathan.lemon, alobakin, willemb, wenxu, cong.wang, haokexin,
	nogikh, elver, yhs, kpsingh, andrii, kafai, songliubraving,
	netdev, linux-kernel, bpf, chenhao288, Linux-MM



On 06/08/2021 04.46, Yunsheng Lin wrote:
> For 32 bit systems with 64 bit dma, dma_addr[1] is used to
> store the upper 32 bit dma addr, those system should be rare
> those days.
> 
> For normal system, the dma_addr[1] in 'struct page' is not
> used, so we can reuse dma_addr[1] for storing frag count,
> which means how many frags this page might be splited to.
> 
> In order to simplify the page frag support in the page pool,
> the PAGE_POOL_DMA_USE_PP_FRAG_COUNT macro is added to indicate
> the 32 bit systems with 64 bit dma, and the page frag support
> in page pool is disabled for such system.
> 
> The newly added page_pool_set_frag_count() is called to reserve
> the maximum frag count before any page frag is passed to the
> user. The page_pool_atomic_sub_frag_count_return() is called
> when user is done with the page frag.
> 
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> ---
>   include/linux/mm_types.h | 18 +++++++++++++-----
>   include/net/page_pool.h  | 46 +++++++++++++++++++++++++++++++++++++++-------
>   net/core/page_pool.c     |  4 ++++
>   3 files changed, 56 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 52bbd2b..7f8ee09 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -103,11 +103,19 @@ struct page {
>   			unsigned long pp_magic;
>   			struct page_pool *pp;
>   			unsigned long _pp_mapping_pad;
> -			/**
> -			 * @dma_addr: might require a 64-bit value on
> -			 * 32-bit architectures.
> -			 */
> -			unsigned long dma_addr[2];
> +			unsigned long dma_addr;
> +			union {
> +				/**
> +				 * dma_addr_upper: might require a 64-bit
> +				 * value on 32-bit architectures.
> +				 */
> +				unsigned long dma_addr_upper;
> +				/**
> +				 * For frag page support, not supported in
> +				 * 32-bit architectures with 64-bit DMA.
> +				 */
> +				atomic_long_t pp_frag_count;
> +			};
>   		};
>   		struct {	/* slab, slob and slub */
>   			union {
> diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> index 8d7744d..42e6997 100644
> --- a/include/net/page_pool.h
> +++ b/include/net/page_pool.h
> @@ -45,7 +45,10 @@
>   					* Please note DMA-sync-for-CPU is still
>   					* device driver responsibility
>   					*/
> -#define PP_FLAG_ALL		(PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV)
> +#define PP_FLAG_PAGE_FRAG	BIT(2) /* for page frag feature */
> +#define PP_FLAG_ALL		(PP_FLAG_DMA_MAP |\
> +				 PP_FLAG_DMA_SYNC_DEV |\
> +				 PP_FLAG_PAGE_FRAG)
>   
>   /*
>    * Fast allocation side cache array/stack
> @@ -198,19 +201,48 @@ static inline void page_pool_recycle_direct(struct page_pool *pool,
>   	page_pool_put_full_page(pool, page, true);
>   }
>   
> +#define PAGE_POOL_DMA_USE_PP_FRAG_COUNT	\
> +		(sizeof(dma_addr_t) > sizeof(unsigned long))
> +
>   static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
>   {
> -	dma_addr_t ret = page->dma_addr[0];
> -	if (sizeof(dma_addr_t) > sizeof(unsigned long))
> -		ret |= (dma_addr_t)page->dma_addr[1] << 16 << 16;
> +	dma_addr_t ret = page->dma_addr;
> +
> +	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
> +		ret |= (dma_addr_t)page->dma_addr_upper << 16 << 16;

I find the macro name confusing.

I think it would be easier to read the code, if it was called:
  PAGE_POOL_DMA_CANNOT_USE_PP_FRAG_COUNT

> +
>   	return ret;
>   }
>   
>   static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
>   {
> -	page->dma_addr[0] = addr;
> -	if (sizeof(dma_addr_t) > sizeof(unsigned long))
> -		page->dma_addr[1] = upper_32_bits(addr);
> +	page->dma_addr = addr;
> +	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
> +		page->dma_addr_upper = upper_32_bits(addr);
> +}
> +
> +static inline void page_pool_set_frag_count(struct page *page, long nr)
> +{
> +	atomic_long_set(&page->pp_frag_count, nr);
> +}
> +
> +static inline long page_pool_atomic_sub_frag_count_return(struct page *page,
> +							  long nr)
> +{
> +	long ret;
> +
> +	/* As suggested by Alexander, atomic_long_read() may cover up the
> +	 * reference count errors, so avoid calling atomic_long_read() in
> +	 * the cases of freeing or draining the page_frags, where we would
> +	 * not expect it to match or that are slowpath anyway.
> +	 */
> +	if (__builtin_constant_p(nr) &&
> +	    atomic_long_read(&page->pp_frag_count) == nr)
> +		return 0;
> +
> +	ret = atomic_long_sub_return(nr, &page->pp_frag_count);
> +	WARN_ON(ret < 0);
> +	return ret;
>   }
>   
>   static inline bool is_page_pool_compiled_in(void)
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 78838c6..68fab94 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -67,6 +67,10 @@ static int page_pool_init(struct page_pool *pool,
>   		 */
>   	}
>   
> +	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT &&
> +	    pool->p.flags & PP_FLAG_PAGE_FRAG)
> +		return -EINVAL;

I read this as: if the page_pool uses pp_frag_count and has the flag set,
then it is invalid/not allowed, which seems wrong.

I find this code more intuitive to read:

  +	if (PAGE_POOL_DMA_CANNOT_USE_PP_FRAG_COUNT &&
  +	    pool->p.flags & PP_FLAG_PAGE_FRAG)
  +		return -EINVAL;

--Jesper
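
The naming question above may be easier to follow with a small userspace model of the case the macro detects. This is a sketch under stated assumptions: uint32_t stands in for unsigned long on a 32-bit arch, uint64_t for a 64-bit dma_addr_t, and every identifier containing "model" or "MODEL" is hypothetical. It uses the "cannot use" spelling suggested in this mail, which is not the name in the applied patch.

```c
#include <errno.h>
#include <stdint.h>

typedef uint64_t dma_addr_model_t; /* stand-in: 64-bit dma_addr_t */
typedef uint32_t ulong_model_t;    /* stand-in: 32-bit unsigned long */

/* True when one word cannot hold the DMA address, so the second word
 * carries the upper half and is not free to act as a frag counter.
 */
#define MODEL_DMA_CANNOT_USE_PP_FRAG_COUNT \
	(sizeof(dma_addr_model_t) > sizeof(ulong_model_t))

#define MODEL_PP_FLAG_PAGE_FRAG (1u << 2)

struct dma_page_model {
	ulong_model_t dma_addr;
	ulong_model_t dma_addr_upper;
};

static void model_set_dma_addr(struct dma_page_model *page,
			       dma_addr_model_t addr)
{
	page->dma_addr = (ulong_model_t)addr; /* lower 32 bits */
	if (MODEL_DMA_CANNOT_USE_PP_FRAG_COUNT)
		page->dma_addr_upper = (ulong_model_t)(addr >> 16 >> 16);
}

static dma_addr_model_t model_get_dma_addr(const struct dma_page_model *page)
{
	dma_addr_model_t ret = page->dma_addr;

	/* Two 16-bit shifts keep the expression valid even when the
	 * address type is only 32 bits wide.
	 */
	if (MODEL_DMA_CANNOT_USE_PP_FRAG_COUNT)
		ret |= (dma_addr_model_t)page->dma_addr_upper << 16 << 16;
	return ret;
}

/* Mirrors the quoted page_pool_init() check: refuse the frag flag when
 * the counter word is occupied by the upper address bits.
 */
static int model_pool_init(unsigned int flags)
{
	if (MODEL_DMA_CANNOT_USE_PP_FRAG_COUNT &&
	    (flags & MODEL_PP_FLAG_PAGE_FRAG))
		return -EINVAL;
	return 0;
}
```

With these stand-in types the macro is true, so a round trip through set/get preserves a full 64-bit address and model_pool_init(MODEL_PP_FLAG_PAGE_FRAG) is rejected.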


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2 0/4] add frag page support in page pool
  2021-08-10 14:43     ` Jakub Kicinski
@ 2021-08-10 15:09       ` Alexander Duyck
  2021-08-11  1:06         ` [Linuxarm] " Yunsheng Lin
  0 siblings, 1 reply; 19+ messages in thread
From: Alexander Duyck @ 2021-08-10 15:09 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jesper Dangaard Brouer, Yunsheng Lin, Jesper Dangaard Brouer,
	David Miller, Russell King - ARM Linux, Marcin Wojtas, linuxarm,
	yisen.zhuang, Salil Mehta, thomas.petazzoni, hawk,
	Ilias Apalodimas, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Andrew Morton, Peter Zijlstra, Will Deacon,
	Matthew Wilcox, Vlastimil Babka, fenghua.yu, guro, Peter Xu,
	Feng Tang, Jason Gunthorpe, Matteo Croce, Hugh Dickins,
	Jonathan Lemon, Alexander Lobakin, Willem de Bruijn, wenxu,
	Cong Wang, Kevin Hao, nogikh, Marco Elver, Yonghong Song,
	kpsingh, andrii, Martin KaFai Lau, songliubraving, Netdev, LKML,
	bpf, chenhao288, Linux-MM

On Tue, Aug 10, 2021 at 7:43 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 10 Aug 2021 16:23:52 +0200 Jesper Dangaard Brouer wrote:
> > On 10/08/2021 16.01, Jakub Kicinski wrote:
> > > On Fri, 6 Aug 2021 10:46:18 +0800 Yunsheng Lin wrote:
> > >> enable skb's page frag recycling based on page pool in
> > >> hns3 drvier.
> > >
> > > Applied, thanks!
> >
> > I had hoped to see more acks / reviewed-by before this got applied.
> > E.g. from MM-people as this patchset changes struct page and page_pool
> > (that I'm marked as maintainer of).
>
> Sorry, it was on the list for days and there were 7 or so prior
> versions, I thought it was ripe. If possible, a note that review
> will come would be useful.
>
> > And I would have appreciated a reviewed-by credit to/from Alexander
> > as he did a lot of work in the RFC patchset for the split-page tricks.
>
> I asked him off-list, he said something I interpreted as "code is okay,
> but the review tag is not coming".

Yeah, I ran out of feedback a revision or two ago and just haven't had
a chance to go through and add my reviewed by. If you want feel free
to add my reviewed by for the set.

Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2 2/4] page_pool: add interface to manipulate frag count in page pool
  2021-08-10 14:58   ` Jesper Dangaard Brouer
@ 2021-08-11  0:48     ` Yunsheng Lin
  0 siblings, 0 replies; 19+ messages in thread
From: Yunsheng Lin @ 2021-08-11  0:48 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, davem, kuba
  Cc: brouer, alexander.duyck, linux, mw, linuxarm, yisen.zhuang,
	salil.mehta, thomas.petazzoni, hawk, ilias.apalodimas, ast,
	daniel, john.fastabend, akpm, peterz, will, willy, vbabka,
	fenghua.yu, guro, peterx, feng.tang, jgg, mcroce, hughd,
	jonathan.lemon, alobakin, willemb, wenxu, cong.wang, haokexin,
	nogikh, elver, yhs, kpsingh, andrii, kafai, songliubraving,
	netdev, linux-kernel, bpf, chenhao288, Linux-MM

On 2021/8/10 22:58, Jesper Dangaard Brouer wrote:
> 
> 
> On 06/08/2021 04.46, Yunsheng Lin wrote:
>> For 32 bit systems with 64 bit dma, dma_addr[1] is used to
>> store the upper 32 bit dma addr, those system should be rare
>> those days.
>>
>> For normal system, the dma_addr[1] in 'struct page' is not
>> used, so we can reuse dma_addr[1] for storing frag count,
>> which means how many frags this page might be splited to.
>>
>> In order to simplify the page frag support in the page pool,
>> the PAGE_POOL_DMA_USE_PP_FRAG_COUNT macro is added to indicate
>> the 32 bit systems with 64 bit dma, and the page frag support
>> in page pool is disabled for such system.
>>
>> The newly added page_pool_set_frag_count() is called to reserve
>> the maximum frag count before any page frag is passed to the
>> user. The page_pool_atomic_sub_frag_count_return() is called
>> when user is done with the page frag.
>>
>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
>> ---
>>   include/linux/mm_types.h | 18 +++++++++++++-----
>>   include/net/page_pool.h  | 46 +++++++++++++++++++++++++++++++++++++++-------
>>   net/core/page_pool.c     |  4 ++++
>>   3 files changed, 56 insertions(+), 12 deletions(-)
>>
>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>> index 52bbd2b..7f8ee09 100644
>> --- a/include/linux/mm_types.h
>> +++ b/include/linux/mm_types.h
>> @@ -103,11 +103,19 @@ struct page {
>>               unsigned long pp_magic;
>>               struct page_pool *pp;
>>               unsigned long _pp_mapping_pad;
>> -            /**
>> -             * @dma_addr: might require a 64-bit value on
>> -             * 32-bit architectures.
>> -             */
>> -            unsigned long dma_addr[2];
>> +            unsigned long dma_addr;
>> +            union {
>> +                /**
>> +                 * dma_addr_upper: might require a 64-bit
>> +                 * value on 32-bit architectures.
>> +                 */
>> +                unsigned long dma_addr_upper;
>> +                /**
>> +                 * For frag page support, not supported in
>> +                 * 32-bit architectures with 64-bit DMA.
>> +                 */
>> +                atomic_long_t pp_frag_count;
>> +            };
>>           };
>>           struct {    /* slab, slob and slub */
>>               union {
>> diff --git a/include/net/page_pool.h b/include/net/page_pool.h
>> index 8d7744d..42e6997 100644
>> --- a/include/net/page_pool.h
>> +++ b/include/net/page_pool.h
>> @@ -45,7 +45,10 @@
>>                       * Please note DMA-sync-for-CPU is still
>>                       * device driver responsibility
>>                       */
>> -#define PP_FLAG_ALL        (PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV)
>> +#define PP_FLAG_PAGE_FRAG    BIT(2) /* for page frag feature */
>> +#define PP_FLAG_ALL        (PP_FLAG_DMA_MAP |\
>> +                 PP_FLAG_DMA_SYNC_DEV |\
>> +                 PP_FLAG_PAGE_FRAG)
>>     /*
>>    * Fast allocation side cache array/stack
>> @@ -198,19 +201,48 @@ static inline void page_pool_recycle_direct(struct page_pool *pool,
>>       page_pool_put_full_page(pool, page, true);
>>   }
>>   +#define PAGE_POOL_DMA_USE_PP_FRAG_COUNT    \
>> +        (sizeof(dma_addr_t) > sizeof(unsigned long))
>> +
>>   static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
>>   {
>> -    dma_addr_t ret = page->dma_addr[0];
>> -    if (sizeof(dma_addr_t) > sizeof(unsigned long))
>> -        ret |= (dma_addr_t)page->dma_addr[1] << 16 << 16;
>> +    dma_addr_t ret = page->dma_addr;
>> +
>> +    if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
>> +        ret |= (dma_addr_t)page->dma_addr_upper << 16 << 16;
> 
> I find the macro name confusing.
> 
> I think it would be easier to read the code, if it was called:
>  PAGE_POOL_DMA_CANNOT_USE_PP_FRAG_COUNT

Actually, there is a *DMA* in the above macro, which means the DMA
address uses the PP_FRAG_COUNT field.
Perhaps PAGE_POOL_DMA_ADDR_UPPER_USE_PP_FRAG_COUNT is more obvious
here?

> 
>> +
>>       return ret;
>>   }
>>     static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
>>   {
>> -    page->dma_addr[0] = addr;
>> -    if (sizeof(dma_addr_t) > sizeof(unsigned long))
>> -        page->dma_addr[1] = upper_32_bits(addr);
>> +    page->dma_addr = addr;
>> +    if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
>> +        page->dma_addr_upper = upper_32_bits(addr);
>> +}
>> +
>> +static inline void page_pool_set_frag_count(struct page *page, long nr)
>> +{
>> +    atomic_long_set(&page->pp_frag_count, nr);
>> +}
>> +
>> +static inline long page_pool_atomic_sub_frag_count_return(struct page *page,
>> +                              long nr)
>> +{
>> +    long ret;
>> +
>> +    /* As suggested by Alexander, atomic_long_read() may cover up the
>> +     * reference count errors, so avoid calling atomic_long_read() in
>> +     * the cases of freeing or draining the page_frags, where we would
>> +     * not expect it to match or that are slowpath anyway.
>> +     */
>> +    if (__builtin_constant_p(nr) &&
>> +        atomic_long_read(&page->pp_frag_count) == nr)
>> +        return 0;
>> +
>> +    ret = atomic_long_sub_return(nr, &page->pp_frag_count);
>> +    WARN_ON(ret < 0);
>> +    return ret;
>>   }
>>     static inline bool is_page_pool_compiled_in(void)
>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
>> index 78838c6..68fab94 100644
>> --- a/net/core/page_pool.c
>> +++ b/net/core/page_pool.c
>> @@ -67,6 +67,10 @@ static int page_pool_init(struct page_pool *pool,
>>            */
>>       }
>>   +    if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT &&
>> +        pool->p.flags & PP_FLAG_PAGE_FRAG)
>> +        return -EINVAL;
> 
> I read this as: if the page_pool uses pp_frag_count and has the flag set, then it is invalid/not allowed, which seems wrong.
> 
> I find this code more intuitive to read:
> 
>  +    if (PAGE_POOL_DMA_CANNOT_USE_PP_FRAG_COUNT &&
>  +        pool->p.flags & PP_FLAG_PAGE_FRAG)
>  +        return -EINVAL;
> 
> --Jesper
> 
> .
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Linuxarm] Re: [PATCH net-next v2 0/4] add frag page support in page pool
  2021-08-10 15:09       ` Alexander Duyck
@ 2021-08-11  1:06         ` Yunsheng Lin
  0 siblings, 0 replies; 19+ messages in thread
From: Yunsheng Lin @ 2021-08-11  1:06 UTC (permalink / raw)
  To: Alexander Duyck, Jakub Kicinski
  Cc: Jesper Dangaard Brouer, Jesper Dangaard Brouer, David Miller,
	Russell King - ARM Linux, Marcin Wojtas, linuxarm, yisen.zhuang,
	Salil Mehta, thomas.petazzoni, hawk, Ilias Apalodimas,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrew Morton, Peter Zijlstra, Will Deacon, Matthew Wilcox,
	Vlastimil Babka, fenghua.yu, guro, Peter Xu, Feng Tang,
	Jason Gunthorpe, Matteo Croce, Hugh Dickins, Jonathan Lemon,
	Alexander Lobakin, Willem de Bruijn, wenxu, Cong Wang, Kevin Hao,
	nogikh, Marco Elver, Yonghong Song, kpsingh, andrii,
	Martin KaFai Lau, songliubraving, Netdev, LKML, bpf, chenhao288,
	Linux-MM

On 2021/8/10 23:09, Alexander Duyck wrote:
> On Tue, Aug 10, 2021 at 7:43 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> On Tue, 10 Aug 2021 16:23:52 +0200 Jesper Dangaard Brouer wrote:
>>> On 10/08/2021 16.01, Jakub Kicinski wrote:
>>>> On Fri, 6 Aug 2021 10:46:18 +0800 Yunsheng Lin wrote:
>>>>> enable skb's page frag recycling based on page pool in
>>>>> hns3 drvier.
>>>>
>>>> Applied, thanks!
>>>
>>> I had hoped to see more acks / reviewed-by before this got applied.
>>> E.g. from MM-people as this patchset changes struct page and page_pool
>>> (that I'm marked as maintainer of).
>>
>> Sorry, it was on the list for days and there were 7 or so prior
>> versions, I thought it was ripe. If possible, a note that review
>> will come would be useful.
>>
>>> And I would have appreciated a reviewed-by credit to/from Alexander
>>> as he did a lot of work in the RFC patchset for the split-page tricks.

Yeah, the credit goes to Ilias, Matteo and Matthew too; their patchset
paved the way for supporting the skb frag page recycling.

>>
>> I asked him off-list, he said something I interpreted as "code is okay,
>> but the review tag is not coming".
> 
> Yeah, I ran out of feedback a revision or two ago and just haven't had
> a chance to go through and add my reviewed by. If you want feel free
> to add my reviewed by for the set.
> 
> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>

Yeah, thanks for your time and patience in reviewing this patchset.

By the way, I am still trying to implement the tx recycling mentioned
in the other thread, which seems more controversial than rx recycling
as tx recycling may touch the tcp/ip and socket layer. So it would be
good to have your opinion about that idea or implementation too :)

> _______________________________________________
> Linuxarm mailing list -- linuxarm@openeuler.org
> To unsubscribe send an email to linuxarm-leave@openeuler.org
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2 2/4] page_pool: add interface to manipulate frag count in page pool
  2021-08-06  2:46 ` [PATCH net-next v2 2/4] page_pool: add interface to manipulate frag count in page pool Yunsheng Lin
  2021-08-10 14:58   ` Jesper Dangaard Brouer
@ 2021-08-12 15:17   ` Jesper Dangaard Brouer
  1 sibling, 0 replies; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2021-08-12 15:17 UTC (permalink / raw)
  To: Yunsheng Lin, davem, kuba
  Cc: brouer, alexander.duyck, linux, mw, linuxarm, yisen.zhuang,
	salil.mehta, thomas.petazzoni, hawk, ilias.apalodimas, ast,
	daniel, john.fastabend, akpm, peterz, will, willy, vbabka,
	fenghua.yu, guro, peterx, feng.tang, jgg, mcroce, hughd,
	jonathan.lemon, alobakin, willemb, wenxu, cong.wang, haokexin,
	nogikh, elver, yhs, kpsingh, andrii, kafai, songliubraving,
	netdev, linux-kernel, bpf, chenhao288


On 06/08/2021 04.46, Yunsheng Lin wrote:
> +static inline long page_pool_atomic_sub_frag_count_return(struct page *page,
> +							  long nr)
> +{
> +	long ret;
> +
> +	/* As suggested by Alexander, atomic_long_read() may cover up the
> +	 * reference count errors, so avoid calling atomic_long_read() in
> +	 * the cases of freeing or draining the page_frags, where we would
> +	 * not expect it to match or that are slowpath anyway.
> +	 */
> +	if (__builtin_constant_p(nr) &&
> +	    atomic_long_read(&page->pp_frag_count) == nr)
> +		return 0;
> +
> +	ret = atomic_long_sub_return(nr, &page->pp_frag_count);
> +	WARN_ON(ret < 0);

I was worried about this WARN_ON() as it generates a 'ud2' instruction,
which influences I-cache fetching.  But I have disassembled (objdump) the
page_pool.o binary and the ud2 gets placed last in the main function
page_pool_put_page() that uses this inlined function.
Thus, I assume this is not a problem :-)


> +	return ret;
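
For reference, the counting scheme in the quoted helper can be modelled in userspace with C11 atomics. This is only a sketch: frag_page_model and the model_* functions are hypothetical names, atomic_fetch_sub() stands in for atomic_long_sub_return(), and a message on stderr stands in for WARN_ON(). The __builtin_constant_p() and __builtin_expect() builtins assume GCC or Clang.

```c
#include <stdatomic.h>
#include <stdio.h>

struct frag_page_model {
	atomic_long pp_frag_count;
};

/* Reserve the maximum number of frags before any frag is handed out. */
static inline void model_set_frag_count(struct frag_page_model *page, long nr)
{
	atomic_store(&page->pp_frag_count, nr);
}

/* Drop nr references and return how many remain. */
static inline long model_sub_frag_count_return(struct frag_page_model *page,
					       long nr)
{
	long ret;

	/* Fast path, only taken when the compiler can prove nr is a
	 * constant after inlining: if the caller holds every remaining
	 * reference, skip the atomic read-modify-write.
	 */
	if (__builtin_constant_p(nr) &&
	    atomic_load(&page->pp_frag_count) == nr)
		return 0;

	/* atomic_fetch_sub() returns the old value, so subtract nr again
	 * to get the new count.
	 */
	ret = atomic_fetch_sub(&page->pp_frag_count, nr) - nr;

	/* Cold path: marked unlikely so the compiler can move it out of
	 * the hot instruction stream.
	 */
	if (__builtin_expect(ret < 0, 0))
		fprintf(stderr, "frag count underflow: %ld\n", ret);

	return ret;
}
```

A return value of 0 means the caller dropped the last reference and owns the whole page again. Whether the fast path fires depends on optimisation level, so callers should rely on the return value only and not on the stored count afterwards.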


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2 4/4] net: hns3: support skb's frag page recycling based on page pool
  2021-08-06  2:46 ` [PATCH net-next v2 4/4] net: hns3: support skb's frag page recycling based on " Yunsheng Lin
@ 2021-09-08  8:31   ` moyufeng
  2021-09-08 15:08     ` Jakub Kicinski
  0 siblings, 1 reply; 19+ messages in thread
From: moyufeng @ 2021-09-08  8:31 UTC (permalink / raw)
  To: Yunsheng Lin, davem, kuba
  Cc: alexander.duyck, linux, mw, linuxarm, yisen.zhuang, salil.mehta,
	thomas.petazzoni, hawk, ilias.apalodimas, ast, daniel,
	john.fastabend, akpm, peterz, will, willy, vbabka, fenghua.yu,
	guro, peterx, feng.tang, jgg, mcroce, hughd, jonathan.lemon,
	alobakin, willemb, wenxu, cong.wang, haokexin, nogikh, elver,
	yhs, kpsingh, andrii, kafai, songliubraving, netdev,
	linux-kernel, bpf, chenhao288, moyufeng

Hi Jakub

    After adding page pool support to the hns3 packet receive path,
we want to add some debug info, such as:

1. The counts of pages allocated and freed by the page pool, which
are kept in pages_state_hold_cnt and pages_state_release_cnt in the
page pool framework.

2. pool size, order, nid, dev and max_len, which are set for
each rx ring in the hns3 driver.

In this regard, we are considering two ways to show this info:

1. Add it to the queue statistics and query it with ethtool -S.

2. Add a file node "page_pool_info" to debugfs; reading it with cat
would print something like:

queue_id  allocate_cnt  free_cnt  pool_size  order  nid  dev  max_len
000		   xxx       xxx        xxx    xxx  xxx  xxx      xxx
001
002
.
.
	
Which one is more acceptable, or would you have some other suggestion?

Thanks
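
As a rough illustration of the second option, the sketch below formats one row of such a table in userspace. It assumes both counters are free-running 32-bit values, so the number of pages still outstanding is taken as a signed 32-bit difference that stays correct across wraparound. The struct, the function names and the column format are hypothetical, and the dev column is left out to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-queue snapshot of the values listed above. */
struct pp_queue_info_model {
	unsigned int queue_id;
	uint32_t allocate_cnt; /* modelled on pages_state_hold_cnt */
	uint32_t free_cnt;     /* modelled on pages_state_release_cnt */
	unsigned int pool_size;
	unsigned int order;
	int nid;
	unsigned int max_len;
};

/* Pages handed out and not yet returned. The unsigned subtraction wraps
 * modulo 2^32, so the result is still right after the counters overflow.
 */
static int32_t model_inflight(uint32_t allocate_cnt, uint32_t free_cnt)
{
	return (int32_t)(allocate_cnt - free_cnt);
}

/* Format one row: queue_id allocate_cnt free_cnt pool_size order nid max_len */
static int model_format_row(char *buf, size_t len,
			    const struct pp_queue_info_model *q)
{
	return snprintf(buf, len, "%03u %u %u %u %u %d %u",
			q->queue_id, (unsigned int)q->allocate_cnt,
			(unsigned int)q->free_cnt, q->pool_size,
			q->order, q->nid, q->max_len);
}
```

In a real debugfs file the row would presumably be emitted through the seq_file interface instead of snprintf(); that part is not modelled here.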


On 2021/8/6 10:46, Yunsheng Lin wrote:
> This patch adds skb's frag page recycling support based on
> the frag page support in page pool.
> 
> The performance improves above 10~20% for single thread iperf
> TCP flow with IOMMU disabled when iperf server and irq/NAPI
> have a different CPU.
> 
> The performance improves about 135%(14Gbit to 33Gbit) for single
> thread iperf TCP flow when IOMMU is in strict mode and iperf
> server shares the same cpu with irq/NAPI.
> 
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> ---
>  drivers/net/ethernet/hisilicon/Kconfig          |  1 +
>  drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 79 +++++++++++++++++++++++--
>  drivers/net/ethernet/hisilicon/hns3/hns3_enet.h |  3 +
>  3 files changed, 78 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/hisilicon/Kconfig b/drivers/net/ethernet/hisilicon/Kconfig
> index 094e4a3..2ba0e7b 100644
> --- a/drivers/net/ethernet/hisilicon/Kconfig
> +++ b/drivers/net/ethernet/hisilicon/Kconfig
> @@ -91,6 +91,7 @@ config HNS3
>  	tristate "Hisilicon Network Subsystem Support HNS3 (Framework)"
>  	depends on PCI
>  	select NET_DEVLINK
> +	select PAGE_POOL
>  	help
>  	  This selects the framework support for Hisilicon Network Subsystem 3.
>  	  This layer facilitates clients like ENET, RoCE and user-space ethernet
> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
> index cb8d5da..fcbeb1f 100644
> --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
> @@ -3205,6 +3205,21 @@ static int hns3_alloc_buffer(struct hns3_enet_ring *ring,
>  	unsigned int order = hns3_page_order(ring);
>  	struct page *p;
>  
> +	if (ring->page_pool) {
> +		p = page_pool_dev_alloc_frag(ring->page_pool,
> +					     &cb->page_offset,
> +					     hns3_buf_size(ring));
> +		if (unlikely(!p))
> +			return -ENOMEM;
> +
> +		cb->priv = p;
> +		cb->buf = page_address(p);
> +		cb->dma = page_pool_get_dma_addr(p);
> +		cb->type = DESC_TYPE_PP_FRAG;
> +		cb->reuse_flag = 0;
> +		return 0;
> +	}
> +
>  	p = dev_alloc_pages(order);
>  	if (!p)
>  		return -ENOMEM;
> @@ -3227,8 +3242,13 @@ static void hns3_free_buffer(struct hns3_enet_ring *ring,
>  	if (cb->type & (DESC_TYPE_SKB | DESC_TYPE_BOUNCE_HEAD |
>  			DESC_TYPE_BOUNCE_ALL | DESC_TYPE_SGL_SKB))
>  		napi_consume_skb(cb->priv, budget);
> -	else if (!HNAE3_IS_TX_RING(ring) && cb->pagecnt_bias)
> -		__page_frag_cache_drain(cb->priv, cb->pagecnt_bias);
> +	else if (!HNAE3_IS_TX_RING(ring)) {
> +		if (cb->type & DESC_TYPE_PAGE && cb->pagecnt_bias)
> +			__page_frag_cache_drain(cb->priv, cb->pagecnt_bias);
> +		else if (cb->type & DESC_TYPE_PP_FRAG)
> +			page_pool_put_full_page(ring->page_pool, cb->priv,
> +						false);
> +	}
>  	memset(cb, 0, sizeof(*cb));
>  }
>  
> @@ -3315,7 +3335,7 @@ static int hns3_alloc_and_map_buffer(struct hns3_enet_ring *ring,
>  	int ret;
>  
>  	ret = hns3_alloc_buffer(ring, cb);
> -	if (ret)
> +	if (ret || ring->page_pool)
>  		goto out;
>  
>  	ret = hns3_map_buffer(ring, cb);
> @@ -3337,7 +3357,8 @@ static int hns3_alloc_and_attach_buffer(struct hns3_enet_ring *ring, int i)
>  	if (ret)
>  		return ret;
>  
> -	ring->desc[i].addr = cpu_to_le64(ring->desc_cb[i].dma);
> +	ring->desc[i].addr = cpu_to_le64(ring->desc_cb[i].dma +
> +					 ring->desc_cb[i].page_offset);
>  
>  	return 0;
>  }
> @@ -3367,7 +3388,8 @@ static void hns3_replace_buffer(struct hns3_enet_ring *ring, int i,
>  {
>  	hns3_unmap_buffer(ring, &ring->desc_cb[i]);
>  	ring->desc_cb[i] = *res_cb;
> -	ring->desc[i].addr = cpu_to_le64(ring->desc_cb[i].dma);
> +	ring->desc[i].addr = cpu_to_le64(ring->desc_cb[i].dma +
> +					 ring->desc_cb[i].page_offset);
>  	ring->desc[i].rx.bd_base_info = 0;
>  }
>  
> @@ -3539,6 +3561,12 @@ static void hns3_nic_reuse_page(struct sk_buff *skb, int i,
>  	u32 frag_size = size - pull_len;
>  	bool reused;
>  
> +	if (ring->page_pool) {
> +		skb_add_rx_frag(skb, i, desc_cb->priv, frag_offset,
> +				frag_size, truesize);
> +		return;
> +	}
> +
>  	/* Avoid re-using remote or pfmem page */
>  	if (unlikely(!dev_page_is_reusable(desc_cb->priv)))
>  		goto out;
> @@ -3856,6 +3884,9 @@ static int hns3_alloc_skb(struct hns3_enet_ring *ring, unsigned int length,
>  		/* We can reuse buffer as-is, just make sure it is reusable */
>  		if (dev_page_is_reusable(desc_cb->priv))
>  			desc_cb->reuse_flag = 1;
> +		else if (desc_cb->type & DESC_TYPE_PP_FRAG)
> +			page_pool_put_full_page(ring->page_pool, desc_cb->priv,
> +						false);
>  		else /* This page cannot be reused so discard it */
>  			__page_frag_cache_drain(desc_cb->priv,
>  						desc_cb->pagecnt_bias);
> @@ -3863,6 +3894,10 @@ static int hns3_alloc_skb(struct hns3_enet_ring *ring, unsigned int length,
>  		hns3_rx_ring_move_fw(ring);
>  		return 0;
>  	}
> +
> +	if (ring->page_pool)
> +		skb_mark_for_recycle(skb);
> +
>  	u64_stats_update_begin(&ring->syncp);
>  	ring->stats.seg_pkt_cnt++;
>  	u64_stats_update_end(&ring->syncp);
> @@ -3901,6 +3936,10 @@ static int hns3_add_frag(struct hns3_enet_ring *ring)
>  					    "alloc rx fraglist skb fail\n");
>  				return -ENXIO;
>  			}
> +
> +			if (ring->page_pool)
> +				skb_mark_for_recycle(new_skb);
> +
>  			ring->frag_num = 0;
>  
>  			if (ring->tail_skb) {
> @@ -4705,6 +4744,29 @@ static void hns3_put_ring_config(struct hns3_nic_priv *priv)
>  	priv->ring = NULL;
>  }
>  
> +static void hns3_alloc_page_pool(struct hns3_enet_ring *ring)
> +{
> +	struct page_pool_params pp_params = {
> +		.flags = PP_FLAG_DMA_MAP | PP_FLAG_PAGE_FRAG |
> +				PP_FLAG_DMA_SYNC_DEV,
> +		.order = hns3_page_order(ring),
> +		.pool_size = ring->desc_num * hns3_buf_size(ring) /
> +				(PAGE_SIZE << hns3_page_order(ring)),
> +		.nid = dev_to_node(ring_to_dev(ring)),
> +		.dev = ring_to_dev(ring),
> +		.dma_dir = DMA_FROM_DEVICE,
> +		.offset = 0,
> +		.max_len = PAGE_SIZE << hns3_page_order(ring),
> +	};
> +
> +	ring->page_pool = page_pool_create(&pp_params);
> +	if (IS_ERR(ring->page_pool)) {
> +		dev_warn(ring_to_dev(ring), "page pool creation failed: %ld\n",
> +			 PTR_ERR(ring->page_pool));
> +		ring->page_pool = NULL;
> +	}
> +}
> +
>  static int hns3_alloc_ring_memory(struct hns3_enet_ring *ring)
>  {
>  	int ret;
> @@ -4724,6 +4786,8 @@ static int hns3_alloc_ring_memory(struct hns3_enet_ring *ring)
>  		goto out_with_desc_cb;
>  
>  	if (!HNAE3_IS_TX_RING(ring)) {
> +		hns3_alloc_page_pool(ring);
> +
>  		ret = hns3_alloc_ring_buffers(ring);
>  		if (ret)
>  			goto out_with_desc;
> @@ -4764,6 +4828,11 @@ void hns3_fini_ring(struct hns3_enet_ring *ring)
>  		devm_kfree(ring_to_dev(ring), tx_spare);
>  		ring->tx_spare = NULL;
>  	}
> +
> +	if (!HNAE3_IS_TX_RING(ring) && ring->page_pool) {
> +		page_pool_destroy(ring->page_pool);
> +		ring->page_pool = NULL;
> +	}
>  }
>  
>  static int hns3_buf_size2type(u32 buf_size)
> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
> index 15af3d9..27809d6 100644
> --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
> @@ -6,6 +6,7 @@
>  
>  #include <linux/dim.h>
>  #include <linux/if_vlan.h>
> +#include <net/page_pool.h>
>  
>  #include "hnae3.h"
>  
> @@ -307,6 +308,7 @@ enum hns3_desc_type {
>  	DESC_TYPE_BOUNCE_ALL		= 1 << 3,
>  	DESC_TYPE_BOUNCE_HEAD		= 1 << 4,
>  	DESC_TYPE_SGL_SKB		= 1 << 5,
> +	DESC_TYPE_PP_FRAG		= 1 << 6,
>  };
>  
>  struct hns3_desc_cb {
> @@ -451,6 +453,7 @@ struct hns3_enet_ring {
>  	struct hnae3_queue *tqp;
>  	int queue_index;
>  	struct device *dev; /* will be used for DMA mapping of descriptors */
> +	struct page_pool *page_pool;
>  
>  	/* statistic */
>  	struct ring_stats stats;
> 
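
The pool_size expression in hns3_alloc_page_pool() above is plain integer arithmetic: the total bytes of rx buffers in the ring, divided by the bytes in one pool page. A minimal sketch of that calculation follows. The helper name pp_pool_size() and the sample values in the comment are assumptions for illustration only; they are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper mirroring the pool_size expression from the patch:
 *   desc_num * buf_size / (PAGE_SIZE << order)
 * page_size is passed in explicitly so the sketch builds in userspace.
 */
static size_t pp_pool_size(size_t desc_num, size_t buf_size,
			   size_t page_size, unsigned int order)
{
	size_t pool_page_bytes = page_size << order;

	assert(pool_page_bytes != 0);

	/* Integer division truncates, as it does in the driver code. */
	return desc_num * buf_size / pool_page_bytes;
}

/*
 * Assumed sample values: 1024 descriptors with 2048-byte buffers on
 * 4096-byte order-0 pages give 1024 * 2048 / 4096 = 512 pool pages.
 */
```

With these assumed values, two 2048-byte buffers share each 4096-byte page, so the pool needs half as many pages as the ring has descriptors.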


* Re: [PATCH net-next v2 4/4] net: hns3: support skb's frag page recycling based on page pool
  2021-09-08  8:31   ` moyufeng
@ 2021-09-08 15:08     ` Jakub Kicinski
  2021-09-08 15:26       ` Ilias Apalodimas
  0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2021-09-08 15:08 UTC (permalink / raw)
  To: moyufeng
  Cc: Yunsheng Lin, davem, alexander.duyck, linux, mw, linuxarm,
	yisen.zhuang, salil.mehta, thomas.petazzoni, hawk,
	ilias.apalodimas, ast, daniel, john.fastabend, akpm, peterz,
	will, willy, vbabka, fenghua.yu, guro, peterx, feng.tang, jgg,
	mcroce, hughd, jonathan.lemon, alobakin, willemb, wenxu,
	cong.wang, haokexin, nogikh, elver, yhs, kpsingh, andrii, kafai,
	songliubraving, netdev, linux-kernel, bpf, chenhao288

On Wed, 8 Sep 2021 16:31:40 +0800 moyufeng wrote:
>     After adding page pool to the hns3 packet receive path,
> we want to add some debug info, such as the following:
> 
> 1. The counts of pages allocated and freed by the page pool, which
> are tracked as pages_state_hold_cnt and pages_state_release_cnt in
> the page pool framework.
> 
> 2. pool size, order, nid, dev and max_len, which are set for
> each rx ring in the hns3 driver.
> 
> In this regard, we consider two ways to show this info:
> 
> 1. Add it to the queue statistics and query it with ethtool -S.
> 
> 2. Add a file node "page_pool_info" to debugfs; reading this
> file node with cat prints the following:
> 
> queue_id  allocate_cnt  free_cnt  pool_size  order  nid  dev  max_len
> 000                xxx       xxx        xxx    xxx  xxx  xxx      xxx
> 001
> 002
> .
> .
> 	
> Which one is more acceptable, or would you have some other suggestion?

Normally I'd say put the stats in ethtool -S and the rest in debugfs
but I'm not sure about exposing pages_state_hold_cnt and
pages_state_release_cnt directly. Those are short counters, and will
very likely wrap. They are primarily meaningful for calculating
page_pool_inflight(). Given this I think their semantics may be too
confusing for an average ethtool -S user.

Putting all the information in debugfs seems like a better idea.
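
The point about wrapping counters can be illustrated with a small userspace sketch. It assumes both counters are 32 bits wide and that the in-flight count is their wrap-safe signed difference, which is how page_pool_inflight() is understood to work. The struct and function names below are hypothetical stand-ins, not the page_pool API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two page_pool counters discussed above. */
struct pp_counters {
	uint32_t hold_cnt;	/* incremented when the pool takes a page */
	uint32_t release_cnt;	/* incremented when the pool gives one back */
};

/*
 * Wrap-safe distance: unsigned subtraction is modulo 2^32, so the
 * difference stays correct after either counter wraps, as long as fewer
 * than 2^31 pages are in flight.
 */
static int32_t pp_inflight(const struct pp_counters *c)
{
	return (int32_t)(c->hold_cnt - c->release_cnt);
}

/* Example: hold_cnt has wrapped past zero, yet the difference is still 5. */
static void pp_inflight_example(void)
{
	struct pp_counters c = { .hold_cnt = 3, .release_cnt = 0xFFFFFFFEu };

	assert(pp_inflight(&c) == 5);
}
```

Read on their own, the raw values 3 and 0xFFFFFFFE say little to an ethtool -S user; only the difference carries meaning.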


* Re: [PATCH net-next v2 4/4] net: hns3: support skb's frag page recycling based on page pool
  2021-09-08 15:08     ` Jakub Kicinski
@ 2021-09-08 15:26       ` Ilias Apalodimas
  2021-09-08 15:57         ` Jakub Kicinski
  0 siblings, 1 reply; 19+ messages in thread
From: Ilias Apalodimas @ 2021-09-08 15:26 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: moyufeng, Yunsheng Lin, davem, alexander.duyck, linux, mw,
	linuxarm, yisen.zhuang, salil.mehta, thomas.petazzoni, hawk, ast,
	daniel, john.fastabend, akpm, peterz, will, willy, vbabka,
	fenghua.yu, guro, peterx, feng.tang, jgg, mcroce, hughd,
	jonathan.lemon, alobakin, willemb, wenxu, cong.wang, haokexin,
	nogikh, elver, yhs, kpsingh, andrii, kafai, songliubraving,
	netdev, linux-kernel, bpf, chenhao288

Hi Jakub,

On Wed, Sep 08, 2021 at 08:08:43AM -0700, Jakub Kicinski wrote:
> On Wed, 8 Sep 2021 16:31:40 +0800 moyufeng wrote:
> >     After adding page pool to the hns3 packet receive path,
> > we want to add some debug info, such as the following:
> > 
> > 1. The counts of pages allocated and freed by the page pool, which
> > are tracked as pages_state_hold_cnt and pages_state_release_cnt in
> > the page pool framework.
> > 
> > 2. pool size, order, nid, dev and max_len, which are set for
> > each rx ring in the hns3 driver.
> > 
> > In this regard, we consider two ways to show this info:
> > 
> > 1. Add it to the queue statistics and query it with ethtool -S.
> > 
> > 2. Add a file node "page_pool_info" to debugfs; reading this
> > file node with cat prints the following:
> > 
> > queue_id  allocate_cnt  free_cnt  pool_size  order  nid  dev  max_len
> > 000                xxx       xxx        xxx    xxx  xxx  xxx      xxx
> > 001
> > 002
> > .
> > .
> > 	
> > Which one is more acceptable, or would you have some other suggestion?
> 
> Normally I'd say put the stats in ethtool -S and the rest in debugfs
> but I'm not sure about exposing pages_state_hold_cnt and
> pages_state_release_cnt directly. Those are short counters, and will
> very likely wrap. They are primarily meaningful for calculating
> page_pool_inflight(). Given this I think their semantics may be too
> confusing for an average ethtool -S user.
> 
> Putting all the information in debugfs seems like a better idea.

I can't really disagree on the aforementioned stats being confusing.
However at some point we'll want to add more useful page_pool stats (e.g. the
percentage of the page/page fragments that are hitting the recycling path).
Would it still be 'ok' to have info split across ethtool and debugfs?

Regards
/Ilias


* Re: [PATCH net-next v2 4/4] net: hns3: support skb's frag page recycling based on page pool
  2021-09-08 15:26       ` Ilias Apalodimas
@ 2021-09-08 15:57         ` Jakub Kicinski
  2021-09-08 16:47           ` Jesper Dangaard Brouer
  2021-09-08 16:51           ` Ilias Apalodimas
  0 siblings, 2 replies; 19+ messages in thread
From: Jakub Kicinski @ 2021-09-08 15:57 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: moyufeng, Yunsheng Lin, davem, alexander.duyck, linux, mw,
	linuxarm, yisen.zhuang, salil.mehta, thomas.petazzoni, hawk, ast,
	daniel, john.fastabend, akpm, peterz, will, willy, vbabka,
	fenghua.yu, guro, peterx, feng.tang, jgg, mcroce, hughd,
	jonathan.lemon, alobakin, willemb, wenxu, cong.wang, haokexin,
	nogikh, elver, yhs, kpsingh, andrii, kafai, songliubraving,
	netdev, linux-kernel, bpf, chenhao288

On Wed, 8 Sep 2021 18:26:35 +0300 Ilias Apalodimas wrote:
> > Normally I'd say put the stats in ethtool -S and the rest in debugfs
> > but I'm not sure about exposing pages_state_hold_cnt and
> > pages_state_release_cnt directly. Those are short counters, and will
> > very likely wrap. They are primarily meaningful for calculating
> > page_pool_inflight(). Given this I think their semantics may be too
> > confusing for an average ethtool -S user.
> > 
> > Putting all the information in debugfs seems like a better idea.  
> 
> I can't really disagree on the aforementioned stats being confusing.
> However at some point we'll want to add more useful page_pool stats (e.g. the
> percentage of the page/page fragments that are hitting the recycling path).
> Would it still be 'ok' to have info split across ethtool and debugfs?

Possibly. We'll also see what Alex L comes up with for XDP stats. Maybe
we can arrive at a netlink API for standard things (broken record).

You said percentage - even tho I personally don't like it - there is a
small precedent of ethtool -S containing non-counter information (IOW
not monotonically increasing event counters), e.g. some vendors rammed
PCI link quality in there. So if all else fails ethtool -S should be
fine.


* Re: [PATCH net-next v2 4/4] net: hns3: support skb's frag page recycling based on page pool
  2021-09-08 15:57         ` Jakub Kicinski
@ 2021-09-08 16:47           ` Jesper Dangaard Brouer
  2021-09-08 16:51           ` Ilias Apalodimas
  1 sibling, 0 replies; 19+ messages in thread
From: Jesper Dangaard Brouer @ 2021-09-08 16:47 UTC (permalink / raw)
  To: Jakub Kicinski, Ilias Apalodimas
  Cc: brouer, moyufeng, Yunsheng Lin, davem, alexander.duyck, linux,
	mw, linuxarm, yisen.zhuang, salil.mehta, thomas.petazzoni, hawk,
	ast, daniel, john.fastabend, akpm, peterz, will, willy, vbabka,
	fenghua.yu, guro, peterx, feng.tang, jgg, mcroce, hughd,
	jonathan.lemon, alobakin, willemb, wenxu, cong.wang, haokexin,
	nogikh, elver, yhs, kpsingh, andrii, kafai, songliubraving,
	netdev, linux-kernel, bpf, chenhao288



On 08/09/2021 17.57, Jakub Kicinski wrote:
> On Wed, 8 Sep 2021 18:26:35 +0300 Ilias Apalodimas wrote:
>>> Normally I'd say put the stats in ethtool -S and the rest in debugfs
>>> but I'm not sure about exposing pages_state_hold_cnt and
>>> pages_state_release_cnt directly. Those are short counters, and will
>>> very likely wrap. They are primarily meaningful for calculating
>>> page_pool_inflight(). Given this I think their semantics may be too
>>> confusing for an average ethtool -S user.
>>>
>>> Putting all the information in debugfs seems like a better idea.
>>
>> I can't really disagree on the aforementioned stats being confusing.
>> However at some point we'll want to add more useful page_pool stats (e.g. the
>> percentage of the page/page fragments that are hitting the recycling path).
>> Would it still be 'ok' to have info split across ethtool and debugfs?
> 
> Possibly. We'll also see what Alex L comes up with for XDP stats. Maybe
> we can arrive at a netlink API for standard things (broken record).
> 
> You said percentage - even tho I personally don't like it - there is a
> small precedent of ethtool -S containing non-counter information (IOW
> not monotonically increasing event counters), e.g. some vendors rammed
> PCI link quality in there. So if all else fails ethtool -S should be
> fine.

I agree with Ilias that we ought to add some page_pool stats,
*BUT* ONLY if this doesn't hurt performance!!!

We have explained before how this is possible, e.g. by keeping consumer
vs. producer counters on separate cache-lines (internally in the page_pool
struct, and likely per CPU for returning pages).  Then the driver's
ethtool functions can request the page_pool to fill out a driver-provided
stats area, such that the collection and aggregation of counters are not
on the fast-path.

I definitely don't want to see pages_state_hold_cnt and
pages_state_release_cnt being exposed directly.  These were carefully
designed to not hurt performance. An inflight counter can be derived in
the above ethtool-driver step and presented to userspace.
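
The scheme described above can be sketched roughly as follows. This is a userspace illustration under stated assumptions, not page_pool code. The struct names, the fixed CPU count, and the 64-byte cache line are all hypothetical. Real kernel code would use per-CPU variables and the kernel's cache-line alignment annotations instead.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define SKETCH_NR_CPUS   4	/* assumption: fixed small CPU count */
#define SKETCH_CACHELINE 64	/* assumption: 64-byte cache lines */

/* Consumer and producer counters live on separate cache lines. */
struct pp_cpu_stats {
	alignas(SKETCH_CACHELINE) uint64_t alloc_cnt;	/* consumer side */
	alignas(SKETCH_CACHELINE) uint64_t recycle_cnt;	/* producer side */
};

static_assert(sizeof(struct pp_cpu_stats) == 2 * SKETCH_CACHELINE,
	      "each counter should occupy its own cache line");

struct pp_stats_sketch {
	struct pp_cpu_stats cpu[SKETCH_NR_CPUS];
};

/* Driver-provided area that the slow path fills out. */
struct pp_stats_snapshot {
	uint64_t alloc_cnt;
	uint64_t recycle_cnt;
};

/* Fast path: touch only the local CPU's counter, no aggregation. */
static inline void pp_stat_alloc(struct pp_stats_sketch *s, unsigned int cpu)
{
	s->cpu[cpu].alloc_cnt++;
}

static inline void pp_stat_recycle(struct pp_stats_sketch *s, unsigned int cpu)
{
	s->cpu[cpu].recycle_cnt++;
}

/* Slow path (e.g. an ethtool -S request): sum the per-CPU counters. */
static void pp_get_stats(const struct pp_stats_sketch *s,
			 struct pp_stats_snapshot *out)
{
	out->alloc_cnt = 0;
	out->recycle_cnt = 0;
	for (unsigned int i = 0; i < SKETCH_NR_CPUS; i++) {
		out->alloc_cnt += s->cpu[i].alloc_cnt;
		out->recycle_cnt += s->cpu[i].recycle_cnt;
	}
}
```

The design choice is that the fast path performs a single increment on a cache line it does not share with the other side, and all summing happens only when userspace asks for the stats.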


Notice that while developing page_pool, I've been using tracepoints and
bpftrace scripts to inspect the behavior and internals of page_pool.
See [1]; I've even written a page leak detector [2].

In principle you could write a bpftrace tool that extracts stats the
same way. But I would only recommend doing this for the devel phase,
because these tracepoints do add some overhead.
Originally I wanted to push people to use this for stats, but I've
realized that not having these stats easily available is annoying ;-)

-Jesper

[1] https://github.com/xdp-project/xdp-project/tree/master/areas/mem/bpftrace
[2] https://github.com/xdp-project/xdp-project/blob/master/areas/mem/bpftrace/page_pool_track_leaks02.bt



* Re: [PATCH net-next v2 4/4] net: hns3: support skb's frag page recycling based on page pool
  2021-09-08 15:57         ` Jakub Kicinski
  2021-09-08 16:47           ` Jesper Dangaard Brouer
@ 2021-09-08 16:51           ` Ilias Apalodimas
  1 sibling, 0 replies; 19+ messages in thread
From: Ilias Apalodimas @ 2021-09-08 16:51 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: moyufeng, Yunsheng Lin, davem, alexander.duyck, linux, mw,
	linuxarm, yisen.zhuang, salil.mehta, thomas.petazzoni, hawk, ast,
	daniel, john.fastabend, akpm, peterz, will, willy, vbabka,
	fenghua.yu, guro, peterx, feng.tang, jgg, mcroce, hughd,
	jonathan.lemon, alobakin, willemb, wenxu, cong.wang, haokexin,
	nogikh, elver, yhs, kpsingh, andrii, kafai, songliubraving,
	netdev, linux-kernel, bpf, chenhao288

On Wed, Sep 08, 2021 at 08:57:23AM -0700, Jakub Kicinski wrote:
> On Wed, 8 Sep 2021 18:26:35 +0300 Ilias Apalodimas wrote:
> > > Normally I'd say put the stats in ethtool -S and the rest in debugfs
> > > but I'm not sure about exposing pages_state_hold_cnt and
> > > pages_state_release_cnt directly. Those are short counters, and will
> > > very likely wrap. They are primarily meaningful for calculating
> > > page_pool_inflight(). Given this I think their semantics may be too
> > > confusing for an average ethtool -S user.
> > > 
> > > Putting all the information in debugfs seems like a better idea.  
> > 
> > I can't really disagree on the aforementioned stats being confusing.
> > However at some point we'll want to add more useful page_pool stats (e.g. the
> > percentage of the page/page fragments that are hitting the recycling path).
> > Would it still be 'ok' to have info split across ethtool and debugfs?
> 
> Possibly. We'll also see what Alex L comes up with for XDP stats. Maybe
> we can arrive at a netlink API for standard things (broken record).
> 
> You said percentage - even tho I personally don't like it - there is a
> small precedent of ethtool -S containing non-counter information (IOW
> not monotonically increasing event counters), e.g. some vendors rammed
> PCI link quality in there. So if all else fails ethtool -S should be
> fine.

Yeah, percentage may have been the wrong example. I agree that having
absolute numbers (all allocated pages and recycled pages) is a better
option.  To be honest, keeping the 'weird' stats in debugfs seems sane; the
pages_state_hold_cnt/pages_state_release_cnt are only going to be needed
during debug.


Thanks
/Ilias
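
If the kernel exports absolute counters, as suggested above, a percentage can still be derived in userspace from two samples. A minimal sketch follows. The struct, its field names, and the assumption that recycled pages are a subset of allocated pages are hypothetical, chosen only to illustrate the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sample of two monotonically increasing counters. */
struct pp_sample {
	uint64_t allocated;	/* all pages handed out by the pool */
	uint64_t recycled;	/* of those, pages served via recycling */
};

/*
 * Percentage of allocations served from the recycling path between two
 * samples. Returns 0 when nothing was allocated in the interval.
 */
static unsigned int pp_recycle_pct(const struct pp_sample *prev,
				   const struct pp_sample *cur)
{
	uint64_t d_alloc = cur->allocated - prev->allocated;
	uint64_t d_rec = cur->recycled - prev->recycled;

	/* Assumption of this sketch: recycled counts a subset of allocated. */
	assert(d_rec <= d_alloc);

	if (d_alloc == 0)
		return 0;

	return (unsigned int)(d_rec * 100 / d_alloc);
}
```

Keeping the exported values as plain event counters leaves the choice of sampling interval to the tool that reads them.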


end of thread, other threads:[~2021-09-08 16:51 UTC | newest]

Thread overview: 19+ messages
2021-08-06  2:46 [PATCH net-next v2 0/4] add frag page support in page pool Yunsheng Lin
2021-08-06  2:46 ` [PATCH net-next v2 1/4] page_pool: keep pp info as long as page pool owns the page Yunsheng Lin
2021-08-06  2:46 ` [PATCH net-next v2 2/4] page_pool: add interface to manipulate frag count in page pool Yunsheng Lin
2021-08-10 14:58   ` Jesper Dangaard Brouer
2021-08-11  0:48     ` Yunsheng Lin
2021-08-12 15:17   ` Jesper Dangaard Brouer
2021-08-06  2:46 ` [PATCH net-next v2 3/4] page_pool: add frag page recycling support " Yunsheng Lin
2021-08-06  2:46 ` [PATCH net-next v2 4/4] net: hns3: support skb's frag page recycling based on " Yunsheng Lin
2021-09-08  8:31   ` moyufeng
2021-09-08 15:08     ` Jakub Kicinski
2021-09-08 15:26       ` Ilias Apalodimas
2021-09-08 15:57         ` Jakub Kicinski
2021-09-08 16:47           ` Jesper Dangaard Brouer
2021-09-08 16:51           ` Ilias Apalodimas
2021-08-10 14:01 ` [PATCH net-next v2 0/4] add frag page support in " Jakub Kicinski
2021-08-10 14:23   ` Jesper Dangaard Brouer
2021-08-10 14:43     ` Jakub Kicinski
2021-08-10 15:09       ` Alexander Duyck
2021-08-11  1:06         ` [Linuxarm] " Yunsheng Lin
