Netdev Archive on lore.kernel.org
* [PATCH net 0/3] Avoid premature Rx buffer reuse for XDP_REDIRECT
@ 2020-08-25 9:16 Björn Töpel
2020-08-25 9:16 ` [PATCH net 1/3] i40e: avoid premature Rx buffer reuse Björn Töpel
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Björn Töpel @ 2020-08-25 9:16 UTC (permalink / raw)
To: jeffrey.t.kirsher, intel-wired-lan
Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, netdev,
maciej.fijalkowski, piotr.raczynski, maciej.machnikowski,
lirongqing
Intel NICs have a page recycle mechanism. The main idea is that a page
is split into two parts. One part is owned by the driver, and the other
part might be owned by someone else, such as the stack.

The page recycle code incorrectly relied on the assumption that a page
fragment could not be freed inside xdp_do_redirect(), e.g. by a
redirect to a devmap, where the ndo_xdp_xmit() implementation transmits
and frees the frame, or to an xskmap, where the frame is copied to
userspace and freed.

As a result, page fragments that are still in use by the stack or by an
XDP redirect target can be reused and overwritten.

To avoid this, store the page count prior to invoking
xdp_do_redirect(). The affected drivers are ixgbe, ice, and i40e.
An example of how things might go wrong:
t0: Page is allocated, and put on the Rx ring
+---------------
used by NIC ->| upper buffer
(rx_buffer) +---------------
| lower buffer
+---------------
page count == USHRT_MAX
rx_buffer->pagecnt_bias == USHRT_MAX
t1: Buffer is received, and passed to the stack (e.g.)
+---------------
| upper buff (skb)
+---------------
used by NIC ->| lower buffer
(rx_buffer) +---------------
page count == USHRT_MAX
rx_buffer->pagecnt_bias == USHRT_MAX - 1
t2: Buffer is received, and redirected
+---------------
| upper buff (skb)
+---------------
used by NIC ->| lower buffer
(rx_buffer) +---------------
Now, prior to calling xdp_do_redirect():
page count == USHRT_MAX
rx_buffer->pagecnt_bias == USHRT_MAX - 2
This means that the buffer *cannot* be flipped/reused, because the skb
is still using it.
The problem arises when xdp_do_redirect() actually frees the
segment. Then we get:
page count == USHRT_MAX - 1
rx_buffer->pagecnt_bias == USHRT_MAX - 2
From a recycle perspective, the buffer can be flipped and reused,
which means that the skb data area is passed to the Rx HW ring!
To work around this, the page count is stored prior to calling
xdp_do_redirect().
Note that this is not optimal, since the NIC could actually reuse the
"lower buffer" again. However, that would require tracking whether
XDP_REDIRECT consumed the buffer or not. The scenario is very rare, and
tracking consumption status would introduce more complexity.
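
Concretely, the fix is the same in all three drivers: sample the page
count when the Rx buffer is fetched, i.e. before the XDP program (and
thus xdp_do_redirect()) can run, and feed that sample into the reuse
check. A minimal sketch of the i40e variant of the check (the full
context is in the patches below):

	/* Before: page_count() is read at put time, i.e. after
	 * xdp_do_redirect() may already have dropped a reference.
	 */
	if (unlikely((page_count(page) - pagecnt_bias) > 1))
		return false;

	/* After: rx_buffer_pgcnt was sampled in i40e_get_rx_buffer(),
	 * before the XDP program ran, so a reference dropped inside
	 * xdp_do_redirect() can no longer make the buffer appear
	 * reusable.
	 */
	if (unlikely((rx_buffer_pgcnt - pagecnt_bias) > 1))
		return false;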
A big thanks to Li RongQing from Baidu for patiently helping me
understand that there was a bug. I would have given up much
earlier! :-)
Cheers,
Björn
Björn Töpel (3):
i40e: avoid premature Rx buffer reuse
ixgbe: avoid premature Rx buffer reuse
ice: avoid premature Rx buffer reuse
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 28 ++++++++++++-----
drivers/net/ethernet/intel/ice/ice_txrx.c | 31 +++++++++++++------
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 28 ++++++++++++-----
3 files changed, 64 insertions(+), 23 deletions(-)
base-commit: 99408c422d336db32bfab5cbebc10038a70cf7d2
--
2.25.1
* [PATCH net 1/3] i40e: avoid premature Rx buffer reuse
2020-08-25 9:16 [PATCH net 0/3] Avoid premature Rx buffer reuse for XDP_REDIRECT Björn Töpel
@ 2020-08-25 9:16 ` Björn Töpel
2020-08-25 11:13 ` Maciej Fijalkowski
2020-08-25 9:16 ` [PATCH net 2/3] ixgbe: " Björn Töpel
2020-08-25 9:16 ` [PATCH net 3/3] ice: " Björn Töpel
2 siblings, 1 reply; 11+ messages in thread
From: Björn Töpel @ 2020-08-25 9:16 UTC (permalink / raw)
To: jeffrey.t.kirsher, intel-wired-lan
Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, netdev,
maciej.fijalkowski, piotr.raczynski, maciej.machnikowski,
lirongqing
From: Björn Töpel <bjorn.topel@intel.com>
The page recycle code incorrectly relied on the assumption that a page
fragment could not be freed inside xdp_do_redirect(). As a result,
page fragments that are still in use by the stack or by an XDP
redirect target can be reused and overwritten.

To avoid this, store the page count prior to invoking xdp_do_redirect().
Longer explanation:
Intel NICs have a page recycle mechanism. The main idea is that a page
is split into two parts. One part is owned by the driver, and the
other part might be owned by someone else, such as the stack.
t0: Page is allocated, and put on the Rx ring
+---------------
used by NIC ->| upper buffer
(rx_buffer) +---------------
| lower buffer
+---------------
page count == USHRT_MAX
rx_buffer->pagecnt_bias == USHRT_MAX
t1: Buffer is received, and passed to the stack (e.g.)
+---------------
| upper buff (skb)
+---------------
used by NIC ->| lower buffer
(rx_buffer) +---------------
page count == USHRT_MAX
rx_buffer->pagecnt_bias == USHRT_MAX - 1
t2: Buffer is received, and redirected
+---------------
| upper buff (skb)
+---------------
used by NIC ->| lower buffer
(rx_buffer) +---------------
Now, prior to calling xdp_do_redirect():
page count == USHRT_MAX
rx_buffer->pagecnt_bias == USHRT_MAX - 2
This means that the buffer *cannot* be flipped/reused, because the skb
is still using it.
The problem arises when xdp_do_redirect() actually frees the
segment. Then we get:
page count == USHRT_MAX - 1
rx_buffer->pagecnt_bias == USHRT_MAX - 2
From a recycle perspective, the buffer can be flipped and reused,
which means that the skb data area is passed to the Rx HW ring!
To work around this, the page count is stored prior to calling
xdp_do_redirect().
Note that this is not optimal, since the NIC could actually reuse the
"lower buffer" again. However, that would require tracking whether
XDP_REDIRECT consumed the buffer or not.
Fixes: d9314c474d4f ("i40e: add support for XDP_REDIRECT")
Reported-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 28 +++++++++++++++------
1 file changed, 21 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 3e5c566ceb01..5e446dc39190 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1873,7 +1873,8 @@ static inline bool i40e_page_is_reusable(struct page *page)
*
* In either case, if the page is reusable its refcount is increased.
**/
-static bool i40e_can_reuse_rx_page(struct i40e_rx_buffer *rx_buffer)
+static bool i40e_can_reuse_rx_page(struct i40e_rx_buffer *rx_buffer,
+ int rx_buffer_pgcnt)
{
unsigned int pagecnt_bias = rx_buffer->pagecnt_bias;
struct page *page = rx_buffer->page;
@@ -1884,7 +1885,7 @@ static bool i40e_can_reuse_rx_page(struct i40e_rx_buffer *rx_buffer)
#if (PAGE_SIZE < 8192)
/* if we are only owner of page we can reuse it */
- if (unlikely((page_count(page) - pagecnt_bias) > 1))
+ if (unlikely((rx_buffer_pgcnt - pagecnt_bias) > 1))
return false;
#else
#define I40E_LAST_OFFSET \
@@ -1939,6 +1940,15 @@ static void i40e_add_rx_frag(struct i40e_ring *rx_ring,
#endif
}
+static int i40e_rx_buffer_page_count(struct i40e_rx_buffer *rx_buffer)
+{
+#if (PAGE_SIZE < 8192)
+ return page_count(rx_buffer->page);
+#else
+ return 0;
+#endif
+}
+
/**
* i40e_get_rx_buffer - Fetch Rx buffer and synchronize data for use
* @rx_ring: rx descriptor ring to transact packets on
@@ -1948,11 +1958,13 @@ static void i40e_add_rx_frag(struct i40e_ring *rx_ring,
* for use by the CPU.
*/
static struct i40e_rx_buffer *i40e_get_rx_buffer(struct i40e_ring *rx_ring,
- const unsigned int size)
+ const unsigned int size,
+ int *rx_buffer_pgcnt)
{
struct i40e_rx_buffer *rx_buffer;
rx_buffer = i40e_rx_bi(rx_ring, rx_ring->next_to_clean);
+ *rx_buffer_pgcnt = i40e_rx_buffer_page_count(rx_buffer);
prefetchw(rx_buffer->page);
/* we are reusing so sync this buffer for CPU use */
@@ -2112,9 +2124,10 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
* either recycle the buffer or unmap it and free the associated resources.
*/
static void i40e_put_rx_buffer(struct i40e_ring *rx_ring,
- struct i40e_rx_buffer *rx_buffer)
+ struct i40e_rx_buffer *rx_buffer,
+ int rx_buffer_pgcnt)
{
- if (i40e_can_reuse_rx_page(rx_buffer)) {
+ if (i40e_can_reuse_rx_page(rx_buffer, rx_buffer_pgcnt)) {
/* hand second half of page back to the ring */
i40e_reuse_rx_page(rx_ring, rx_buffer);
} else {
@@ -2319,6 +2332,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
unsigned int xdp_xmit = 0;
bool failure = false;
struct xdp_buff xdp;
+ int rx_buffer_pgcnt;
#if (PAGE_SIZE < 8192)
xdp.frame_sz = i40e_rx_frame_truesize(rx_ring, 0);
@@ -2370,7 +2384,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
break;
i40e_trace(clean_rx_irq, rx_ring, rx_desc, skb);
- rx_buffer = i40e_get_rx_buffer(rx_ring, size);
+ rx_buffer = i40e_get_rx_buffer(rx_ring, size, &rx_buffer_pgcnt);
/* retrieve a buffer from the ring */
if (!skb) {
@@ -2413,7 +2427,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
break;
}
- i40e_put_rx_buffer(rx_ring, rx_buffer);
+ i40e_put_rx_buffer(rx_ring, rx_buffer, rx_buffer_pgcnt);
cleaned_count++;
if (i40e_is_non_eop(rx_ring, rx_desc, skb))
--
2.25.1
* [PATCH net 2/3] ixgbe: avoid premature Rx buffer reuse
2020-08-25 9:16 [PATCH net 0/3] Avoid premature Rx buffer reuse for XDP_REDIRECT Björn Töpel
2020-08-25 9:16 ` [PATCH net 1/3] i40e: avoid premature Rx buffer reuse Björn Töpel
@ 2020-08-25 9:16 ` Björn Töpel
2020-08-25 9:55 ` Li,Rongqing
2020-08-25 9:16 ` [PATCH net 3/3] ice: " Björn Töpel
2 siblings, 1 reply; 11+ messages in thread
From: Björn Töpel @ 2020-08-25 9:16 UTC (permalink / raw)
To: jeffrey.t.kirsher, intel-wired-lan
Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, netdev,
maciej.fijalkowski, piotr.raczynski, maciej.machnikowski,
lirongqing
From: Björn Töpel <bjorn.topel@intel.com>
The page recycle code incorrectly relied on the assumption that a page
fragment could not be freed inside xdp_do_redirect(). As a result,
page fragments that are still in use by the stack or by an XDP
redirect target can be reused and overwritten.

To avoid this, store the page count prior to invoking xdp_do_redirect().
Fixes: 6453073987ba ("ixgbe: add initial support for xdp redirect")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 28 ++++++++++++++-----
1 file changed, 21 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 2f8a4cfc5fa1..fb5c311d72b6 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1945,7 +1945,8 @@ static inline bool ixgbe_page_is_reserved(struct page *page)
return (page_to_nid(page) != numa_mem_id()) || page_is_pfmemalloc(page);
}
-static bool ixgbe_can_reuse_rx_page(struct ixgbe_rx_buffer *rx_buffer)
+static bool ixgbe_can_reuse_rx_page(struct ixgbe_rx_buffer *rx_buffer,
+ int rx_buffer_pgcnt)
{
unsigned int pagecnt_bias = rx_buffer->pagecnt_bias;
struct page *page = rx_buffer->page;
@@ -1956,7 +1957,7 @@ static bool ixgbe_can_reuse_rx_page(struct ixgbe_rx_buffer *rx_buffer)
#if (PAGE_SIZE < 8192)
/* if we are only owner of page we can reuse it */
- if (unlikely((page_ref_count(page) - pagecnt_bias) > 1))
+ if (unlikely((rx_buffer_pgcnt - pagecnt_bias) > 1))
return false;
#else
/* The last offset is a bit aggressive in that we assume the
@@ -2018,14 +2019,25 @@ static void ixgbe_add_rx_frag(struct ixgbe_ring *rx_ring,
#endif
}
+static int ixgbe_rx_buffer_page_count(struct ixgbe_rx_buffer *rx_buffer)
+{
+#if (PAGE_SIZE < 8192)
+ return page_count(rx_buffer->page);
+#else
+ return 0;
+#endif
+}
+
static struct ixgbe_rx_buffer *ixgbe_get_rx_buffer(struct ixgbe_ring *rx_ring,
union ixgbe_adv_rx_desc *rx_desc,
struct sk_buff **skb,
- const unsigned int size)
+ const unsigned int size,
+ int *rx_buffer_pgcnt)
{
struct ixgbe_rx_buffer *rx_buffer;
rx_buffer = &rx_ring->rx_buffer_info[rx_ring->next_to_clean];
+ *rx_buffer_pgcnt = ixgbe_rx_buffer_page_count(rx_buffer);
prefetchw(rx_buffer->page);
*skb = rx_buffer->skb;
@@ -2055,9 +2067,10 @@ static struct ixgbe_rx_buffer *ixgbe_get_rx_buffer(struct ixgbe_ring *rx_ring,
static void ixgbe_put_rx_buffer(struct ixgbe_ring *rx_ring,
struct ixgbe_rx_buffer *rx_buffer,
- struct sk_buff *skb)
+ struct sk_buff *skb,
+ int rx_buffer_pgcnt)
{
- if (ixgbe_can_reuse_rx_page(rx_buffer)) {
+ if (ixgbe_can_reuse_rx_page(rx_buffer, rx_buffer_pgcnt)) {
/* hand second half of page back to the ring */
ixgbe_reuse_rx_page(rx_ring, rx_buffer);
} else {
@@ -2296,6 +2309,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
u16 cleaned_count = ixgbe_desc_unused(rx_ring);
unsigned int xdp_xmit = 0;
struct xdp_buff xdp;
+ int rx_buffer_pgcnt;
xdp.rxq = &rx_ring->xdp_rxq;
@@ -2327,7 +2341,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
*/
dma_rmb();
- rx_buffer = ixgbe_get_rx_buffer(rx_ring, rx_desc, &skb, size);
+ rx_buffer = ixgbe_get_rx_buffer(rx_ring, rx_desc, &skb, size, &rx_buffer_pgcnt);
/* retrieve a buffer from the ring */
if (!skb) {
@@ -2372,7 +2386,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
break;
}
- ixgbe_put_rx_buffer(rx_ring, rx_buffer, skb);
+ ixgbe_put_rx_buffer(rx_ring, rx_buffer, skb, rx_buffer_pgcnt);
cleaned_count++;
/* place incomplete frames back on ring for completion */
--
2.25.1
* [PATCH net 3/3] ice: avoid premature Rx buffer reuse
2020-08-25 9:16 [PATCH net 0/3] Avoid premature Rx buffer reuse for XDP_REDIRECT Björn Töpel
2020-08-25 9:16 ` [PATCH net 1/3] i40e: avoid premature Rx buffer reuse Björn Töpel
2020-08-25 9:16 ` [PATCH net 2/3] ixgbe: " Björn Töpel
@ 2020-08-25 9:16 ` Björn Töpel
2020-08-25 9:55 ` Li,Rongqing
2 siblings, 1 reply; 11+ messages in thread
From: Björn Töpel @ 2020-08-25 9:16 UTC (permalink / raw)
To: jeffrey.t.kirsher, intel-wired-lan
Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, netdev,
maciej.fijalkowski, piotr.raczynski, maciej.machnikowski,
lirongqing
From: Björn Töpel <bjorn.topel@intel.com>
The page recycle code incorrectly relied on the assumption that a page
fragment could not be freed inside xdp_do_redirect(). As a result,
page fragments that are still in use by the stack or by an XDP
redirect target can be reused and overwritten.

To avoid this, store the page count prior to invoking xdp_do_redirect().
Fixes: efc2214b6047 ("ice: Add support for XDP")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
drivers/net/ethernet/intel/ice/ice_txrx.c | 31 ++++++++++++++++-------
1 file changed, 22 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 9d0d6b0025cf..03a88c8f17b7 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -768,7 +768,8 @@ ice_rx_buf_adjust_pg_offset(struct ice_rx_buf *rx_buf, unsigned int size)
* pointing to; otherwise, the DMA mapping needs to be destroyed and
* page freed
*/
-static bool ice_can_reuse_rx_page(struct ice_rx_buf *rx_buf)
+static bool ice_can_reuse_rx_page(struct ice_rx_buf *rx_buf,
+ int rx_buf_pgcnt)
{
unsigned int pagecnt_bias = rx_buf->pagecnt_bias;
struct page *page = rx_buf->page;
@@ -779,7 +780,7 @@ static bool ice_can_reuse_rx_page(struct ice_rx_buf *rx_buf)
#if (PAGE_SIZE < 8192)
/* if we are only owner of page we can reuse it */
- if (unlikely((page_count(page) - pagecnt_bias) > 1))
+ if (unlikely((rx_buf_pgcnt - pagecnt_bias) > 1))
return false;
#else
#define ICE_LAST_OFFSET \
@@ -859,6 +860,15 @@ ice_reuse_rx_page(struct ice_ring *rx_ring, struct ice_rx_buf *old_buf)
new_buf->pagecnt_bias = old_buf->pagecnt_bias;
}
+static int ice_rx_buf_page_count(struct ice_rx_buf *rx_buf)
+{
+#if (PAGE_SIZE < 8192)
+ return page_count(rx_buf->page);
+#else
+ return 0;
+#endif
+}
+
/**
* ice_get_rx_buf - Fetch Rx buffer and synchronize data for use
* @rx_ring: Rx descriptor ring to transact packets on
@@ -870,11 +880,13 @@ ice_reuse_rx_page(struct ice_ring *rx_ring, struct ice_rx_buf *old_buf)
*/
static struct ice_rx_buf *
ice_get_rx_buf(struct ice_ring *rx_ring, struct sk_buff **skb,
- const unsigned int size)
+ const unsigned int size,
+ int *rx_buf_pgcnt)
{
struct ice_rx_buf *rx_buf;
rx_buf = &rx_ring->rx_buf[rx_ring->next_to_clean];
+ *rx_buf_pgcnt = ice_rx_buf_page_count(rx_buf);
prefetchw(rx_buf->page);
*skb = rx_buf->skb;
@@ -1017,7 +1029,7 @@ ice_construct_skb(struct ice_ring *rx_ring, struct ice_rx_buf *rx_buf,
* of the rx_buf. It will either recycle the buffer or unmap it and free
* the associated resources.
*/
-static void ice_put_rx_buf(struct ice_ring *rx_ring, struct ice_rx_buf *rx_buf)
+static void ice_put_rx_buf(struct ice_ring *rx_ring, struct ice_rx_buf *rx_buf, int rx_buf_pgcnt)
{
u16 ntc = rx_ring->next_to_clean + 1;
@@ -1028,7 +1040,7 @@ static void ice_put_rx_buf(struct ice_ring *rx_ring, struct ice_rx_buf *rx_buf)
if (!rx_buf)
return;
- if (ice_can_reuse_rx_page(rx_buf)) {
+ if (ice_can_reuse_rx_page(rx_buf, rx_buf_pgcnt)) {
/* hand second half of page back to the ring */
ice_reuse_rx_page(rx_ring, rx_buf);
} else {
@@ -1088,6 +1100,7 @@ int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget)
unsigned int xdp_res, xdp_xmit = 0;
struct bpf_prog *xdp_prog = NULL;
struct xdp_buff xdp;
+ int rx_buf_pgcnt;
bool failure;
xdp.rxq = &rx_ring->xdp_rxq;
@@ -1125,7 +1138,7 @@ int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget)
dma_rmb();
if (rx_desc->wb.rxdid == FDIR_DESC_RXDID || !rx_ring->netdev) {
- ice_put_rx_buf(rx_ring, NULL);
+ ice_put_rx_buf(rx_ring, NULL, 0);
cleaned_count++;
continue;
}
@@ -1134,7 +1147,7 @@ int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget)
ICE_RX_FLX_DESC_PKT_LEN_M;
/* retrieve a buffer from the ring */
- rx_buf = ice_get_rx_buf(rx_ring, &skb, size);
+ rx_buf = ice_get_rx_buf(rx_ring, &skb, size, &rx_buf_pgcnt);
if (!size) {
xdp.data = NULL;
@@ -1174,7 +1187,7 @@ int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget)
total_rx_pkts++;
cleaned_count++;
- ice_put_rx_buf(rx_ring, rx_buf);
+ ice_put_rx_buf(rx_ring, rx_buf, rx_buf_pgcnt);
continue;
construct_skb:
if (skb) {
@@ -1193,7 +1206,7 @@ int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget)
break;
}
- ice_put_rx_buf(rx_ring, rx_buf);
+ ice_put_rx_buf(rx_ring, rx_buf, rx_buf_pgcnt);
cleaned_count++;
/* skip if it is NOP desc */
--
2.25.1
* RE: [PATCH net 3/3] ice: avoid premature Rx buffer reuse
2020-08-25 9:16 ` [PATCH net 3/3] ice: " Björn Töpel
@ 2020-08-25 9:55 ` Li,Rongqing
0 siblings, 0 replies; 11+ messages in thread
From: Li,Rongqing @ 2020-08-25 9:55 UTC (permalink / raw)
To: Björn Töpel, jeffrey.t.kirsher, intel-wired-lan
Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, netdev,
maciej.fijalkowski, piotr.raczynski, maciej.machnikowski
> -----Original Message-----
> From: Björn Töpel [mailto:bjorn.topel@gmail.com]
> Sent: Tuesday, August 25, 2020 5:16 PM
> To: jeffrey.t.kirsher@intel.com; intel-wired-lan@lists.osuosl.org
> Cc: Björn Töpel <bjorn.topel@intel.com>; magnus.karlsson@intel.com;
> magnus.karlsson@gmail.com; netdev@vger.kernel.org;
> maciej.fijalkowski@intel.com; piotr.raczynski@intel.com;
> maciej.machnikowski@intel.com; Li,Rongqing <lirongqing@baidu.com>
> Subject: [PATCH net 3/3] ice: avoid premature Rx buffer reuse
>
> From: Björn Töpel <bjorn.topel@intel.com>
>
> The page recycle code incorrectly relied on the assumption that a page
> fragment could not be freed inside xdp_do_redirect(). As a result, page
> fragments that are still in use by the stack or by an XDP redirect target
> can be reused and overwritten.
>
> To avoid this, store the page count prior to invoking xdp_do_redirect().
>
> Fixes: efc2214b6047 ("ice: Add support for XDP")
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Reported-and-analyzed-by: Li RongQing <lirongqing@baidu.com>
Thanks
-Li
* RE: [PATCH net 2/3] ixgbe: avoid premature Rx buffer reuse
2020-08-25 9:16 ` [PATCH net 2/3] ixgbe: " Björn Töpel
@ 2020-08-25 9:55 ` Li,Rongqing
2020-08-25 10:00 ` Björn Töpel
0 siblings, 1 reply; 11+ messages in thread
From: Li,Rongqing @ 2020-08-25 9:55 UTC (permalink / raw)
To: Björn Töpel, jeffrey.t.kirsher, intel-wired-lan
Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, netdev,
maciej.fijalkowski, piotr.raczynski, maciej.machnikowski
> -----Original Message-----
> From: Björn Töpel [mailto:bjorn.topel@gmail.com]
> Sent: Tuesday, August 25, 2020 5:16 PM
> To: jeffrey.t.kirsher@intel.com; intel-wired-lan@lists.osuosl.org
> Cc: Björn Töpel <bjorn.topel@intel.com>; magnus.karlsson@intel.com;
> magnus.karlsson@gmail.com; netdev@vger.kernel.org;
> maciej.fijalkowski@intel.com; piotr.raczynski@intel.com;
> maciej.machnikowski@intel.com; Li,Rongqing <lirongqing@baidu.com>
> Subject: [PATCH net 2/3] ixgbe: avoid premature Rx buffer reuse
>
> From: Björn Töpel <bjorn.topel@intel.com>
>
> The page recycle code incorrectly relied on the assumption that a page
> fragment could not be freed inside xdp_do_redirect(). As a result, page
> fragments that are still in use by the stack or by an XDP redirect target
> can be reused and overwritten.
>
> To avoid this, store the page count prior to invoking xdp_do_redirect().
>
> Fixes: 6453073987ba ("ixgbe: add initial support for xdp redirect")
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Reported-and-analyzed-by: Li RongQing <lirongqing@baidu.com>
Thanks
-Li
* Re: [PATCH net 2/3] ixgbe: avoid premature Rx buffer reuse
2020-08-25 9:55 ` Li,Rongqing
@ 2020-08-25 10:00 ` Björn Töpel
0 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2020-08-25 10:00 UTC (permalink / raw)
To: Li,Rongqing, Björn Töpel, jeffrey.t.kirsher, intel-wired-lan
Cc: magnus.karlsson, magnus.karlsson, netdev, maciej.fijalkowski,
piotr.raczynski, maciej.machnikowski
On 2020-08-25 11:55, Li,Rongqing wrote:
>
>
>> -----Original Message-----
>> From: Björn Töpel [mailto:bjorn.topel@gmail.com]
>> Sent: Tuesday, August 25, 2020 5:16 PM
>> To: jeffrey.t.kirsher@intel.com; intel-wired-lan@lists.osuosl.org
>> Cc: Björn Töpel <bjorn.topel@intel.com>; magnus.karlsson@intel.com;
>> magnus.karlsson@gmail.com; netdev@vger.kernel.org;
>> maciej.fijalkowski@intel.com; piotr.raczynski@intel.com;
>> maciej.machnikowski@intel.com; Li,Rongqing <lirongqing@baidu.com>
>> Subject: [PATCH net 2/3] ixgbe: avoid premature Rx buffer reuse
>>
>> From: Björn Töpel <bjorn.topel@intel.com>
>>
>> The page recycle code incorrectly relied on the assumption that a page
>> fragment could not be freed inside xdp_do_redirect(). As a result, page
>> fragments that are still in use by the stack or by an XDP redirect target
>> can be reused and overwritten.
>>
>> To avoid this, store the page count prior to invoking xdp_do_redirect().
>>
>> Fixes: 6453073987ba ("ixgbe: add initial support for xdp redirect")
>> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
>
> Reported-and-analyzed-by: Li RongQing <lirongqing@baidu.com>
>
Thanks Li! I should have added that. Intel folks, please make sure Li's
tags for ixgbe/ice are added.
Björn
> Thanks
>
> -Li
>
* Re: [PATCH net 1/3] i40e: avoid premature Rx buffer reuse
2020-08-25 9:16 ` [PATCH net 1/3] i40e: avoid premature Rx buffer reuse Björn Töpel
@ 2020-08-25 11:13 ` Maciej Fijalkowski
2020-08-25 11:25 ` Björn Töpel
0 siblings, 1 reply; 11+ messages in thread
From: Maciej Fijalkowski @ 2020-08-25 11:13 UTC (permalink / raw)
To: Björn Töpel
Cc: jeffrey.t.kirsher, intel-wired-lan, Björn Töpel,
magnus.karlsson, magnus.karlsson, netdev, piotr.raczynski,
maciej.machnikowski, lirongqing
On Tue, Aug 25, 2020 at 11:16:27AM +0200, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> The page recycle code incorrectly relied on the assumption that a page
> fragment could not be freed inside xdp_do_redirect(). As a result,
> page fragments that are still in use by the stack or by an XDP
> redirect target can be reused and overwritten.
>
> To avoid this, store the page count prior to invoking xdp_do_redirect().
>
> Longer explanation:
>
> Intel NICs have a page recycle mechanism. The main idea is that a page
> is split into two parts. One part is owned by the driver, and the
> other part might be owned by someone else, such as the stack.
>
> t0: Page is allocated, and put on the Rx ring
> +---------------
> used by NIC ->| upper buffer
> (rx_buffer) +---------------
> | lower buffer
> +---------------
> page count == USHRT_MAX
> rx_buffer->pagecnt_bias == USHRT_MAX
>
> t1: Buffer is received, and passed to the stack (e.g.)
> +---------------
> | upper buff (skb)
> +---------------
> used by NIC ->| lower buffer
> (rx_buffer) +---------------
> page count == USHRT_MAX
> rx_buffer->pagecnt_bias == USHRT_MAX - 1
>
> t2: Buffer is received, and redirected
> +---------------
> | upper buff (skb)
> +---------------
> used by NIC ->| lower buffer
> (rx_buffer) +---------------
>
> Now, prior to calling xdp_do_redirect():
> page count == USHRT_MAX
> rx_buffer->pagecnt_bias == USHRT_MAX - 2
>
> This means that the buffer *cannot* be flipped/reused, because the skb
> is still using it.
>
> The problem arises when xdp_do_redirect() actually frees the
> segment. Then we get:
> page count == USHRT_MAX - 1
> rx_buffer->pagecnt_bias == USHRT_MAX - 2
>
> From a recycle perspective, the buffer can be flipped and reused,
> which means that the skb data area is passed to the Rx HW ring!
>
> To work around this, the page count is stored prior to calling
> xdp_do_redirect().
>
> Note that this is not optimal, since the NIC could actually reuse the
> "lower buffer" again. However, that would require tracking whether
> XDP_REDIRECT consumed the buffer or not.
>
> Fixes: d9314c474d4f ("i40e: add support for XDP_REDIRECT")
> Reported-by: Li RongQing <lirongqing@baidu.com>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
> drivers/net/ethernet/intel/i40e/i40e_txrx.c | 28 +++++++++++++++------
> 1 file changed, 21 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index 3e5c566ceb01..5e446dc39190 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -1873,7 +1873,8 @@ static inline bool i40e_page_is_reusable(struct page *page)
> *
> * In either case, if the page is reusable its refcount is increased.
> **/
> -static bool i40e_can_reuse_rx_page(struct i40e_rx_buffer *rx_buffer)
> +static bool i40e_can_reuse_rx_page(struct i40e_rx_buffer *rx_buffer,
> + int rx_buffer_pgcnt)
> {
> unsigned int pagecnt_bias = rx_buffer->pagecnt_bias;
> struct page *page = rx_buffer->page;
> @@ -1884,7 +1885,7 @@ static bool i40e_can_reuse_rx_page(struct i40e_rx_buffer *rx_buffer)
>
> #if (PAGE_SIZE < 8192)
> /* if we are only owner of page we can reuse it */
> - if (unlikely((page_count(page) - pagecnt_bias) > 1))
> + if (unlikely((rx_buffer_pgcnt - pagecnt_bias) > 1))
> return false;
> #else
> #define I40E_LAST_OFFSET \
> @@ -1939,6 +1940,15 @@ static void i40e_add_rx_frag(struct i40e_ring *rx_ring,
> #endif
> }
>
> +static int i40e_rx_buffer_page_count(struct i40e_rx_buffer *rx_buffer)
> +{
> +#if (PAGE_SIZE < 8192)
> + return page_count(rx_buffer->page);
> +#else
> + return 0;
> +#endif
> +}
> +
> /**
> * i40e_get_rx_buffer - Fetch Rx buffer and synchronize data for use
> * @rx_ring: rx descriptor ring to transact packets on
> @@ -1948,11 +1958,13 @@ static void i40e_add_rx_frag(struct i40e_ring *rx_ring,
> * for use by the CPU.
> */
> static struct i40e_rx_buffer *i40e_get_rx_buffer(struct i40e_ring *rx_ring,
> - const unsigned int size)
> + const unsigned int size,
> + int *rx_buffer_pgcnt)
> {
> struct i40e_rx_buffer *rx_buffer;
>
> rx_buffer = i40e_rx_bi(rx_ring, rx_ring->next_to_clean);
> + *rx_buffer_pgcnt = i40e_rx_buffer_page_count(rx_buffer);
What I previously meant was:
#if (PAGE_SIZE < 8192)
*rx_buffer_pgcnt = page_count(rx_buffer->page);
#endif
and see below
> prefetchw(rx_buffer->page);
>
> /* we are reusing so sync this buffer for CPU use */
> @@ -2112,9 +2124,10 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
> * either recycle the buffer or unmap it and free the associated resources.
> */
> static void i40e_put_rx_buffer(struct i40e_ring *rx_ring,
> - struct i40e_rx_buffer *rx_buffer)
> + struct i40e_rx_buffer *rx_buffer,
> + int rx_buffer_pgcnt)
> {
> - if (i40e_can_reuse_rx_page(rx_buffer)) {
> + if (i40e_can_reuse_rx_page(rx_buffer, rx_buffer_pgcnt)) {
> /* hand second half of page back to the ring */
> i40e_reuse_rx_page(rx_ring, rx_buffer);
> } else {
> @@ -2319,6 +2332,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
> unsigned int xdp_xmit = 0;
> bool failure = false;
> struct xdp_buff xdp;
> + int rx_buffer_pgcnt;
You could scope this variable to the

while (likely(total_rx_packets < (unsigned int)budget))

loop and init it to 0. Then you could drop the helper function you've
added. And BTW, the page count is not used for big pages, but I agree
that it's better to have it set to 0.
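
Roughly something like this (untested, just to illustrate; same names
as in your patch):

	static struct i40e_rx_buffer *i40e_get_rx_buffer(struct i40e_ring *rx_ring,
							 const unsigned int size,
							 int *rx_buffer_pgcnt)
	{
		struct i40e_rx_buffer *rx_buffer;

		rx_buffer = i40e_rx_bi(rx_ring, rx_ring->next_to_clean);
	#if (PAGE_SIZE < 8192)
		/* only sampled on the page-split path */
		*rx_buffer_pgcnt = page_count(rx_buffer->page);
	#endif
		prefetchw(rx_buffer->page);
		...

and in i40e_clean_rx_irq():

	while (likely(total_rx_packets < (unsigned int)budget)) {
		int rx_buffer_pgcnt = 0;	/* stays 0 for PAGE_SIZE >= 8192 */
		...
	}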
>
> #if (PAGE_SIZE < 8192)
> xdp.frame_sz = i40e_rx_frame_truesize(rx_ring, 0);
> @@ -2370,7 +2384,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
> break;
>
> i40e_trace(clean_rx_irq, rx_ring, rx_desc, skb);
> - rx_buffer = i40e_get_rx_buffer(rx_ring, size);
> + rx_buffer = i40e_get_rx_buffer(rx_ring, size, &rx_buffer_pgcnt);
>
> /* retrieve a buffer from the ring */
> if (!skb) {
> @@ -2413,7 +2427,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
> break;
> }
>
> - i40e_put_rx_buffer(rx_ring, rx_buffer);
> + i40e_put_rx_buffer(rx_ring, rx_buffer, rx_buffer_pgcnt);
> cleaned_count++;
>
> if (i40e_is_non_eop(rx_ring, rx_desc, skb))
> --
> 2.25.1
>
* Re: [PATCH net 1/3] i40e: avoid premature Rx buffer reuse
2020-08-25 11:13 ` Maciej Fijalkowski
@ 2020-08-25 11:25 ` Björn Töpel
2020-08-25 11:29 ` Maciej Fijalkowski
0 siblings, 1 reply; 11+ messages in thread
From: Björn Töpel @ 2020-08-25 11:25 UTC (permalink / raw)
To: Maciej Fijalkowski, Björn Töpel
Cc: jeffrey.t.kirsher, intel-wired-lan, magnus.karlsson,
magnus.karlsson, netdev, piotr.raczynski, maciej.machnikowski,
lirongqing
On 2020-08-25 13:13, Maciej Fijalkowski wrote:
> On Tue, Aug 25, 2020 at 11:16:27AM +0200, Björn Töpel wrote:
[...]
>> struct i40e_rx_buffer *rx_buffer;
>>
>> rx_buffer = i40e_rx_bi(rx_ring, rx_ring->next_to_clean);
>> + *rx_buffer_pgcnt = i40e_rx_buffer_page_count(rx_buffer);
>
> What I previously meant was:
>
> #if (PAGE_SIZE < 8192)
> *rx_buffer_pgcnt = page_count(rx_buffer->page);
> #endif
>
> and see below
>
Right...
>> prefetchw(rx_buffer->page);
>>
>> /* we are reusing so sync this buffer for CPU use */
>> @@ -2112,9 +2124,10 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
>> * either recycle the buffer or unmap it and free the associated resources.
>> */
>> static void i40e_put_rx_buffer(struct i40e_ring *rx_ring,
>> - struct i40e_rx_buffer *rx_buffer)
>> + struct i40e_rx_buffer *rx_buffer,
>> + int rx_buffer_pgcnt)
>> {
>> - if (i40e_can_reuse_rx_page(rx_buffer)) {
>> + if (i40e_can_reuse_rx_page(rx_buffer, rx_buffer_pgcnt)) {
>> /* hand second half of page back to the ring */
>> i40e_reuse_rx_page(rx_ring, rx_buffer);
>> } else {
>> @@ -2319,6 +2332,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
>> unsigned int xdp_xmit = 0;
>> bool failure = false;
>> struct xdp_buff xdp;
>> + int rx_buffer_pgcnt;
>
> You could scope this variable to the
>
> while (likely(total_rx_packets < (unsigned int)budget))
>
> loop and init it to 0. Then you could drop the helper function you've
> added. And BTW, the page count is not used for big pages, but I agree
> that it's better to have it set to 0.
>
...but isn't it a bit nasty to have an output parameter that relies on
the caller having initialized it to zero? I guess it's a matter of
taste, but I find that more error prone.
Let me know if you have strong feelings about this, and I'll respin
(but I'd rather not!).
Björn
>>
>> #if (PAGE_SIZE < 8192)
>> xdp.frame_sz = i40e_rx_frame_truesize(rx_ring, 0);
>> @@ -2370,7 +2384,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
>> break;
>>
>> i40e_trace(clean_rx_irq, rx_ring, rx_desc, skb);
>> - rx_buffer = i40e_get_rx_buffer(rx_ring, size);
>> + rx_buffer = i40e_get_rx_buffer(rx_ring, size, &rx_buffer_pgcnt);
>>
>> /* retrieve a buffer from the ring */
>> if (!skb) {
>> @@ -2413,7 +2427,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
>> break;
>> }
>>
>> - i40e_put_rx_buffer(rx_ring, rx_buffer);
>> + i40e_put_rx_buffer(rx_ring, rx_buffer, rx_buffer_pgcnt);
>> cleaned_count++;
>>
>> if (i40e_is_non_eop(rx_ring, rx_desc, skb))
>> --
>> 2.25.1
>>
* Re: [PATCH net 1/3] i40e: avoid premature Rx buffer reuse
2020-08-25 11:25 ` Björn Töpel
@ 2020-08-25 11:29 ` Maciej Fijalkowski
2020-08-25 11:37 ` Björn Töpel
0 siblings, 1 reply; 11+ messages in thread
From: Maciej Fijalkowski @ 2020-08-25 11:29 UTC (permalink / raw)
To: Björn Töpel
Cc: Björn Töpel, jeffrey.t.kirsher, intel-wired-lan,
magnus.karlsson, magnus.karlsson, netdev, piotr.raczynski,
maciej.machnikowski, lirongqing
On Tue, Aug 25, 2020 at 01:25:16PM +0200, Björn Töpel wrote:
> On 2020-08-25 13:13, Maciej Fijalkowski wrote:
> > On Tue, Aug 25, 2020 at 11:16:27AM +0200, Björn Töpel wrote:
> [...]
> > > struct i40e_rx_buffer *rx_buffer;
> > > rx_buffer = i40e_rx_bi(rx_ring, rx_ring->next_to_clean);
> > > + *rx_buffer_pgcnt = i40e_rx_buffer_page_count(rx_buffer);
> >
> > What I previously meant was:
> >
> > #if (PAGE_SIZE < 8192)
> > *rx_buffer_pgcnt = page_count(rx_buffer->page);
> > #endif
> >
> > and see below
> >
>
> Right...
>
> > > prefetchw(rx_buffer->page);
> > > /* we are reusing so sync this buffer for CPU use */
> > > @@ -2112,9 +2124,10 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
> > > * either recycle the buffer or unmap it and free the associated resources.
> > > */
> > > static void i40e_put_rx_buffer(struct i40e_ring *rx_ring,
> > > - struct i40e_rx_buffer *rx_buffer)
> > > + struct i40e_rx_buffer *rx_buffer,
> > > + int rx_buffer_pgcnt)
> > > {
> > > - if (i40e_can_reuse_rx_page(rx_buffer)) {
> > > + if (i40e_can_reuse_rx_page(rx_buffer, rx_buffer_pgcnt)) {
> > > /* hand second half of page back to the ring */
> > > i40e_reuse_rx_page(rx_ring, rx_buffer);
> > > } else {
> > > @@ -2319,6 +2332,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
> > > unsigned int xdp_xmit = 0;
> > > bool failure = false;
> > > struct xdp_buff xdp;
> > > + int rx_buffer_pgcnt;
> >
> > You could scope this variable to the
> >
> > while (likely(total_rx_packets < (unsigned int)budget))
> >
> > loop and init it to 0. Then you could drop the helper function you've
> > added. And BTW, the page count is not used for big pages, but I agree
> > that it's better to have it set to 0.
> >
>
> ...but isn't it a bit nasty to have an output parameter that relies on
> the caller having initialized it to zero? I guess it's a matter of
> taste, but I find that more error prone.
>
> Let me know if you have strong feelings about this, and I'll respin
> (but I'd rather not!).
Up to you. No strong feelings; I just think that i40e_rx_buffer_page_count
is not needed. But if you want to keep it: I usually ask people to
provide kernel-doc descriptions for newly introduced functions... :P
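
Something like this would do (the wording is just a suggestion):

	/**
	 * i40e_rx_buffer_page_count - sample the refcount of the Rx buffer page
	 * @rx_buffer: Rx buffer whose backing page is inspected
	 *
	 * Returns the current page refcount when the page-split scheme is
	 * used (PAGE_SIZE < 8192), 0 otherwise.
	 **/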
But scoping it still makes sense to me; static analysis tools would
agree with me, I guess.
>
>
> Björn
>
>
> > > #if (PAGE_SIZE < 8192)
> > > xdp.frame_sz = i40e_rx_frame_truesize(rx_ring, 0);
> > > @@ -2370,7 +2384,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
> > > break;
> > > i40e_trace(clean_rx_irq, rx_ring, rx_desc, skb);
> > > - rx_buffer = i40e_get_rx_buffer(rx_ring, size);
> > > + rx_buffer = i40e_get_rx_buffer(rx_ring, size, &rx_buffer_pgcnt);
> > > /* retrieve a buffer from the ring */
> > > if (!skb) {
> > > @@ -2413,7 +2427,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
> > > break;
> > > }
> > > - i40e_put_rx_buffer(rx_ring, rx_buffer);
> > > + i40e_put_rx_buffer(rx_ring, rx_buffer, rx_buffer_pgcnt);
> > > cleaned_count++;
> > > if (i40e_is_non_eop(rx_ring, rx_desc, skb))
> > > --
> > > 2.25.1
> > >
* Re: [PATCH net 1/3] i40e: avoid premature Rx buffer reuse
2020-08-25 11:29 ` Maciej Fijalkowski
@ 2020-08-25 11:37 ` Björn Töpel
0 siblings, 0 replies; 11+ messages in thread
From: Björn Töpel @ 2020-08-25 11:37 UTC (permalink / raw)
To: Maciej Fijalkowski
Cc: Björn Töpel, jeffrey.t.kirsher, intel-wired-lan,
magnus.karlsson, magnus.karlsson, netdev, piotr.raczynski,
maciej.machnikowski, lirongqing
On 2020-08-25 13:29, Maciej Fijalkowski wrote:
> On Tue, Aug 25, 2020 at 01:25:16PM +0200, Björn Töpel wrote:
>> On 2020-08-25 13:13, Maciej Fijalkowski wrote:
>>> On Tue, Aug 25, 2020 at 11:16:27AM +0200, Björn Töpel wrote:
>> [...]
>>>> struct i40e_rx_buffer *rx_buffer;
>>>> rx_buffer = i40e_rx_bi(rx_ring, rx_ring->next_to_clean);
>>>> + *rx_buffer_pgcnt = i40e_rx_buffer_page_count(rx_buffer);
>>>
>>> What I previously meant was:
>>>
>>> #if (PAGE_SIZE < 8192)
>>> *rx_buffer_pgcnt = page_count(rx_buffer->page);
>>> #endif
>>>
>>> and see below
>>>
>>
>> Right...
>>
>>>> prefetchw(rx_buffer->page);
>>>> /* we are reusing so sync this buffer for CPU use */
>>>> @@ -2112,9 +2124,10 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
>>>> * either recycle the buffer or unmap it and free the associated resources.
>>>> */
>>>> static void i40e_put_rx_buffer(struct i40e_ring *rx_ring,
>>>> - struct i40e_rx_buffer *rx_buffer)
>>>> + struct i40e_rx_buffer *rx_buffer,
>>>> + int rx_buffer_pgcnt)
>>>> {
>>>> - if (i40e_can_reuse_rx_page(rx_buffer)) {
>>>> + if (i40e_can_reuse_rx_page(rx_buffer, rx_buffer_pgcnt)) {
>>>> /* hand second half of page back to the ring */
>>>> i40e_reuse_rx_page(rx_ring, rx_buffer);
>>>> } else {
>>>> @@ -2319,6 +2332,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
>>>> unsigned int xdp_xmit = 0;
>>>> bool failure = false;
>>>> struct xdp_buff xdp;
>>>> + int rx_buffer_pgcnt;
>>>
>>> You could scope this variable to the
>>>
>>> while (likely(total_rx_packets < (unsigned int)budget))
>>>
>>> loop and init it to 0. Then you could drop the helper function you've
>>> added. And BTW, the page count is not used for big pages, but I agree
>>> that it's better to have it set to 0.
>>>
>>
>> ...but isn't it a bit nasty to have an output parameter that relies on
>> the caller having initialized it to zero? I guess it's a matter of
>> taste, but I find that more error prone.
>>
>> Let me know if you have strong feelings about this, and I'll respin
>> (but I'd rather not!).
>
> Up to you. No strong feelings; I just think that i40e_rx_buffer_page_count
> is not needed. But if you want to keep it: I usually ask people to
> provide kernel-doc descriptions for newly introduced functions... :P
>
> But scoping it still makes sense to me; static analysis tools would
> agree with me, I guess.
>
Fair enough! I'll spin a v2! Thanks for taking a look!
Björn
>>
>>
>> Björn
>>
>>
>>>> #if (PAGE_SIZE < 8192)
>>>> xdp.frame_sz = i40e_rx_frame_truesize(rx_ring, 0);
>>>> @@ -2370,7 +2384,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
>>>> break;
>>>> i40e_trace(clean_rx_irq, rx_ring, rx_desc, skb);
>>>> - rx_buffer = i40e_get_rx_buffer(rx_ring, size);
>>>> + rx_buffer = i40e_get_rx_buffer(rx_ring, size, &rx_buffer_pgcnt);
>>>> /* retrieve a buffer from the ring */
>>>> if (!skb) {
>>>> @@ -2413,7 +2427,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
>>>> break;
>>>> }
>>>> - i40e_put_rx_buffer(rx_ring, rx_buffer);
>>>> + i40e_put_rx_buffer(rx_ring, rx_buffer, rx_buffer_pgcnt);
>>>> cleaned_count++;
>>>> if (i40e_is_non_eop(rx_ring, rx_desc, skb))
>>>> --
>>>> 2.25.1
>>>>