Netdev Archive on lore.kernel.org
* [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
@ 2021-09-10 16:14 Lorenzo Bianconi
  2021-09-10 16:14 ` [PATCH v14 bpf-next 01/18] net: skbuff: add size metadata to skb_shared_info for xdp Lorenzo Bianconi
                   ` (18 more replies)
  0 siblings, 19 replies; 58+ messages in thread
From: Lorenzo Bianconi @ 2021-09-10 16:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke


This series introduces XDP multi-buffer support. The mvneta driver is
the first to support these new "non-linear" xdp_{buff,frame} types.
Reviewers, please focus on how these new xdp_{buff,frame} packets
traverse the different layers and on the layout design. The BPF helpers
are deliberately kept simple, as we don't want to expose the internal
layout, so that it can be changed later.

The main idea behind the new multi-buffer layout is to reuse the same
structure used for non-linear SKBs. It relies on the "skb_shared_info"
struct at the end of the first buffer to link together subsequent
buffers. Keeping the layout compatible with SKBs also eases and speeds
up creating an SKB from an xdp_{buff,frame}.
Converting an xdp_frame to an SKB and delivering it to the network
stack is shown in patch 05/18 (e.g. for cpumaps).

A multi-buffer bit (mb) has been introduced in the flags field of the
xdp_{buff,frame} structures to notify the bpf/network layer whether this
is an xdp multi-buffer frame (mb = 1) or not (mb = 0).
The mb bit will be set by an xdp multi-buffer capable driver only for
non-linear frames, preserving the ability to receive linear frames
without any extra cost, since the skb_shared_info structure at the end
of the first buffer is initialized only if mb is set.
Moreover, the flags field in xdp_{buff,frame} will also be reused for
xdp rx csum offloading in a future series.
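
The mb bit accessors follow the usual bit-flag pattern. As an
illustrative sketch only, the accessors introduced in patch 02/18 can be
mirrored in plain userspace C (BIT() and a reduced xdp_buff are
redefined here; the real definitions live in include/net/xdp.h):

```c
#include <stdint.h>

#define BIT(n) (1U << (n))

enum xdp_buff_flags {
	XDP_FLAGS_MULTI_BUFF = BIT(0), /* non-linear xdp buff */
};

/* Reduced xdp_buff: only the flags field matters for this sketch. */
struct xdp_buff {
	uint32_t flags; /* supported values defined in xdp_buff_flags */
};

static int xdp_buff_is_mb(const struct xdp_buff *xdp)
{
	return !!(xdp->flags & XDP_FLAGS_MULTI_BUFF);
}

static void xdp_buff_set_mb(struct xdp_buff *xdp)
{
	xdp->flags |= XDP_FLAGS_MULTI_BUFF;
}

static void xdp_buff_clear_mb(struct xdp_buff *xdp)
{
	xdp->flags &= ~XDP_FLAGS_MULTI_BUFF;
}
```

A linear frame leaves flags at 0, so the skb_shared_info area is never
touched; the driver sets the bit only when it appends a fragment.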

Typical use cases for this series are:
- Jumbo-frames
- Packet header split (please see Google’s use-case @ NetDevConf 0x14, [0])
- TSO/GRO for XDP_REDIRECT

The following two eBPF helpers (and related selftests) have been introduced:
- bpf_xdp_adjust_data:
  Move the xdp_md->data and xdp_md->data_end pointers into subsequent
  fragments according to the offset provided by the eBPF program. This
  helper can be used to read/write values in the frame payload.
- bpf_xdp_get_buff_len:
  Return the total frame size (linear + paged parts).

The bpf_xdp_adjust_tail and bpf_xdp_copy helpers have been modified to
take xdp multi-buff frames into account.
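
The semantics of the two new helpers can be modeled outside the kernel.
The following standalone C sketch (all names here are made up for
illustration; the real helpers operate on xdp_md and are defined later
in the series) shows how an offset selects the [data, data_end) window
inside a multi-buff frame, and how the total length differs from the
window the program sees:

```c
#include <stddef.h>

/* Toy model of a multi-buff frame: frag_len[0] is the linear part,
 * frag_len[1..nr_frags-1] are the paged fragments. */
struct mb_frame {
	size_t frag_len[8];
	size_t nr_frags;
};

/* Total frame size, i.e. what bpf_xdp_get_buff_len() reports
 * (linear + paged parts). */
static size_t mb_buff_len(const struct mb_frame *f)
{
	size_t i, len = 0;

	for (i = 0; i < f->nr_frags; i++)
		len += f->frag_len[i];
	return len;
}

/* Model of bpf_xdp_adjust_data(): locate the fragment containing
 * `offset` and report the [data, data_end) window the program would
 * see, as offsets from the start of the frame. Returns 0 on success,
 * -1 if offset is out of bounds. */
static int mb_adjust_data(const struct mb_frame *f, size_t offset,
			  size_t *data, size_t *data_end)
{
	size_t i, base = 0;

	for (i = 0; i < f->nr_frags; i++) {
		if (offset < base + f->frag_len[i]) {
			*data = offset;
			*data_end = base + f->frag_len[i];
			return 0;
		}
		base += f->frag_len[i];
	}
	return -1;
}
```

For a frame with a 1500-byte linear part and 1000/500-byte fragments,
an offset of 2000 lands in the second fragment, so the visible window
becomes [2000, 2500) while the total buff length stays 3000.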

A multi-buffer enabled NIC may receive XDP frames with multiple frags.
If a BPF program does not understand mb layouts, it is possible to
contrive a BPF program that incorrectly treats data_end as the end of
data when there is more data in the payload. Note that helpers will
generally do the correct thing; for example, perf_output will consume
the entire payload. But it is still possible that some programs could
do the wrong thing, even if only in an edge case. Although we expect
most BPF programs not to be impacted, we can't rule it out; you've
been warned.

More info about the main idea behind this approach can be found here [1][2].

Changes since v13:
- use u32 for xdp_buff/xdp_frame flags field
- rename xdp_frags_tsize in xdp_frags_truesize
- fixed comments

Changes since v12:
- fix bpf_xdp_adjust_data helper for single-buffer use case
- return -EFAULT in bpf_xdp_adjust_{head,tail} in case the data pointers are not
  properly reset
- collect ACKs from John

Changes since v11:
- add missing static to bpf_xdp_get_buff_len_proto structure
- fix bpf_xdp_adjust_data helper when offset is smaller than linear area length.

Changes since v10:
- move xdp->data to the requested payload offset instead of to the beginning of
  the fragment in bpf_xdp_adjust_data()

Changes since v9:
- introduce bpf_xdp_adjust_data helper and related selftest
- add xdp_frags_size and xdp_frags_tsize fields in skb_shared_info
- introduce xdp_update_skb_shared_info utility routine in order to not reset
  the frags array in skb_shared_info when converting from a xdp_buff/xdp_frame
  to a skb
- simplify bpf_xdp_copy routine

Changes since v8:
- add proper dma unmapping if XDP_TX fails on mvneta for a xdp multi-buff
- switch back to skb_shared_info implementation from previous xdp_shared_info
  one
- avoid using a bitfield in xdp_buff/xdp_frame since it introduces performance
  regressions. Tested now on a 10G NIC (ixgbe) to verify there are no
  performance penalties for the regular codebase
- add bpf_xdp_get_buff_len helper and remove frame_length field in xdp ctx
- add data_len field in skb_shared_info struct
- introduce XDP_FLAGS_FRAGS_PF_MEMALLOC flag

Changes since v7:
- rebase on top of bpf-next
- fix sparse warnings
- improve comments for frame_length in include/net/xdp.h

Changes since v6:
- the main difference with respect to previous versions is the new approach
  proposed by Eelco to pass the full length of the packet to the eBPF layer
  in the XDP context
- reintroduce multi-buff support to eBPF kselftests
- reintroduce multi-buff support to bpf_xdp_adjust_tail helper
- introduce multi-buffer support to bpf_xdp_copy helper
- rebase on top of bpf-next

Changes since v5:
- rebase on top of bpf-next
- initialize mb bit in xdp_init_buff() and drop per-driver initialization
- drop xdp->mb initialization in xdp_convert_zc_to_xdp_frame()
- postpone introduction of frame_length field in XDP ctx to another series
- minor changes

Changes since v4:
- rebase on top of bpf-next
- introduce xdp_shared_info to build xdp multi-buff instead of using the
  skb_shared_info struct
- introduce frame_length in xdp ctx
- drop previous bpf helpers
- fix bpf_xdp_adjust_tail for xdp multi-buff
- introduce xdp multi-buff self-tests for bpf_xdp_adjust_tail
- fix xdp_return_frame_bulk for xdp multi-buff

Changes since v3:
- rebase on top of bpf-next
- add patch 10/13 to copy back paged data from a xdp multi-buff frame to
  userspace buffer for xdp multi-buff selftests

Changes since v2:
- add throughput measurements
- drop bpf_xdp_adjust_mb_header bpf helper
- introduce selftest for xdp multibuffer
- addressed comments on bpf_xdp_get_frags_count
- introduce xdp multi-buff support to cpumaps

Changes since v1:
- Fix use-after-free in xdp_return_{buff/frame}
- Introduce bpf helpers
- Introduce xdp_mb sample program
- access skb_shared_info->nr_frags only on the last fragment

Changes since RFC:
- squash multi-buffer bit initialization in a single patch
- add mvneta non-linear XDP buff support for tx side

[0] https://netdevconf.info/0x14/session.html?talk-the-path-to-tcp-4k-mtu-and-rx-zerocopy
[1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
[2] https://netdevconf.info/0x14/session.html?tutorial-add-XDP-support-to-a-NIC-driver (XDP multi-buffers section)

Eelco Chaudron (3):
  bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
  bpf: add multi-buffer support to xdp copy helpers
  bpf: update xdp_adjust_tail selftest to include multi-buffer

Lorenzo Bianconi (15):
  net: skbuff: add size metadata to skb_shared_info for xdp
  xdp: introduce flags field in xdp_buff/xdp_frame
  net: mvneta: update mb bit before passing the xdp buffer to eBPF layer
  net: mvneta: simplify mvneta_swbm_add_rx_fragment management
  net: xdp: add xdp_update_skb_shared_info utility routine
  net: marvell: rely on xdp_update_skb_shared_info utility routine
  xdp: add multi-buff support to xdp_return_{buff/frame}
  net: mvneta: add multi buffer support to XDP_TX
  net: mvneta: enable jumbo frames for XDP
  bpf: introduce bpf_xdp_get_buff_len helper
  bpf: move user_size out of bpf_test_init
  bpf: introduce multibuff support to bpf_prog_test_run_xdp()
  bpf: test_run: add xdp_shared_info pointer in bpf_test_finish
    signature
  net: xdp: introduce bpf_xdp_adjust_data helper
  bpf: add bpf_xdp_adjust_data selftest

 drivers/net/ethernet/marvell/mvneta.c         | 204 ++++++++++-------
 include/linux/skbuff.h                        |   6 +-
 include/net/xdp.h                             |  95 +++++++-
 include/uapi/linux/bpf.h                      |  39 ++++
 kernel/trace/bpf_trace.c                      |   3 +
 net/bpf/test_run.c                            | 117 ++++++++--
 net/core/filter.c                             | 216 +++++++++++++++++-
 net/core/xdp.c                                |  76 +++++-
 tools/include/uapi/linux/bpf.h                |  39 ++++
 .../bpf/prog_tests/xdp_adjust_data.c          |  63 +++++
 .../bpf/prog_tests/xdp_adjust_tail.c          | 118 ++++++++++
 .../selftests/bpf/prog_tests/xdp_bpf2bpf.c    | 151 ++++++++----
 .../bpf/progs/test_xdp_adjust_tail_grow.c     |  10 +-
 .../bpf/progs/test_xdp_adjust_tail_shrink.c   |  32 ++-
 .../selftests/bpf/progs/test_xdp_bpf2bpf.c    |   2 +-
 .../bpf/progs/test_xdp_update_frags.c         |  45 ++++
 16 files changed, 1051 insertions(+), 165 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_adjust_data.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_update_frags.c

-- 
2.31.1


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v14 bpf-next 01/18] net: skbuff: add size metadata to skb_shared_info for xdp
  2021-09-10 16:14 [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
@ 2021-09-10 16:14 ` Lorenzo Bianconi
  2021-09-10 16:18   ` Jesper Dangaard Brouer
  2021-09-10 16:14 ` [PATCH v14 bpf-next 02/18] xdp: introduce flags field in xdp_buff/xdp_frame Lorenzo Bianconi
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 58+ messages in thread
From: Lorenzo Bianconi @ 2021-09-10 16:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke

Introduce an xdp_frags_truesize field in the skb_shared_info data
structure to store the xdp_buff/xdp_frame truesize (xdp_frags_truesize
will be used in xdp multi-buff support). In order not to increase the
skb_shared_info size, we use a hole due to skb_shared_info alignment.
Introduce an xdp_frags_size field in the skb_shared_info data structure,
reusing the gso_type field, in order to store the xdp_buff/xdp_frame
paged size. xdp_frags_size will be used in xdp multi-buff support.
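
The gso_type overlay is a plain C union, so the struct size is
unchanged; a minimal illustrative sketch (reduced structs, not the real
skb_shared_info) of why this costs no bytes:

```c
/* Before: a single field. */
struct shared_info_v1 {
	unsigned int gso_type;
};

/* After: xdp_frags_size overlays gso_type in a union. This is safe
 * because an XDP frame never carries GSO state while the field is in
 * use for the paged size. */
struct shared_info_v2 {
	union {
		unsigned int gso_type;
		unsigned int xdp_frags_size;
	};
};
```

Both members alias the same storage, so writing one is visible through
the other and sizeof() is identical for the two layouts.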

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/linux/skbuff.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6bdb0db3e825..769ffd09f975 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -522,13 +522,17 @@ struct skb_shared_info {
 	unsigned short	gso_segs;
 	struct sk_buff	*frag_list;
 	struct skb_shared_hwtstamps hwtstamps;
-	unsigned int	gso_type;
+	union {
+		unsigned int	gso_type;
+		unsigned int	xdp_frags_size;
+	};
 	u32		tskey;
 
 	/*
 	 * Warning : all fields before dataref are cleared in __alloc_skb()
 	 */
 	atomic_t	dataref;
+	unsigned int	xdp_frags_truesize;
 
 	/* Intermediate layers must ensure that destructor_arg
 	 * remains valid until skb destructor */
-- 
2.31.1



* [PATCH v14 bpf-next 02/18] xdp: introduce flags field in xdp_buff/xdp_frame
  2021-09-10 16:14 [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
  2021-09-10 16:14 ` [PATCH v14 bpf-next 01/18] net: skbuff: add size metadata to skb_shared_info for xdp Lorenzo Bianconi
@ 2021-09-10 16:14 ` Lorenzo Bianconi
  2021-09-10 16:19   ` Jesper Dangaard Brouer
  2021-09-10 16:14 ` [PATCH v14 bpf-next 03/18] net: mvneta: update mb bit before passing the xdp buffer to eBPF layer Lorenzo Bianconi
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 58+ messages in thread
From: Lorenzo Bianconi @ 2021-09-10 16:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke

Introduce a flags field in the xdp_frame and xdp_buff data structures
to define additional buffer features. At the moment the only supported
buffer feature is the multi-buffer bit (mb), which specifies whether
this is a linear buffer (mb = 0) or a multi-buffer frame (mb = 1). In
the latter case the driver is expected to initialize the
skb_shared_info structure at the end of the first buffer to link
together subsequent buffers belonging to the same frame.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/net/xdp.h | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index ad5b02dcb6f4..fd31cc33aa4f 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -66,6 +66,10 @@ struct xdp_txq_info {
 	struct net_device *dev;
 };
 
+enum xdp_buff_flags {
+	XDP_FLAGS_MULTI_BUFF	= BIT(0), /* non-linear xdp buff */
+};
+
 struct xdp_buff {
 	void *data;
 	void *data_end;
@@ -74,13 +78,30 @@ struct xdp_buff {
 	struct xdp_rxq_info *rxq;
 	struct xdp_txq_info *txq;
 	u32 frame_sz; /* frame size to deduce data_hard_end/reserved tailroom*/
+	u32 flags; /* supported values defined in xdp_buff_flags */
 };
 
+static __always_inline bool xdp_buff_is_mb(struct xdp_buff *xdp)
+{
+	return !!(xdp->flags & XDP_FLAGS_MULTI_BUFF);
+}
+
+static __always_inline void xdp_buff_set_mb(struct xdp_buff *xdp)
+{
+	xdp->flags |= XDP_FLAGS_MULTI_BUFF;
+}
+
+static __always_inline void xdp_buff_clear_mb(struct xdp_buff *xdp)
+{
+	xdp->flags &= ~XDP_FLAGS_MULTI_BUFF;
+}
+
 static __always_inline void
 xdp_init_buff(struct xdp_buff *xdp, u32 frame_sz, struct xdp_rxq_info *rxq)
 {
 	xdp->frame_sz = frame_sz;
 	xdp->rxq = rxq;
+	xdp->flags = 0;
 }
 
 static __always_inline void
@@ -122,8 +143,14 @@ struct xdp_frame {
 	 */
 	struct xdp_mem_info mem;
 	struct net_device *dev_rx; /* used by cpumap */
+	u32 flags; /* supported values defined in xdp_buff_flags */
 };
 
+static __always_inline bool xdp_frame_is_mb(struct xdp_frame *frame)
+{
+	return !!(frame->flags & XDP_FLAGS_MULTI_BUFF);
+}
+
 #define XDP_BULK_QUEUE_SIZE	16
 struct xdp_frame_bulk {
 	int count;
@@ -180,6 +207,7 @@ void xdp_convert_frame_to_buff(struct xdp_frame *frame, struct xdp_buff *xdp)
 	xdp->data_end = frame->data + frame->len;
 	xdp->data_meta = frame->data - frame->metasize;
 	xdp->frame_sz = frame->frame_sz;
+	xdp->flags = frame->flags;
 }
 
 static inline
@@ -206,6 +234,7 @@ int xdp_update_frame_from_buff(struct xdp_buff *xdp,
 	xdp_frame->headroom = headroom - sizeof(*xdp_frame);
 	xdp_frame->metasize = metasize;
 	xdp_frame->frame_sz = xdp->frame_sz;
+	xdp_frame->flags = xdp->flags;
 
 	return 0;
 }
-- 
2.31.1



* [PATCH v14 bpf-next 03/18] net: mvneta: update mb bit before passing the xdp buffer to eBPF layer
  2021-09-10 16:14 [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
  2021-09-10 16:14 ` [PATCH v14 bpf-next 01/18] net: skbuff: add size metadata to skb_shared_info for xdp Lorenzo Bianconi
  2021-09-10 16:14 ` [PATCH v14 bpf-next 02/18] xdp: introduce flags field in xdp_buff/xdp_frame Lorenzo Bianconi
@ 2021-09-10 16:14 ` Lorenzo Bianconi
  2021-09-20  8:25   ` Shay Agroskin
  2021-09-10 16:14 ` [PATCH v14 bpf-next 04/18] net: mvneta: simplify mvneta_swbm_add_rx_fragment management Lorenzo Bianconi
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 58+ messages in thread
From: Lorenzo Bianconi @ 2021-09-10 16:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke

Update the multi-buffer bit (mb) in xdp_buff to notify the XDP/eBPF
layer and XDP remote drivers if this is a "non-linear" XDP buffer.
Access skb_shared_info only if the xdp_buff mb bit is set in order to
avoid possible cache misses.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/marvell/mvneta.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 9d460a270601..0c7b84ca6efc 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2037,9 +2037,14 @@ mvneta_xdp_put_buff(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 {
 	int i;
 
+	if (likely(!xdp_buff_is_mb(xdp)))
+		goto out;
+
 	for (i = 0; i < sinfo->nr_frags; i++)
 		page_pool_put_full_page(rxq->page_pool,
 					skb_frag_page(&sinfo->frags[i]), true);
+
+out:
 	page_pool_put_page(rxq->page_pool, virt_to_head_page(xdp->data),
 			   sync_len, true);
 }
@@ -2241,7 +2246,6 @@ mvneta_swbm_rx_frame(struct mvneta_port *pp,
 	int data_len = -MVNETA_MH_SIZE, len;
 	struct net_device *dev = pp->dev;
 	enum dma_data_direction dma_dir;
-	struct skb_shared_info *sinfo;
 
 	if (*size > MVNETA_MAX_RX_BUF_SIZE) {
 		len = MVNETA_MAX_RX_BUF_SIZE;
@@ -2261,11 +2265,9 @@ mvneta_swbm_rx_frame(struct mvneta_port *pp,
 
 	/* Prefetch header */
 	prefetch(data);
+	xdp_buff_clear_mb(xdp);
 	xdp_prepare_buff(xdp, data, pp->rx_offset_correction + MVNETA_MH_SIZE,
 			 data_len, false);
-
-	sinfo = xdp_get_shared_info_from_buff(xdp);
-	sinfo->nr_frags = 0;
 }
 
 static void
@@ -2299,6 +2301,9 @@ mvneta_swbm_add_rx_fragment(struct mvneta_port *pp,
 		skb_frag_off_set(frag, pp->rx_offset_correction);
 		skb_frag_size_set(frag, data_len);
 		__skb_frag_set_page(frag, page);
+
+		if (!xdp_buff_is_mb(xdp))
+			xdp_buff_set_mb(xdp);
 	} else {
 		page_pool_put_full_page(rxq->page_pool, page, true);
 	}
@@ -2320,8 +2325,12 @@ mvneta_swbm_build_skb(struct mvneta_port *pp, struct page_pool *pool,
 		      struct xdp_buff *xdp, u32 desc_status)
 {
 	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
-	int i, num_frags = sinfo->nr_frags;
 	struct sk_buff *skb;
+	u8 num_frags;
+	int i;
+
+	if (unlikely(xdp_buff_is_mb(xdp)))
+		num_frags = sinfo->nr_frags;
 
 	skb = build_skb(xdp->data_hard_start, PAGE_SIZE);
 	if (!skb)
@@ -2333,6 +2342,9 @@ mvneta_swbm_build_skb(struct mvneta_port *pp, struct page_pool *pool,
 	skb_put(skb, xdp->data_end - xdp->data);
 	skb->ip_summed = mvneta_rx_csum(pp, desc_status);
 
+	if (likely(!xdp_buff_is_mb(xdp)))
+		goto out;
+
 	for (i = 0; i < num_frags; i++) {
 		skb_frag_t *frag = &sinfo->frags[i];
 
@@ -2341,6 +2353,7 @@ mvneta_swbm_build_skb(struct mvneta_port *pp, struct page_pool *pool,
 				skb_frag_size(frag), PAGE_SIZE);
 	}
 
+out:
 	return skb;
 }
 
-- 
2.31.1



* [PATCH v14 bpf-next 04/18] net: mvneta: simplify mvneta_swbm_add_rx_fragment management
  2021-09-10 16:14 [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (2 preceding siblings ...)
  2021-09-10 16:14 ` [PATCH v14 bpf-next 03/18] net: mvneta: update mb bit before passing the xdp buffer to eBPF layer Lorenzo Bianconi
@ 2021-09-10 16:14 ` Lorenzo Bianconi
  2021-09-10 16:14 ` [PATCH v14 bpf-next 05/18] net: xdp: add xdp_update_skb_shared_info utility routine Lorenzo Bianconi
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 58+ messages in thread
From: Lorenzo Bianconi @ 2021-09-10 16:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke

Relying on the xdp mb bit, remove the skb_shared_info structure
allocated on the stack in the mvneta_rx_swbm routine and simplify
mvneta_swbm_add_rx_fragment by accessing skb_shared_info in the
xdp_buff structure directly. There is no performance penalty in this
approach since mvneta_swbm_add_rx_fragment runs only for the
multi-buff use case.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/marvell/mvneta.c | 42 ++++++++++-----------------
 1 file changed, 15 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 0c7b84ca6efc..99976679c6e5 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2032,9 +2032,9 @@ int mvneta_rx_refill_queue(struct mvneta_port *pp, struct mvneta_rx_queue *rxq)
 
 static void
 mvneta_xdp_put_buff(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
-		    struct xdp_buff *xdp, struct skb_shared_info *sinfo,
-		    int sync_len)
+		    struct xdp_buff *xdp, int sync_len)
 {
+	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
 	int i;
 
 	if (likely(!xdp_buff_is_mb(xdp)))
@@ -2182,7 +2182,6 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 	       struct bpf_prog *prog, struct xdp_buff *xdp,
 	       u32 frame_sz, struct mvneta_stats *stats)
 {
-	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
 	unsigned int len, data_len, sync;
 	u32 ret, act;
 
@@ -2203,7 +2202,7 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 
 		err = xdp_do_redirect(pp->dev, xdp, prog);
 		if (unlikely(err)) {
-			mvneta_xdp_put_buff(pp, rxq, xdp, sinfo, sync);
+			mvneta_xdp_put_buff(pp, rxq, xdp, sync);
 			ret = MVNETA_XDP_DROPPED;
 		} else {
 			ret = MVNETA_XDP_REDIR;
@@ -2214,7 +2213,7 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 	case XDP_TX:
 		ret = mvneta_xdp_xmit_back(pp, xdp);
 		if (ret != MVNETA_XDP_TX)
-			mvneta_xdp_put_buff(pp, rxq, xdp, sinfo, sync);
+			mvneta_xdp_put_buff(pp, rxq, xdp, sync);
 		break;
 	default:
 		bpf_warn_invalid_xdp_action(act);
@@ -2223,7 +2222,7 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 		trace_xdp_exception(pp->dev, prog, act);
 		fallthrough;
 	case XDP_DROP:
-		mvneta_xdp_put_buff(pp, rxq, xdp, sinfo, sync);
+		mvneta_xdp_put_buff(pp, rxq, xdp, sync);
 		ret = MVNETA_XDP_DROPPED;
 		stats->xdp_drop++;
 		break;
@@ -2275,9 +2274,9 @@ mvneta_swbm_add_rx_fragment(struct mvneta_port *pp,
 			    struct mvneta_rx_desc *rx_desc,
 			    struct mvneta_rx_queue *rxq,
 			    struct xdp_buff *xdp, int *size,
-			    struct skb_shared_info *xdp_sinfo,
 			    struct page *page)
 {
+	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
 	struct net_device *dev = pp->dev;
 	enum dma_data_direction dma_dir;
 	int data_len, len;
@@ -2295,8 +2294,11 @@ mvneta_swbm_add_rx_fragment(struct mvneta_port *pp,
 				len, dma_dir);
 	rx_desc->buf_phys_addr = 0;
 
-	if (data_len > 0 && xdp_sinfo->nr_frags < MAX_SKB_FRAGS) {
-		skb_frag_t *frag = &xdp_sinfo->frags[xdp_sinfo->nr_frags++];
+	if (!xdp_buff_is_mb(xdp))
+		sinfo->nr_frags = 0;
+
+	if (data_len > 0 && sinfo->nr_frags < MAX_SKB_FRAGS) {
+		skb_frag_t *frag = &sinfo->frags[sinfo->nr_frags++];
 
 		skb_frag_off_set(frag, pp->rx_offset_correction);
 		skb_frag_size_set(frag, data_len);
@@ -2307,16 +2309,6 @@ mvneta_swbm_add_rx_fragment(struct mvneta_port *pp,
 	} else {
 		page_pool_put_full_page(rxq->page_pool, page, true);
 	}
-
-	/* last fragment */
-	if (len == *size) {
-		struct skb_shared_info *sinfo;
-
-		sinfo = xdp_get_shared_info_from_buff(xdp);
-		sinfo->nr_frags = xdp_sinfo->nr_frags;
-		memcpy(sinfo->frags, xdp_sinfo->frags,
-		       sinfo->nr_frags * sizeof(skb_frag_t));
-	}
 	*size -= len;
 }
 
@@ -2364,7 +2356,6 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 {
 	int rx_proc = 0, rx_todo, refill, size = 0;
 	struct net_device *dev = pp->dev;
-	struct skb_shared_info sinfo;
 	struct mvneta_stats ps = {};
 	struct bpf_prog *xdp_prog;
 	u32 desc_status, frame_sz;
@@ -2373,8 +2364,6 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 	xdp_init_buff(&xdp_buf, PAGE_SIZE, &rxq->xdp_rxq);
 	xdp_buf.data_hard_start = NULL;
 
-	sinfo.nr_frags = 0;
-
 	/* Get number of received packets */
 	rx_todo = mvneta_rxq_busy_desc_num_get(pp, rxq);
 
@@ -2416,7 +2405,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 			}
 
 			mvneta_swbm_add_rx_fragment(pp, rx_desc, rxq, &xdp_buf,
-						    &size, &sinfo, page);
+						    &size, page);
 		} /* Middle or Last descriptor */
 
 		if (!(rx_status & MVNETA_RXD_LAST_DESC))
@@ -2424,7 +2413,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 			continue;
 
 		if (size) {
-			mvneta_xdp_put_buff(pp, rxq, &xdp_buf, &sinfo, -1);
+			mvneta_xdp_put_buff(pp, rxq, &xdp_buf, -1);
 			goto next;
 		}
 
@@ -2436,7 +2425,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 		if (IS_ERR(skb)) {
 			struct mvneta_pcpu_stats *stats = this_cpu_ptr(pp->stats);
 
-			mvneta_xdp_put_buff(pp, rxq, &xdp_buf, &sinfo, -1);
+			mvneta_xdp_put_buff(pp, rxq, &xdp_buf, -1);
 
 			u64_stats_update_begin(&stats->syncp);
 			stats->es.skb_alloc_error++;
@@ -2453,11 +2442,10 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 		napi_gro_receive(napi, skb);
 next:
 		xdp_buf.data_hard_start = NULL;
-		sinfo.nr_frags = 0;
 	}
 
 	if (xdp_buf.data_hard_start)
-		mvneta_xdp_put_buff(pp, rxq, &xdp_buf, &sinfo, -1);
+		mvneta_xdp_put_buff(pp, rxq, &xdp_buf, -1);
 
 	if (ps.xdp_redirect)
 		xdp_do_flush_map();
-- 
2.31.1



* [PATCH v14 bpf-next 05/18] net: xdp: add xdp_update_skb_shared_info utility routine
  2021-09-10 16:14 [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (3 preceding siblings ...)
  2021-09-10 16:14 ` [PATCH v14 bpf-next 04/18] net: mvneta: simplify mvneta_swbm_add_rx_fragment management Lorenzo Bianconi
@ 2021-09-10 16:14 ` Lorenzo Bianconi
  2021-09-10 16:14 ` [PATCH v14 bpf-next 06/18] net: marvell: rely on " Lorenzo Bianconi
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 58+ messages in thread
From: Lorenzo Bianconi @ 2021-09-10 16:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke

Introduce the xdp_update_skb_shared_info routine to update the frags
array metadata in the skb_shared_info data structure when converting
an xdp_buff or xdp_frame to an skb.
Given the current skb_shared_info layout in xdp_frame/xdp_buff and the
xdp multi-buff support, there is no need to run skb_add_rx_frag() and
reset the frags array when converting the buffer to an skb, since the
frags array is in the same position for xdp_buff/xdp_frame and for the
skb; we just need to update the memory metadata.
Introduce the XDP_FLAGS_FRAGS_PF_MEMALLOC flag in xdp_buff_flags in
order to mark the xdp_buff or xdp_frame as under memory pressure if
pages of the frags array are under memory pressure. Doing so we can
avoid looping over all fragments in the xdp_update_skb_shared_info
routine. The driver is expected to set the flag while constructing the
xdp_buff, using the xdp_buff_set_frag_pfmemalloc utility routine.
Rely on xdp_update_skb_shared_info in the __xdp_build_skb_from_frame
routine when converting a multi-buff xdp_frame to an skb after
performing an XDP_REDIRECT.
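
The bookkeeping done by the new routine is small enough to model
directly; the sketch below mirrors its logic on a reduced sk_buff
(mini_skb and mini_update_shared_info are made-up names, and nr_frags
stands in for skb_shinfo(skb)->nr_frags):

```c
#include <stdbool.h>

/* Reduced sk_buff: only the fields xdp_update_skb_shared_info touches. */
struct mini_skb {
	unsigned int len;       /* total data length */
	unsigned int data_len;  /* paged (non-linear) data length */
	unsigned int truesize;  /* memory actually consumed */
	bool pfmemalloc;        /* built from emergency reserves */
	unsigned char nr_frags; /* fragment count in shared info */
};

/* The frags array is already in place from the xdp_buff/xdp_frame, so
 * only the metadata needs updating: account the paged size and
 * truesize, record the fragment count, and propagate pfmemalloc. */
static void mini_update_shared_info(struct mini_skb *skb,
				    unsigned char nr_frags,
				    unsigned int size,
				    unsigned int truesize,
				    bool pfmemalloc)
{
	skb->nr_frags = nr_frags;
	skb->len += size;
	skb->data_len += size;
	skb->truesize += truesize;
	skb->pfmemalloc |= pfmemalloc;
}
```

Because pfmemalloc is carried as a single frame-level flag, the update
is O(1) instead of a loop over every fragment's page.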

Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/net/xdp.h | 33 ++++++++++++++++++++++++++++++++-
 net/core/xdp.c    | 17 +++++++++++++++++
 2 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index fd31cc33aa4f..c4d68b693431 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -67,7 +67,10 @@ struct xdp_txq_info {
 };
 
 enum xdp_buff_flags {
-	XDP_FLAGS_MULTI_BUFF	= BIT(0), /* non-linear xdp buff */
+	XDP_FLAGS_MULTI_BUFF		= BIT(0), /* non-linear xdp buff */
+	XDP_FLAGS_FRAGS_PF_MEMALLOC	= BIT(1), /* xdp multi-buff paged memory
+						   * is under pressure
+						   */
 };
 
 struct xdp_buff {
@@ -96,6 +99,16 @@ static __always_inline void xdp_buff_clear_mb(struct xdp_buff *xdp)
 	xdp->flags &= ~XDP_FLAGS_MULTI_BUFF;
 }
 
+static __always_inline bool xdp_buff_is_frag_pfmemalloc(struct xdp_buff *xdp)
+{
+	return !!(xdp->flags & XDP_FLAGS_FRAGS_PF_MEMALLOC);
+}
+
+static __always_inline void xdp_buff_set_frag_pfmemalloc(struct xdp_buff *xdp)
+{
+	xdp->flags |= XDP_FLAGS_FRAGS_PF_MEMALLOC;
+}
+
 static __always_inline void
 xdp_init_buff(struct xdp_buff *xdp, u32 frame_sz, struct xdp_rxq_info *rxq)
 {
@@ -151,6 +164,11 @@ static __always_inline bool xdp_frame_is_mb(struct xdp_frame *frame)
 	return !!(frame->flags & XDP_FLAGS_MULTI_BUFF);
 }
 
+static __always_inline bool xdp_frame_is_frag_pfmemalloc(struct xdp_frame *frame)
+{
+	return !!(frame->flags & XDP_FLAGS_FRAGS_PF_MEMALLOC);
+}
+
 #define XDP_BULK_QUEUE_SIZE	16
 struct xdp_frame_bulk {
 	int count;
@@ -186,6 +204,19 @@ static inline void xdp_scrub_frame(struct xdp_frame *frame)
 	frame->dev_rx = NULL;
 }
 
+static inline void
+xdp_update_skb_shared_info(struct sk_buff *skb, u8 nr_frags,
+			   unsigned int size, unsigned int truesize,
+			   bool pfmemalloc)
+{
+	skb_shinfo(skb)->nr_frags = nr_frags;
+
+	skb->len += size;
+	skb->data_len += size;
+	skb->truesize += truesize;
+	skb->pfmemalloc |= pfmemalloc;
+}
+
 /* Avoids inlining WARN macro in fast-path */
 void xdp_warn(const char *msg, const char *func, const int line);
 #define XDP_WARN(msg) xdp_warn(msg, __func__, __LINE__)
diff --git a/net/core/xdp.c b/net/core/xdp.c
index cc92ccb38432..da10b7c25054 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -531,8 +531,20 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 					   struct sk_buff *skb,
 					   struct net_device *dev)
 {
+	unsigned int frag_size, frag_truesize;
 	unsigned int headroom, frame_size;
 	void *hard_start;
+	u8 nr_frags;
+
+	/* xdp multi-buff frame */
+	if (unlikely(xdp_frame_is_mb(xdpf))) {
+		struct skb_shared_info *sinfo;
+
+		sinfo = xdp_get_shared_info_from_frame(xdpf);
+		frag_truesize = sinfo->xdp_frags_truesize;
+		frag_size = sinfo->xdp_frags_size;
+		nr_frags = sinfo->nr_frags;
+	}
 
 	/* Part of headroom was reserved to xdpf */
 	headroom = sizeof(*xdpf) + xdpf->headroom;
@@ -552,6 +564,11 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 	if (xdpf->metasize)
 		skb_metadata_set(skb, xdpf->metasize);
 
+	if (unlikely(xdp_frame_is_mb(xdpf)))
+		xdp_update_skb_shared_info(skb, nr_frags,
+					   frag_size, frag_truesize,
+					   xdp_frame_is_frag_pfmemalloc(xdpf));
+
 	/* Essential SKB info: protocol and skb->dev */
 	skb->protocol = eth_type_trans(skb, dev);
 
-- 
2.31.1



* [PATCH v14 bpf-next 06/18] net: marvell: rely on xdp_update_skb_shared_info utility routine
  2021-09-10 16:14 [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (4 preceding siblings ...)
  2021-09-10 16:14 ` [PATCH v14 bpf-next 05/18] net: xdp: add xdp_update_skb_shared_info utility routine Lorenzo Bianconi
@ 2021-09-10 16:14 ` Lorenzo Bianconi
  2021-09-10 16:14 ` [PATCH v14 bpf-next 07/18] xdp: add multi-buff support to xdp_return_{buff/frame} Lorenzo Bianconi
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 58+ messages in thread
From: Lorenzo Bianconi @ 2021-09-10 16:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke

Rely on the xdp_update_skb_shared_info() utility routine to avoid
resetting the frags array in the skb_shared_info structure when
building the skb in mvneta_swbm_build_skb(). The frags array is
expected to have been initialized by the receiving driver while
building the xdp_buff, so here we just need to update the memory
metadata.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/marvell/mvneta.c | 35 +++++++++++++++------------
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 99976679c6e5..0d52473cc066 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2304,11 +2304,19 @@ mvneta_swbm_add_rx_fragment(struct mvneta_port *pp,
 		skb_frag_size_set(frag, data_len);
 		__skb_frag_set_page(frag, page);
 
-		if (!xdp_buff_is_mb(xdp))
+		if (!xdp_buff_is_mb(xdp)) {
+			sinfo->xdp_frags_size = *size;
 			xdp_buff_set_mb(xdp);
+		}
+		if (page_is_pfmemalloc(page))
+			xdp_buff_set_frag_pfmemalloc(xdp);
 	} else {
 		page_pool_put_full_page(rxq->page_pool, page, true);
 	}
+
+	/* last fragment */
+	if (len == *size)
+		sinfo->xdp_frags_truesize = sinfo->nr_frags * PAGE_SIZE;
 	*size -= len;
 }
 
@@ -2316,13 +2324,18 @@ static struct sk_buff *
 mvneta_swbm_build_skb(struct mvneta_port *pp, struct page_pool *pool,
 		      struct xdp_buff *xdp, u32 desc_status)
 {
-	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
+	unsigned int size, truesize;
 	struct sk_buff *skb;
 	u8 num_frags;
-	int i;
 
-	if (unlikely(xdp_buff_is_mb(xdp)))
+	if (unlikely(xdp_buff_is_mb(xdp))) {
+		struct skb_shared_info *sinfo;
+
+		sinfo = xdp_get_shared_info_from_buff(xdp);
+		truesize = sinfo->xdp_frags_truesize;
+		size = sinfo->xdp_frags_size;
 		num_frags = sinfo->nr_frags;
+	}
 
 	skb = build_skb(xdp->data_hard_start, PAGE_SIZE);
 	if (!skb)
@@ -2334,18 +2347,10 @@ mvneta_swbm_build_skb(struct mvneta_port *pp, struct page_pool *pool,
 	skb_put(skb, xdp->data_end - xdp->data);
 	skb->ip_summed = mvneta_rx_csum(pp, desc_status);
 
-	if (likely(!xdp_buff_is_mb(xdp)))
-		goto out;
-
-	for (i = 0; i < num_frags; i++) {
-		skb_frag_t *frag = &sinfo->frags[i];
-
-		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
-				skb_frag_page(frag), skb_frag_off(frag),
-				skb_frag_size(frag), PAGE_SIZE);
-	}
+	if (unlikely(xdp_buff_is_mb(xdp)))
+		xdp_update_skb_shared_info(skb, num_frags, size, truesize,
+					   xdp_buff_is_frag_pfmemalloc(xdp));
 
-out:
 	return skb;
 }
 
-- 
2.31.1



* [PATCH v14 bpf-next 07/18] xdp: add multi-buff support to xdp_return_{buff/frame}
  2021-09-10 16:14 [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (5 preceding siblings ...)
  2021-09-10 16:14 ` [PATCH v14 bpf-next 06/18] net: marvell: rely on " Lorenzo Bianconi
@ 2021-09-10 16:14 ` Lorenzo Bianconi
  2021-09-10 16:14 ` [PATCH v14 bpf-next 08/18] net: mvneta: add multi buffer support to XDP_TX Lorenzo Bianconi
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 58+ messages in thread
From: Lorenzo Bianconi @ 2021-09-10 16:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke

Take into account whether the received xdp_buff/xdp_frame is non-linear
when recycling/returning the frame memory to the allocator or when
flushing it into an xdp_frame_bulk.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/net/xdp.h | 18 ++++++++++++++--
 net/core/xdp.c    | 54 ++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 69 insertions(+), 3 deletions(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index c4d68b693431..e44964329fd1 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -306,10 +306,24 @@ void __xdp_release_frame(void *data, struct xdp_mem_info *mem);
 static inline void xdp_release_frame(struct xdp_frame *xdpf)
 {
 	struct xdp_mem_info *mem = &xdpf->mem;
+	struct skb_shared_info *sinfo;
+	int i;
 
 	/* Curr only page_pool needs this */
-	if (mem->type == MEM_TYPE_PAGE_POOL)
-		__xdp_release_frame(xdpf->data, mem);
+	if (mem->type != MEM_TYPE_PAGE_POOL)
+		return;
+
+	if (likely(!xdp_frame_is_mb(xdpf)))
+		goto out;
+
+	sinfo = xdp_get_shared_info_from_frame(xdpf);
+	for (i = 0; i < sinfo->nr_frags; i++) {
+		struct page *page = skb_frag_page(&sinfo->frags[i]);
+
+		__xdp_release_frame(page_address(page), mem);
+	}
+out:
+	__xdp_release_frame(xdpf->data, mem);
 }
 
 int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
diff --git a/net/core/xdp.c b/net/core/xdp.c
index da10b7c25054..ffdc776a9e95 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -376,12 +376,38 @@ static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
 
 void xdp_return_frame(struct xdp_frame *xdpf)
 {
+	struct skb_shared_info *sinfo;
+	int i;
+
+	if (likely(!xdp_frame_is_mb(xdpf)))
+		goto out;
+
+	sinfo = xdp_get_shared_info_from_frame(xdpf);
+	for (i = 0; i < sinfo->nr_frags; i++) {
+		struct page *page = skb_frag_page(&sinfo->frags[i]);
+
+		__xdp_return(page_address(page), &xdpf->mem, false, NULL);
+	}
+out:
 	__xdp_return(xdpf->data, &xdpf->mem, false, NULL);
 }
 EXPORT_SYMBOL_GPL(xdp_return_frame);
 
 void xdp_return_frame_rx_napi(struct xdp_frame *xdpf)
 {
+	struct skb_shared_info *sinfo;
+	int i;
+
+	if (likely(!xdp_frame_is_mb(xdpf)))
+		goto out;
+
+	sinfo = xdp_get_shared_info_from_frame(xdpf);
+	for (i = 0; i < sinfo->nr_frags; i++) {
+		struct page *page = skb_frag_page(&sinfo->frags[i]);
+
+		__xdp_return(page_address(page), &xdpf->mem, true, NULL);
+	}
+out:
 	__xdp_return(xdpf->data, &xdpf->mem, true, NULL);
 }
 EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi);
@@ -417,7 +443,7 @@ void xdp_return_frame_bulk(struct xdp_frame *xdpf,
 	struct xdp_mem_allocator *xa;
 
 	if (mem->type != MEM_TYPE_PAGE_POOL) {
-		__xdp_return(xdpf->data, &xdpf->mem, false, NULL);
+		xdp_return_frame(xdpf);
 		return;
 	}
 
@@ -436,12 +462,38 @@ void xdp_return_frame_bulk(struct xdp_frame *xdpf,
 		bq->xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
 	}
 
+	if (unlikely(xdp_frame_is_mb(xdpf))) {
+		struct skb_shared_info *sinfo;
+		int i;
+
+		sinfo = xdp_get_shared_info_from_frame(xdpf);
+		for (i = 0; i < sinfo->nr_frags; i++) {
+			skb_frag_t *frag = &sinfo->frags[i];
+
+			bq->q[bq->count++] = skb_frag_address(frag);
+			if (bq->count == XDP_BULK_QUEUE_SIZE)
+				xdp_flush_frame_bulk(bq);
+		}
+	}
 	bq->q[bq->count++] = xdpf->data;
 }
 EXPORT_SYMBOL_GPL(xdp_return_frame_bulk);
 
 void xdp_return_buff(struct xdp_buff *xdp)
 {
+	struct skb_shared_info *sinfo;
+	int i;
+
+	if (likely(!xdp_buff_is_mb(xdp)))
+		goto out;
+
+	sinfo = xdp_get_shared_info_from_buff(xdp);
+	for (i = 0; i < sinfo->nr_frags; i++) {
+		struct page *page = skb_frag_page(&sinfo->frags[i]);
+
+		__xdp_return(page_address(page), &xdp->rxq->mem, true, xdp);
+	}
+out:
 	__xdp_return(xdp->data, &xdp->rxq->mem, true, xdp);
 }
 
-- 
2.31.1



* [PATCH v14 bpf-next 08/18] net: mvneta: add multi buffer support to XDP_TX
  2021-09-10 16:14 [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (6 preceding siblings ...)
  2021-09-10 16:14 ` [PATCH v14 bpf-next 07/18] xdp: add multi-buff support to xdp_return_{buff/frame} Lorenzo Bianconi
@ 2021-09-10 16:14 ` Lorenzo Bianconi
  2021-09-10 16:14 ` [PATCH v14 bpf-next 09/18] net: mvneta: enable jumbo frames for XDP Lorenzo Bianconi
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 58+ messages in thread
From: Lorenzo Bianconi @ 2021-09-10 16:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke

Introduce the capability to map a non-linear xdp buffer in
mvneta_xdp_submit_frame() for XDP_TX and XDP_REDIRECT.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/marvell/mvneta.c | 112 +++++++++++++++++---------
 1 file changed, 76 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 0d52473cc066..25f63f9efdf0 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1856,8 +1856,8 @@ static void mvneta_txq_bufs_free(struct mvneta_port *pp,
 			bytes_compl += buf->skb->len;
 			pkts_compl++;
 			dev_kfree_skb_any(buf->skb);
-		} else if (buf->type == MVNETA_TYPE_XDP_TX ||
-			   buf->type == MVNETA_TYPE_XDP_NDO) {
+		} else if ((buf->type == MVNETA_TYPE_XDP_TX ||
+			    buf->type == MVNETA_TYPE_XDP_NDO) && buf->xdpf) {
 			if (napi && buf->type == MVNETA_TYPE_XDP_TX)
 				xdp_return_frame_rx_napi(buf->xdpf);
 			else
@@ -2051,47 +2051,87 @@ mvneta_xdp_put_buff(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 
 static int
 mvneta_xdp_submit_frame(struct mvneta_port *pp, struct mvneta_tx_queue *txq,
-			struct xdp_frame *xdpf, bool dma_map)
+			struct xdp_frame *xdpf, int *nxmit_byte, bool dma_map)
 {
-	struct mvneta_tx_desc *tx_desc;
-	struct mvneta_tx_buf *buf;
-	dma_addr_t dma_addr;
+	struct skb_shared_info *sinfo = xdp_get_shared_info_from_frame(xdpf);
+	struct device *dev = pp->dev->dev.parent;
+	struct mvneta_tx_desc *tx_desc = NULL;
+	int i, num_frames = 1;
+	struct page *page;
+
+	if (unlikely(xdp_frame_is_mb(xdpf)))
+		num_frames += sinfo->nr_frags;
 
-	if (txq->count >= txq->tx_stop_threshold)
+	if (txq->count + num_frames >= txq->size)
 		return MVNETA_XDP_DROPPED;
 
-	tx_desc = mvneta_txq_next_desc_get(txq);
+	for (i = 0; i < num_frames; i++) {
+		struct mvneta_tx_buf *buf = &txq->buf[txq->txq_put_index];
+		skb_frag_t *frag = NULL;
+		int len = xdpf->len;
+		dma_addr_t dma_addr;
 
-	buf = &txq->buf[txq->txq_put_index];
-	if (dma_map) {
-		/* ndo_xdp_xmit */
-		dma_addr = dma_map_single(pp->dev->dev.parent, xdpf->data,
-					  xdpf->len, DMA_TO_DEVICE);
-		if (dma_mapping_error(pp->dev->dev.parent, dma_addr)) {
-			mvneta_txq_desc_put(txq);
-			return MVNETA_XDP_DROPPED;
+		if (unlikely(i)) { /* paged area */
+			frag = &sinfo->frags[i - 1];
+			len = skb_frag_size(frag);
 		}
-		buf->type = MVNETA_TYPE_XDP_NDO;
-	} else {
-		struct page *page = virt_to_page(xdpf->data);
 
-		dma_addr = page_pool_get_dma_addr(page) +
-			   sizeof(*xdpf) + xdpf->headroom;
-		dma_sync_single_for_device(pp->dev->dev.parent, dma_addr,
-					   xdpf->len, DMA_BIDIRECTIONAL);
-		buf->type = MVNETA_TYPE_XDP_TX;
+		tx_desc = mvneta_txq_next_desc_get(txq);
+		if (dma_map) {
+			/* ndo_xdp_xmit */
+			void *data;
+
+			data = unlikely(frag) ? skb_frag_address(frag)
+					      : xdpf->data;
+			dma_addr = dma_map_single(dev, data, len,
+						  DMA_TO_DEVICE);
+			if (dma_mapping_error(dev, dma_addr)) {
+				mvneta_txq_desc_put(txq);
+				goto unmap;
+			}
+
+			buf->type = MVNETA_TYPE_XDP_NDO;
+		} else {
+			page = unlikely(frag) ? skb_frag_page(frag)
+					      : virt_to_page(xdpf->data);
+			dma_addr = page_pool_get_dma_addr(page);
+			if (unlikely(frag))
+				dma_addr += skb_frag_off(frag);
+			else
+				dma_addr += sizeof(*xdpf) + xdpf->headroom;
+			dma_sync_single_for_device(dev, dma_addr, len,
+						   DMA_BIDIRECTIONAL);
+			buf->type = MVNETA_TYPE_XDP_TX;
+		}
+		buf->xdpf = unlikely(i) ? NULL : xdpf;
+
+		tx_desc->command = unlikely(i) ? 0 : MVNETA_TXD_F_DESC;
+		tx_desc->buf_phys_addr = dma_addr;
+		tx_desc->data_size = len;
+		*nxmit_byte += len;
+
+		mvneta_txq_inc_put(txq);
 	}
-	buf->xdpf = xdpf;
 
-	tx_desc->command = MVNETA_TXD_FLZ_DESC;
-	tx_desc->buf_phys_addr = dma_addr;
-	tx_desc->data_size = xdpf->len;
+	/* last descriptor */
+	if (likely(tx_desc))
+		tx_desc->command |= MVNETA_TXD_L_DESC | MVNETA_TXD_Z_PAD;
 
-	mvneta_txq_inc_put(txq);
-	txq->pending++;
-	txq->count++;
+	txq->pending += num_frames;
+	txq->count += num_frames;
 
 	return MVNETA_XDP_TX;
+
+unmap:
+	for (i--; i >= 0; i--) {
+		mvneta_txq_desc_put(txq);
+		tx_desc = txq->descs + txq->next_desc_to_proc;
+		dma_unmap_single(dev, tx_desc->buf_phys_addr,
+				 tx_desc->data_size,
+				 DMA_TO_DEVICE);
+	}
+
+	return MVNETA_XDP_DROPPED;
 }
 
 static int
@@ -2100,8 +2140,8 @@ mvneta_xdp_xmit_back(struct mvneta_port *pp, struct xdp_buff *xdp)
 	struct mvneta_pcpu_stats *stats = this_cpu_ptr(pp->stats);
 	struct mvneta_tx_queue *txq;
 	struct netdev_queue *nq;
+	int cpu, nxmit_byte = 0;
 	struct xdp_frame *xdpf;
-	int cpu;
 	u32 ret;
 
 	xdpf = xdp_convert_buff_to_frame(xdp);
@@ -2113,10 +2153,10 @@ mvneta_xdp_xmit_back(struct mvneta_port *pp, struct xdp_buff *xdp)
 	nq = netdev_get_tx_queue(pp->dev, txq->id);
 
 	__netif_tx_lock(nq, cpu);
-	ret = mvneta_xdp_submit_frame(pp, txq, xdpf, false);
+	ret = mvneta_xdp_submit_frame(pp, txq, xdpf, &nxmit_byte, false);
 	if (ret == MVNETA_XDP_TX) {
 		u64_stats_update_begin(&stats->syncp);
-		stats->es.ps.tx_bytes += xdpf->len;
+		stats->es.ps.tx_bytes += nxmit_byte;
 		stats->es.ps.tx_packets++;
 		stats->es.ps.xdp_tx++;
 		u64_stats_update_end(&stats->syncp);
@@ -2155,11 +2195,11 @@ mvneta_xdp_xmit(struct net_device *dev, int num_frame,
 
 	__netif_tx_lock(nq, cpu);
 	for (i = 0; i < num_frame; i++) {
-		ret = mvneta_xdp_submit_frame(pp, txq, frames[i], true);
+		ret = mvneta_xdp_submit_frame(pp, txq, frames[i], &nxmit_byte,
+					      true);
 		if (ret != MVNETA_XDP_TX)
 			break;
 
-		nxmit_byte += frames[i]->len;
 		nxmit++;
 	}
 
-- 
2.31.1



* [PATCH v14 bpf-next 09/18] net: mvneta: enable jumbo frames for XDP
  2021-09-10 16:14 [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (7 preceding siblings ...)
  2021-09-10 16:14 ` [PATCH v14 bpf-next 08/18] net: mvneta: add multi buffer support to XDP_TX Lorenzo Bianconi
@ 2021-09-10 16:14 ` Lorenzo Bianconi
  2021-09-10 16:14 ` [PATCH v14 bpf-next 10/18] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API Lorenzo Bianconi
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 58+ messages in thread
From: Lorenzo Bianconi @ 2021-09-10 16:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke

Enable the capability to receive jumbo frames even if the interface is
running in XDP mode.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/marvell/mvneta.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 25f63f9efdf0..f7a39cfb0f1a 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3767,11 +3767,6 @@ static int mvneta_change_mtu(struct net_device *dev, int mtu)
 		mtu = ALIGN(MVNETA_RX_PKT_SIZE(mtu), 8);
 	}
 
-	if (pp->xdp_prog && mtu > MVNETA_MAX_RX_BUF_SIZE) {
-		netdev_info(dev, "Illegal MTU value %d for XDP mode\n", mtu);
-		return -EINVAL;
-	}
-
 	dev->mtu = mtu;
 
 	if (!netif_running(dev)) {
@@ -4481,11 +4476,6 @@ static int mvneta_xdp_setup(struct net_device *dev, struct bpf_prog *prog,
 	struct mvneta_port *pp = netdev_priv(dev);
 	struct bpf_prog *old_prog;
 
-	if (prog && dev->mtu > MVNETA_MAX_RX_BUF_SIZE) {
-		NL_SET_ERR_MSG_MOD(extack, "MTU too large for XDP");
-		return -EOPNOTSUPP;
-	}
-
 	if (pp->bm_priv) {
 		NL_SET_ERR_MSG_MOD(extack,
 				   "Hardware Buffer Management not supported on XDP");
-- 
2.31.1



* [PATCH v14 bpf-next 10/18] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
  2021-09-10 16:14 [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (8 preceding siblings ...)
  2021-09-10 16:14 ` [PATCH v14 bpf-next 09/18] net: mvneta: enable jumbo frames for XDP Lorenzo Bianconi
@ 2021-09-10 16:14 ` Lorenzo Bianconi
  2021-09-16 16:55   ` Jakub Kicinski
  2021-09-10 16:14 ` [PATCH v14 bpf-next 11/18] bpf: introduce bpf_xdp_get_buff_len helper Lorenzo Bianconi
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 58+ messages in thread
From: Lorenzo Bianconi @ 2021-09-10 16:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke

From: Eelco Chaudron <echaudro@redhat.com>

This change adds support for tail growing and shrinking for XDP multi-buff.

When called on a multi-buffer packet with a grow request, it will always
work on the last fragment of the packet. So the maximum grow size is the
last fragment's tailroom, i.e. no new buffer will be allocated.

When shrinking, it works from the last fragment all the way down to
the base buffer, depending on the shrink size. It is important to note
that once a fragment has been shrunk away it is freed, so you cannot
grow back to the original size.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
---
 include/net/xdp.h |  9 +++++++
 net/core/filter.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++
 net/core/xdp.c    |  5 ++--
 3 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index e44964329fd1..789251e464de 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -145,6 +145,13 @@ xdp_get_shared_info_from_buff(struct xdp_buff *xdp)
 	return (struct skb_shared_info *)xdp_data_hard_end(xdp);
 }
 
+static inline unsigned int xdp_get_frag_tailroom(const skb_frag_t *frag)
+{
+	struct page *page = skb_frag_page(frag);
+
+	return page_size(page) - skb_frag_size(frag) - skb_frag_off(frag);
+}
+
 struct xdp_frame {
 	void *data;
 	u16 len;
@@ -290,6 +297,8 @@ struct xdp_frame *xdp_convert_buff_to_frame(struct xdp_buff *xdp)
 	return xdp_frame;
 }
 
+void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
+		  struct xdp_buff *xdp);
 void xdp_return_frame(struct xdp_frame *xdpf);
 void xdp_return_frame_rx_napi(struct xdp_frame *xdpf);
 void xdp_return_buff(struct xdp_buff *xdp);
diff --git a/net/core/filter.c b/net/core/filter.c
index 2e32cee2c469..49feba8f8966 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3818,11 +3818,71 @@ static const struct bpf_func_proto bpf_xdp_adjust_head_proto = {
 	.arg2_type	= ARG_ANYTHING,
 };
 
+static int bpf_xdp_mb_adjust_tail(struct xdp_buff *xdp, int offset)
+{
+	struct skb_shared_info *sinfo;
+
+	sinfo = xdp_get_shared_info_from_buff(xdp);
+	if (offset >= 0) {
+		skb_frag_t *frag = &sinfo->frags[sinfo->nr_frags - 1];
+		int size;
+
+		if (unlikely(offset > xdp_get_frag_tailroom(frag)))
+			return -EINVAL;
+
+		size = skb_frag_size(frag);
+		memset(skb_frag_address(frag) + size, 0, offset);
+		skb_frag_size_set(frag, size + offset);
+		sinfo->xdp_frags_size += offset;
+	} else {
+		int i, n_frags_free = 0, len_free = 0, tlen_free = 0;
+
+		offset = abs(offset);
+		if (unlikely(offset > ((int)(xdp->data_end - xdp->data) +
+				       sinfo->xdp_frags_size - ETH_HLEN)))
+			return -EINVAL;
+
+		for (i = sinfo->nr_frags - 1; i >= 0 && offset > 0; i--) {
+			skb_frag_t *frag = &sinfo->frags[i];
+			int size = skb_frag_size(frag);
+			int shrink = min_t(int, offset, size);
+
+			len_free += shrink;
+			offset -= shrink;
+
+			if (unlikely(size == shrink)) {
+				struct page *page = skb_frag_page(frag);
+
+				__xdp_return(page_address(page), &xdp->rxq->mem,
+					     false, NULL);
+				tlen_free += page_size(page);
+				n_frags_free++;
+			} else {
+				skb_frag_size_set(frag, size - shrink);
+				break;
+			}
+		}
+		sinfo->nr_frags -= n_frags_free;
+		sinfo->xdp_frags_size -= len_free;
+		sinfo->xdp_frags_truesize -= tlen_free;
+
+		if (unlikely(offset > 0)) {
+			xdp_buff_clear_mb(xdp);
+			xdp->data_end -= offset;
+		}
+	}
+
+	return 0;
+}
+
 BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
 {
 	void *data_hard_end = xdp_data_hard_end(xdp); /* use xdp->frame_sz */
 	void *data_end = xdp->data_end + offset;
 
+	if (unlikely(xdp_buff_is_mb(xdp)))
+		return bpf_xdp_mb_adjust_tail(xdp, offset);
+
 	/* Notice that xdp_data_hard_end have reserved some tailroom */
 	if (unlikely(data_end > data_hard_end))
 		return -EINVAL;
diff --git a/net/core/xdp.c b/net/core/xdp.c
index ffdc776a9e95..c3a3f36416b8 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -339,8 +339,8 @@ EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model);
  * is used for those calls sites.  Thus, allowing for faster recycling
  * of xdp_frames/pages in those cases.
  */
-static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
-			 struct xdp_buff *xdp)
+void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
+		  struct xdp_buff *xdp)
 {
 	struct xdp_mem_allocator *xa;
 	struct page *page;
@@ -373,6 +373,7 @@ static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
 		break;
 	}
 }
+EXPORT_SYMBOL_GPL(__xdp_return);
 
 void xdp_return_frame(struct xdp_frame *xdpf)
 {
-- 
2.31.1



* [PATCH v14 bpf-next 11/18] bpf: introduce bpf_xdp_get_buff_len helper
  2021-09-10 16:14 [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (9 preceding siblings ...)
  2021-09-10 16:14 ` [PATCH v14 bpf-next 10/18] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API Lorenzo Bianconi
@ 2021-09-10 16:14 ` Lorenzo Bianconi
  2021-09-10 16:14 ` [PATCH v14 bpf-next 12/18] bpf: add multi-buffer support to xdp copy helpers Lorenzo Bianconi
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 58+ messages in thread
From: Lorenzo Bianconi @ 2021-09-10 16:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke

Introduce the bpf_xdp_get_buff_len helper to return the total size of
an xdp buffer (linear and paged areas).

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/uapi/linux/bpf.h       |  7 +++++++
 net/core/filter.c              | 23 +++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  7 +++++++
 3 files changed, 37 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 791f31dd0abe..1fd87bd5848b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4877,6 +4877,12 @@ union bpf_attr {
  *		Get the struct pt_regs associated with **task**.
  *	Return
  *		A pointer to struct pt_regs.
+ *
+ * u64 bpf_xdp_get_buff_len(struct xdp_buff *xdp_md)
+ *	Description
+ *		Get the total size of a given xdp buff (linear and paged area)
+ *	Return
+ *		The total size of a given xdp buffer.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5055,6 +5061,7 @@ union bpf_attr {
 	FN(get_func_ip),		\
 	FN(get_attach_cookie),		\
 	FN(task_pt_regs),		\
+	FN(xdp_get_buff_len),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/net/core/filter.c b/net/core/filter.c
index 49feba8f8966..d4982a20c8bd 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3784,6 +3784,27 @@ static const struct bpf_func_proto sk_skb_change_head_proto = {
 	.arg2_type	= ARG_ANYTHING,
 	.arg3_type	= ARG_ANYTHING,
 };
+
+BPF_CALL_1(bpf_xdp_get_buff_len, struct xdp_buff *, xdp)
+{
+	u64 len = xdp->data_end - xdp->data;
+
+	if (unlikely(xdp_buff_is_mb(xdp))) {
+		struct skb_shared_info *sinfo;
+
+		sinfo = xdp_get_shared_info_from_buff(xdp);
+		len += sinfo->xdp_frags_size;
+	}
+	return len;
+}
+
+static const struct bpf_func_proto bpf_xdp_get_buff_len_proto = {
+	.func		= bpf_xdp_get_buff_len,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+};
+
 static unsigned long xdp_get_metalen(const struct xdp_buff *xdp)
 {
 	return xdp_data_meta_unsupported(xdp) ? 0 :
@@ -7531,6 +7552,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_xdp_redirect_map_proto;
 	case BPF_FUNC_xdp_adjust_tail:
 		return &bpf_xdp_adjust_tail_proto;
+	case BPF_FUNC_xdp_get_buff_len:
+		return &bpf_xdp_get_buff_len_proto;
 	case BPF_FUNC_fib_lookup:
 		return &bpf_xdp_fib_lookup_proto;
 	case BPF_FUNC_check_mtu:
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 791f31dd0abe..1fd87bd5848b 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4877,6 +4877,12 @@ union bpf_attr {
  *		Get the struct pt_regs associated with **task**.
  *	Return
  *		A pointer to struct pt_regs.
+ *
+ * u64 bpf_xdp_get_buff_len(struct xdp_buff *xdp_md)
+ *	Description
+ *		Get the total size of a given xdp buff (linear and paged area)
+ *	Return
+ *		The total size of a given xdp buffer.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5055,6 +5061,7 @@ union bpf_attr {
 	FN(get_func_ip),		\
 	FN(get_attach_cookie),		\
 	FN(task_pt_regs),		\
+	FN(xdp_get_buff_len),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.31.1



* [PATCH v14 bpf-next 12/18] bpf: add multi-buffer support to xdp copy helpers
  2021-09-10 16:14 [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (10 preceding siblings ...)
  2021-09-10 16:14 ` [PATCH v14 bpf-next 11/18] bpf: introduce bpf_xdp_get_buff_len helper Lorenzo Bianconi
@ 2021-09-10 16:14 ` Lorenzo Bianconi
  2021-09-10 16:14 ` [PATCH v14 bpf-next 13/18] bpf: move user_size out of bpf_test_init Lorenzo Bianconi
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 58+ messages in thread
From: Lorenzo Bianconi @ 2021-09-10 16:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke

From: Eelco Chaudron <echaudro@redhat.com>

This patch adds support for multi-buffer for the following helpers:
  - bpf_xdp_output()
  - bpf_perf_event_output()

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 kernel/trace/bpf_trace.c                      |   3 +
 net/core/filter.c                             |  68 +++++++-
 .../selftests/bpf/prog_tests/xdp_bpf2bpf.c    | 151 +++++++++++++-----
 .../selftests/bpf/progs/test_xdp_bpf2bpf.c    |   2 +-
 4 files changed, 180 insertions(+), 44 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 8e2eb950aa82..279a365620ee 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1441,6 +1441,7 @@ static const struct bpf_func_proto bpf_perf_event_output_proto_raw_tp = {
 
 extern const struct bpf_func_proto bpf_skb_output_proto;
 extern const struct bpf_func_proto bpf_xdp_output_proto;
+extern const struct bpf_func_proto bpf_xdp_get_buff_len_trace_proto;
 
 BPF_CALL_3(bpf_get_stackid_raw_tp, struct bpf_raw_tracepoint_args *, args,
 	   struct bpf_map *, map, u64, flags)
@@ -1538,6 +1539,8 @@ tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_sock_from_file_proto;
 	case BPF_FUNC_get_socket_cookie:
 		return &bpf_get_socket_ptr_cookie_proto;
+	case BPF_FUNC_xdp_get_buff_len:
+		return &bpf_xdp_get_buff_len_trace_proto;
 #endif
 	case BPF_FUNC_seq_printf:
 		return prog->expected_attach_type == BPF_TRACE_ITER ?
diff --git a/net/core/filter.c b/net/core/filter.c
index d4982a20c8bd..e1dc86f0930f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3805,6 +3805,15 @@ static const struct bpf_func_proto bpf_xdp_get_buff_len_proto = {
 	.arg1_type	= ARG_PTR_TO_CTX,
 };
 
+BTF_ID_LIST_SINGLE(bpf_xdp_get_buff_len_bpf_ids, struct, xdp_buff)
+
+const struct bpf_func_proto bpf_xdp_get_buff_len_trace_proto = {
+	.func		= bpf_xdp_get_buff_len,
+	.gpl_only	= false,
+	.arg1_type	= ARG_PTR_TO_BTF_ID,
+	.arg1_btf_id	= &bpf_xdp_get_buff_len_bpf_ids[0],
+};
+
 static unsigned long xdp_get_metalen(const struct xdp_buff *xdp)
 {
 	return xdp_data_meta_unsupported(xdp) ? 0 :
@@ -4619,10 +4628,52 @@ static const struct bpf_func_proto bpf_sk_ancestor_cgroup_id_proto = {
 };
 #endif
 
-static unsigned long bpf_xdp_copy(void *dst_buff, const void *src_buff,
+static unsigned long bpf_xdp_copy(void *dst_buff, const void *ctx,
 				  unsigned long off, unsigned long len)
 {
-	memcpy(dst_buff, src_buff + off, len);
+	unsigned long base_len, copy_len, frag_off_total;
+	struct xdp_buff *xdp = (struct xdp_buff *)ctx;
+	struct skb_shared_info *sinfo;
+	int i;
+
+	if (likely(!xdp_buff_is_mb(xdp))) {
+		memcpy(dst_buff, xdp->data + off, len);
+		return 0;
+	}
+
+	base_len = xdp->data_end - xdp->data;
+	frag_off_total = base_len;
+	sinfo = xdp_get_shared_info_from_buff(xdp);
+
+	/* If we need to copy data from the base buffer do it */
+	if (off < base_len) {
+		copy_len = min(len, base_len - off);
+		memcpy(dst_buff, xdp->data + off, copy_len);
+
+		off += copy_len;
+		len -= copy_len;
+		dst_buff += copy_len;
+	}
+
+	/* Copy any remaining data from the fragments */
+	for (i = 0; len && i < sinfo->nr_frags; i++) {
+		skb_frag_t *frag = &sinfo->frags[i];
+		unsigned long frag_len, frag_off;
+
+		frag_len = skb_frag_size(frag);
+		frag_off = off - frag_off_total;
+		if (frag_off < frag_len) {
+			copy_len = min(len, frag_len - frag_off);
+			memcpy(dst_buff,
+			       skb_frag_address(frag) + frag_off, copy_len);
+
+			off += copy_len;
+			len -= copy_len;
+			dst_buff += copy_len;
+		}
+		frag_off_total += frag_len;
+	}
+
 	return 0;
 }
 
@@ -4634,10 +4685,19 @@ BPF_CALL_5(bpf_xdp_event_output, struct xdp_buff *, xdp, struct bpf_map *, map,
 	if (unlikely(flags & ~(BPF_F_CTXLEN_MASK | BPF_F_INDEX_MASK)))
 		return -EINVAL;
 	if (unlikely(!xdp ||
-		     xdp_size > (unsigned long)(xdp->data_end - xdp->data)))
+		     (likely(!xdp_buff_is_mb(xdp)) &&
+		      xdp_size > (unsigned long)(xdp->data_end - xdp->data))))
 		return -EFAULT;
+	if (unlikely(xdp_buff_is_mb(xdp))) {
+		struct skb_shared_info *sinfo;
+
+		sinfo = xdp_get_shared_info_from_buff(xdp);
+		if (unlikely(xdp_size > ((int)(xdp->data_end - xdp->data) +
+					 sinfo->xdp_frags_size)))
+			return -EFAULT;
+	}
 
-	return bpf_event_output(map, flags, meta, meta_size, xdp->data,
+	return bpf_event_output(map, flags, meta, meta_size, xdp,
 				xdp_size, bpf_xdp_copy);
 }
 
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_bpf2bpf.c b/tools/testing/selftests/bpf/prog_tests/xdp_bpf2bpf.c
index 3bd5904b4db5..fe279c1c0e48 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_bpf2bpf.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_bpf2bpf.c
@@ -10,11 +10,20 @@ struct meta {
 	int pkt_len;
 };
 
+struct test_ctx_s {
+	bool passed;
+	int pkt_size;
+};
+
+struct test_ctx_s test_ctx;
+
 static void on_sample(void *ctx, int cpu, void *data, __u32 size)
 {
-	int duration = 0;
 	struct meta *meta = (struct meta *)data;
 	struct ipv4_packet *trace_pkt_v4 = data + sizeof(*meta);
+	unsigned char *raw_pkt = data + sizeof(*meta);
+	struct test_ctx_s *tst_ctx = ctx;
+	int duration = 0;
 
 	if (CHECK(size < sizeof(pkt_v4) + sizeof(*meta),
 		  "check_size", "size %u < %zu\n",
@@ -25,25 +34,114 @@ static void on_sample(void *ctx, int cpu, void *data, __u32 size)
 		  "meta->ifindex = %d\n", meta->ifindex))
 		return;
 
-	if (CHECK(meta->pkt_len != sizeof(pkt_v4), "check_meta_pkt_len",
-		  "meta->pkt_len = %zd\n", sizeof(pkt_v4)))
+	if (CHECK(meta->pkt_len != tst_ctx->pkt_size, "check_meta_pkt_len",
+		  "meta->pkt_len = %d\n", tst_ctx->pkt_size))
 		return;
 
 	if (CHECK(memcmp(trace_pkt_v4, &pkt_v4, sizeof(pkt_v4)),
 		  "check_packet_content", "content not the same\n"))
 		return;
 
-	*(bool *)ctx = true;
+	if (meta->pkt_len > sizeof(pkt_v4)) {
+		for (int i = 0; i < (meta->pkt_len - sizeof(pkt_v4)); i++) {
+			if (raw_pkt[i + sizeof(pkt_v4)] != (unsigned char)i) {
+				CHECK(true, "check_packet_content",
+				      "byte %zu does not match %u != %u\n",
+				      i + sizeof(pkt_v4),
+				      raw_pkt[i + sizeof(pkt_v4)],
+				      (unsigned char)i);
+				break;
+			}
+		}
+	}
+
+	tst_ctx->passed = true;
 }
 
-void test_xdp_bpf2bpf(void)
+#define BUF_SZ	9000
+
+static int run_xdp_bpf2bpf_pkt_size(int pkt_fd, struct perf_buffer *pb,
+				    struct test_xdp_bpf2bpf *ftrace_skel,
+				    int pkt_size)
 {
 	__u32 duration = 0, retval, size;
-	char buf[128];
+	__u8 *buf, *buf_in;
+	int err, ret = 0;
+
+	if (pkt_size > BUF_SZ || pkt_size < sizeof(pkt_v4))
+		return -EINVAL;
+
+	buf_in = malloc(BUF_SZ);
+	if (CHECK(!buf_in, "buf_in malloc()", "error:%s\n", strerror(errno)))
+		return -ENOMEM;
+
+	buf = malloc(BUF_SZ);
+	if (CHECK(!buf, "buf malloc()", "error:%s\n", strerror(errno))) {
+		ret = -ENOMEM;
+		goto free_buf_in;
+	}
+
+	test_ctx.passed = false;
+	test_ctx.pkt_size = pkt_size;
+
+	memcpy(buf_in, &pkt_v4, sizeof(pkt_v4));
+	if (pkt_size > sizeof(pkt_v4)) {
+		for (int i = 0; i < (pkt_size - sizeof(pkt_v4)); i++)
+			buf_in[i + sizeof(pkt_v4)] = i;
+	}
+
+	/* Run test program */
+	err = bpf_prog_test_run(pkt_fd, 1, buf_in, pkt_size,
+				buf, &size, &retval, &duration);
+
+	if (CHECK(err || retval != XDP_PASS || size != pkt_size,
+		  "ipv4", "err %d errno %d retval %d size %d\n",
+		  err, errno, retval, size)) {
+		ret = err ? err : -EINVAL;
+		goto free_buf;
+	}
+
+	/* Make sure bpf_xdp_output() was triggered and it sent the expected
+	 * data to the perf ring buffer.
+	 */
+	err = perf_buffer__poll(pb, 100);
+	if (CHECK(err <= 0, "perf_buffer__poll", "err %d\n", err)) {
+		ret = -EINVAL;
+		goto free_buf;
+	}
+
+	if (CHECK_FAIL(!test_ctx.passed)) {
+		ret = -EINVAL;
+		goto free_buf;
+	}
+
+	/* Verify test results */
+	if (CHECK(ftrace_skel->bss->test_result_fentry != if_nametoindex("lo"),
+		  "result", "fentry failed err %llu\n",
+		  ftrace_skel->bss->test_result_fentry)) {
+		ret = -EINVAL;
+		goto free_buf;
+	}
+
+	if (CHECK(ftrace_skel->bss->test_result_fexit != XDP_PASS, "result",
+		  "fexit failed err %llu\n",
+		  ftrace_skel->bss->test_result_fexit))
+		ret = -EINVAL;
+
+free_buf:
+	free(buf);
+free_buf_in:
+	free(buf_in);
+
+	return ret;
+}
+
+void test_xdp_bpf2bpf(void)
+{
 	int err, pkt_fd, map_fd;
-	bool passed = false;
-	struct iphdr *iph = (void *)buf + sizeof(struct ethhdr);
-	struct iptnl_info value4 = {.family = AF_INET};
+	__u32 duration = 0;
+	int pkt_sizes[] = {sizeof(pkt_v4), 1024, 4100, 8200};
+	struct iptnl_info value4 = {.family = AF_INET6};
 	struct test_xdp *pkt_skel = NULL;
 	struct test_xdp_bpf2bpf *ftrace_skel = NULL;
 	struct vip key4 = {.protocol = 6, .family = AF_INET};
@@ -87,40 +185,15 @@ void test_xdp_bpf2bpf(void)
 
 	/* Set up perf buffer */
 	pb_opts.sample_cb = on_sample;
-	pb_opts.ctx = &passed;
+	pb_opts.ctx = &test_ctx;
 	pb = perf_buffer__new(bpf_map__fd(ftrace_skel->maps.perf_buf_map),
-			      1, &pb_opts);
+			      8, &pb_opts);
 	if (!ASSERT_OK_PTR(pb, "perf_buf__new"))
 		goto out;
 
-	/* Run test program */
-	err = bpf_prog_test_run(pkt_fd, 1, &pkt_v4, sizeof(pkt_v4),
-				buf, &size, &retval, &duration);
-
-	if (CHECK(err || retval != XDP_TX || size != 74 ||
-		  iph->protocol != IPPROTO_IPIP, "ipv4",
-		  "err %d errno %d retval %d size %d\n",
-		  err, errno, retval, size))
-		goto out;
-
-	/* Make sure bpf_xdp_output() was triggered and it sent the expected
-	 * data to the perf ring buffer.
-	 */
-	err = perf_buffer__poll(pb, 100);
-	if (CHECK(err < 0, "perf_buffer__poll", "err %d\n", err))
-		goto out;
-
-	CHECK_FAIL(!passed);
-
-	/* Verify test results */
-	if (CHECK(ftrace_skel->bss->test_result_fentry != if_nametoindex("lo"),
-		  "result", "fentry failed err %llu\n",
-		  ftrace_skel->bss->test_result_fentry))
-		goto out;
-
-	CHECK(ftrace_skel->bss->test_result_fexit != XDP_TX, "result",
-	      "fexit failed err %llu\n", ftrace_skel->bss->test_result_fexit);
-
+	for (int i = 0; i < ARRAY_SIZE(pkt_sizes); i++)
+		run_xdp_bpf2bpf_pkt_size(pkt_fd, pb, ftrace_skel,
+					 pkt_sizes[i]);
 out:
 	if (pb)
 		perf_buffer__free(pb);
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_bpf2bpf.c b/tools/testing/selftests/bpf/progs/test_xdp_bpf2bpf.c
index a038e827f850..902b54190377 100644
--- a/tools/testing/selftests/bpf/progs/test_xdp_bpf2bpf.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_bpf2bpf.c
@@ -49,7 +49,7 @@ int BPF_PROG(trace_on_entry, struct xdp_buff *xdp)
 	void *data = (void *)(long)xdp->data;
 
 	meta.ifindex = xdp->rxq->dev->ifindex;
-	meta.pkt_len = data_end - data;
+	meta.pkt_len = bpf_xdp_get_buff_len((struct xdp_md *)xdp);
 	bpf_xdp_output(xdp, &perf_buf_map,
 		       ((__u64) meta.pkt_len << 32) |
 		       BPF_F_CURRENT_CPU,
-- 
2.31.1


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v14 bpf-next 13/18] bpf: move user_size out of bpf_test_init
  2021-09-10 16:14 [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (11 preceding siblings ...)
  2021-09-10 16:14 ` [PATCH v14 bpf-next 12/18] bpf: add multi-buffer support to xdp copy helpers Lorenzo Bianconi
@ 2021-09-10 16:14 ` Lorenzo Bianconi
  2021-09-10 16:14 ` [PATCH v14 bpf-next 14/18] bpf: introduce multibuff support to bpf_prog_test_run_xdp() Lorenzo Bianconi
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 58+ messages in thread
From: Lorenzo Bianconi @ 2021-09-10 16:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke

Rely on a data_size_in parameter in the bpf_test_init() routine
signature. This is a preliminary patch to introduce the xdp multi-buff
selftests.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 net/bpf/test_run.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 1153b89c9d93..82b34632a66c 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -246,11 +246,10 @@ bool bpf_prog_test_check_kfunc_call(u32 kfunc_id)
 	return btf_id_set_contains(&test_sk_kfunc_ids, kfunc_id);
 }
 
-static void *bpf_test_init(const union bpf_attr *kattr, u32 size,
-			   u32 headroom, u32 tailroom)
+static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size,
+			   u32 size, u32 headroom, u32 tailroom)
 {
 	void __user *data_in = u64_to_user_ptr(kattr->test.data_in);
-	u32 user_size = kattr->test.data_size_in;
 	void *data;
 
 	if (size < ETH_HLEN || size > PAGE_SIZE - headroom - tailroom)
@@ -569,7 +568,8 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 	if (kattr->test.flags || kattr->test.cpu)
 		return -EINVAL;
 
-	data = bpf_test_init(kattr, size, NET_SKB_PAD + NET_IP_ALIGN,
+	data = bpf_test_init(kattr, kattr->test.data_size_in,
+			     size, NET_SKB_PAD + NET_IP_ALIGN,
 			     SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
 	if (IS_ERR(data))
 		return PTR_ERR(data);
@@ -780,7 +780,8 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 	/* XDP have extra tailroom as (most) drivers use full page */
 	max_data_sz = 4096 - headroom - tailroom;
 
-	data = bpf_test_init(kattr, max_data_sz, headroom, tailroom);
+	data = bpf_test_init(kattr, kattr->test.data_size_in,
+			     max_data_sz, headroom, tailroom);
 	if (IS_ERR(data)) {
 		ret = PTR_ERR(data);
 		goto free_ctx;
@@ -864,7 +865,7 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
 	if (size < ETH_HLEN)
 		return -EINVAL;
 
-	data = bpf_test_init(kattr, size, 0, 0);
+	data = bpf_test_init(kattr, kattr->test.data_size_in, size, 0, 0);
 	if (IS_ERR(data))
 		return PTR_ERR(data);
 
-- 
2.31.1



* [PATCH v14 bpf-next 14/18] bpf: introduce multibuff support to bpf_prog_test_run_xdp()
  2021-09-10 16:14 [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (12 preceding siblings ...)
  2021-09-10 16:14 ` [PATCH v14 bpf-next 13/18] bpf: move user_size out of bpf_test_init Lorenzo Bianconi
@ 2021-09-10 16:14 ` Lorenzo Bianconi
  2021-09-10 16:14 ` [PATCH v14 bpf-next 15/18] bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature Lorenzo Bianconi
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 58+ messages in thread
From: Lorenzo Bianconi @ 2021-09-10 16:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke

Introduce the capability to allocate an xdp multi-buff in the
bpf_prog_test_run_xdp() routine. This is a preliminary patch to introduce
the selftests for the new xdp multi-buff eBPF helpers.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 net/bpf/test_run.c | 54 ++++++++++++++++++++++++++++++++++++----------
 1 file changed, 43 insertions(+), 11 deletions(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 82b34632a66c..d4200dc63f5f 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -748,16 +748,16 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 			  union bpf_attr __user *uattr)
 {
 	u32 tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
-	u32 headroom = XDP_PACKET_HEADROOM;
 	u32 size = kattr->test.data_size_in;
+	u32 headroom = XDP_PACKET_HEADROOM;
+	u32 retval, duration, max_data_sz;
 	u32 repeat = kattr->test.repeat;
 	struct netdev_rx_queue *rxqueue;
+	struct skb_shared_info *sinfo;
 	struct xdp_buff xdp = {};
-	u32 retval, duration;
+	int i, ret = -EINVAL;
 	struct xdp_md *ctx;
-	u32 max_data_sz;
 	void *data;
-	int ret = -EINVAL;
 
 	if (prog->expected_attach_type == BPF_XDP_DEVMAP ||
 	    prog->expected_attach_type == BPF_XDP_CPUMAP)
@@ -777,11 +777,10 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 		headroom -= ctx->data;
 	}
 
-	/* XDP have extra tailroom as (most) drivers use full page */
 	max_data_sz = 4096 - headroom - tailroom;
+	size = min_t(u32, size, max_data_sz);
 
-	data = bpf_test_init(kattr, kattr->test.data_size_in,
-			     max_data_sz, headroom, tailroom);
+	data = bpf_test_init(kattr, size, max_data_sz, headroom, tailroom);
 	if (IS_ERR(data)) {
 		ret = PTR_ERR(data);
 		goto free_ctx;
@@ -791,11 +790,45 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 	xdp_init_buff(&xdp, headroom + max_data_sz + tailroom,
 		      &rxqueue->xdp_rxq);
 	xdp_prepare_buff(&xdp, data, headroom, size, true);
+	sinfo = xdp_get_shared_info_from_buff(&xdp);
 
 	ret = xdp_convert_md_to_buff(ctx, &xdp);
 	if (ret)
 		goto free_data;
 
+	if (unlikely(kattr->test.data_size_in > size)) {
+		void __user *data_in = u64_to_user_ptr(kattr->test.data_in);
+
+		while (size < kattr->test.data_size_in) {
+			struct page *page;
+			skb_frag_t *frag;
+			int data_len;
+
+			page = alloc_page(GFP_KERNEL);
+			if (!page) {
+				ret = -ENOMEM;
+				goto out;
+			}
+
+			frag = &sinfo->frags[sinfo->nr_frags++];
+			__skb_frag_set_page(frag, page);
+
+			data_len = min_t(int, kattr->test.data_size_in - size,
+					 PAGE_SIZE);
+			skb_frag_size_set(frag, data_len);
+
+			if (copy_from_user(page_address(page), data_in + size,
+					   data_len)) {
+				ret = -EFAULT;
+				goto out;
+			}
+			sinfo->xdp_frags_truesize += PAGE_SIZE;
+			sinfo->xdp_frags_size += data_len;
+			size += data_len;
+		}
+		xdp_buff_set_mb(&xdp);
+	}
+
 	bpf_prog_change_xdp(NULL, prog);
 	ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true);
 	/* We convert the xdp_buff back to an xdp_md before checking the return
@@ -806,10 +839,7 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 	if (ret)
 		goto out;
 
-	if (xdp.data_meta != data + headroom ||
-	    xdp.data_end != xdp.data_meta + size)
-		size = xdp.data_end - xdp.data_meta;
-
+	size = xdp.data_end - xdp.data_meta + sinfo->xdp_frags_size;
 	ret = bpf_test_finish(kattr, uattr, xdp.data_meta, size, retval,
 			      duration);
 	if (!ret)
@@ -819,6 +849,8 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 out:
 	bpf_prog_change_xdp(prog, NULL);
 free_data:
+	for (i = 0; i < sinfo->nr_frags; i++)
+		__free_page(skb_frag_page(&sinfo->frags[i]));
 	kfree(data);
 free_ctx:
 	kfree(ctx);
-- 
2.31.1



* [PATCH v14 bpf-next 15/18] bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature
  2021-09-10 16:14 [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (13 preceding siblings ...)
  2021-09-10 16:14 ` [PATCH v14 bpf-next 14/18] bpf: introduce multibuff support to bpf_prog_test_run_xdp() Lorenzo Bianconi
@ 2021-09-10 16:14 ` Lorenzo Bianconi
  2021-09-10 16:14 ` [PATCH v14 bpf-next 16/18] bpf: update xdp_adjust_tail selftest to include multi-buffer Lorenzo Bianconi
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 58+ messages in thread
From: Lorenzo Bianconi @ 2021-09-10 16:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke

Introduce a skb_shared_info pointer in the bpf_test_finish() signature in
order to copy back paged data from an xdp multi-buff frame to the
userspace buffer.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 net/bpf/test_run.c | 48 +++++++++++++++++++++++++++++++++++++---------
 1 file changed, 39 insertions(+), 9 deletions(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index d4200dc63f5f..4f5c28c4f888 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -129,7 +129,8 @@ static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
 
 static int bpf_test_finish(const union bpf_attr *kattr,
 			   union bpf_attr __user *uattr, const void *data,
-			   u32 size, u32 retval, u32 duration)
+			   struct skb_shared_info *sinfo, u32 size,
+			   u32 retval, u32 duration)
 {
 	void __user *data_out = u64_to_user_ptr(kattr->test.data_out);
 	int err = -EFAULT;
@@ -144,8 +145,36 @@ static int bpf_test_finish(const union bpf_attr *kattr,
 		err = -ENOSPC;
 	}
 
-	if (data_out && copy_to_user(data_out, data, copy_size))
-		goto out;
+	if (data_out) {
+		int len = sinfo ? copy_size - sinfo->xdp_frags_size : copy_size;
+
+		if (copy_to_user(data_out, data, len))
+			goto out;
+
+		if (sinfo) {
+			int i, offset = len, data_len;
+
+			for (i = 0; i < sinfo->nr_frags; i++) {
+				skb_frag_t *frag = &sinfo->frags[i];
+
+				if (offset >= copy_size) {
+					err = -ENOSPC;
+					break;
+				}
+
+				data_len = min_t(int, copy_size - offset,
+						 skb_frag_size(frag));
+
+				if (copy_to_user(data_out + offset,
+						 skb_frag_address(frag),
+						 data_len))
+					goto out;
+
+				offset += data_len;
+			}
+		}
+	}
+
 	if (copy_to_user(&uattr->test.data_size_out, &size, sizeof(size)))
 		goto out;
 	if (copy_to_user(&uattr->test.retval, &retval, sizeof(retval)))
@@ -672,7 +701,8 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 	/* bpf program can never convert linear skb to non-linear */
 	if (WARN_ON_ONCE(skb_is_nonlinear(skb)))
 		size = skb_headlen(skb);
-	ret = bpf_test_finish(kattr, uattr, skb->data, size, retval, duration);
+	ret = bpf_test_finish(kattr, uattr, skb->data, NULL, size, retval,
+			      duration);
 	if (!ret)
 		ret = bpf_ctx_finish(kattr, uattr, ctx,
 				     sizeof(struct __sk_buff));
@@ -840,8 +870,8 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 		goto out;
 
 	size = xdp.data_end - xdp.data_meta + sinfo->xdp_frags_size;
-	ret = bpf_test_finish(kattr, uattr, xdp.data_meta, size, retval,
-			      duration);
+	ret = bpf_test_finish(kattr, uattr, xdp.data_meta, sinfo, size,
+			      retval, duration);
 	if (!ret)
 		ret = bpf_ctx_finish(kattr, uattr, ctx,
 				     sizeof(struct xdp_md));
@@ -932,8 +962,8 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
 	if (ret < 0)
 		goto out;
 
-	ret = bpf_test_finish(kattr, uattr, &flow_keys, sizeof(flow_keys),
-			      retval, duration);
+	ret = bpf_test_finish(kattr, uattr, &flow_keys, NULL,
+			      sizeof(flow_keys), retval, duration);
 	if (!ret)
 		ret = bpf_ctx_finish(kattr, uattr, user_ctx,
 				     sizeof(struct bpf_flow_keys));
@@ -1037,7 +1067,7 @@ int bpf_prog_test_run_sk_lookup(struct bpf_prog *prog, const union bpf_attr *kat
 		user_ctx->cookie = sock_gen_cookie(ctx.selected_sk);
 	}
 
-	ret = bpf_test_finish(kattr, uattr, NULL, 0, retval, duration);
+	ret = bpf_test_finish(kattr, uattr, NULL, NULL, 0, retval, duration);
 	if (!ret)
 		ret = bpf_ctx_finish(kattr, uattr, user_ctx, sizeof(*user_ctx));
 
-- 
2.31.1



* [PATCH v14 bpf-next 16/18] bpf: update xdp_adjust_tail selftest to include multi-buffer
  2021-09-10 16:14 [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (14 preceding siblings ...)
  2021-09-10 16:14 ` [PATCH v14 bpf-next 15/18] bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature Lorenzo Bianconi
@ 2021-09-10 16:14 ` Lorenzo Bianconi
  2021-09-10 16:14 ` [PATCH v14 bpf-next 17/18] net: xdp: introduce bpf_xdp_adjust_data helper Lorenzo Bianconi
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 58+ messages in thread
From: Lorenzo Bianconi @ 2021-09-10 16:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke

From: Eelco Chaudron <echaudro@redhat.com>

This change adds test cases for the multi-buffer scenarios when shrinking
and growing the frame.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 .../bpf/prog_tests/xdp_adjust_tail.c          | 118 ++++++++++++++++++
 .../bpf/progs/test_xdp_adjust_tail_grow.c     |  10 +-
 .../bpf/progs/test_xdp_adjust_tail_shrink.c   |  32 ++++-
 3 files changed, 153 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
index d5c98f2cb12f..40f7ae798fd1 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
@@ -130,6 +130,120 @@ void test_xdp_adjust_tail_grow2(void)
 	bpf_object__close(obj);
 }
 
+void test_xdp_adjust_mb_tail_shrink(void)
+{
+	const char *file = "./test_xdp_adjust_tail_shrink.o";
+	__u32 duration, retval, size, exp_size;
+	struct bpf_object *obj;
+	int err, prog_fd;
+	__u8 *buf;
+
+	/* For the individual test cases, the first byte in the packet
+	 * indicates which test will be run.
+	 */
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);
+	if (CHECK_FAIL(err))
+		return;
+
+	buf = malloc(9000);
+	if (CHECK(!buf, "malloc()", "error:%s\n", strerror(errno)))
+		return;
+
+	memset(buf, 0, 9000);
+
+	/* Test case removing 10 bytes from last frag, NOT freeing it */
+	exp_size = 8990; /* 9000 - 10 */
+	err = bpf_prog_test_run(prog_fd, 1, buf, 9000,
+				buf, &size, &retval, &duration);
+
+	CHECK(err || retval != XDP_TX || size != exp_size,
+	      "9k-10b", "err %d errno %d retval %d[%d] size %d[%u]\n",
+	      err, errno, retval, XDP_TX, size, exp_size);
+
+	/* Test case removing one of two pages, assuming 4K pages */
+	buf[0] = 1;
+	exp_size = 4900; /* 9000 - 4100 */
+	err = bpf_prog_test_run(prog_fd, 1, buf, 9000,
+				buf, &size, &retval, &duration);
+
+	CHECK(err || retval != XDP_TX || size != exp_size,
+	      "9k-1p", "err %d errno %d retval %d[%d] size %d[%u]\n",
+	      err, errno, retval, XDP_TX, size, exp_size);
+
+	/* Test case removing two pages resulting in a non mb xdp_buff */
+	buf[0] = 2;
+	exp_size = 800; /* 9000 - 8200 */
+	err = bpf_prog_test_run(prog_fd, 1, buf, 9000,
+				buf, &size, &retval, &duration);
+
+	CHECK(err || retval != XDP_TX || size != exp_size,
+	      "9k-2p", "err %d errno %d retval %d[%d] size %d[%u]\n",
+	      err, errno, retval, XDP_TX, size, exp_size);
+
+	free(buf);
+
+	bpf_object__close(obj);
+}
+
+void test_xdp_adjust_mb_tail_grow(void)
+{
+	const char *file = "./test_xdp_adjust_tail_grow.o";
+	__u32 duration, retval, size, exp_size;
+	struct bpf_object *obj;
+	int err, i, prog_fd;
+	__u8 *buf;
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);
+	if (CHECK_FAIL(err))
+		return;
+
+	buf = malloc(16384);
+	if (CHECK(!buf, "malloc()", "error:%s\n", strerror(errno)))
+		return;
+
+	/* Test case add 10 bytes to last frag */
+	memset(buf, 1, 16384);
+	size = 9000;
+	exp_size = size + 10;
+	err = bpf_prog_test_run(prog_fd, 1, buf, size,
+				buf, &size, &retval, &duration);
+
+	CHECK(err || retval != XDP_TX || size != exp_size,
+	      "9k+10b", "err %d retval %d[%d] size %d[%u]\n",
+	      err, retval, XDP_TX, size, exp_size);
+
+	for (i = 0; i < 9000; i++)
+		CHECK(buf[i] != 1, "9k+10b-old",
+		      "Old data not all ok, offset %i is failing [%u]!\n",
+		      i, buf[i]);
+
+	for (i = 9000; i < 9010; i++)
+		CHECK(buf[i] != 0, "9k+10b-new",
+		      "New data not all ok, offset %i is failing [%u]!\n",
+		      i, buf[i]);
+
+	for (i = 9010; i < 16384; i++)
+		CHECK(buf[i] != 1, "9k+10b-untouched",
+		      "Unused data not all ok, offset %i is failing [%u]!\n",
+		      i, buf[i]);
+
+	/* Test a too large grow */
+	memset(buf, 1, 16384);
+	size = 9001;
+	exp_size = size;
+	err = bpf_prog_test_run(prog_fd, 1, buf, size,
+				buf, &size, &retval, &duration);
+
+	CHECK(err || retval != XDP_DROP || size != exp_size,
+	      "9k+4k", "err %d retval %d[%d] size %d[%u]\n",
+	      err, retval, XDP_DROP, size, exp_size);
+
+	free(buf);
+
+	bpf_object__close(obj);
+}
+
 void test_xdp_adjust_tail(void)
 {
 	if (test__start_subtest("xdp_adjust_tail_shrink"))
@@ -138,4 +252,8 @@ void test_xdp_adjust_tail(void)
 		test_xdp_adjust_tail_grow();
 	if (test__start_subtest("xdp_adjust_tail_grow2"))
 		test_xdp_adjust_tail_grow2();
+	if (test__start_subtest("xdp_adjust_mb_tail_shrink"))
+		test_xdp_adjust_mb_tail_shrink();
+	if (test__start_subtest("xdp_adjust_mb_tail_grow"))
+		test_xdp_adjust_mb_tail_grow();
 }
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c
index 3d66599eee2e..3d43defb0e00 100644
--- a/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c
@@ -7,11 +7,10 @@ int _xdp_adjust_tail_grow(struct xdp_md *xdp)
 {
 	void *data_end = (void *)(long)xdp->data_end;
 	void *data = (void *)(long)xdp->data;
-	unsigned int data_len;
+	int data_len = bpf_xdp_get_buff_len(xdp);
 	int offset = 0;
 
 	/* Data length determine test case */
-	data_len = data_end - data;
 
 	if (data_len == 54) { /* sizeof(pkt_v4) */
 		offset = 4096; /* test too large offset */
@@ -20,7 +19,12 @@ int _xdp_adjust_tail_grow(struct xdp_md *xdp)
 	} else if (data_len == 64) {
 		offset = 128;
 	} else if (data_len == 128) {
-		offset = 4096 - 256 - 320 - data_len; /* Max tail grow 3520 */
+		/* Max tail grow 3520 */
+		offset = 4096 - 256 - 320 - data_len;
+	} else if (data_len == 9000) {
+		offset = 10;
+	} else if (data_len == 9001) {
+		offset = 4096;
 	} else {
 		return XDP_ABORTED; /* No matching test */
 	}
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c
index 22065a9cfb25..64177597ac29 100644
--- a/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c
@@ -14,14 +14,38 @@ int _version SEC("version") = 1;
 SEC("xdp_adjust_tail_shrink")
 int _xdp_adjust_tail_shrink(struct xdp_md *xdp)
 {
-	void *data_end = (void *)(long)xdp->data_end;
-	void *data = (void *)(long)xdp->data;
+	__u8 *data_end = (void *)(long)xdp->data_end;
+	__u8 *data = (void *)(long)xdp->data;
 	int offset = 0;
 
-	if (data_end - data == 54) /* sizeof(pkt_v4) */
+	switch (bpf_xdp_get_buff_len(xdp)) {
+	case 54:
+		/* sizeof(pkt_v4) */
 		offset = 256; /* shrink too much */
-	else
+		break;
+	case 9000:
+		/* Multi-buffer test cases */
+		if (data + 1 > data_end)
+			return XDP_DROP;
+
+		switch (data[0]) {
+		case 0:
+			offset = 10;
+			break;
+		case 1:
+			offset = 4100;
+			break;
+		case 2:
+			offset = 8200;
+			break;
+		default:
+			return XDP_DROP;
+		}
+		break;
+	default:
 		offset = 20;
+		break;
+	}
 	if (bpf_xdp_adjust_tail(xdp, 0 - offset))
 		return XDP_DROP;
 	return XDP_TX;
-- 
2.31.1



* [PATCH v14 bpf-next 17/18] net: xdp: introduce bpf_xdp_adjust_data helper
  2021-09-10 16:14 [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (15 preceding siblings ...)
  2021-09-10 16:14 ` [PATCH v14 bpf-next 16/18] bpf: update xdp_adjust_tail selftest to include multi-buffer Lorenzo Bianconi
@ 2021-09-10 16:14 ` Lorenzo Bianconi
  2021-09-10 16:14 ` [PATCH v14 bpf-next 18/18] bpf: add bpf_xdp_adjust_data selftest Lorenzo Bianconi
  2021-09-16 16:55 ` [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support Jakub Kicinski
  18 siblings, 0 replies; 58+ messages in thread
From: Lorenzo Bianconi @ 2021-09-10 16:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke

For XDP frames split over multiple buffers, the xdp_md->data and
xdp_md->data_end pointers will point to the start and end of the first
fragment only. bpf_xdp_adjust_data can be used to access subsequent
fragments by moving the data pointers. To use, an XDP program can call
this helper with the byte offset of the packet payload that
it wants to access; the helper will move xdp_md->data and xdp_md->data_end
so they point to the requested payload offset and to the end of the
fragment containing this byte offset, and return the byte offset of the
start of the fragment.
To move back to the beginning of the packet, simply call the
helper with an offset of '0'.
Note also that the helpers that modify the packet boundaries
(bpf_xdp_adjust_head(), bpf_xdp_adjust_tail() and
bpf_xdp_adjust_meta()) will fail if the pointers have been
moved; it is the responsibility of the BPF program to move them
back before using these helpers.

Suggested-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/net/xdp.h              |  8 +++++
 include/uapi/linux/bpf.h       | 32 +++++++++++++++++
 net/bpf/test_run.c             |  8 +++++
 net/core/filter.c              | 65 +++++++++++++++++++++++++++++++++-
 tools/include/uapi/linux/bpf.h | 32 +++++++++++++++++
 5 files changed, 144 insertions(+), 1 deletion(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 789251e464de..9d8f4c1dc8e0 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -82,6 +82,11 @@ struct xdp_buff {
 	struct xdp_txq_info *txq;
 	u32 frame_sz; /* frame size to deduce data_hard_end/reserved tailroom*/
 	u32 flags; /* supported values defined in xdp_buff_flags */
+	/* xdp multi-buff metadata used for frags iteration */
+	struct {
+		u16 headroom;	/* frame headroom: data - data_hard_start */
+		u16 headlen;	/* first buffer length: data_end - data */
+	} mb;
 };
 
 static __always_inline bool xdp_buff_is_mb(struct xdp_buff *xdp)
@@ -127,6 +132,9 @@ xdp_prepare_buff(struct xdp_buff *xdp, unsigned char *hard_start,
 	xdp->data = data;
 	xdp->data_end = data + data_len;
 	xdp->data_meta = meta_valid ? data : data + 1;
+	/* mb metadata for frags iteration */
+	xdp->mb.headroom = headroom;
+	xdp->mb.headlen = data_len;
 }
 
 /* Reserve memory area at end-of data area.
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 1fd87bd5848b..4f56ba0fd1dd 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4883,6 +4883,37 @@ union bpf_attr {
  *		Get the total size of a given xdp buff (linear and paged area)
  *	Return
  *		The total size of a given xdp buffer.
+ *
+ * long bpf_xdp_adjust_data(struct xdp_buff *xdp_md, u32 offset)
+ *	Description
+ *		For XDP frames split over multiple buffers, the
+ *		*xdp_md*\ **->data** and *xdp_md*\ **->data_end** pointers
+ *		will point to the start and end of the first fragment only.
+ *		This helper can be used to access subsequent fragments by
+ *		moving the data pointers. To use, an XDP program can call
+ *		this helper with the byte offset of the packet payload that
+ *		it wants to access; the helper will move *xdp_md*\ **->data**
+ *		and *xdp_md*\ **->data_end** so they point to the requested
+ *		payload offset and to the end of the fragment containing this
+ *		byte offset, and return the byte offset of the start of the
+ *		fragment.
+ *		To move back to the beginning of the packet, simply call the
+ *		helper with an offset of '0'.
+ *		Note also that the helpers that modify the packet boundaries
+ *		(*bpf_xdp_adjust_head()*, *bpf_xdp_adjust_tail()* and
+ *		*bpf_xdp_adjust_meta()*) will fail if the pointers have been
+ *		moved; it is the responsibility of the BPF program to move them
+ *		back before using these helpers.
+ *
+ *		A call to this helper is susceptible to change the underlying
+ *		packet buffer. Therefore, at load time, all checks on pointers
+ *		previously done by the verifier are invalidated and must be
+ *		performed again, if the helper is used in combination with
+ *		direct packet access.
+ *	Return
+ *		offset between the beginning of the current fragment and
+ *		original *xdp_md*\ **->data** on success, or a negative error
+ *		in case of failure.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5062,6 +5093,7 @@ union bpf_attr {
 	FN(get_attach_cookie),		\
 	FN(task_pt_regs),		\
 	FN(xdp_get_buff_len),		\
+	FN(xdp_adjust_data),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 4f5c28c4f888..9e6f156e6c24 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -755,6 +755,8 @@ static int xdp_convert_md_to_buff(struct xdp_md *xdp_md, struct xdp_buff *xdp)
 	}
 
 	xdp->data = xdp->data_meta + xdp_md->data;
+	xdp->mb.headroom = xdp->data - xdp->data_hard_start;
+	xdp->mb.headlen = xdp->data_end - xdp->data;
 	return 0;
 
 free_dev:
@@ -869,6 +871,12 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 	if (ret)
 		goto out;
 
+	/* data pointers need to be reset after frag iteration */
+	if (unlikely(xdp.data_hard_start + xdp.mb.headroom != xdp.data)) {
+		ret = -EFAULT;
+		goto out;
+	}
+
 	size = xdp.data_end - xdp.data_meta + sinfo->xdp_frags_size;
 	ret = bpf_test_finish(kattr, uattr, xdp.data_meta, sinfo, size,
 			      retval, duration);
diff --git a/net/core/filter.c b/net/core/filter.c
index e1dc86f0930f..6111e95b50df 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3827,6 +3827,10 @@ BPF_CALL_2(bpf_xdp_adjust_head, struct xdp_buff *, xdp, int, offset)
 	void *data_start = xdp_frame_end + metalen;
 	void *data = xdp->data + offset;
 
+	/* data pointers need to be reset after frag iteration */
+	if (unlikely(xdp->data_hard_start + xdp->mb.headroom != xdp->data))
+		return -EFAULT;
+
 	if (unlikely(data < data_start ||
 		     data > xdp->data_end - ETH_HLEN))
 		return -EINVAL;
@@ -3836,6 +3840,9 @@ BPF_CALL_2(bpf_xdp_adjust_head, struct xdp_buff *, xdp, int, offset)
 			xdp->data_meta, metalen);
 	xdp->data_meta += offset;
 	xdp->data = data;
+	/* update metadata for multi-buff frag iteration */
+	xdp->mb.headroom = xdp->data - xdp->data_hard_start;
+	xdp->mb.headlen = xdp->data_end - xdp->data;
 
 	return 0;
 }
@@ -3910,6 +3917,10 @@ BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
 	void *data_hard_end = xdp_data_hard_end(xdp); /* use xdp->frame_sz */
 	void *data_end = xdp->data_end + offset;
 
+	/* data pointer needs to be reset after frag iteration */
+	if (unlikely(xdp->data + xdp->mb.headlen != xdp->data_end))
+		return -EFAULT;
+
 	if (unlikely(xdp_buff_is_mb(xdp)))
 		return bpf_xdp_mb_adjust_tail(xdp, offset);
 
@@ -3949,6 +3960,10 @@ BPF_CALL_2(bpf_xdp_adjust_meta, struct xdp_buff *, xdp, int, offset)
 	void *meta = xdp->data_meta + offset;
 	unsigned long metalen = xdp->data - meta;
 
+	/* data pointer needs to be reset after frag iteration */
+	if (unlikely(xdp->data_hard_start + xdp->mb.headroom != xdp->data))
+		return -EFAULT;
+
 	if (xdp_data_meta_unsupported(xdp))
 		return -ENOTSUPP;
 	if (unlikely(meta < xdp_frame_end ||
@@ -3970,6 +3985,51 @@ static const struct bpf_func_proto bpf_xdp_adjust_meta_proto = {
 	.arg2_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_2(bpf_xdp_adjust_data, struct xdp_buff *, xdp, u32, offset)
+{
+	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
+	u32 base_offset = xdp->mb.headlen, max_offset = xdp->mb.headlen;
+	int i;
+
+	if (xdp_buff_is_mb(xdp))
+		max_offset += sinfo->xdp_frags_size;
+
+	if (offset > max_offset)
+		return -EINVAL;
+
+	if (offset < xdp->mb.headlen) {
+		/* linear area */
+		xdp->data = xdp->data_hard_start + xdp->mb.headroom + offset;
+		xdp->data_end = xdp->data_hard_start + xdp->mb.headroom +
+				xdp->mb.headlen;
+		return 0;
+	}
+
+	for (i = 0; i < sinfo->nr_frags; i++) {
+		/* paged area */
+		skb_frag_t *frag = &sinfo->frags[i];
+		unsigned int size = skb_frag_size(frag);
+
+		if (offset < base_offset + size) {
+			u8 *addr = skb_frag_address(frag);
+
+			xdp->data = addr + offset - base_offset;
+			xdp->data_end = addr + size;
+			break;
+		}
+		base_offset += size;
+	}
+	return base_offset;
+}
+
+static const struct bpf_func_proto bpf_xdp_adjust_data_proto = {
+	.func		= bpf_xdp_adjust_data,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_ANYTHING,
+};
+
 /* XDP_REDIRECT works by a three-step process, implemented in the functions
  * below:
  *
@@ -7261,7 +7321,8 @@ bool bpf_helper_changes_pkt_data(void *func)
 	    func == bpf_sock_ops_store_hdr_opt ||
 #endif
 	    func == bpf_lwt_in_push_encap ||
-	    func == bpf_lwt_xmit_push_encap)
+	    func == bpf_lwt_xmit_push_encap ||
+	    func == bpf_xdp_adjust_data)
 		return true;
 
 	return false;
@@ -7614,6 +7675,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_xdp_adjust_tail_proto;
 	case BPF_FUNC_xdp_get_buff_len:
 		return &bpf_xdp_get_buff_len_proto;
+	case BPF_FUNC_xdp_adjust_data:
+		return &bpf_xdp_adjust_data_proto;
 	case BPF_FUNC_fib_lookup:
 		return &bpf_xdp_fib_lookup_proto;
 	case BPF_FUNC_check_mtu:
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 1fd87bd5848b..4f56ba0fd1dd 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4883,6 +4883,37 @@ union bpf_attr {
  *		Get the total size of a given xdp buff (linear and paged area)
  *	Return
  *		The total size of a given xdp buffer.
+ *
+ * long bpf_xdp_adjust_data(struct xdp_buff *xdp_md, u32 offset)
+ *	Description
+ *		For XDP frames split over multiple buffers, the
+ *		*xdp_md*\ **->data** and *xdp_md*\ **->data_end** pointers
+ *		will point to the start and end of the first fragment only.
+ *		This helper can be used to access subsequent fragments by
+ *		moving the data pointers. To use, an XDP program can call
+ *		this helper with the byte offset of the packet payload that
+ *		it wants to access; the helper will move *xdp_md*\ **->data**
+ *		and *xdp_md*\ **->data_end** so they point to the requested
+ *		payload offset and to the end of the fragment containing this
+ *		byte offset, and return the byte offset of the start of the
+ *		fragment.
+ *		To move back to the beginning of the packet, simply call the
+ *		helper with an offset of '0'.
+ *		Note also that the helpers that modify the packet boundaries
+ *		(*bpf_xdp_adjust_head()*, *bpf_xdp_adjust_tail()* and
+ *		*bpf_xdp_adjust_meta()*) will fail if the pointers have been
+ *		moved; it is the responsibility of the BPF program to move them
+ *		back before using these helpers.
+ *
+ *		A call to this helper is susceptible to change the underlying
+ *		packet buffer. Therefore, at load time, all checks on pointers
+ *		previously done by the verifier are invalidated and must be
+ *		performed again, if the helper is used in combination with
+ *		direct packet access.
+ *	Return
+ *		offset between the beginning of the current fragment and
+ *		original *xdp_md*\ **->data** on success, or a negative error
+ *		in case of failure.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5062,6 +5093,7 @@ union bpf_attr {
 	FN(get_attach_cookie),		\
 	FN(task_pt_regs),		\
 	FN(xdp_get_buff_len),		\
+	FN(xdp_adjust_data),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.31.1
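As a side note for reviewers, the offset-to-fragment mapping performed by bpf_xdp_adjust_data() above can be modelled in a few lines of userspace C. This is a simplified sketch, not kernel code: struct frag here is a stand-in for skb_frag_t, and only the sizes matter.

```c
#include <assert.h>

/* Simplified stand-in for skb_frag_t; only the size matters here. */
struct frag { unsigned int size; };

/*
 * Mirror of the offset walk in bpf_xdp_adjust_data(): given the length
 * of the linear area (headlen) and the fragment sizes, return the byte
 * offset of the start of the buffer containing @offset, or -1 when
 * @offset is past the end of the packet.
 */
static int find_frag_base(unsigned int headlen, const struct frag *frags,
			  int nr_frags, unsigned int offset)
{
	unsigned int base = headlen, max = headlen;
	int i;

	for (i = 0; i < nr_frags; i++)
		max += frags[i].size;
	if (offset > max)
		return -1;		/* -EINVAL in the kernel helper */
	if (offset < headlen)
		return 0;		/* linear area */
	for (i = 0; i < nr_frags; i++) {
		if (offset < base + frags[i].size)
			break;		/* paged area: fragment i */
		base += frags[i].size;
	}
	return base;
}
```

For a 9000-byte packet with a 4000-byte linear head and 4096/904-byte fragments, an offset of 5000 lands in the first fragment, whose base offset is 4000 (the value the helper returns so the program can translate between packet and fragment offsets).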


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v14 bpf-next 18/18] bpf: add bpf_xdp_adjust_data selftest
  2021-09-10 16:14 [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (16 preceding siblings ...)
  2021-09-10 16:14 ` [PATCH v14 bpf-next 17/18] net: xdp: introduce bpf_xdp_adjust_data helper Lorenzo Bianconi
@ 2021-09-10 16:14 ` Lorenzo Bianconi
  2021-09-16 16:55 ` [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support Jakub Kicinski
  18 siblings, 0 replies; 58+ messages in thread
From: Lorenzo Bianconi @ 2021-09-10 16:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke

Introduce kernel selftest for new bpf_xdp_adjust_data helper.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 .../bpf/prog_tests/xdp_adjust_data.c          | 63 +++++++++++++++++++
 .../bpf/progs/test_xdp_update_frags.c         | 45 +++++++++++++
 2 files changed, 108 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_adjust_data.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_update_frags.c

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_data.c b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_data.c
new file mode 100644
index 000000000000..dfb8bce8ff55
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_data.c
@@ -0,0 +1,63 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include <network_helpers.h>
+
+void test_xdp_update_frag(void)
+{
+	const char *file = "./test_xdp_update_frags.o";
+	__u32 duration, retval, size;
+	struct bpf_object *obj;
+	int err, prog_fd;
+	__u32 *offset;
+	__u8 *buf;
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);
+	if (CHECK_FAIL(err))
+		return;
+
+	buf = malloc(128);
+	if (CHECK(!buf, "malloc()", "error:%s\n", strerror(errno)))
+		return;
+
+	memset(buf, 0, 128);
+	offset = (__u32 *)buf;
+	*offset = 16;
+	buf[*offset] = 0xaa; /* marker at offset 16 */
+
+	err = bpf_prog_test_run(prog_fd, 1, buf, 128,
+				buf, &size, &retval, &duration);
+
+	/* test_xdp_update_frags: buf[16]: 0xaa -> 0xbb */
+	CHECK(err || retval != XDP_PASS || buf[16] != 0xbb,
+	      "128b", "err %d errno %d retval %d size %d\n",
+	      err, errno, retval, size);
+
+	free(buf);
+
+	buf = malloc(9000);
+	if (CHECK(!buf, "malloc()", "error:%s\n", strerror(errno)))
+		return;
+
+	memset(buf, 0, 9000);
+	offset = (__u32 *)buf;
+	*offset = 5000;
+	buf[*offset] = 0xaa; /* marker at offset 5000 (frag0) */
+
+	err = bpf_prog_test_run(prog_fd, 1, buf, 9000,
+				buf, &size, &retval, &duration);
+
+	/* test_xdp_update_frags: buf[5000]: 0xaa -> 0xbb */
+	CHECK(err || retval != XDP_PASS || buf[5000] != 0xbb,
+	      "9000b", "err %d errno %d retval %d size %d\n",
+	      err, errno, retval, size);
+
+	free(buf);
+
+	bpf_object__close(obj);
+}
+
+void test_xdp_adjust_data(void)
+{
+	if (test__start_subtest("xdp_adjust_data"))
+		test_xdp_update_frag();
+}
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_update_frags.c b/tools/testing/selftests/bpf/progs/test_xdp_update_frags.c
new file mode 100644
index 000000000000..d06504228265
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_xdp_update_frags.c
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <bpf/bpf_helpers.h>
+
+int _version SEC("version") = 1;
+
+SEC("xdp_adjust_frags")
+int _xdp_adjust_frags(struct xdp_md *xdp)
+{
+	__u8 *data_end = (void *)(long)xdp->data_end;
+	__u8 *data = (void *)(long)xdp->data;
+	int base_offset, ret = XDP_DROP;
+	__u32 offset;
+
+	if (data + sizeof(__u32) > data_end)
+		return XDP_DROP;
+
+	offset = *(__u32 *)data;
+	base_offset = bpf_xdp_adjust_data(xdp, offset);
+	if (base_offset < 0 || base_offset > offset)
+		return XDP_DROP;
+
+	data_end = (void *)(long)xdp->data_end;
+	data = (void *)(long)xdp->data;
+
+	if (data + 1 > data_end)
+		goto out;
+
+	if (*data != 0xaa) /* marker */
+		goto out;
+
+	*data = 0xbb; /* update the marker */
+	ret = XDP_PASS;
+out:
+	bpf_xdp_adjust_data(xdp, 0);
+	return ret;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.31.1



* Re: [PATCH v14 bpf-next 01/18] net: skbuff: add size metadata to skb_shared_info for xdp
  2021-09-10 16:14 ` [PATCH v14 bpf-next 01/18] net: skbuff: add size metadata to skb_shared_info for xdp Lorenzo Bianconi
@ 2021-09-10 16:18   ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 58+ messages in thread
From: Jesper Dangaard Brouer @ 2021-09-10 16:18 UTC (permalink / raw)
  To: Lorenzo Bianconi, bpf, netdev
  Cc: brouer, lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, echaudro, jasowang, alexander.duyck,
	saeed, maciej.fijalkowski, magnus.karlsson, tirthendu.sarkar,
	toke


On 10/09/2021 18.14, Lorenzo Bianconi wrote:
> Introduce xdp_frags_truesize field in skb_shared_info data structure
> to store xdp_buff/xdp_frame truesize (xdp_frags_truesize will be used
> in xdp multi-buff support). In order to not increase skb_shared_info
> size we will use a hole due to skb_shared_info alignment.
> Introduce xdp_frags_size field in skb_shared_info data structure
> reusing gso_type field in order to store xdp_buff/xdp_frame paged size.
> xdp_frags_size will be used in xdp multi-buff support.
> 
> Acked-by: John Fastabend <john.fastabend@gmail.com>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>   include/linux/skbuff.h | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>


> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 6bdb0db3e825..769ffd09f975 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -522,13 +522,17 @@ struct skb_shared_info {
>   	unsigned short	gso_segs;
>   	struct sk_buff	*frag_list;
>   	struct skb_shared_hwtstamps hwtstamps;
> -	unsigned int	gso_type;
> +	union {
> +		unsigned int	gso_type;
> +		unsigned int	xdp_frags_size;
> +	};
>   	u32		tskey;
>   
>   	/*
>   	 * Warning : all fields before dataref are cleared in __alloc_skb()
>   	 */
>   	atomic_t	dataref;
> +	unsigned int	xdp_frags_truesize;
>   
>   	/* Intermediate layers must ensure that destructor_arg
>   	 * remains valid until skb destructor */
> 



* Re: [PATCH v14 bpf-next 02/18] xdp: introduce flags field in xdp_buff/xdp_frame
  2021-09-10 16:14 ` [PATCH v14 bpf-next 02/18] xdp: introduce flags field in xdp_buff/xdp_frame Lorenzo Bianconi
@ 2021-09-10 16:19   ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 58+ messages in thread
From: Jesper Dangaard Brouer @ 2021-09-10 16:19 UTC (permalink / raw)
  To: Lorenzo Bianconi, bpf, netdev
  Cc: brouer, lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, echaudro, jasowang, alexander.duyck,
	saeed, maciej.fijalkowski, magnus.karlsson, tirthendu.sarkar,
	toke


On 10/09/2021 18.14, Lorenzo Bianconi wrote:
> Introduce flags field in xdp_frame and xdp_buffer data structures
> to define additional buffer features. At the moment the only
> supported buffer feature is multi-buffer bit (mb). Multi-buffer bit
> is used to specify if this is a linear buffer (mb = 0) or a multi-buffer
> frame (mb = 1). In the latter case the driver is expected to initialize
> the skb_shared_info structure at the end of the first buffer to link
> together subsequent buffers belonging to the same frame.
> 
> Acked-by: John Fastabend<john.fastabend@gmail.com>
> Signed-off-by: Lorenzo Bianconi<lorenzo@kernel.org>
> ---
>   include/net/xdp.h | 29 +++++++++++++++++++++++++++++
>   1 file changed, 29 insertions(+)

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>



* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-09-10 16:14 [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (17 preceding siblings ...)
  2021-09-10 16:14 ` [PATCH v14 bpf-next 18/18] bpf: add bpf_xdp_adjust_data selftest Lorenzo Bianconi
@ 2021-09-16 16:55 ` Jakub Kicinski
  2021-09-17 14:51   ` Lorenzo Bianconi
  2021-09-29 10:41   ` Lorenz Bauer
  18 siblings, 2 replies; 58+ messages in thread
From: Jakub Kicinski @ 2021-09-16 16:55 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: bpf, netdev, lorenzo.bianconi, davem, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke

On Fri, 10 Sep 2021 18:14:06 +0200 Lorenzo Bianconi wrote:
> The following two eBPF helpers (and related selftests) have been introduced:
> - bpf_xdp_adjust_data:
>   Move xdp_md->data and xdp_md->data_end pointers in subsequent fragments
>   according to the offset provided by the ebpf program. This helper can be
>   used to read/write values in frame payload.
> - bpf_xdp_get_buff_len:
>   Return the total frame size (linear + paged parts)

> More info about the main idea behind this approach can be found here [1][2].

Is there much critique of the skb helpers we have? My intuition would
be to follow a similar paradigm from the API perspective. It may seem
trivial to us to switch between the two but "normal" users could easily
be confused.

By skb paradigm I mean skb_pull_data() and bpf_skb_load/store_bytes().

Alternatively how about we produce a variation on skb_header_pointer()
(use on-stack buffer or direct access if the entire region is in one
frag).

bpf_xdp_adjust_data() seems to add cost to helpers and TBH I'm not sure
how practical it would be to applications. My understanding is that the
application is not supposed to make assumptions about the fragment
geometry, meaning data can be split at any point. Parsing data
arbitrarily split into buffers is hard if pull() is not an option, let
alone making such parsing provably correct.

Won't applications end up building something like skb_header_pointer()
based on bpf_xdp_adjust_data(), anyway? In which case why don't we
provide them what they need?

say: 

void *xdp_mb_pointer(struct xdp_buff *xdp_md, u32 flags, 
                     u32 offset, u32 len, void *stack_buf)

flags and offset can be squashed into one u64 as needed. Helper returns
pointer to packet data, either real one or stack_buf. Verifier has to
be taught that the return value is NULL or a pointer which is safe with
offsets up to @len.

If the reason for access is write we'd also need:

void *xdp_mb_pointer_flush(struct xdp_buff *xdp_md, u32 flags, 
                           u32 offset, u32 len, void *stack_buf)

Same inputs, if stack buffer was used it does write back, otherwise nop.

Sorry for the longish email if I'm missing something obvious and/or
discussed earlier.


The other thing I wanted to double check - was the decision on program
compatibility made? Is a new program type an option? It'd be extremely
useful operationally to be able to depend on kernel enforcement.


* Re: [PATCH v14 bpf-next 10/18] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
  2021-09-10 16:14 ` [PATCH v14 bpf-next 10/18] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API Lorenzo Bianconi
@ 2021-09-16 16:55   ` Jakub Kicinski
  2021-09-17 10:02     ` Lorenzo Bianconi
  0 siblings, 1 reply; 58+ messages in thread
From: Jakub Kicinski @ 2021-09-16 16:55 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: bpf, netdev, lorenzo.bianconi, davem, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke

On Fri, 10 Sep 2021 18:14:16 +0200 Lorenzo Bianconi wrote:
> From: Eelco Chaudron <echaudro@redhat.com>
> 
> This change adds support for tail growing and shrinking for XDP multi-buff.
> 
> When called on a multi-buffer packet with a grow request, it will always
> work on the last fragment of the packet. So the maximum grow size is the
> last fragment's tailroom, i.e. no new buffer will be allocated.
> 
> When shrinking, it will work from the last fragment, all the way down to
> the base buffer depending on the shrinking size. It's important to mention
> that once you shrink down the fragment(s) are freed, so you can not grow
> again to the original size.
> 
> Acked-by: John Fastabend <john.fastabend@gmail.com>
> Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>

> +static inline unsigned int xdp_get_frag_tailroom(const skb_frag_t *frag)
> +{
> +	struct page *page = skb_frag_page(frag);
> +
> +	return page_size(page) - skb_frag_size(frag) - skb_frag_off(frag);
> +}

How do we deal with NICs which can pack multiple skbs into a page frag?
skb_shared_info field to mark the end of last fragment? Just want to make 
sure there is a path to supporting such designs.

> +static int bpf_xdp_mb_adjust_tail(struct xdp_buff *xdp, int offset)
> +{
> +	struct skb_shared_info *sinfo;
> +
> +	sinfo = xdp_get_shared_info_from_buff(xdp);
> +	if (offset >= 0) {
> +		skb_frag_t *frag = &sinfo->frags[sinfo->nr_frags - 1];
> +		int size;
> +
> +		if (unlikely(offset > xdp_get_frag_tailroom(frag)))
> +			return -EINVAL;
> +
> +		size = skb_frag_size(frag);
> +		memset(skb_frag_address(frag) + size, 0, offset);
> +		skb_frag_size_set(frag, size + offset);
> +		sinfo->xdp_frags_size += offset;
> +	} else {
> +		int i, n_frags_free = 0, len_free = 0, tlen_free = 0;
> +
> +		offset = abs(offset);
> +		if (unlikely(offset > ((int)(xdp->data_end - xdp->data) +
> +				       sinfo->xdp_frags_size - ETH_HLEN)))
> +			return -EINVAL;
> +
> +		for (i = sinfo->nr_frags - 1; i >= 0 && offset > 0; i--) {
> +			skb_frag_t *frag = &sinfo->frags[i];
> +			int size = skb_frag_size(frag);
> +			int shrink = min_t(int, offset, size);
> +
> +			len_free += shrink;
> +			offset -= shrink;
> +
> +			if (unlikely(size == shrink)) {
> +				struct page *page = skb_frag_page(frag);
> +
> +				__xdp_return(page_address(page), &xdp->rxq->mem,
> +					     false, NULL);
> +				tlen_free += page_size(page);
> +				n_frags_free++;
> +			} else {
> +				skb_frag_size_set(frag, size - shrink);
> +				break;
> +			}
> +		}
> +		sinfo->nr_frags -= n_frags_free;
> +		sinfo->xdp_frags_size -= len_free;
> +		sinfo->xdp_frags_truesize -= tlen_free;
> +
> +		if (unlikely(offset > 0)) {
> +			xdp_buff_clear_mb(xdp);
> +			xdp->data_end -= offset;
> +		}
> +	}
> +
> +	return 0;
> +}

nit: most of this function is indented, situation is ripe for splitting
     it into two
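For reference, the shrink path quoted above reduces to the loop below. This is a userspace sketch of the model only: fragments are represented by their sizes alone, and the page freeing plus truesize accounting of the real bpf_xdp_mb_adjust_tail() are omitted.

```c
#include <assert.h>

/*
 * Simplified model of multi-buff tail shrinking: walk fragments from the
 * tail, drop any fragment the shrink consumes entirely, trim the last
 * partially affected one, and report any leftover that must come out of
 * the linear area.
 */
static unsigned int shrink_tail(unsigned int *frag_sizes, int *nr_frags,
				unsigned int offset)
{
	int i, freed = 0;

	for (i = *nr_frags - 1; i >= 0 && offset > 0; i--) {
		unsigned int size = frag_sizes[i];
		unsigned int shrink = offset < size ? offset : size;

		offset -= shrink;
		if (shrink == size)
			freed++;		/* whole fragment released */
		else
			frag_sizes[i] = size - shrink;
	}
	*nr_frags -= freed;
	return offset;	/* remainder is trimmed off xdp->data_end */
}
```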


* Re: [PATCH v14 bpf-next 10/18] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
  2021-09-16 16:55   ` Jakub Kicinski
@ 2021-09-17 10:02     ` Lorenzo Bianconi
  2021-09-17 13:03       ` Jakub Kicinski
  0 siblings, 1 reply; 58+ messages in thread
From: Lorenzo Bianconi @ 2021-09-17 10:02 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: bpf, netdev, lorenzo.bianconi, davem, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke


On Sep 16, Jakub Kicinski wrote:
> On Fri, 10 Sep 2021 18:14:16 +0200 Lorenzo Bianconi wrote:
> > From: Eelco Chaudron <echaudro@redhat.com>
> > 
> > This change adds support for tail growing and shrinking for XDP multi-buff.
> > 
> > When called on a multi-buffer packet with a grow request, it will always
> > work on the last fragment of the packet. So the maximum grow size is the
> > last fragment's tailroom, i.e. no new buffer will be allocated.
> > 
> > When shrinking, it will work from the last fragment, all the way down to
> > the base buffer, depending on the shrink size. It's important to mention
> > that once you shrink, the fragment(s) are freed, so you cannot grow
> > again to the original size.
> > 
> > Acked-by: John Fastabend <john.fastabend@gmail.com>
> > Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
> 
> > +static inline unsigned int xdp_get_frag_tailroom(const skb_frag_t *frag)
> > +{
> > +	struct page *page = skb_frag_page(frag);
> > +
> > +	return page_size(page) - skb_frag_size(frag) - skb_frag_off(frag);
> > +}
> 
> How do we deal with NICs which can pack multiple skbs into a page frag?
> skb_shared_info field to mark the end of last fragment? Just want to make 
> sure there is a path to supporting such designs.

I guess here, instead of using page_size(page) we can rely on xdp_buff->frame_sz
or even on skb_shared_info()->xdp_frag_truesize (assuming all fragments from a
given hw have the same truesize, but I think this is something we can rely on)

static inline unsigned int xdp_get_frag_tailroom(struct xdp_buff *xdp,
						 const skb_frag_t *frag)
{
	return xdp->frame_sz - skb_frag_size(frag) - skb_frag_off(frag);
}

what do you think?

> 
> > +static int bpf_xdp_mb_adjust_tail(struct xdp_buff *xdp, int offset)
> > +{
> > +	struct skb_shared_info *sinfo;
> > +
> > +	sinfo = xdp_get_shared_info_from_buff(xdp);
> > +	if (offset >= 0) {
> > +		skb_frag_t *frag = &sinfo->frags[sinfo->nr_frags - 1];
> > +		int size;
> > +
> > +		if (unlikely(offset > xdp_get_frag_tailroom(frag)))
> > +			return -EINVAL;
> > +
> > +		size = skb_frag_size(frag);
> > +		memset(skb_frag_address(frag) + size, 0, offset);
> > +		skb_frag_size_set(frag, size + offset);
> > +		sinfo->xdp_frags_size += offset;
> > +	} else {
> > +		int i, n_frags_free = 0, len_free = 0, tlen_free = 0;
> > +
> > +		offset = abs(offset);
> > +		if (unlikely(offset > ((int)(xdp->data_end - xdp->data) +
> > +				       sinfo->xdp_frags_size - ETH_HLEN)))
> > +			return -EINVAL;
> > +
> > +		for (i = sinfo->nr_frags - 1; i >= 0 && offset > 0; i--) {
> > +			skb_frag_t *frag = &sinfo->frags[i];
> > +			int size = skb_frag_size(frag);
> > +			int shrink = min_t(int, offset, size);
> > +
> > +			len_free += shrink;
> > +			offset -= shrink;
> > +
> > +			if (unlikely(size == shrink)) {
> > +				struct page *page = skb_frag_page(frag);
> > +
> > +				__xdp_return(page_address(page), &xdp->rxq->mem,
> > +					     false, NULL);
> > +				tlen_free += page_size(page);
> > +				n_frags_free++;
> > +			} else {
> > +				skb_frag_size_set(frag, size - shrink);
> > +				break;
> > +			}
> > +		}
> > +		sinfo->nr_frags -= n_frags_free;
> > +		sinfo->xdp_frags_size -= len_free;
> > +		sinfo->xdp_frags_truesize -= tlen_free;
> > +
> > +		if (unlikely(offset > 0)) {
> > +			xdp_buff_clear_mb(xdp);
> > +			xdp->data_end -= offset;
> > +		}
> > +	}
> > +
> > +	return 0;
> > +}
> 
> nit: most of this function is indented, situation is ripe for splitting
>      it into two

sure, I will fix it.

Regards,
Lorenzo




* Re: [PATCH v14 bpf-next 10/18] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
  2021-09-17 10:02     ` Lorenzo Bianconi
@ 2021-09-17 13:03       ` Jakub Kicinski
  0 siblings, 0 replies; 58+ messages in thread
From: Jakub Kicinski @ 2021-09-17 13:03 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: bpf, netdev, lorenzo.bianconi, davem, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke

On Fri, 17 Sep 2021 12:02:46 +0200 Lorenzo Bianconi wrote:
> > > +static inline unsigned int xdp_get_frag_tailroom(const skb_frag_t *frag)
> > > +{
> > > +	struct page *page = skb_frag_page(frag);
> > > +
> > > +	return page_size(page) - skb_frag_size(frag) - skb_frag_off(frag);
> > > +}  
> > 
> > How do we deal with NICs which can pack multiple skbs into a page frag?
> > skb_shared_info field to mark the end of last fragment? Just want to make 
> > sure there is a path to supporting such designs.  
> 
> I guess here, intead of using page_size(page) we can rely on xdp_buff->frame_sz
> or even on skb_shared_info()->xdp_frag_truesize (assuming all fragments from a
> given hw have the same truesize, but I think this is something we can rely on)
> 
> static inline unsigned int xdp_get_frag_tailroom(struct xdp_buff *xdp,
> 						 const skb_frag_t *frag)
> {
> 	return xdp->frame_sz - skb_frag_size(frag) - skb_frag_off(frag);
> }
> 
> what do you think?

Could work! We'd need to document the semantics of frame_sz for mb
frames clearly but I don't see why not. 


* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-09-16 16:55 ` [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support Jakub Kicinski
@ 2021-09-17 14:51   ` Lorenzo Bianconi
  2021-09-17 18:33     ` Jakub Kicinski
  2021-09-29 10:41   ` Lorenz Bauer
  1 sibling, 1 reply; 58+ messages in thread
From: Lorenzo Bianconi @ 2021-09-17 14:51 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: bpf, netdev, lorenzo.bianconi, davem, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke


> On Fri, 10 Sep 2021 18:14:06 +0200 Lorenzo Bianconi wrote:
> > The following two eBPF helpers (and related selftests) have been introduced:
> > - bpf_xdp_adjust_data:
> >   Move xdp_md->data and xdp_md->data_end pointers in subsequent fragments
> >   according to the offset provided by the ebpf program. This helper can be
> >   used to read/write values in frame payload.
> > - bpf_xdp_get_buff_len:
> >   Return the total frame size (linear + paged parts)
> 
> > More info about the main idea behind this approach can be found here [1][2].
> 
> Is there much critique of the skb helpers we have? My intuition would
> be to follow a similar paradigm from the API perspective. It may seem
> trivial to us to switch between the two but "normal" users could easily
> be confused.
> 
> By skb paradigm I mean skb_pull_data() and bpf_skb_load/store_bytes().
> 
> Alternatively how about we produce a variation on skb_header_pointer()
> (use on-stack buffer or direct access if the entire region is in one
> frag).
> 
> bpf_xdp_adjust_data() seems to add cost to helpers and TBH I'm not sure
> how practical it would be to applications. My understanding is that the
> application is not supposed to make assumptions about the fragment
> geometry, meaning data can be split at any point. Parsing data
> arbitrarily split into buffers is hard if pull() is not an option, let
> alone making such parsing provably correct.
> 
> Won't applications end up building something like skb_header_pointer()
> based on bpf_xdp_adjust_data(), anyway? In which case why don't we
> provide them what they need?

Please correct me if I am wrong: here you mean that with bpf_xdp_adjust_data()
we are moving the logic to read/write data across fragment boundaries
to the caller. Right.
I do not have a clear view about what could be a real use-case for the helper
(maybe John can help on this), but similar to what you are suggesting, what
about doing something like bpf_skb_load/store_bytes()?

- bpf_xdp_load_bytes(struct xdp_buff *xdp_md, u32 offset, u32 len,
		     void *data)

- bpf_xdp_store_bytes(struct xdp_buff *xdp_md, u32 offset, u32 len,
		      void *data)

the helpers can take care of reading/writing across fragment boundaries
and hide the buffer layout from the caller. The only downside here
(as for bpf_skb_load/store_bytes()) is we need to copy. But in a
real application, is it actually an issue? (since we have much less
pps for xdp multi-buff).
Moreover I do not know if this solution will require some verifier
changes.
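To make the store side concrete, here is a userspace sketch of what such a helper could do internally. The name bpf_xdp_store_bytes is only the proposal above, and struct buf is a hypothetical flat stand-in for the xdp_buff linear area plus fragments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flat model of an xdp_buff: linear area plus fragments. */
struct buf { unsigned char *data; size_t len; };

/*
 * Copy @len bytes from @src into the buffer list starting at packet
 * @offset, crossing fragment boundaries as needed; bounds are validated
 * up front so nothing is written on failure.
 */
static int store_bytes(struct buf *bufs, int nr, size_t offset,
		       const void *src, size_t len)
{
	const unsigned char *in = src;
	size_t base = 0, done = 0, total = 0;
	int i;

	for (i = 0; i < nr; i++)
		total += bufs[i].len;
	if (offset + len > total)
		return -1;	/* -EINVAL in a real helper */

	for (i = 0; i < nr && done < len; i++) {
		size_t end = base + bufs[i].len;

		if (offset + done < end) {
			size_t n = end - (offset + done);

			if (n > len - done)
				n = len - done;
			memcpy(bufs[i].data + (offset + done - base),
			       in + done, n);
			done += n;
		}
		base = end;
	}
	return 0;
}
```

The load side is the same walk with the memcpy direction reversed, which is why a single helper pair can hide the fragment geometry entirely.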

@John: can this approach work in your use case?

Anyway I think we should try to get everyone on the same page here since the
helper can change according to specific use-case. Since this series is on the
agenda for LPC next week, I hope you and others who have an opinion about this
will find the time to come and discuss it during the conference :)

Regards,
Lorenzo
> 
> say: 
> 
> void *xdp_mb_pointer(struct xdp_buff *xdp_md, u32 flags, 
>                      u32 offset, u32 len, void *stack_buf)
> 
> flags and offset can be squashed into one u64 as needed. Helper returns
> pointer to packet data, either real one or stack_buf. Verifier has to
> be taught that the return value is NULL or a pointer which is safe with
> offsets up to @len.
> 
> If the reason for access is write we'd also need:
> 
> void *xdp_mb_pointer_flush(struct xdp_buff *xdp_md, u32 flags, 
>                            u32 offset, u32 len, void *stack_buf)
> 
> Same inputs, if stack buffer was used it does write back, otherwise nop.
> 
> Sorry for the longish email if I'm missing something obvious and/or
> discussed earlier.
> 
> 
> The other thing I wanted to double check - was the decision on program
> compatibility made? Is a new program type an option? It'd be extremely
> useful operationally to be able to depend on kernel enforcement.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-09-17 14:51   ` Lorenzo Bianconi
@ 2021-09-17 18:33     ` Jakub Kicinski
  2021-09-17 18:43       ` Alexei Starovoitov
  0 siblings, 1 reply; 58+ messages in thread
From: Jakub Kicinski @ 2021-09-17 18:33 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: bpf, netdev, lorenzo.bianconi, davem, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke

On Fri, 17 Sep 2021 16:51:06 +0200 Lorenzo Bianconi wrote:
> > Is there much critique of the skb helpers we have? My intuition would
> > be to follow a similar paradigm from the API perspective. It may seem
> > trivial to us to switch between the two but "normal" users could easily
> > be confused.
> > 
> > By skb paradigm I mean skb_pull_data() and bpf_skb_load/store_bytes().
> > 
> > Alternatively how about we produce a variation on skb_header_pointer()
> > (use on-stack buffer or direct access if the entire region is in one
> > frag).
> > 
> > bpf_xdp_adjust_data() seems to add cost to helpers and TBH I'm not sure
> > how practical it would be to applications. My understanding is that the
> > application is not supposed to make assumptions about the fragment
> > geometry, meaning data can be split at any point. Parsing data
> > arbitrarily split into buffers is hard if pull() is not an option, let
> > alone making such parsing provably correct.
> > 
> > Won't applications end up building something like skb_header_pointer()
> > based on bpf_xdp_adjust_data(), anyway? In which case why don't we
> > provide them what they need?  
> 
> Please correct me if I am wrong, here you mean in bpf_xdp_adjust_data()
> we are moving the logic to read/write data across fragment boundaries
> to the caller. Right.
> I do not have a clear view about what could be a real use-case for the helper
> (maybe John can help on this), but similar to what you are suggesting, what
> about doing something like bpf_skb_load/store_bytes()?
> 
> - bpf_xdp_load_bytes(struct xdp_buff *xdp_md, u32 offset, u32 len,
> 		     void *data)
> 
> - bpf_xdp_store_bytes(struct xdp_buff *xdp_md, u32 offset, u32 len,
> 		      void *data)
> 
> the helper can take care of reading/writing across fragment boundaries
> and remove any layout info from the caller. The only downside here
> (as for bpf_skb_load/store_bytes()) is we need to copy.

If bpf_xdp_load_bytes() / bpf_xdp_store_bytes() works for most, we
can start with that. In all honesty I don't know what the exact
use cases for looking at data are, either. I'm primarily worried
about exposing the kernel internals too early.

What I'm imagining is that data access mostly works around bad
header/data split or loading some simple >L4 headers. On one hand
such headers will fall into one frag, so 99.9% of the time the copy
isn't really required. But as I said, I'm happy with load/store to
begin with.

> But in a real application, is it actually an issue? (since we have
> much less pps for xdp multi-buff).
> Moreover I do not know if this solution will requires some verifier
> changes.
> 
> @John: can this approach works in your use-case?
> 
> Anyway I think we should try to get everyone on the same page here
> since the helper can change according to specific use-case. Since
> this series is on the agenda for LPC next week, I hope you and others
> who have an opinion about this will find the time to come and discuss
> it during the conference :)

Yup, I'll be there. I'm not good at thinking on my feet tho, hence
sharing my comments now.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-09-17 18:33     ` Jakub Kicinski
@ 2021-09-17 18:43       ` Alexei Starovoitov
  2021-09-17 19:00         ` Jakub Kicinski
  0 siblings, 1 reply; 58+ messages in thread
From: Alexei Starovoitov @ 2021-09-17 18:43 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Lorenzo Bianconi, bpf, Network Development, Lorenzo Bianconi,
	David S. Miller, Alexei Starovoitov, Daniel Borkmann, shayagr,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Alexander Duyck, Saeed Mahameed,
	Fijalkowski, Maciej, Karlsson, Magnus, tirthendu.sarkar,
	Toke Høiland-Jørgensen

On Fri, Sep 17, 2021 at 11:33 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> > >
> > > Alternatively how about we produce a variation on skb_header_pointer()
> > > (use on-stack buffer or direct access if the entire region is in one
> > > frag).
>
> If bpf_xdp_load_bytes() / bpf_xdp_store_bytes() works for most we
> can start with that. In all honesty I don't know what the exact
> use cases for looking at data are, either. I'm primarily worried
> about exposing the kernel internals too early.

I don't mind the xdp equivalent of skb_load_bytes,
but skb_header_pointer() idea is superior.
When we did xdp with data/data_end there was no refine_retval_range
concept in the verifier (iirc, or we just missed that opportunity).
We'd need something more advanced: a pointer with a valid range
refined by the input argument 'len', or NULL.
The verifier doesn't have such a thing yet, but it fits as a combination
of value_or_null plus refine_retval_range.
The bpf_xdp_header_pointer() and bpf_skb_header_pointer()
would probably simplify bpf programs as well.
There would be no need to deal with data/data_end.
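[To make that verifier requirement concrete, a sketch of how a program could use such a helper. bpf_xdp_header_pointer() is hypothetical, not a real BPF API; the point is the contract the verifier would have to learn:]

```c
/* Hypothetical helper, not a real BPF API today. */
struct ethhdr *eth;
__u8 buf[sizeof(struct ethhdr)];

eth = bpf_xdp_header_pointer(xdp, 0, sizeof(*eth), buf);
if (!eth)
	return XDP_DROP;

/* No data/data_end comparison needed: the verifier would know eth is
 * either NULL (handled above) or valid for sizeof(struct ethhdr)
 * bytes, i.e. value_or_null plus refine_retval_range on a pointer. */
if (eth->h_proto == bpf_htons(ETH_P_IP))
	return XDP_PASS;
```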

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-09-17 18:43       ` Alexei Starovoitov
@ 2021-09-17 19:00         ` Jakub Kicinski
  2021-09-17 19:10           ` Alexei Starovoitov
  0 siblings, 1 reply; 58+ messages in thread
From: Jakub Kicinski @ 2021-09-17 19:00 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Lorenzo Bianconi, bpf, Network Development, Lorenzo Bianconi,
	David S. Miller, Alexei Starovoitov, Daniel Borkmann, shayagr,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Alexander Duyck, Saeed Mahameed,
	Fijalkowski, Maciej, Karlsson, Magnus, tirthendu.sarkar,
	Toke Høiland-Jørgensen

On Fri, 17 Sep 2021 11:43:07 -0700 Alexei Starovoitov wrote:
> > If bpf_xdp_load_bytes() / bpf_xdp_store_bytes() works for most we
> > can start with that. In all honesty I don't know what the exact
> > use cases for looking at data are, either. I'm primarily worried
> > about exposing the kernel internals too early.  
> 
> I don't mind the xdp equivalent of skb_load_bytes,
> but skb_header_pointer() idea is superior.
> When we did xdp with data/data_end there was no refine_retval_range
> concept in the verifier (iirc or we just missed that opportunity).
> We'd need something more advanced: a pointer with valid range
> refined by input argument 'len' or NULL.
> The verifier doesn't have such thing yet, but it fits as a combination of
> value_or_null plus refine_retval_range.
> The bpf_xdp_header_pointer() and bpf_skb_header_pointer()
> would probably simplify bpf programs as well.
> There would be no need to deal with data/data_end.

What are your thoughts on inlining? Can we inline the common case 
of the header being in the "head"? Otherwise data/end comparisons 
would be faster.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-09-17 19:00         ` Jakub Kicinski
@ 2021-09-17 19:10           ` Alexei Starovoitov
  2021-09-17 22:07             ` John Fastabend
  0 siblings, 1 reply; 58+ messages in thread
From: Alexei Starovoitov @ 2021-09-17 19:10 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Lorenzo Bianconi, bpf, Network Development, Lorenzo Bianconi,
	David S. Miller, Alexei Starovoitov, Daniel Borkmann, shayagr,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Alexander Duyck, Saeed Mahameed,
	Fijalkowski, Maciej, Karlsson, Magnus, tirthendu.sarkar,
	Toke Høiland-Jørgensen

On Fri, Sep 17, 2021 at 12:00 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 17 Sep 2021 11:43:07 -0700 Alexei Starovoitov wrote:
> > > If bpf_xdp_load_bytes() / bpf_xdp_store_bytes() works for most we
> > > can start with that. In all honesty I don't know what the exact
> > > use cases for looking at data are, either. I'm primarily worried
> > > about exposing the kernel internals too early.
> >
> > I don't mind the xdp equivalent of skb_load_bytes,
> > but skb_header_pointer() idea is superior.
> > When we did xdp with data/data_end there was no refine_retval_range
> > concept in the verifier (iirc or we just missed that opportunity).
> > We'd need something more advanced: a pointer with valid range
> > refined by input argument 'len' or NULL.
> > The verifier doesn't have such thing yet, but it fits as a combination of
> > value_or_null plus refine_retval_range.
> > The bpf_xdp_header_pointer() and bpf_skb_header_pointer()
> > would probably simplify bpf programs as well.
> > There would be no need to deal with data/data_end.
>
> What are your thoughts on inlining? Can we inline the common case
> of the header being in the "head"? Otherwise data/end comparisons
> would be faster.

Yeah. It can be inlined by the verifier.
It would still look like a call from the bpf prog's pov, with llvm doing
spill/fill of scratched regs, but that's minor.

Also we can use the same bpf_header_pointer(ctx, ...)
helper for both xdp and skb program types. They would have different
implementations underneath, but this might make it possible to write bpf
programs that work in both xdp and skb context.
I believe cilium has fancy macros to achieve that.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-09-17 19:10           ` Alexei Starovoitov
@ 2021-09-17 22:07             ` John Fastabend
  2021-09-18 11:53               ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 58+ messages in thread
From: John Fastabend @ 2021-09-17 22:07 UTC (permalink / raw)
  To: Alexei Starovoitov, Jakub Kicinski
  Cc: Lorenzo Bianconi, bpf, Network Development, Lorenzo Bianconi,
	David S. Miller, Alexei Starovoitov, Daniel Borkmann, shayagr,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Alexander Duyck, Saeed Mahameed,
	Fijalkowski, Maciej, Karlsson, Magnus, tirthendu.sarkar,
	Toke Høiland-Jørgensen

Alexei Starovoitov wrote:
> On Fri, Sep 17, 2021 at 12:00 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Fri, 17 Sep 2021 11:43:07 -0700 Alexei Starovoitov wrote:
> > > > If bpf_xdp_load_bytes() / bpf_xdp_store_bytes() works for most we
> > > > can start with that. In all honesty I don't know what the exact
> > > > use cases for looking at data are, either. I'm primarily worried
> > > > about exposing the kernel internals too early.
> > >
> > > I don't mind the xdp equivalent of skb_load_bytes,
> > > but skb_header_pointer() idea is superior.
> > > When we did xdp with data/data_end there was no refine_retval_range
> > > concept in the verifier (iirc or we just missed that opportunity).
> > > We'd need something more advanced: a pointer with valid range
> > > refined by input argument 'len' or NULL.
> > > The verifier doesn't have such thing yet, but it fits as a combination of
> > > value_or_null plus refine_retval_range.
> > > The bpf_xdp_header_pointer() and bpf_skb_header_pointer()
> > > would probably simplify bpf programs as well.
> > > There would be no need to deal with data/data_end.
> >
> > What are your thoughts on inlining? Can we inline the common case
> > of the header being in the "head"? Otherwise data/end comparisons
> > would be faster.
> 
> Yeah. It can be inlined by the verifier.
> It would still look like a call from bpf prog pov with llvm doing spill/fill
> of scratched regs, but it's minor.
> 
> Also we can use the same bpf_header_pointer(ctx, ...)
> helper for both xdp and skb program types. They will have different
> implementation underneath, but this might make possible writing bpf
> programs that could work in both xdp and skb context.
> I believe cilium has fancy macros to achieve that.

Hi,

First, a header_pointer() logic that works across skb and xdp seems like
a great idea to me. I wonder, though, if instead of doing the copy
into a new buffer for offsets past the initial frag, like what is done in
skb_header_pointer(), we could just walk the frags and point at the new offset.
This is what we do on the socket side with bpf_msg_pull_data(), for example.
For XDP it should also work. The skb case would depend on clone state
and such, so it might be a bit more tricky there.

This has the advantage of only doing the copy when it's necessary. This
can be useful, for example, when reading the tail of an IPsec packet. With
a blind copy most packets will get hit with a copy. By just setting
pkt->data and pkt->data_end we can avoid that cost.

Lorenz originally implemented something similar earlier, and we had the
refine-retval logic. It failed on no-alu32 for some reason we could
revisit. I didn't mind the current helper returning a data pointer set
to the start of the frag, so we stopped following up on it.

I agree, though, that the current implementation puts a lot on the BPF
writer. So getting both cases covered ("I want to take pains in my BPF
prog to avoid copies" and "I just want these bytes") behind a single
helper seems good to me.

Thanks,
John

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-09-17 22:07             ` John Fastabend
@ 2021-09-18 11:53               ` Toke Høiland-Jørgensen
  2021-09-20 18:02                 ` Jakub Kicinski
  0 siblings, 1 reply; 58+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-09-18 11:53 UTC (permalink / raw)
  To: John Fastabend, Alexei Starovoitov, Jakub Kicinski
  Cc: Lorenzo Bianconi, bpf, Network Development, Lorenzo Bianconi,
	David S. Miller, Alexei Starovoitov, Daniel Borkmann, shayagr,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Alexander Duyck, Saeed Mahameed,
	Fijalkowski, Maciej, Karlsson, Magnus, tirthendu.sarkar

John Fastabend <john.fastabend@gmail.com> writes:

> Alexei Starovoitov wrote:
>> On Fri, Sep 17, 2021 at 12:00 PM Jakub Kicinski <kuba@kernel.org> wrote:
>> >
>> > On Fri, 17 Sep 2021 11:43:07 -0700 Alexei Starovoitov wrote:
>> > > > If bpf_xdp_load_bytes() / bpf_xdp_store_bytes() works for most we
>> > > > can start with that. In all honesty I don't know what the exact
>> > > > use cases for looking at data are, either. I'm primarily worried
>> > > > about exposing the kernel internals too early.
>> > >
>> > > I don't mind the xdp equivalent of skb_load_bytes,
>> > > but skb_header_pointer() idea is superior.
>> > > When we did xdp with data/data_end there was no refine_retval_range
>> > > concept in the verifier (iirc or we just missed that opportunity).
>> > > We'd need something more advanced: a pointer with valid range
>> > > refined by input argument 'len' or NULL.
>> > > The verifier doesn't have such thing yet, but it fits as a combination of
>> > > value_or_null plus refine_retval_range.
>> > > The bpf_xdp_header_pointer() and bpf_skb_header_pointer()
>> > > would probably simplify bpf programs as well.
>> > > There would be no need to deal with data/data_end.
>> >
>> > What are your thoughts on inlining? Can we inline the common case
>> > of the header being in the "head"? Otherwise data/end comparisons
>> > would be faster.
>> 
>> Yeah. It can be inlined by the verifier.
>> It would still look like a call from bpf prog pov with llvm doing spill/fill
>> of scratched regs, but it's minor.
>> 
>> Also we can use the same bpf_header_pointer(ctx, ...)
>> helper for both xdp and skb program types. They will have different
>> implementation underneath, but this might make possible writing bpf
>> programs that could work in both xdp and skb context.
>> I believe cilium has fancy macros to achieve that.
>
> Hi,
>
> First a header_pointer() logic that works across skb and xdp seems like
> a great idea to me. I wonder though if instead of doing the copy
> into a new buffer for offset past the initial frag like what is done in
> skb_header_pointer could we just walk the frags and point at the new offset.
> This is what we do on the socket side with bpf_msg_pull-data() for example.
> For XDP it should also work. The skb case would depend on clone state
> and things so might be a bit more tricky there.
>
> This has the advantage of only doing the copy when its necessary. This
> can be useful for example when reading the tail of an IPsec packet. With
> blind copy most packets will get hit with a copy. By just writing the
> pkt->data and pkt->data_end we can avoid this case.
>
> Lorenz originally implemented something similar earlier and we had the
> refine retval logic. It failed on no-alu32 for some reason we could
> revisit. I didn't mind the current help returning with data pointer set
> to the start of the frag so we stopped following up on it.
>
> I agree though the current implementation puts a lot on the BPF writer.
> So getting both cases covered, I want to take pains in my BPF prog
> to avoid copies and I just want these bytes handled behind a single
> helper seems good to me.

I'm OK with a bpf_header_pointer()-type helper - I quite like the
in-kernel version of this for SKBs, so replicating it as a BPF helper
would be great. But I'm a little worried about taking a performance hit.

I.e., if you do:

ptr = bpf_header_pointer(pkt, offset, len, stack_ptr)
*ptr = xxx;

then, if the helper ended up copying the data into the stack pointer,
you didn't actually change anything in the packet, so you need to do a
writeback.

Jakub suggested up-thread that this should be done with some kind of
flush() helper. But you don't know whether the header_pointer()-helper
copied the data, so you always need to call the flush() helper, which
will incur overhead. If the verifier can in-line the helpers that will
lower it, but will it be enough to make it negligible?

-Toke


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 03/18] net: mvneta: update mb bit before passing the xdp buffer to eBPF layer
  2021-09-10 16:14 ` [PATCH v14 bpf-next 03/18] net: mvneta: update mb bit before passing the xdp buffer to eBPF layer Lorenzo Bianconi
@ 2021-09-20  8:25   ` Shay Agroskin
  2021-09-20  8:37     ` Lorenzo Bianconi
  0 siblings, 1 reply; 58+ messages in thread
From: Shay Agroskin @ 2021-09-20  8:25 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: bpf, netdev, lorenzo.bianconi, davem, kuba, ast, daniel,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke


Lorenzo Bianconi <lorenzo@kernel.org> writes:

> ...
> diff --git a/drivers/net/ethernet/marvell/mvneta.c 
> b/drivers/net/ethernet/marvell/mvneta.c
> index 9d460a270601..0c7b84ca6efc 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> ...
> @@ -2320,8 +2325,12 @@ mvneta_swbm_build_skb(struct mvneta_port 
> *pp, struct page_pool *pool,
>  		      struct xdp_buff *xdp, u32 desc_status)
>  {
>  	struct skb_shared_info *sinfo = 
>  xdp_get_shared_info_from_buff(xdp);
> -	int i, num_frags = sinfo->nr_frags;
>  	struct sk_buff *skb;
> +	u8 num_frags;
> +	int i;
> +
> +	if (unlikely(xdp_buff_is_mb(xdp)))
> +		num_frags = sinfo->nr_frags;

Hi,
nit: it seems that the num_frags assignment can be moved after the
other 'if' condition you added (right before the 'for' loop over
num_frags), or even be eliminated completely so that
sinfo->nr_frags is used directly.
Either way it looks like you can remove one 'if'.

Shay

>  
>  	skb = build_skb(xdp->data_hard_start, PAGE_SIZE);
>  	if (!skb)
> @@ -2333,6 +2342,9 @@ mvneta_swbm_build_skb(struct mvneta_port 
> *pp, struct page_pool *pool,
>  	skb_put(skb, xdp->data_end - xdp->data);
>  	skb->ip_summed = mvneta_rx_csum(pp, desc_status);
>  
> +	if (likely(!xdp_buff_is_mb(xdp)))
> +		goto out;
> +
>  	for (i = 0; i < num_frags; i++) {
>  		skb_frag_t *frag = &sinfo->frags[i];
>  
> @@ -2341,6 +2353,7 @@ mvneta_swbm_build_skb(struct mvneta_port 
> *pp, struct page_pool *pool,
>  				skb_frag_size(frag), PAGE_SIZE);
>  	}
>  
> +out:
>  	return skb;
>  }


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 03/18] net: mvneta: update mb bit before passing the xdp buffer to eBPF layer
  2021-09-20  8:25   ` Shay Agroskin
@ 2021-09-20  8:37     ` Lorenzo Bianconi
  2021-09-20  8:45       ` Shay Agroskin
  0 siblings, 1 reply; 58+ messages in thread
From: Lorenzo Bianconi @ 2021-09-20  8:37 UTC (permalink / raw)
  To: Shay Agroskin
  Cc: Lorenzo Bianconi, bpf, netdev, davem, kuba, ast, daniel,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke

[-- Attachment #1: Type: text/plain, Size: 1827 bytes --]

> 
> Lorenzo Bianconi <lorenzo@kernel.org> writes:
> 
> > ...
> > diff --git a/drivers/net/ethernet/marvell/mvneta.c
> > b/drivers/net/ethernet/marvell/mvneta.c
> > index 9d460a270601..0c7b84ca6efc 100644
> > --- a/drivers/net/ethernet/marvell/mvneta.c
> > +++ b/drivers/net/ethernet/marvell/mvneta.c
> > ...
> > @@ -2320,8 +2325,12 @@ mvneta_swbm_build_skb(struct mvneta_port *pp,
> > struct page_pool *pool,
> >  		      struct xdp_buff *xdp, u32 desc_status)
> >  {
> >  	struct skb_shared_info *sinfo =  xdp_get_shared_info_from_buff(xdp);
> > -	int i, num_frags = sinfo->nr_frags;
> >  	struct sk_buff *skb;
> > +	u8 num_frags;
> > +	int i;
> > +
> > +	if (unlikely(xdp_buff_is_mb(xdp)))
> > +		num_frags = sinfo->nr_frags;
> 
> Hi,
> nit, it seems that the num_frags assignment can be moved after the other
> 'if' condition you added (right before the 'for' for num_frags), or even be
> eliminated completely so that sinfo->nr_frags is used directly.
> Either way it looks like you can remove one 'if'.
> 
> Shay

Hi Shay,

we can't move the nr_frags assignment after build_skb(), since this field
will be overwritten by that call.

Regards,
Lorenzo

> 
> >  	skb = build_skb(xdp->data_hard_start, PAGE_SIZE);
> >  	if (!skb)
> > @@ -2333,6 +2342,9 @@ mvneta_swbm_build_skb(struct mvneta_port *pp,
> > struct page_pool *pool,
> >  	skb_put(skb, xdp->data_end - xdp->data);
> >  	skb->ip_summed = mvneta_rx_csum(pp, desc_status);
> > +	if (likely(!xdp_buff_is_mb(xdp)))
> > +		goto out;
> > +
> >  	for (i = 0; i < num_frags; i++) {
> >  		skb_frag_t *frag = &sinfo->frags[i];
> >   @@ -2341,6 +2353,7 @@ mvneta_swbm_build_skb(struct mvneta_port *pp,
> > struct page_pool *pool,
> >  				skb_frag_size(frag), PAGE_SIZE);
> >  	}
> > +out:
> >  	return skb;
> >  }
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 03/18] net: mvneta: update mb bit before passing the xdp buffer to eBPF layer
  2021-09-20  8:37     ` Lorenzo Bianconi
@ 2021-09-20  8:45       ` Shay Agroskin
  2021-09-20  9:00         ` Lorenzo Bianconi
  0 siblings, 1 reply; 58+ messages in thread
From: Shay Agroskin @ 2021-09-20  8:45 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: Lorenzo Bianconi, bpf, netdev, davem, kuba, ast, daniel,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke


Lorenzo Bianconi <lorenzo.bianconi@redhat.com> writes:

>> 
>> Lorenzo Bianconi <lorenzo@kernel.org> writes:
>> 
>> > ...
>> > diff --git a/drivers/net/ethernet/marvell/mvneta.c
>> > b/drivers/net/ethernet/marvell/mvneta.c
>> > index 9d460a270601..0c7b84ca6efc 100644
>> > --- a/drivers/net/ethernet/marvell/mvneta.c
>> > +++ b/drivers/net/ethernet/marvell/mvneta.c
>> > ...
>> > @@ -2320,8 +2325,12 @@ mvneta_swbm_build_skb(struct 
>> > mvneta_port *pp,
>> > struct page_pool *pool,
>> >  		      struct xdp_buff *xdp, u32 desc_status)
>> >  {
>> >  	struct skb_shared_info *sinfo = 
>> >  xdp_get_shared_info_from_buff(xdp);
>> > -	int i, num_frags = sinfo->nr_frags;
>> >  	struct sk_buff *skb;
>> > +	u8 num_frags;
>> > +	int i;
>> > +
>> > +	if (unlikely(xdp_buff_is_mb(xdp)))
>> > +		num_frags = sinfo->nr_frags;
>> 
>> Hi,
>> nit, it seems that the num_frags assignment can be moved after 
>> the other
>> 'if' condition you added (right before the 'for' for 
>> num_frags), or even be
>> eliminated completely so that sinfo->nr_frags is used directly.
>> Either way it looks like you can remove one 'if'.
>> 
>> Shay
>
> Hi Shay,
>
> we can't move nr_frags assignement after build_skb() since this 
> field will be
> overwritten by that call.
>
> Regards,
> Lorenzo
>

Sorry, silly mistake on my part.

Guess the assignment can be done unconditionally anyway, since there
don't seem to be any new cache misses introduced by it.
Anyway, nice catch, sorry for misleading you

>> 
>> >  	skb = build_skb(xdp->data_hard_start, PAGE_SIZE);
>> >  	if (!skb)
>> > @@ -2333,6 +2342,9 @@ mvneta_swbm_build_skb(struct 
>> > mvneta_port *pp,
>> > struct page_pool *pool,
>> >  	skb_put(skb, xdp->data_end - xdp->data);
>> >  	skb->ip_summed = mvneta_rx_csum(pp, desc_status);
>> > +	if (likely(!xdp_buff_is_mb(xdp)))
>> > +		goto out;
>> > +
>> >  	for (i = 0; i < num_frags; i++) {
>> >  		skb_frag_t *frag = &sinfo->frags[i];
>> >   @@ -2341,6 +2353,7 @@ mvneta_swbm_build_skb(struct 
>> >   mvneta_port *pp,
>> > struct page_pool *pool,
>> >  				skb_frag_size(frag), PAGE_SIZE);
>> >  	}
>> > +out:
>> >  	return skb;
>> >  }
>> 


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 03/18] net: mvneta: update mb bit before passing the xdp buffer to eBPF layer
  2021-09-20  8:45       ` Shay Agroskin
@ 2021-09-20  9:00         ` Lorenzo Bianconi
  0 siblings, 0 replies; 58+ messages in thread
From: Lorenzo Bianconi @ 2021-09-20  9:00 UTC (permalink / raw)
  To: Shay Agroskin
  Cc: Lorenzo Bianconi, bpf, netdev, davem, kuba, ast, daniel,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke

[-- Attachment #1: Type: text/plain, Size: 2559 bytes --]

> 
> Lorenzo Bianconi <lorenzo.bianconi@redhat.com> writes:
> 
> > > 
> > > Lorenzo Bianconi <lorenzo@kernel.org> writes:
> > > 
> > > > ...
> > > > diff --git a/drivers/net/ethernet/marvell/mvneta.c
> > > > b/drivers/net/ethernet/marvell/mvneta.c
> > > > index 9d460a270601..0c7b84ca6efc 100644
> > > > --- a/drivers/net/ethernet/marvell/mvneta.c
> > > > +++ b/drivers/net/ethernet/marvell/mvneta.c
> > > > ...
> > > > @@ -2320,8 +2325,12 @@ mvneta_swbm_build_skb(struct > mvneta_port
> > > *pp,
> > > > struct page_pool *pool,
> > > >  		      struct xdp_buff *xdp, u32 desc_status)
> > > >  {
> > > >  	struct skb_shared_info *sinfo = >
> > > xdp_get_shared_info_from_buff(xdp);
> > > > -	int i, num_frags = sinfo->nr_frags;
> > > >  	struct sk_buff *skb;
> > > > +	u8 num_frags;
> > > > +	int i;
> > > > +
> > > > +	if (unlikely(xdp_buff_is_mb(xdp)))
> > > > +		num_frags = sinfo->nr_frags;
> > > 
> > > Hi,
> > > nit, it seems that the num_frags assignment can be moved after the
> > > other
> > > 'if' condition you added (right before the 'for' for num_frags), or
> > > even be
> > > eliminated completely so that sinfo->nr_frags is used directly.
> > > Either way it looks like you can remove one 'if'.
> > > 
> > > Shay
> > 
> > Hi Shay,
> > 
> > we can't move nr_frags assignement after build_skb() since this field
> > will be
> > overwritten by that call.
> > 
> > Regards,
> > Lorenzo
> > 
> 
> Sorry, silly mistake of me.
> 
> Guess this assignment can be done anyway since there doesn't seem to be new
> cache misses introduced by it.
> Anyway, nice catch, sorry for misleading you

actually, we probably do take a cache miss in the single-buffer use case
then, since skb_shared_info will not be in the same cache line.

Regards,
Lorenzo

> 
> > > 
> > > >  	skb = build_skb(xdp->data_hard_start, PAGE_SIZE);
> > > >  	if (!skb)
> > > > @@ -2333,6 +2342,9 @@ mvneta_swbm_build_skb(struct > mvneta_port
> > > *pp,
> > > > struct page_pool *pool,
> > > >  	skb_put(skb, xdp->data_end - xdp->data);
> > > >  	skb->ip_summed = mvneta_rx_csum(pp, desc_status);
> > > > +	if (likely(!xdp_buff_is_mb(xdp)))
> > > > +		goto out;
> > > > +
> > > >  	for (i = 0; i < num_frags; i++) {
> > > >  		skb_frag_t *frag = &sinfo->frags[i];
> > > >   @@ -2341,6 +2353,7 @@ mvneta_swbm_build_skb(struct >
> > > mvneta_port *pp,
> > > > struct page_pool *pool,
> > > >  				skb_frag_size(frag), PAGE_SIZE);
> > > >  	}
> > > > +out:
> > > >  	return skb;
> > > >  }
> > > 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-09-18 11:53               ` Toke Høiland-Jørgensen
@ 2021-09-20 18:02                 ` Jakub Kicinski
  2021-09-20 21:01                   ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 58+ messages in thread
From: Jakub Kicinski @ 2021-09-20 18:02 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: John Fastabend, Alexei Starovoitov, Lorenzo Bianconi, bpf,
	Network Development, Lorenzo Bianconi, David S. Miller,
	Alexei Starovoitov, Daniel Borkmann, shayagr, David Ahern,
	Jesper Dangaard Brouer, Eelco Chaudron, Jason Wang,
	Alexander Duyck, Saeed Mahameed, Fijalkowski, Maciej, Karlsson,
	Magnus, tirthendu.sarkar

On Sat, 18 Sep 2021 13:53:35 +0200 Toke Høiland-Jørgensen wrote:
> I'm OK with a bpf_header_pointer()-type helper - I quite like the
> in-kernel version of this for SKBs, so replicating it as a BPF helper
> would be great. But I'm a little worried about taking a performance hit.
> 
> I.e., if you do:
> 
> ptr = bpf_header_pointer(pkt, offset, len, stack_ptr)
> *ptr = xxx;
> 
> then, if the helper ended up copying the data into the stack pointer,
> you didn't actually change anything in the packet, so you need to do a
> writeback.
> 
> Jakub suggested up-thread that this should be done with some kind of
> flush() helper. But you don't know whether the header_pointer()-helper
> copied the data, so you always need to call the flush() helper, which
> will incur overhead. If the verifier can in-line the helpers that will
> lower it, but will it be enough to make it negligible?

Depends on the assumptions the program otherwise makes, right?

For reading I'd expect a *layout-independent* TC program would 
replace approximately:

ptr = <some_ptr>;
if (ptr + CONST >= md->ptr_end) {
	if (bpf_pull_data(md, off + CONST))
		return DROP;
	ptr = <some_ptr>;
	if (ptr + CONST >= md->ptr_end)
		return DROP; /* da hell? */
}

With this (pre-inlining):

ptr = bpf_header_pointer(md, offset, len, stack);
if (!ptr)
	return DROP;

Post-inlining (assuming static validation of args to prevent wraps):

if (md->ptr + args->off + args->len < md->ptr_end)
	ptr = md->ptr + args->off;
else
	ptr = __bpf_header_pointer(md, offset, len, stack);
if (!ptr)
	return DROP;

But that's based on guesswork so perhaps I'm off base.


Regarding the flush() I was expecting that most progs will not modify
the packet (or at least won't modify most headers they load) so no
point paying the price of tracking changes auto-magically.

In fact I don't think there is anything infra can do better for
flushing than the prog itself:

	bool mod = false;

	ptr = bpf_header_pointer(...);
	...
	if (some_cond(...)) {
		change_packet(...);
		mod = true;
	}
	...
	if (mod)
		bpf_header_pointer_flush();


is simple enough.. to me.
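
FWIW the semantics being discussed can be sketched in plain userspace C.
Everything below (the toy_pkt layout, the function name) is illustrative
only, not the proposed kernel API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy packet: a linear "head" area plus one fragment, standing in for
 * a multi-buffer xdp_buff. Purely illustrative. */
struct toy_pkt {
	unsigned char *head;
	size_t head_len;
	unsigned char *frag;
	size_t frag_len;
};

/* If [off, off+len) fits in the head, return a direct pointer into the
 * packet (no copy); otherwise gather the bytes into the caller's buf
 * and return buf. Returns NULL when the request runs past the end of
 * the packet. */
static void *toy_header_pointer(struct toy_pkt *p, size_t off, size_t len,
				void *buf)
{
	if (off + len <= p->head_len)
		return p->head + off;
	if (off + len > p->head_len + p->frag_len)
		return NULL;
	if (off < p->head_len) {
		size_t in_head = p->head_len - off;

		memcpy(buf, p->head + off, in_head);
		memcpy((unsigned char *)buf + in_head, p->frag,
		       len - in_head);
	} else {
		memcpy(buf, p->frag + (off - p->head_len), len);
	}
	return buf;
}
```

The fast path hands out a direct pointer into the head, so the flush
question above only arises when the copy path was taken.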

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-09-20 18:02                 ` Jakub Kicinski
@ 2021-09-20 21:01                   ` Toke Høiland-Jørgensen
  2021-09-20 21:25                     ` Jakub Kicinski
  0 siblings, 1 reply; 58+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-09-20 21:01 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: John Fastabend, Alexei Starovoitov, Lorenzo Bianconi, bpf,
	Network Development, Lorenzo Bianconi, David S. Miller,
	Alexei Starovoitov, Daniel Borkmann, shayagr, David Ahern,
	Jesper Dangaard Brouer, Eelco Chaudron, Jason Wang,
	Alexander Duyck, Saeed Mahameed, Fijalkowski, Maciej, Karlsson,
	Magnus, tirthendu.sarkar

Jakub Kicinski <kuba@kernel.org> writes:

> On Sat, 18 Sep 2021 13:53:35 +0200 Toke Høiland-Jørgensen wrote:
>> I'm OK with a bpf_header_pointer()-type helper - I quite like the
>> in-kernel version of this for SKBs, so replicating it as a BPF helper
>> would be great. But I'm a little worried about taking a performance hit.
>> 
>> I.e., if you do:
>> 
>> ptr = bpf_header_pointer(pkt, offset, len, stack_ptr)
>> *ptr = xxx;
>> 
>> then, if the helper ended up copying the data into the stack pointer,
>> you didn't actually change anything in the packet, so you need to do a
>> writeback.
>> 
>> Jakub suggested up-thread that this should be done with some kind of
>> flush() helper. But you don't know whether the header_pointer()-helper
>> copied the data, so you always need to call the flush() helper, which
>> will incur overhead. If the verifier can in-line the helpers that will
>> lower it, but will it be enough to make it negligible?
>
> Depends on the assumptions the program otherwise makes, right?
>
> For reading I'd expect a *layout-independent* TC program would 
> replace approximately:
>
> ptr = <some_ptr>;
> if (ptr + CONST >= md->ptr_end) {
> 	if (bpf_pull_data(md, off + CONST))
> 		return DROP;
> 	ptr = <some_ptr>;
> 	if (ptr + CONST >= md->ptr_end)
> 		return DROP; /* da hell? */
> }
>
> With this (pre-inlining):
>
> ptr = bpf_header_pointer(md, offset, len, stack);
> if (!ptr)
> 	return DROP;
>
> Post-inlining (assuming static validation of args to prevent wraps):
>
> if (md->ptr + args->off + args->len < md->ptr_end)
> 	ptr = md->ptr + args->off;
> else
> 	ptr = __bpf_header_pointer(md, offset, len, stack);
> if (!ptr)
> 	return DROP;
>
> But that's based on guesswork so perhaps I'm off base.

Yeah, that's more or less what I was thinking...

> Regarding the flush() I was expecting that most progs will not modify
> the packet (or at least won't modify most headers they load) so no
> point paying the price of tracking changes auto-magically.

... but I guess my mental model assumed that packet writes would be just
as frequent as reads, in which case it would be a win if you could amend
your example like:

> In fact I don't think there is anything infra can do better for
> flushing than the prog itself:
>
> 	bool mod = false;
>
> 	ptr = bpf_header_pointer(...);
> 	...
> 	if (some_cond(...)) {
> 		change_packet(...);
> 		mod = true;
> 	}
> 	...
> 	if (mod)

to have an additional check like:

if (mod && ptr == stack)

(or something to that effect). No?

-Toke

> 		bpf_header_pointer_flush();
>
>
> is simple enough.. to me.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-09-20 21:01                   ` Toke Høiland-Jørgensen
@ 2021-09-20 21:25                     ` Jakub Kicinski
  2021-09-20 22:44                       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 58+ messages in thread
From: Jakub Kicinski @ 2021-09-20 21:25 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: John Fastabend, Alexei Starovoitov, Lorenzo Bianconi, bpf,
	Network Development, Lorenzo Bianconi, David S. Miller,
	Alexei Starovoitov, Daniel Borkmann, shayagr, David Ahern,
	Jesper Dangaard Brouer, Eelco Chaudron, Jason Wang,
	Alexander Duyck, Saeed Mahameed, Fijalkowski, Maciej, Karlsson,
	Magnus, tirthendu.sarkar

On Mon, 20 Sep 2021 23:01:48 +0200 Toke Høiland-Jørgensen wrote:
> > In fact I don't think there is anything infra can do better for
> > flushing than the prog itself:
> >
> > 	bool mod = false;
> >
> > 	ptr = bpf_header_pointer(...);
> > 	...
> > 	if (some_cond(...)) {
> > 		change_packet(...);
> > 		mod = true;
> > 	}
> > 	...
> > 	if (mod)  
> 
> to have an additional check like:
> 
> if (mod && ptr == stack)
> 
> (or something to that effect). No?

Good point. Do you think we should have the kernel add/inline this
optimization or have the user do it explicitly.

The draft API was:

void *xdp_mb_pointer_flush(struct xdp_buff *xdp_md, u32 flags, 
                           u32 offset, u32 len, void *stack_buf)

Which does not take the ptr returned by header_pointer(), but that's
easy to add (well, easy other than the fact it'd be the 6th arg).

BTW I drafted the API this way to cater to the case where flush()
is called without a prior call to header_pointer(). For when packet
trailer or header is populated directly from a map value. Dunno if
that's actually useful, either.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-09-20 21:25                     ` Jakub Kicinski
@ 2021-09-20 22:44                       ` Toke Høiland-Jørgensen
  2021-09-21 10:03                         ` Eelco Chaudron
  2021-09-29 10:36                         ` Lorenz Bauer
  0 siblings, 2 replies; 58+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-09-20 22:44 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: John Fastabend, Alexei Starovoitov, Lorenzo Bianconi, bpf,
	Network Development, Lorenzo Bianconi, David S. Miller,
	Alexei Starovoitov, Daniel Borkmann, shayagr, David Ahern,
	Jesper Dangaard Brouer, Eelco Chaudron, Jason Wang,
	Alexander Duyck, Saeed Mahameed, Fijalkowski, Maciej, Karlsson,
	Magnus, tirthendu.sarkar

Jakub Kicinski <kuba@kernel.org> writes:

> On Mon, 20 Sep 2021 23:01:48 +0200 Toke Høiland-Jørgensen wrote:
>> > In fact I don't think there is anything infra can do better for
>> > flushing than the prog itself:
>> >
>> > 	bool mod = false;
>> >
>> > 	ptr = bpf_header_pointer(...);
>> > 	...
>> > 	if (some_cond(...)) {
>> > 		change_packet(...);
>> > 		mod = true;
>> > 	}
>> > 	...
>> > 	if (mod)  
>> 
>> to have an additional check like:
>> 
>> if (mod && ptr == stack)
>> 
>> (or something to that effect). No?
>
> Good point. Do you think we should have the kernel add/inline this
> optimization or have the user do it explicitly.

Hmm, good question. On the one hand it seems like an easy optimisation
to add, but on the other hand maybe the caller has other logic that can
better know how/when to omit the check.

Hmm, but the helper needs to check it anyway, doesn't it? At least it
can't just blindly memcpy() if the source and destination would be the
same...

> The draft API was:
>
> void *xdp_mb_pointer_flush(struct xdp_buff *xdp_md, u32 flags, 
>                            u32 offset, u32 len, void *stack_buf)
>
> Which does not take the ptr returned by header_pointer(), but that's
> easy to add (well, easy other than the fact it'd be the 6th arg).

I guess we could play some trickery with stuffing offset/len/flags into
one or two u64s to save an argument or two?
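
One such packing could look like this (purely illustrative, not an
agreed-on ABI):

```c
#include <stdint.h>

/* Squeeze two u32 helper arguments into a single u64 to save an
 * argument slot: offset in the high half, length in the low half. */
static inline uint64_t pack_off_len(uint32_t off, uint32_t len)
{
	return ((uint64_t)off << 32) | len;
}

static inline uint32_t unpack_off(uint64_t v)
{
	return (uint32_t)(v >> 32);
}

static inline uint32_t unpack_len(uint64_t v)
{
	return (uint32_t)v;
}
```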

> BTW I drafted the API this way to cater to the case where flush()
> is called without a prior call to header_pointer(). For when packet
> trailer or header is populated directly from a map value. Dunno if
> that's actually useful, either.

Ah, didn't think of that; so then it really becomes a generic
xdp_store_bytes()-type helper? Might be useful, I suppose. Adding
headers is certainly a fairly common occurrence, but dunno to what
extent they'd be copied wholesale from a map (hadn't thought about doing
that before either).

-Toke


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-09-20 22:44                       ` Toke Høiland-Jørgensen
@ 2021-09-21 10:03                         ` Eelco Chaudron
  2021-09-28 11:48                           ` Magnus Karlsson
  2021-09-29 10:36                         ` Lorenz Bauer
  1 sibling, 1 reply; 58+ messages in thread
From: Eelco Chaudron @ 2021-09-21 10:03 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Jakub Kicinski, John Fastabend, Alexei Starovoitov,
	Lorenzo Bianconi, bpf, Network Development, Lorenzo Bianconi,
	David S. Miller, Alexei Starovoitov, Daniel Borkmann, shayagr,
	David Ahern, Jesper Dangaard Brouer, Jason Wang, Alexander Duyck,
	Saeed Mahameed, Fijalkowski, Maciej, Karlsson, Magnus,
	tirthendu.sarkar



On 21 Sep 2021, at 0:44, Toke Høiland-Jørgensen wrote:

> Jakub Kicinski <kuba@kernel.org> writes:
>
>> On Mon, 20 Sep 2021 23:01:48 +0200 Toke Høiland-Jørgensen wrote:
>>>> In fact I don't think there is anything infra can do better for
>>>> flushing than the prog itself:
>>>>
>>>> 	bool mod = false;
>>>>
>>>> 	ptr = bpf_header_pointer(...);
>>>> 	...
>>>> 	if (some_cond(...)) {
>>>> 		change_packet(...);
>>>> 		mod = true;
>>>> 	}
>>>> 	...
>>>> 	if (mod)
>>>
>>> to have an additional check like:
>>>
>>> if (mod && ptr == stack)
>>>
>>> (or something to that effect). No?
>>
>> Good point. Do you think we should have the kernel add/inline this
>> optimization or have the user do it explicitly.
>
> Hmm, good question. On the one hand it seems like an easy optimisation
> to add, but on the other hand maybe the caller has other logic that can
> better know how/when to omit the check.
>
> Hmm, but the helper needs to check it anyway, doesn't it? At least it
> can't just blindly memcpy() if the source and destination would be the
> same...
>
>> The draft API was:
>>
>> void *xdp_mb_pointer_flush(struct xdp_buff *xdp_md, u32 flags,
>>                            u32 offset, u32 len, void *stack_buf)
>>
>> Which does not take the ptr returned by header_pointer(), but that's
>> easy to add (well, easy other than the fact it'd be the 6th arg).
>
> I guess we could play some trickery with stuffing offset/len/flags into
> one or two u64s to save an argument or two?
>
>> BTW I drafted the API this way to cater to the case where flush()
>> is called without a prior call to header_pointer(). For when packet
>> trailer or header is populated directly from a map value. Dunno if
>> that's actually useful, either.
>
> Ah, didn't think of that; so then it really becomes a generic
> xdp_store_bytes()-type helper? Might be useful, I suppose. Adding
> headers is certainly a fairly common occurrence, but dunno to what
> extent they'd be copied wholesale from a map (hadn't thought about doing
> that before either).


Sorry for commenting late but I was busy and had to catch up on emails...

I like the idea, as these APIs are exactly what I proposed in April, https://lore.kernel.org/bpf/FD3E6E08-DE78-4FBA-96F6-646C93E88631@redhat.com/

I did not call it flush, as it can be used as a general function to copy data to a specific location.


//Eelco


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-09-21 10:03                         ` Eelco Chaudron
@ 2021-09-28 11:48                           ` Magnus Karlsson
  0 siblings, 0 replies; 58+ messages in thread
From: Magnus Karlsson @ 2021-09-28 11:48 UTC (permalink / raw)
  To: Eelco Chaudron
  Cc: Toke Høiland-Jørgensen, Jakub Kicinski, John Fastabend,
	Alexei Starovoitov, Lorenzo Bianconi, bpf, Network Development,
	Lorenzo Bianconi, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, shayagr, David Ahern, Jesper Dangaard Brouer,
	Jason Wang, Alexander Duyck, Saeed Mahameed, Fijalkowski, Maciej,
	Karlsson, Magnus, Tirthendu

On Tue, Sep 21, 2021 at 12:04 PM Eelco Chaudron <echaudro@redhat.com> wrote:
>
>
>
> On 21 Sep 2021, at 0:44, Toke Høiland-Jørgensen wrote:
>
> > Jakub Kicinski <kuba@kernel.org> writes:
> >
> >> On Mon, 20 Sep 2021 23:01:48 +0200 Toke Høiland-Jørgensen wrote:
> >>>> In fact I don't think there is anything infra can do better for
> >>>> flushing than the prog itself:
> >>>>
> >>>>    bool mod = false;
> >>>>
> >>>>    ptr = bpf_header_pointer(...);
> >>>>    ...
> >>>>    if (some_cond(...)) {
> >>>>            change_packet(...);
> >>>>            mod = true;
> >>>>    }
> >>>>    ...
> >>>>    if (mod)
> >>>
> >>> to have an additional check like:
> >>>
> >>> if (mod && ptr == stack)
> >>>
> >>> (or something to that effect). No?
> >>
> >> Good point. Do you think we should have the kernel add/inline this
> >> optimization or have the user do it explicitly.
> >
> > Hmm, good question. On the one hand it seems like an easy optimisation
> > to add, but on the other hand maybe the caller has other logic that can
> > better know how/when to omit the check.
> >
> > Hmm, but the helper needs to check it anyway, doesn't it? At least it
> > can't just blindly memcpy() if the source and destination would be the
> > same...
> >
> >> The draft API was:
> >>
> >> void *xdp_mb_pointer_flush(struct xdp_buff *xdp_md, u32 flags,
> >>                            u32 offset, u32 len, void *stack_buf)
> >>
> >> Which does not take the ptr returned by header_pointer(), but that's
> >> easy to add (well, easy other than the fact it'd be the 6th arg).
> >
> > I guess we could play some trickery with stuffing offset/len/flags into
> > one or two u64s to save an argument or two?
> >
> >> BTW I drafted the API this way to cater to the case where flush()
> >> is called without a prior call to header_pointer(). For when packet
> >> trailer or header is populated directly from a map value. Dunno if
> >> that's actually useful, either.
> >
> > Ah, didn't think of that; so then it really becomes a generic
> > xdp_store_bytes()-type helper? Might be useful, I suppose. Adding
> > headers is certainly a fairly common occurrence, but dunno to what
> > extent they'd be copied wholesale from a map (hadn't thought about doing
> > that before either).
>
>
> Sorry for commenting late but I was busy and had to catch up on emails...
>
> I like the idea, as these APIs are exactly what I proposed in April, https://lore.kernel.org/bpf/FD3E6E08-DE78-4FBA-96F6-646C93E88631@redhat.com/
>
> I did not call it flush, as it can be used as a general function to copy data to a specific location.

Here is some performance data (throughput) for this patch set on i40e
(40 Gbit/s NIC). All using the xdp_rxq_info sample and NO multi-buffer
packets.

With v14 only:

XDP_DROP: +4%
XDP_TX: +1%
XDP_PASS: -1%

With v14 plus multi-buffer support implemented in i40e courtesy of Tirtha:

XDP_DROP: +3%
XDP_TX: -1%
XDP_PASS: -2%

/Magnus

>
> //Eelco
>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-09-20 22:44                       ` Toke Høiland-Jørgensen
  2021-09-21 10:03                         ` Eelco Chaudron
@ 2021-09-29 10:36                         ` Lorenz Bauer
  2021-09-29 12:25                           ` Toke Høiland-Jørgensen
  2021-09-29 17:46                           ` Jakub Kicinski
  1 sibling, 2 replies; 58+ messages in thread
From: Lorenz Bauer @ 2021-09-29 10:36 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Jakub Kicinski, John Fastabend, Alexei Starovoitov,
	Lorenzo Bianconi, bpf, Network Development, Lorenzo Bianconi,
	David S. Miller, Alexei Starovoitov, Daniel Borkmann, shayagr,
	David Ahern, Jesper Dangaard Brouer, Eelco Chaudron, Jason Wang,
	Alexander Duyck, Saeed Mahameed, Fijalkowski, Maciej, Karlsson,
	Magnus, tirthendu.sarkar

On Mon, 20 Sept 2021 at 23:46, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > The draft API was:
> >
> > void *xdp_mb_pointer_flush(struct xdp_buff *xdp_md, u32 flags,
> >                            u32 offset, u32 len, void *stack_buf)
> >
> > Which does not take the ptr returned by header_pointer(), but that's
> > easy to add (well, easy other than the fact it'd be the 6th arg).
>
> I guess we could play some trickery with stuffing offset/len/flags into
> one or two u64s to save an argument or two?

Adding another pointer arg seems really hard to explain as an API.
What happens if I pass the "wrong" ptr? What happens if I pass NULL?

How about this: instead of taking stack_ptr, take the return value
from header_pointer as an argument. Then use the already existing
(right ;P) inlining to do the following:

   if (md->ptr + args->off != ret_ptr)
     __pointer_flush(...)

This means that __pointer_flush has to deal with aliased memory
though, so it would always have to memmove. Probably OK for the "slow"
path?
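
A userspace sketch of such a flush, keyed off the pointer returned by
header_pointer() and using memmove() so aliased source/destination
stays safe (all names and the toy layout are hypothetical):

```c
#include <stddef.h>
#include <string.h>

/* Same toy stand-in for a multi-buffer xdp_buff as earlier sketches:
 * a linear head plus one fragment. Illustrative only. */
struct toy_pkt {
	unsigned char *head;
	size_t head_len;
	unsigned char *frag;
	size_t frag_len;
};

/* No-op when ret_ptr already points into the packet (the direct-pointer
 * fast path); otherwise scatter the bytes back. Assumes off+len was
 * already validated by the earlier header_pointer() call. */
static void toy_pointer_flush(struct toy_pkt *p, size_t off, size_t len,
			      const void *ret_ptr)
{
	const unsigned char *src = ret_ptr;

	if (off + len <= p->head_len && src == p->head + off)
		return; /* wrote directly into the packet */

	if (off < p->head_len) {
		size_t in_head = p->head_len - off;

		memmove(p->head + off, src, in_head);
		memmove(p->frag, src + in_head, len - in_head);
	} else {
		memmove(p->frag + (off - p->head_len), src, len);
	}
}
```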

Lorenz

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-09-16 16:55 ` [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support Jakub Kicinski
  2021-09-17 14:51   ` Lorenzo Bianconi
@ 2021-09-29 10:41   ` Lorenz Bauer
  2021-09-29 12:10     ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 58+ messages in thread
From: Lorenz Bauer @ 2021-09-29 10:41 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Lorenzo Bianconi, bpf, Networking, Lorenzo Bianconi,
	David S . Miller, Alexei Starovoitov, Daniel Borkmann, shayagr,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Alexander Duyck, Saeed Mahameed,
	Maciej Fijalkowski, Karlsson, Magnus, tirthendu.sarkar,
	Toke Høiland-Jørgensen

On Thu, 16 Sept 2021 at 18:47, Jakub Kicinski <kuba@kernel.org> wrote:
>
> Won't applications end up building something like skb_header_pointer()
> based on bpf_xdp_adjust_data(), anyway? In which case why don't we
> provide them what they need?
>
> say:
>
> void *xdp_mb_pointer(struct xdp_buff *xdp_md, u32 flags,
>                      u32 offset, u32 len, void *stack_buf)
>
> flags and offset can be squashed into one u64 as needed. Helper returns
> pointer to packet data, either real one or stack_buf. Verifier has to
> be taught that the return value is NULL or a pointer which is safe with
> offsets up to @len.
>
> If the reason for access is write we'd also need:
>
> void *xdp_mb_pointer_flush(struct xdp_buff *xdp_md, u32 flags,
>                            u32 offset, u32 len, void *stack_buf)

Yes! This would be so much better than bpf_skb_load/store_bytes(),
especially if we can use it for both XDP and skb contexts as stated
elsewhere in this thread.

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-09-29 10:41   ` Lorenz Bauer
@ 2021-09-29 12:10     ` Toke Høiland-Jørgensen
  2021-09-29 12:38       ` Lorenz Bauer
  0 siblings, 1 reply; 58+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-09-29 12:10 UTC (permalink / raw)
  To: Lorenz Bauer, Jakub Kicinski
  Cc: Lorenzo Bianconi, bpf, Networking, Lorenzo Bianconi,
	David S . Miller, Alexei Starovoitov, Daniel Borkmann, shayagr,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Alexander Duyck, Saeed Mahameed,
	Maciej Fijalkowski, Karlsson, Magnus, tirthendu.sarkar

Lorenz Bauer <lmb@cloudflare.com> writes:

> On Thu, 16 Sept 2021 at 18:47, Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> Won't applications end up building something like skb_header_pointer()
>> based on bpf_xdp_adjust_data(), anyway? In which case why don't we
>> provide them what they need?
>>
>> say:
>>
>> void *xdp_mb_pointer(struct xdp_buff *xdp_md, u32 flags,
>>                      u32 offset, u32 len, void *stack_buf)
>>
>> flags and offset can be squashed into one u64 as needed. Helper returns
>> pointer to packet data, either real one or stack_buf. Verifier has to
>> be taught that the return value is NULL or a pointer which is safe with
>> offsets up to @len.
>>
>> If the reason for access is write we'd also need:
>>
>> void *xdp_mb_pointer_flush(struct xdp_buff *xdp_md, u32 flags,
>>                            u32 offset, u32 len, void *stack_buf)
>
> Yes! This would be so much better than bpf_skb_load/store_bytes(),
> especially if we can use it for both XDP and skb contexts as stated
> elsewhere in this thread.

Alright. Let's see if we can go this route, then :)

-Toke


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-09-29 10:36                         ` Lorenz Bauer
@ 2021-09-29 12:25                           ` Toke Høiland-Jørgensen
  2021-09-29 12:32                             ` Lorenz Bauer
  2021-09-29 17:48                             ` Jakub Kicinski
  2021-09-29 17:46                           ` Jakub Kicinski
  1 sibling, 2 replies; 58+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-09-29 12:25 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Jakub Kicinski, John Fastabend, Alexei Starovoitov,
	Lorenzo Bianconi, bpf, Network Development, Lorenzo Bianconi,
	David S. Miller, Alexei Starovoitov, Daniel Borkmann, shayagr,
	David Ahern, Jesper Dangaard Brouer, Eelco Chaudron, Jason Wang,
	Alexander Duyck, Saeed Mahameed, Fijalkowski, Maciej, Karlsson,
	Magnus, tirthendu.sarkar

Lorenz Bauer <lmb@cloudflare.com> writes:

> On Mon, 20 Sept 2021 at 23:46, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> > The draft API was:
>> >
>> > void *xdp_mb_pointer_flush(struct xdp_buff *xdp_md, u32 flags,
>> >                            u32 offset, u32 len, void *stack_buf)
>> >
>> > Which does not take the ptr returned by header_pointer(), but that's
>> > easy to add (well, easy other than the fact it'd be the 6th arg).
>>
>> I guess we could play some trickery with stuffing offset/len/flags into
>> one or two u64s to save an argument or two?
>
> Adding another pointer arg seems really hard to explain as an API.
> What happens if I pass the "wrong" ptr? What happens if I pass NULL?
>
> How about this: instead of taking stack_ptr, take the return value
> from header_pointer as an argument.

Hmm, that's a good point; I do think that passing the return value from
header pointer is more natural as well (you're flushing pointer you just
wrote to, after all).

> Then use the already existing (right ;P) inlining to do the following:
>
>    if (md->ptr + args->off != ret_ptr)
>      __pointer_flush(...)

The inlining is orthogonal, though, right? The helper can do this check
whether or not it's a proper CALL or not (although obviously for
performance reasons we do want it to inline, at least eventually). In
particular, I believe we can make progress on this patch series without
working out the inlining :)

> This means that __pointer_flush has to deal with aliased memory
> though, so it would always have to memmove. Probably OK for the "slow"
> path?

Erm, not sure what you mean here? Yeah, flushing is going to take longer
if you ended up using the stack pointer instead of writing directly to
the packet. That's kinda intrinsic? Or am I misunderstanding you?

-Toke


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-09-29 12:25                           ` Toke Høiland-Jørgensen
@ 2021-09-29 12:32                             ` Lorenz Bauer
  2021-09-29 17:48                             ` Jakub Kicinski
  1 sibling, 0 replies; 58+ messages in thread
From: Lorenz Bauer @ 2021-09-29 12:32 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Jakub Kicinski, John Fastabend, Alexei Starovoitov,
	Lorenzo Bianconi, bpf, Network Development, Lorenzo Bianconi,
	David S. Miller, Alexei Starovoitov, Daniel Borkmann, shayagr,
	David Ahern, Jesper Dangaard Brouer, Eelco Chaudron, Jason Wang,
	Alexander Duyck, Saeed Mahameed, Fijalkowski, Maciej, Karlsson,
	Magnus, tirthendu.sarkar

On Wed, 29 Sept 2021 at 13:25, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> > Then use the already existing (right ;P) inlining to do the following:
> >
> >    if (md->ptr + args->off != ret_ptr)
> >      __pointer_flush(...)
>
> The inlining is orthogonal, though, right? The helper can do this check
> whether or not it's a proper CALL or not (although obviously for
> performance reasons we do want it to inline, at least eventually). In
> particular, I believe we can make progress on this patch series without
> working out the inlining :)

Yes, I was just worried that your answer would be "it's too expensive" ;)

> > This means that __pointer_flush has to deal with aliased memory
> > though, so it would always have to memmove. Probably OK for the "slow"
> > path?
>
> Erm, not sure what you mean here? Yeah, flushing is going to take longer
> if you ended up using the stack pointer instead of writing directly to
> the packet. That's kinda intrinsic? Or am I misunderstanding you?

I think I misunderstood your comment about memcpy to mean "want to
avoid aliased memory for perf reasons". Never mind!

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-09-29 12:10     ` Toke Høiland-Jørgensen
@ 2021-09-29 12:38       ` Lorenz Bauer
  2021-09-29 18:54         ` Alexei Starovoitov
  0 siblings, 1 reply; 58+ messages in thread
From: Lorenz Bauer @ 2021-09-29 12:38 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Jakub Kicinski, Lorenzo Bianconi, bpf, Networking,
	Lorenzo Bianconi, David S . Miller, Alexei Starovoitov,
	Daniel Borkmann, shayagr, John Fastabend, David Ahern,
	Jesper Dangaard Brouer, Eelco Chaudron, Jason Wang,
	Alexander Duyck, Saeed Mahameed, Maciej Fijalkowski, Karlsson,
	Magnus, tirthendu.sarkar

On Wed, 29 Sept 2021 at 13:10, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Lorenz Bauer <lmb@cloudflare.com> writes:
>
> > On Thu, 16 Sept 2021 at 18:47, Jakub Kicinski <kuba@kernel.org> wrote:
> >>
> >> Won't applications end up building something like skb_header_pointer()
> >> based on bpf_xdp_adjust_data(), anyway? In which case why don't we
> >> provide them what they need?
> >>
> >> say:
> >>
> >> void *xdp_mb_pointer(struct xdp_buff *xdp_md, u32 flags,
> >>                      u32 offset, u32 len, void *stack_buf)
> >>
> >> flags and offset can be squashed into one u64 as needed. Helper returns
> >> pointer to packet data, either real one or stack_buf. Verifier has to
> >> be taught that the return value is NULL or a pointer which is safe with
> >> offsets up to @len.
> >>
> >> If the reason for access is write we'd also need:
> >>
> >> void *xdp_mb_pointer_flush(struct xdp_buff *xdp_md, u32 flags,
> >>                            u32 offset, u32 len, void *stack_buf)
> >
> > Yes! This would be so much better than bpf_skb_load/store_bytes(),
> > especially if we can use it for both XDP and skb contexts as stated
> > elsewhere in this thread.
>
> Alright. Let's see if we can go this route, then :)

Something I forgot to mention: you could infer that an XDP program is
mb-aware if it only does packet access via the helpers. Put another
way, it might be nice if ctx->data wasn't accessible in mb XDP. That
way I know that all packet access has to handle mb-aware ctx (think
pulling in functions via headers or even pre-compiled bpf libraries).

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-09-29 10:36                         ` Lorenz Bauer
  2021-09-29 12:25                           ` Toke Høiland-Jørgensen
@ 2021-09-29 17:46                           ` Jakub Kicinski
  1 sibling, 0 replies; 58+ messages in thread
From: Jakub Kicinski @ 2021-09-29 17:46 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Toke Høiland-Jørgensen, John Fastabend,
	Alexei Starovoitov, Lorenzo Bianconi, bpf, Network Development,
	Lorenzo Bianconi, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, shayagr, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Alexander Duyck, Saeed Mahameed,
	Fijalkowski, Maciej, Karlsson, Magnus, tirthendu.sarkar

On Wed, 29 Sep 2021 11:36:33 +0100 Lorenz Bauer wrote:
> On Mon, 20 Sept 2021 at 23:46, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > > The draft API was:
> > >
> > > void *xdp_mb_pointer_flush(struct xdp_buff *xdp_md, u32 flags,
> > >                            u32 offset, u32 len, void *stack_buf)
> > >
> > > Which does not take the ptr returned by header_pointer(), but that's
> > > easy to add (well, easy other than the fact it'd be the 6th arg).  
> >
> > I guess we could play some trickery with stuffing offset/len/flags into
> > one or two u64s to save an argument or two?  
> 
> Adding another pointer arg seems really hard to explain as an API.
> What happens if I pass the "wrong" ptr? What happens if I pass NULL?

Sure. We can leave the checking to the program then, but that ties
our hands for the implementation changes later on.

Not sure which pointer type will be chosen for the ret value but it 
may allow error checking at verification.

> How about this: instead of taking stack_ptr, take the return value
> from header_pointer as an argument. Then use the already existing
> (right ;P) inlining to do the following:
> 
>    if (md->ptr + args->off != ret_ptr)
>      __pointer_flush(...)

That only checks for the case where pointer is in the "head" frag,
and is not generally correct. You need to check the length of the 
first frag is smaller than off. Otherwise BPF stack may "happen"
to follow the head page and math will work out.

It would also be slower than Lorenzo's current code, which allows
access to tail pages without copying.
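
That stricter fast-path test might be sketched as (illustrative only,
written to also avoid unsigned wraparound):

```c
#include <stdbool.h>
#include <stddef.h>

/* Pointer equality against head + off alone is not enough: if the BPF
 * stack happens to sit right after the head page the arithmetic can
 * still "work out". The request must also be bounded by the head
 * fragment's length before taking the direct-pointer path. */
static bool in_head_frag(size_t head_len, size_t off, size_t len)
{
	return off <= head_len && len <= head_len - off;
}
```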

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-09-29 12:25                           ` Toke Høiland-Jørgensen
  2021-09-29 12:32                             ` Lorenz Bauer
@ 2021-09-29 17:48                             ` Jakub Kicinski
  1 sibling, 0 replies; 58+ messages in thread
From: Jakub Kicinski @ 2021-09-29 17:48 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Lorenz Bauer, John Fastabend, Alexei Starovoitov,
	Lorenzo Bianconi, bpf, Network Development, Lorenzo Bianconi,
	David S. Miller, Alexei Starovoitov, Daniel Borkmann, shayagr,
	David Ahern, Jesper Dangaard Brouer, Eelco Chaudron, Jason Wang,
	Alexander Duyck, Saeed Mahameed, Fijalkowski, Maciej, Karlsson,
	Magnus, tirthendu.sarkar

On Wed, 29 Sep 2021 14:25:09 +0200 Toke Høiland-Jørgensen wrote:
> >> I guess we could play some trickery with stuffing offset/len/flags into
> >> one or two u64s to save an argument or two?  
> >
> > Adding another pointer arg seems really hard to explain as an API.
> > What happens if I pass the "wrong" ptr? What happens if I pass NULL?
> >
> > How about this: instead of taking stack_ptr, take the return value
> > from header_pointer as an argument.  
> 
> Hmm, that's a good point; I do think that passing the return value from
> header pointer is more natural as well (you're flushing pointer you just
> wrote to, after all).

It is more natural, but it doesn't allow for the same level of
optimization. Wouldn't consistent naming make the semantics obvious?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-09-29 12:38       ` Lorenz Bauer
@ 2021-09-29 18:54         ` Alexei Starovoitov
  2021-09-29 19:22           ` Jakub Kicinski
  0 siblings, 1 reply; 58+ messages in thread
From: Alexei Starovoitov @ 2021-09-29 18:54 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Toke Høiland-Jørgensen, Jakub Kicinski,
	Lorenzo Bianconi, bpf, Networking, Lorenzo Bianconi,
	David S . Miller, Alexei Starovoitov, Daniel Borkmann, shayagr,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Alexander Duyck, Saeed Mahameed,
	Maciej Fijalkowski, Karlsson, Magnus, tirthendu.sarkar

On Wed, Sep 29, 2021 at 5:38 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
>
> On Wed, 29 Sept 2021 at 13:10, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> > Lorenz Bauer <lmb@cloudflare.com> writes:
> >
> > > On Thu, 16 Sept 2021 at 18:47, Jakub Kicinski <kuba@kernel.org> wrote:
> > >>
> > >> Won't applications end up building something like skb_header_pointer()
> > >> based on bpf_xdp_adjust_data(), anyway? In which case why don't we
> > >> provide them what they need?
> > >>
> > >> say:
> > >>
> > >> void *xdp_mb_pointer(struct xdp_buff *xdp_md, u32 flags,
> > >>                      u32 offset, u32 len, void *stack_buf)
> > >>
> > >> flags and offset can be squashed into one u64 as needed. Helper returns
> > >> pointer to packet data, either real one or stack_buf. Verifier has to
> > >> be taught that the return value is NULL or a pointer which is safe with
> > >> offsets up to @len.
> > >>
> > >> If the reason for access is write we'd also need:
> > >>
> > >> void *xdp_mb_pointer_flush(struct xdp_buff *xdp_md, u32 flags,
> > >>                            u32 offset, u32 len, void *stack_buf)

I'm missing something. Why do we need a separate flush() helper?
Can't we do:
char buf[64], *p;
p = xdp_mb_pointer(ctx, flags, off, len, buf);
read/write p[]
if (p == buf)
    xdp_store_bytes(ctx, off, buf, len, flags);

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-09-29 18:54         ` Alexei Starovoitov
@ 2021-09-29 19:22           ` Jakub Kicinski
  2021-09-29 20:39             ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 58+ messages in thread
From: Jakub Kicinski @ 2021-09-29 19:22 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Lorenz Bauer, Toke Høiland-Jørgensen, Lorenzo Bianconi,
	bpf, Networking, Lorenzo Bianconi, David S . Miller,
	Alexei Starovoitov, Daniel Borkmann, shayagr, John Fastabend,
	David Ahern, Jesper Dangaard Brouer, Eelco Chaudron, Jason Wang,
	Alexander Duyck, Saeed Mahameed, Maciej Fijalkowski, Karlsson,
	Magnus, tirthendu.sarkar

On Wed, 29 Sep 2021 11:54:46 -0700 Alexei Starovoitov wrote:
> I'm missing something. Why do we need a separate flush() helper?
> Can't we do:
> char buf[64], *p;
> p = xdp_mb_pointer(ctx, flags, off, len, buf);
> read/write p[]
> if (p == buf)
>     xdp_store_bytes(ctx, off, buf, len, flags);

Sure we can. That's what I meant by "leave the checking to the program".
It's bike shedding at this point.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-09-29 19:22           ` Jakub Kicinski
@ 2021-09-29 20:39             ` Toke Høiland-Jørgensen
  2021-10-01  9:03               ` Lorenzo Bianconi
  0 siblings, 1 reply; 58+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-09-29 20:39 UTC (permalink / raw)
  To: Jakub Kicinski, Alexei Starovoitov
  Cc: Lorenz Bauer, Lorenzo Bianconi, bpf, Networking,
	Lorenzo Bianconi, David S . Miller, Alexei Starovoitov,
	Daniel Borkmann, shayagr, John Fastabend, David Ahern,
	Jesper Dangaard Brouer, Eelco Chaudron, Jason Wang,
	Alexander Duyck, Saeed Mahameed, Maciej Fijalkowski, Karlsson,
	Magnus, tirthendu.sarkar

Jakub Kicinski <kuba@kernel.org> writes:

> On Wed, 29 Sep 2021 11:54:46 -0700 Alexei Starovoitov wrote:
>> I'm missing something. Why do we need a separate flush() helper?
>> Can't we do:
>> char buf[64], *p;
>> p = xdp_mb_pointer(ctx, flags, off, len, buf);
>> read/write p[]
>> if (p == buf)
>>     xdp_store_bytes(ctx, off, buf, len, flags);
>
> Sure we can. That's what I meant by "leave the checking to the program".
> It's bike shedding at this point.

Yeah, let's discuss the details once we have a patch :)

-Toke


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-09-29 20:39             ` Toke Høiland-Jørgensen
@ 2021-10-01  9:03               ` Lorenzo Bianconi
  2021-10-01 18:35                 ` Jakub Kicinski
  0 siblings, 1 reply; 58+ messages in thread
From: Lorenzo Bianconi @ 2021-10-01  9:03 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Jakub Kicinski, Alexei Starovoitov, Lorenz Bauer, bpf,
	Networking, Lorenzo Bianconi, David S . Miller,
	Alexei Starovoitov, Daniel Borkmann, shayagr, John Fastabend,
	David Ahern, Jesper Dangaard Brouer, Eelco Chaudron, Jason Wang,
	Alexander Duyck, Saeed Mahameed, Maciej Fijalkowski, Karlsson,
	Magnus, tirthendu.sarkar

[-- Attachment #1: Type: text/plain, Size: 2186 bytes --]

> Jakub Kicinski <kuba@kernel.org> writes:
> 
> > On Wed, 29 Sep 2021 11:54:46 -0700 Alexei Starovoitov wrote:
> >> I'm missing something. Why do we need a separate flush() helper?
> >> Can't we do:
> >> char buf[64], *p;
> >> p = xdp_mb_pointer(ctx, flags, off, len, buf);
> >> read/write p[]
> >> if (p == buf)
> >>     xdp_store_bytes(ctx, off, buf, len, flags);
> >
> > Sure we can. That's what I meant by "leave the checking to the program".
> > It's bike shedding at this point.
> 
> Yeah, let's discuss the details once we have a patch :)
> 
> -Toke
> 

Hi,

I implemented the xdp_mb_pointer/xdp_mb_pointer_flush logic here (according to
current discussion):
https://github.com/LorenzoBianconi/bpf-next/commit/a5c61c0fa6cb05bab8caebd96aca5fbbd9510867

For the moment I have only defined two utility routines, and I have not
exported them as eBPF helpers since I still need to check what the missing
bits are in the verifier code (but AFAIK this is orthogonal to the "helper code"):
- bpf_xdp_pointer --> xdp_mb_pointer
- bpf_xdp_copy_buf --> xdp_mb_pointer_flush

In order to test them I have defined two new eBPF helpers (they use
bpf_xdp_pointer/bpf_xdp_copy_buf internally):
- bpf_xdp_load_bytes
- bpf_xdp_store_bytes

In order to test bpf_xdp_load_bytes/bpf_xdp_store_bytes and
bpf_xdp_pointer/bpf_xdp_copy_buf I added some selftests here:
https://github.com/LorenzoBianconi/bpf-next/commit/5661a491a890c00db744f2884b7ee3a6d0319384

Can you please check whether the code above is aligned with the current
requirements or whether it is missing something?
If this code is fine, I guess we have two options here:
- integrate the commits above in the xdp multi-buff series (posting v15) and work on
  the verifier code in parallel (if the xdp_mb_pointer helper is not required from day0)
- integrate the verifier changes in the xdp multi-buff series, drop the bpf_xdp_load_bytes
  helper (probably we will still need bpf_xdp_store_bytes) and introduce
  bpf_xdp_pointer as a new eBPF helper.

I am fine both ways. If we go with the second option I would need some
guidance on the verifier changes, since I am not so familiar with that code.

Regards,
Lorenzo

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-10-01  9:03               ` Lorenzo Bianconi
@ 2021-10-01 18:35                 ` Jakub Kicinski
  2021-10-06  9:32                   ` Lorenzo Bianconi
  0 siblings, 1 reply; 58+ messages in thread
From: Jakub Kicinski @ 2021-10-01 18:35 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: Toke Høiland-Jørgensen, Alexei Starovoitov,
	Lorenz Bauer, bpf, Networking, Lorenzo Bianconi,
	David S . Miller, Alexei Starovoitov, Daniel Borkmann, shayagr,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Alexander Duyck, Saeed Mahameed,
	Maciej Fijalkowski, Karlsson, Magnus, tirthendu.sarkar

On Fri, 1 Oct 2021 11:03:58 +0200 Lorenzo Bianconi wrote:
> Can you please check whether the code above is aligned with the current
> requirements or whether it is missing something?
> If this code is fine, I guess we have two options here:
> - integrate the commits above in the xdp multi-buff series (posting v15) and work on
>   the verifier code in parallel (if the xdp_mb_pointer helper is not required from day0)
> - integrate the verifier changes in the xdp multi-buff series, drop the bpf_xdp_load_bytes
>   helper (probably we will still need bpf_xdp_store_bytes) and introduce
>   bpf_xdp_pointer as a new eBPF helper.

It wasn't clear to me that we wanted bpf_xdp_load_bytes() to exist.
But FWIW no preference here.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-10-01 18:35                 ` Jakub Kicinski
@ 2021-10-06  9:32                   ` Lorenzo Bianconi
  2021-10-06 10:08                     ` Eelco Chaudron
  0 siblings, 1 reply; 58+ messages in thread
From: Lorenzo Bianconi @ 2021-10-06  9:32 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Lorenzo Bianconi, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Lorenz Bauer, bpf, Networking,
	David S . Miller, Alexei Starovoitov, Daniel Borkmann, shayagr,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Alexander Duyck, Saeed Mahameed,
	Maciej Fijalkowski, Karlsson, Magnus, tirthendu.sarkar

[-- Attachment #1: Type: text/plain, Size: 797 bytes --]

> On Fri, 1 Oct 2021 11:03:58 +0200 Lorenzo Bianconi wrote:
> > Can you please check whether the code above is aligned with the current
> > requirements or whether it is missing something?
> > If this code is fine, I guess we have two options here:
> > - integrate the commits above in the xdp multi-buff series (posting v15) and work on
> >   the verifier code in parallel (if the xdp_mb_pointer helper is not required from day0)
> > - integrate the verifier changes in the xdp multi-buff series, drop the bpf_xdp_load_bytes
> >   helper (probably we will still need bpf_xdp_store_bytes) and introduce
> >   bpf_xdp_pointer as a new eBPF helper.
> 
> It wasn't clear to me that we wanted bpf_xdp_load_bytes() to exist.
> But FWIW no preference here.
> 

ack, same here. Any other opinion about it?

Regards,
Lorenzo

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-10-06  9:32                   ` Lorenzo Bianconi
@ 2021-10-06 10:08                     ` Eelco Chaudron
  2021-10-06 12:15                       ` Lorenzo Bianconi
  0 siblings, 1 reply; 58+ messages in thread
From: Eelco Chaudron @ 2021-10-06 10:08 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: Jakub Kicinski, Lorenzo Bianconi,
	Toke Høiland-Jørgensen, Alexei Starovoitov,
	Lorenz Bauer, bpf, Networking, David S . Miller,
	Alexei Starovoitov, Daniel Borkmann, shayagr, John Fastabend,
	David Ahern, Jesper Dangaard Brouer, Jason Wang, Alexander Duyck,
	Saeed Mahameed, Maciej Fijalkowski, Karlsson, Magnus,
	tirthendu.sarkar



On 6 Oct 2021, at 11:32, Lorenzo Bianconi wrote:

>> On Fri, 1 Oct 2021 11:03:58 +0200 Lorenzo Bianconi wrote:
>>> Can you please check whether the code above is aligned with the current
>>> requirements or whether it is missing something?
>>> If this code is fine, I guess we have two options here:
>>> - integrate the commits above in the xdp multi-buff series (posting v15) and work on
>>>   the verifier code in parallel (if the xdp_mb_pointer helper is not required from day0)
>>> - integrate the verifier changes in the xdp multi-buff series, drop the bpf_xdp_load_bytes
>>>   helper (probably we will still need bpf_xdp_store_bytes) and introduce
>>>   bpf_xdp_pointer as a new eBPF helper.
>>
>> It wasn't clear to me that we wanted bpf_xdp_load_bytes() to exist.
>> But FWIW no preference here.
>>
>
> ack, same here. Any other opinion about it?

I was under the impression getting a pointer might be enough. But playing with the bpf ring buffers for a bit, it still might be handy to extract some data to be sent to userspace. So I would not mind keeping it.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support
  2021-10-06 10:08                     ` Eelco Chaudron
@ 2021-10-06 12:15                       ` Lorenzo Bianconi
  0 siblings, 0 replies; 58+ messages in thread
From: Lorenzo Bianconi @ 2021-10-06 12:15 UTC (permalink / raw)
  To: Eelco Chaudron
  Cc: Jakub Kicinski, Lorenzo Bianconi,
	Toke Høiland-Jørgensen, Alexei Starovoitov,
	Lorenz Bauer, bpf, Networking, David S . Miller,
	Alexei Starovoitov, Daniel Borkmann, shayagr, John Fastabend,
	David Ahern, Jesper Dangaard Brouer, Jason Wang, Alexander Duyck,
	Saeed Mahameed, Maciej Fijalkowski, Karlsson, Magnus,
	tirthendu.sarkar

[-- Attachment #1: Type: text/plain, Size: 1315 bytes --]

> 
> 
> On 6 Oct 2021, at 11:32, Lorenzo Bianconi wrote:
> 
> >> On Fri, 1 Oct 2021 11:03:58 +0200 Lorenzo Bianconi wrote:
> >>> Can you please check whether the code above is aligned with the current
> >>> requirements or whether it is missing something?
> >>> If this code is fine, I guess we have two options here:
> >>> - integrate the commits above in the xdp multi-buff series (posting v15) and work on
> >>>   the verifier code in parallel (if the xdp_mb_pointer helper is not required from day0)
> >>> - integrate the verifier changes in the xdp multi-buff series, drop the bpf_xdp_load_bytes
> >>>   helper (probably we will still need bpf_xdp_store_bytes) and introduce
> >>>   bpf_xdp_pointer as a new eBPF helper.
> >>
> >> It wasn't clear to me that we wanted bpf_xdp_load_bytes() to exist.
> >> But FWIW no preference here.
> >>
> >
> > ack, same here. Any other opinion about it?
> 
> I was under the impression getting a pointer might be enough. But playing with the bpf ring buffers for a bit, it still might be handy to extract some data to be sent to userspace. So I would not mind keeping it.
> 

ok, so it seems we have a use-case for bpf_xdp_load_bytes(). If everybody
agrees, I will post v15 with the helpers included, and then we can work in
parallel on a bpf_xdp_pointer eBPF helper.

Regards,
Lorenzo


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2021-10-06 12:15 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-10 16:14 [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
2021-09-10 16:14 ` [PATCH v14 bpf-next 01/18] net: skbuff: add size metadata to skb_shared_info for xdp Lorenzo Bianconi
2021-09-10 16:18   ` Jesper Dangaard Brouer
2021-09-10 16:14 ` [PATCH v14 bpf-next 02/18] xdp: introduce flags field in xdp_buff/xdp_frame Lorenzo Bianconi
2021-09-10 16:19   ` Jesper Dangaard Brouer
2021-09-10 16:14 ` [PATCH v14 bpf-next 03/18] net: mvneta: update mb bit before passing the xdp buffer to eBPF layer Lorenzo Bianconi
2021-09-20  8:25   ` Shay Agroskin
2021-09-20  8:37     ` Lorenzo Bianconi
2021-09-20  8:45       ` Shay Agroskin
2021-09-20  9:00         ` Lorenzo Bianconi
2021-09-10 16:14 ` [PATCH v14 bpf-next 04/18] net: mvneta: simplify mvneta_swbm_add_rx_fragment management Lorenzo Bianconi
2021-09-10 16:14 ` [PATCH v14 bpf-next 05/18] net: xdp: add xdp_update_skb_shared_info utility routine Lorenzo Bianconi
2021-09-10 16:14 ` [PATCH v14 bpf-next 06/18] net: marvell: rely on " Lorenzo Bianconi
2021-09-10 16:14 ` [PATCH v14 bpf-next 07/18] xdp: add multi-buff support to xdp_return_{buff/frame} Lorenzo Bianconi
2021-09-10 16:14 ` [PATCH v14 bpf-next 08/18] net: mvneta: add multi buffer support to XDP_TX Lorenzo Bianconi
2021-09-10 16:14 ` [PATCH v14 bpf-next 09/18] net: mvneta: enable jumbo frames for XDP Lorenzo Bianconi
2021-09-10 16:14 ` [PATCH v14 bpf-next 10/18] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API Lorenzo Bianconi
2021-09-16 16:55   ` Jakub Kicinski
2021-09-17 10:02     ` Lorenzo Bianconi
2021-09-17 13:03       ` Jakub Kicinski
2021-09-10 16:14 ` [PATCH v14 bpf-next 11/18] bpf: introduce bpf_xdp_get_buff_len helper Lorenzo Bianconi
2021-09-10 16:14 ` [PATCH v14 bpf-next 12/18] bpf: add multi-buffer support to xdp copy helpers Lorenzo Bianconi
2021-09-10 16:14 ` [PATCH v14 bpf-next 13/18] bpf: move user_size out of bpf_test_init Lorenzo Bianconi
2021-09-10 16:14 ` [PATCH v14 bpf-next 14/18] bpf: introduce multibuff support to bpf_prog_test_run_xdp() Lorenzo Bianconi
2021-09-10 16:14 ` [PATCH v14 bpf-next 15/18] bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature Lorenzo Bianconi
2021-09-10 16:14 ` [PATCH v14 bpf-next 16/18] bpf: update xdp_adjust_tail selftest to include multi-buffer Lorenzo Bianconi
2021-09-10 16:14 ` [PATCH v14 bpf-next 17/18] net: xdp: introduce bpf_xdp_adjust_data helper Lorenzo Bianconi
2021-09-10 16:14 ` [PATCH v14 bpf-next 18/18] bpf: add bpf_xdp_adjust_data selftest Lorenzo Bianconi
2021-09-16 16:55 ` [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support Jakub Kicinski
2021-09-17 14:51   ` Lorenzo Bianconi
2021-09-17 18:33     ` Jakub Kicinski
2021-09-17 18:43       ` Alexei Starovoitov
2021-09-17 19:00         ` Jakub Kicinski
2021-09-17 19:10           ` Alexei Starovoitov
2021-09-17 22:07             ` John Fastabend
2021-09-18 11:53               ` Toke Høiland-Jørgensen
2021-09-20 18:02                 ` Jakub Kicinski
2021-09-20 21:01                   ` Toke Høiland-Jørgensen
2021-09-20 21:25                     ` Jakub Kicinski
2021-09-20 22:44                       ` Toke Høiland-Jørgensen
2021-09-21 10:03                         ` Eelco Chaudron
2021-09-28 11:48                           ` Magnus Karlsson
2021-09-29 10:36                         ` Lorenz Bauer
2021-09-29 12:25                           ` Toke Høiland-Jørgensen
2021-09-29 12:32                             ` Lorenz Bauer
2021-09-29 17:48                             ` Jakub Kicinski
2021-09-29 17:46                           ` Jakub Kicinski
2021-09-29 10:41   ` Lorenz Bauer
2021-09-29 12:10     ` Toke Høiland-Jørgensen
2021-09-29 12:38       ` Lorenz Bauer
2021-09-29 18:54         ` Alexei Starovoitov
2021-09-29 19:22           ` Jakub Kicinski
2021-09-29 20:39             ` Toke Høiland-Jørgensen
2021-10-01  9:03               ` Lorenzo Bianconi
2021-10-01 18:35                 ` Jakub Kicinski
2021-10-06  9:32                   ` Lorenzo Bianconi
2021-10-06 10:08                     ` Eelco Chaudron
2021-10-06 12:15                       ` Lorenzo Bianconi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).