Netdev Archive on lore.kernel.org
* [PATCH net-next 0/3] tcp_mmap: optmizations
From: Eric Dumazet @ 2020-08-20 17:11 UTC
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, Soheil Hassas Yeganeh, Arjun Roy

This series updates the tcp_mmap reference tool to follow best practices.

The first patch uses madvise(MADV_DONTNEED) to decrease pressure
on the socket lock.

The last two patches try to use huge pages when they are available.

Eric Dumazet (3):
  selftests: net: tcp_mmap: use madvise(MADV_DONTNEED)
  selftests: net: tcp_mmap: Use huge pages in send path
  selftests: net: tcp_mmap: Use huge pages in receive path

 tools/testing/selftests/net/tcp_mmap.c | 42 +++++++++++++++++++++-----
 1 file changed, 35 insertions(+), 7 deletions(-)

-- 
2.28.0.297.g1956fa8f8d-goog


* [PATCH net-next 1/3] selftests: net: tcp_mmap: use madvise(MADV_DONTNEED)
From: Eric Dumazet @ 2020-08-20 17:11 UTC
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, Soheil Hassas Yeganeh, Arjun Roy

When the TCP_ZEROCOPY_RECEIVE operation was added,
I made the mistake of automatically unmapping prior
content before mapping new pages.

This has the unfortunate effect of adding potentially long
MMU operations (like TLB flushes) while the socket lock is held.

Using madvise(MADV_DONTNEED) right after the pages have been used
has two benefits:

1) This releases pages sooner, allowing pages to be recycled
if they were part of a page pool in a NIC driver.

2) Long unmap operations no longer run under the socket lock,
where they would prevent immediate processing of incoming packets.

The cost of the added system call is small enough to be worth it.

Arjun will submit a kernel patch allowing applications to opt out
of the unmap attempt in tcp_zerocopy_receive().
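A minimal sketch of the consume-then-release pattern, assuming a page-aligned region the caller has already mapped. consume_and_release() is a hypothetical helper, not a function in tcp_mmap.c, and its byte sum only stands in for the tool's hash_zone() work:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of tcp_mmap.c): read a mapped region,
 * then drop its pages right away with madvise(MADV_DONTNEED), instead of
 * leaving the unmap work to the next TCP_ZEROCOPY_RECEIVE call.
 * Stores a byte sum in *sum and returns madvise()'s result (0 on success).
 */
static int consume_and_release(void *addr, size_t len, unsigned long *sum)
{
	const unsigned char *p = addr;
	size_t i;

	/* madvise() requires a page-aligned start address. */
	assert((uintptr_t)addr % (uintptr_t)sysconf(_SC_PAGESIZE) == 0);

	*sum = 0;
	for (i = 0; i < len; i++)
		*sum += p[i];

	/* The pages are released now; the virtual range itself stays mapped. */
	return madvise(addr, len, MADV_DONTNEED);
}
```

On a private anonymous mapping (as used when trying this out), the range reads back as zeros after the call, because the kernel supplies fresh zero-filled pages on the next access. The design point is that the application issues the madvise() itself, so the MMU work happens outside the socket lock.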

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
---
 tools/testing/selftests/net/tcp_mmap.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/testing/selftests/net/tcp_mmap.c b/tools/testing/selftests/net/tcp_mmap.c
index a61b7b3da5496285876b0e16b18a3060850b0803..59ec0b59f7b76ff75685bd96901d8237e0665b2b 100644
--- a/tools/testing/selftests/net/tcp_mmap.c
+++ b/tools/testing/selftests/net/tcp_mmap.c
@@ -179,6 +179,10 @@ void *child_thread(void *arg)
 				total_mmap += zc.length;
 				if (xflg)
 					hash_zone(addr, zc.length);
+				/* It is more efficient to unmap the pages right now,
+				 * instead of doing this in next TCP_ZEROCOPY_RECEIVE.
+				 */
+				madvise(addr, zc.length, MADV_DONTNEED);
 				total += zc.length;
 			}
 			if (zc.recv_skip_hint) {
-- 
2.28.0.297.g1956fa8f8d-goog


* [PATCH net-next 2/3] selftests: net: tcp_mmap: Use huge pages in send path
From: Eric Dumazet @ 2020-08-20 17:11 UTC
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, Soheil Hassas Yeganeh, Arjun Roy

Using huge pages when they are available brings significant gains,
as shown in [1].

This patch adds mmap_large_buffer() and uses it
on the client side (the TX path of this reference tool).

The following patch uses it on the server side.

[1] https://patchwork.ozlabs.org/project/netdev/patch/20200820154359.1806305-1-edumazet@google.com/
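A self-contained sketch of the huge-page-with-fallback allocation, mirroring the shape of mmap_large_buffer(). alloc_large_buffer() and HUGE_PAGE_SZ are illustrative names; the fixed 2 MiB size is an assumption standing in for the tool's map_align variable:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

/* Assumption: 2 MiB huge pages (the usual x86-64 default). In tcp_mmap.c
 * the alignment comes from the tool's map_align variable instead. */
#define HUGE_PAGE_SZ	(2UL * 1024 * 1024)
#define ALIGN_UP(x, align_to)	(((x) + ((align_to)-1)) & ~((align_to)-1))

static_assert((HUGE_PAGE_SZ & (HUGE_PAGE_SZ - 1)) == 0,
	      "ALIGN_UP needs a power-of-two alignment");

/* Try MAP_HUGETLB first, then fall back to regular pages.
 * Returns MAP_FAILED if both attempts fail. */
static void *alloc_large_buffer(size_t need, size_t *allocated)
{
	/* munmap() on a hugetlb mapping needs a length that is a multiple
	 * of the huge page size, so round up before mapping. */
	size_t sz = ALIGN_UP(need, HUGE_PAGE_SZ);
	void *buf = mmap(NULL, sz, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	if (buf == MAP_FAILED) {
		/* Typically no huge pages are reserved in the hugetlb pool. */
		sz = need;
		buf = mmap(NULL, sz, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (buf != MAP_FAILED)
			fprintf(stderr, "MAP_HUGETLB failed, using regular pages\n");
	}
	/* Callers must munmap() exactly this size, not the size they asked for. */
	*allocated = sz;
	return buf;
}
```

The out-parameter is what lets the patch replace munmap(buffer, chunk_size) with munmap(buffer, buffer_sz): when the huge-page attempt succeeds, the mapping can be larger than chunk_size.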

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
---
 tools/testing/selftests/net/tcp_mmap.c | 29 +++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/net/tcp_mmap.c b/tools/testing/selftests/net/tcp_mmap.c
index 59ec0b59f7b76ff75685bd96901d8237e0665b2b..ca2618f3e7a12ab6863665f465dea2e8d469131b 100644
--- a/tools/testing/selftests/net/tcp_mmap.c
+++ b/tools/testing/selftests/net/tcp_mmap.c
@@ -123,6 +123,28 @@ void hash_zone(void *zone, unsigned int length)
 #define ALIGN_UP(x, align_to)	(((x) + ((align_to)-1)) & ~((align_to)-1))
 #define ALIGN_PTR_UP(p, ptr_align_to)	((typeof(p))ALIGN_UP((unsigned long)(p), ptr_align_to))
 
+
+static void *mmap_large_buffer(size_t need, size_t *allocated)
+{
+	void *buffer;
+	size_t sz;
+
+	/* Attempt to use huge pages if possible. */
+	sz = ALIGN_UP(need, map_align);
+	buffer = mmap(NULL, sz, PROT_READ | PROT_WRITE,
+		      MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
+
+	if (buffer == (void *)-1) {
+		sz = need;
+		buffer = mmap(NULL, sz, PROT_READ | PROT_WRITE,
+			      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+		if (buffer != (void *)-1)
+			fprintf(stderr, "MAP_HUGETLB attempt failed, look at /sys/kernel/mm/hugepages for optimal performance\n");
+	}
+	*allocated = sz;
+	return buffer;
+}
+
 void *child_thread(void *arg)
 {
 	unsigned long total_mmap = 0, total = 0;
@@ -351,6 +373,7 @@ int main(int argc, char *argv[])
 	uint64_t total = 0;
 	char *host = NULL;
 	int fd, c, on = 1;
+	size_t buffer_sz;
 	char *buffer;
 	int sflg = 0;
 	int mss = 0;
@@ -441,8 +464,8 @@ int main(int argc, char *argv[])
 		}
 		do_accept(fdlisten);
 	}
-	buffer = mmap(NULL, chunk_size, PROT_READ | PROT_WRITE,
-			      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+
+	buffer = mmap_large_buffer(chunk_size, &buffer_sz);
 	if (buffer == (char *)-1) {
 		perror("mmap");
 		exit(1);
@@ -488,6 +511,6 @@ int main(int argc, char *argv[])
 		total += wr;
 	}
 	close(fd);
-	munmap(buffer, chunk_size);
+	munmap(buffer, buffer_sz);
 	return 0;
 }
-- 
2.28.0.297.g1956fa8f8d-goog


* [PATCH net-next 3/3] selftests: net: tcp_mmap: Use huge pages in receive path
From: Eric Dumazet @ 2020-08-20 17:11 UTC
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, Soheil Hassas Yeganeh, Arjun Roy

One downside of TCP RX zerocopy is an extra TLB miss
per page after the mapping operation.

By contrast, if the application uses huge pages, the non-zerocopy
recvmsg() path does not pay these TLB costs.

This patch lets the server side use huge pages in
the non-zerocopy case, so that both solutions can be compared
fairly, each under optimal conditions.
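A back-of-the-envelope sketch of the TLB argument, assuming 4 KiB base pages and 2 MiB huge pages (typical on x86-64) and the commit message's model of one extra TLB miss per newly mapped page; it ignores TLB capacity, page-walk caches and prefetching. pages_needed() and the constants are illustrative, not part of tcp_mmap.c:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page sizes: 4 KiB base pages and 2 MiB huge pages (x86-64). */
#define BASE_PAGE_SZ	4096UL
#define HUGE_PAGE_SZ	(2UL * 1024 * 1024)

static_assert(HUGE_PAGE_SZ % BASE_PAGE_SZ == 0,
	      "a huge page must span a whole number of base pages");

/* Number of pages needed to map len bytes. Under the one-miss-per-page
 * model, this is also the number of extra TLB misses after mapping. */
static size_t pages_needed(size_t len, size_t page_size)
{
	return (len + page_size - 1) / page_size;
}
```

For an 8 MiB buffer, pages_needed() gives 2048 with base pages but only 4 with huge pages, which is why a huge-page recvmsg() buffer is the fair baseline against zerocopy.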

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
---
 tools/testing/selftests/net/tcp_mmap.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/net/tcp_mmap.c b/tools/testing/selftests/net/tcp_mmap.c
index ca2618f3e7a12ab6863665f465dea2e8d469131b..00f837c9bc6c4549c19dfc27aa2d08c454ea169e 100644
--- a/tools/testing/selftests/net/tcp_mmap.c
+++ b/tools/testing/selftests/net/tcp_mmap.c
@@ -157,6 +157,7 @@ void *child_thread(void *arg)
 	void *addr = NULL;
 	double throughput;
 	struct rusage ru;
+	size_t buffer_sz;
 	int lu, fd;
 
 	fd = (int)(unsigned long)arg;
@@ -164,9 +165,9 @@ void *child_thread(void *arg)
 	gettimeofday(&t0, NULL);
 
 	fcntl(fd, F_SETFL, O_NDELAY);
-	buffer = malloc(chunk_size);
-	if (!buffer) {
-		perror("malloc");
+	buffer = mmap_large_buffer(chunk_size, &buffer_sz);
+	if (buffer == (void *)-1) {
+		perror("mmap");
 		goto error;
 	}
 	if (zflg) {
@@ -256,7 +257,7 @@ void *child_thread(void *arg)
 				ru.ru_nvcsw);
 	}
 error:
-	free(buffer);
+	munmap(buffer, buffer_sz);
 	close(fd);
 	if (zflg)
 		munmap(raddr, chunk_size + map_align);
-- 
2.28.0.297.g1956fa8f8d-goog


* Re: [PATCH net-next 0/3] tcp_mmap: optmizations
From: Soheil Hassas Yeganeh @ 2020-08-20 17:31 UTC
  To: Eric Dumazet; +Cc: David S . Miller, netdev, Eric Dumazet, Arjun Roy

On Thu, Aug 20, 2020 at 1:11 PM Eric Dumazet <edumazet@google.com> wrote:
>
> This series updates the tcp_mmap reference tool to follow best practices.
>
> The first patch uses madvise(MADV_DONTNEED) to decrease pressure
> on the socket lock.
>
> The last two patches try to use huge pages when they are available.
>
> Eric Dumazet (3):
>   selftests: net: tcp_mmap: use madvise(MADV_DONTNEED)
>   selftests: net: tcp_mmap: Use huge pages in send path
>   selftests: net: tcp_mmap: Use huge pages in receive path

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

Thank you for the patches!

>  tools/testing/selftests/net/tcp_mmap.c | 42 +++++++++++++++++++++-----
>  1 file changed, 35 insertions(+), 7 deletions(-)
>
> --
> 2.28.0.297.g1956fa8f8d-goog
>

* Re: [PATCH net-next 0/3] tcp_mmap: optmizations
From: Arjun Roy @ 2020-08-20 17:37 UTC
  To: Soheil Hassas Yeganeh
  Cc: Eric Dumazet, David S . Miller, netdev, Eric Dumazet

On Thu, Aug 20, 2020 at 10:32 AM Soheil Hassas Yeganeh
<soheil@google.com> wrote:
>
> On Thu, Aug 20, 2020 at 1:11 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > This series updates tcp_mmap reference tool to use best pratices.
> >
> > First patch is using madvise(MADV_DONTNEED) to decrease pressure
> > on the socket lock.
> >
> > Last patches try to use huge pages when available.
> >
> > Eric Dumazet (3):
> >   selftests: net: tcp_mmap: use madvise(MADV_DONTNEED)
> >   selftests: net: tcp_mmap: Use huge pages in send path
> >   selftests: net: tcp_mmap: Use huge pages in receive path
>
> Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
>
> Thank you for the patches!
>

Acked-by: Arjun Roy <arjunroy@google.com>

-Arjun

> >  tools/testing/selftests/net/tcp_mmap.c | 42 +++++++++++++++++++++-----
> >  1 file changed, 35 insertions(+), 7 deletions(-)
> >
> > --
> > 2.28.0.297.g1956fa8f8d-goog
> >

* Re: [PATCH net-next 0/3] tcp_mmap: optmizations
From: David Miller @ 2020-08-20 23:15 UTC
  To: edumazet; +Cc: netdev, eric.dumazet, soheil, arjunroy

From: Eric Dumazet <edumazet@google.com>
Date: Thu, 20 Aug 2020 10:11:15 -0700

> This series updates the tcp_mmap reference tool to follow best practices.
> 
> The first patch uses madvise(MADV_DONTNEED) to decrease pressure
> on the socket lock.
> 
> The last two patches try to use huge pages when they are available.

Series applied, thanks Eric.

Thread overview: 7+ messages
2020-08-20 17:11 [PATCH net-next 0/3] tcp_mmap: optmizations Eric Dumazet
2020-08-20 17:11 ` [PATCH net-next 1/3] selftests: net: tcp_mmap: use madvise(MADV_DONTNEED) Eric Dumazet
2020-08-20 17:11 ` [PATCH net-next 2/3] selftests: net: tcp_mmap: Use huge pages in send path Eric Dumazet
2020-08-20 17:11 ` [PATCH net-next 3/3] selftests: net: tcp_mmap: Use huge pages in receive path Eric Dumazet
2020-08-20 17:31 ` [PATCH net-next 0/3] tcp_mmap: optmizations Soheil Hassas Yeganeh
2020-08-20 17:37   ` Arjun Roy
2020-08-20 23:15 ` David Miller
