From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Jakub Kicinski <kuba@kernel.org>, Dave Jones <dsj@fb.com>,
"David S . Miller" <davem@davemloft.net>,
Sasha Levin <sashal@kernel.org>,
netdev@vger.kernel.org
Subject: [PATCH AUTOSEL 4.19 48/55] net: ip: avoid OOM kills with large UDP sends over loopback
Date: Tue, 6 Jul 2021 07:26:31 -0400
Message-ID: <20210706112638.2065023-48-sashal@kernel.org>
In-Reply-To: <20210706112638.2065023-1-sashal@kernel.org>
From: Jakub Kicinski <kuba@kernel.org>
[ Upstream commit 6d123b81ac615072a8525c13c6c41b695270a15d ]
Dave observed a number of machines hitting OOM on the UDP send
path. The workload seems to be sending large UDP packets over
loopback. Since loopback has an MTU of 64k, the kernel will try to
allocate an skb with up to 64k of head space. This has a good
chance of failing under memory pressure. What's worse, if
the message length is <32k, the allocation may trigger the
OOM killer.
This is entirely avoidable: we can use an skb with page frags.
af_unix solves a similar problem by limiting the head
length to SKB_MAX_ALLOC. This seems like a good and simple
approach. It means that UDP messages > 16kB will now
use fragments if the underlying device supports SG. If extra
allocator pressure causes regressions in real workloads,
we can switch to trying the large allocation first and
falling back.
v4: pre-calculate all the additions to alloclen so
we can be sure it won't go over order-2
Reported-by: Dave Jones <dsj@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/ipv4/ip_output.c | 32 ++++++++++++++++++--------------
net/ipv6/ip6_output.c | 32 +++++++++++++++++---------------
2 files changed, 35 insertions(+), 29 deletions(-)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index e411c42d8428..e63905f7f6f9 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -940,7 +940,7 @@ static int __ip_append_data(struct sock *sk,
unsigned int datalen;
unsigned int fraglen;
unsigned int fraggap;
- unsigned int alloclen;
+ unsigned int alloclen, alloc_extra;
unsigned int pagedlen;
struct sk_buff *skb_prev;
alloc_new_skb:
@@ -960,35 +960,39 @@ static int __ip_append_data(struct sock *sk,
fraglen = datalen + fragheaderlen;
pagedlen = 0;
+ alloc_extra = hh_len + 15;
+ alloc_extra += exthdrlen;
+
+ /* The last fragment gets additional space at tail.
+ * Note, with MSG_MORE we overallocate on fragments,
+ * because we have no idea what fragment will be
+ * the last.
+ */
+ if (datalen == length + fraggap)
+ alloc_extra += rt->dst.trailer_len;
+
if ((flags & MSG_MORE) &&
!(rt->dst.dev->features&NETIF_F_SG))
alloclen = mtu;
- else if (!paged)
+ else if (!paged &&
+ (fraglen + alloc_extra < SKB_MAX_ALLOC ||
+ !(rt->dst.dev->features & NETIF_F_SG)))
alloclen = fraglen;
else {
alloclen = min_t(int, fraglen, MAX_HEADER);
pagedlen = fraglen - alloclen;
}
- alloclen += exthdrlen;
-
- /* The last fragment gets additional space at tail.
- * Note, with MSG_MORE we overallocate on fragments,
- * because we have no idea what fragment will be
- * the last.
- */
- if (datalen == length + fraggap)
- alloclen += rt->dst.trailer_len;
+ alloclen += alloc_extra;
if (transhdrlen) {
- skb = sock_alloc_send_skb(sk,
- alloclen + hh_len + 15,
+ skb = sock_alloc_send_skb(sk, alloclen,
(flags & MSG_DONTWAIT), &err);
} else {
skb = NULL;
if (refcount_read(&sk->sk_wmem_alloc) + wmem_alloc_delta <=
2 * sk->sk_sndbuf)
- skb = alloc_skb(alloclen + hh_len + 15,
+ skb = alloc_skb(alloclen,
sk->sk_allocation);
if (unlikely(!skb))
err = -ENOBUFS;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index e1bb7db88483..aa8f19f852cc 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1394,7 +1394,7 @@ static int __ip6_append_data(struct sock *sk,
unsigned int datalen;
unsigned int fraglen;
unsigned int fraggap;
- unsigned int alloclen;
+ unsigned int alloclen, alloc_extra;
unsigned int pagedlen;
alloc_new_skb:
/* There's no room in the current skb */
@@ -1421,17 +1421,28 @@ static int __ip6_append_data(struct sock *sk,
fraglen = datalen + fragheaderlen;
pagedlen = 0;
+ alloc_extra = hh_len;
+ alloc_extra += dst_exthdrlen;
+ alloc_extra += rt->dst.trailer_len;
+
+ /* We just reserve space for fragment header.
+ * Note: this may be overallocation if the message
+ * (without MSG_MORE) fits into the MTU.
+ */
+ alloc_extra += sizeof(struct frag_hdr);
+
if ((flags & MSG_MORE) &&
!(rt->dst.dev->features&NETIF_F_SG))
alloclen = mtu;
- else if (!paged)
+ else if (!paged &&
+ (fraglen + alloc_extra < SKB_MAX_ALLOC ||
+ !(rt->dst.dev->features & NETIF_F_SG)))
alloclen = fraglen;
else {
alloclen = min_t(int, fraglen, MAX_HEADER);
pagedlen = fraglen - alloclen;
}
-
- alloclen += dst_exthdrlen;
+ alloclen += alloc_extra;
if (datalen != length + fraggap) {
/*
@@ -1441,30 +1452,21 @@ static int __ip6_append_data(struct sock *sk,
datalen += rt->dst.trailer_len;
}
- alloclen += rt->dst.trailer_len;
fraglen = datalen + fragheaderlen;
- /*
- * We just reserve space for fragment header.
- * Note: this may be overallocation if the message
- * (without MSG_MORE) fits into the MTU.
- */
- alloclen += sizeof(struct frag_hdr);
-
copy = datalen - transhdrlen - fraggap - pagedlen;
if (copy < 0) {
err = -EINVAL;
goto error;
}
if (transhdrlen) {
- skb = sock_alloc_send_skb(sk,
- alloclen + hh_len,
+ skb = sock_alloc_send_skb(sk, alloclen,
(flags & MSG_DONTWAIT), &err);
} else {
skb = NULL;
if (refcount_read(&sk->sk_wmem_alloc) + wmem_alloc_delta <=
2 * sk->sk_sndbuf)
- skb = alloc_skb(alloclen + hh_len,
+ skb = alloc_skb(alloclen,
sk->sk_allocation);
if (unlikely(!skb))
err = -ENOBUFS;
--
2.30.2