Netdev Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] vrf: Reset skb conntrack connection on VRF rcv
@ 2021-08-15 12:00 Lahav Schlesinger
2021-08-16 18:08 ` David Ahern
0 siblings, 1 reply; 5+ messages in thread
From: Lahav Schlesinger @ 2021-08-15 12:00 UTC (permalink / raw)
To: netdev; +Cc: dsahern, davem, kuba
To fix the "reverse-NAT" for replies.
When a packet is sent over a VRF, the POST_ROUTING hooks are called
twice: Once from the VRF interface, and once from the "actual"
interface the packet will be sent from:
1) First SNAT: l3mdev_l3_out() -> vrf_l3_out() -> .. -> vrf_output_direct()
This causes the POST_ROUTING hooks to run.
2) Second SNAT: 'ip_output()' calls POST_ROUTING hooks again.
Similarly for replies, first ip_rcv() calls PRE_ROUTING hooks, and
second vrf_l3_rcv() calls them again.
As an example, consider the following SNAT rule:
> iptables -t nat -A POSTROUTING -p udp -m udp --dport 53 -j SNAT --to-source 2.2.2.2 -o vrf_1
In this case sending over a VRF will create 2 conntrack entries.
The first is from the VRF interface, which performs the IP SNAT.
The second will run the SNAT, but since the "expected reply" will remain
the same, conntrack randomizes the source port of the packet:
e..g With a socket bound to 1.1.1.1:10000, sending to 3.3.3.3:53, the conntrack
rules are:
udp 17 29 src=2.2.2.2 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=61033 packets=0 bytes=0 mark=0 use=1
udp 17 29 src=1.1.1.1 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=10000 packets=0 bytes=0 mark=0 use=1
i.e. First SNAT IP from 1.1.1.1 --> 2.2.2.2, and second the src port is
SNAT-ed from 10000 --> 61033.
But when a reply is sent (3.3.3.3:53 -> 2.2.2.2:61033) only the later
conntrack entry is matched:
udp 17 29 src=2.2.2.2 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 src=3.3.3.3 dst=2.2.2.2 sport=53 dport=61033 packets=1 bytes=49 mark=0 use=1
udp 17 28 src=1.1.1.1 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=10000 packets=0 bytes=0 mark=0 use=1
And a "port 61033 unreachable" ICMP packet is sent back.
The issue is that when PRE_ROUTING hooks are called from vrf_l3_rcv(),
the skb already has a conntrack flow attached to it, which means
nf_conntrack_in() will not resolve the flow again.
This means only the dest port is "reverse-NATed" (61033 -> 10000) but
the dest IP remains 2.2.2.2, and since the socket is bound to 1.1.1.1 it's
not received.
This can be verified by logging the 4-tuple of the packet in '__udp4_lib_rcv()'.
The fix is then to reset the flow when skb is received on a VRF, to let
conntrack resolve the flow again (which now will hit the earlier flow).
To reproduce: (Without the fix "Got pkt_to_nat_port" will not be printed by
running 'bash ./repro'):
$ cat run_in_A1.py
import logging
logging.getLogger("scapy.runtime").setLevel(logging.ERROR)
from scapy.all import *
import argparse
def get_packet_to_send(udp_dst_port, msg_name):
return Ether(src='11:22:33:44:55:66', dst=iface_mac)/ \
IP(src='3.3.3.3', dst='2.2.2.2')/ \
UDP(sport=53, dport=udp_dst_port)/ \
Raw(f'{msg_name}\x0012345678901234567890')
parser = argparse.ArgumentParser()
parser.add_argument('-iface_mac', dest="iface_mac", type=str, required=True,
help="From run_in_A3.py")
parser.add_argument('-socket_port', dest="socket_port", type=str,
required=True, help="From run_in_A3.py")
parser.add_argument('-v1_mac', dest="v1_mac", type=str, required=True,
help="From script")
args, _ = parser.parse_known_args()
iface_mac = args.iface_mac
socket_port = int(args.socket_port)
v1_mac = args.v1_mac
print(f'Source port before NAT: {socket_port}')
while True:
pkts = sniff(iface='_v0', store=True, count=1, timeout=10)
if 0 == len(pkts):
print('Something failed, rerun the script :(', flush=True)
break
pkt = pkts[0]
if not pkt.haslayer('UDP'):
continue
pkt_sport = pkt.getlayer('UDP').sport
print(f'Source port after NAT: {pkt_sport}', flush=True)
pkt_to_send = get_packet_to_send(pkt_sport, 'pkt_to_nat_port')
sendp(pkt_to_send, '_v0', verbose=False) # Will not be received
pkt_to_send = get_packet_to_send(socket_port, 'pkt_to_socket_port')
sendp(pkt_to_send, '_v0', verbose=False)
break
$ cat run_in_A2.py
import socket
import netifaces
print(f"{netifaces.ifaddresses('e00000')[netifaces.AF_LINK][0]['addr']}",
flush=True)
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE,
str('vrf_1' + '\0').encode('utf-8'))
s.connect(('3.3.3.3', 53))
print(f'{s. getsockname()[1]}', flush=True)
s.settimeout(5)
while True:
try:
# Periodically send in order to keep the conntrack entry alive.
s.send(b'a'*40)
resp = s.recvfrom(1024)
msg_name = resp[0].decode('utf-8').split('\0')[0]
print(f"Got {msg_name}", flush=True)
except Exception as e:
pass
$ cat repro.sh
ip netns del A1 2> /dev/null
ip netns del A2 2> /dev/null
ip netns add A1
ip netns add A2
ip -n A1 link add _v0 type veth peer name _v1 netns A2
ip -n A1 link set _v0 up
ip -n A2 link add e00000 type bond
ip -n A2 link add lo0 type dummy
ip -n A2 link add vrf_1 type vrf table 10001
ip -n A2 link set vrf_1 up
ip -n A2 link set e00000 master vrf_1
ip -n A2 addr add 1.1.1.1/24 dev e00000
ip -n A2 link set e00000 up
ip -n A2 link set _v1 master e00000
ip -n A2 link set _v1 up
ip -n A2 link set lo0 up
ip -n A2 addr add 2.2.2.2/32 dev lo0
ip -n A2 neigh add 1.1.1.10 lladdr 77:77:77:77:77:77 dev e00000
ip -n A2 route add 3.3.3.3/32 via 1.1.1.10 dev e00000 table 10001
ip netns exec A2 iptables -t nat -A POSTROUTING -p udp -m udp --dport 53 -j \
SNAT --to-source 2.2.2.2 -o vrf_1
sleep 5
ip netns exec A2 python3 run_in_A2.py > x &
XPID=$!
sleep 5
IFACE_MAC=`sed -n 1p x`
SOCKET_PORT=`sed -n 2p x`
V1_MAC=`ip -n A2 link show _v1 | sed -n 2p | awk '{print $2'}`
ip netns exec A1 python3 run_in_A1.py -iface_mac ${IFACE_MAC} -socket_port \
${SOCKET_PORT} -v1_mac ${SOCKET_PORT}
sleep 5
kill -9 $XPID
wait $XPID 2> /dev/null
ip netns del A1
ip netns del A2
tail x -n 2
rm x
set +x
Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
---
drivers/net/vrf.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 2b1b944d4b28..8bbe2a7bb141 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -1367,6 +1367,8 @@ static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev,
bool need_strict = rt6_need_strict(&ipv6_hdr(skb)->daddr);
bool is_ndisc = ipv6_ndisc_frame(skb);
+ nf_reset_ct(skb);
+
/* loopback, multicast & non-ND link-local traffic; do not push through
* packet taps again. Reset pkt_type for upper layers to process skb.
* For strict packets with a source LLA, determine the dst using the
@@ -1429,6 +1431,8 @@ static struct sk_buff *vrf_ip_rcv(struct net_device *vrf_dev,
skb->skb_iif = vrf_dev->ifindex;
IPCB(skb)->flags |= IPSKB_L3SLAVE;
+ nf_reset_ct(skb);
+
if (ipv4_is_multicast(ip_hdr(skb)->daddr))
goto out;
--
2.25.1
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] vrf: Reset skb conntrack connection on VRF rcv
2021-08-15 12:00 [PATCH] vrf: Reset skb conntrack connection on VRF rcv Lahav Schlesinger
@ 2021-08-16 18:08 ` David Ahern
2021-08-16 21:41 ` Jakub Kicinski
0 siblings, 1 reply; 5+ messages in thread
From: David Ahern @ 2021-08-16 18:08 UTC (permalink / raw)
To: Lahav Schlesinger, netdev; +Cc: dsahern, davem, kuba
On 8/15/21 6:00 AM, Lahav Schlesinger wrote:
> To fix the "reverse-NAT" for replies.
>
> When a packet is sent over a VRF, the POST_ROUTING hooks are called
> twice: Once from the VRF interface, and once from the "actual"
> interface the packet will be sent from:
> 1) First SNAT: l3mdev_l3_out() -> vrf_l3_out() -> .. -> vrf_output_direct()
> This causes the POST_ROUTING hooks to run.
> 2) Second SNAT: 'ip_output()' calls POST_ROUTING hooks again.
>
> Similarly for replies, first ip_rcv() calls PRE_ROUTING hooks, and
> second vrf_l3_rcv() calls them again.
>
> As an example, consider the following SNAT rule:
>> iptables -t nat -A POSTROUTING -p udp -m udp --dport 53 -j SNAT --to-source 2.2.2.2 -o vrf_1
>
> In this case sending over a VRF will create 2 conntrack entries.
> The first is from the VRF interface, which performs the IP SNAT.
> The second will run the SNAT, but since the "expected reply" will remain
> the same, conntrack randomizes the source port of the packet:
> e..g With a socket bound to 1.1.1.1:10000, sending to 3.3.3.3:53, the conntrack
> rules are:
> udp 17 29 src=2.2.2.2 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=61033 packets=0 bytes=0 mark=0 use=1
> udp 17 29 src=1.1.1.1 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=10000 packets=0 bytes=0 mark=0 use=1
>
> i.e. First SNAT IP from 1.1.1.1 --> 2.2.2.2, and second the src port is
> SNAT-ed from 10000 --> 61033.
>
> But when a reply is sent (3.3.3.3:53 -> 2.2.2.2:61033) only the later
> conntrack entry is matched:
> udp 17 29 src=2.2.2.2 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 src=3.3.3.3 dst=2.2.2.2 sport=53 dport=61033 packets=1 bytes=49 mark=0 use=1
> udp 17 28 src=1.1.1.1 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=10000 packets=0 bytes=0 mark=0 use=1
>
> And a "port 61033 unreachable" ICMP packet is sent back.
>
> The issue is that when PRE_ROUTING hooks are called from vrf_l3_rcv(),
> the skb already has a conntrack flow attached to it, which means
> nf_conntrack_in() will not resolve the flow again.
>
> This means only the dest port is "reverse-NATed" (61033 -> 10000) but
> the dest IP remains 2.2.2.2, and since the socket is bound to 1.1.1.1 it's
> not received.
> This can be verified by logging the 4-tuple of the packet in '__udp4_lib_rcv()'.
>
> The fix is then to reset the flow when skb is received on a VRF, to let
> conntrack resolve the flow again (which now will hit the earlier flow).
>
> To reproduce: (Without the fix "Got pkt_to_nat_port" will not be printed by
> running 'bash ./repro'):
> $ cat run_in_A1.py
> import logging
> logging.getLogger("scapy.runtime").setLevel(logging.ERROR)
> from scapy.all import *
> import argparse
>
> def get_packet_to_send(udp_dst_port, msg_name):
> return Ether(src='11:22:33:44:55:66', dst=iface_mac)/ \
> IP(src='3.3.3.3', dst='2.2.2.2')/ \
> UDP(sport=53, dport=udp_dst_port)/ \
> Raw(f'{msg_name}\x0012345678901234567890')
>
> parser = argparse.ArgumentParser()
> parser.add_argument('-iface_mac', dest="iface_mac", type=str, required=True,
> help="From run_in_A3.py")
> parser.add_argument('-socket_port', dest="socket_port", type=str,
> required=True, help="From run_in_A3.py")
> parser.add_argument('-v1_mac', dest="v1_mac", type=str, required=True,
> help="From script")
>
> args, _ = parser.parse_known_args()
> iface_mac = args.iface_mac
> socket_port = int(args.socket_port)
> v1_mac = args.v1_mac
>
> print(f'Source port before NAT: {socket_port}')
>
> while True:
> pkts = sniff(iface='_v0', store=True, count=1, timeout=10)
> if 0 == len(pkts):
> print('Something failed, rerun the script :(', flush=True)
> break
> pkt = pkts[0]
> if not pkt.haslayer('UDP'):
> continue
>
> pkt_sport = pkt.getlayer('UDP').sport
> print(f'Source port after NAT: {pkt_sport}', flush=True)
>
> pkt_to_send = get_packet_to_send(pkt_sport, 'pkt_to_nat_port')
> sendp(pkt_to_send, '_v0', verbose=False) # Will not be received
>
> pkt_to_send = get_packet_to_send(socket_port, 'pkt_to_socket_port')
> sendp(pkt_to_send, '_v0', verbose=False)
> break
>
> $ cat run_in_A2.py
> import socket
> import netifaces
>
> print(f"{netifaces.ifaddresses('e00000')[netifaces.AF_LINK][0]['addr']}",
> flush=True)
> s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
> s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE,
> str('vrf_1' + '\0').encode('utf-8'))
> s.connect(('3.3.3.3', 53))
> print(f'{s. getsockname()[1]}', flush=True)
> s.settimeout(5)
>
> while True:
> try:
> # Periodically send in order to keep the conntrack entry alive.
> s.send(b'a'*40)
> resp = s.recvfrom(1024)
> msg_name = resp[0].decode('utf-8').split('\0')[0]
> print(f"Got {msg_name}", flush=True)
> except Exception as e:
> pass
>
> $ cat repro.sh
> ip netns del A1 2> /dev/null
> ip netns del A2 2> /dev/null
> ip netns add A1
> ip netns add A2
>
> ip -n A1 link add _v0 type veth peer name _v1 netns A2
> ip -n A1 link set _v0 up
>
> ip -n A2 link add e00000 type bond
> ip -n A2 link add lo0 type dummy
> ip -n A2 link add vrf_1 type vrf table 10001
> ip -n A2 link set vrf_1 up
> ip -n A2 link set e00000 master vrf_1
>
> ip -n A2 addr add 1.1.1.1/24 dev e00000
> ip -n A2 link set e00000 up
> ip -n A2 link set _v1 master e00000
> ip -n A2 link set _v1 up
> ip -n A2 link set lo0 up
> ip -n A2 addr add 2.2.2.2/32 dev lo0
>
> ip -n A2 neigh add 1.1.1.10 lladdr 77:77:77:77:77:77 dev e00000
> ip -n A2 route add 3.3.3.3/32 via 1.1.1.10 dev e00000 table 10001
>
> ip netns exec A2 iptables -t nat -A POSTROUTING -p udp -m udp --dport 53 -j \
> SNAT --to-source 2.2.2.2 -o vrf_1
>
> sleep 5
> ip netns exec A2 python3 run_in_A2.py > x &
> XPID=$!
> sleep 5
>
> IFACE_MAC=`sed -n 1p x`
> SOCKET_PORT=`sed -n 2p x`
> V1_MAC=`ip -n A2 link show _v1 | sed -n 2p | awk '{print $2'}`
> ip netns exec A1 python3 run_in_A1.py -iface_mac ${IFACE_MAC} -socket_port \
> ${SOCKET_PORT} -v1_mac ${SOCKET_PORT}
> sleep 5
>
> kill -9 $XPID
> wait $XPID 2> /dev/null
> ip netns del A1
> ip netns del A2
> tail x -n 2
> rm x
> set +x
>
> Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com>
> ---
> drivers/net/vrf.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
Thanks for the detailed explanation and use case.
Looks correct to me.
Reviewed-by: David Ahern <dsahern@kernel.org>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] vrf: Reset skb conntrack connection on VRF rcv
2021-08-16 18:08 ` David Ahern
@ 2021-08-16 21:41 ` Jakub Kicinski
2021-08-16 23:28 ` David Ahern
0 siblings, 1 reply; 5+ messages in thread
From: Jakub Kicinski @ 2021-08-16 21:41 UTC (permalink / raw)
To: David Ahern; +Cc: Lahav Schlesinger, netdev, dsahern, davem
On Mon, 16 Aug 2021 12:08:00 -0600 David Ahern wrote:
> On 8/15/21 6:00 AM, Lahav Schlesinger wrote:
> > To fix the "reverse-NAT" for replies.
>
> Thanks for the detailed explanation and use case.
>
> Looks correct to me.
> Reviewed-by: David Ahern <dsahern@kernel.org>
I get a sense this is a fix.
Fixes: 73e20b761acf ("net: vrf: Add support for PREROUTING rules on vrf device")
?
Or maybe it fixes commit a0f37efa8225 ("net: vrf: Fix NAT within a
VRF")?
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] vrf: Reset skb conntrack connection on VRF rcv
2021-08-16 21:41 ` Jakub Kicinski
@ 2021-08-16 23:28 ` David Ahern
2021-08-16 23:38 ` Jakub Kicinski
0 siblings, 1 reply; 5+ messages in thread
From: David Ahern @ 2021-08-16 23:28 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: Lahav Schlesinger, netdev, dsahern, davem
On 8/16/21 3:41 PM, Jakub Kicinski wrote:
> On Mon, 16 Aug 2021 12:08:00 -0600 David Ahern wrote:
>> On 8/15/21 6:00 AM, Lahav Schlesinger wrote:
>>> To fix the "reverse-NAT" for replies.
>>
>> Thanks for the detailed explanation and use case.
>>
>> Looks correct to me.
>> Reviewed-by: David Ahern <dsahern@kernel.org>
>
> I get a sense this is a fix.
It is. I missed the lack of Fixes tag.
>
> Fixes: 73e20b761acf ("net: vrf: Add support for PREROUTING rules on vrf device")
> ?
This one.
Thanks,
>
> Or maybe it fixes commit a0f37efa8225 ("net: vrf: Fix NAT within a
> VRF")?
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] vrf: Reset skb conntrack connection on VRF rcv
2021-08-16 23:28 ` David Ahern
@ 2021-08-16 23:38 ` Jakub Kicinski
0 siblings, 0 replies; 5+ messages in thread
From: Jakub Kicinski @ 2021-08-16 23:38 UTC (permalink / raw)
To: David Ahern, Lahav Schlesinger; +Cc: netdev, dsahern, davem
On Mon, 16 Aug 2021 17:28:49 -0600 David Ahern wrote:
> >> Looks correct to me.
> >> Reviewed-by: David Ahern <dsahern@kernel.org>
> > Fixes: 73e20b761acf ("net: vrf: Add support for PREROUTING rules on vrf device")
> > ?
>
> This one.
>
> > Or maybe it fixes commit a0f37efa8225 ("net: vrf: Fix NAT within a
> > VRF")?
Applied, thanks!
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-08-16 23:39 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-15 12:00 [PATCH] vrf: Reset skb conntrack connection on VRF rcv Lahav Schlesinger
2021-08-16 18:08 ` David Ahern
2021-08-16 21:41 ` Jakub Kicinski
2021-08-16 23:28 ` David Ahern
2021-08-16 23:38 ` Jakub Kicinski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).