Netdev Archive on lore.kernel.org
help / color / mirror / Atom feed
* Possible GRE-TAP/ECT bug
@ 2021-07-21 17:36 Gilad Naaman
  2021-07-21 20:18 ` Gilad Naaman
  0 siblings, 1 reply; 3+ messages in thread
From: Gilad Naaman @ 2021-07-21 17:36 UTC (permalink / raw)
  To: netdev

Hey,

We’ve encountered a bug with handling of the ECT bits in GRE-tap encapsulated IPv4 traffic on Kernel 5.4.

This bug manifests when a packet in the following form enters the system:

+-----------+
|Inner IPv4 |    ECT=0b00
+-----------+
|Ethernet   |
+-----------+
|GRE        |
+-----------+
|Outer IPv4 |    ECT=0b01  |  ECT=0b10
+-----------+

When inspected by tcpdump-ing the outer interface, the packet is fine;
when inspected by tcpdump-ing the GRE interface, a small amount of bits is changed in the source/destination addresses of the inner Ethernet header.

Upon further inspection, we found out that the bits changed in the ethernet headers correspond to the position of the ECT/Checksum fields in the IPv4 header.
In other words, some code path tries to copy the ECT bits from outer to the inner IPv4 header and correct the checksum, but it accidentally treats the ethernet header as an IP header.

Unfortunately this results in the loss of all IPv4 traffic.

This doesn’t happen if the internal protocol is IPv6 or IS-IS, or if the external ECT is 0.

We’ve tried to search for places that modify the `tos` field of an skb, and found just enough to say “yes, some code wants to sync the ECT bits”,
but not to find the right one.

Most prominently, xfrm does just that, but as far as I can tell it is not relevant to these packets.
(It is done by calling `ipv4_copy_dscp` from `xfrm4_remove_tunnel_encap`)

We’ve gone over the ip_tunnel.c code dmesg shows that `ip_tunnel: non-ECT from 10.254.3.1 with TOS=0x1` is printed for the packet mentioned below.

Would love if someone can direct me at where to look for this ECT copying code for further analysis.

Thank you!


=================================================

As an example to a modified packet, here’s an ICMP ping.
The sent packet is

```
<IP  version=4 ihl=5 tos=0x1 id=0 flags= frag=0 ttl=64 proto=gre src=10.254.3.1 dst=10.254.1.6 |<GRE  chksum_present=0 routing_present=0 key_present=1 seqnum_present=0 strict_route_source=0 recursion_control=0 flags=0 version=0 proto=TEB key=0x49000000 |<Ether  dst=18:be:92:a0:ee:26 src=18:b0:92:a0:6c:26 type=IPv4 |<IP  version=4 ihl=5 tos=0x0 id=35717 flags=DF frag=0 ttl=1 proto=icmp src=130.130.130.1 dst=130.130.130.2 |<ICMP  type=echo-request code=0 id=0xa6eb seq=0x331e unused='' |<Raw  load='\x1e\\xf3\\xf7`\x00\x00\x00\x00ZN\x00\x00\x00\x00\x00\x00\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ' |>>>>>>
```
or in hex:
450100a700000000402f60250afe03010afe0106200065584900000018be92a0ee2618b092a06c2608004500007d8b8540000101e4f2828282018282820208006411a6eb331e1e5c7866335c78663760000000005a4e000000000000101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a



The received packet is
```
<Ether  dst=18:bd:92:a0:ee:26 src=18:b0:92:a0:6c:27 type=IPv4 |<IP  version=4 ihl=5 tos=0x0 len=125 id=35717 flags=DF frag=0 ttl=1 proto=icmp chksum=0xe4f2 src=130.130.130.1 dst=130.130.130.2 |<ICMP  type=echo-request code=0 chksum=0x6411 id=0xa6eb seq=0x331e unused='' |<Raw  load='\x1e\\xf3\\xf7`\x00\x00\x00\x00ZN\x00\x00\x00\x00\x00\x00\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ' |>>>>
```
or
18bd92a0ee2618b092a06c2708004500007d8b8540000101e4f2828282018282820208006411a6eb331e1e5c7866335c78663760000000005a4e000000000000101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a


Note how the destination address changes from `18:be:92:a0:ee:26` to `18:bd:92:a0:ee:26`.
Where the lower 2 bits of the second octet are overwritten with the ECT `0b01`.

The source address is then changed from `18:b0:92:a0:6c:26` to `18:b0:92:a0:6c:27`, probably as an attempt to fix the checksum.
In practice it’s always incremented by 1, even though I’m not sure it results in a correct checksum.

If the MAC address is changed on the interface to match the ECT, or the other ECT is used in the packet, nothing changes and the packet is received correctly.


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Possible GRE-TAP/ECT bug
  2021-07-21 17:36 Possible GRE-TAP/ECT bug Gilad Naaman
@ 2021-07-21 20:18 ` Gilad Naaman
  2021-07-22 15:05   ` Gilad Naaman
  0 siblings, 1 reply; 3+ messages in thread
From: Gilad Naaman @ 2021-07-21 20:18 UTC (permalink / raw)
  To: netdev

This can be reproduced by using this:

    $ cat setup_bug.sh
    #!/bin/bash
    ip netns add A
    ip netns add B
    ip -n A link add _v0 type veth peer name _v1 netns B
    
    ip -n A link set _v0 up
    ip -n A addr add dev _v0 10.254.3.1/24
    ip -n A route add default dev _v0 scope global
    
    ip -n B link set _v1 up
    ip -n B addr add dev _v1 10.254.1.6/24
    ip -n B route add default dev _v1 scope global
    
    ip -n B link add florp type gretap local 10.254.1.6 remote 10.254.3.1 key 0x49000000
    ip -n B link set florp up
    
    $ cat send_pkt.py
    #!/usr/bin/env python3
    from scapy.all import *
    
    pkt = IP(b'E\x01\x00\xa7\x00\x00\x00\x00@/`%\n\xfe\x03\x01\n\xfe\x01\x06 \x00eXI\x00\x00\x00\x18\xbe\x92\xa0\xee&\x18\xb0\x92\xa0l&\x08\x00E\x00\x00}\x8b\x85@\x00\x01\x01\xe4\xf2\x82\x82\x82\x01\x82\x82\x82\x02\x08\x00d\x11\xa6\xeb3\x1e\x1e\\xf3\\xf7`\x00\x00\x00\x00ZN\x00\x00\x00\x00\x00\x00\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ')
    
    send(pkt)


And then

    $ sudo ip netns exec B tcpdump -neqllvi florp icmp &

    $ sudo ip netns exec A python3 send_pkt.py

> On 21 Jul 2021, at 20:36, Gilad Naaman <gnaaman@drivenets.com> wrote:
> 
> Hey,
> 
> We’ve encountered a bug with handling of the ECT bits in GRE-tap encapsulated IPv4 traffic on Kernel 5.4.
> 
> This bug manifests when a packet in the following form enters the system:
> 
> +-----------+
> |Inner IPv4 |    ECT=0b00
> +-----------+
> |Ethernet   |
> +-----------+
> |GRE        |
> +-----------+
> |Outer IPv4 |    ECT=0b01  |  ECT=0b10
> +-----------+
> 
> When inspected by tcpdump-ing the outer interface, the packet is fine;
> when inspected by tcpdump-ing the GRE interface, a small amount of bits is changed in the source/destination addresses of the inner Ethernet header.
> 
> Upon further inspection, we found out that the bits changed in the ethernet headers correspond to the position of the ECT/Checksum fields in the IPv4 header.
> In other words, some code path tries to copy the ECT bits from outer to the inner IPv4 header and correct the checksum, but it accidentally treats the ethernet header as an IP header.
> 
> Unfortunately this results in the loss of all IPv4 traffic.
> 
> This doesn’t happen if the internal protocol is IPv6 or IS-IS, or if the external ECT is 0.
> 
> We’ve tried to search for places that modify the `tos` field of an skb, and found just enough to say “yes, some code wants to sync the ECT bits”,
> but not to find the right one.
> 
> Most prominently, xfrm does just that, but as far as I can tell it is not relevant to these packets.
> (It is done by calling `ipv4_copy_dscp` from `xfrm4_remove_tunnel_encap`)
> 
> We’ve gone over the ip_tunnel.c code dmesg shows that `ip_tunnel: non-ECT from 10.254.3.1 with TOS=0x1` is printed for the packet mentioned below.
> 
> Would love if someone can direct me at where to look for this ECT copying code for further analysis.
> 
> Thank you!
> 
> 
> =================================================
> 
> As an example to a modified packet, here’s an ICMP ping.
> The sent packet is
> 
> ```
> <IP  version=4 ihl=5 tos=0x1 id=0 flags= frag=0 ttl=64 proto=gre src=10.254.3.1 dst=10.254.1.6 |<GRE  chksum_present=0 routing_present=0 key_present=1 seqnum_present=0 strict_route_source=0 recursion_control=0 flags=0 version=0 proto=TEB key=0x49000000 |<Ether  dst=18:be:92:a0:ee:26 src=18:b0:92:a0:6c:26 type=IPv4 |<IP  version=4 ihl=5 tos=0x0 id=35717 flags=DF frag=0 ttl=1 proto=icmp src=130.130.130.1 dst=130.130.130.2 |<ICMP  type=echo-request code=0 id=0xa6eb seq=0x331e unused='' |<Raw  load='\x1e\\xf3\\xf7`\x00\x00\x00\x00ZN\x00\x00\x00\x00\x00\x00\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ' |>>>>>>
> ```
> or in hex:
> 450100a700000000402f60250afe03010afe0106200065584900000018be92a0ee2618b092a06c2608004500007d8b8540000101e4f2828282018282820208006411a6eb331e1e5c7866335c78663760000000005a4e000000000000101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a
> 
> 
> 
> The received packet is
> ```
> <Ether  dst=18:bd:92:a0:ee:26 src=18:b0:92:a0:6c:27 type=IPv4 |<IP  version=4 ihl=5 tos=0x0 len=125 id=35717 flags=DF frag=0 ttl=1 proto=icmp chksum=0xe4f2 src=130.130.130.1 dst=130.130.130.2 |<ICMP  type=echo-request code=0 chksum=0x6411 id=0xa6eb seq=0x331e unused='' |<Raw  load='\x1e\\xf3\\xf7`\x00\x00\x00\x00ZN\x00\x00\x00\x00\x00\x00\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ' |>>>>
> ```
> or
> 18bd92a0ee2618b092a06c2708004500007d8b8540000101e4f2828282018282820208006411a6eb331e1e5c7866335c78663760000000005a4e000000000000101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a
> 
> 
> Note how the destination address changes from `18:be:92:a0:ee:26` to `18:bd:92:a0:ee:26`.
> Where the lower 2 bits of the second octet are overwritten with the ECT `0b01`.
> 
> The source address is then changed from `18:b0:92:a0:6c:26` to `18:b0:92:a0:6c:27`, probably as an attempt to fix the checksum.
> In practice it’s always incremented by 1, even though I’m not sure it results in a correct checksum.
> 
> If the MAC address is changed on the interface to match the ECT, or the other ECT is used in the packet, nothing changes and the packet is received correctly.


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Possible GRE-TAP/ECT bug
  2021-07-21 20:18 ` Gilad Naaman
@ 2021-07-22 15:05   ` Gilad Naaman
  0 siblings, 0 replies; 3+ messages in thread
From: Gilad Naaman @ 2021-07-22 15:05 UTC (permalink / raw)
  To: netdev

I apologise, I have mistakingly supplied the wrong version.
The lowest kernel version we found this in is v5.3, but this is not a vanilla kernel, but an Ubuntu kernel with probable backports.
Specifically

$ uname -a
Linux ubuntu1910.localdomain 5.3.0-64-generic #58-Ubuntu SMP Fri Jul 10 19:33:51 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux


The lowest version from https://kernel.ubuntu.com/~kernel-ppa/mainline/ we found the bug in is v5.7.
i.e. v5.6 does not exhibit the bug, but v5.7 does.
(v5.6.X where X>0 were not checked)

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-07-22 15:11 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-21 17:36 Possible GRE-TAP/ECT bug Gilad Naaman
2021-07-21 20:18 ` Gilad Naaman
2021-07-22 15:05   ` Gilad Naaman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).