Netdev Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
@ 2021-08-09 7:04 Hangbin Liu
2021-08-09 8:35 ` Roi Dayan
2021-08-09 10:00 ` patchwork-bot+netdevbpf
0 siblings, 2 replies; 10+ messages in thread
From: Hangbin Liu @ 2021-08-09 7:04 UTC (permalink / raw)
To: netdev
Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller,
Jakub Kicinski, Marcelo Ricardo Leitner, Alaa Hleihel,
Davide Caratti, Aaron Conole, Hangbin Liu, Roi Dayan
When mirror/redirect a skb to a different port, the ct info should be reset
for reclassification. Or the pkts will match unexpected rules. For example,
with following topology and commands:
-----------
|
veth0 -+-------
|
veth1 -+-------
|
------------
tc qdisc add dev veth0 clsact
# The same with "action mirred egress mirror dev veth1" or "action mirred ingress redirect dev veth1"
tc filter add dev veth0 egress chain 1 protocol ip flower ct_state +trk action mirred ingress mirror dev veth1
tc filter add dev veth0 egress chain 0 protocol ip flower ct_state -inv action ct commit action goto chain 1
tc qdisc add dev veth1 clsact
tc filter add dev veth1 ingress chain 0 protocol ip flower ct_state +trk action drop
ping <remove ip via veth0> &
tc -s filter show dev veth1 ingress
With command 'tc -s filter show', we can find the pkts were dropped on
veth1.
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
net/sched/act_mirred.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 7153c67f641e..2ef4cd2c848b 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -273,6 +273,9 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
goto out;
}
+ /* All mirred/redirected skbs should clear previous ct info */
+ nf_reset_ct(skb2);
+
want_ingress = tcf_mirred_act_wants_ingress(m_eaction);
expects_nh = want_ingress || !m_mac_header_xmit;
--
2.31.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
2021-08-09 7:04 [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb Hangbin Liu
@ 2021-08-09 8:35 ` Roi Dayan
2021-08-09 10:00 ` patchwork-bot+netdevbpf
1 sibling, 0 replies; 10+ messages in thread
From: Roi Dayan @ 2021-08-09 8:35 UTC (permalink / raw)
To: Hangbin Liu, netdev
Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller,
Jakub Kicinski, Marcelo Ricardo Leitner, Alaa Hleihel,
Davide Caratti, Aaron Conole
On 2021-08-09 10:04 AM, Hangbin Liu wrote:
> When mirror/redirect a skb to a different port, the ct info should be reset
> for reclassification. Or the pkts will match unexpected rules. For example,
> with following topology and commands:
>
> -----------
> |
> veth0 -+-------
> |
> veth1 -+-------
> |
> ------------
>
> tc qdisc add dev veth0 clsact
> # The same with "action mirred egress mirror dev veth1" or "action mirred ingress redirect dev veth1"
> tc filter add dev veth0 egress chain 1 protocol ip flower ct_state +trk action mirred ingress mirror dev veth1
> tc filter add dev veth0 egress chain 0 protocol ip flower ct_state -inv action ct commit action goto chain 1
> tc qdisc add dev veth1 clsact
> tc filter add dev veth1 ingress chain 0 protocol ip flower ct_state +trk action drop
>
> ping <remove ip via veth0> &
> tc -s filter show dev veth1 ingress
>
> With command 'tc -s filter show', we can find the pkts were dropped on
> veth1.
>
We can add a fixes line on the commit added ct support
Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")
> Signed-off-by: Roi Dayan <roid@nvidia.com>
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> ---
> net/sched/act_mirred.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> index 7153c67f641e..2ef4cd2c848b 100644
> --- a/net/sched/act_mirred.c
> +++ b/net/sched/act_mirred.c
> @@ -273,6 +273,9 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
> goto out;
> }
>
> + /* All mirred/redirected skbs should clear previous ct info */
> + nf_reset_ct(skb2);
> +
> want_ingress = tcf_mirred_act_wants_ingress(m_eaction);
>
> expects_nh = want_ingress || !m_mac_header_xmit;
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
2021-08-09 7:04 [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb Hangbin Liu
2021-08-09 8:35 ` Roi Dayan
@ 2021-08-09 10:00 ` patchwork-bot+netdevbpf
2022-04-19 16:50 ` Eyal Birger
1 sibling, 1 reply; 10+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-08-09 10:00 UTC (permalink / raw)
To: Hangbin Liu
Cc: netdev, jhs, xiyou.wangcong, jiri, davem, kuba, mleitner,
ahleihel, dcaratti, aconole, roid
Hello:
This patch was applied to netdev/net.git (refs/heads/master):
On Mon, 9 Aug 2021 15:04:55 +0800 you wrote:
> When mirror/redirect a skb to a different port, the ct info should be reset
> for reclassification. Or the pkts will match unexpected rules. For example,
> with following topology and commands:
>
> -----------
> |
> veth0 -+-------
> |
> veth1 -+-------
> |
>
> [...]
Here is the summary with links:
- [net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
https://git.kernel.org/netdev/net/c/d09c548dbf3b
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
2021-08-09 10:00 ` patchwork-bot+netdevbpf
@ 2022-04-19 16:50 ` Eyal Birger
2022-04-19 17:26 ` Marcelo Ricardo Leitner
0 siblings, 1 reply; 10+ messages in thread
From: Eyal Birger @ 2022-04-19 16:50 UTC (permalink / raw)
To: Hangbin Liu
Cc: netdev, jhs, xiyou.wangcong, jiri, davem, kuba, mleitner,
ahleihel, dcaratti, aconole, roid, Shmulik Ladkani
Hi,
On Mon, Aug 9, 2021 at 1:29 PM <patchwork-bot+netdevbpf@kernel.org> wrote:
>
> Hello:
>
> This patch was applied to netdev/net.git (refs/heads/master):
>
> On Mon, 9 Aug 2021 15:04:55 +0800 you wrote:
> > When mirror/redirect a skb to a different port, the ct info should be reset
> > for reclassification. Or the pkts will match unexpected rules. For example,
> > with following topology and commands:
> >
> > -----------
> > |
> > veth0 -+-------
> > |
> > veth1 -+-------
> > |
> >
> > [...]
>
> Here is the summary with links:
> - [net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
> https://git.kernel.org/netdev/net/c/d09c548dbf3b
Unfortunately this commit breaks DNAT when performed before going via mirred
egress->ingress.
The reason is that connection tracking is lost and therefore a new state
is created on ingress.
This breaks existing setups.
See below a simplified script reproducing this issue.
Therefore I suggest this commit be reverted and a knob is introduced to mirred
for clearing ct as needed.
Eyal.
Reproduction script:
#!/bin/bash
ip netns add a
ip netns add b
ip netns exec a sysctl -w net.ipv4.conf.all.forwarding=1
ip netns exec a sysctl -w net.ipv4.conf.all.accept_local=1
ip link add veth0 netns a type veth peer name veth0 netns b
ip -net a link set veth0 up
ip -net a addr add dev veth0 198.51.100.1/30
ip -net a link add dum0 type dummy
ip -net a link set dev dum0 up
ip -net a addr add dev dum0 198.51.100.2/32
ip netns exec a iptables -t nat -I OUTPUT -d 10.0.0.1 -j DNAT
--to-destination 10.0.0.2
ip -net a route add default dev dum0
ip -net a rule add pref 50 iif dum0 lookup 1000
ip -net a route add table 1000 default dev veth0
ip netns exec a tc qdisc add dev dum0 clsact
ip netns exec a tc filter add dev dum0 parent ffff:fff3 prio 50 basic
action mirred ingress redirect dev dum0
ip -net b link set veth0 up
ip -net b addr add 10.0.0.2/32 dev veth0
ip -net b addr add 198.51.100.3/30 dev veth0
ip netns exec a ping 10.0.0.1
>
> You are awesome, thank you!
> --
> Deet-doot-dot, I am a bot.
> https://korg.docs.kernel.org/patchwork/pwbot.html
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
2022-04-19 16:50 ` Eyal Birger
@ 2022-04-19 17:26 ` Marcelo Ricardo Leitner
2022-04-19 18:14 ` Eyal Birger
0 siblings, 1 reply; 10+ messages in thread
From: Marcelo Ricardo Leitner @ 2022-04-19 17:26 UTC (permalink / raw)
To: Eyal Birger
Cc: Hangbin Liu, netdev, jhs, xiyou.wangcong, jiri, davem, kuba,
ahleihel, dcaratti, aconole, roid, Shmulik Ladkani
Hi,
On Tue, Apr 19, 2022 at 07:50:38PM +0300, Eyal Birger wrote:
> Hi,
>
> On Mon, Aug 9, 2021 at 1:29 PM <patchwork-bot+netdevbpf@kernel.org> wrote:
> >
> > Hello:
> >
> > This patch was applied to netdev/net.git (refs/heads/master):
> >
> > On Mon, 9 Aug 2021 15:04:55 +0800 you wrote:
> > > When mirror/redirect a skb to a different port, the ct info should be reset
> > > for reclassification. Or the pkts will match unexpected rules. For example,
> > > with following topology and commands:
> > >
> > > -----------
> > > |
> > > veth0 -+-------
> > > |
> > > veth1 -+-------
> > > |
> > >
> > > [...]
> >
> > Here is the summary with links:
> > - [net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
> > https://git.kernel.org/netdev/net/c/d09c548dbf3b
>
> Unfortunately this commit breaks DNAT when performed before going via mirred
> egress->ingress.
>
> The reason is that connection tracking is lost and therefore a new state
> is created on ingress.
>
> This breaks existing setups.
>
> See below a simplified script reproducing this issue.
I guess I can understand why the reproducer triggers it, but I fail to
see the actual use case you have behind it. Can you please elaborate
on it?
>
> Therefore I suggest this commit be reverted and a knob is introduced to mirred
> for clearing ct as needed.
>
> Eyal.
>
> Reproduction script:
>
> #!/bin/bash
>
> ip netns add a
> ip netns add b
>
> ip netns exec a sysctl -w net.ipv4.conf.all.forwarding=1
> ip netns exec a sysctl -w net.ipv4.conf.all.accept_local=1
>
> ip link add veth0 netns a type veth peer name veth0 netns b
> ip -net a link set veth0 up
> ip -net a addr add dev veth0 198.51.100.1/30
>
> ip -net a link add dum0 type dummy
> ip -net a link set dev dum0 up
> ip -net a addr add dev dum0 198.51.100.2/32
>
> ip netns exec a iptables -t nat -I OUTPUT -d 10.0.0.1 -j DNAT
> --to-destination 10.0.0.2
> ip -net a route add default dev dum0
> ip -net a rule add pref 50 iif dum0 lookup 1000
> ip -net a route add table 1000 default dev veth0
>
> ip netns exec a tc qdisc add dev dum0 clsact
> ip netns exec a tc filter add dev dum0 parent ffff:fff3 prio 50 basic
> action mirred ingress redirect dev dum0
>
> ip -net b link set veth0 up
> ip -net b addr add 10.0.0.2/32 dev veth0
> ip -net b addr add 198.51.100.3/30 dev veth0
>
> ip netns exec a ping 10.0.0.1
> >
> > You are awesome, thank you!
> > --
> > Deet-doot-dot, I am a bot.
> > https://korg.docs.kernel.org/patchwork/pwbot.html
> >
> >
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
2022-04-19 17:26 ` Marcelo Ricardo Leitner
@ 2022-04-19 18:14 ` Eyal Birger
2022-04-20 10:07 ` Hangbin Liu
2022-04-21 11:00 ` Hangbin Liu
0 siblings, 2 replies; 10+ messages in thread
From: Eyal Birger @ 2022-04-19 18:14 UTC (permalink / raw)
To: Marcelo Ricardo Leitner
Cc: Hangbin Liu, netdev, jhs, xiyou.wangcong, jiri, davem, kuba,
ahleihel, dcaratti, aconole, roid, Shmulik Ladkani
Hi,
On Tue, Apr 19, 2022 at 8:26 PM Marcelo Ricardo Leitner
<mleitner@redhat.com> wrote:
>
> Hi,
>
> On Tue, Apr 19, 2022 at 07:50:38PM +0300, Eyal Birger wrote:
> > Hi,
> >
> > On Mon, Aug 9, 2021 at 1:29 PM <patchwork-bot+netdevbpf@kernel.org> wrote:
> > >
> > > Hello:
> > >
> > > This patch was applied to netdev/net.git (refs/heads/master):
> > >
> > > On Mon, 9 Aug 2021 15:04:55 +0800 you wrote:
> > > > When mirror/redirect a skb to a different port, the ct info should be reset
> > > > for reclassification. Or the pkts will match unexpected rules. For example,
> > > > with following topology and commands:
> > > >
> > > > -----------
> > > > |
> > > > veth0 -+-------
> > > > |
> > > > veth1 -+-------
> > > > |
> > > >
> > > > [...]
> > >
> > > Here is the summary with links:
> > > - [net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
> > > https://git.kernel.org/netdev/net/c/d09c548dbf3b
> >
> > Unfortunately this commit breaks DNAT when performed before going via mirred
> > egress->ingress.
> >
> > The reason is that connection tracking is lost and therefore a new state
> > is created on ingress.
> >
> > This breaks existing setups.
> >
> > See below a simplified script reproducing this issue.
>
> I guess I can understand why the reproducer triggers it, but I fail to
> see the actual use case you have behind it. Can you please elaborate
> on it?
One use case we use mirred egress->ingress redirect for is when we want to
reroute a packet after applying some change to the packet which would affect
its routing. for example consider a bpf program running on tc ingress (after
mirred) setting the skb->mark based on some criteria.
So you have something like:
packet routed to dummy device based on some criteria ->
mirred redirect to ingress ->
classification by ebpf logic at tc ingress ->
packet routed again
We have a setup where DNAT is performed before this flow in that case the
ebpf logic needs to see the packet after the NAT.
Eyal.
>
> >
> > Therefore I suggest this commit be reverted and a knob is introduced to mirred
> > for clearing ct as needed.
> >
> > Eyal.
> >
> > Reproduction script:
> >
> > #!/bin/bash
> >
> > ip netns add a
> > ip netns add b
> >
> > ip netns exec a sysctl -w net.ipv4.conf.all.forwarding=1
> > ip netns exec a sysctl -w net.ipv4.conf.all.accept_local=1
> >
> > ip link add veth0 netns a type veth peer name veth0 netns b
> > ip -net a link set veth0 up
> > ip -net a addr add dev veth0 198.51.100.1/30
> >
> > ip -net a link add dum0 type dummy
> > ip -net a link set dev dum0 up
> > ip -net a addr add dev dum0 198.51.100.2/32
> >
> > ip netns exec a iptables -t nat -I OUTPUT -d 10.0.0.1 -j DNAT
> > --to-destination 10.0.0.2
> > ip -net a route add default dev dum0
> > ip -net a rule add pref 50 iif dum0 lookup 1000
> > ip -net a route add table 1000 default dev veth0
> >
> > ip netns exec a tc qdisc add dev dum0 clsact
> > ip netns exec a tc filter add dev dum0 parent ffff:fff3 prio 50 basic
> > action mirred ingress redirect dev dum0
> >
> > ip -net b link set veth0 up
> > ip -net b addr add 10.0.0.2/32 dev veth0
> > ip -net b addr add 198.51.100.3/30 dev veth0
> >
> > ip netns exec a ping 10.0.0.1
> > >
> > > You are awesome, thank you!
> > > --
> > > Deet-doot-dot, I am a bot.
> > > https://korg.docs.kernel.org/patchwork/pwbot.html
> > >
> > >
> >
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
2022-04-19 18:14 ` Eyal Birger
@ 2022-04-20 10:07 ` Hangbin Liu
2022-04-21 11:00 ` Hangbin Liu
1 sibling, 0 replies; 10+ messages in thread
From: Hangbin Liu @ 2022-04-20 10:07 UTC (permalink / raw)
To: Marcelo Ricardo Leitner
Cc: Eyal Birger, netdev, jhs, xiyou.wangcong, jiri, davem, kuba,
ahleihel, dcaratti, aconole, roid, Shmulik Ladkani
On Tue, Apr 19, 2022 at 09:14:38PM +0300, Eyal Birger wrote:
> >
> > I guess I can understand why the reproducer triggers it, but I fail to
> > see the actual use case you have behind it. Can you please elaborate
> > on it?
>
> One use case we use mirred egress->ingress redirect for is when we want to
> reroute a packet after applying some change to the packet which would affect
> its routing. for example consider a bpf program running on tc ingress (after
> mirred) setting the skb->mark based on some criteria.
>
> So you have something like:
>
> packet routed to dummy device based on some criteria ->
> mirred redirect to ingress ->
> classification by ebpf logic at tc ingress ->
> packet routed again
>
> We have a setup where DNAT is performed before this flow in that case the
> ebpf logic needs to see the packet after the NAT.
Hi Marcelo,
Thanks for taking care of this. Would you help following up this issue as
you are more familiar with net sched?
Thanks
Hangbin
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
2022-04-19 18:14 ` Eyal Birger
2022-04-20 10:07 ` Hangbin Liu
@ 2022-04-21 11:00 ` Hangbin Liu
2022-04-22 13:41 ` Marcelo Ricardo Leitner
1 sibling, 1 reply; 10+ messages in thread
From: Hangbin Liu @ 2022-04-21 11:00 UTC (permalink / raw)
To: Eyal Birger
Cc: Marcelo Ricardo Leitner, netdev, jhs, xiyou.wangcong, jiri,
davem, kuba, ahleihel, dcaratti, aconole, roid, Shmulik Ladkani
Hi Eyal,
On Tue, Apr 19, 2022 at 09:14:38PM +0300, Eyal Birger wrote:
> > > > On Mon, 9 Aug 2021 15:04:55 +0800 you wrote:
> > > > > When mirror/redirect a skb to a different port, the ct info should be reset
> > > > > for reclassification. Or the pkts will match unexpected rules. For example,
> > > > > with following topology and commands:
> > > > >
> > > > > -----------
> > > > > |
> > > > > veth0 -+-------
> > > > > |
> > > > > veth1 -+-------
> > > > > |
> > > > >
> > > > > [...]
> > > >
> > > > Here is the summary with links:
> > > > - [net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
> > > > https://git.kernel.org/netdev/net/c/d09c548dbf3b
> > >
> > > Unfortunately this commit breaks DNAT when performed before going via mirred
> > > egress->ingress.
> > >
> > > The reason is that connection tracking is lost and therefore a new state
> > > is created on ingress.
> > >
> > > This breaks existing setups.
> > >
> > > See below a simplified script reproducing this issue.
I think we come in to a paradox state. Some user don't want to have previous
ct info after mirror, while others would like to keep. In my understanding,
when we receive a pkt from a interface, the skb should be clean and no ct info
at first. But I may wrong.
Jamal, Wang Cong, Jiri, do you have any comments?
> >
> > I guess I can understand why the reproducer triggers it, but I fail to
> > see the actual use case you have behind it. Can you please elaborate
> > on it?
>
> One use case we use mirred egress->ingress redirect for is when we want to
> reroute a packet after applying some change to the packet which would affect
> its routing. for example consider a bpf program running on tc ingress (after
> mirred) setting the skb->mark based on some criteria.
>
> So you have something like:
>
> packet routed to dummy device based on some criteria ->
> mirred redirect to ingress ->
> classification by ebpf logic at tc ingress ->
> packet routed again
>
> We have a setup where DNAT is performed before this flow in that case the
> ebpf logic needs to see the packet after the NAT.
Is it possible to check whether it's need to set the skb->mark before DNAT?
So we can update it before egress and no need to re-route.
Thanks
Hangbin
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
2022-04-21 11:00 ` Hangbin Liu
@ 2022-04-22 13:41 ` Marcelo Ricardo Leitner
2022-04-23 16:08 ` Eyal Birger
0 siblings, 1 reply; 10+ messages in thread
From: Marcelo Ricardo Leitner @ 2022-04-22 13:41 UTC (permalink / raw)
To: Hangbin Liu
Cc: Eyal Birger, netdev, jhs, xiyou.wangcong, jiri, davem, kuba,
ahleihel, dcaratti, aconole, roid, Shmulik Ladkani
On Thu, Apr 21, 2022 at 07:00:07PM +0800, Hangbin Liu wrote:
> Hi Eyal,
> On Tue, Apr 19, 2022 at 09:14:38PM +0300, Eyal Birger wrote:
> > > > > On Mon, 9 Aug 2021 15:04:55 +0800 you wrote:
> > > > > > When mirror/redirect a skb to a different port, the ct info should be reset
> > > > > > for reclassification. Or the pkts will match unexpected rules. For example,
> > > > > > with following topology and commands:
> > > > > >
> > > > > > -----------
> > > > > > |
> > > > > > veth0 -+-------
> > > > > > |
> > > > > > veth1 -+-------
> > > > > > |
> > > > > >
> > > > > > [...]
> > > > >
> > > > > Here is the summary with links:
> > > > > - [net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
> > > > > https://git.kernel.org/netdev/net/c/d09c548dbf3b
> > > >
> > > > Unfortunately this commit breaks DNAT when performed before going via mirred
> > > > egress->ingress.
> > > >
> > > > The reason is that connection tracking is lost and therefore a new state
> > > > is created on ingress.
> > > >
> > > > This breaks existing setups.
> > > >
> > > > See below a simplified script reproducing this issue.
>
> I think we come in to a paradox state. Some user don't want to have previous
> ct info after mirror, while others would like to keep. In my understanding,
> when we receive a pkt from a interface, the skb should be clean and no ct info
> at first. But I may wrong.
Makes sense to me. Moreover, there were a couple of fixes on this on
mirred around that time frame/area (like f799ada6bf23 ("net: sched:
act_mirred: drop dst for the direction from egress to ingress")). That's
because we are seeing that mirred xmit action when switching to
ingress direction should be as close skb_scrub_packet. OVS needs this
scrubbing as well, btw. This ct information could be easily stale if
there were other packet changes after it.
Point being, if we really need the knob for backwards compatibility
here, it may have to be a broader one.
>
> Jamal, Wang Cong, Jiri, do you have any comments?
>
> > >
> > > I guess I can understand why the reproducer triggers it, but I fail to
> > > see the actual use case you have behind it. Can you please elaborate
> > > on it?
> >
> > One use case we use mirred egress->ingress redirect for is when we want to
> > reroute a packet after applying some change to the packet which would affect
> > its routing. for example consider a bpf program running on tc ingress (after
> > mirred) setting the skb->mark based on some criteria.
> >
> > So you have something like:
> >
> > packet routed to dummy device based on some criteria ->
> > mirred redirect to ingress ->
> > classification by ebpf logic at tc ingress ->
> > packet routed again
> >
> > We have a setup where DNAT is performed before this flow in that case the
> > ebpf logic needs to see the packet after the NAT.
>
> Is it possible to check whether it's need to set the skb->mark before DNAT?
> So we can update it before egress and no need to re-route.
>
> Thanks
> Hangbin
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
2022-04-22 13:41 ` Marcelo Ricardo Leitner
@ 2022-04-23 16:08 ` Eyal Birger
0 siblings, 0 replies; 10+ messages in thread
From: Eyal Birger @ 2022-04-23 16:08 UTC (permalink / raw)
To: Marcelo Ricardo Leitner
Cc: Hangbin Liu, netdev, jhs, xiyou.wangcong, jiri, davem, kuba,
ahleihel, dcaratti, aconole, roid, Shmulik Ladkani
On Fri, Apr 22, 2022 at 4:41 PM Marcelo Ricardo Leitner
<mleitner@redhat.com> wrote:
>
> On Thu, Apr 21, 2022 at 07:00:07PM +0800, Hangbin Liu wrote:
> > Hi Eyal,
> > On Tue, Apr 19, 2022 at 09:14:38PM +0300, Eyal Birger wrote:
> > > > > > On Mon, 9 Aug 2021 15:04:55 +0800 you wrote:
> > > > > > > When mirror/redirect a skb to a different port, the ct info should be reset
> > > > > > > for reclassification. Or the pkts will match unexpected rules. For example,
> > > > > > > with following topology and commands:
> > > > > > >
> > > > > > > -----------
> > > > > > > |
> > > > > > > veth0 -+-------
> > > > > > > |
> > > > > > > veth1 -+-------
> > > > > > > |
> > > > > > >
> > > > > > > [...]
> > > > > >
> > > > > > Here is the summary with links:
> > > > > > - [net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
> > > > > > https://git.kernel.org/netdev/net/c/d09c548dbf3b
> > > > >
> > > > > Unfortunately this commit breaks DNAT when performed before going via mirred
> > > > > egress->ingress.
> > > > >
> > > > > The reason is that connection tracking is lost and therefore a new state
> > > > > is created on ingress.
> > > > >
> > > > > This breaks existing setups.
> > > > >
> > > > > See below a simplified script reproducing this issue.
> >
> > I think we come in to a paradox state. Some user don't want to have previous
> > ct info after mirror, while others would like to keep. In my understanding,
> > when we receive a pkt from a interface, the skb should be clean and no ct info
> > at first. But I may wrong.
>
> Makes sense to me. Moreover, there were a couple of fixes on this on
> mirred around that time frame/area (like f799ada6bf23 ("net: sched:
> act_mirred: drop dst for the direction from egress to ingress")). That's
> because we are seeing that mirred xmit action when switching to
> ingress direction should be as close skb_scrub_packet. OVS needs this
> scrubbing as well, btw. This ct information could be easily stale if
> there were other packet changes after it.
Makes sense to me too. The main reason for bringing this up was that it's a
subtle change and wasn't trivial to figure out.
>
> Point being, if we really need the knob for backwards compatibility
> here, it may have to be a broader one.
FWIW the dst change was ok in our setups.
>
> >
> > Jamal, Wang Cong, Jiri, do you have any comments?
> >
> > > >
> > > > I guess I can understand why the reproducer triggers it, but I fail to
> > > > see the actual use case you have behind it. Can you please elaborate
> > > > on it?
> > >
> > > One use case we use mirred egress->ingress redirect for is when we want to
> > > reroute a packet after applying some change to the packet which would affect
> > > its routing. for example consider a bpf program running on tc ingress (after
> > > mirred) setting the skb->mark based on some criteria.
> > >
> > > So you have something like:
> > >
> > > packet routed to dummy device based on some criteria ->
> > > mirred redirect to ingress ->
> > > classification by ebpf logic at tc ingress ->
> > > packet routed again
> > >
> > > We have a setup where DNAT is performed before this flow in that case the
> > > ebpf logic needs to see the packet after the NAT.
> >
> > Is it possible to check whether it's need to set the skb->mark before DNAT?
> > So we can update it before egress and no need to re-route.
For future reference, we worked around this issue by moving some of the
relevant ebpf functionality to the lwt output hook which allows classification
and rerouting.
Eyal.
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2022-04-23 16:08 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-09 7:04 [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb Hangbin Liu
2021-08-09 8:35 ` Roi Dayan
2021-08-09 10:00 ` patchwork-bot+netdevbpf
2022-04-19 16:50 ` Eyal Birger
2022-04-19 17:26 ` Marcelo Ricardo Leitner
2022-04-19 18:14 ` Eyal Birger
2022-04-20 10:07 ` Hangbin Liu
2022-04-21 11:00 ` Hangbin Liu
2022-04-22 13:41 ` Marcelo Ricardo Leitner
2022-04-23 16:08 ` Eyal Birger
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).