Netdev Archive on
help / color / mirror / Atom feed
From: Jamal Hadi Salim <>
To: Boris Sukholitko <>
Cc:, Jiri Pirko <>,
	Cong Wang <>,
	"David S . Miller" <>,
	Jakub Kicinski <>,
	Vladimir Oltean <>,
	Vadym Kochan <>,
	Ilya Lifshits <>,
	tom Herbert <>,
	Felipe Magno de Almeida <>,
	Pedro Tammela <>
Subject: Re: [PATCH net-next] net/sched: cls_flower: Add orig_ethtype
Date: Tue, 31 Aug 2021 09:18:16 -0400	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <20210831120440.GA4641@noodle>

On 2021-08-31 8:04 a.m., Boris Sukholitko wrote:
> Hi Jamal,
> On Mon, Aug 30, 2021 at 09:48:38PM -0400, Jamal Hadi Salim wrote:
>> On 2021-08-30 4:08 a.m., Boris Sukholitko wrote:
>>> The following flower filter fails to match packets:
>>> tc filter add dev eth0 ingress protocol 0x8864 flower \
>>>       action simple sdata hi64
>>> The protocol 0x8864 (ETH_P_PPP_SES) is a tunnel protocol. As such, it is
>>> being dissected by __skb_flow_dissect and it's internal protocol is
>>> being set as key->basic.n_proto. IOW, the existence of ETH_P_PPP_SES
>>> tunnel is transparent to the callers of __skb_flow_dissect.
>>> OTOH, in the filters above, cls_flower configures its key->basic.n_proto
>>> to the ETH_P_PPP_SES value configured by the user. Matching on this key
>>> fails because of __skb_flow_dissect "transparency" mentioned above.
>>> Therefore there is no way currently to match on such packets using
>>> flower.
>>> To fix the issue add new orig_ethtype key to the flower along with the
>>> necessary changes to the flow dissector etc.
>>> To filter the ETH_P_PPP_SES packets the command becomes:
>>> tc filter add dev eth0 ingress flower orig_ethtype 0x8864 \
>>>       action simple sdata hi64
>> Where's "protocol" on the above command line is. Probably a typo?
> There is no need for protocol there. We intend to match on the tunnel
> protocol existence only, disregarding its contents. Therefore
> orig_ethtype key is sufficient.

Hold on: The command line requires you specify the root (parse)
node with keyword "protocol". i.e something like:
tc filter add dev eth0 ingress protocol 0x8864 flower ...

You are suggesting a new syntax - which only serves this one
use case that solves your problem and is only needed for flower
but no other tc classifier.

>> The main culprit is clearly the flow dissector parsing. I am not sure
>> if in general flowdisc to deal with deeper hierarchies/tunnels
>> without constantly adding a lot more hacks. Imagine if you had an
>> ethernet packet with double vlan tags and encapsulating a pppoe packet
>> (i.e 3 or more layers of ethernet) - that would be a nightmare.
> Of course there is no limit to our imagination :) However I would argue
> that in the RealWorld(tm) the number of such nesting cases is
> neglectable.

Youd be very suprised on how some telcos use things like vlan tags
as "path metadata" and/or mpls labels and the layering involved in that.
i.e There is another world out there;-> More important: there is
nothing illegitimate in any standard about multi-layer tunnels as

> The evidence is that TC and flower are being actively used. Double VLAN
> tags configurations notwithstading. IMHO, the fact that I've been
> unlucky to hit this tunnel corner case does not mean that there is a
> design problem with the flower.

You have _not_ been unlucky - it is a design issue with flow dissector
and the wrapping around flower. Just waiting to happen for more
other use cases..

> AFAICS, the current meaning for the protocol field in TC is:
> match the innermost layer 2 type through the tunnels that the kernel
> knows about.

"protocol" describes where you start parsing and matching in
the header tree - for tc that starts with the outer ethertype.
The frame has to make sense.

> And it seems to me that this semantic is perfectly fine and does not
> require fixing. Maybe be we need to mention it in the docs...

Sorry, I disagree.

>> IMO, your approach is adding yet another bandaid.
> Could you please elaborate why do you see this approach as a bandaid?
> The patch in question adds another key to the other ~50 that exists in the
> flower currently. Two more similar keys will be done for single and
> double VLAN case. As a result, my users will gain the ability to match
> packets that are impossible to match right now.

You are changing the semantics of the tooling for your single
use case at the expense of the general syntax. No other tc classifier
needs that keyword and no other tunneling feature within flower
needs it. Note:
It was a mistake allowing the speacial casing of vlans to work
around the dissector's shortcomings. Lets not add more now
"fixed" by user space.
Hacks are already happening with flower in user space
(at some point someone removed differentiation of "protocol"
and flower's "ethertype" to accomodate some h/w offload concern).

> In difference with the TC protocol field, orig_ethtype answers the
> question:
> what is the original eth type of the packet, independent of the further
> kernel processing.

"protocol" points to the root of the parse/match tree i.e where
to start matching.
Tom (Cced) has a nice diagram somewhere which i cant find right now.
There are certainly issues with namespace overlap. If you have multiple
"ethertypes" encapsulated in a parse tree, giving them appropriate
names would help. Some of the flower syntax allows that nicely but
only to two levels..

> IMHO, the above definition is also quite exact and has the right to
> exist because we do not have such ability in the current kernel.
>> Would it make sense for the setting of the
>> skb_key.basic.n_proto  to be from tp->protocol for
>> your specific case in classify().
>> Which means your original setup:
>>   tc filter add dev eth0 ingress protocol 0x8864 flower \
>>       action simple sdata hi64
>> should continue to work if i am not mistaken. Vlans would
>> continue to be a speacial case.
>> I dont know what that would break though...
> I think right off the bat, it will break the following user
> configuration (untested!):
> tc filter add dev eth0 ingress protocol ipv4 flower \
>        action simple sdata hi64

I dont see how from the code inspection. tp->protocol compare
already happened by the time flower classify() is invoked.
i.e it is correct value specified in configuration.
Dont use your update to iproute2. Just the kernel change
is needed.
I am not worried about your use case - just other tunneling
scenarios; we are going to have to run all tdc testcases and
a few more to validate.

> Currently, the above rule matches ipv4 packets encapsulated in
> ETH_P_PPP_SES protocol. The special casing will break it.

Can you describe the rules?
I think you need something like:
tc filter add dev eth0 ingress protocol 0x8864 flower \
ppp_proto ip \
dst_ip ...\
src_ip ....

This will require you introduce new attributes for ppp encaped
inside pppoe; i dont think there will be namespace collision in
the case of ip src/dst because it will be tied to ppp_proto


  reply	other threads:[~2021-08-31 13:19 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-30  8:08 Boris Sukholitko
2021-08-30  9:00 ` Vladimir Oltean
2021-08-30  9:04   ` Vladimir Oltean
2021-08-30  9:18   ` Boris Sukholitko
2021-08-30  9:21     ` Vladimir Oltean
2021-08-30  9:42       ` Boris Sukholitko
2021-08-30 10:13         ` Vladimir Oltean
2021-08-31  1:48 ` Jamal Hadi Salim
2021-08-31 12:04   ` Boris Sukholitko
2021-08-31 13:18     ` Jamal Hadi Salim [this message]
2021-08-31 14:03       ` Boris Sukholitko
2021-09-02  6:48       ` Ido Schimmel
2021-09-03 22:52         ` Jamal Hadi Salim
2021-09-04 14:08       ` Tom Herbert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
    --subject='Re: [PATCH net-next] net/sched: cls_flower: Add orig_ethtype' \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).