LKML Archive on lore.kernel.org
* [PATCH net-next] net: sched: Introduce act_ctinfo action
@ 2019-04-27 13:08 Kevin 'ldir' Darbyshire-Bryant
  2019-04-30 21:47 ` Cong Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-04-27 13:08 UTC
  To: Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller,
	Shuah Khan, linux-kernel, netdev, linux-kselftest
  Cc: Kevin 'ldir' Darbyshire-Bryant

ctinfo is a new tc filter action module.  It is designed to restore DSCPs
stored in conntrack marks into the ipv4/v6 diffserv field.

The feature is intended for, and has been found useful for, restoring
ingress classifications based on egress classifications across links
that bleach or otherwise change DSCP, typically home ISP Internet links.
Restoring DSCP on ingress on the WAN link allows qdiscs such as CAKE to
shape inbound packets according to policies that are easier to indicate
on egress.

Ingress classification is traditionally a challenging task since
iptables rules have not yet run and tc filter/eBPF programs see packets
before NAT, so they are unable to see the internal IPv4 addresses used
on a typical home masquerading gateway.

ctinfo understands the following parameters:

dscp dscpmask[/statemask]

dscpmask - a 32-bit mask of at least 6 contiguous bits, indicating
where ctinfo will find the DSCP bits stored in the conntrack mark.

statemask - a 32-bit mask, (usually) 1 bit long, outside the area
specified by dscpmask.  This represents a conditional operation flag
whereby the DSCP is only restored if the flag is set.  This is useful to
implement a 'one shot' iptables based classification where the
'complicated' iptables rules are only run once to classify the
connection on the initial (egress) packet and subsequent packets are all
marked/restored with the same DSCP.  A mask of zero disables the
conditional behaviour, i.e. the conntrack mark DSCP bits are always
restored to the IP diffserv field (assuming the conntrack entry is found
and the skb is of IPv4/IPv6 type).
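As a rough illustration (not part of the patch), the mask checks that
tcf_ctinfo_init() performs in the patch below can be sketched in
userspace C.  The helper names lowest_set_bit() and ctinfo_masks_valid()
are hypothetical, and the GCC/Clang builtin __builtin_ctz() stands in for
the kernel's __ffs().  A mask wider than six bits is accepted, but only
the six bits starting at its lowest set bit are used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's __ffs(): index of the
 * lowest set bit.  __builtin_ctz() is undefined for 0, hence the assert. */
static unsigned int lowest_set_bit(uint32_t x)
{
	assert(x != 0);
	return (unsigned int)__builtin_ctz(x);
}

/* Approximates the validation in tcf_ctinfo_init(): the six bits starting
 * at the lowest set bit of dscpmask must all be set, and statemask must
 * not overlap dscpmask.  Returns true if the pair would be accepted. */
static bool ctinfo_masks_valid(uint32_t dscpmask, uint32_t statemask)
{
	unsigned int shift;

	if (dscpmask == 0)
		return false;	/* the patch rejects a zero mask as well */

	shift = lowest_set_bit(dscpmask);
	if (((dscpmask >> shift) & 0x3f) != 0x3f)
		return false;	/* fewer than 6 contiguous bits */

	if (dscpmask & statemask)
		return false;	/* flag bit(s) overlap the DSCP field */

	return true;
}
```

For example, 0xfc000000/0x01000000 passes, 0x7c000000 (only five bits)
is rejected, and 0xfc000000/0x04000000 is rejected because the masks
overlap.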

optional parameters:

zone - conntrack zone

control - action related control (reclassify | pipe | drop | continue |
ok | goto chain <CHAIN_INDEX>)

e.g. dscp 0xfc000000/0x01000000

|----0xFC----conntrack mark----000000---|
| Bits 31-26 | bit 25 | bit24 |~~~ Bit 0|
| DSCP       | unused | flag  |unused   |
|-----------------------0x01---000000---|
      |                   |
      |                   |
      ---|             Conditional flag
         v             only restore if set
|-ip diffserv-|
| 6 bits      |
|-------------|
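The arithmetic in the diagram can be sketched in userspace C.  This is an
illustrative approximation of tcf_ctinfo_dscp_set() and the statemask
test in tcf_ctinfo_act(), not the patch code itself: ctinfo_restore_tos()
and ECN_MASK are hypothetical names, and __builtin_ctz() stands in for
__ffs().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ECN_MASK 0x03u	/* low two bits of the diffserv byte (INET_ECN_MASK) */

/* Returns true and stores the rewritten diffserv byte in *tos_out if the
 * DSCP would be restored for this conntrack mark, false otherwise.
 * Assumes dscpmask was already validated (non-zero, >= 6 contiguous bits). */
static bool ctinfo_restore_tos(uint32_t ctmark, uint32_t dscpmask,
			       uint32_t statemask, uint8_t tos,
			       uint8_t *tos_out)
{
	unsigned int shift;
	uint8_t newdscp;

	assert(dscpmask != 0);

	/* statemask == 0 means "always restore" */
	if (statemask && !(ctmark & statemask))
		return false;

	shift = (unsigned int)__builtin_ctz(dscpmask);	/* stand-in for __ffs() */

	/* Isolate the field, right-align it, move it to bits 7-2, truncate
	 * to 8 bits and clear the ECN bits, as the patch does. */
	newdscp = (uint8_t)((((ctmark & dscpmask) >> shift) << 2) & ~ECN_MASK);

	/* Keep the packet's ECN bits and replace the DSCP bits, mirroring
	 * ipv4_change_dsfield(iph, INET_ECN_MASK, newdscp). */
	*tos_out = (uint8_t)((tos & ECN_MASK) | newdscp);
	return true;
}
```

With the example masks, a conntrack mark of 0xb9000000 (DSCP 46, EF, in
bits 31-26 plus the flag in bit 24) turns a diffserv byte of 0x01 into
0xb9, preserving the ECN bits; the same mark without the flag
(0xb8000000) leaves the packet untouched.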

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
---
 include/net/tc_act/tc_ctinfo.h            |  24 ++
 include/uapi/linux/pkt_cls.h              |   1 +
 include/uapi/linux/tc_act/tc_ctinfo.h     |  33 ++
 net/sched/Kconfig                         |  13 +
 net/sched/Makefile                        |   1 +
 net/sched/act_ctinfo.c                    | 375 ++++++++++++++++++++++
 tools/testing/selftests/tc-testing/config |   1 +
 7 files changed, 448 insertions(+)
 create mode 100644 include/net/tc_act/tc_ctinfo.h
 create mode 100644 include/uapi/linux/tc_act/tc_ctinfo.h
 create mode 100644 net/sched/act_ctinfo.c

diff --git a/include/net/tc_act/tc_ctinfo.h b/include/net/tc_act/tc_ctinfo.h
new file mode 100644
index 000000000000..bb33e66d3ea5
--- /dev/null
+++ b/include/net/tc_act/tc_ctinfo.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __NET_TC_CTINFO_H
+#define __NET_TC_CTINFO_H
+
+#include <net/act_api.h>
+
+struct tcf_ctinfo_params {
+	struct net *net;
+	u32 dscpmask;
+	u32 dscpstatemask;
+	u16 zone;
+	u8 mode;
+	u8 dscpmaskshift;
+	struct rcu_head rcu;
+};
+
+struct tcf_ctinfo {
+	struct tc_action common;
+	struct tcf_ctinfo_params __rcu *params;
+};
+
+#define to_ctinfo(a) ((struct tcf_ctinfo *)a)
+
+#endif /* __NET_TC_CTINFO_H */
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 51a0496f78ea..a93680fc4bfa 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -105,6 +105,7 @@ enum tca_id {
 	TCA_ID_IFE = TCA_ACT_IFE,
 	TCA_ID_SAMPLE = TCA_ACT_SAMPLE,
 	/* other actions go here */
+	TCA_ID_CTINFO,
 	__TCA_ID_MAX = 255
 };
 
diff --git a/include/uapi/linux/tc_act/tc_ctinfo.h b/include/uapi/linux/tc_act/tc_ctinfo.h
new file mode 100644
index 000000000000..b84902b5e3b1
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_ctinfo.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef __UAPI_TC_CTINFO_H
+#define __UAPI_TC_CTINFO_H
+
+#include <linux/types.h>
+#include <linux/pkt_cls.h>
+
+struct tc_ctinfo {
+	tc_gen;
+};
+
+struct tc_ctinfo_dscp {
+	__u32 mask;
+	__u32 statemask;
+};
+
+enum {
+	TCA_CTINFO_UNSPEC,
+	TCA_CTINFO_ACT,
+	TCA_CTINFO_ZONE,
+	TCA_CTINFO_DSCP_PARMS,
+	TCA_CTINFO_MODE_DSCP,
+	TCA_CTINFO_TM,
+	TCA_CTINFO_PAD,
+	__TCA_CTINFO_MAX
+};
+#define TCA_CTINFO_MAX (__TCA_CTINFO_MAX - 1)
+
+enum {
+	CTINFO_MODE_SETDSCP	= (1 << 0)
+};
+
+#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 5c02ad97ef23..5ac01c5ebae9 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -876,6 +876,19 @@ config NET_ACT_CONNMARK
 	  To compile this code as a module, choose M here: the
 	  module will be called act_connmark.
 
+config NET_ACT_CTINFO
+        tristate "Netfilter Connmark to DSCP Retriever"
+        depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
+        depends on NF_CONNTRACK && NF_CONNTRACK_MARK
+        help
+	  Say Y here to allow transfer of a DSCP value stored in the
+	  connmark into the IPv4/IPv6 diffserv field.
+
+	  If unsure, say N.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called act_ctinfo.
+
 config NET_ACT_SKBMOD
         tristate "skb data modification action"
         depends on NET_CLS_ACT
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 8a40431d7b5c..d54bfcbd7981 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -21,6 +21,7 @@ obj-$(CONFIG_NET_ACT_CSUM)	+= act_csum.o
 obj-$(CONFIG_NET_ACT_VLAN)	+= act_vlan.o
 obj-$(CONFIG_NET_ACT_BPF)	+= act_bpf.o
 obj-$(CONFIG_NET_ACT_CONNMARK)	+= act_connmark.o
+obj-$(CONFIG_NET_ACT_CTINFO)	+= act_ctinfo.o
 obj-$(CONFIG_NET_ACT_SKBMOD)	+= act_skbmod.o
 obj-$(CONFIG_NET_ACT_IFE)	+= act_ife.o
 obj-$(CONFIG_NET_IFE_SKBMARK)	+= act_meta_mark.o
diff --git a/net/sched/act_ctinfo.c b/net/sched/act_ctinfo.c
new file mode 100644
index 000000000000..01a8694651ea
--- /dev/null
+++ b/net/sched/act_ctinfo.c
@@ -0,0 +1,375 @@
+// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
+/* net/sched/act_ctinfo.c  netfilter ctinfo connmark->DSCP action
+ *
+ * Copyright (c) 2019 Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <linux/pkt_cls.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <net/act_api.h>
+#include <net/pkt_cls.h>
+#include <uapi/linux/tc_act/tc_ctinfo.h>
+#include <net/tc_act/tc_ctinfo.h>
+
+#include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_ecache.h>
+#include <net/netfilter/nf_conntrack_zones.h>
+
+static unsigned int ctinfo_net_id;
+static struct tc_action_ops act_ctinfo_ops;
+
+static void tcf_ctinfo_dscp_set(struct nf_conn *ct, struct tcf_ctinfo *ca,
+				struct tcf_ctinfo_params *cp,
+				struct sk_buff *skb, int wlen, int proto)
+{
+	u8 dscp, newdscp;
+
+	newdscp = (((ct->mark & cp->dscpmask) >> cp->dscpmaskshift) << 2) &
+		     ~INET_ECN_MASK;
+
+	/* mark contains DSCP so restore DSCP bits from ct->mark into diffserv */
+	/* using overlimits stats to count how many DSCP updates */
+	switch (proto) {
+	case NFPROTO_IPV4:
+		dscp = ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK;
+		if (dscp != newdscp) {
+			if (!skb_try_make_writable(skb, wlen)) {
+				ipv4_change_dsfield(ip_hdr(skb),
+						    INET_ECN_MASK,
+						    newdscp);
+				ca->tcf_qstats.overlimits++;
+			} else {
+				ca->tcf_qstats.drops++;
+			}
+		}
+		break;
+	case NFPROTO_IPV6:
+		dscp = ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK;
+		if (dscp != newdscp) {
+			if (!skb_try_make_writable(skb, wlen)) {
+				ipv6_change_dsfield(ipv6_hdr(skb),
+						    INET_ECN_MASK,
+						    newdscp);
+				ca->tcf_qstats.overlimits++;
+			} else {
+				ca->tcf_qstats.drops++;
+			}
+		}
+		break;
+	default:
+		break;
+	}
+}
+
+static int tcf_ctinfo_act(struct sk_buff *skb, const struct tc_action *a,
+			  struct tcf_result *res)
+{
+	const struct nf_conntrack_tuple_hash *thash = NULL;
+	struct nf_conntrack_tuple tuple;
+	enum ip_conntrack_info ctinfo;
+	struct tcf_ctinfo *ca = to_ctinfo(a);
+	struct tcf_ctinfo_params *cp;
+	struct nf_conntrack_zone zone;
+	struct nf_conn *ct;
+	int proto, wlen;
+	int action;
+
+	cp = rcu_dereference_bh(ca->params);
+
+	tcf_lastuse_update(&ca->tcf_tm);
+	bstats_update(&ca->tcf_bstats, skb);
+	action = READ_ONCE(ca->tcf_action);
+
+	/* currently the only mode we know but in future...*/
+	if (unlikely(!(cp->mode & CTINFO_MODE_SETDSCP)))
+		goto out;
+
+	wlen = skb_network_offset(skb);
+	if (tc_skb_protocol(skb) == htons(ETH_P_IP)) {
+		wlen += sizeof(struct iphdr);
+		if (!pskb_may_pull(skb, wlen))
+			goto out;
+
+		proto = NFPROTO_IPV4;
+	} else if (tc_skb_protocol(skb) == htons(ETH_P_IPV6)) {
+		wlen += sizeof(struct ipv6hdr);
+		if (!pskb_may_pull(skb, wlen))
+			goto out;
+
+		proto = NFPROTO_IPV6;
+	} else {
+		goto out;
+	}
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (!ct) { /* look harder, usually ingress */
+		if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
+				       proto, cp->net, &tuple))
+			goto out;
+		zone.id = cp->zone;
+		zone.dir = NF_CT_DEFAULT_ZONE_DIR;
+
+		thash = nf_conntrack_find_get(cp->net, &zone, &tuple);
+		if (!thash)
+			goto out;
+
+		ct = nf_ct_tuplehash_to_ctrack(thash);
+	}
+
+	if (!cp->dscpstatemask || (ct->mark & cp->dscpstatemask))
+		tcf_ctinfo_dscp_set(ct, ca, cp, skb, wlen, proto);
+
+	if (thash)
+		nf_ct_put(ct);
+out:
+	return action;
+}
+
+static const struct nla_policy ctinfo_policy[TCA_CTINFO_MAX + 1] = {
+	[TCA_CTINFO_ACT] = { .len = sizeof(struct tc_ctinfo) },
+	[TCA_CTINFO_ZONE] = { .type = NLA_U16 },
+	[TCA_CTINFO_MODE_DSCP] = { .type = NLA_FLAG },
+	[TCA_CTINFO_DSCP_PARMS] = { .len = sizeof(struct tc_ctinfo_dscp) },
+};
+
+static int tcf_ctinfo_init(struct net *net, struct nlattr *nla,
+			   struct nlattr *est, struct tc_action **a,
+			   int ovr, int bind, bool rtnl_held,
+			   struct tcf_proto *tp,
+			   struct netlink_ext_ack *extack)
+{
+	struct tc_action_net *tn = net_generic(net, ctinfo_net_id);
+	struct tcf_ctinfo_params *cp_new;
+	struct nlattr *tb[TCA_CTINFO_MAX + 1];
+	struct tcf_chain *goto_ch = NULL;
+	struct tcf_ctinfo *ci;
+	struct tc_ctinfo *actparm;
+	struct tc_ctinfo_dscp *dscpparm;
+	int ret = 0, err, i;
+
+	if (!nla)
+		return -EINVAL;
+
+	err = nla_parse_nested(tb, TCA_CTINFO_MAX, nla, ctinfo_policy, NULL);
+	if (err < 0)
+		return err;
+
+	if (!tb[TCA_CTINFO_ACT])
+		return -EINVAL;
+
+	if (tb[TCA_CTINFO_MODE_DSCP] && !tb[TCA_CTINFO_DSCP_PARMS])
+		return -EINVAL;
+
+	actparm = nla_data(tb[TCA_CTINFO_ACT]);
+	dscpparm = tb[TCA_CTINFO_DSCP_PARMS] ?
+		   nla_data(tb[TCA_CTINFO_DSCP_PARMS]) : NULL;
+
+	if (dscpparm) {
+		/* need at least contiguous 6 bit mask */
+		i = dscpparm->mask ? __ffs(dscpparm->mask) : 0;
+		if ((0x3f & (dscpparm->mask >> i)) != 0x3f)
+			return -EINVAL;
+		/* mask & statemask must not overlap */
+		if (dscpparm->mask & dscpparm->statemask)
+			return -EINVAL;
+	}
+//done the validation:now to the actual action allocation
+	err = tcf_idr_check_alloc(tn, &actparm->index, a, bind);
+	if (!err) {
+		ret = tcf_idr_create(tn, actparm->index, est, a,
+				     &act_ctinfo_ops, bind, false);
+		if (ret) {
+			tcf_idr_cleanup(tn, actparm->index);
+			return ret;
+		}
+	} else if (err > 0) {
+		if (bind) /* don't override defaults */
+			return 0;
+		if (!ovr) {
+			tcf_idr_release(*a, bind);
+			return -EEXIST;
+		}
+	} else {
+		return err;
+	}
+
+	err = tcf_action_check_ctrlact(actparm->action, tp, &goto_ch, extack);
+	if (err < 0)
+		goto release_idr;
+
+	ci = to_ctinfo(*a);
+
+	cp_new = kzalloc(sizeof(*cp_new), GFP_KERNEL);
+	if (unlikely(!cp_new)) {
+		err = -ENOMEM;
+		goto put_chain;
+	}
+
+	cp_new->net = net;
+	cp_new->zone = tb[TCA_CTINFO_ZONE] ?
+			nla_get_u16(tb[TCA_CTINFO_ZONE]) : 0;
+	if (dscpparm) {
+		cp_new->dscpmask = dscpparm->mask;
+		cp_new->dscpmaskshift = cp_new->dscpmask ?
+				__ffs(cp_new->dscpmask) : 0;
+		cp_new->dscpstatemask = dscpparm->statemask;
+	}
+
+	if (tb[TCA_CTINFO_MODE_DSCP])
+		cp_new->mode |= CTINFO_MODE_SETDSCP;
+	else
+		cp_new->mode &= ~CTINFO_MODE_SETDSCP;
+
+	spin_lock_bh(&ci->tcf_lock);
+	goto_ch = tcf_action_set_ctrlact(*a, actparm->action, goto_ch);
+	rcu_swap_protected(ci->params, cp_new,
+			   lockdep_is_held(&ci->tcf_lock));
+	spin_unlock_bh(&ci->tcf_lock);
+
+	if (goto_ch)
+		tcf_chain_put_by_act(goto_ch);
+	if (cp_new)
+		kfree_rcu(cp_new, rcu);
+
+	if (ret == ACT_P_CREATED)
+		tcf_idr_insert(tn, *a);
+
+	return ret;
+
+put_chain:
+	if (goto_ch)
+		tcf_chain_put_by_act(goto_ch);
+release_idr:
+	tcf_idr_release(*a, bind);
+	return err;
+}
+
+static inline int tcf_ctinfo_dump(struct sk_buff *skb, struct tc_action *a,
+				  int bind, int ref)
+{
+	unsigned char *b = skb_tail_pointer(skb);
+	struct tcf_ctinfo *ci = to_ctinfo(a);
+	struct tcf_ctinfo_params *cp;
+	struct tc_ctinfo opt = {
+		.index   = ci->tcf_index,
+		.refcnt  = refcount_read(&ci->tcf_refcnt) - ref,
+		.bindcnt = atomic_read(&ci->tcf_bindcnt) - bind,
+	};
+	struct tcf_t t;
+	struct tc_ctinfo_dscp dscpparm;
+
+	spin_lock_bh(&ci->tcf_lock);
+	cp = rcu_dereference_protected(ci->params,
+				       lockdep_is_held(&ci->tcf_lock));
+	opt.action = ci->tcf_action;
+
+	if (nla_put(skb, TCA_CTINFO_ACT, sizeof(opt), &opt))
+		goto nla_put_failure;
+
+	if (cp->mode & CTINFO_MODE_SETDSCP) {
+		dscpparm.mask = cp->dscpmask;
+		dscpparm.statemask = cp->dscpstatemask;
+		if (nla_put(skb, TCA_CTINFO_DSCP_PARMS, sizeof(dscpparm),
+			    &dscpparm))
+			goto nla_put_failure;
+
+		if (nla_put_flag(skb, TCA_CTINFO_MODE_DSCP))
+			goto nla_put_failure;
+	}
+
+	if (cp->zone) {
+		if (nla_put_u16(skb, TCA_CTINFO_ZONE, cp->zone))
+			goto nla_put_failure;
+	}
+
+	tcf_tm_dump(&t, &ci->tcf_tm);
+	if (nla_put_64bit(skb, TCA_CTINFO_TM, sizeof(t), &t,
+			  TCA_CTINFO_PAD))
+		goto nla_put_failure;
+
+	spin_unlock_bh(&ci->tcf_lock);
+
+	return skb->len;
+
+nla_put_failure:
+	spin_unlock_bh(&ci->tcf_lock);
+	nlmsg_trim(skb, b);
+	return -1;
+}
+
+static int tcf_ctinfo_walker(struct net *net, struct sk_buff *skb,
+			     struct netlink_callback *cb, int type,
+			     const struct tc_action_ops *ops,
+			     struct netlink_ext_ack *extack)
+{
+	struct tc_action_net *tn = net_generic(net, ctinfo_net_id);
+
+	return tcf_generic_walker(tn, skb, cb, type, ops, extack);
+}
+
+static int tcf_ctinfo_search(struct net *net, struct tc_action **a, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, ctinfo_net_id);
+
+	return tcf_idr_search(tn, a, index);
+}
+
+static struct tc_action_ops act_ctinfo_ops = {
+	.kind		=	"ctinfo",
+	.id		=	TCA_ID_CTINFO,
+	.owner		=	THIS_MODULE,
+	.act		=	tcf_ctinfo_act,
+	.dump		=	tcf_ctinfo_dump,
+	.init		=	tcf_ctinfo_init,
+	.walk		=	tcf_ctinfo_walker,
+	.lookup		=	tcf_ctinfo_search,
+	.size		=	sizeof(struct tcf_ctinfo),
+};
+
+static __net_init int ctinfo_init_net(struct net *net)
+{
+	struct tc_action_net *tn = net_generic(net, ctinfo_net_id);
+
+	return tc_action_net_init(tn, &act_ctinfo_ops);
+}
+
+static void __net_exit ctinfo_exit_net(struct list_head *net_list)
+{
+	tc_action_net_exit(net_list, ctinfo_net_id);
+}
+
+static struct pernet_operations ctinfo_net_ops = {
+	.init = ctinfo_init_net,
+	.exit_batch = ctinfo_exit_net,
+	.id   = &ctinfo_net_id,
+	.size = sizeof(struct tc_action_net),
+};
+
+static int __init ctinfo_init_module(void)
+{
+	return tcf_register_action(&act_ctinfo_ops, &ctinfo_net_ops);
+}
+
+static void __exit ctinfo_cleanup_module(void)
+{
+	tcf_unregister_action(&act_ctinfo_ops, &ctinfo_net_ops);
+}
+
+module_init(ctinfo_init_module);
+module_exit(ctinfo_cleanup_module);
+MODULE_AUTHOR("Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>");
+MODULE_DESCRIPTION("Conntrack mark to DSCP restoring");
+MODULE_LICENSE("GPL");
diff --git a/tools/testing/selftests/tc-testing/config b/tools/testing/selftests/tc-testing/config
index 203302065458..9d1fddcfb887 100644
--- a/tools/testing/selftests/tc-testing/config
+++ b/tools/testing/selftests/tc-testing/config
@@ -37,6 +37,7 @@ CONFIG_NET_ACT_SKBEDIT=m
 CONFIG_NET_ACT_CSUM=m
 CONFIG_NET_ACT_VLAN=m
 CONFIG_NET_ACT_BPF=m
+CONFIG_NET_ACT_CONNDSCP=m
 CONFIG_NET_ACT_CONNMARK=m
 CONFIG_NET_ACT_SKBMOD=m
 CONFIG_NET_ACT_IFE=m
-- 
2.20.1 (Apple Git-117)



* Re: [PATCH net-next] net: sched: Introduce act_ctinfo action
  2019-04-27 13:08 [PATCH net-next] net: sched: Introduce act_ctinfo action Kevin 'ldir' Darbyshire-Bryant
@ 2019-04-30 21:47 ` Cong Wang
  2019-05-03 21:20   ` Kevin 'ldir' Darbyshire-Bryant
  2019-05-05 10:15   ` [net-next v2] " Kevin 'ldir' Darbyshire-Bryant
  0 siblings, 2 replies; 13+ messages in thread
From: Cong Wang @ 2019-04-30 21:47 UTC
  To: Kevin 'ldir' Darbyshire-Bryant
  Cc: Jamal Hadi Salim, Jiri Pirko, David S. Miller, Shuah Khan,
	linux-kernel, netdev, linux-kselftest

On Sat, Apr 27, 2019 at 6:08 AM Kevin 'ldir' Darbyshire-Bryant
<ldir@darbyshire-bryant.me.uk> wrote:
>
> ctinfo is a new tc filter action module.  It is designed to restore DSCPs
> stored in conntrack marks into the ipv4/v6 diffserv field.

I think we can retrieve any information from conntrack with such
a general name, including skb mark. So, as you already pick the
name ctinfo, please make it general rather than just DSCP.
You can add skb mark into your ctinfo too so that act_connmark
can be just replaced.

Your patch looks fine from a quick glance, please make sure
you run checkpatch.pl to keep your coding style aligned to Linux
kernel's, at least I don't think we accept C++ style comments.

Thanks.


* Re: [PATCH net-next] net: sched: Introduce act_ctinfo action
  2019-04-30 21:47 ` Cong Wang
@ 2019-05-03 21:20   ` Kevin 'ldir' Darbyshire-Bryant
  2019-05-05 10:15   ` [net-next v2] " Kevin 'ldir' Darbyshire-Bryant
  1 sibling, 0 replies; 13+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-05-03 21:20 UTC
  To: Cong Wang
  Cc: Jamal Hadi Salim, Jiri Pirko, David S. Miller, Shuah Khan,
	linux-kernel, netdev, linux-kselftest



> On 30 Apr 2019, at 22:47, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> 
> On Sat, Apr 27, 2019 at 6:08 AM Kevin 'ldir' Darbyshire-Bryant
> <ldir@darbyshire-bryant.me.uk> wrote:
>> 
>> ctinfo is a new tc filter action module.  It is designed to restore DSCPs
>> stored in conntrack marks into the ipv4/v6 diffserv field.
> 
> I think we can retrieve any information from conntrack with such
> a general name, including skb mark. So, as you already pick the
> name ctinfo, please make it general rather than just DSCP.
> You can add skb mark into your ctinfo too so that act_connmark
> can be just replaced.

Hi Cong,

Thanks for the review, I have a v2 in progress addressing that along
with another silly that got through.  I’m also re-working the stats
reporting to return act_ctinfo stats instead of usurping the overlimits
& drops figures.

> 
> Your patch looks fine from a quick glance, please make sure
> you run checkpatch.pl to keep your coding style aligned to Linux
> kernel's, at least I don't think we accept C++ style comments.

This time I’ll remember to run checkpatch before I submit instead of
after :-)

> 
> Thanks.


Cheers,

Kevin D-B

gpg: 012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A



* [net-next v2] net: sched: Introduce act_ctinfo action
  2019-04-30 21:47 ` Cong Wang
  2019-05-03 21:20   ` Kevin 'ldir' Darbyshire-Bryant
@ 2019-05-05 10:15   ` Kevin 'ldir' Darbyshire-Bryant
  2019-05-05 10:23     ` Greg KH
  2019-05-05 13:20     ` [net-next v3] " Kevin 'ldir' Darbyshire-Bryant
  1 sibling, 2 replies; 13+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-05-05 10:15 UTC
  To: xiyou.wangcong
  Cc: davem, jhs, jiri, Kevin 'ldir' Darbyshire-Bryant,
	linux-kernel, linux-kselftest, netdev, shuah

ctinfo is a new tc filter action module.  It is designed to restore
information contained in conntrack marks to other places.  At present it
can restore DSCP values to IPv4/6 diffserv fields and also copy
conntrack marks to skb marks.  As such, the second function effectively
replaces the existing act_connmark module.

The DSCP restoration is intended for, and has been found useful for,
restoring ingress classifications based on egress classifications across
links that bleach or otherwise change DSCP, typically home ISP Internet
links.  Restoring DSCP on ingress on the WAN link allows qdiscs such as
CAKE to shape inbound packets according to policies that are easier to
indicate on egress.

Ingress classification is traditionally a challenging task since
iptables rules have not yet run and tc filter/eBPF programs see packets
before NAT, so they are unable to see the internal IPv4 addresses used
on a typical home masquerading gateway.

ctinfo understands the following parameters:

dscp dscpmask[/statemask]

dscpmask - a 32-bit mask of at least 6 contiguous bits, indicating
where ctinfo will find the DSCP bits stored in the conntrack mark.

statemask - a 32-bit mask, (usually) 1 bit long, outside the area
specified by dscpmask.  This represents a conditional operation flag
whereby the DSCP is only restored if the flag is set.  This is useful to
implement a 'one shot' iptables based classification where the
'complicated' iptables rules are only run once to classify the
connection on the initial (egress) packet and subsequent packets are all
marked/restored with the same DSCP.  A mask of zero disables the
conditional behaviour, i.e. the conntrack mark DSCP bits are always
restored to the IP diffserv field (assuming the conntrack entry is found
and the skb is of IPv4/IPv6 type).

mark [mask]

mark - enables copying the conntrack connmark value to the skb mark

mask - a 32-bit mask applied to the mark to mask out bits unwanted for
restoration.  The CAKE qdisc, for example, understands both DSCP and
'tin' classification stored in the mark, thus act_ctinfo may be used to
restore both aspects of classification for CAKE in one action.  A
default mask of 0xffffffff is applied if not specified.

zone - conntrack zone

control - action related control (reclassify | pipe | drop | continue |
ok | goto chain <CHAIN_INDEX>)

e.g. dscp 0xfc000000/0x01000000

|----0xFC----conntrack mark----000000---|
| Bits 31-26 | bit 25 | bit24 |~~~ Bit 0|
| DSCP       | unused | flag  |unused   |
|-----------------------0x01---000000---|
      |                   |
      |                   |
      ---|             Conditional flag
         v             only restore if set
|-ip diffserv-|
| 6 bits      |
|-------------|

e.g. mark 0x00ffffff

|----0x00----conntrack mark----ffffff---|
| Bits 31-24 |                          |
| DSCP & flag|                          |
|---------------------------------------|
			|
			|
			v
|------------skb mark-------------------|
|                                       |
|                                       |
|---------------------------------------|
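The mark copy can be sketched in the same way.  ctinfo_copy_mark() and
CTINFO_DEFAULT_MARKMASK are hypothetical names approximating
tcf_ctinfo_mark_set() in the patch below, with the default mask taken
from the description above.  Note that the skb mark is overwritten with
the masked connmark, so bits outside the mask are cleared rather than
kept from the previous skb mark.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Default used when "mark" is given without a mask (per the text above). */
#define CTINFO_DEFAULT_MARKMASK 0xffffffffu

/* Sketch of the v2 mark copy: the new skb mark is simply the conntrack
 * mark ANDed with the configured (or default) mask. */
static uint32_t ctinfo_copy_mark(uint32_t ctmark, bool have_mask,
				 uint32_t markmask)
{
	uint32_t mask = have_mask ? markmask : CTINFO_DEFAULT_MARKMASK;
	uint32_t skb_mark = ctmark & mask;

	assert((skb_mark & ~mask) == 0);	/* nothing survives outside the mask */
	return skb_mark;
}
```

With the 0x00ffffff mask from the example, a connmark of 0xb9123456
(DSCP and flag in the top byte) yields an skb mark of 0x00123456.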

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
---

v2 - add equivalent connmark functionality with an enhancement
     to accept a mask
     pass statistics for each sub-function as individual netlink
     attributes and stop (ab)using overlimits, drops
     update the testing config correctly

 include/net/tc_act/tc_ctinfo.h            |  28 ++
 include/uapi/linux/pkt_cls.h              |   1 +
 include/uapi/linux/tc_act/tc_ctinfo.h     |  43 +++
 net/sched/Kconfig                         |  17 +
 net/sched/Makefile                        |   1 +
 net/sched/act_ctinfo.c                    | 407 ++++++++++++++++++++++
 tools/testing/selftests/tc-testing/config |   1 +
 7 files changed, 498 insertions(+)
 create mode 100644 include/net/tc_act/tc_ctinfo.h
 create mode 100644 include/uapi/linux/tc_act/tc_ctinfo.h
 create mode 100644 net/sched/act_ctinfo.c

diff --git a/include/net/tc_act/tc_ctinfo.h b/include/net/tc_act/tc_ctinfo.h
new file mode 100644
index 000000000000..87334120dcb6
--- /dev/null
+++ b/include/net/tc_act/tc_ctinfo.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __NET_TC_CTINFO_H
+#define __NET_TC_CTINFO_H
+
+#include <net/act_api.h>
+
+struct tcf_ctinfo_params {
+	struct net *net;
+	u32 dscpmask;
+	u32 dscpstatemask;
+	u32 markmask;
+	u16 zone;
+	u8 mode;
+	u8 dscpmaskshift;
+	struct rcu_head rcu;
+};
+
+struct tcf_ctinfo {
+	struct tc_action common;
+	struct tcf_ctinfo_params __rcu *params;
+	u64 stats_dscp_set;
+	u64 stats_dscp_error;
+	u64 stats_mark_set;
+};
+
+#define to_ctinfo(a) ((struct tcf_ctinfo *)a)
+
+#endif /* __NET_TC_CTINFO_H */
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 51a0496f78ea..a93680fc4bfa 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -105,6 +105,7 @@ enum tca_id {
 	TCA_ID_IFE = TCA_ACT_IFE,
 	TCA_ID_SAMPLE = TCA_ACT_SAMPLE,
 	/* other actions go here */
+	TCA_ID_CTINFO,
 	__TCA_ID_MAX = 255
 };
 
diff --git a/include/uapi/linux/tc_act/tc_ctinfo.h b/include/uapi/linux/tc_act/tc_ctinfo.h
new file mode 100644
index 000000000000..8d254b82151c
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_ctinfo.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef __UAPI_TC_CTINFO_H
+#define __UAPI_TC_CTINFO_H
+
+#include <linux/types.h>
+#include <linux/pkt_cls.h>
+
+struct tc_ctinfo {
+	tc_gen;
+};
+
+struct tc_ctinfo_dscp {
+	__u32 mask;
+	__u32 statemask;
+};
+
+struct tc_ctinfo_stats_dscp {
+	__u64 set;
+	__u64 error;
+};
+
+enum {
+	TCA_CTINFO_UNSPEC,
+	TCA_CTINFO_ACT,
+	TCA_CTINFO_ZONE,
+	TCA_CTINFO_DSCP_PARMS,
+	TCA_CTINFO_MARK_MASK,
+	TCA_CTINFO_MODE_DSCP,
+	TCA_CTINFO_MODE_MARK,
+	TCA_CTINFO_STATS_DSCP,
+	TCA_CTINFO_STATS_MARK,
+	TCA_CTINFO_TM,
+	TCA_CTINFO_PAD,
+	__TCA_CTINFO_MAX
+};
+#define TCA_CTINFO_MAX (__TCA_CTINFO_MAX - 1)
+
+enum {
+	CTINFO_MODE_SETDSCP	= (1 << 0),
+	CTINFO_MODE_SETMARK	= (1 << 1)
+};
+
+#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 5c02ad97ef23..f5773effcfdc 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -876,6 +876,23 @@ config NET_ACT_CONNMARK
 	  To compile this code as a module, choose M here: the
 	  module will be called act_connmark.
 
+config NET_ACT_CTINFO
+        tristate "Netfilter Connection Mark Actions"
+        depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
+        depends on NF_CONNTRACK && NF_CONNTRACK_MARK
+        help
+	  Say Y here to allow transfer of information stored in the
+	  connmark.  The current actions transfer a connmark-stored DSCP
+	  into the IPv4/IPv6 diffserv field and/or copy the connmark to
+	  the packet mark.  Both are useful for restoring egress-based
+	  marks back onto ingress connections for qdisc priority mapping
+	  purposes.
+
+	  If unsure, say N.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called act_ctinfo.
+
 config NET_ACT_SKBMOD
         tristate "skb data modification action"
         depends on NET_CLS_ACT
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 8a40431d7b5c..d54bfcbd7981 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -21,6 +21,7 @@ obj-$(CONFIG_NET_ACT_CSUM)	+= act_csum.o
 obj-$(CONFIG_NET_ACT_VLAN)	+= act_vlan.o
 obj-$(CONFIG_NET_ACT_BPF)	+= act_bpf.o
 obj-$(CONFIG_NET_ACT_CONNMARK)	+= act_connmark.o
+obj-$(CONFIG_NET_ACT_CTINFO)	+= act_ctinfo.o
 obj-$(CONFIG_NET_ACT_SKBMOD)	+= act_skbmod.o
 obj-$(CONFIG_NET_ACT_IFE)	+= act_ife.o
 obj-$(CONFIG_NET_IFE_SKBMARK)	+= act_meta_mark.o
diff --git a/net/sched/act_ctinfo.c b/net/sched/act_ctinfo.c
new file mode 100644
index 000000000000..7beab00cc9a7
--- /dev/null
+++ b/net/sched/act_ctinfo.c
@@ -0,0 +1,407 @@
+// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
+/* net/sched/act_ctinfo.c  netfilter ctinfo connmark->DSCP action
+ *
+ * Copyright (c) 2019 Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <linux/pkt_cls.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <net/act_api.h>
+#include <net/pkt_cls.h>
+#include <uapi/linux/tc_act/tc_ctinfo.h>
+#include <net/tc_act/tc_ctinfo.h>
+
+#include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_ecache.h>
+#include <net/netfilter/nf_conntrack_zones.h>
+
+static unsigned int ctinfo_net_id;
+static struct tc_action_ops act_ctinfo_ops;
+
+static void tcf_ctinfo_dscp_set(struct nf_conn *ct, struct tcf_ctinfo *ca,
+				struct tcf_ctinfo_params *cp,
+				struct sk_buff *skb, int wlen, int proto)
+{
+	u8 dscp, newdscp;
+
+	newdscp = (((ct->mark & cp->dscpmask) >> cp->dscpmaskshift) << 2) &
+		     ~INET_ECN_MASK;
+
+	switch (proto) {
+	case NFPROTO_IPV4:
+		dscp = ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK;
+		if (dscp != newdscp) {
+			if (likely(!skb_try_make_writable(skb, wlen))) {
+				ipv4_change_dsfield(ip_hdr(skb),
+						    INET_ECN_MASK,
+						    newdscp);
+				ca->stats_dscp_set++;
+			} else {
+				ca->stats_dscp_error++;
+			}
+		}
+		break;
+	case NFPROTO_IPV6:
+		dscp = ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK;
+		if (dscp != newdscp) {
+			if (likely(!skb_try_make_writable(skb, wlen))) {
+				ipv6_change_dsfield(ipv6_hdr(skb),
+						    INET_ECN_MASK,
+						    newdscp);
+				ca->stats_dscp_set++;
+			} else {
+				ca->stats_dscp_error++;
+			}
+		}
+		break;
+	default:
+		break;
+	}
+}
+
+static void tcf_ctinfo_mark_set(struct nf_conn *ct, struct tcf_ctinfo *ca,
+				struct tcf_ctinfo_params *cp,
+				struct sk_buff *skb)
+{
+	ca->stats_mark_set++;
+	skb->mark = ct->mark & cp->markmask;
+}
+
+static int tcf_ctinfo_act(struct sk_buff *skb, const struct tc_action *a,
+			  struct tcf_result *res)
+{
+	const struct nf_conntrack_tuple_hash *thash = NULL;
+	struct nf_conntrack_tuple tuple;
+	enum ip_conntrack_info ctinfo;
+	struct tcf_ctinfo *ca = to_ctinfo(a);
+	struct tcf_ctinfo_params *cp;
+	struct nf_conntrack_zone zone;
+	struct nf_conn *ct;
+	int proto, wlen;
+	int action;
+
+	cp = rcu_dereference_bh(ca->params);
+
+	tcf_lastuse_update(&ca->tcf_tm);
+	bstats_update(&ca->tcf_bstats, skb);
+	action = READ_ONCE(ca->tcf_action);
+
+	wlen = skb_network_offset(skb);
+	if (tc_skb_protocol(skb) == htons(ETH_P_IP)) {
+		wlen += sizeof(struct iphdr);
+		if (!pskb_may_pull(skb, wlen))
+			goto out;
+
+		proto = NFPROTO_IPV4;
+	} else if (tc_skb_protocol(skb) == htons(ETH_P_IPV6)) {
+		wlen += sizeof(struct ipv6hdr);
+		if (!pskb_may_pull(skb, wlen))
+			goto out;
+
+		proto = NFPROTO_IPV6;
+	} else {
+		goto out;
+	}
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (!ct) { /* look harder, usually ingress */
+		if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
+				       proto, cp->net, &tuple))
+			goto out;
+		zone.id = cp->zone;
+		zone.dir = NF_CT_DEFAULT_ZONE_DIR;
+
+		thash = nf_conntrack_find_get(cp->net, &zone, &tuple);
+		if (!thash)
+			goto out;
+
+		ct = nf_ct_tuplehash_to_ctrack(thash);
+	}
+
+	if (cp->mode & CTINFO_MODE_SETDSCP)
+		if (!cp->dscpstatemask || (ct->mark & cp->dscpstatemask))
+			tcf_ctinfo_dscp_set(ct, ca, cp, skb, wlen, proto);
+
+	if (cp->mode & CTINFO_MODE_SETMARK)
+		tcf_ctinfo_mark_set(ct, ca, cp, skb);
+
+	if (thash)
+		nf_ct_put(ct);
+out:
+	return action;
+}
+
+static const struct nla_policy ctinfo_policy[TCA_CTINFO_MAX + 1] = {
+	[TCA_CTINFO_ACT] = { .len = sizeof(struct tc_ctinfo) },
+	[TCA_CTINFO_ZONE] = { .type = NLA_U16 },
+	[TCA_CTINFO_MODE_DSCP] = { .type = NLA_FLAG },
+	[TCA_CTINFO_MODE_MARK] = { .type = NLA_FLAG },
+	[TCA_CTINFO_DSCP_PARMS] = { .len = sizeof(struct tc_ctinfo_dscp) },
+};
+
+static int tcf_ctinfo_init(struct net *net, struct nlattr *nla,
+			   struct nlattr *est, struct tc_action **a,
+			   int ovr, int bind, bool rtnl_held,
+			   struct tcf_proto *tp,
+			   struct netlink_ext_ack *extack)
+{
+	struct tc_action_net *tn = net_generic(net, ctinfo_net_id);
+	struct tcf_ctinfo_params *cp_new;
+	struct nlattr *tb[TCA_CTINFO_MAX + 1];
+	struct tcf_chain *goto_ch = NULL;
+	struct tcf_ctinfo *ci;
+	struct tc_ctinfo *actparm;
+	struct tc_ctinfo_dscp *dscpparm;
+	int ret = 0, err, i;
+
+	if (!nla)
+		return -EINVAL;
+
+	err = nla_parse_nested(tb, TCA_CTINFO_MAX, nla, ctinfo_policy, NULL);
+	if (err < 0)
+		return err;
+
+	if (!tb[TCA_CTINFO_ACT])
+		return -EINVAL;
+
+	if (tb[TCA_CTINFO_MODE_DSCP] && !tb[TCA_CTINFO_DSCP_PARMS])
+		return -EINVAL;
+
+	actparm = nla_data(tb[TCA_CTINFO_ACT]);
+	dscpparm = nla_data(tb[TCA_CTINFO_DSCP_PARMS]);
+
+	if (dscpparm) {
+		/* need at least contiguous 6 bit mask */
+		i = dscpparm->mask ? __ffs(dscpparm->mask) : 0;
+		if ((0x3f & (dscpparm->mask >> i)) != 0x3f)
+			return -EINVAL;
+		/* mask & statemask must not overlap */
+		if (dscpparm->mask & dscpparm->statemask)
+			return -EINVAL;
+	}
+
+	/* done the validation:now to the actual action allocation */
+	err = tcf_idr_check_alloc(tn, &actparm->index, a, bind);
+	if (!err) {
+		ret = tcf_idr_create(tn, actparm->index, est, a,
+				     &act_ctinfo_ops, bind, false);
+		if (ret) {
+			tcf_idr_cleanup(tn, actparm->index);
+			return ret;
+		}
+	} else if (err > 0) {
+		if (bind) /* don't override defaults */
+			return 0;
+		if (!ovr) {
+			tcf_idr_release(*a, bind);
+			return -EEXIST;
+		}
+	} else {
+		return err;
+	}
+
+	err = tcf_action_check_ctrlact(actparm->action, tp, &goto_ch, extack);
+	if (err < 0)
+		goto release_idr;
+
+	ci = to_ctinfo(*a);
+
+	cp_new = kzalloc(sizeof(*cp_new), GFP_KERNEL);
+	if (unlikely(!cp_new)) {
+		err = -ENOMEM;
+		goto put_chain;
+	}
+
+	cp_new->net = net;
+	cp_new->zone = tb[TCA_CTINFO_ZONE] ?
+			nla_get_u16(tb[TCA_CTINFO_ZONE]) : 0;
+	if (dscpparm) {
+		cp_new->dscpmask = dscpparm->mask;
+		cp_new->dscpmaskshift = cp_new->dscpmask ?
+				__ffs(cp_new->dscpmask) : 0;
+		cp_new->dscpstatemask = dscpparm->statemask;
+	}
+	cp_new->markmask = tb[TCA_CTINFO_MARK_MASK] ?
+			nla_get_u32(tb[TCA_CTINFO_MARK_MASK]) : ~0;
+
+	if (tb[TCA_CTINFO_MODE_DSCP])
+		cp_new->mode |= CTINFO_MODE_SETDSCP;
+	else
+		cp_new->mode &= ~CTINFO_MODE_SETDSCP;
+
+	if (tb[TCA_CTINFO_MODE_MARK])
+		cp_new->mode |= CTINFO_MODE_SETMARK;
+	else
+		cp_new->mode &= ~CTINFO_MODE_SETMARK;
+
+	spin_lock_bh(&ci->tcf_lock);
+	goto_ch = tcf_action_set_ctrlact(*a, actparm->action, goto_ch);
+	rcu_swap_protected(ci->params, cp_new,
+			   lockdep_is_held(&ci->tcf_lock));
+	spin_unlock_bh(&ci->tcf_lock);
+
+	if (goto_ch)
+		tcf_chain_put_by_act(goto_ch);
+	if (cp_new)
+		kfree_rcu(cp_new, rcu);
+
+	if (ret == ACT_P_CREATED)
+		tcf_idr_insert(tn, *a);
+
+	return ret;
+
+put_chain:
+	if (goto_ch)
+		tcf_chain_put_by_act(goto_ch);
+release_idr:
+	tcf_idr_release(*a, bind);
+	return err;
+}
+
+static inline int tcf_ctinfo_dump(struct sk_buff *skb, struct tc_action *a,
+				  int bind, int ref)
+{
+	unsigned char *b = skb_tail_pointer(skb);
+	struct tcf_ctinfo *ci = to_ctinfo(a);
+	struct tcf_ctinfo_params *cp;
+	struct tc_ctinfo opt = {
+		.index   = ci->tcf_index,
+		.refcnt  = refcount_read(&ci->tcf_refcnt) - ref,
+		.bindcnt = atomic_read(&ci->tcf_bindcnt) - bind,
+	};
+	struct tcf_t t;
+	struct tc_ctinfo_dscp dscpparm;
+	struct tc_ctinfo_stats_dscp dscpstats;
+
+	spin_lock_bh(&ci->tcf_lock);
+	cp = rcu_dereference_protected(ci->params,
+				       lockdep_is_held(&ci->tcf_lock));
+	opt.action = ci->tcf_action;
+
+	if (nla_put(skb, TCA_CTINFO_ACT, sizeof(opt), &opt))
+		goto nla_put_failure;
+
+	if (cp->mode & CTINFO_MODE_SETDSCP) {
+		dscpparm.mask = cp->dscpmask;
+		dscpparm.statemask = cp->dscpstatemask;
+		if (nla_put(skb, TCA_CTINFO_DSCP_PARMS, sizeof(dscpparm),
+			    &dscpparm))
+			goto nla_put_failure;
+
+		if (nla_put_flag(skb, TCA_CTINFO_MODE_DSCP))
+			goto nla_put_failure;
+
+		dscpstats.set = ci->stats_dscp_set;
+		dscpstats.error = ci->stats_dscp_error;
+		if (nla_put(skb, TCA_CTINFO_STATS_DSCP, sizeof(dscpstats),
+			    &dscpstats))
+			goto nla_put_failure;
+	}
+
+	if (cp->mode & CTINFO_MODE_SETMARK) {
+		if (nla_put_u32(skb, TCA_CTINFO_MARK_MASK, cp->markmask))
+			goto nla_put_failure;
+
+		if (nla_put_flag(skb, TCA_CTINFO_MODE_MARK))
+			goto nla_put_failure;
+
+		if (nla_put_u64_64bit(skb, TCA_CTINFO_STATS_MARK,
+				      ci->stats_mark_set, TCA_CTINFO_PAD))
+			goto nla_put_failure;
+	}
+
+	if (cp->zone) {
+		if (nla_put_u16(skb, TCA_CTINFO_ZONE, cp->zone))
+			goto nla_put_failure;
+	}
+
+	tcf_tm_dump(&t, &ci->tcf_tm);
+	if (nla_put_64bit(skb, TCA_CTINFO_TM, sizeof(t), &t, TCA_CTINFO_PAD))
+		goto nla_put_failure;
+
+	spin_unlock_bh(&ci->tcf_lock);
+	return skb->len;
+
+nla_put_failure:
+	spin_unlock_bh(&ci->tcf_lock);
+	nlmsg_trim(skb, b);
+	return -1;
+}
+
+static int tcf_ctinfo_walker(struct net *net, struct sk_buff *skb,
+			     struct netlink_callback *cb, int type,
+			     const struct tc_action_ops *ops,
+			     struct netlink_ext_ack *extack)
+{
+	struct tc_action_net *tn = net_generic(net, ctinfo_net_id);
+
+	return tcf_generic_walker(tn, skb, cb, type, ops, extack);
+}
+
+static int tcf_ctinfo_search(struct net *net, struct tc_action **a, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, ctinfo_net_id);
+
+	return tcf_idr_search(tn, a, index);
+}
+
+static struct tc_action_ops act_ctinfo_ops = {
+	.kind		=	"ctinfo",
+	.id		=	TCA_ID_CTINFO,
+	.owner		=	THIS_MODULE,
+	.act		=	tcf_ctinfo_act,
+	.dump		=	tcf_ctinfo_dump,
+	.init		=	tcf_ctinfo_init,
+	.walk		=	tcf_ctinfo_walker,
+	.lookup		=	tcf_ctinfo_search,
+	.size		=	sizeof(struct tcf_ctinfo),
+};
+
+static __net_init int ctinfo_init_net(struct net *net)
+{
+	struct tc_action_net *tn = net_generic(net, ctinfo_net_id);
+
+	return tc_action_net_init(tn, &act_ctinfo_ops);
+}
+
+static void __net_exit ctinfo_exit_net(struct list_head *net_list)
+{
+	tc_action_net_exit(net_list, ctinfo_net_id);
+}
+
+static struct pernet_operations ctinfo_net_ops = {
+	.init = ctinfo_init_net,
+	.exit_batch = ctinfo_exit_net,
+	.id   = &ctinfo_net_id,
+	.size = sizeof(struct tc_action_net),
+};
+
+static int __init ctinfo_init_module(void)
+{
+	return tcf_register_action(&act_ctinfo_ops, &ctinfo_net_ops);
+}
+
+static void __exit ctinfo_cleanup_module(void)
+{
+	tcf_unregister_action(&act_ctinfo_ops, &ctinfo_net_ops);
+}
+
+module_init(ctinfo_init_module);
+module_exit(ctinfo_cleanup_module);
+MODULE_AUTHOR("Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>");
+MODULE_DESCRIPTION("Connection tracking mark actions");
+MODULE_LICENSE("GPL");
diff --git a/tools/testing/selftests/tc-testing/config b/tools/testing/selftests/tc-testing/config
index 203302065458..b235efd55367 100644
--- a/tools/testing/selftests/tc-testing/config
+++ b/tools/testing/selftests/tc-testing/config
@@ -38,6 +38,7 @@ CONFIG_NET_ACT_CSUM=m
 CONFIG_NET_ACT_VLAN=m
 CONFIG_NET_ACT_BPF=m
 CONFIG_NET_ACT_CONNMARK=m
+CONFIG_NET_ACT_CTINFO=m
 CONFIG_NET_ACT_SKBMOD=m
 CONFIG_NET_ACT_IFE=m
 CONFIG_NET_ACT_TUNNEL_KEY=m
-- 
2.20.1 (Apple Git-117)


* Re: [net-next v2] net: sched: Introduce act_ctinfo action
  2019-05-05 10:15   ` [net-next v2] " Kevin 'ldir' Darbyshire-Bryant
@ 2019-05-05 10:23     ` Greg KH
  2019-05-05 10:32       ` Kevin 'ldir' Darbyshire-Bryant
  2019-05-05 13:20     ` [net-next v3] " Kevin 'ldir' Darbyshire-Bryant
  1 sibling, 1 reply; 13+ messages in thread
From: Greg KH @ 2019-05-05 10:23 UTC (permalink / raw)
  To: Kevin 'ldir' Darbyshire-Bryant
  Cc: xiyou.wangcong, davem, jhs, jiri, linux-kernel, linux-kselftest,
	netdev, shuah

On Sun, May 05, 2019 at 10:15:43AM +0000, Kevin 'ldir' Darbyshire-Bryant wrote:
> --- /dev/null
> +++ b/net/sched/act_ctinfo.c
> @@ -0,0 +1,407 @@
> +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note

How can a .c file, buried in the kernel tree, have a Linux-syscall-note
exception to it?

Are you _sure_ that is ok?  That license should only be for files in the
uapi header directory.

thanks,

greg k-h

* Re: [net-next v2] net: sched: Introduce act_ctinfo action
  2019-05-05 10:23     ` Greg KH
@ 2019-05-05 10:32       ` Kevin 'ldir' Darbyshire-Bryant
  2019-05-05 10:43         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 13+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-05-05 10:32 UTC (permalink / raw)
  To: Greg KH
  Cc: xiyou.wangcong, davem, jhs, jiri, linux-kernel, linux-kselftest,
	netdev, shuah



> On 5 May 2019, at 11:23, Greg KH <greg@kroah.com> wrote:
> 
> On Sun, May 05, 2019 at 10:15:43AM +0000, Kevin 'ldir' Darbyshire-Bryant wrote:
>> --- /dev/null
>> +++ b/net/sched/act_ctinfo.c
>> @@ -0,0 +1,407 @@
>> +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
> 

Hey Greg, thanks for the review.

> How can a .c file, buried in the kernel tree, have a Linux-syscall-note
> exception to it?

Because I’m a moron and nobody else spotted it.
> 
> Are you _sure_ that is ok?  That license should only be for files in the
> uapi header directory.

Expect a v3 shortly.  I shall wear your chastisement with honour :-)


Cheers,

Kevin D-B

gpg: 012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A


* Re: [net-next v2] net: sched: Introduce act_ctinfo action
  2019-05-05 10:32       ` Kevin 'ldir' Darbyshire-Bryant
@ 2019-05-05 10:43         ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-05-05 10:43 UTC (permalink / raw)
  To: Kevin 'ldir' Darbyshire-Bryant, Greg KH
  Cc: xiyou.wangcong, davem, jhs, jiri, linux-kernel, linux-kselftest,
	netdev, shuah

Kevin 'ldir' Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk> writes:

>> On 5 May 2019, at 11:23, Greg KH <greg@kroah.com> wrote:
>> 
>> On Sun, May 05, 2019 at 10:15:43AM +0000, Kevin 'ldir' Darbyshire-Bryant wrote:
>>> --- /dev/null
>>> +++ b/net/sched/act_ctinfo.c
>>> @@ -0,0 +1,407 @@
>>> +// SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
>> 
>
> Hey Greg, thanks for the review.
>
>> How can a .c file, buried in the kernel tree, have a Linux-syscall-note
>> exception to it?
>
> Because I’m a moron and nobody else spotted it.
>> 
>> Are you _sure_ that is ok?  That license should only be for files in the
>> uapi header directory.
>
> Expect a v3 shortly.  I shall wear your chastisement with honour :-)

While you're at it, you don't actually need the GPL text blob when using
the SPDX headers... :)

-Toke

* [net-next v3] net: sched: Introduce act_ctinfo action
  2019-05-05 10:15   ` [net-next v2] " Kevin 'ldir' Darbyshire-Bryant
  2019-05-05 10:23     ` Greg KH
@ 2019-05-05 13:20     ` Kevin 'ldir' Darbyshire-Bryant
  2019-05-07 19:39       ` David Miller
  1 sibling, 1 reply; 13+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-05-05 13:20 UTC (permalink / raw)
  To: Kevin 'ldir' Darbyshire-Bryant
  Cc: davem, jhs, jiri, linux-kernel, linux-kselftest, netdev, shuah,
	xiyou.wangcong

ctinfo is a new tc filter action module.  It is designed to restore
information contained in conntrack marks to other places.  At present it
can restore DSCP values to IPv4/6 diffserv fields and also copy
conntrack marks to skb marks.  As such, the second function effectively
replaces the existing act_connmark module.

The DSCP restoration is intended for, and has been found useful for,
restoring ingress classifications based on egress classifications across
links that bleach or otherwise change DSCP, typically home ISP Internet
links.  Restoring DSCP on ingress on the WAN link allows qdiscs such as
CAKE to shape inbound packets according to policies that are easier to
indicate on egress.

Ingress classification is traditionally a challenging task since
iptables rules have not yet run and tc filter/eBPF programs see packets
before NAT, so they cannot see the internal IPv4 addresses used behind a
typical home masquerading gateway.

ctinfo understands the following parameters:

dscp dscpmask[/statemask]

dscpmask - a 32-bit mask of at least 6 contiguous bits that indicates
where ctinfo will find the DSCP bits stored in the conntrack mark.

statemask - a 32-bit mask of (usually) 1 bit length, outside the area
specified by dscpmask.  This represents a conditional operation flag
whereby the DSCP is only restored if the flag is set.  This is useful to
implement a 'one shot' iptables based classification where the
'complicated' iptables rules are only run once to classify the
connection on the initial (egress) packet and subsequent packets are all
marked/restored with the same DSCP.  A mask of zero disables the
conditional behaviour, i.e. the conntrack mark DSCP bits are always
restored to the IP diffserv field (assuming the conntrack entry is found
and the skb is IPv4 or IPv6).
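
The mask rules above can be modelled in a few lines.  The sketch below
is a userspace approximation of the checks tcf_ctinfo_init() performs,
not code from the patch: the helper name ctinfo_dscp_masks_ok() is
hypothetical, and the GCC/Clang builtin __builtin_ctz() stands in for
the kernel's __ffs().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the dscpmask/statemask checks in
 * tcf_ctinfo_init(); __builtin_ctz() plays the role of __ffs(). */
static bool ctinfo_dscp_masks_ok(uint32_t dscpmask, uint32_t statemask)
{
	unsigned int shift;

	if (dscpmask == 0)
		return false;		/* no DSCP bits to restore from */

	shift = (unsigned int)__builtin_ctz(dscpmask);

	/* the six bits starting at the lowest set bit must all be set */
	if (((dscpmask >> shift) & 0x3fu) != 0x3fu)
		return false;

	/* the state flag must live outside the DSCP field */
	return (dscpmask & statemask) == 0;
}
```

Under these rules dscp 0xfc000000/0x01000000 is accepted, 0xf0000000 is
rejected (only four bits set), and 0xfc000000/0x04000000 is rejected
because the flag bit overlaps the DSCP field.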

mark [mask]

mark - enables copying the conntrack connmark value to the skb mark

mask - a 32-bit mask applied to the mark to mask out bits unwanted for
restoration.  The CAKE qdisc, for example, understands both DSCP and a
'tin' classification stored in the mark, so act_ctinfo may be used to
restore both aspects of classification for CAKE in one action.  A
default mask of 0xffffffff is applied if none is specified.

zone - conntrack zone

control - action related control (reclassify | pipe | drop | continue |
ok | goto chain <CHAIN_INDEX>)

e.g. dscp 0xfc000000/0x01000000

|----0xFC----conntrack mark----000000---|
| Bits 31-26 | bit 25 | bit24 |~~~ Bit 0|
| DSCP       | unused | flag  |unused   |
|-----------------------0x01---000000---|
      |                   |
      |                   |
      ---|             Conditional flag
         v             only restore if set
|-ip diffserv-|
| 6 bits      |
|-------------|
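
To make the diagram concrete, here is a userspace model of the
arithmetic tcf_ctinfo_dscp_set() performs for dscp
0xfc000000/0x01000000.  It is a sketch, not the patch code: the helper
name is hypothetical, and the conntrack mark 0xb9000000 used below is an
illustrative assumption (DSCP 46 in bits 31-26, flag set in bit 24).  As
with ipv4_change_dsfield()/ipv6_change_dsfield() called with
INET_ECN_MASK, the two ECN bits of the existing byte are preserved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ECN_BITS 0x03u	/* low two bits of the diffserv byte */

/* Hypothetical model of the DSCP restore step.  dscpmask must be
 * non-zero (checked at configuration time).  Returns true if
 * *dsfield was rewritten. */
static bool ctinfo_model_dscp(uint32_t ctmark, uint32_t dscpmask,
			      uint32_t statemask, uint8_t *dsfield)
{
	unsigned int shift = (unsigned int)__builtin_ctz(dscpmask);
	uint8_t newdscp;

	/* a zero statemask means "always restore" */
	if (statemask && !(ctmark & statemask))
		return false;

	/* extract the DSCP and move it into bits 7-2 of the byte */
	newdscp = (uint8_t)((((ctmark & dscpmask) >> shift) << 2) &
			    ~ECN_BITS);

	if ((*dsfield & ~ECN_BITS) == newdscp)
		return false;	/* already correct, no write needed */

	*dsfield = (uint8_t)((*dsfield & ECN_BITS) | newdscp);
	return true;
}
```

For example, a diffserv byte of 0x01 (DSCP 0, ECT(1)) becomes 0xb9 with
mark 0xb9000000: DSCP 46 in bits 7-2, original ECN bits kept.  With the
flag bit clear (mark 0xb8000000) nothing is rewritten unless statemask
is zero.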

e.g. mark 0x00ffffff

|----0x00----conntrack mark----ffffff---|
| Bits 31-24 |                          |
| DSCP & flag|                          |
|---------------------------------------|
			|
			|
			v
|------------skb mark-------------------|
|                                       |
|                                       |
|---------------------------------------|
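
The mark copy is a single masked assignment: tcf_ctinfo_mark_set()
sets skb->mark = ct->mark & cp->markmask, with markmask defaulting to
~0 when no mask is supplied.  A minimal model, with a hypothetical
helper name and an illustrative conntrack mark value:

```c
#include <assert.h>
#include <stdint.h>

#define CTINFO_DEFAULT_MARKMASK 0xffffffffu	/* used when no mask given */

/* Hypothetical model of the mark restore: the skb mark is overwritten
 * (not OR-ed) with the masked conntrack mark. */
static uint32_t ctinfo_model_mark(uint32_t ctmark, uint32_t markmask)
{
	return ctmark & markmask;
}
```

With mark 0x00ffffff and a conntrack mark of 0xb9000003, the DSCP and
flag bits in the top byte are dropped and the skb mark becomes
0x00000003; with the default mask the whole conntrack mark is copied.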

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
---
v2 - add equivalent connmark functionality with an enhancement
     to accept a mask
     pass statistics for each sub-function as individual netlink
     attributes and stop (ab)using overlimits, drops
     update the testing config correctly
v3 - fix a licensing silly & tidy up GPL boilerplate

 include/net/tc_act/tc_ctinfo.h            |  28 ++
 include/uapi/linux/pkt_cls.h              |   1 +
 include/uapi/linux/tc_act/tc_ctinfo.h     |  43 +++
 net/sched/Kconfig                         |  17 +
 net/sched/Makefile                        |   1 +
 net/sched/act_ctinfo.c                    | 402 ++++++++++++++++++++++
 tools/testing/selftests/tc-testing/config |   1 +
 7 files changed, 493 insertions(+)
 create mode 100644 include/net/tc_act/tc_ctinfo.h
 create mode 100644 include/uapi/linux/tc_act/tc_ctinfo.h
 create mode 100644 net/sched/act_ctinfo.c

diff --git a/include/net/tc_act/tc_ctinfo.h b/include/net/tc_act/tc_ctinfo.h
new file mode 100644
index 000000000000..87334120dcb6
--- /dev/null
+++ b/include/net/tc_act/tc_ctinfo.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __NET_TC_CTINFO_H
+#define __NET_TC_CTINFO_H
+
+#include <net/act_api.h>
+
+struct tcf_ctinfo_params {
+	struct net *net;
+	u32 dscpmask;
+	u32 dscpstatemask;
+	u32 markmask;
+	u16 zone;
+	u8 mode;
+	u8 dscpmaskshift;
+	struct rcu_head rcu;
+};
+
+struct tcf_ctinfo {
+	struct tc_action common;
+	struct tcf_ctinfo_params __rcu *params;
+	u64 stats_dscp_set;
+	u64 stats_dscp_error;
+	u64 stats_mark_set;
+};
+
+#define to_ctinfo(a) ((struct tcf_ctinfo *)a)
+
+#endif /* __NET_TC_CTINFO_H */
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 51a0496f78ea..a93680fc4bfa 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -105,6 +105,7 @@ enum tca_id {
 	TCA_ID_IFE = TCA_ACT_IFE,
 	TCA_ID_SAMPLE = TCA_ACT_SAMPLE,
 	/* other actions go here */
+	TCA_ID_CTINFO,
 	__TCA_ID_MAX = 255
 };
 
diff --git a/include/uapi/linux/tc_act/tc_ctinfo.h b/include/uapi/linux/tc_act/tc_ctinfo.h
new file mode 100644
index 000000000000..8d254b82151c
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_ctinfo.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef __UAPI_TC_CTINFO_H
+#define __UAPI_TC_CTINFO_H
+
+#include <linux/types.h>
+#include <linux/pkt_cls.h>
+
+struct tc_ctinfo {
+	tc_gen;
+};
+
+struct tc_ctinfo_dscp {
+	__u32 mask;
+	__u32 statemask;
+};
+
+struct tc_ctinfo_stats_dscp {
+	__u64 set;
+	__u64 error;
+};
+
+enum {
+	TCA_CTINFO_UNSPEC,
+	TCA_CTINFO_ACT,
+	TCA_CTINFO_ZONE,
+	TCA_CTINFO_DSCP_PARMS,
+	TCA_CTINFO_MARK_MASK,
+	TCA_CTINFO_MODE_DSCP,
+	TCA_CTINFO_MODE_MARK,
+	TCA_CTINFO_STATS_DSCP,
+	TCA_CTINFO_STATS_MARK,
+	TCA_CTINFO_TM,
+	TCA_CTINFO_PAD,
+	__TCA_CTINFO_MAX
+};
+#define TCA_CTINFO_MAX (__TCA_CTINFO_MAX - 1)
+
+enum {
+	CTINFO_MODE_SETDSCP	= BIT(0),
+	CTINFO_MODE_SETMARK	= BIT(1)
+};
+
+#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 5c02ad97ef23..f5773effcfdc 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -876,6 +876,23 @@ config NET_ACT_CONNMARK
 	  To compile this code as a module, choose M here: the
 	  module will be called act_connmark.
 
+config NET_ACT_CTINFO
+        tristate "Netfilter Connection Mark Actions"
+        depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
+        depends on NF_CONNTRACK && NF_CONNTRACK_MARK
+        help
+	  Say Y here to allow transfer of a connmark stored information.
+	  Current actions transfer connmark stored DSCP into
+	  ipv4/v6 diffserv and/or to transfer connmark to packet
+	  mark.  Both are useful for restoring egress based marks
+	  back onto ingress connections for qdisc priority mapping
+	  purposes.
+
+	  If unsure, say N.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called act_ctinfo.
+
 config NET_ACT_SKBMOD
         tristate "skb data modification action"
         depends on NET_CLS_ACT
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 8a40431d7b5c..d54bfcbd7981 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -21,6 +21,7 @@ obj-$(CONFIG_NET_ACT_CSUM)	+= act_csum.o
 obj-$(CONFIG_NET_ACT_VLAN)	+= act_vlan.o
 obj-$(CONFIG_NET_ACT_BPF)	+= act_bpf.o
 obj-$(CONFIG_NET_ACT_CONNMARK)	+= act_connmark.o
+obj-$(CONFIG_NET_ACT_CTINFO)	+= act_ctinfo.o
 obj-$(CONFIG_NET_ACT_SKBMOD)	+= act_skbmod.o
 obj-$(CONFIG_NET_ACT_IFE)	+= act_ife.o
 obj-$(CONFIG_NET_IFE_SKBMARK)	+= act_meta_mark.o
diff --git a/net/sched/act_ctinfo.c b/net/sched/act_ctinfo.c
new file mode 100644
index 000000000000..261807b222ca
--- /dev/null
+++ b/net/sched/act_ctinfo.c
@@ -0,0 +1,402 @@
+// SPDX-License-Identifier: GPL-2.0+
+/* net/sched/act_ctinfo.c  netfilter ctinfo connmark actions
+ *
+ * Copyright (c) 2019 Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <linux/pkt_cls.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <net/act_api.h>
+#include <net/pkt_cls.h>
+#include <uapi/linux/tc_act/tc_ctinfo.h>
+#include <net/tc_act/tc_ctinfo.h>
+
+#include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_ecache.h>
+#include <net/netfilter/nf_conntrack_zones.h>
+
+static unsigned int ctinfo_net_id;
+static struct tc_action_ops act_ctinfo_ops;
+
+static void tcf_ctinfo_dscp_set(struct nf_conn *ct, struct tcf_ctinfo *ca,
+				struct tcf_ctinfo_params *cp,
+				struct sk_buff *skb, int wlen, int proto)
+{
+	u8 dscp, newdscp;
+
+	newdscp = (((ct->mark & cp->dscpmask) >> cp->dscpmaskshift) << 2) &
+		     ~INET_ECN_MASK;
+
+	switch (proto) {
+	case NFPROTO_IPV4:
+		dscp = ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK;
+		if (dscp != newdscp) {
+			if (likely(!skb_try_make_writable(skb, wlen))) {
+				ipv4_change_dsfield(ip_hdr(skb),
+						    INET_ECN_MASK,
+						    newdscp);
+				ca->stats_dscp_set++;
+			} else {
+				ca->stats_dscp_error++;
+			}
+		}
+		break;
+	case NFPROTO_IPV6:
+		dscp = ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK;
+		if (dscp != newdscp) {
+			if (likely(!skb_try_make_writable(skb, wlen))) {
+				ipv6_change_dsfield(ipv6_hdr(skb),
+						    INET_ECN_MASK,
+						    newdscp);
+				ca->stats_dscp_set++;
+			} else {
+				ca->stats_dscp_error++;
+			}
+		}
+		break;
+	default:
+		break;
+	}
+}
+
+static void tcf_ctinfo_mark_set(struct nf_conn *ct, struct tcf_ctinfo *ca,
+				struct tcf_ctinfo_params *cp,
+				struct sk_buff *skb)
+{
+	ca->stats_mark_set++;
+	skb->mark = ct->mark & cp->markmask;
+}
+
+static int tcf_ctinfo_act(struct sk_buff *skb, const struct tc_action *a,
+			  struct tcf_result *res)
+{
+	const struct nf_conntrack_tuple_hash *thash = NULL;
+	struct nf_conntrack_tuple tuple;
+	enum ip_conntrack_info ctinfo;
+	struct tcf_ctinfo *ca = to_ctinfo(a);
+	struct tcf_ctinfo_params *cp;
+	struct nf_conntrack_zone zone;
+	struct nf_conn *ct;
+	int proto, wlen;
+	int action;
+
+	cp = rcu_dereference_bh(ca->params);
+
+	tcf_lastuse_update(&ca->tcf_tm);
+	bstats_update(&ca->tcf_bstats, skb);
+	action = READ_ONCE(ca->tcf_action);
+
+	wlen = skb_network_offset(skb);
+	if (tc_skb_protocol(skb) == htons(ETH_P_IP)) {
+		wlen += sizeof(struct iphdr);
+		if (!pskb_may_pull(skb, wlen))
+			goto out;
+
+		proto = NFPROTO_IPV4;
+	} else if (tc_skb_protocol(skb) == htons(ETH_P_IPV6)) {
+		wlen += sizeof(struct ipv6hdr);
+		if (!pskb_may_pull(skb, wlen))
+			goto out;
+
+		proto = NFPROTO_IPV6;
+	} else {
+		goto out;
+	}
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (!ct) { /* look harder, usually ingress */
+		if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
+				       proto, cp->net, &tuple))
+			goto out;
+		zone.id = cp->zone;
+		zone.dir = NF_CT_DEFAULT_ZONE_DIR;
+
+		thash = nf_conntrack_find_get(cp->net, &zone, &tuple);
+		if (!thash)
+			goto out;
+
+		ct = nf_ct_tuplehash_to_ctrack(thash);
+	}
+
+	if (cp->mode & CTINFO_MODE_SETDSCP)
+		if (!cp->dscpstatemask || (ct->mark & cp->dscpstatemask))
+			tcf_ctinfo_dscp_set(ct, ca, cp, skb, wlen, proto);
+
+	if (cp->mode & CTINFO_MODE_SETMARK)
+		tcf_ctinfo_mark_set(ct, ca, cp, skb);
+
+	if (thash)
+		nf_ct_put(ct);
+out:
+	return action;
+}
+
+static const struct nla_policy ctinfo_policy[TCA_CTINFO_MAX + 1] = {
+	[TCA_CTINFO_ACT] = { .len = sizeof(struct tc_ctinfo) },
+	[TCA_CTINFO_ZONE] = { .type = NLA_U16 },
+	[TCA_CTINFO_MODE_DSCP] = { .type = NLA_FLAG },
+	[TCA_CTINFO_MODE_MARK] = { .type = NLA_FLAG },
+	[TCA_CTINFO_DSCP_PARMS] = { .len = sizeof(struct tc_ctinfo_dscp) },
+};
+
+static int tcf_ctinfo_init(struct net *net, struct nlattr *nla,
+			   struct nlattr *est, struct tc_action **a,
+			   int ovr, int bind, bool rtnl_held,
+			   struct tcf_proto *tp,
+			   struct netlink_ext_ack *extack)
+{
+	struct tc_action_net *tn = net_generic(net, ctinfo_net_id);
+	struct tcf_ctinfo_params *cp_new;
+	struct nlattr *tb[TCA_CTINFO_MAX + 1];
+	struct tcf_chain *goto_ch = NULL;
+	struct tcf_ctinfo *ci;
+	struct tc_ctinfo *actparm;
+	struct tc_ctinfo_dscp *dscpparm;
+	int ret = 0, err, i;
+
+	if (!nla)
+		return -EINVAL;
+
+	err = nla_parse_nested(tb, TCA_CTINFO_MAX, nla, ctinfo_policy, NULL);
+	if (err < 0)
+		return err;
+
+	if (!tb[TCA_CTINFO_ACT])
+		return -EINVAL;
+
+	if (tb[TCA_CTINFO_MODE_DSCP] && !tb[TCA_CTINFO_DSCP_PARMS])
+		return -EINVAL;
+
+	actparm = nla_data(tb[TCA_CTINFO_ACT]);
+	dscpparm = nla_data(tb[TCA_CTINFO_DSCP_PARMS]);
+
+	if (dscpparm) {
+		/* need at least contiguous 6 bit mask */
+		i = dscpparm->mask ? __ffs(dscpparm->mask) : 0;
+		if ((0x3f & (dscpparm->mask >> i)) != 0x3f)
+			return -EINVAL;
+		/* mask & statemask must not overlap */
+		if (dscpparm->mask & dscpparm->statemask)
+			return -EINVAL;
+	}
+
+	/* done the validation:now to the actual action allocation */
+	err = tcf_idr_check_alloc(tn, &actparm->index, a, bind);
+	if (!err) {
+		ret = tcf_idr_create(tn, actparm->index, est, a,
+				     &act_ctinfo_ops, bind, false);
+		if (ret) {
+			tcf_idr_cleanup(tn, actparm->index);
+			return ret;
+		}
+	} else if (err > 0) {
+		if (bind) /* don't override defaults */
+			return 0;
+		if (!ovr) {
+			tcf_idr_release(*a, bind);
+			return -EEXIST;
+		}
+	} else {
+		return err;
+	}
+
+	err = tcf_action_check_ctrlact(actparm->action, tp, &goto_ch, extack);
+	if (err < 0)
+		goto release_idr;
+
+	ci = to_ctinfo(*a);
+
+	cp_new = kzalloc(sizeof(*cp_new), GFP_KERNEL);
+	if (unlikely(!cp_new)) {
+		err = -ENOMEM;
+		goto put_chain;
+	}
+
+	cp_new->net = net;
+	cp_new->zone = tb[TCA_CTINFO_ZONE] ?
+			nla_get_u16(tb[TCA_CTINFO_ZONE]) : 0;
+	if (dscpparm) {
+		cp_new->dscpmask = dscpparm->mask;
+		cp_new->dscpmaskshift = cp_new->dscpmask ?
+				__ffs(cp_new->dscpmask) : 0;
+		cp_new->dscpstatemask = dscpparm->statemask;
+	}
+	cp_new->markmask = tb[TCA_CTINFO_MARK_MASK] ?
+			nla_get_u32(tb[TCA_CTINFO_MARK_MASK]) : ~0;
+
+	if (tb[TCA_CTINFO_MODE_DSCP])
+		cp_new->mode |= CTINFO_MODE_SETDSCP;
+	else
+		cp_new->mode &= ~CTINFO_MODE_SETDSCP;
+
+	if (tb[TCA_CTINFO_MODE_MARK])
+		cp_new->mode |= CTINFO_MODE_SETMARK;
+	else
+		cp_new->mode &= ~CTINFO_MODE_SETMARK;
+
+	spin_lock_bh(&ci->tcf_lock);
+	goto_ch = tcf_action_set_ctrlact(*a, actparm->action, goto_ch);
+	rcu_swap_protected(ci->params, cp_new,
+			   lockdep_is_held(&ci->tcf_lock));
+	spin_unlock_bh(&ci->tcf_lock);
+
+	if (goto_ch)
+		tcf_chain_put_by_act(goto_ch);
+	if (cp_new)
+		kfree_rcu(cp_new, rcu);
+
+	if (ret == ACT_P_CREATED)
+		tcf_idr_insert(tn, *a);
+
+	return ret;
+
+put_chain:
+	if (goto_ch)
+		tcf_chain_put_by_act(goto_ch);
+release_idr:
+	tcf_idr_release(*a, bind);
+	return err;
+}
+
+static inline int tcf_ctinfo_dump(struct sk_buff *skb, struct tc_action *a,
+				  int bind, int ref)
+{
+	unsigned char *b = skb_tail_pointer(skb);
+	struct tcf_ctinfo *ci = to_ctinfo(a);
+	struct tcf_ctinfo_params *cp;
+	struct tc_ctinfo opt = {
+		.index   = ci->tcf_index,
+		.refcnt  = refcount_read(&ci->tcf_refcnt) - ref,
+		.bindcnt = atomic_read(&ci->tcf_bindcnt) - bind,
+	};
+	struct tcf_t t;
+	struct tc_ctinfo_dscp dscpparm;
+	struct tc_ctinfo_stats_dscp dscpstats;
+
+	spin_lock_bh(&ci->tcf_lock);
+	cp = rcu_dereference_protected(ci->params,
+				       lockdep_is_held(&ci->tcf_lock));
+	opt.action = ci->tcf_action;
+
+	if (nla_put(skb, TCA_CTINFO_ACT, sizeof(opt), &opt))
+		goto nla_put_failure;
+
+	if (cp->mode & CTINFO_MODE_SETDSCP) {
+		dscpparm.mask = cp->dscpmask;
+		dscpparm.statemask = cp->dscpstatemask;
+		if (nla_put(skb, TCA_CTINFO_DSCP_PARMS, sizeof(dscpparm),
+			    &dscpparm))
+			goto nla_put_failure;
+
+		if (nla_put_flag(skb, TCA_CTINFO_MODE_DSCP))
+			goto nla_put_failure;
+
+		dscpstats.set = ci->stats_dscp_set;
+		dscpstats.error = ci->stats_dscp_error;
+		if (nla_put(skb, TCA_CTINFO_STATS_DSCP, sizeof(dscpstats),
+			    &dscpstats))
+			goto nla_put_failure;
+	}
+
+	if (cp->mode & CTINFO_MODE_SETMARK) {
+		if (nla_put_u32(skb, TCA_CTINFO_MARK_MASK, cp->markmask))
+			goto nla_put_failure;
+
+		if (nla_put_flag(skb, TCA_CTINFO_MODE_MARK))
+			goto nla_put_failure;
+
+		if (nla_put_u64_64bit(skb, TCA_CTINFO_STATS_MARK,
+				      ci->stats_mark_set, TCA_CTINFO_PAD))
+			goto nla_put_failure;
+	}
+
+	if (cp->zone) {
+		if (nla_put_u16(skb, TCA_CTINFO_ZONE, cp->zone))
+			goto nla_put_failure;
+	}
+
+	tcf_tm_dump(&t, &ci->tcf_tm);
+	if (nla_put_64bit(skb, TCA_CTINFO_TM, sizeof(t), &t, TCA_CTINFO_PAD))
+		goto nla_put_failure;
+
+	spin_unlock_bh(&ci->tcf_lock);
+	return skb->len;
+
+nla_put_failure:
+	spin_unlock_bh(&ci->tcf_lock);
+	nlmsg_trim(skb, b);
+	return -1;
+}
+
+static int tcf_ctinfo_walker(struct net *net, struct sk_buff *skb,
+			     struct netlink_callback *cb, int type,
+			     const struct tc_action_ops *ops,
+			     struct netlink_ext_ack *extack)
+{
+	struct tc_action_net *tn = net_generic(net, ctinfo_net_id);
+
+	return tcf_generic_walker(tn, skb, cb, type, ops, extack);
+}
+
+static int tcf_ctinfo_search(struct net *net, struct tc_action **a, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, ctinfo_net_id);
+
+	return tcf_idr_search(tn, a, index);
+}
+
+static struct tc_action_ops act_ctinfo_ops = {
+	.kind		=	"ctinfo",
+	.id		=	TCA_ID_CTINFO,
+	.owner		=	THIS_MODULE,
+	.act		=	tcf_ctinfo_act,
+	.dump		=	tcf_ctinfo_dump,
+	.init		=	tcf_ctinfo_init,
+	.walk		=	tcf_ctinfo_walker,
+	.lookup		=	tcf_ctinfo_search,
+	.size		=	sizeof(struct tcf_ctinfo),
+};
+
+static __net_init int ctinfo_init_net(struct net *net)
+{
+	struct tc_action_net *tn = net_generic(net, ctinfo_net_id);
+
+	return tc_action_net_init(tn, &act_ctinfo_ops);
+}
+
+static void __net_exit ctinfo_exit_net(struct list_head *net_list)
+{
+	tc_action_net_exit(net_list, ctinfo_net_id);
+}
+
+static struct pernet_operations ctinfo_net_ops = {
+	.init = ctinfo_init_net,
+	.exit_batch = ctinfo_exit_net,
+	.id   = &ctinfo_net_id,
+	.size = sizeof(struct tc_action_net),
+};
+
+static int __init ctinfo_init_module(void)
+{
+	return tcf_register_action(&act_ctinfo_ops, &ctinfo_net_ops);
+}
+
+static void __exit ctinfo_cleanup_module(void)
+{
+	tcf_unregister_action(&act_ctinfo_ops, &ctinfo_net_ops);
+}
+
+module_init(ctinfo_init_module);
+module_exit(ctinfo_cleanup_module);
+MODULE_AUTHOR("Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>");
+MODULE_DESCRIPTION("Connection tracking mark actions");
+MODULE_LICENSE("GPL");
diff --git a/tools/testing/selftests/tc-testing/config b/tools/testing/selftests/tc-testing/config
index 203302065458..b235efd55367 100644
--- a/tools/testing/selftests/tc-testing/config
+++ b/tools/testing/selftests/tc-testing/config
@@ -38,6 +38,7 @@ CONFIG_NET_ACT_CSUM=m
 CONFIG_NET_ACT_VLAN=m
 CONFIG_NET_ACT_BPF=m
 CONFIG_NET_ACT_CONNMARK=m
+CONFIG_NET_ACT_CTINFO=m
 CONFIG_NET_ACT_SKBMOD=m
 CONFIG_NET_ACT_IFE=m
 CONFIG_NET_ACT_TUNNEL_KEY=m
-- 
2.20.1 (Apple Git-117)


* Re: [net-next v3] net: sched: Introduce act_ctinfo action
  2019-05-05 13:20     ` [net-next v3] " Kevin 'ldir' Darbyshire-Bryant
@ 2019-05-07 19:39       ` David Miller
  2019-05-07 20:12         ` [PATCH net-next v4] " Kevin 'ldir' Darbyshire-Bryant
  2019-05-07 20:18         ` [net-next v3] " Kevin 'ldir' Darbyshire-Bryant
  0 siblings, 2 replies; 13+ messages in thread
From: David Miller @ 2019-05-07 19:39 UTC (permalink / raw)
  To: ldir
  Cc: jhs, jiri, linux-kernel, linux-kselftest, netdev, shuah, xiyou.wangcong

From: Kevin 'ldir' Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Date: Sun, 5 May 2019 13:20:13 +0000

> ctinfo is a new tc filter action module.  It is designed to restore
> information contained in conntrack marks to other places.  At present it
> can restore DSCP values to IPv4/6 diffserv fields and also copy
> conntrack marks to skb marks.  As such the 2nd function effectively
> replaces the existing act_connmark module

This needs more time for review and therefore I'm deferring this to the
next merge window.

Also:

> +static int tcf_ctinfo_act(struct sk_buff *skb, const struct tc_action *a,
> +			  struct tcf_result *res)
> +{
> +	const struct nf_conntrack_tuple_hash *thash = NULL;
> +	struct nf_conntrack_tuple tuple;
> +	enum ip_conntrack_info ctinfo;
> +	struct tcf_ctinfo *ca = to_ctinfo(a);
> +	struct tcf_ctinfo_params *cp;
> +	struct nf_conntrack_zone zone;
> +	struct nf_conn *ct;
> +	int proto, wlen;
> +	int action;

Reverse christmas tree for these local variables please.

> +static int tcf_ctinfo_init(struct net *net, struct nlattr *nla,
> +			   struct nlattr *est, struct tc_action **a,
> +			   int ovr, int bind, bool rtnl_held,
> +			   struct tcf_proto *tp,
> +			   struct netlink_ext_ack *extack)
> +{
> +	struct tc_action_net *tn = net_generic(net, ctinfo_net_id);
> +	struct tcf_ctinfo_params *cp_new;
> +	struct nlattr *tb[TCA_CTINFO_MAX + 1];
> +	struct tcf_chain *goto_ch = NULL;
> +	struct tcf_ctinfo *ci;
> +	struct tc_ctinfo *actparm;
> +	struct tc_ctinfo_dscp *dscpparm;
> +	int ret = 0, err, i;

Likewise.

> +static inline int tcf_ctinfo_dump(struct sk_buff *skb, struct tc_action *a,
> +				  int bind, int ref)
> +{
> +	unsigned char *b = skb_tail_pointer(skb);
> +	struct tcf_ctinfo *ci = to_ctinfo(a);
> +	struct tcf_ctinfo_params *cp;
> +	struct tc_ctinfo opt = {
> +		.index   = ci->tcf_index,
> +		.refcnt  = refcount_read(&ci->tcf_refcnt) - ref,
> +		.bindcnt = atomic_read(&ci->tcf_bindcnt) - bind,
> +	};
> +	struct tcf_t t;
> +	struct tc_ctinfo_dscp dscpparm;
> +	struct tc_ctinfo_stats_dscp dscpstats;

Likewise.

Also, never use the inline keyword in foo.c files, always let the compiler
decide.
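As a rough illustration of the two style rules above (not code from the patch): a plain static function in a .c file, with no inline keyword, whose local declarations run from the longest line to the shortest. The struct and function names are hypothetical stand-ins chosen only to show the ordering.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in types, used only to show declaration order. */
struct example_stats {
	uint64_t set;
	uint64_t error;
};

struct example_action {
	struct example_stats stats;
	uint32_t index;
};

/* Plain "static", no "inline": in a .c file the compiler decides.
 * Locals run from the longest declaration line to the shortest
 * ("reverse christmas tree"). */
static uint64_t example_dump(const struct example_action *act,
			     uint32_t *index_out)
{
	const struct example_stats *stats = &act->stats;
	uint64_t total;
	uint32_t idx;

	total = stats->set + stats->error;
	idx = act->index;
	*index_out = idx;
	return total;
}
```

When an initialiser depends on another local, one common option is to keep the declaration in its sorted position and do the assignment in the function body instead.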


* [PATCH net-next v4] net: sched: Introduce act_ctinfo action
  2019-05-07 19:39       ` David Miller
@ 2019-05-07 20:12         ` Kevin 'ldir' Darbyshire-Bryant
  2019-05-08  0:20           ` David Miller
  2019-05-07 20:18         ` [net-next v3] " Kevin 'ldir' Darbyshire-Bryant
  1 sibling, 1 reply; 13+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-05-07 20:12 UTC
  To: davem
  Cc: jhs, jiri, Kevin 'ldir' Darbyshire-Bryant, linux-kernel,
	linux-kselftest, netdev, shuah, xiyou.wangcong

ctinfo is a new tc filter action module.  It is designed to restore
information contained in conntrack marks to other places.  At present it
can restore DSCP values to IPv4/6 diffserv fields and also copy
conntrack marks to skb marks.  As such, the second function effectively
replaces the existing act_connmark module.

The DSCP restoration is intended for, and has been found useful for,
restoring ingress classifications based on egress classifications across
links that bleach or otherwise change DSCP, typically home ISP Internet
links.  Restoring DSCP on ingress on the WAN link allows qdiscs such as
CAKE to shape inbound packets according to policies that are easier to
indicate on egress.

Ingress classification is traditionally challenging because iptables
rules have not yet run, and tc filter/eBPF programs perform pre-NAT
lookups, so they cannot see the internal IPv4 addresses used on a
typical home masquerading gateway.

ctinfo understands the following parameters:

dscp dscpmask[/statemask]

dscpmask - a 32-bit mask of at least 6 contiguous bits, indicating
where ctinfo will find the DSCP bits stored in the conntrack mark.

statemask - a 32-bit mask, usually 1 bit long, outside the area
specified by dscpmask.  This represents a conditional operation flag
whereby the DSCP is only restored if the flag is set.  This is useful to
implement a 'one shot' iptables-based classification where the
'complicated' iptables rules are only run once to classify the
connection on the initial (egress) packet, and subsequent packets are
all marked/restored with the same DSCP.  A mask of zero disables the
conditional behaviour, i.e. the conntrack mark DSCP bits are always
restored to the ip diffserv field (assuming the conntrack entry is found
and the skb is an ipv4/ipv6 type).
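A minimal userspace sketch of the dscpmask/statemask semantics described above, modelled on the validation in tcf_ctinfo_init() and the restore in tcf_ctinfo_dscp_set() in the patch. Assumptions: the function names are hypothetical, __builtin_ctz (GCC/Clang) stands in for the kernel's __ffs(), and the diffserv byte is passed in and returned rather than edited inside an skb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DSFIELD_ECN_MASK 0x03u /* low two bits of the diffserv byte */

/* Mirrors the patch's checks: the 6 bits starting at the lowest set bit
 * of dscpmask must all be set, and statemask must not overlap dscpmask. */
static bool ctinfo_masks_valid(uint32_t dscpmask, uint32_t statemask)
{
	unsigned int shift = dscpmask ? (unsigned int)__builtin_ctz(dscpmask) : 0;

	if (((dscpmask >> shift) & 0x3f) != 0x3f)
		return false;
	return (dscpmask & statemask) == 0;
}

/* Returns the diffserv byte after restoration.  The DSCP comes from the
 * conntrack mark; the ECN bits of the original byte are preserved. */
static uint8_t ctinfo_restore_dsfield(uint8_t dsfield, uint32_t ctmark,
				      uint32_t dscpmask, uint32_t statemask)
{
	unsigned int shift;
	uint8_t dscp;

	if (!dscpmask)
		return dsfield; /* DSCP restore not configured */
	/* non-zero statemask: only restore when the flag bit is set */
	if (statemask && !(ctmark & statemask))
		return dsfield;

	shift = (unsigned int)__builtin_ctz(dscpmask);
	dscp = (uint8_t)(((ctmark & dscpmask) >> shift) & 0x3f);
	return (uint8_t)((dscp << 2) | (dsfield & DSFIELD_ECN_MASK));
}
```

With dscp 0xfc000000/0x01000000, a conntrack mark of 0xb9000000 yields a diffserv byte of 0xb8 (DSCP 46), while 0xb8000000 (flag clear) leaves the packet untouched.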

mark [mask]

mark - enables copying the conntrack connmark value to the skb mark

mask - a 32-bit mask applied to the mark to mask out bits unwanted for
restoration.  The CAKE qdisc, for example, understands both DSCP and 'tin'
classification stored in the mark, thus act_ctinfo may be used to restore
both aspects of classification for CAKE in one action.  A default mask
of 0xffffffff is applied if not specified.

zone - conntrack zone

control - action related control (reclassify | pipe | drop | continue |
ok | goto chain <CHAIN_INDEX>)

e.g. dscp 0xfc000000/0x01000000

|----0xFC----conntrack mark----000000---|
| Bits 31-26 | bit 25 | bit24 |~~~ Bit 0|
| DSCP       | unused | flag  |unused   |
|-----------------------0x01---000000---|
      |                   |
      |                   |
      ---|             Conditional flag
         v             only restore if set
|-ip diffserv-|
| 6 bits      |
|-------------|
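To make the diagram concrete, a hypothetical helper pair for the 0xfc000000/0x01000000 layout. example_store_dscp() shows how an egress-side rule might pack a DSCP and the flag into the conntrack mark (ctinfo does not do this itself; per the description above, that is left to egress iptables rules), and example_dsfield() shows the diffserv byte ctinfo would derive from such a mark, with the ECN bits left at zero.

```c
#include <assert.h>
#include <stdint.h>

/* Layout from the example: DSCP in bits 31-26, one-shot flag in bit 24. */
#define EX_DSCPMASK  0xfc000000u
#define EX_STATEMASK 0x01000000u
#define EX_DSCPSHIFT 26u

/* Hypothetical egress-side helper: store a 6-bit DSCP and set the flag,
 * preserving all other mark bits. */
static uint32_t example_store_dscp(uint32_t ctmark, uint8_t dscp)
{
	ctmark &= ~(EX_DSCPMASK | EX_STATEMASK);
	ctmark |= ((uint32_t)(dscp & 0x3f) << EX_DSCPSHIFT) & EX_DSCPMASK;
	return ctmark | EX_STATEMASK;
}

/* Ingress-side view of the same layout: the diffserv byte ctinfo would
 * write, ECN bits shown as zero. */
static uint8_t example_dsfield(uint32_t ctmark)
{
	return (uint8_t)(((ctmark & EX_DSCPMASK) >> EX_DSCPSHIFT) << 2);
}
```

For example, storing DSCP 46 (EF) into a mark of 0x00123456 gives 0xb9123456, from which the diffserv byte 0xb8 is recovered.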

e.g. mark 0x00ffffff

|----0x00----conntrack mark----ffffff---|
| Bits 31-24 |                          |
| DSCP & flag|                          |
|---------------------------------------|
			|
			|
			v
|------------skb mark-------------------|
|                                       |
|                                       |
|---------------------------------------|
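A one-function model of the mark step, assuming a hypothetical ctinfo_restore_mark() helper. As in tcf_ctinfo_mark_set() in the patch, the skb mark is overwritten (not merged) with the conntrack mark ANDed with the mask, and the mask defaults to 0xffffffff when none is given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns the value the skb mark would be set to. */
static uint32_t ctinfo_restore_mark(uint32_t ctmark, bool have_mask,
				    uint32_t markmask)
{
	if (!have_mask)
		markmask = 0xffffffffu; /* default: copy the whole mark */
	return ctmark & markmask;
}
```

With mark 0x00ffffff, a conntrack mark of 0xb9123456 produces an skb mark of 0x00123456, so the DSCP and flag bits in the top byte do not leak into the skb mark.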

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
---
v2 - add equivalent connmark functionality with an enhancement
     to accept a mask
     pass statistics for each sub-function as individual netlink
     attributes and stop (ab)using overlimits, drops
     update the testing config correctly
v3 - fix a licensing silly & tidy up GPL boilerplate
v4 - drop stray copy paste inline
     reverse christmas tree local vars

 include/net/tc_act/tc_ctinfo.h            |  28 ++
 include/uapi/linux/pkt_cls.h              |   1 +
 include/uapi/linux/tc_act/tc_ctinfo.h     |  43 +++
 net/sched/Kconfig                         |  17 +
 net/sched/Makefile                        |   1 +
 net/sched/act_ctinfo.c                    | 402 ++++++++++++++++++++++
 tools/testing/selftests/tc-testing/config |   1 +
 7 files changed, 493 insertions(+)
 create mode 100644 include/net/tc_act/tc_ctinfo.h
 create mode 100644 include/uapi/linux/tc_act/tc_ctinfo.h
 create mode 100644 net/sched/act_ctinfo.c

diff --git a/include/net/tc_act/tc_ctinfo.h b/include/net/tc_act/tc_ctinfo.h
new file mode 100644
index 000000000000..87334120dcb6
--- /dev/null
+++ b/include/net/tc_act/tc_ctinfo.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __NET_TC_CTINFO_H
+#define __NET_TC_CTINFO_H
+
+#include <net/act_api.h>
+
+struct tcf_ctinfo_params {
+	struct net *net;
+	u32 dscpmask;
+	u32 dscpstatemask;
+	u32 markmask;
+	u16 zone;
+	u8 mode;
+	u8 dscpmaskshift;
+	struct rcu_head rcu;
+};
+
+struct tcf_ctinfo {
+	struct tc_action common;
+	struct tcf_ctinfo_params __rcu *params;
+	u64 stats_dscp_set;
+	u64 stats_dscp_error;
+	u64 stats_mark_set;
+};
+
+#define to_ctinfo(a) ((struct tcf_ctinfo *)a)
+
+#endif /* __NET_TC_CTINFO_H */
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 51a0496f78ea..a93680fc4bfa 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -105,6 +105,7 @@ enum tca_id {
 	TCA_ID_IFE = TCA_ACT_IFE,
 	TCA_ID_SAMPLE = TCA_ACT_SAMPLE,
 	/* other actions go here */
+	TCA_ID_CTINFO,
 	__TCA_ID_MAX = 255
 };
 
diff --git a/include/uapi/linux/tc_act/tc_ctinfo.h b/include/uapi/linux/tc_act/tc_ctinfo.h
new file mode 100644
index 000000000000..8d254b82151c
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_ctinfo.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef __UAPI_TC_CTINFO_H
+#define __UAPI_TC_CTINFO_H
+
+#include <linux/types.h>
+#include <linux/pkt_cls.h>
+
+struct tc_ctinfo {
+	tc_gen;
+};
+
+struct tc_ctinfo_dscp {
+	__u32 mask;
+	__u32 statemask;
+};
+
+struct tc_ctinfo_stats_dscp {
+	__u64 set;
+	__u64 error;
+};
+
+enum {
+	TCA_CTINFO_UNSPEC,
+	TCA_CTINFO_ACT,
+	TCA_CTINFO_ZONE,
+	TCA_CTINFO_DSCP_PARMS,
+	TCA_CTINFO_MARK_MASK,
+	TCA_CTINFO_MODE_DSCP,
+	TCA_CTINFO_MODE_MARK,
+	TCA_CTINFO_STATS_DSCP,
+	TCA_CTINFO_STATS_MARK,
+	TCA_CTINFO_TM,
+	TCA_CTINFO_PAD,
+	__TCA_CTINFO_MAX
+};
+#define TCA_CTINFO_MAX (__TCA_CTINFO_MAX - 1)
+
+enum {
+	CTINFO_MODE_SETDSCP	= BIT(0),
+	CTINFO_MODE_SETMARK	= BIT(1)
+};
+
+#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 5c02ad97ef23..f5773effcfdc 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -876,6 +876,23 @@ config NET_ACT_CONNMARK
 	  To compile this code as a module, choose M here: the
 	  module will be called act_connmark.
 
+config NET_ACT_CTINFO
+        tristate "Netfilter Connection Mark Actions"
+        depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
+        depends on NF_CONNTRACK && NF_CONNTRACK_MARK
+        help
+	  Say Y here to allow transfer of information stored in a
+	  connmark.  Current actions transfer DSCP stored in the connmark
+	  into the IPv4/v6 diffserv field and/or copy the connmark to the
+	  packet mark.  Both are useful for restoring egress based marks
+	  back onto ingress connections for qdisc priority mapping
+	  purposes.
+
+	  If unsure, say N.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called act_ctinfo.
+
 config NET_ACT_SKBMOD
         tristate "skb data modification action"
         depends on NET_CLS_ACT
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 8a40431d7b5c..d54bfcbd7981 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -21,6 +21,7 @@ obj-$(CONFIG_NET_ACT_CSUM)	+= act_csum.o
 obj-$(CONFIG_NET_ACT_VLAN)	+= act_vlan.o
 obj-$(CONFIG_NET_ACT_BPF)	+= act_bpf.o
 obj-$(CONFIG_NET_ACT_CONNMARK)	+= act_connmark.o
+obj-$(CONFIG_NET_ACT_CTINFO)	+= act_ctinfo.o
 obj-$(CONFIG_NET_ACT_SKBMOD)	+= act_skbmod.o
 obj-$(CONFIG_NET_ACT_IFE)	+= act_ife.o
 obj-$(CONFIG_NET_IFE_SKBMARK)	+= act_meta_mark.o
diff --git a/net/sched/act_ctinfo.c b/net/sched/act_ctinfo.c
new file mode 100644
index 000000000000..93f98d62a962
--- /dev/null
+++ b/net/sched/act_ctinfo.c
@@ -0,0 +1,402 @@
+// SPDX-License-Identifier: GPL-2.0+
+/* net/sched/act_ctinfo.c  netfilter ctinfo connmark actions
+ *
+ * Copyright (c) 2019 Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <linux/pkt_cls.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <net/act_api.h>
+#include <net/pkt_cls.h>
+#include <uapi/linux/tc_act/tc_ctinfo.h>
+#include <net/tc_act/tc_ctinfo.h>
+
+#include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_ecache.h>
+#include <net/netfilter/nf_conntrack_zones.h>
+
+static unsigned int ctinfo_net_id;
+static struct tc_action_ops act_ctinfo_ops;
+
+static void tcf_ctinfo_dscp_set(struct nf_conn *ct, struct tcf_ctinfo *ca,
+				struct tcf_ctinfo_params *cp,
+				struct sk_buff *skb, int wlen, int proto)
+{
+	u8 dscp, newdscp;
+
+	newdscp = (((ct->mark & cp->dscpmask) >> cp->dscpmaskshift) << 2) &
+		     ~INET_ECN_MASK;
+
+	switch (proto) {
+	case NFPROTO_IPV4:
+		dscp = ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK;
+		if (dscp != newdscp) {
+			if (likely(!skb_try_make_writable(skb, wlen))) {
+				ipv4_change_dsfield(ip_hdr(skb),
+						    INET_ECN_MASK,
+						    newdscp);
+				ca->stats_dscp_set++;
+			} else {
+				ca->stats_dscp_error++;
+			}
+		}
+		break;
+	case NFPROTO_IPV6:
+		dscp = ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK;
+		if (dscp != newdscp) {
+			if (likely(!skb_try_make_writable(skb, wlen))) {
+				ipv6_change_dsfield(ipv6_hdr(skb),
+						    INET_ECN_MASK,
+						    newdscp);
+				ca->stats_dscp_set++;
+			} else {
+				ca->stats_dscp_error++;
+			}
+		}
+		break;
+	default:
+		break;
+	}
+}
+
+static void tcf_ctinfo_mark_set(struct nf_conn *ct, struct tcf_ctinfo *ca,
+				struct tcf_ctinfo_params *cp,
+				struct sk_buff *skb)
+{
+	ca->stats_mark_set++;
+	skb->mark = ct->mark & cp->markmask;
+}
+
+static int tcf_ctinfo_act(struct sk_buff *skb, const struct tc_action *a,
+			  struct tcf_result *res)
+{
+	const struct nf_conntrack_tuple_hash *thash = NULL;
+	struct tcf_ctinfo *ca = to_ctinfo(a);
+	struct nf_conntrack_tuple tuple;
+	struct nf_conntrack_zone zone;
+	enum ip_conntrack_info ctinfo;
+	struct tcf_ctinfo_params *cp;
+	struct nf_conn *ct;
+	int proto, wlen;
+	int action;
+
+	cp = rcu_dereference_bh(ca->params);
+
+	tcf_lastuse_update(&ca->tcf_tm);
+	bstats_update(&ca->tcf_bstats, skb);
+	action = READ_ONCE(ca->tcf_action);
+
+	wlen = skb_network_offset(skb);
+	if (tc_skb_protocol(skb) == htons(ETH_P_IP)) {
+		wlen += sizeof(struct iphdr);
+		if (!pskb_may_pull(skb, wlen))
+			goto out;
+
+		proto = NFPROTO_IPV4;
+	} else if (tc_skb_protocol(skb) == htons(ETH_P_IPV6)) {
+		wlen += sizeof(struct ipv6hdr);
+		if (!pskb_may_pull(skb, wlen))
+			goto out;
+
+		proto = NFPROTO_IPV6;
+	} else {
+		goto out;
+	}
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (!ct) { /* look harder, usually ingress */
+		if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
+				       proto, cp->net, &tuple))
+			goto out;
+		zone.id = cp->zone;
+		zone.dir = NF_CT_DEFAULT_ZONE_DIR;
+
+		thash = nf_conntrack_find_get(cp->net, &zone, &tuple);
+		if (!thash)
+			goto out;
+
+		ct = nf_ct_tuplehash_to_ctrack(thash);
+	}
+
+	if (cp->mode & CTINFO_MODE_SETDSCP)
+		if (!cp->dscpstatemask || (ct->mark & cp->dscpstatemask))
+			tcf_ctinfo_dscp_set(ct, ca, cp, skb, wlen, proto);
+
+	if (cp->mode & CTINFO_MODE_SETMARK)
+		tcf_ctinfo_mark_set(ct, ca, cp, skb);
+
+	if (thash)
+		nf_ct_put(ct);
+out:
+	return action;
+}
+
+static const struct nla_policy ctinfo_policy[TCA_CTINFO_MAX + 1] = {
+	[TCA_CTINFO_ACT] = { .len = sizeof(struct tc_ctinfo) },
+	[TCA_CTINFO_ZONE] = { .type = NLA_U16 },
+	[TCA_CTINFO_MODE_DSCP] = { .type = NLA_FLAG },
+	[TCA_CTINFO_MODE_MARK] = { .type = NLA_FLAG },
+	[TCA_CTINFO_DSCP_PARMS] = { .len = sizeof(struct tc_ctinfo_dscp) },
+};
+
+static int tcf_ctinfo_init(struct net *net, struct nlattr *nla,
+			   struct nlattr *est, struct tc_action **a,
+			   int ovr, int bind, bool rtnl_held,
+			   struct tcf_proto *tp,
+			   struct netlink_ext_ack *extack)
+{
+	struct tc_action_net *tn = net_generic(net, ctinfo_net_id);
+	struct nlattr *tb[TCA_CTINFO_MAX + 1];
+	struct tcf_ctinfo_params *cp_new;
+	struct tcf_chain *goto_ch = NULL;
+	struct tc_ctinfo_dscp *dscpparm;
+	struct tcf_ctinfo *ci;
+	struct tc_ctinfo *actparm;
+	int ret = 0, err, i;
+
+	if (!nla)
+		return -EINVAL;
+
+	err = nla_parse_nested(tb, TCA_CTINFO_MAX, nla, ctinfo_policy, NULL);
+	if (err < 0)
+		return err;
+
+	if (!tb[TCA_CTINFO_ACT])
+		return -EINVAL;
+
+	if (tb[TCA_CTINFO_MODE_DSCP] && !tb[TCA_CTINFO_DSCP_PARMS])
+		return -EINVAL;
+	actparm = nla_data(tb[TCA_CTINFO_ACT]);
+	dscpparm = tb[TCA_CTINFO_DSCP_PARMS] ?
+		   nla_data(tb[TCA_CTINFO_DSCP_PARMS]) : NULL;
+
+	if (dscpparm) {
+		/* need a mask of at least 6 contiguous bits */
+		i = dscpparm->mask ? __ffs(dscpparm->mask) : 0;
+		if ((0x3f & (dscpparm->mask >> i)) != 0x3f)
+			return -EINVAL;
+		/* mask & statemask must not overlap */
+		if (dscpparm->mask & dscpparm->statemask)
+			return -EINVAL;
+	}
+	/* validation done: now allocate the action */
+	err = tcf_idr_check_alloc(tn, &actparm->index, a, bind);
+	if (!err) {
+		ret = tcf_idr_create(tn, actparm->index, est, a,
+				     &act_ctinfo_ops, bind, false);
+		if (ret) {
+			tcf_idr_cleanup(tn, actparm->index);
+			return ret;
+		}
+		ret = ACT_P_CREATED;
+	} else if (err > 0) {
+		if (bind) /* don't override defaults */
+			return 0;
+		if (!ovr) {
+			tcf_idr_release(*a, bind);
+			return -EEXIST;
+		}
+	} else {
+		return err;
+	}
+
+	err = tcf_action_check_ctrlact(actparm->action, tp, &goto_ch, extack);
+	if (err < 0)
+		goto release_idr;
+
+	ci = to_ctinfo(*a);
+
+	cp_new = kzalloc(sizeof(*cp_new), GFP_KERNEL);
+	if (unlikely(!cp_new)) {
+		err = -ENOMEM;
+		goto put_chain;
+	}
+
+	cp_new->net = net;
+	cp_new->zone = tb[TCA_CTINFO_ZONE] ?
+			nla_get_u16(tb[TCA_CTINFO_ZONE]) : 0;
+	if (dscpparm) {
+		cp_new->dscpmask = dscpparm->mask;
+		cp_new->dscpmaskshift = cp_new->dscpmask ?
+				__ffs(cp_new->dscpmask) : 0;
+		cp_new->dscpstatemask = dscpparm->statemask;
+	}
+	cp_new->markmask = tb[TCA_CTINFO_MARK_MASK] ?
+			nla_get_u32(tb[TCA_CTINFO_MARK_MASK]) : ~0;
+
+	if (tb[TCA_CTINFO_MODE_DSCP])
+		cp_new->mode |= CTINFO_MODE_SETDSCP;
+	else
+		cp_new->mode &= ~CTINFO_MODE_SETDSCP;
+
+	if (tb[TCA_CTINFO_MODE_MARK])
+		cp_new->mode |= CTINFO_MODE_SETMARK;
+	else
+		cp_new->mode &= ~CTINFO_MODE_SETMARK;
+
+	spin_lock_bh(&ci->tcf_lock);
+	goto_ch = tcf_action_set_ctrlact(*a, actparm->action, goto_ch);
+	rcu_swap_protected(ci->params, cp_new,
+			   lockdep_is_held(&ci->tcf_lock));
+	spin_unlock_bh(&ci->tcf_lock);
+
+	if (goto_ch)
+		tcf_chain_put_by_act(goto_ch);
+	if (cp_new)
+		kfree_rcu(cp_new, rcu);
+
+	if (ret == ACT_P_CREATED)
+		tcf_idr_insert(tn, *a);
+
+	return ret;
+
+put_chain:
+	if (goto_ch)
+		tcf_chain_put_by_act(goto_ch);
+release_idr:
+	tcf_idr_release(*a, bind);
+	return err;
+}
+
+static int tcf_ctinfo_dump(struct sk_buff *skb, struct tc_action *a,
+			   int bind, int ref)
+{
+	struct tcf_ctinfo *ci = to_ctinfo(a);
+	struct tc_ctinfo opt = {
+		.refcnt  = refcount_read(&ci->tcf_refcnt) - ref,
+		.bindcnt = atomic_read(&ci->tcf_bindcnt) - bind,
+		.index   = ci->tcf_index,
+	};
+	unsigned char *b = skb_tail_pointer(skb);
+	struct tc_ctinfo_stats_dscp dscpstats;
+	struct tc_ctinfo_dscp dscpparm;
+	struct tcf_ctinfo_params *cp;
+	struct tcf_t t;
+
+	spin_lock_bh(&ci->tcf_lock);
+	cp = rcu_dereference_protected(ci->params,
+				       lockdep_is_held(&ci->tcf_lock));
+	opt.action = ci->tcf_action;
+
+	if (nla_put(skb, TCA_CTINFO_ACT, sizeof(opt), &opt))
+		goto nla_put_failure;
+
+	if (cp->mode & CTINFO_MODE_SETDSCP) {
+		dscpparm.mask = cp->dscpmask;
+		dscpparm.statemask = cp->dscpstatemask;
+		if (nla_put(skb, TCA_CTINFO_DSCP_PARMS, sizeof(dscpparm),
+			    &dscpparm))
+			goto nla_put_failure;
+
+		if (nla_put_flag(skb, TCA_CTINFO_MODE_DSCP))
+			goto nla_put_failure;
+
+		dscpstats.set = ci->stats_dscp_set;
+		dscpstats.error = ci->stats_dscp_error;
+		if (nla_put(skb, TCA_CTINFO_STATS_DSCP, sizeof(dscpstats),
+			    &dscpstats))
+			goto nla_put_failure;
+	}
+
+	if (cp->mode & CTINFO_MODE_SETMARK) {
+		if (nla_put_u32(skb, TCA_CTINFO_MARK_MASK, cp->markmask))
+			goto nla_put_failure;
+
+		if (nla_put_flag(skb, TCA_CTINFO_MODE_MARK))
+			goto nla_put_failure;
+
+		if (nla_put_u64_64bit(skb, TCA_CTINFO_STATS_MARK,
+				      ci->stats_mark_set, TCA_CTINFO_PAD))
+			goto nla_put_failure;
+	}
+
+	if (cp->zone) {
+		if (nla_put_u16(skb, TCA_CTINFO_ZONE, cp->zone))
+			goto nla_put_failure;
+	}
+
+	tcf_tm_dump(&t, &ci->tcf_tm);
+	if (nla_put_64bit(skb, TCA_CTINFO_TM, sizeof(t), &t, TCA_CTINFO_PAD))
+		goto nla_put_failure;
+
+	spin_unlock_bh(&ci->tcf_lock);
+	return skb->len;
+
+nla_put_failure:
+	spin_unlock_bh(&ci->tcf_lock);
+	nlmsg_trim(skb, b);
+	return -1;
+}
+
+static int tcf_ctinfo_walker(struct net *net, struct sk_buff *skb,
+			     struct netlink_callback *cb, int type,
+			     const struct tc_action_ops *ops,
+			     struct netlink_ext_ack *extack)
+{
+	struct tc_action_net *tn = net_generic(net, ctinfo_net_id);
+
+	return tcf_generic_walker(tn, skb, cb, type, ops, extack);
+}
+
+static int tcf_ctinfo_search(struct net *net, struct tc_action **a, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, ctinfo_net_id);
+
+	return tcf_idr_search(tn, a, index);
+}
+
+static struct tc_action_ops act_ctinfo_ops = {
+	.kind		=	"ctinfo",
+	.id		=	TCA_ID_CTINFO,
+	.owner		=	THIS_MODULE,
+	.act		=	tcf_ctinfo_act,
+	.dump		=	tcf_ctinfo_dump,
+	.init		=	tcf_ctinfo_init,
+	.walk		=	tcf_ctinfo_walker,
+	.lookup		=	tcf_ctinfo_search,
+	.size		=	sizeof(struct tcf_ctinfo),
+};
+
+static __net_init int ctinfo_init_net(struct net *net)
+{
+	struct tc_action_net *tn = net_generic(net, ctinfo_net_id);
+
+	return tc_action_net_init(tn, &act_ctinfo_ops);
+}
+
+static void __net_exit ctinfo_exit_net(struct list_head *net_list)
+{
+	tc_action_net_exit(net_list, ctinfo_net_id);
+}
+
+static struct pernet_operations ctinfo_net_ops = {
+	.init = ctinfo_init_net,
+	.exit_batch = ctinfo_exit_net,
+	.id   = &ctinfo_net_id,
+	.size = sizeof(struct tc_action_net),
+};
+
+static int __init ctinfo_init_module(void)
+{
+	return tcf_register_action(&act_ctinfo_ops, &ctinfo_net_ops);
+}
+
+static void __exit ctinfo_cleanup_module(void)
+{
+	tcf_unregister_action(&act_ctinfo_ops, &ctinfo_net_ops);
+}
+
+module_init(ctinfo_init_module);
+module_exit(ctinfo_cleanup_module);
+MODULE_AUTHOR("Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>");
+MODULE_DESCRIPTION("Connection tracking mark actions");
+MODULE_LICENSE("GPL");
diff --git a/tools/testing/selftests/tc-testing/config b/tools/testing/selftests/tc-testing/config
index 203302065458..b235efd55367 100644
--- a/tools/testing/selftests/tc-testing/config
+++ b/tools/testing/selftests/tc-testing/config
@@ -38,6 +38,7 @@ CONFIG_NET_ACT_CSUM=m
 CONFIG_NET_ACT_VLAN=m
 CONFIG_NET_ACT_BPF=m
 CONFIG_NET_ACT_CONNMARK=m
+CONFIG_NET_ACT_CTINFO=m
 CONFIG_NET_ACT_SKBMOD=m
 CONFIG_NET_ACT_IFE=m
 CONFIG_NET_ACT_TUNNEL_KEY=m
-- 
2.20.1 (Apple Git-117)



* Re: [net-next v3] net: sched: Introduce act_ctinfo action
  2019-05-07 19:39       ` David Miller
  2019-05-07 20:12         ` [PATCH net-next v4] " Kevin 'ldir' Darbyshire-Bryant
@ 2019-05-07 20:18         ` Kevin 'ldir' Darbyshire-Bryant
  1 sibling, 0 replies; 13+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-05-07 20:18 UTC
  To: David Miller
  Cc: Jamal Hadi Salim, jiri, linux-kernel, linux-kselftest, netdev,
	shuah, xiyou.wangcong



> On 7 May 2019, at 20:39, David Miller <davem@davemloft.net> wrote:
> 
> From: Kevin 'ldir' Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
> Date: Sun, 5 May 2019 13:20:13 +0000
> 
>> ctinfo is a new tc filter action module.  It is designed to restore
>> information contained in conntrack marks to other places.  At present it
>> can restore DSCP values to IPv4/6 diffserv fields and also copy
>> conntrack marks to skb marks.  As such the 2nd function effectively
>> replaces the existing act_connmark module

Hi David,

> 
> This needs more time for review and therefore I'm deferring this to the
> next merge window.

Thank you.

> 
>> +static inline int tcf_ctinfo_dump(struct sk_buff *skb, struct tc_action *a,
>> +				  int bind, int ref)
>> +{
>> +	unsigned char *b = skb_tail_pointer(skb);
>> +	struct tcf_ctinfo *ci = to_ctinfo(a);
>> +	struct tcf_ctinfo_params *cp;
>> +	struct tc_ctinfo opt = {
>> +		.index   = ci->tcf_index,
>> +		.refcnt  = refcount_read(&ci->tcf_refcnt) - ref,
>> +		.bindcnt = atomic_read(&ci->tcf_bindcnt) - bind,
>> +	};
>> +	struct tcf_t t;
>> +	struct tc_ctinfo_dscp dscpparm;
>> +	struct tc_ctinfo_stats_dscp dscpstats;
> 

All done. I’ve put struct tcf_ctinfo *ci = to_ctinfo(a); at the top, though,
as other variables need it initialised. I hope that’s OK; if not, please
explain what I should have done.

> Likewise.
> 
> Also, never use the inline keyword in foo.c files, always let the compiler
> decide.

The perils of using existing code: I should have spotted that, though. I
didn’t inline anything else ‘cos in theory I know better.  Oh well.

Shame checkpatch didn’t warn me about christmas trees or inline; otherwise
it wouldn’t have got this far.

Thanks for your time and patience.

Cheers,

Kevin D-B

gpg: 012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A



* Re: [PATCH net-next v4] net: sched: Introduce act_ctinfo action
  2019-05-07 20:12         ` [PATCH net-next v4] " Kevin 'ldir' Darbyshire-Bryant
@ 2019-05-08  0:20           ` David Miller
  2019-05-08 16:12             ` Kevin 'ldir' Darbyshire-Bryant
  0 siblings, 1 reply; 13+ messages in thread
From: David Miller @ 2019-05-08  0:20 UTC
  To: ldir
  Cc: jhs, jiri, linux-kernel, linux-kselftest, netdev, shuah, xiyou.wangcong


The net-next tree is closed.

You will have to submit this again when the net-next tree is open
again.

Thank you.


* Re: [PATCH net-next v4] net: sched: Introduce act_ctinfo action
  2019-05-08  0:20           ` David Miller
@ 2019-05-08 16:12             ` Kevin 'ldir' Darbyshire-Bryant
  0 siblings, 0 replies; 13+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-05-08 16:12 UTC
  To: David Miller
  Cc: Jamal Hadi Salim, jiri, linux-kernel, linux-kselftest, netdev,
	shuah, xiyou.wangcong

Hi David,

> On 8 May 2019, at 01:20, David Miller <davem@davemloft.net> wrote:
> 
> 
> The net-next tree is closed.

Apologies, I didn’t understand what this meant in your prior message, having read https://www.kernel.org/doc/Documentation/networking/netdev-FAQ.txt and seen http://vger.kernel.org/~davem/net-next.html I now do.

> 
> You will have to submit this again when the net-next tree is open
> again.

Catch you on the flip side, in 2-ish weeks time :-)

> 
> Thank you.

Thanks for your patience.

Kevin D-B

gpg: 012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A



end of thread

Thread overview: 13+ messages
2019-04-27 13:08 [PATCH net-next] net: sched: Introduce act_ctinfo action Kevin 'ldir' Darbyshire-Bryant
2019-04-30 21:47 ` Cong Wang
2019-05-03 21:20   ` Kevin 'ldir' Darbyshire-Bryant
2019-05-05 10:15   ` [net-next v2] " Kevin 'ldir' Darbyshire-Bryant
2019-05-05 10:23     ` Greg KH
2019-05-05 10:32       ` Kevin 'ldir' Darbyshire-Bryant
2019-05-05 10:43         ` Toke Høiland-Jørgensen
2019-05-05 13:20     ` [net-next v3] " Kevin 'ldir' Darbyshire-Bryant
2019-05-07 19:39       ` David Miller
2019-05-07 20:12         ` [PATCH net-next v4] " Kevin 'ldir' Darbyshire-Bryant
2019-05-08  0:20           ` David Miller
2019-05-08 16:12             ` Kevin 'ldir' Darbyshire-Bryant
2019-05-07 20:18         ` [net-next v3] " Kevin 'ldir' Darbyshire-Bryant
