LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Christian Brauner <christian.brauner@canonical.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: David Miller <davem@davemloft.net>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
avagin@virtuozzo.com, ktkhai@virtuozzo.com, serge@hallyn.com,
gregkh@linuxfoundation.org
Subject: Re: [PATCH net-next 1/2 v2] netns: restrict uevents
Date: Thu, 26 Apr 2018 23:27:45 +0200 [thread overview]
Message-ID: <20180426212744.GA30270@gmail.com> (raw)
In-Reply-To: <878t99opvd.fsf@xmission.com>
On Thu, Apr 26, 2018 at 12:10:30PM -0500, Eric W. Biederman wrote:
> Christian Brauner <christian.brauner@canonical.com> writes:
>
> > On Thu, Apr 26, 2018 at 11:47:19AM -0500, Eric W. Biederman wrote:
> >> Christian Brauner <christian.brauner@canonical.com> writes:
> >>
> >> > On Tue, Apr 24, 2018 at 06:00:35PM -0500, Eric W. Biederman wrote:
> >> >> Christian Brauner <christian.brauner@canonical.com> writes:
> >> >>
> >> >> > On Wed, Apr 25, 2018, 00:41 Eric W. Biederman <ebiederm@xmission.com> wrote:
> >> >> >
> >> >> > Bah. This code is obviously correct and probably wrong.
> >> >> >
> >> >> > How do we deliver uevents for network devices that are outside of the
> >> >> > initial user namespace? The kernel still needs to deliver those.
> >> >> >
> >> >> > The logic to figure out which network namespace a device needs to be
> >> >> > delivered to is is present in kobj_bcast_filter. That logic will almost
> >> >> > certainly need to be turned inside out. Sign not as easy as I would
> >> >> > have hoped.
> >> >> >
> >> >> > My first patch that we discussed put additional filtering logic into kobj_bcast_filter for that very reason. But I can move that logic
> >> >> > out and come up with a new patch.
> >> >>
> >> >> I may have mis-understood.
> >> >>
> >> >> I heard and am still hearing additional filtering to reduce the places
> >> >> the packet is delievered.
> >> >>
> >> >> I am saying something needs to change to increase the number of places
> >> >> the packet is delivered.
> >> >>
> >> >> For the special class of devices that kobj_bcast_filter would apply to
> >> >> those need to be delivered to netowrk namespaces that are no longer on
> >> >> uevent_sock_list.
> >> >>
> >> >> So the code fundamentally needs to split into two paths. Ordinary
> >> >> devices that use uevent_sock_list. Network devices that are just
> >> >> delivered in their own network namespace.
> >> >>
> >> >> netlink_broadcast_filtered gets to go away completely.
> >> >
> >> > The split *might* make sense but I think you're wrong about removing the
> >> > kobj_bcast_filter. The current filter doesn't operate on the uevent
> >> > socket in uevent_sock_list itself it rather operates on the sockets in
> >> > mc_list. And if socket in mc_list can have a different network namespace
> >> > then the uevent_socket itself then your way won't work. That's why my
> >> > original patch added additional filtering in there. The way I see it we
> >> > need something like:
> >>
> >> We already filter the sockets in the mc_list by network namespace.
> >
> > Oh really? That's good to know. I haven't found where in the code this
> > actually happens. I thought that when netlink_bind() is called anyone
> > could register themselves in mc_list.
>
> The code in af_netlink.c does:
> > static void do_one_broadcast(struct sock *sk,
> > struct netlink_broadcast_data *p)
> > {
> > struct netlink_sock *nlk = nlk_sk(sk);
> > int val;
> >
> > if (p->exclude_sk == sk)
> > return;
> >
> > if (nlk->portid == p->portid || p->group - 1 >= nlk->ngroups ||
> > !test_bit(p->group - 1, nlk->groups))
> > return;
> >
> > if (!net_eq(sock_net(sk), p->net)) {
> ^^^^^^^^^^^^ Here
> > if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
> > return;
> ^^^^^^^^^^^ Here
> >
> > if (!peernet_has_id(sock_net(sk), p->net))
> > return;
> >
> > if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
> > CAP_NET_BROADCAST))
> > return;
> > }
>
> Which if you are not a magic NETLINK_F_LISTEN_ALL_NSID socket filters
> you out if you are the wrong network namespace.
>
>
> >> When a packet is transmitted with netlink_broadcast it is only
> >> transmitted within a single network namespace.
> >>
> >> Even in the case of a NETLINK_F_LISTEN_ALL_NSID socket the skb is tagged
> >> with it's source network namespace so no confusion will result, and the
> >> permission checks have been done to make it safe. So you can safely
> >> ignore that case. Please ignore that case. It only needs to be
> >> considered if refactoring af_netlink.c
> >>
> >> When I added netlink_broadcast_filtered I imagined that we would need
> >> code that worked across network namespaces that worked for different
> >> namespaces. So it looked like we would need the level of granularity
> >> that you can get with netlink_broadcast_filtered. It turns out we don't
> >> and that it was a case of over design. As the only split we care about
> >> is per network namespace there is no need for
> >> netlink_broadcast_filtered.
> >>
> >> > init_user_ns_broadcast_filtered(uevent_sock_list, kobj_bcast_filter);
> >> > user_ns_broadcast_filtered(uevent_sock_list,kobj_bcast_filter);
> >> >
> >> > The question that remains is whether we can rely on the network
> >> > namespace information we can gather from the kobject_ns_type_operations
> >> > to decide where we want to broadcast that event to. So something
> >> > *like*:
> >>
> >> We can. We already do. That is what kobj_bcast_filter implements.
> >>
> >> > ops = kobj_ns_ops(kobj);
> >> > if (!ops && kobj->kset) {
> >> > struct kobject *ksobj = &kobj->kset->kobj;
> >> > if (ksobj->parent != NULL)
> >> > ops = kobj_ns_ops(ksobj->parent);
> >> > }
> >> >
> >> > if (ops && ops->netlink_ns && kobj->ktype->namespace)
> >> > if (ops->type == KOBJ_NS_TYPE_NET)
> >> > net = kobj->ktype->namespace(kobj);
> >>
> >> Please note the only entry in the enumeration in the kobj_ns_type
> >> enumeration other than KOBJ_NS_TYPE_NONE is KOBJ_NS_TYPE_NET. So the
> >> check for ops->type in this case is redundant.
> >
> > Yes, I know the reason for doing it explicitly is to block the case
> > where kobjects get tagged with other namespaces. So we'd need to be
> > vigilant should that ever happen but fine.
>
> It is fine to keep the check.
>
> I was intending to point out that it is much more likely that we remove
> the enumeration and remove some of the extra abstraction, than another
> namespace is implemented there.
>
> >> That is something else that could be simplifed. At the time it was the
> >> necessary to get the sysfs changes merged.
> >>
> >> > if (!net || net->user_ns == &init_user_ns)
> >> > ret = init_user_ns_broadcast(env, action_string, devpath);
> >> > else
> >> > ret = user_ns_broadcast(net->uevent_sock->sk, env,
> >> > action_string, devpath);
> >>
> >> Almost.
> >>
> >> if (!net)
> >> kobject_uevent_net_broadcast(kobj, env, action_string,
> >> dev_path);
> >> else
> >> netlink_broadcast(net->uevent_sock->sk, skb, 0, 1, GFP_KERNEL);
> >>
> >>
> >> I am handwaving to get the skb in the netlink_broadcast case but that
> >> should be enough for you to see what I am thinking.
> >
> > I have added a helper alloc_uevent_skb() that can be used in both cases.
> >
> > static struct sk_buff *alloc_uevent_skb(struct kobj_uevent_env *env,
> > const char *action_string,
> > const char *devpath)
> > {
> > struct sk_buff *skb = NULL;
> > char *scratch;
> > size_t len;
> >
> > /* allocate message with maximum possible size */
> > len = strlen(action_string) + strlen(devpath) + 2;
> > skb = alloc_skb(len + env->buflen, GFP_KERNEL);
> > if (!skb)
> > return NULL;
> >
> > /* add header */
> > scratch = skb_put(skb, len);
> > sprintf(scratch, "%s@%s", action_string, devpath);
> >
> > skb_put_data(skb, env->buf, env->buflen);
> >
> > NETLINK_CB(skb).dst_group = 1;
> >
> > return skb;
> > }
> >
> >>
> >> My only concern with the above is that we almost certainly need to fix
> >> the credentials on the skb so that userspace does not drop the packet
I guess we simply want:
if (user_ns != &init_user_ns) {
NETLINK_CB(skb).creds.uid = (kuid_t)0;
NETLINK_CB(skb).creds.gid = kgid_t)0;
}
instead of the more complicated and - imho wrong:
if (user_ns != &init_user_ns) {
/* fix credentials for udev running in user namespace */
kuid_t uid = NETLINK_CB(skb).creds.uid;
kgid_t gid = NETLINK_CB(skb).creds.gid;
NETLINK_CB(skb).creds.uid = from_kuid_munged(user_ns, uid);
NETLINK_CB(skb).creds.gid = from_kgid_munged(user_ns, gid);
}
Christian
> >> sent to a network namespace because it has the credentials that will
> >> cause userspace to drop the packet today.
> >>
> >> But it should be straight forward to look at net->user_ns, to fix the
> >> credentials.
> >
> > Yes, afaict, the only thing that needs to be updated is the uid.
>
> I suspect there may also be a gid.
>
> Eric
next prev parent reply other threads:[~2018-04-26 21:27 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-04-24 20:43 [PATCH net-next 0/2 v2] netns: uevent performance tweaks Christian Brauner
2018-04-24 20:43 ` [PATCH net-next 1/2 v2] netns: restrict uevents Christian Brauner
2018-04-24 21:54 ` Eric W. Biederman
2018-04-24 22:40 ` Eric W. Biederman
2018-04-24 22:47 ` Christian Brauner
2018-04-24 23:00 ` Eric W. Biederman
2018-04-26 16:13 ` Christian Brauner
2018-04-26 16:47 ` Eric W. Biederman
2018-04-26 17:03 ` Christian Brauner
2018-04-26 17:10 ` Eric W. Biederman
2018-04-26 21:27 ` Christian Brauner [this message]
2018-04-27 0:35 ` Eric W. Biederman
2018-04-27 8:41 ` Christian Brauner
2018-04-24 22:54 ` Christian Brauner
2018-04-24 20:43 ` [PATCH net-next 2/2 v2] netns: isolate seqnums to use per-netns locks Christian Brauner
2018-04-24 21:52 ` Eric W. Biederman
2018-04-24 22:20 ` Christian Brauner
2018-04-24 22:52 ` Eric W. Biederman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180426212744.GA30270@gmail.com \
--to=christian.brauner@canonical.com \
--cc=avagin@virtuozzo.com \
--cc=davem@davemloft.net \
--cc=ebiederm@xmission.com \
--cc=gregkh@linuxfoundation.org \
--cc=ktkhai@virtuozzo.com \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=serge@hallyn.com \
--subject='Re: [PATCH net-next 1/2 v2] netns: restrict uevents' \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).