LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Aleksa Sarai <cyphar@cyphar.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
	Jeff Layton <jlayton@kernel.org>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	Arnd Bergmann <arnd@arndb.de>,
	David Howells <dhowells@redhat.com>,
	Christian Brauner <christian@brauner.io>,
	Eric Biederman <ebiederm@xmission.com>,
	Andy Lutomirski <luto@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Kees Cook <keescook@chromium.org>, Jann Horn <jannh@google.com>,
	Tycho Andersen <tycho@tycho.ws>,
	David Drysdale <drysdale@google.com>,
	Chanho Min <chanho.min@lge.com>, Oleg Nesterov <oleg@redhat.com>,
	Aleksa Sarai <asarai@suse.de>,
	Linux Containers <containers@lists.linux-foundation.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	linux-arch <linux-arch@vger.kernel.org>
Subject: Re: [PATCH v7 5/5] namei: resolveat(2) syscall
Date: Sat, 25 May 2019 17:03:07 +1000	[thread overview]
Message-ID: <20190525070307.bxbvjh2254sx2z6g@yavin> (raw)
In-Reply-To: <CAHk-=whbFMg4+HuWOBuHpvDNiAyowX2HUowv3+pt8vPWk5W-YQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 6554 bytes --]

On 2019-05-24, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Tue, May 7, 2019 at 9:44 AM Aleksa Sarai <cyphar@cyphar.com> wrote:
> >
> > The most obvious syscall to add support for the new LOOKUP_* scoping
> > flags would be openat(2) (along with the required execveat(2) change
> > included in this series). However, there are a few reasons to not do
> > this:
> 
> So honestly, this last patch is what turns me off the whole thing.
> 
> It goes from a nice new feature ("you can use O_NOSYMLINKS to disallow
> symlink traversal") to a special-case joke that isn't worth it any
> more. You get a useless path descrptor back from s special hacky
> system call, you don't actually get the useful data that you probably
> *want* the open to get you.
> 
> Sure, you could eventually then use a *second* system call (openat
> with O_EMPTYPATH) to actually get something you can *use*, but at this
> point you've just wasted everybodys time and effort with a pointless
> second system call.
> 
> So I really don't see the point of this whole thing. Why even bother.
> Nobody sane will ever use that odd two-systemcall model, and even if
> they did, it would be slower and inconvenient.
> 
> The whole and only point of this seems to be the two lines that say
> 
>        if (flags & ~VALID_RESOLVE_FLAGS)
>               return -EINVAL;
> 
> but that adds absolutely zero value to anything.  The argument is that
> "we can't add it to existing flags, because old kernels won't honor
> it", but that's a completely BS argument, since the user has to have a
> fallback anyway for the old kernel case - so we literally could much
> more conveniently just expose it as a prctl() or something to _ask_
> the kernel what flags it honors.
> 
> So to me, this whole argument means that "Oh, we'll make it really
> inconvenient to actually use this".
> 
> If we want to introduce a new system call that allows cool new
> features, it should have *more* powerful semantics than the existing
> ones, not be clearly weaker and less useful.

You might not have seen the v8 of this set I sent a few days ago[1]. The
new set includes an example of a feature that is possible with
resolveat(2) but not with the current openat(O_PATH) interface. The
feature is that you can set RESOLVE_UPGRADE_NO{READ,WRITE} which then
blocks the re-opening of the file descriptor with those MAY_* modes.
(Though of course you might be against the entire idea of this feature
which allows for restricting the opening of magic-links.)

This can't be done with openat(2) without adding even more flags such as
O_PATH_UPGRADE_NOWRITE -- because O_RDONLY = 0, which means you can't
distinguish the "don't allow read or write" case (we could define 0x3
for that, but that feels a tad ugly). Not to mention that broken
userspace programs might already be setting O_PATH|O_RDWR.

So, while making it easier for userspace to be sure these flags are
working is one benefit, it's not the only reason. And outside arguments
for future features, several folks (some on-list, some on LWN) argued
that adding more "open" flags which aren't clearly related to the mode
the file is opened with makes not-much-more sense than a separate
syscall for it. Another (weaker) argument is that O_PATH should've been
separate from the beginning because of how unlike an ordinary fd it is.

Funnily enough, v8 does contain O_EMPTYPATH. However, this is just an
example of another /proc-less interface and we need it for O_PATH
descriptors even if you can do an full open in one shot with restricted
path resolution. Having an O_PATH can be useful on its own (LXC takes an
O_PATH of /dev/pts/ptmx inside the container and then re-opens it each
time a new console is required to avoid touching paths inside the
container).

But it would be neat to have a way for userspace to easily check what
flags the kernel honours, regardless of this patchset.

> So how about making the new system call be something that is a
> *superset* of "openat()" so that people can use that, and then if it
> fails, just fall back to openat(). But if it succeeds, it just
> succeeds, and you don't need to then do other system calls to actually
> make it useful.
> 
> Make the new system call something people *want* to use because it's
> useful, not a crippled useless thing that has some special case use
> for some limited thing and just wastes system call space.

At the moment, I'm working on implementing userspace library wrappers
which use resolveat(2) for safe handling of an untrusted rootfs. I would
expect that most users of resolveat(2) would be using a library to
handle it -- because to do an "mkdir -p" you need to do a fair bit of
work for it to be safe unless we add LOOKUP_* flags to mkdirat(2) and
every other syscall. This is true whether or not openat(2) provides this
feature or if it's a separate syscall.

> Example *useful* system call attributes:
> 
>  - make it like openat(), but have another argument with the "limit flags"

Sure, this would also work. I didn't know if anyone was open to the idea
of openat2(2). There is a follow-up question of how RESOLVE_UPGRADE_NO*
flags would be handled (they aren't obviously "lookup" flags so we'd
need to add more openat(2) flags to accommodate them) but I'm sure that
can be ironed out once you've taken a look at that patchset.

>  - maybe return more status of the resulting file. People very
> commonly do "open->fstat" just to get the size for mmap or to check
> some other detail of the file before use.

So something like

  resolveat(rootfd, "path/to/file", RESOLVE_IN_ROOT, &statbuf);

or

  openat2(rootfd, "path/to/file", O_PATH, RESOLVE_IN_ROOT, &statbuf);

? Is there a large amount of overhead or downside to the current
open->fstat way of doing things, or is this more of a "if we're going to
add more ways of opening we might as well add more useful things"?

> In other words, make the new system call *useful*. Not some castrated
> "not useful on its own" thing.
>
> So I still support the whole "let's make it easy to limit path lookup
> in sane ways", but this model of then limiting using the result sanely
> just makes me a sad panda.

I am glad that you agree with the general thrust, and it's just the
interface that is the hang-up.

[1]: https://marc.info/?l=linux-fsdevel&m=155835923516235&w=2

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2019-05-25  7:03 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-07 16:43 [PATCH v7 0/5] namei: resolveat(2) path resolution restriction API Aleksa Sarai
2019-05-07 16:43 ` [PATCH v7 1/5] namei: split out nd->dfd handling to dirfd_path_init Aleksa Sarai
2019-05-07 16:43 ` [PATCH v7 2/5] namei: O_BENEATH-style path resolution flags Aleksa Sarai
2019-05-07 16:43 ` [PATCH v7 3/5] namei: LOOKUP_IN_ROOT: chroot-like path resolution Aleksa Sarai
2019-05-07 16:43 ` [PATCH v7 4/5] namei: aggressively check for nd->root escape on ".." resolution Aleksa Sarai
2019-05-07 16:43 ` [PATCH v7 5/5] namei: resolveat(2) syscall Aleksa Sarai
2019-05-24 22:59   ` Linus Torvalds
2019-05-25  7:03     ` Aleksa Sarai [this message]
2019-05-25 16:54       ` Linus Torvalds
2019-05-26  5:46         ` Aleksa Sarai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190525070307.bxbvjh2254sx2z6g@yavin \
    --to=cyphar@cyphar.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=asarai@suse.de \
    --cc=ast@kernel.org \
    --cc=bfields@fieldses.org \
    --cc=chanho.min@lge.com \
    --cc=christian@brauner.io \
    --cc=containers@lists.linux-foundation.org \
    --cc=dhowells@redhat.com \
    --cc=drysdale@google.com \
    --cc=ebiederm@xmission.com \
    --cc=jannh@google.com \
    --cc=jlayton@kernel.org \
    --cc=keescook@chromium.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=oleg@redhat.com \
    --cc=torvalds@linux-foundation.org \
    --cc=tycho@tycho.ws \
    --cc=viro@zeniv.linux.org.uk \
    --subject='Re: [PATCH v7 5/5] namei: resolveat(2) syscall' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).