LKML Archive on lore.kernel.org
* Re: [PATCH v2 0/2] close_range()
@ 2019-05-23 18:21 Alexey Dobriyan
  2019-05-23 21:34 ` Linus Torvalds
  0 siblings, 1 reply; 8+ messages in thread
From: Alexey Dobriyan @ 2019-05-23 18:21 UTC
  To: christian; +Cc: linux-kernel, linux-fsdevel

> This is v2 of this patchset.

We've sent fdmap(2) back in the day:
https://marc.info/?l=linux-kernel&m=150628359803324&w=4

It can do everything close_range() does and potentially more.

If people ask for it I can rebase it and resend.

P.S.: you are 2 steps behind :-)
https://lwn.net/Articles/490224/


* Re: [PATCH v2 0/2] close_range()
  2019-05-23 18:21 [PATCH v2 0/2] close_range() Alexey Dobriyan
@ 2019-05-23 21:34 ` Linus Torvalds
  2019-05-24 10:27   ` Christian Brauner
  2019-05-24 18:39   ` Alexey Dobriyan
  0 siblings, 2 replies; 8+ messages in thread
From: Linus Torvalds @ 2019-05-23 21:34 UTC
  To: Alexey Dobriyan
  Cc: Christian Brauner, Linux List Kernel Mailing, linux-fsdevel

On Thu, May 23, 2019 at 11:22 AM Alexey Dobriyan <adobriyan@gmail.com> wrote:
>
> > This is v2 of this patchset.
>
> We've sent fdmap(2) back in the day:

Well, if the main point of the exercise is performance, then fdmap()
is clearly inferior.

Sadly, with all the HW security mitigation, system calls are no longer cheap.

Would there ever be any other reason to traverse unknown open files
than to close them?

                   Linus


* Re: [PATCH v2 0/2] close_range()
  2019-05-23 21:34 ` Linus Torvalds
@ 2019-05-24 10:27   ` Christian Brauner
  2019-05-24 18:39   ` Alexey Dobriyan
  1 sibling, 0 replies; 8+ messages in thread
From: Christian Brauner @ 2019-05-24 10:27 UTC
  To: Linus Torvalds
  Cc: Alexey Dobriyan, fweimer, oleg, arnd, viro,
	Linux List Kernel Mailing, linux-fsdevel

On Thu, May 23, 2019 at 02:34:31PM -0700, Linus Torvalds wrote:
> On Thu, May 23, 2019 at 11:22 AM Alexey Dobriyan <adobriyan@gmail.com> wrote:
> >
> > > This is v2 of this patchset.
> >
> > We've sent fdmap(2) back in the day:
> 
> Well, if the main point of the exercise is performance, then fdmap()
> is clearly inferior.
> 
> Sadly, with all the HW security mitigation, system calls are no longer cheap.
> 
> Would there ever be any other reason to traverse unknown open files
> than to close them?

I have had lively discussions and interestingly worded mails on account
of all of this. But no one has brought up this scenario. Florian also
said that it's not needed [1].

If we really want something like that, I think we don't need a new
syscall. We can just add a prctl() or fcntl() command that gives you
back the next open fd.

There are, imho, crazy ideas out there about what people expect a
multi-close file descriptor solution to look like. Service manager people
apparently think it would be a great idea to have a syscall that takes an
array of fds which the kernel is supposed to leave open while closing all
others, basically "close all of the fds, only leave out those I tell you".
For such use cases I think they can push for a prctl(PR_GET_NEXTFD, 2)
or a fcntl(2, F_GET_NEXTFD) and implement that in userspace.

I really only care about having a performant solution to closing a range
of fds that's a little more flexible than closefrom() without going all
crazy generic and copying (possibly) large bits of data between kernel-
and userspace.

close_range() is really something I've picked up on the side because the
current state has bothered me (and others) for a long time whenever I
have my userspace hat on. With Al in favor of it, it seemed like
something we should do.
I actually wanted to have Jann's and my clone6() version on the table by
now since that would unblock larger things like the time namespace
patchset.

In any case I'll send v3 with my max()/min() braino fixed that Oleg
thankfully spotted and the split into two patches that Arnd suggested.

[1]: https://lkml.org/lkml/2019/5/21/516
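The "close all of the fds, only leave out those I tell you" request can
already be approximated in userspace when procfs is mounted. Below is a
minimal sketch, assuming Linux with /proc available; close_all_except()
and its min_fd parameter are hypothetical names, and it does not use the
PR_GET_NEXTFD/F_GET_NEXTFD commands floated above, since those are only
proposals. Descriptors are collected first and closed only after
closedir(), so the directory is never modified while it is being read.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/* Linear scan is fine: keep lists are expected to be short. */
static bool in_keep(int fd, const int *keep, size_t nkeep)
{
	for (size_t i = 0; i < nkeep; i++)
		if (keep[i] == fd)
			return true;
	return false;
}

/* Hypothetical helper: close every open fd >= min_fd that is not in keep[].
 * Returns 0 on success, -1 if /proc/self/fd cannot be read or on ENOMEM. */
int close_all_except(int min_fd, const int *keep, size_t nkeep)
{
	assert(keep != NULL || nkeep == 0);

	DIR *d = opendir("/proc/self/fd");
	if (!d)
		return -1;
	int self = dirfd(d);	/* the fd backing this scan; never close it */

	int *victims = NULL;
	size_t n = 0, cap = 0;
	struct dirent *e;
	while ((e = readdir(d)) != NULL) {
		if (e->d_name[0] == '.')	/* skip "." and ".." */
			continue;
		int fd = atoi(e->d_name);
		if (fd == self || fd < min_fd || in_keep(fd, keep, nkeep))
			continue;
		if (n == cap) {		/* grow the victim list geometrically */
			size_t ncap = cap ? cap * 2 : 16;
			int *p = realloc(victims, ncap * sizeof(*p));
			if (!p) {
				free(victims);
				closedir(d);
				return -1;
			}
			victims = p;
			cap = ncap;
		}
		victims[n++] = fd;
	}
	closedir(d);

	/* Close only after the directory walk has finished. */
	for (size_t i = 0; i < n; i++)
		close(victims[i]);
	free(victims);
	return 0;
}
```

This costs one getdents-style walk plus one close() per victim, which is
exactly the per-fd syscall overhead a kernel-side range close avoids.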


* Re: [PATCH v2 0/2] close_range()
  2019-05-23 21:34 ` Linus Torvalds
  2019-05-24 10:27   ` Christian Brauner
@ 2019-05-24 18:39   ` Alexey Dobriyan
  2019-05-24 18:55     ` Linus Torvalds
  1 sibling, 1 reply; 8+ messages in thread
From: Alexey Dobriyan @ 2019-05-24 18:39 UTC
  To: Linus Torvalds
  Cc: Christian Brauner, Linux List Kernel Mailing, linux-fsdevel

On Thu, May 23, 2019 at 02:34:31PM -0700, Linus Torvalds wrote:
> On Thu, May 23, 2019 at 11:22 AM Alexey Dobriyan <adobriyan@gmail.com> wrote:
> >
> > > This is v2 of this patchset.
> >
> > We've sent fdmap(2) back in the day:
> 
> Well, if the main point of the exercise is performance, then fdmap()
> is clearly inferior.

This is not true, because there are other use cases.

The current equivalent is readdir(), where getdents() is essentially a
bulk fdmap() with pretty-printing. glibc calls getdents() with a 32KB buffer.

There was a bulk taskstats patch long before the Meltdown fiasco.

Unfortunately, close_range() only closes ranges.
This is why I didn't even try to send closefrom(2) from OpenBSD.

> Sadly, with all the HW security mitigation, system calls are no longer cheap.
> 
> Would there ever be any other reason to traverse unknown open files
> than to close them?

This is what lsof(1) does:

3140  openat(AT_FDCWD, "/proc/29499/fd", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
3140  fstat(4, {st_mode=S_IFDIR|0500, st_size=0, ...}) = 0
3140  getdents(4, /* 6 entries */, 32768) = 144
3140  readlink("/proc/29499/fd/0", "/dev/pts/4", 4096) = 10
3140  lstat("/proc/29499/fd/0", {st_mode=S_IFLNK|0700, st_size=64, ...}) = 0
3140  stat("/proc/29499/fd/0", {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 4), ...}) = 0
3140  openat(AT_FDCWD, "/proc/29499/fdinfo/0", O_RDONLY) = 7
3140  fstat(7, {st_mode=S_IFREG|0400, st_size=0, ...}) = 0
3140  read(7, "pos:\t0\nflags:\t02002\nmnt_id:\t24\n", 1024) = 31
3140  read(7, "", 1024)                 = 0
3140  close(7)
	...

Once fdmap(2) or equivalent is in, more bulk system calls operating on
descriptors can pop up. But closefrom() will remain closefrom().
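For comparison, this is roughly what a process does today to list its own
descriptors through /proc/self/fd with raw getdents64(2): open, one or
more getdents64 calls, a final getdents64 that returns 0, then close. A
minimal sketch, assuming Linux with /proc mounted; the dirent layout
follows the getdents(2) man page, count_open_fds() is a hypothetical
helper, and the static buffer makes it non-reentrant.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout returned by getdents64(2). */
struct linux_dirent64 {
	uint64_t       d_ino;
	int64_t        d_off;
	unsigned short d_reclen;
	unsigned char  d_type;
	char           d_name[];
};

/* Count this process's open fds, excluding the fd used for the scan.
 * Returns -1 on error. */
int count_open_fds(void)
{
	/* 32KB, the buffer size quoted for glibc's readdir(); the union
	 * gives the 8-byte alignment the records need. */
	static union { char bytes[32768]; uint64_t align; } buf;

	int dfd = open("/proc/self/fd", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
	if (dfd < 0)
		return -1;

	int count = 0;
	for (;;) {
		long nread = syscall(SYS_getdents64, dfd, buf.bytes,
				     sizeof(buf.bytes));
		if (nread < 0) {
			close(dfd);
			return -1;
		}
		if (nread == 0)		/* the extra call that only signals EOF */
			break;
		for (long off = 0; off < nread;) {
			struct linux_dirent64 *e =
				(struct linux_dirent64 *)(buf.bytes + off);
			assert(e->d_reclen > 0);
			if (strcmp(e->d_name, ".") != 0 &&
			    strcmp(e->d_name, "..") != 0 &&
			    atoi(e->d_name) != dfd)
				count++;
			off += e->d_reclen;
		}
	}
	close(dfd);
	return count;
}
```

Even with a single non-empty batch this is four system calls, which is the
"readdir is 4(!) minimum" count that comes up later in the thread.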


* Re: [PATCH v2 0/2] close_range()
  2019-05-24 18:39   ` Alexey Dobriyan
@ 2019-05-24 18:55     ` Linus Torvalds
  2019-05-24 21:27       ` Alexey Dobriyan
  0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2019-05-24 18:55 UTC
  To: Alexey Dobriyan
  Cc: Christian Brauner, Linux List Kernel Mailing, linux-fsdevel

On Fri, May 24, 2019 at 11:39 AM Alexey Dobriyan <adobriyan@gmail.com> wrote:
>
> > Would there ever be any other reason to traverse unknown open files
> > than to close them?
>
> This is what lsof(1) does:

I repeat: Would there ever be any other reason to traverse unknown
open files than to close them?

lsof is not AT ALL a relevant argument.

lsof fundamentally wants /proc, because lsof looks at *other*
processes. That has absolutely zero to do with fdmap. lsof does *not*
want fdmap at all. It wants "list other processes files". Which is
very much what /proc is all about.

So your argument that "fdmap is more generic" is bogus.

fdmap is entirely pointless unless you can show a real and relevant
(to performance) use of it.

When would you *possibly* have a "let me get a list of all the file
descriptors I have open, because I didn't track them myself"
situation?  That makes no sense. Particularly from a performance
standpoint.

In contrast, "close_range()" makes sense as an operation. I can
explain exactly when it would be used, and I can easily see a
situation where "I've opened a ton of files, now I want to release
them" is a valid model of operation. And it's a valid optimization to
do a bulk operation like that.

IOW, close_range() makes sense as an operation even if you could just
say "ok, I know exactly what files I have open". But it also makes
sense as an operation for the case of "I don't even care what files I
have open, I just want to close them".

In contrast, the "I have opened a ton of files, and I don't even know
what the hell I did, so can you list them for me" makes no sense.

Because outside of "close them", there's no bulk operation that makes
sense on random file handles that you don't know what they are. Unless
you iterate over them and do the stat thing or whatever to figure it
out - which is lsof, but as mentioned, it's about *other* people's
files.

               Linus
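A minimal userspace sketch of the bulk "release a range of fds" operation
described above, assuming Linux and glibc. close_range() was later merged
in Linux 5.9 as close_range(unsigned int first, unsigned int last,
unsigned int flags); the wrapper name close_fd_range() and the
one-close()-per-fd fallback for kernels or headers without it are
illustrative, not part of this patchset.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: close every fd in [first, last], inclusive.
 * Uses the close_range(2) syscall when the headers know its number and
 * the kernel supports it; otherwise falls back to a close() loop. */
int close_fd_range(unsigned int first, unsigned int last)
{
	assert(first <= last);	/* the kernel rejects first > last with EINVAL */

#ifdef SYS_close_range
	if (syscall(SYS_close_range, (unsigned long)first,
		    (unsigned long)last, 0UL) == 0)
		return 0;
	if (errno != ENOSYS)
		return -1;	/* a real failure, not a missing syscall */
#endif

	/* Fallback: one syscall per slot, bounded by the open-file limit;
	 * close() on an unused slot just fails with EBADF, which we ignore. */
	long max = sysconf(_SC_OPEN_MAX);
	if (max < 0)
		max = 1024;	/* assumption: conventional default soft limit */
	for (unsigned long fd = first;
	     fd <= last && fd < (unsigned long)max; fd++)
		close((int)fd);
	return 0;
}
```

Passing ~0U as last asks the kernel to close everything from first upward,
i.e. the closefrom() case; the fallback loop shows why that is expensive
without kernel help.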


* Re: [PATCH v2 0/2] close_range()
  2019-05-24 18:55     ` Linus Torvalds
@ 2019-05-24 21:27       ` Alexey Dobriyan
  2019-05-24 23:45         ` Al Viro
  0 siblings, 1 reply; 8+ messages in thread
From: Alexey Dobriyan @ 2019-05-24 21:27 UTC
  To: Linus Torvalds
  Cc: Christian Brauner, Linux List Kernel Mailing, linux-fsdevel

On Fri, May 24, 2019 at 11:55:44AM -0700, Linus Torvalds wrote:
> On Fri, May 24, 2019 at 11:39 AM Alexey Dobriyan <adobriyan@gmail.com> wrote:
> >
> > > Would there ever be any other reason to traverse unknown open files
> > > than to close them?
> >
> > This is what lsof(1) does:
> 
> I repeat: Would there ever be any other reason to traverse unknown
> open files than to close them?
> 
> lsof is not AT ALL a relevant argument.
> 
> lsof fundamentally wants /proc, because lsof looks at *other*
> processes. That has absolutely zero to do with fdmap. lsof does *not*
> want fdmap at all. It wants "list other processes files". Which is
> very much what /proc is all about.
> 
> So your argument that "fdmap is more generic" is bogus.
> 
> fdmap is entirely pointless unless you can show a real and relevant
> (to performance) use of it.
> 
> When would you *possibly* have a "let me get a list of all the file
> descriptors I have open, because I didn't track them myself"
> situation?  That makes no sense. Particularly from a performance
> standpoint.
> 
> In contrast, "close_range()" makes sense as an operation.

What about orthogonality of interfaces?

	fdmap()
	bulk_close()

Now fdmap() can be reused for lsof/criu, and the close-everything use
case takes only 2 system calls, which is OK because readdir takes 4(!) at minimum:

	open
	getdents
	getdents() = 0
	close

Writing all of this, I understood how fdmap can be made even faster in a
way that neither getdents() nor even read() can: it can return a flag
saying whether more data is available, so that the application calls
fdmap() again only if truly necessary.

> I can
> explain exactly when it would be used, and I can easily see a
> situation where "I've opened a ton of files, now I want to release
> them" is a valid model of operation. And it's a valid optimization to
> do a bulk operation like that.
> 
> IOW, close_range() makes sense as an operation even if you could just
> say "ok, I know exactly what files I have open". But it also makes
> sense as an operation for the case of "I don't even care what files I
> have open, I just want to close them".
> 
> In contrast, the "I have opened a ton of files, and I don't even know
> what the hell I did, so can you list them for me" makes no sense.
> 
> Because outside of "close them", there's no bulk operation that makes
> sense on random file handles that you don't know what they are. Unless
> you iterate over them and do the stat thing or whatever to figure it
> out - which is lsof, but as mentioned, it's about *other* people's
> files.

What you're doing is making exactly one use case take exactly one system
call and leaving everything else to deal with /proc. Stracing lsof shows
very clearly how stupid and how wasteful that is, especially now that,
in the post-Meltdown era, we care about system call costs (yeah, suure).

I'm suggesting making the close-universe use case take only 2 system
calls, which is still better than anything /proc can offer.
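fdmap() was never merged, so any interface for it is hypothetical. The
sketch below simulates the proposed "more data" flag in userspace, with a
bool array standing in for the kernel's fd table; fdmap_batch() and its
parameters are invented for illustration. The point it shows: a caller
draining all descriptors can stop after the last non-empty batch instead
of issuing the terminating empty call that getdents() requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fdmap()-style batch query (simulation only).
 * Copies up to cap open descriptors, scanning from start, into out.
 * *next gets the index to resume from; *more reports whether any open
 * descriptor remains beyond this batch. Returns the number copied. */
size_t fdmap_batch(const bool *open_fds, size_t nfds, size_t start,
		   int *out, size_t cap, size_t *next, bool *more)
{
	assert(out != NULL || cap == 0);

	size_t n = 0, i = start;
	for (; i < nfds && n < cap; i++)
		if (open_fds[i])
			out[n++] = (int)i;
	*next = i;

	/* Look ahead so the caller can skip a final, empty call. */
	*more = false;
	for (size_t j = i; j < nfds; j++) {
		if (open_fds[j]) {
			*more = true;
			break;
		}
	}
	return n;
}
```

A close-everything loop would call fdmap_batch(), close the returned fds,
and repeat only while more is true.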


* Re: [PATCH v2 0/2] close_range()
  2019-05-24 21:27       ` Alexey Dobriyan
@ 2019-05-24 23:45         ` Al Viro
  0 siblings, 0 replies; 8+ messages in thread
From: Al Viro @ 2019-05-24 23:45 UTC
  To: Alexey Dobriyan
  Cc: Linus Torvalds, Christian Brauner, Linux List Kernel Mailing,
	linux-fsdevel

On Sat, May 25, 2019 at 12:27:40AM +0300, Alexey Dobriyan wrote:

> What about orthogonality of interfaces?
> 
> 	fdmap()
> 	bulk_close()
> 
> Now fdmap() can be reused for lsof/criu, and the close-everything use
> case takes only 2 system calls, which is OK because readdir takes 4(!) at minimum:
> 
> 	open
> 	getdents
> 	getdents() = 0
> 	close
> 
> Writing all of this, I understood how fdmap can be made even faster in a
> way that neither getdents() nor even read() can: it can return a flag
> saying whether more data is available, so that the application calls
> fdmap() again only if truly necessary.

Tactless question: what has traumatised you so badly about string operations?
Because that seems to be the common denominator to a lot of things...


* [PATCH v2 0/2] close_range()
@ 2019-05-23 15:47 Christian Brauner
  0 siblings, 0 replies; 8+ messages in thread
From: Christian Brauner @ 2019-05-23 15:47 UTC
  To: viro, linux-kernel, linux-fsdevel, linux-api, torvalds, fweimer
  Cc: jannh, oleg, tglx, arnd, shuah, dhowells, tkjos, ldv, miklos,
	linux-alpha, linux-arm-kernel, linux-ia64, linux-m68k,
	linux-mips, linux-parisc, linuxppc-dev, linux-s390, linux-sh,
	sparclinux, linux-xtensa, linux-arch, linux-kselftest, x86,
	Christian Brauner

Hey,

This is v2 of this patchset.

In accordance with some comments, a cond_resched() has been added to the
close loop, similar to what is done in close_files().
A common helper, pick_file(), has been split out for __close_fd() and
__close_range(). This allows calling cond_resched() only when
filp_close() has actually been called, again as close_files() does.
Maybe that's not worth it: Jann mentioned that cond_resched() looks
rather cheap.
So it may be that we could simply do:

while (fd <= max_fd) {
       __close(files, fd++);
       cond_resched();
}

I also added a missing test for close_range(fd, fd, 0).

Thanks!
Christian

Christian Brauner (2):
  open: add close_range()
  tests: add close_range() tests

 arch/alpha/kernel/syscalls/syscall.tbl        |   1 +
 arch/arm/tools/syscall.tbl                    |   1 +
 arch/arm64/include/asm/unistd32.h             |   2 +
 arch/ia64/kernel/syscalls/syscall.tbl         |   1 +
 arch/m68k/kernel/syscalls/syscall.tbl         |   1 +
 arch/microblaze/kernel/syscalls/syscall.tbl   |   1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl     |   1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl     |   1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl     |   1 +
 arch/parisc/kernel/syscalls/syscall.tbl       |   1 +
 arch/powerpc/kernel/syscalls/syscall.tbl      |   1 +
 arch/s390/kernel/syscalls/syscall.tbl         |   1 +
 arch/sh/kernel/syscalls/syscall.tbl           |   1 +
 arch/sparc/kernel/syscalls/syscall.tbl        |   1 +
 arch/x86/entry/syscalls/syscall_32.tbl        |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl        |   1 +
 arch/xtensa/kernel/syscalls/syscall.tbl       |   1 +
 fs/file.c                                     |  62 +++++++-
 fs/open.c                                     |  20 +++
 include/linux/fdtable.h                       |   2 +
 include/linux/syscalls.h                      |   2 +
 include/uapi/asm-generic/unistd.h             |   4 +-
 tools/testing/selftests/Makefile              |   1 +
 tools/testing/selftests/core/.gitignore       |   1 +
 tools/testing/selftests/core/Makefile         |   6 +
 .../testing/selftests/core/close_range_test.c | 142 ++++++++++++++++++
 26 files changed, 249 insertions(+), 9 deletions(-)
 create mode 100644 tools/testing/selftests/core/.gitignore
 create mode 100644 tools/testing/selftests/core/Makefile
 create mode 100644 tools/testing/selftests/core/close_range_test.c

-- 
2.21.0


