LKML Archive on lore.kernel.org
* Minimal effort/low overhead file descriptor duplication over Posix.1b s
@ 2014-12-02  4:35 Alex Dubov
  2014-12-02  4:35 ` [PATCH 1/2] fs: introduce sendfd() syscall Alex Dubov
                   ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: Alex Dubov @ 2014-12-02  4:35 UTC
  To: linux-kernel; +Cc: alex.dubov

A common requirement in parallel processing applications (relied upon by
popular network servers, databases and various other applications) is to
pass open file descriptors between processes. Historically, several mechanisms
have existed to support this requirement, such as those provided by the "cmsg"
facility of Unix domain sockets or special operations on named pipes (on
Android this can also be achieved using the "binder" facility).

Unfortunately, using facilities like Unix domain sockets merely to pass file
descriptors between "worker" processes is unnecessarily difficult, due to
the following common considerations:

1. Domain sockets and named pipes are persistent objects. Applications must
manage their lifetime and devise unambiguous access schemes in case multiple
application instances are to be run within the same OS instance. Usually, they
would also require a writable file system to be mounted.

2. Interaction with domain sockets and named pipes requires sizable,
non-trivial and error-prone code on the application side, especially in
cases where multiple worker types started by multiple application instances
must coexist within the same OS instance.

3. Domain sockets and pipes require creation of complex kernel-side set-ups,
whereupon, in many cases, the only information ever passed by the application
over those channels are file descriptors (it is usual for the major part of the
application's shared state to be established through other mechanisms,
like shared memory). In some cases, applications are forced to send meaningless
rubbish over the domain socket merely to "push" the associated "cmsg" carrying
the file descriptor through.
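
For reference, the existing "cmsg" approach described in point 3 can be
sketched roughly as follows. This is a minimal illustration, not code from
the patch: the helper names send_fd() and recv_fd() are made up for this
example, and the socket is assumed to be an already connected Unix domain
socket (for instance, one end of a socketpair()). Note the single dummy byte
that has to be sent only so that the SCM_RIGHTS control message has something
to travel with.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_cmsg_buf {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Hypothetical helper: pass fd over a connected Unix domain socket.
 * Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
	char dummy = 0;	/* payload byte whose only job is to carry the cmsg */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_cmsg_buf u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor sent by send_fd().
 * Returns the new descriptor, or -1 on error. */
static int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_cmsg_buf u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd = -1;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) <= 0)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

Even this stripped-down version needs two helpers and a control buffer, and
it leaves out socket creation, naming and cleanup, which is where most of the
lifetime management from points 1 and 2 would come in.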

The present patch introduces an exceptionally easy-to-use, low-latency and
low-overhead mechanism for transferring file descriptors between cooperating
processes:

    int sendfd(pid_t pid, int sig, int fd)

Given a target process pid, the sendfd() syscall will create a duplicate
file descriptor in the target task's (referred to by pid) file table, pointing
to the file referenced by descriptor fd. Then, it will attempt to notify the
target task by issuing a Posix.1b real-time signal (sig), carrying the new
file descriptor as an integer payload. If the real-time signal cannot be
enqueued at the destination signal queue, the newly created file descriptor
will be promptly closed.
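
On the receiving side, the notification would look like any other queued
real-time signal with an integer payload. The sketch below is an illustration
under stated assumptions, not part of the patch: sendfd() is only a proposal,
so the standard sigqueue() call stands in for its notification half, and the
helper names (block_rt_signal, queue_payload, wait_for_payload) are invented
for this example. With the proposed syscall, the integer returned by
wait_for_payload() would be the number of the newly installed descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Block sig so it stays queued until collected with sigwaitinfo().
 * Returns 0 on success, -1 on error. */
static int block_rt_signal(int sig)
{
	sigset_t set;

	sigemptyset(&set);
	sigaddset(&set, sig);
	return sigprocmask(SIG_BLOCK, &set, NULL);
}

/* Stand-in for the notification half of the proposed sendfd():
 * queue sig to pid with an integer payload. */
static int queue_payload(pid_t pid, int sig, int payload)
{
	union sigval v = { .sival_int = payload };

	return sigqueue(pid, sig, v);
}

/* Wait synchronously for sig and return its integer payload,
 * or -1 if sigwaitinfo() fails. */
static int wait_for_payload(int sig)
{
	sigset_t set;
	siginfo_t info;

	sigemptyset(&set);
	sigaddset(&set, sig);
	if (sigwaitinfo(&set, &info) != sig)
		return -1;
	return info.si_value.sival_int;
}
```

A worker would typically call block_rt_signal(SIGRTMIN) once at start-up and
then loop on wait_for_payload(SIGRTMIN). Collecting the signal synchronously,
rather than in an asynchronous handler, is a design choice made for this
sketch only.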

It is believed that the proposed sendfd() syscall, together with the recently
accepted "memfd" facility, may greatly simplify development of parallel
processing applications by eliminating the need to rely on tricky and
possibly insecure approaches involving domain sockets and such.



* [PATCH 1/2] fs: introduce sendfd() syscall
  2014-12-02  4:35 Minimal effort/low overhead file descriptor duplication over Posix.1b s Alex Dubov
@ 2014-12-02  4:35 ` Alex Dubov
  2014-12-02 12:50   ` Eric Dumazet
  2014-12-02 17:00   ` Al Viro
  2014-12-02  4:35 ` [PATCH 2/2] fs: Wire up sendfd() syscall (all architectures) Alex Dubov
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 28+ messages in thread
From: Alex Dubov @ 2014-12-02  4:35 UTC
  To: linux-kernel; +Cc: alex.dubov, Alex Dubov

The present patch introduces an exceptionally easy-to-use, low-latency and
low-overhead mechanism for transferring file descriptors between cooperating
processes:

    int sendfd(pid_t pid, int sig, int fd)

Given a target process pid, the sendfd() syscall will create a duplicate
file descriptor in the target task's (referred to by pid) file table, pointing
to the file referenced by descriptor fd. Then, it will attempt to notify the
target task by issuing a Posix.1b real-time signal (sig), carrying the new
file descriptor as an integer payload. If the real-time signal cannot be
enqueued at the destination signal queue, the newly created file descriptor
will be promptly closed.

Signed-off-by: Alex Dubov <oakad@yahoo.com>
---
 fs/Makefile  |  1 +
 fs/sendfd.c  | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 init/Kconfig | 11 ++++++++
 3 files changed, 94 insertions(+)
 create mode 100644 fs/sendfd.c

diff --git a/fs/Makefile b/fs/Makefile
index da0bbb4..bed05a8 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -27,6 +27,7 @@ obj-$(CONFIG_ANON_INODES)	+= anon_inodes.o
 obj-$(CONFIG_SIGNALFD)		+= signalfd.o
 obj-$(CONFIG_TIMERFD)		+= timerfd.o
 obj-$(CONFIG_EVENTFD)		+= eventfd.o
+obj-$(CONFIG_SENDFD)		+= sendfd.o
 obj-$(CONFIG_AIO)               += aio.o
 obj-$(CONFIG_FILE_LOCKING)      += locks.o
 obj-$(CONFIG_COMPAT)		+= compat.o compat_ioctl.o
diff --git a/fs/sendfd.c b/fs/sendfd.c
new file mode 100644
index 0000000..1e85484
--- /dev/null
+++ b/fs/sendfd.c
@@ -0,0 +1,82 @@
+/*
+ *  fs/sendfd.c
+ *
+ *  Copyright (C) 2014 Alex Dubov <oakad@yahoo.com>
+ *
+ */
+
+#include <linux/file.h>
+#include <linux/fdtable.h>
+#include <linux/syscalls.h>
+
+SYSCALL_DEFINE3(sendfd, pid_t, pid, int, sig, int, fd)
+{
+	struct siginfo s_info = {
+		.si_signo = sig,
+		.si_errno = 0,
+		.si_code = __SI_RT
+	};
+	struct file *src_file = NULL;
+	struct task_struct *dst_task = NULL;
+	struct files_struct *dst_files  = NULL;
+	unsigned long rlim = 0;
+	unsigned long flags = 0;
+	int rc = 0;
+
+	if ((sig < SIGRTMIN) || (sig > SIGRTMAX))
+		return -EINVAL;
+
+	s_info.si_pid = task_pid_vnr(current);
+	s_info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
+	s_info.si_int = -1;
+
+	src_file = fget(fd);
+	if (!src_file)
+		return -EBADF;
+
+	rcu_read_lock();
+	dst_task = find_task_by_vpid(pid);
+
+	if (!dst_task) {
+		rc = -ESRCH;
+		goto out_put_src_file;
+	}
+	get_task_struct(dst_task);
+	rcu_read_unlock();
+
+	dst_files = get_files_struct(dst_task);
+	if (!dst_files) {
+		rc = -EMFILE;
+		goto out_put_dst_task;
+	}
+
+	if (!lock_task_sighand(dst_task, &flags)) {
+		rc = -EMFILE;
+		goto out_put_dst_files;
+	}
+
+	rlim = task_rlimit(dst_task, RLIMIT_NOFILE);
+
+	unlock_task_sighand(dst_task, &flags);
+
+	rc = __alloc_fd(dst_task->files, 0, rlim, O_CLOEXEC);
+	if (rc < 0)
+		goto out_put_dst_files;
+
+	s_info.si_int = rc;
+
+	get_file(src_file);
+	__fd_install(dst_files, rc, src_file);
+	rc = kill_pid_info(sig, &s_info, task_pid(dst_task));
+
+	if (rc < 0)
+		__close_fd(dst_files, s_info.si_int);
+
+out_put_dst_files:
+	put_files_struct(dst_files);
+out_put_dst_task:
+	put_task_struct(dst_task);
+out_put_src_file:
+	fput(src_file);
+	return rc;
+}
diff --git a/init/Kconfig b/init/Kconfig
index 2081a4d..dfe8b6f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1525,6 +1525,17 @@ config EVENTFD
 
 	  If unsure, say Y.
 
+config SENDFD
+	bool "Enable sendfd() system call" if EXPERT
+	default y
+	help
+	  Enable the sendfd() system call that allows rapid duplication
+	  of file descriptor across process boundaries. The target process
+	  will receive a duplicate file descriptor delivered with one of
+	  Posix.1b real-time signals.
+
+	  If unsure, say Y.
+
 # syscall, maps, verifier
 config BPF_SYSCALL
 	bool "Enable bpf() system call" if EXPERT
-- 
1.8.3.2



* [PATCH 2/2] fs: Wire up sendfd() syscall (all architectures)
  2014-12-02  4:35 Minimal effort/low overhead file descriptor duplication over Posix.1b s Alex Dubov
  2014-12-02  4:35 ` [PATCH 1/2] fs: introduce sendfd() syscall Alex Dubov
@ 2014-12-02  4:35 ` Alex Dubov
  2014-12-02  8:01   ` Geert Uytterhoeven
  2014-12-02 15:26 ` Minimal effort/low overhead file descriptor duplication over Posix.1b s Jonathan Corbet
  2014-12-17 13:11 ` Kevin Easton
  3 siblings, 1 reply; 28+ messages in thread
From: Alex Dubov @ 2014-12-02  4:35 UTC
  To: linux-kernel; +Cc: alex.dubov, Alex Dubov

Signed-off-by: Alex Dubov <oakad@yahoo.com>
---
 arch/arm/include/uapi/asm/unistd.h        |  1 +
 arch/arm/kernel/calls.S                   |  1 +
 arch/arm64/include/asm/unistd32.h         |  2 ++
 arch/ia64/include/uapi/asm/unistd.h       |  1 +
 arch/ia64/kernel/entry.S                  |  1 +
 arch/m68k/include/uapi/asm/unistd.h       |  1 +
 arch/m68k/kernel/syscalltable.S           |  1 +
 arch/microblaze/include/uapi/asm/unistd.h |  1 +
 arch/microblaze/kernel/syscall_table.S    |  1 +
 arch/mips/include/uapi/asm/unistd.h       | 15 +++++++++------
 arch/mips/kernel/scall32-o32.S            |  1 +
 arch/mips/kernel/scall64-64.S             |  1 +
 arch/mips/kernel/scall64-n32.S            |  1 +
 arch/mips/kernel/scall64-o32.S            |  1 +
 arch/parisc/include/uapi/asm/unistd.h     |  3 ++-
 arch/powerpc/include/asm/systbl.h         |  1 +
 arch/powerpc/include/uapi/asm/unistd.h    |  1 +
 arch/s390/include/uapi/asm/unistd.h       |  3 ++-
 arch/s390/kernel/compat_wrapper.c         |  1 +
 arch/s390/kernel/syscalls.S               |  1 +
 arch/sparc/include/uapi/asm/unistd.h      |  3 ++-
 arch/sparc/kernel/systbls_32.S            |  2 +-
 arch/sparc/kernel/systbls_64.S            |  4 ++--
 arch/x86/syscalls/syscall_32.tbl          |  1 +
 arch/x86/syscalls/syscall_64.tbl          |  1 +
 arch/xtensa/include/uapi/asm/unistd.h     |  5 +++--
 include/linux/syscalls.h                  |  1 +
 include/uapi/asm-generic/unistd.h         |  4 +++-
 kernel/sys_ni.c                           |  3 +++
 29 files changed, 48 insertions(+), 15 deletions(-)

diff --git a/arch/arm/include/uapi/asm/unistd.h b/arch/arm/include/uapi/asm/unistd.h
index 705bb76..6428823 100644
--- a/arch/arm/include/uapi/asm/unistd.h
+++ b/arch/arm/include/uapi/asm/unistd.h
@@ -413,6 +413,7 @@
 #define __NR_getrandom			(__NR_SYSCALL_BASE+384)
 #define __NR_memfd_create		(__NR_SYSCALL_BASE+385)
 #define __NR_bpf			(__NR_SYSCALL_BASE+386)
+#define __NR_sendfd			(__NR_SYSCALL_BASE+387)
 
 /*
  * The following SWIs are ARM private.
diff --git a/arch/arm/kernel/calls.S b/arch/arm/kernel/calls.S
index e51833f..30bdeb5 100644
--- a/arch/arm/kernel/calls.S
+++ b/arch/arm/kernel/calls.S
@@ -396,6 +396,7 @@
 		CALL(sys_getrandom)
 /* 385 */	CALL(sys_memfd_create)
 		CALL(sys_bpf)
+		CALL(sys_sendfd)
 #ifndef syscalls_counted
 .equ syscalls_padding, ((NR_syscalls + 3) & ~3) - NR_syscalls
 #define syscalls_counted
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 9dfdac4..7f19595 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -794,3 +794,5 @@ __SYSCALL(__NR_getrandom, sys_getrandom)
 __SYSCALL(__NR_memfd_create, sys_memfd_create)
 #define __NR_bpf 386
 __SYSCALL(__NR_bpf, sys_bpf)
+#define __NR_sendfd 387
+__SYSCALL(__NR_sendfd, sys_sendfd)
diff --git a/arch/ia64/include/uapi/asm/unistd.h b/arch/ia64/include/uapi/asm/unistd.h
index 4c2240c..55be68c 100644
--- a/arch/ia64/include/uapi/asm/unistd.h
+++ b/arch/ia64/include/uapi/asm/unistd.h
@@ -331,5 +331,6 @@
 #define __NR_getrandom			1339
 #define __NR_memfd_create		1340
 #define __NR_bpf			1341
+#define __NR_sendfd			1342
 
 #endif /* _UAPI_ASM_IA64_UNISTD_H */
diff --git a/arch/ia64/kernel/entry.S b/arch/ia64/kernel/entry.S
index f5e96df..97596a3 100644
--- a/arch/ia64/kernel/entry.S
+++ b/arch/ia64/kernel/entry.S
@@ -1779,6 +1779,7 @@ sys_call_table:
 	data8 sys_getrandom
 	data8 sys_memfd_create			// 1340
 	data8 sys_bpf
+	data8 sys_sendfd
 
 	.org sys_call_table + 8*NR_syscalls	// guard against failures to increase NR_syscalls
 #endif /* __IA64_ASM_PARAVIRTUALIZED_NATIVE */
diff --git a/arch/m68k/include/uapi/asm/unistd.h b/arch/m68k/include/uapi/asm/unistd.h
index 2c1bec9..77e7098 100644
--- a/arch/m68k/include/uapi/asm/unistd.h
+++ b/arch/m68k/include/uapi/asm/unistd.h
@@ -360,5 +360,6 @@
 #define __NR_getrandom		352
 #define __NR_memfd_create	353
 #define __NR_bpf		354
+#define __NR_sendfd		355
 
 #endif /* _UAPI_ASM_M68K_UNISTD_H_ */
diff --git a/arch/m68k/kernel/syscalltable.S b/arch/m68k/kernel/syscalltable.S
index 2ca219e..3ea20d4 100644
--- a/arch/m68k/kernel/syscalltable.S
+++ b/arch/m68k/kernel/syscalltable.S
@@ -375,4 +375,5 @@ ENTRY(sys_call_table)
 	.long sys_getrandom
 	.long sys_memfd_create
 	.long sys_bpf
+	.long sys_sendfd
 
diff --git a/arch/microblaze/include/uapi/asm/unistd.h b/arch/microblaze/include/uapi/asm/unistd.h
index c712677..f69e30a 100644
--- a/arch/microblaze/include/uapi/asm/unistd.h
+++ b/arch/microblaze/include/uapi/asm/unistd.h
@@ -403,5 +403,6 @@
 #define __NR_getrandom		385
 #define __NR_memfd_create	386
 #define __NR_bpf		387
+#define __NR_sendfd		388
 
 #endif /* _UAPI_ASM_MICROBLAZE_UNISTD_H */
diff --git a/arch/microblaze/kernel/syscall_table.S b/arch/microblaze/kernel/syscall_table.S
index 0166e89..1550f45 100644
--- a/arch/microblaze/kernel/syscall_table.S
+++ b/arch/microblaze/kernel/syscall_table.S
@@ -388,3 +388,4 @@ ENTRY(sys_call_table)
 	.long sys_getrandom		/* 385 */
 	.long sys_memfd_create
 	.long sys_bpf
+	.long sys_sendfd
diff --git a/arch/mips/include/uapi/asm/unistd.h b/arch/mips/include/uapi/asm/unistd.h
index d001bb1..24109dc 100644
--- a/arch/mips/include/uapi/asm/unistd.h
+++ b/arch/mips/include/uapi/asm/unistd.h
@@ -376,16 +376,17 @@
 #define __NR_getrandom			(__NR_Linux + 353)
 #define __NR_memfd_create		(__NR_Linux + 354)
 #define __NR_bpf			(__NR_Linux + 355)
+#define __NR_sendfd			(__NR_Linux + 356)
 
 /*
  * Offset of the last Linux o32 flavoured syscall
  */
-#define __NR_Linux_syscalls		355
+#define __NR_Linux_syscalls		356
 
 #endif /* _MIPS_SIM == _MIPS_SIM_ABI32 */
 
 #define __NR_O32_Linux			4000
-#define __NR_O32_Linux_syscalls		355
+#define __NR_O32_Linux_syscalls		356
 
 #if _MIPS_SIM == _MIPS_SIM_ABI64
 
@@ -709,16 +710,17 @@
 #define __NR_getrandom			(__NR_Linux + 313)
 #define __NR_memfd_create		(__NR_Linux + 314)
 #define __NR_bpf			(__NR_Linux + 315)
+#define __NR_sendfd			(__NR_Linux + 316)
 
 /*
  * Offset of the last Linux 64-bit flavoured syscall
  */
-#define __NR_Linux_syscalls		315
+#define __NR_Linux_syscalls		316
 
 #endif /* _MIPS_SIM == _MIPS_SIM_ABI64 */
 
 #define __NR_64_Linux			5000
-#define __NR_64_Linux_syscalls		315
+#define __NR_64_Linux_syscalls		316
 
 #if _MIPS_SIM == _MIPS_SIM_NABI32
 
@@ -1046,15 +1048,16 @@
 #define __NR_getrandom			(__NR_Linux + 317)
 #define __NR_memfd_create		(__NR_Linux + 318)
 #define __NR_bpf			(__NR_Linux + 319)
+#define __NR_sendfd			(__NR_Linux + 320)
 
 /*
  * Offset of the last N32 flavoured syscall
  */
-#define __NR_Linux_syscalls		319
+#define __NR_Linux_syscalls		320
 
 #endif /* _MIPS_SIM == _MIPS_SIM_NABI32 */
 
 #define __NR_N32_Linux			6000
-#define __NR_N32_Linux_syscalls		319
+#define __NR_N32_Linux_syscalls		320
 
 #endif /* _UAPI_ASM_UNISTD_H */
diff --git a/arch/mips/kernel/scall32-o32.S b/arch/mips/kernel/scall32-o32.S
index 00cad10..94a7014 100644
--- a/arch/mips/kernel/scall32-o32.S
+++ b/arch/mips/kernel/scall32-o32.S
@@ -580,3 +580,4 @@ EXPORT(sys_call_table)
 	PTR	sys_getrandom
 	PTR	sys_memfd_create
 	PTR	sys_bpf				/* 4355 */
+	PTR	sys_sendfd
diff --git a/arch/mips/kernel/scall64-64.S b/arch/mips/kernel/scall64-64.S
index 5251565..cc2440d 100644
--- a/arch/mips/kernel/scall64-64.S
+++ b/arch/mips/kernel/scall64-64.S
@@ -435,4 +435,5 @@ EXPORT(sys_call_table)
 	PTR	sys_getrandom
 	PTR	sys_memfd_create
 	PTR	sys_bpf				/* 5315 */
+	PTR	sys_sendfd
 	.size	sys_call_table,.-sys_call_table
diff --git a/arch/mips/kernel/scall64-n32.S b/arch/mips/kernel/scall64-n32.S
index 77e7439..ff1de3a 100644
--- a/arch/mips/kernel/scall64-n32.S
+++ b/arch/mips/kernel/scall64-n32.S
@@ -428,4 +428,5 @@ EXPORT(sysn32_call_table)
 	PTR	sys_getrandom
 	PTR	sys_memfd_create
 	PTR	sys_bpf
+	PTR	sys_sendfd
 	.size	sysn32_call_table,.-sysn32_call_table
diff --git a/arch/mips/kernel/scall64-o32.S b/arch/mips/kernel/scall64-o32.S
index 6f8db9f..87d3a33 100644
--- a/arch/mips/kernel/scall64-o32.S
+++ b/arch/mips/kernel/scall64-o32.S
@@ -565,4 +565,5 @@ EXPORT(sys32_call_table)
 	PTR	sys_getrandom
 	PTR	sys_memfd_create
 	PTR	sys_bpf				/* 4355 */
+	PTR	sys_sendfd
 	.size	sys32_call_table,.-sys32_call_table
diff --git a/arch/parisc/include/uapi/asm/unistd.h b/arch/parisc/include/uapi/asm/unistd.h
index 5f5c037..f182787 100644
--- a/arch/parisc/include/uapi/asm/unistd.h
+++ b/arch/parisc/include/uapi/asm/unistd.h
@@ -834,8 +834,9 @@
 #define __NR_getrandom		(__NR_Linux + 339)
 #define __NR_memfd_create	(__NR_Linux + 340)
 #define __NR_bpf		(__NR_Linux + 341)
+#define __NR_sendfd		(__NR_Linux + 342)
 
-#define __NR_Linux_syscalls	(__NR_bpf + 1)
+#define __NR_Linux_syscalls	(__NR_sendfd + 1)
 
 
 #define __IGNORE_select		/* newselect */
diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h
index ce9577d..4aa6c22 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -366,3 +366,4 @@ SYSCALL_SPU(seccomp)
 SYSCALL_SPU(getrandom)
 SYSCALL_SPU(memfd_create)
 SYSCALL_SPU(bpf)
+SYSCALL_SPU(sendfd)
diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h
index f55351f..2d55338 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -384,5 +384,6 @@
 #define __NR_getrandom		359
 #define __NR_memfd_create	360
 #define __NR_bpf		361
+#define __NR_sendfd		362
 
 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
diff --git a/arch/s390/include/uapi/asm/unistd.h b/arch/s390/include/uapi/asm/unistd.h
index 4197c89..7248c4a 100644
--- a/arch/s390/include/uapi/asm/unistd.h
+++ b/arch/s390/include/uapi/asm/unistd.h
@@ -287,7 +287,8 @@
 #define __NR_getrandom		349
 #define __NR_memfd_create	350
 #define __NR_bpf		351
-#define NR_syscalls 352
+#define __NR_sendfd		352
+#define NR_syscalls 353
 
 /* 
  * There are some system calls that are not present on 64 bit, some
diff --git a/arch/s390/kernel/compat_wrapper.c b/arch/s390/kernel/compat_wrapper.c
index c4f7a3d..d931326 100644
--- a/arch/s390/kernel/compat_wrapper.c
+++ b/arch/s390/kernel/compat_wrapper.c
@@ -218,3 +218,4 @@ COMPAT_SYSCALL_WRAP3(seccomp, unsigned int, op, unsigned int, flags, const char
 COMPAT_SYSCALL_WRAP3(getrandom, char __user *, buf, size_t, count, unsigned int, flags)
 COMPAT_SYSCALL_WRAP2(memfd_create, const char __user *, uname, unsigned int, flags)
 COMPAT_SYSCALL_WRAP3(bpf, int, cmd, union bpf_attr *, attr, unsigned int, size);
+COMPAT_SYSCALL_WRAP3(sendfd, pid_t, pid, int, sig, int, fd);
diff --git a/arch/s390/kernel/syscalls.S b/arch/s390/kernel/syscalls.S
index 9f7087f..b1beaf1 100644
--- a/arch/s390/kernel/syscalls.S
+++ b/arch/s390/kernel/syscalls.S
@@ -360,3 +360,4 @@ SYSCALL(sys_seccomp,sys_seccomp,compat_sys_seccomp)
 SYSCALL(sys_getrandom,sys_getrandom,compat_sys_getrandom)
 SYSCALL(sys_memfd_create,sys_memfd_create,compat_sys_memfd_create) /* 350 */
 SYSCALL(sys_bpf,sys_bpf,compat_sys_bpf)
+SYSCALL(sys_sendfd,sys_sendfd,compat_sys_sendfd)
diff --git a/arch/sparc/include/uapi/asm/unistd.h b/arch/sparc/include/uapi/asm/unistd.h
index 46d8384..a43637a 100644
--- a/arch/sparc/include/uapi/asm/unistd.h
+++ b/arch/sparc/include/uapi/asm/unistd.h
@@ -415,8 +415,9 @@
 #define __NR_getrandom		347
 #define __NR_memfd_create	348
 #define __NR_bpf		349
+#define __NR_sendfd		350
 
-#define NR_syscalls		350
+#define NR_syscalls		351
 
 /* Bitmask values returned from kern_features system call.  */
 #define KERN_FEATURE_MIXED_MODE_STACK	0x00000001
diff --git a/arch/sparc/kernel/systbls_32.S b/arch/sparc/kernel/systbls_32.S
index ad0cdf4..1b3ff92 100644
--- a/arch/sparc/kernel/systbls_32.S
+++ b/arch/sparc/kernel/systbls_32.S
@@ -86,4 +86,4 @@ sys_call_table:
 /*330*/	.long sys_fanotify_mark, sys_prlimit64, sys_name_to_handle_at, sys_open_by_handle_at, sys_clock_adjtime
 /*335*/	.long sys_syncfs, sys_sendmmsg, sys_setns, sys_process_vm_readv, sys_process_vm_writev
 /*340*/	.long sys_ni_syscall, sys_kcmp, sys_finit_module, sys_sched_setattr, sys_sched_getattr
-/*345*/	.long sys_renameat2, sys_seccomp, sys_getrandom, sys_memfd_create, sys_bpf
+/*345*/	.long sys_renameat2, sys_seccomp, sys_getrandom, sys_memfd_create, sys_bpf, sys_sendfd
diff --git a/arch/sparc/kernel/systbls_64.S b/arch/sparc/kernel/systbls_64.S
index 580cde9..ebbafb1 100644
--- a/arch/sparc/kernel/systbls_64.S
+++ b/arch/sparc/kernel/systbls_64.S
@@ -87,7 +87,7 @@ sys_call_table32:
 /*330*/	.word compat_sys_fanotify_mark, sys_prlimit64, sys_name_to_handle_at, compat_sys_open_by_handle_at, compat_sys_clock_adjtime
 	.word sys_syncfs, compat_sys_sendmmsg, sys_setns, compat_sys_process_vm_readv, compat_sys_process_vm_writev
 /*340*/	.word sys_kern_features, sys_kcmp, sys_finit_module, sys_sched_setattr, sys_sched_getattr
-	.word sys32_renameat2, sys_seccomp, sys_getrandom, sys_memfd_create, sys_bpf
+	.word sys32_renameat2, sys_seccomp, sys_getrandom, sys_memfd_create, sys_bpf, sys_sendfd
 
 #endif /* CONFIG_COMPAT */
 
@@ -166,4 +166,4 @@ sys_call_table:
 /*330*/	.word sys_fanotify_mark, sys_prlimit64, sys_name_to_handle_at, sys_open_by_handle_at, sys_clock_adjtime
 	.word sys_syncfs, sys_sendmmsg, sys_setns, sys_process_vm_readv, sys_process_vm_writev
 /*340*/	.word sys_kern_features, sys_kcmp, sys_finit_module, sys_sched_setattr, sys_sched_getattr
-	.word sys_renameat2, sys_seccomp, sys_getrandom, sys_memfd_create, sys_bpf
+	.word sys_renameat2, sys_seccomp, sys_getrandom, sys_memfd_create, sys_bpf, sys_sendfd
diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
index 9fe1b5d..dfe91f7 100644
--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/syscalls/syscall_32.tbl
@@ -364,3 +364,4 @@
 355	i386	getrandom		sys_getrandom
 356	i386	memfd_create		sys_memfd_create
 357	i386	bpf			sys_bpf
+358	i386	sendfd			sys_sendfd
diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index 281150b..4d6b55d 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -328,6 +328,7 @@
 319	common	memfd_create		sys_memfd_create
 320	common	kexec_file_load		sys_kexec_file_load
 321	common	bpf			sys_bpf
+322	common	sendfd			sys_sendfd
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/arch/xtensa/include/uapi/asm/unistd.h b/arch/xtensa/include/uapi/asm/unistd.h
index db5bb72..3705d28 100644
--- a/arch/xtensa/include/uapi/asm/unistd.h
+++ b/arch/xtensa/include/uapi/asm/unistd.h
@@ -749,8 +749,9 @@ __SYSCALL(337, sys_seccomp, 3)
 __SYSCALL(338, sys_getrandom, 3)
 #define __NR_memfd_create			339
 __SYSCALL(339, sys_memfd_create, 2)
-
-#define __NR_syscall_count			340
+#define __NR_sendfd				340
+__SYSCALL(340, sys_sendfd, 3)
+#define __NR_syscall_count			341
 
 /*
  * sysxtensa syscall handler
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index bda9b81..1871b72f 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -877,4 +877,5 @@ asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
 asmlinkage long sys_getrandom(char __user *buf, size_t count,
 			      unsigned int flags);
 asmlinkage long sys_bpf(int cmd, union bpf_attr *attr, unsigned int size);
+asmlinkage long sys_sendfd(pid_t pid, int sig, int fd);
 #endif
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 22749c1..270aa02 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -707,9 +707,11 @@ __SYSCALL(__NR_getrandom, sys_getrandom)
 __SYSCALL(__NR_memfd_create, sys_memfd_create)
 #define __NR_bpf 280
 __SYSCALL(__NR_bpf, sys_bpf)
+#define __NR_sendfd 281
+__SYSCALL(__NR_sendfd, sys_sendfd)
 
 #undef __NR_syscalls
-#define __NR_syscalls 281
+#define __NR_syscalls 282
 
 /*
  * All syscalls below here should go away really,
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 02aa418..353cddb 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -224,3 +224,6 @@ cond_syscall(sys_seccomp);
 
 /* access BPF programs and maps */
 cond_syscall(sys_bpf);
+
+/* send file descriptor to another process */
+cond_syscall(sys_sendfd);
-- 
1.8.3.2



* Re: [PATCH 2/2] fs: Wire up sendfd() syscall (all architectures)
  2014-12-02  4:35 ` [PATCH 2/2] fs: Wire up sendfd() syscall (all architectures) Alex Dubov
@ 2014-12-02  8:01   ` Geert Uytterhoeven
  2014-12-02  8:31     ` Alex Dubov
  2014-12-02 11:42     ` Michal Simek
  0 siblings, 2 replies; 28+ messages in thread
From: Geert Uytterhoeven @ 2014-12-02  8:01 UTC
  To: Alex Dubov; +Cc: linux-kernel, Alex Dubov

This really needs a CC to linux-arch (added).

On Tue, Dec 2, 2014 at 5:35 AM, Alex Dubov <alex.dubov@gmail.com> wrote:
> Signed-off-by: Alex Dubov <oakad@yahoo.com>
> ---
>  arch/arm/include/uapi/asm/unistd.h        |  1 +
>  arch/arm/kernel/calls.S                   |  1 +
>  arch/arm64/include/asm/unistd32.h         |  2 ++
>  arch/ia64/include/uapi/asm/unistd.h       |  1 +
>  arch/ia64/kernel/entry.S                  |  1 +
>  arch/m68k/include/uapi/asm/unistd.h       |  1 +
>  arch/m68k/kernel/syscalltable.S           |  1 +

You forgot to update NR_syscalls in arch/m68k/include/asm/unistd.h.

>  arch/microblaze/include/uapi/asm/unistd.h |  1 +
>  arch/microblaze/kernel/syscall_table.S    |  1 +
>  arch/mips/include/uapi/asm/unistd.h       | 15 +++++++++------
>  arch/mips/kernel/scall32-o32.S            |  1 +
>  arch/mips/kernel/scall64-64.S             |  1 +
>  arch/mips/kernel/scall64-n32.S            |  1 +
>  arch/mips/kernel/scall64-o32.S            |  1 +
>  arch/parisc/include/uapi/asm/unistd.h     |  3 ++-
>  arch/powerpc/include/asm/systbl.h         |  1 +
>  arch/powerpc/include/uapi/asm/unistd.h    |  1 +
>  arch/s390/include/uapi/asm/unistd.h       |  3 ++-
>  arch/s390/kernel/compat_wrapper.c         |  1 +
>  arch/s390/kernel/syscalls.S               |  1 +
>  arch/sparc/include/uapi/asm/unistd.h      |  3 ++-
>  arch/sparc/kernel/systbls_32.S            |  2 +-
>  arch/sparc/kernel/systbls_64.S            |  4 ++--
>  arch/x86/syscalls/syscall_32.tbl          |  1 +
>  arch/x86/syscalls/syscall_64.tbl          |  1 +
>  arch/xtensa/include/uapi/asm/unistd.h     |  5 +++--
>  include/linux/syscalls.h                  |  1 +
>  include/uapi/asm-generic/unistd.h         |  4 +++-
>  kernel/sys_ni.c                           |  3 +++
>  29 files changed, 48 insertions(+), 15 deletions(-)
>
> diff --git a/arch/arm/include/uapi/asm/unistd.h b/arch/arm/include/uapi/asm/unistd.h
> index 705bb76..6428823 100644
> --- a/arch/arm/include/uapi/asm/unistd.h
> +++ b/arch/arm/include/uapi/asm/unistd.h
> @@ -413,6 +413,7 @@
>  #define __NR_getrandom                 (__NR_SYSCALL_BASE+384)
>  #define __NR_memfd_create              (__NR_SYSCALL_BASE+385)
>  #define __NR_bpf                       (__NR_SYSCALL_BASE+386)
> +#define __NR_sendfd                    (__NR_SYSCALL_BASE+387)
>
>  /*
>   * The following SWIs are ARM private.
> diff --git a/arch/arm/kernel/calls.S b/arch/arm/kernel/calls.S
> index e51833f..30bdeb5 100644
> --- a/arch/arm/kernel/calls.S
> +++ b/arch/arm/kernel/calls.S
> @@ -396,6 +396,7 @@
>                 CALL(sys_getrandom)
>  /* 385 */      CALL(sys_memfd_create)
>                 CALL(sys_bpf)
> +               CALL(sys_sendfd)
>  #ifndef syscalls_counted
>  .equ syscalls_padding, ((NR_syscalls + 3) & ~3) - NR_syscalls
>  #define syscalls_counted
> diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
> index 9dfdac4..7f19595 100644
> --- a/arch/arm64/include/asm/unistd32.h
> +++ b/arch/arm64/include/asm/unistd32.h
> @@ -794,3 +794,5 @@ __SYSCALL(__NR_getrandom, sys_getrandom)
>  __SYSCALL(__NR_memfd_create, sys_memfd_create)
>  #define __NR_bpf 386
>  __SYSCALL(__NR_bpf, sys_bpf)
> +#define __NR_sendfd 387
> +__SYSCALL(__NR_sendfd, sys_sendfd)
> diff --git a/arch/ia64/include/uapi/asm/unistd.h b/arch/ia64/include/uapi/asm/unistd.h
> index 4c2240c..55be68c 100644
> --- a/arch/ia64/include/uapi/asm/unistd.h
> +++ b/arch/ia64/include/uapi/asm/unistd.h
> @@ -331,5 +331,6 @@
>  #define __NR_getrandom                 1339
>  #define __NR_memfd_create              1340
>  #define __NR_bpf                       1341
> +#define __NR_sendfd                    1342
>
>  #endif /* _UAPI_ASM_IA64_UNISTD_H */
> diff --git a/arch/ia64/kernel/entry.S b/arch/ia64/kernel/entry.S
> index f5e96df..97596a3 100644
> --- a/arch/ia64/kernel/entry.S
> +++ b/arch/ia64/kernel/entry.S
> @@ -1779,6 +1779,7 @@ sys_call_table:
>         data8 sys_getrandom
>         data8 sys_memfd_create                  // 1340
>         data8 sys_bpf
> +       data8 sys_sendfd
>
>         .org sys_call_table + 8*NR_syscalls     // guard against failures to increase NR_syscalls
>  #endif /* __IA64_ASM_PARAVIRTUALIZED_NATIVE */
> diff --git a/arch/m68k/include/uapi/asm/unistd.h b/arch/m68k/include/uapi/asm/unistd.h
> index 2c1bec9..77e7098 100644
> --- a/arch/m68k/include/uapi/asm/unistd.h
> +++ b/arch/m68k/include/uapi/asm/unistd.h
> @@ -360,5 +360,6 @@
>  #define __NR_getrandom         352
>  #define __NR_memfd_create      353
>  #define __NR_bpf               354
> +#define __NR_sendfd            355
>
>  #endif /* _UAPI_ASM_M68K_UNISTD_H_ */
> diff --git a/arch/m68k/kernel/syscalltable.S b/arch/m68k/kernel/syscalltable.S
> index 2ca219e..3ea20d4 100644
> --- a/arch/m68k/kernel/syscalltable.S
> +++ b/arch/m68k/kernel/syscalltable.S
> @@ -375,4 +375,5 @@ ENTRY(sys_call_table)
>         .long sys_getrandom
>         .long sys_memfd_create
>         .long sys_bpf
> +       .long sys_sendfd
>
> diff --git a/arch/microblaze/include/uapi/asm/unistd.h b/arch/microblaze/include/uapi/asm/unistd.h
> index c712677..f69e30a 100644
> --- a/arch/microblaze/include/uapi/asm/unistd.h
> +++ b/arch/microblaze/include/uapi/asm/unistd.h
> @@ -403,5 +403,6 @@
>  #define __NR_getrandom         385
>  #define __NR_memfd_create      386
>  #define __NR_bpf               387
> +#define __NR_sendfd            388
>
>  #endif /* _UAPI_ASM_MICROBLAZE_UNISTD_H */
> diff --git a/arch/microblaze/kernel/syscall_table.S b/arch/microblaze/kernel/syscall_table.S
> index 0166e89..1550f45 100644
> --- a/arch/microblaze/kernel/syscall_table.S
> +++ b/arch/microblaze/kernel/syscall_table.S
> @@ -388,3 +388,4 @@ ENTRY(sys_call_table)
>         .long sys_getrandom             /* 385 */
>         .long sys_memfd_create
>         .long sys_bpf
> +       .long sys_sendfd
> diff --git a/arch/mips/include/uapi/asm/unistd.h b/arch/mips/include/uapi/asm/unistd.h
> index d001bb1..24109dc 100644
> --- a/arch/mips/include/uapi/asm/unistd.h
> +++ b/arch/mips/include/uapi/asm/unistd.h
> @@ -376,16 +376,17 @@
>  #define __NR_getrandom                 (__NR_Linux + 353)
>  #define __NR_memfd_create              (__NR_Linux + 354)
>  #define __NR_bpf                       (__NR_Linux + 355)
> +#define __NR_sendfd                    (__NR_Linux + 356)
>
>  /*
>   * Offset of the last Linux o32 flavoured syscall
>   */
> -#define __NR_Linux_syscalls            355
> +#define __NR_Linux_syscalls            356
>
>  #endif /* _MIPS_SIM == _MIPS_SIM_ABI32 */
>
>  #define __NR_O32_Linux                 4000
> -#define __NR_O32_Linux_syscalls                355
> +#define __NR_O32_Linux_syscalls                356
>
>  #if _MIPS_SIM == _MIPS_SIM_ABI64
>
> @@ -709,16 +710,17 @@
>  #define __NR_getrandom                 (__NR_Linux + 313)
>  #define __NR_memfd_create              (__NR_Linux + 314)
>  #define __NR_bpf                       (__NR_Linux + 315)
> +#define __NR_sendfd                    (__NR_Linux + 316)
>
>  /*
>   * Offset of the last Linux 64-bit flavoured syscall
>   */
> -#define __NR_Linux_syscalls            315
> +#define __NR_Linux_syscalls            316
>
>  #endif /* _MIPS_SIM == _MIPS_SIM_ABI64 */
>
>  #define __NR_64_Linux                  5000
> -#define __NR_64_Linux_syscalls         315
> +#define __NR_64_Linux_syscalls         316
>
>  #if _MIPS_SIM == _MIPS_SIM_NABI32
>
> @@ -1046,15 +1048,16 @@
>  #define __NR_getrandom                 (__NR_Linux + 317)
>  #define __NR_memfd_create              (__NR_Linux + 318)
>  #define __NR_bpf                       (__NR_Linux + 319)
> +#define __NR_sendfd                    (__NR_Linux + 320)
>
>  /*
>   * Offset of the last N32 flavoured syscall
>   */
> -#define __NR_Linux_syscalls            319
> +#define __NR_Linux_syscalls            320
>
>  #endif /* _MIPS_SIM == _MIPS_SIM_NABI32 */
>
>  #define __NR_N32_Linux                 6000
> -#define __NR_N32_Linux_syscalls                319
> +#define __NR_N32_Linux_syscalls                320
>
>  #endif /* _UAPI_ASM_UNISTD_H */
> diff --git a/arch/mips/kernel/scall32-o32.S b/arch/mips/kernel/scall32-o32.S
> index 00cad10..94a7014 100644
> --- a/arch/mips/kernel/scall32-o32.S
> +++ b/arch/mips/kernel/scall32-o32.S
> @@ -580,3 +580,4 @@ EXPORT(sys_call_table)
>         PTR     sys_getrandom
>         PTR     sys_memfd_create
>         PTR     sys_bpf                         /* 4355 */
> +       PTR     sys_sendfd
> diff --git a/arch/mips/kernel/scall64-64.S b/arch/mips/kernel/scall64-64.S
> index 5251565..cc2440d 100644
> --- a/arch/mips/kernel/scall64-64.S
> +++ b/arch/mips/kernel/scall64-64.S
> @@ -435,4 +435,5 @@ EXPORT(sys_call_table)
>         PTR     sys_getrandom
>         PTR     sys_memfd_create
>         PTR     sys_bpf                         /* 5315 */
> +       PTR     sys_sendfd
>         .size   sys_call_table,.-sys_call_table
> diff --git a/arch/mips/kernel/scall64-n32.S b/arch/mips/kernel/scall64-n32.S
> index 77e7439..ff1de3a 100644
> --- a/arch/mips/kernel/scall64-n32.S
> +++ b/arch/mips/kernel/scall64-n32.S
> @@ -428,4 +428,5 @@ EXPORT(sysn32_call_table)
>         PTR     sys_getrandom
>         PTR     sys_memfd_create
>         PTR     sys_bpf
> +       PTR     sys_sendfd
>         .size   sysn32_call_table,.-sysn32_call_table
> diff --git a/arch/mips/kernel/scall64-o32.S b/arch/mips/kernel/scall64-o32.S
> index 6f8db9f..87d3a33 100644
> --- a/arch/mips/kernel/scall64-o32.S
> +++ b/arch/mips/kernel/scall64-o32.S
> @@ -565,4 +565,5 @@ EXPORT(sys32_call_table)
>         PTR     sys_getrandom
>         PTR     sys_memfd_create
>         PTR     sys_bpf                         /* 4355 */
> +       PTR     sys_sendfd
>         .size   sys32_call_table,.-sys32_call_table
> diff --git a/arch/parisc/include/uapi/asm/unistd.h b/arch/parisc/include/uapi/asm/unistd.h
> index 5f5c037..f182787 100644
> --- a/arch/parisc/include/uapi/asm/unistd.h
> +++ b/arch/parisc/include/uapi/asm/unistd.h
> @@ -834,8 +834,9 @@
>  #define __NR_getrandom         (__NR_Linux + 339)
>  #define __NR_memfd_create      (__NR_Linux + 340)
>  #define __NR_bpf               (__NR_Linux + 341)
> +#define __NR_sendfd            (__NR_Linux + 342)
>
> -#define __NR_Linux_syscalls    (__NR_bpf + 1)
> +#define __NR_Linux_syscalls    (__NR_sendfd + 1)
>
>
>  #define __IGNORE_select                /* newselect */
> diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h
> index ce9577d..4aa6c22 100644
> --- a/arch/powerpc/include/asm/systbl.h
> +++ b/arch/powerpc/include/asm/systbl.h
> @@ -366,3 +366,4 @@ SYSCALL_SPU(seccomp)
>  SYSCALL_SPU(getrandom)
>  SYSCALL_SPU(memfd_create)
>  SYSCALL_SPU(bpf)
> +SYSCALL_SPU(sendfd)
> diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h
> index f55351f..2d55338 100644
> --- a/arch/powerpc/include/uapi/asm/unistd.h
> +++ b/arch/powerpc/include/uapi/asm/unistd.h
> @@ -384,5 +384,6 @@
>  #define __NR_getrandom         359
>  #define __NR_memfd_create      360
>  #define __NR_bpf               361
> +#define __NR_sendfd            362
>
>  #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
> diff --git a/arch/s390/include/uapi/asm/unistd.h b/arch/s390/include/uapi/asm/unistd.h
> index 4197c89..7248c4a 100644
> --- a/arch/s390/include/uapi/asm/unistd.h
> +++ b/arch/s390/include/uapi/asm/unistd.h
> @@ -287,7 +287,8 @@
>  #define __NR_getrandom         349
>  #define __NR_memfd_create      350
>  #define __NR_bpf               351
> -#define NR_syscalls 352
> +#define __NR_sendfd            352
> +#define NR_syscalls 353
>
>  /*
>   * There are some system calls that are not present on 64 bit, some
> diff --git a/arch/s390/kernel/compat_wrapper.c b/arch/s390/kernel/compat_wrapper.c
> index c4f7a3d..d931326 100644
> --- a/arch/s390/kernel/compat_wrapper.c
> +++ b/arch/s390/kernel/compat_wrapper.c
> @@ -218,3 +218,4 @@ COMPAT_SYSCALL_WRAP3(seccomp, unsigned int, op, unsigned int, flags, const char
>  COMPAT_SYSCALL_WRAP3(getrandom, char __user *, buf, size_t, count, unsigned int, flags)
>  COMPAT_SYSCALL_WRAP2(memfd_create, const char __user *, uname, unsigned int, flags)
>  COMPAT_SYSCALL_WRAP3(bpf, int, cmd, union bpf_attr *, attr, unsigned int, size);
> +COMPAT_SYSCALL_WRAP3(sendfd, pid_t, pid, int, sig, int, fd);
> diff --git a/arch/s390/kernel/syscalls.S b/arch/s390/kernel/syscalls.S
> index 9f7087f..b1beaf1 100644
> --- a/arch/s390/kernel/syscalls.S
> +++ b/arch/s390/kernel/syscalls.S
> @@ -360,3 +360,4 @@ SYSCALL(sys_seccomp,sys_seccomp,compat_sys_seccomp)
>  SYSCALL(sys_getrandom,sys_getrandom,compat_sys_getrandom)
>  SYSCALL(sys_memfd_create,sys_memfd_create,compat_sys_memfd_create) /* 350 */
>  SYSCALL(sys_bpf,sys_bpf,compat_sys_bpf)
> +SYSCALL(sys_sendfd,sys_sendfd,compat_sys_sendfd)
> diff --git a/arch/sparc/include/uapi/asm/unistd.h b/arch/sparc/include/uapi/asm/unistd.h
> index 46d8384..a43637a 100644
> --- a/arch/sparc/include/uapi/asm/unistd.h
> +++ b/arch/sparc/include/uapi/asm/unistd.h
> @@ -415,8 +415,9 @@
>  #define __NR_getrandom         347
>  #define __NR_memfd_create      348
>  #define __NR_bpf               349
> +#define __NR_sendfd            350
>
> -#define NR_syscalls            350
> +#define NR_syscalls            351
>
>  /* Bitmask values returned from kern_features system call.  */
>  #define KERN_FEATURE_MIXED_MODE_STACK  0x00000001
> diff --git a/arch/sparc/kernel/systbls_32.S b/arch/sparc/kernel/systbls_32.S
> index ad0cdf4..1b3ff92 100644
> --- a/arch/sparc/kernel/systbls_32.S
> +++ b/arch/sparc/kernel/systbls_32.S
> @@ -86,4 +86,4 @@ sys_call_table:
>  /*330*/        .long sys_fanotify_mark, sys_prlimit64, sys_name_to_handle_at, sys_open_by_handle_at, sys_clock_adjtime
>  /*335*/        .long sys_syncfs, sys_sendmmsg, sys_setns, sys_process_vm_readv, sys_process_vm_writev
>  /*340*/        .long sys_ni_syscall, sys_kcmp, sys_finit_module, sys_sched_setattr, sys_sched_getattr
> -/*345*/        .long sys_renameat2, sys_seccomp, sys_getrandom, sys_memfd_create, sys_bpf
> +/*345*/        .long sys_renameat2, sys_seccomp, sys_getrandom, sys_memfd_create, sys_bpf, sys_sendfd
> diff --git a/arch/sparc/kernel/systbls_64.S b/arch/sparc/kernel/systbls_64.S
> index 580cde9..ebbafb1 100644
> --- a/arch/sparc/kernel/systbls_64.S
> +++ b/arch/sparc/kernel/systbls_64.S
> @@ -87,7 +87,7 @@ sys_call_table32:
>  /*330*/        .word compat_sys_fanotify_mark, sys_prlimit64, sys_name_to_handle_at, compat_sys_open_by_handle_at, compat_sys_clock_adjtime
>         .word sys_syncfs, compat_sys_sendmmsg, sys_setns, compat_sys_process_vm_readv, compat_sys_process_vm_writev
>  /*340*/        .word sys_kern_features, sys_kcmp, sys_finit_module, sys_sched_setattr, sys_sched_getattr
> -       .word sys32_renameat2, sys_seccomp, sys_getrandom, sys_memfd_create, sys_bpf
> +       .word sys32_renameat2, sys_seccomp, sys_getrandom, sys_memfd_create, sys_bpf, sys_sendfd
>
>  #endif /* CONFIG_COMPAT */
>
> @@ -166,4 +166,4 @@ sys_call_table:
>  /*330*/        .word sys_fanotify_mark, sys_prlimit64, sys_name_to_handle_at, sys_open_by_handle_at, sys_clock_adjtime
>         .word sys_syncfs, sys_sendmmsg, sys_setns, sys_process_vm_readv, sys_process_vm_writev
>  /*340*/        .word sys_kern_features, sys_kcmp, sys_finit_module, sys_sched_setattr, sys_sched_getattr
> -       .word sys_renameat2, sys_seccomp, sys_getrandom, sys_memfd_create, sys_bpf
> +       .word sys_renameat2, sys_seccomp, sys_getrandom, sys_memfd_create, sys_bpf, sys_sendfd
> diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
> index 9fe1b5d..dfe91f7 100644
> --- a/arch/x86/syscalls/syscall_32.tbl
> +++ b/arch/x86/syscalls/syscall_32.tbl
> @@ -364,3 +364,4 @@
>  355    i386    getrandom               sys_getrandom
>  356    i386    memfd_create            sys_memfd_create
>  357    i386    bpf                     sys_bpf
> +358    i386    sendfd                  sys_sendfd
> diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
> index 281150b..4d6b55d 100644
> --- a/arch/x86/syscalls/syscall_64.tbl
> +++ b/arch/x86/syscalls/syscall_64.tbl
> @@ -328,6 +328,7 @@
>  319    common  memfd_create            sys_memfd_create
>  320    common  kexec_file_load         sys_kexec_file_load
>  321    common  bpf                     sys_bpf
> +322    common  sendfd                  sys_sendfd
>
>  #
>  # x32-specific system call numbers start at 512 to avoid cache impact
> diff --git a/arch/xtensa/include/uapi/asm/unistd.h b/arch/xtensa/include/uapi/asm/unistd.h
> index db5bb72..3705d28 100644
> --- a/arch/xtensa/include/uapi/asm/unistd.h
> +++ b/arch/xtensa/include/uapi/asm/unistd.h
> @@ -749,8 +749,9 @@ __SYSCALL(337, sys_seccomp, 3)
>  __SYSCALL(338, sys_getrandom, 3)
>  #define __NR_memfd_create                      339
>  __SYSCALL(339, sys_memfd_create, 2)
> -
> -#define __NR_syscall_count                     340
> +#define __NR_sendfd                            340
> +__SYSCALL(340, sys_sendfd, 3)
> +#define __NR_syscall_count                     341
>
>  /*
>   * sysxtensa syscall handler
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index bda9b81..1871b72f 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -877,4 +877,5 @@ asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
>  asmlinkage long sys_getrandom(char __user *buf, size_t count,
>                               unsigned int flags);
>  asmlinkage long sys_bpf(int cmd, union bpf_attr *attr, unsigned int size);
> +asmlinkage long sys_sendfd(pid_t pid, int sig, int fd);
>  #endif
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index 22749c1..270aa02 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -707,9 +707,11 @@ __SYSCALL(__NR_getrandom, sys_getrandom)
>  __SYSCALL(__NR_memfd_create, sys_memfd_create)
>  #define __NR_bpf 280
>  __SYSCALL(__NR_bpf, sys_bpf)
> +#define __NR_sendfd 281
> +__SYSCALL(__NR_sendfd, sys_sendfd)
>
>  #undef __NR_syscalls
> -#define __NR_syscalls 281
> +#define __NR_syscalls 282
>
>  /*
>   * All syscalls below here should go away really,
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index 02aa418..353cddb 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -224,3 +224,6 @@ cond_syscall(sys_seccomp);
>
>  /* access BPF programs and maps */
>  cond_syscall(sys_bpf);
> +
> +/* send file descriptor to another process */
> +cond_syscall(sys_sendfd);
> --
> 1.8.3.2
>



-- 
Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: [PATCH 2/2] fs: Wire up sendfd() syscall (all architectures)
  2014-12-02  8:01   ` Geert Uytterhoeven
@ 2014-12-02  8:31     ` Alex Dubov
  2014-12-02 11:42     ` Michal Simek
  1 sibling, 0 replies; 28+ messages in thread
From: Alex Dubov @ 2014-12-02  8:31 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: linux-kernel, Alex Dubov

On Tue, Dec 2, 2014 at 7:01 PM, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> This really needs a CC to linux-arch (added).
>
>
> You forgot to update NR_syscalls in arch/m68k/include/asm/unistd.h.
>

Noted. I would assume that other architectures may have similar
problems (I only tested my submission on x86_64).

Will try to fix those when/if there's progress toward accepting the
proposed feature.


* Re: [PATCH 2/2] fs: Wire up sendfd() syscall (all architectures)
  2014-12-02  8:01   ` Geert Uytterhoeven
  2014-12-02  8:31     ` Alex Dubov
@ 2014-12-02 11:42     ` Michal Simek
  2014-12-02 14:31       ` Alex Dubov
  1 sibling, 1 reply; 28+ messages in thread
From: Michal Simek @ 2014-12-02 11:42 UTC (permalink / raw)
  To: Geert Uytterhoeven, Alex Dubov; +Cc: linux-kernel, Alex Dubov


On 12/02/2014 09:01 AM, Geert Uytterhoeven wrote:
> This really needs a CC to linux-arch (added).
> 
> On Tue, Dec 2, 2014 at 5:35 AM, Alex Dubov <alex.dubov@gmail.com> wrote:
>> Signed-off-by: Alex Dubov <oakad@yahoo.com>
>> ---
>>  arch/arm/include/uapi/asm/unistd.h        |  1 +
>>  arch/arm/kernel/calls.S                   |  1 +
>>  arch/arm64/include/asm/unistd32.h         |  2 ++
>>  arch/ia64/include/uapi/asm/unistd.h       |  1 +
>>  arch/ia64/kernel/entry.S                  |  1 +
>>  arch/m68k/include/uapi/asm/unistd.h       |  1 +
>>  arch/m68k/kernel/syscalltable.S           |  1 +
> 
> You forgot to update NR_syscalls in arch/m68k/include/asm/unistd.h.
> 
>>  arch/microblaze/include/uapi/asm/unistd.h |  1 +
>>  arch/microblaze/kernel/syscall_table.S    |  1 +


The same for microblaze here
arch/microblaze/include/asm/unistd.h

Thanks,
Michal

-- 
Michal Simek, Ing. (M.Eng), OpenPGP -> KeyID: FE3D1F91
w: www.monstr.eu p: +42-0-721842854
Maintainer of Linux kernel - Microblaze cpu - http://www.monstr.eu/fdt/
Maintainer of Linux kernel - Xilinx Zynq ARM architecture
Microblaze U-BOOT custodian and responsible for u-boot arm zynq platform





* Re: [PATCH 1/2] fs: introduce sendfd() syscall
  2014-12-02  4:35 ` [PATCH 1/2] fs: introduce sendfd() syscall Alex Dubov
@ 2014-12-02 12:50   ` Eric Dumazet
  2014-12-02 14:47     ` Alex Dubov
  2014-12-02 17:00   ` Al Viro
  1 sibling, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2014-12-02 12:50 UTC (permalink / raw)
  To: Alex Dubov; +Cc: linux-kernel, Alex Dubov

On Tue, 2014-12-02 at 15:35 +1100, Alex Dubov wrote:
> Present patch introduces exceptionally easy to use, low latency and low
> overhead mechanism for transferring file descriptors between cooperating
> processes:
> 
>     int sendfd(pid_t pid, int sig, int fd)
> 
> Given a target process pid, the sendfd() syscall will create a duplicate
> file descriptor in a target task's (referred by pid) file table pointing to
> the file references by descriptor fd. Then, it will attempt to notify the
> target task by issuing a Posix.1b real-time signal (sig), carrying the new
> file descriptor as integer payload. If real-time signal can not be enqueued
> at the destination signal queue, the newly created file descriptor will be
> promptly closed.
> 
> Signed-off-by: Alex Dubov <oakad@yahoo.com>
> ---

User A can send fd(s) to processes belonging to user B, even if user B
(probably) does not want this to happen?

Also, relying on signals seems quite old-fashioned these days. How about
multi-threaded programs wanting separate channels to receive fds?

The ability to flood a target with fds and fill its file descriptor table
looks very dangerous to me. Some programs could break, as they expect to
control their own fd allocations.

I like the idea of not having to use AF_UNIX and sticking to a well-defined
interface, but I do not like this asynchronous model.

Thanks.




* Re: [PATCH 2/2] fs: Wire up sendfd() syscall (all architectures)
  2014-12-02 11:42     ` Michal Simek
@ 2014-12-02 14:31       ` Alex Dubov
  2014-12-02 14:38         ` Geert Uytterhoeven
  0 siblings, 1 reply; 28+ messages in thread
From: Alex Dubov @ 2014-12-02 14:31 UTC (permalink / raw)
  To: monstr; +Cc: Geert Uytterhoeven, linux-kernel, Alex Dubov

On Tue, Dec 2, 2014 at 10:42 PM, Michal Simek <monstr@monstr.eu> wrote:
> On 12/02/2014 09:01 AM, Geert Uytterhoeven wrote:
>> This really needs a CC to linux-arch (added).
>>
>> On Tue, Dec 2, 2014 at 5:35 AM, Alex Dubov <alex.dubov@gmail.com> wrote:
>>> Signed-off-by: Alex Dubov <oakad@yahoo.com>
>>> ---

>
> The same for microblaze here
> arch/microblaze/include/asm/unistd.h
>

This invites the question as to why the __NR_syscalls macro is not
defined in uapi/asm/unistd.h on those architectures, where it will be
easier to spot? After all, asm/unistd.h includes uapi/asm/unistd.h
unconditionally.


* Re: [PATCH 2/2] fs: Wire up sendfd() syscall (all architectures)
  2014-12-02 14:31       ` Alex Dubov
@ 2014-12-02 14:38         ` Geert Uytterhoeven
  0 siblings, 0 replies; 28+ messages in thread
From: Geert Uytterhoeven @ 2014-12-02 14:38 UTC (permalink / raw)
  To: Alex Dubov; +Cc: Michal Simek, linux-kernel, Alex Dubov

Hi Alex,

On Tue, Dec 2, 2014 at 3:31 PM, Alex Dubov <alex.dubov@gmail.com> wrote:
> On Tue, Dec 2, 2014 at 10:42 PM, Michal Simek <monstr@monstr.eu> wrote:
>> On 12/02/2014 09:01 AM, Geert Uytterhoeven wrote:
>>> This really needs a CC to linux-arch (added).
>>>
>>> On Tue, Dec 2, 2014 at 5:35 AM, Alex Dubov <alex.dubov@gmail.com> wrote:
>>>> Signed-off-by: Alex Dubov <oakad@yahoo.com>
>>>> ---
>
>> The same for microblaze here
>> arch/microblaze/include/asm/unistd.h
>
> This invites the question as to why the __NR_syscalls macro is not
> defined in uapi/asm/unistd.h on those architectures, where it will be
> easier to spot? After all, asm/unistd.h includes uapi/asm/unistd.h
> unconditionally.

Because it's not part of the ABI?

There may be multiple ABIs, with multiple syscall ranges.
Userspace only needs to know if a syscall is available, not what the
valid syscall number range is.
The kernel does need to know the size of the full syscall table.
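
To illustrate why the count matters on the kernel side, here is a minimal
userspace sketch. It assumes the entry code bounds-checks the syscall number
against a separately maintained count before indexing the table. All demo_*
names are made up for this example and are not kernel code.

```c
#include <assert.h>
#include <errno.h>

typedef long (*demo_syscall_fn)(void);

static long demo_bpf(void)    { return 100; }
static long demo_sendfd(void) { return 200; }

/* The table already has the new entry appended at index 1. */
static const demo_syscall_fn demo_table[] = { demo_bpf, demo_sendfd };

/*
 * Dispatch syscall `nr`, bounds-checked against `nr_syscalls`, which
 * stands in for a separately maintained NR_syscalls-style constant.
 */
long demo_dispatch(unsigned long nr, unsigned long nr_syscalls)
{
	/* The count must never claim more entries than the table holds. */
	assert(nr_syscalls <= sizeof(demo_table) / sizeof(demo_table[0]));

	if (nr >= nr_syscalls)
		return -ENOSYS;	/* entry exists but is unreachable */
	return demo_table[nr]();
}
```

With a stale count of 1, demo_dispatch(1, 1) returns -ENOSYS even though the
table entry is present; with the count bumped to 2, the same call reaches
demo_sendfd().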

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: [PATCH 1/2] fs: introduce sendfd() syscall
  2014-12-02 12:50   ` Eric Dumazet
@ 2014-12-02 14:47     ` Alex Dubov
  2014-12-02 15:33       ` Eric Dumazet
  0 siblings, 1 reply; 28+ messages in thread
From: Alex Dubov @ 2014-12-02 14:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, Alex Dubov

> User A can send fd(s) to processes belonging to user B, even if user B
> does (probably) not want this to happen ?

1. Process A must have sufficient permissions to signal process B.
This will only happen if process A belongs to the same user as process
B or has elevated capabilities, which can not appear by themselves
(and if root on some machine can not be trusted, then all is lost
anyway).

2. If process B has not specified explicitly how it wants the
particular signal to be handled, it will be killed by the default
handler. End of story, nothing else is going to happen.

I suppose, I can add an extra permissions check prior to creating the
new file descriptor in the first place.

> Also, relying on signals seems quite old-fashioned these days. How about
> multi-threaded programs wanting separate channels to receive fds?

Most multi-threaded programs share the same file table between all
threads (unless some fancy clone() magic is involved), so the issue is
rather mundane. At any rate, each thread has its own pid and the usual
signal routing applies.

At a more generic level, Posix real-time signals are anything but
old-fashioned. The sigqueue()/signalfd() pair provides a very convenient,
low overhead micro-messaging facility with ordered, reliable delivery.
I fail to see what's wrong with making worthy use of it.


* Re: Minimal effort/low overhead file descriptor duplication over Posix.1b s
  2014-12-02  4:35 Minimal effort/low overhead file descriptor duplication over Posix.1b s Alex Dubov
  2014-12-02  4:35 ` [PATCH 1/2] fs: introduce sendfd() syscall Alex Dubov
  2014-12-02  4:35 ` [PATCH 2/2] fs: Wire up sendfd() syscall (all architectures) Alex Dubov
@ 2014-12-02 15:26 ` Jonathan Corbet
  2014-12-02 16:15   ` Alex Dubov
  2014-12-17 13:11 ` Kevin Easton
  3 siblings, 1 reply; 28+ messages in thread
From: Jonathan Corbet @ 2014-12-02 15:26 UTC (permalink / raw)
  To: Alex Dubov; +Cc: linux-kernel, linux-api

On Tue,  2 Dec 2014 15:35:17 +1100
Alex Dubov <alex.dubov@gmail.com> wrote:

>     int sendfd(pid_t pid, int sig, int fd)
> 
> Given a target process pid, the sendfd() syscall will create a duplicate
> file descriptor in a target task's (referred by pid) file table pointing to
> the file references by descriptor fd. Then, it will attempt to notify the
> target task by issuing a Posix.1b real-time signal (sig), carrying the new
> file descriptor as integer payload. If real-time signal can not be enqueued
> at the destination signal queue, the newly created file descriptor will be
> promptly closed.

[ CC += linux-api ]

So I'm not a syscall API design expert, but this one raises a few
questions with me.

 - Messing with another process's file descriptor table without its
   knowledge looks like a possible source of all kinds of problems.  Might
   there be race conditions with close()/dup() code, for example?  And
   remember that users can be root in a user namespace; maybe there's no
   potential for mischief there, but it needs to be considered.

 - Forcing the use of realtime signals seems strange; this isn't a
   realtime operation by any stretch.

 - How might the sending process communicate to the recipient what the fd
   is for?  Even if a process only expects one type of file descriptor,
   the ability to communicate information other than its number seems
   like it would often be useful.

Some of these concerns might be addressable by requiring the recipient to
call acceptfd() (or some such) with the ability to use poll().  As an
alternative, I believe kdbus has fd-passing abilities; if kdbus goes in,
would you still need this feature?

Thanks,

jon


* Re: [PATCH 1/2] fs: introduce sendfd() syscall
  2014-12-02 14:47     ` Alex Dubov
@ 2014-12-02 15:33       ` Eric Dumazet
  2014-12-02 16:23         ` Alex Dubov
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2014-12-02 15:33 UTC (permalink / raw)
  To: Alex Dubov; +Cc: linux-kernel, Alex Dubov

On Wed, 2014-12-03 at 01:47 +1100, Alex Dubov wrote:
> > User A can send fd(s) to processes belonging to user B, even if user B
> > does (probably) not want this to happen ?
> 
> 1. Process A must have sufficient permissions to signal process B.
> This will only happen if process A belongs to the same user as process
> B or has elevated capabilities, which can not appear by themselves
> (and if root on some machine can not be trusted, then all is lost
> anyway).
> 

I do not see this enforced in your patch.

Allowing another process to repeatedly take the lock protecting my file
descriptor table is very scary.

Reserving a slot, then undoing this if the signal fails, is a nice way to
slow down critical programs and eventually block them from making
progress when using file descriptors (most system calls, afaik).


> 2. If process B has not specified explicitly how it wants the
> particular signal to be handled, it will be killed by the default
> handler. End of story, nothing else is going to happen.

So it seems possible for an arbitrary program to send fds to innocent
programs, which will likely fill their fd table, so that they won't be able
to open a new file.

This opens interesting security issues and attack vectors.




* Re: Minimal effort/low overhead file descriptor duplication over Posix.1b s
  2014-12-02 15:26 ` Minimal effort/low overhead file descriptor duplication over Posix.1b s Jonathan Corbet
@ 2014-12-02 16:15   ` Alex Dubov
  0 siblings, 0 replies; 28+ messages in thread
From: Alex Dubov @ 2014-12-02 16:15 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-kernel, linux-api

On Wed, Dec 3, 2014 at 2:26 AM, Jonathan Corbet <corbet@lwn.net> wrote:
> On Tue,  2 Dec 2014 15:35:17 +1100
> Alex Dubov <alex.dubov@gmail.com> wrote:
>
>
>  - Messing with another process's file descriptor table without its
>    knowledge looks like a possible source of all kinds of problems.  Might
>    there be race conditions with close()/dup() code, for example?  And
>    remember that users can be root in a user namespace; maybe there's no
>    potential for mischief there, but it needs to be considered.

If process A has sufficient permissions to signal process B, it can
already do arbitrary mischief, no news there (SIGKILL and SIGSTOP will
definitely cause more havoc :-).

I don't believe there can be any race conditions, as this is no
different from what happens when dup() is invoked from one of the
threads in a multi-threaded application while other threads go on
with their usual file operations. Descriptor duplication happens prior
to any signal handling activities.

>  - Forcing the use of realtime signals seems strange; this isn't a
>    realtime operation by any stretch.

"Real time signals" are merely a misleading name for the Posix.1b
micro-messaging facility. To the best of my knowledge they do not
affect scheduling any more than SIGIO or SIGALRM would.

As Posix.1b signals are best handled by the signalfd() facility anyway, no
impact on scheduling compared to any other approach (including the
existing domain socket approach) is expected at all.

>
>  - How might the sending process communicate to the recipient what the fd
>    is for?  Even if a process only expects one type of file descriptor,
>    the ability to communicate information other than its number seems
>    like it would often be useful.

There are 32 "real time" signals defined by default in kernel; this
range can be increased at will with kernel recompilation and glibc
will pick up the correct range automatically (this is Posix mandated
behavior and it actually works like that).

I have not seen an app yet that relied on more than half a dozen
distinct signal numbers. Thus any application can conveniently define
more than two dozen different fd varieties out of the box, each delivered
to it with a dedicated signal id, whereas in most practical
applications only 1 or 2 varieties of file descriptors are ever passed
around.
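
A minimal sketch of that idea, mapping application-defined fd varieties onto
dedicated real-time signal numbers. The enum and both helper names are
hypothetical. The usable range is whatever libc reports through
SIGRTMIN/SIGRTMAX at run time.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical fd "varieties" an application might distinguish. */
enum fd_kind { FD_KIND_LISTENER, FD_KIND_CLIENT, FD_KIND_SHMEM };

/*
 * Map an fd variety to its dedicated real-time signal, or return -1 if
 * the range offered by libc (SIGRTMIN..SIGRTMAX) is too small for it.
 */
int fd_kind_to_signal(int kind)
{
	assert(kind >= 0);
	if (kind > SIGRTMAX - SIGRTMIN)
		return -1;
	return SIGRTMIN + kind;
}

/* Reverse mapping for the receiving side; -1 if not a real-time signal. */
int signal_to_fd_kind(int sig)
{
	if (sig < SIGRTMIN || sig > SIGRTMAX)
		return -1;
	return sig - SIGRTMIN;
}
```

The receiver would feed the signal number it reads (for example ssi_signo from
a signalfd) into signal_to_fd_kind() to decide what the accompanying
descriptor is for.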

>
> Some of these concerns might be addressable by requiring the recipient to
> call acceptfd() (or some such) with the ability to use poll().  As an
> alternative, I believe kdbus has fd-passing abilities; if kdbus goes in,
> would you still need this feature?

Any process willing to handle Posix.1b signals must explicitly
manipulate its signal mask - otherwise it will be killed the moment
the signal is received. Thus, no special "acceptfd()" call is necessary on
the receiver side - applications usually don't modify their signal
masks unless they expect some particular signal to arrive.

kdbus has something like it and binder on android has it as well. The
problem with both of them is the same as with unix domain sockets
(which implement a whole, rather convoluted, cmsg facility that is only
ever used for that single purpose): they try to solve big problems with
fancy functionality, where fd passing is a nice side feature
(which then gets used the most).

To my understanding, commonly used functionality deserves to have its
own quick, low overhead path:

1. We've got eventfd() which is neat and all, but to use it we need an
easy way to pass its fd around.

2. We've got memfd_create() which is also neat, but to use it...

3. We've got fairly complex (and consequently buggy) functionality
like SO_REUSEPORT, but I can't avoid a feeling that if there was a low
overhead transport available to pass fds around (like the one
proposed), the old school approach of having one process running
tightly around accept() and sending sockets to workers may still rival
it (pity I don't have google's setup around to test it).

4. Most importantly, when network appliances are concerned (and those
represent a huge percentage of the linux install base), it is desirable to
have the leanest possible code paths both in the kernel and in user
space (no functionality - no vulnerabilities to fish for) and still be
able to rely on multi-process applications (as multi-process
applications are considerably more reliable than multi-threaded ones,
for all the obvious reasons). A compact, easily traceable facility
comprising a few hundred LOCs in the kernel, end to end, and very simple
application code (sigqueue() -> signalfd()) pose a distinct advantage
in this regard over largish subsystems which may provide a similar
feature (invariably at the expense of unnecessary costs, like
persistent file system objects, specialized user-space libraries, etc).


* Re: [PATCH 1/2] fs: introduce sendfd() syscall
  2014-12-02 15:33       ` Eric Dumazet
@ 2014-12-02 16:23         ` Alex Dubov
  2014-12-02 16:42           ` Eric Dumazet
  0 siblings, 1 reply; 28+ messages in thread
From: Alex Dubov @ 2014-12-02 16:23 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, Alex Dubov

On Wed, Dec 3, 2014 at 2:33 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2014-12-03 at 01:47 +1100, Alex Dubov wrote:
>> > User A can send fd(s) to processes belonging to user B, even if user B
>> > does (probably) not want this to happen ?
>>
>> 1. Process A must have sufficient permissions to signal process B.
>> This will only happen if process A belongs to the same user as process
>> B or has elevated capabilities, which can not appear by themselves
>> (and if root on some machine can not be trusted, then all is lost
>> anyway).
>>
>
> I do not see this enforced in your patch.
>
> Allowing a process to hold many times the lock protecting my file
> descriptor table is very scary.
>
> Reserving a slot, then undo this if the signal failed is a nice way to
> slow down critical programs and eventually block them from doing
> progress when using file descriptors (most system calls afaik)

Yes, this is an omission. I already promised to tighten the security
in my last post. :)

>> 2. If process B has not specified explicitly how it wants the
>> particular signal to be handled, it will be killed by the default
>> handler. End of story, nothing else is going to happen.
>
> So it seems possible for an arbitrary program to send fds to innocent
> programs, that will likely fill their fd table and won't be able to open
> a new file.
>
> This opens interesting security issues and attack vectors.

Same as SIGKILL. And yet, our machines are still working fine.

If process A has sufficient capability to send signals to process B,
then process B is already at its mercy, fds or not fds.


* Re: [PATCH 1/2] fs: introduce sendfd() syscall
  2014-12-02 16:23         ` Alex Dubov
@ 2014-12-02 16:42           ` Eric Dumazet
  2014-12-03  2:11             ` Alex Dubov
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2014-12-02 16:42 UTC (permalink / raw)
  To: Alex Dubov; +Cc: linux-kernel, Alex Dubov

On Wed, 2014-12-03 at 03:23 +1100, Alex Dubov wrote:

> Same as SIGKILL. And yet, our machines are still working fine.
> 
> If process A has sufficient capability to send signals to process B,
> then process B is already at its mercy, fds or not fds.

Tell me how a 128-thread program can use this new mechanism in a
scalable way.

One signal per thread ?

I guess we'll keep AF_UNIX then, thank you.



* Re: [PATCH 1/2] fs: introduce sendfd() syscall
  2014-12-02  4:35 ` [PATCH 1/2] fs: introduce sendfd() syscall Alex Dubov
  2014-12-02 12:50   ` Eric Dumazet
@ 2014-12-02 17:00   ` Al Viro
  2014-12-03  2:22     ` Alex Dubov
  1 sibling, 1 reply; 28+ messages in thread
From: Al Viro @ 2014-12-02 17:00 UTC (permalink / raw)
  To: Alex Dubov; +Cc: linux-kernel, Alex Dubov

On Tue, Dec 02, 2014 at 03:35:18PM +1100, Alex Dubov wrote:
> +	dst_files = get_files_struct(dst_task);
> +	if (!dst_files) {
> +		rc = -EMFILE;
> +		goto out_put_dst_task;
> +	}
> +
> +	if (!lock_task_sighand(dst_task, &flags)) {
> +		rc = -EMFILE;
> +		goto out_put_dst_files;
> +	}
> +
> +	rlim = task_rlimit(dst_task, RLIMIT_NOFILE);
> +
> +	unlock_task_sighand(dst_task, &flags);
> +
> +	rc = __alloc_fd(dst_task->files, 0, rlim, O_CLOEXEC);
> +	if (rc < 0)
> +		goto out_put_dst_files;
> +
> +	s_info.si_int = rc;
> +
> +	get_file(src_file);
> +	__fd_install(dst_files, rc, src_file);
> +	rc = kill_pid_info(sig, &s_info, task_pid(dst_task));
> +
> +	if (rc < 0)
> +		__close_fd(dst_files, s_info.si_int);

Oh, lovely...  And we are guaranteed that it is still the same file, because...?

Not to mention anything else, this stuff violates the assumption used in a lot
of places - that the *only* way for a process to modify a descriptor table is
to have a reference to it obtained by something that had it as its current
descriptor table and not dropped since then.  The way you do it might actually
turn out to be OK, but there's no way I'll take that without detailed analysis;
start with refcounting of struct file, for one thing - it does rely on the
assumption above in non-trivial ways.

Binder, shite as it is, satisfies that assumption.  Your "simpler" variant
does not.  Which means that you get to prove that you won't open any races
around fs/file.c.

And that's aside from the points other folks had brought up.

* Re: [PATCH 1/2] fs: introduce sendfd() syscall
  2014-12-02 16:42           ` Eric Dumazet
@ 2014-12-03  2:11             ` Alex Dubov
  2014-12-03  6:48               ` Eric Dumazet
  0 siblings, 1 reply; 28+ messages in thread
From: Alex Dubov @ 2014-12-03  2:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, Alex Dubov

On Wed, Dec 3, 2014 at 3:42 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2014-12-03 at 03:23 +1100, Alex Dubov wrote:
>
> Tell me how a 128-thread program can use this new mechanism in a
> scalable way.
>
> One signal per thread ?

What for?

The kernel will deliver the signal only to the thread or threads that have
the relevant signal unblocked (they are blocked by default).
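
As a rough illustration of the signal side of this argument, here is a
minimal sketch of a thread that keeps a real-time signal blocked and collects
it synchronously together with its integer payload. It does not use the
proposed sendfd() call, which is not in mainline; sigqueue() stands in for
it, and the payload plays the role of the descriptor number that the patch
places in si_int. The helper name queue_and_collect() is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <signal.h>
#include <unistd.h>

/*
 * Hypothetical helper: block `sig` in the calling thread, queue it to our
 * own process with an integer payload, then collect it synchronously.
 * Returns the payload carried by the signal, or -1 on error.
 */
int queue_and_collect(int sig, int payload)
{
	sigset_t set;
	siginfo_t info;
	union sigval val;

	assert(payload >= 0);	/* -1 is reserved for the error return */

	sigemptyset(&set);
	sigaddset(&set, sig);
	/* A multi-threaded program would use pthread_sigmask() here. */
	if (sigprocmask(SIG_BLOCK, &set, NULL) < 0)
		return -1;

	/* Stand-in for the sender: queue the signal with a payload. */
	val.sival_int = payload;
	if (sigqueue(getpid(), sig, val) < 0)
		return -1;

	/* The signal stays pending until this thread asks for it. */
	if (sigwaitinfo(&set, &info) != sig)
		return -1;

	return info.si_value.sival_int;
}
```

Calling queue_and_collect(SIGRTMIN, 42) returns the queued value. A thread
that never unblocks or waits for the signal is not interrupted by it.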

>
> I guess we'll keep AF_UNIX then, thank you.

Kindly enlighten me, how are you going to use any file descriptor in a
128-thread program in a scalable way (socket and all)? How will this
approach be different when using signalfd()?

And no, I'm not proposing to take your favorite toys away.

* Re: [PATCH 1/2] fs: introduce sendfd() syscall
  2014-12-02 17:00   ` Al Viro
@ 2014-12-03  2:22     ` Alex Dubov
  2014-12-03  3:40       ` Al Viro
  2014-12-03  6:50       ` Eric Dumazet
  0 siblings, 2 replies; 28+ messages in thread
From: Alex Dubov @ 2014-12-03  2:22 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel, Alex Dubov

On Wed, Dec 3, 2014 at 4:00 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Tue, Dec 02, 2014 at 03:35:18PM +1100, Alex Dubov wrote:
>> +
>> +     if (rc < 0)
>> +             __close_fd(dst_files, s_info.si_int);
>
> Oh, lovely...  And we are guaranteed that it is still the same file, because...?
>
> Not to mention anything else, this stuff violates the assumption used in a lot
> of places - that the *only* way for a process to modify a descriptor table is
> to have a reference to it obtained by something that had it as its current
> descriptor table and not dropped since then.  The way you do it might actually
> turn out to be OK, but there's no way I'll take that without detailed analysis;
> start with refcounting of struct file, for one thing - it does rely on the
> assumption above in non-trivial ways.

Ok, I see the problem here. This indeed requires further thought.

> And that's aside from the points other folks had brought up.

Yours is the first insightful message in this thread. Some of the
other commenters exhibited an unfortunate lack of understanding,
regarding what signals are and what they can be useful for.

Unless, of course, I have missed something important.

On a less related note, I hope you will agree that the simpler
mechanism for this very in-demand feature is long overdue on Linux
(every man and his dog are passing fds around these days).

* Re: [PATCH 1/2] fs: introduce sendfd() syscall
  2014-12-03  2:22     ` Alex Dubov
@ 2014-12-03  3:40       ` Al Viro
  2014-12-03  4:14         ` Alex Dubov
  2014-12-03  6:50       ` Eric Dumazet
  1 sibling, 1 reply; 28+ messages in thread
From: Al Viro @ 2014-12-03  3:40 UTC (permalink / raw)
  To: Alex Dubov; +Cc: linux-kernel, Alex Dubov

On Wed, Dec 03, 2014 at 01:22:33PM +1100, Alex Dubov wrote:

> On a less related note, I hope you will agree that the simpler
> mechanism for this very in-demand feature is long overdue on Linux
> (every man and his dog are passing fds around these days).

... and I'm less than sure that it's a good thing.  If nothing else,
once the pieces of your program are passing descriptors around freely,
you have created a barfball that will be impossible to split between
several boxen if you run into scalability issues.  Descriptor-passing
is limited to a single system; you *can't* do that between e.g. components
of a cluster.  So it's not an unmixed blessing, just as overuse of
shared memory segments, etc.  They do have their uses, but that needs
to be carefully considered every time, or you'll create a major headache
a few years down the road.

* Re: [PATCH 1/2] fs: introduce sendfd() syscall
  2014-12-03  3:40       ` Al Viro
@ 2014-12-03  4:14         ` Alex Dubov
  0 siblings, 0 replies; 28+ messages in thread
From: Alex Dubov @ 2014-12-03  4:14 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-kernel, Alex Dubov

On Wed, Dec 3, 2014 at 2:40 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Wed, Dec 03, 2014 at 01:22:33PM +1100, Alex Dubov wrote:
>
>> On a less related note, I hope you will agree that the simpler
>> mechanism for this very in-demand feature is long overdue on Linux
>> (every man and his dog are passing fds around these days).
>
> ... and I'm less than sure that it's a good thing.  If nothing else,
> once the pieces of your program are passing descriptors around freely,
> you have created a barfball that will be impossible to split between
> several boxen if you run into scalability issues.  Descriptor-passing
> is limited to a single system; you *can't* do that between e.g. components
> of a cluster.  So it's not an unmixed blessing, just as overuse of
> shared memory segments, etc.  They do have their uses, but that needs
> to be carefully considered every time, or you'll create a major headache
> a few years down the road.

Well, if you try hard enough, you can pass fds around the components
of the cluster - Mosix was doing just that some 10 years ago.
Conceptually, it's even easier than doing distributed shared memory,
as long as mmap is not involved. :-)

I was, however, looking at it from a different standpoint. The abundance
of cores in contemporary CPUs calls for locally parallel
applications (and those are still the majority - clearly 90% of the
applications and their workloads fit just fine on a single node).

Thus, any modern application developer faces the usual dilemma:

1. Go multi-threaded - easy inter-thread IPC, lousy reliability with
minor errors in secondary tasks crashing the whole application.

2. Go multi-process - circus hoop jumping where IPC is concerned, great
reliability through OS-provided fault isolation (so even really broken
stuff, like the PHP plugin for Apache, manages to perform most of the
time). :-)

Memfd (on its own) and eventfd are great steps in the right direction,
as managing persistent shmem and sem objects was always a pain in the
arse. If there were an alternative to AF_UNIX fd passing, with its
arcane API, fs persistence and mind-boggling fd recursion bugs, then
option 2 would become much more attractive to developers, leading to
an overall increase in application reliability and security.

* Re: [PATCH 1/2] fs: introduce sendfd() syscall
  2014-12-03  2:11             ` Alex Dubov
@ 2014-12-03  6:48               ` Eric Dumazet
  0 siblings, 0 replies; 28+ messages in thread
From: Eric Dumazet @ 2014-12-03  6:48 UTC (permalink / raw)
  To: Alex Dubov; +Cc: linux-kernel, Alex Dubov

On Wed, 2014-12-03 at 13:11 +1100, Alex Dubov wrote:

> Kindly enlighten me, how are you going to use any file descriptor in a
> 128-thread program in a scalable way (socket and all)? How will this
> approach be different when using signalfd()?

That's the point: use a separate channel (AF_UNIX socket, or AF_INET
listener...) per thread.

Each thread uses epoll() on a private epoll fd, and a dedicated channel
to get fds from other processes.
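
The per-thread arrangement described above could be sketched as follows.
This is only an illustration under assumptions: the struct and function names
(worker_channel, worker_channel_init, worker_channel_wait) are hypothetical,
and a socketpair() stands in for whatever AF_UNIX channel a real program
would set up between a worker thread and its peers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* One of these per worker thread; nothing here is shared between threads. */
struct worker_channel {
	int epfd;	/* private epoll instance */
	int rx;		/* worker's end of the dedicated channel */
	int tx;		/* end handed to whoever feeds this worker */
};

/* Create the channel and register its receive end with a private epoll fd. */
int worker_channel_init(struct worker_channel *wc)
{
	struct epoll_event ev;
	int sv[2];

	assert(wc != NULL);
	if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, sv) < 0)
		return -1;

	wc->epfd = epoll_create1(EPOLL_CLOEXEC);
	if (wc->epfd < 0)
		goto fail;

	memset(&ev, 0, sizeof(ev));
	ev.events = EPOLLIN;
	ev.data.fd = sv[0];
	if (epoll_ctl(wc->epfd, EPOLL_CTL_ADD, sv[0], &ev) < 0) {
		close(wc->epfd);
		goto fail;
	}

	wc->rx = sv[0];
	wc->tx = sv[1];
	return 0;
fail:
	close(sv[0]);
	close(sv[1]);
	return -1;
}

/* Returns 1 if the worker's channel is readable, 0 on timeout, -1 on error. */
int worker_channel_wait(const struct worker_channel *wc, int timeout_ms)
{
	struct epoll_event ev;
	int n = epoll_wait(wc->epfd, &ev, 1, timeout_ms);

	if (n <= 0)
		return n;
	return ev.data.fd == wc->rx ? 1 : -1;
}
```

Each worker thread would call worker_channel_init() once and then loop on
worker_channel_wait(), calling recvmsg() on rx to pick up any SCM_RIGHTS
descriptors when the channel becomes readable.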

Sharing a signalfd() would be terrible, like using accept() on a single
listener socket :(

Your proposed interface, being tied to legacy signal(s), does not allow
for multiple channels.

Sorry, but using signals is simply a no go for me.



* Re: [PATCH 1/2] fs: introduce sendfd() syscall
  2014-12-03  2:22     ` Alex Dubov
  2014-12-03  3:40       ` Al Viro
@ 2014-12-03  6:50       ` Eric Dumazet
  2014-12-03  8:08         ` Richard Cochran
  1 sibling, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2014-12-03  6:50 UTC (permalink / raw)
  To: Alex Dubov; +Cc: Al Viro, linux-kernel, Alex Dubov

On Wed, 2014-12-03 at 13:22 +1100, Alex Dubov wrote:

> Yours is the first insightful message in this thread. Some of the
> other commenters exhibited an unfortunate lack of understanding,
> regarding what signals are and what they can be useful for.

Oh nice.

I think I will ignore your future mails.



* Re: [PATCH 1/2] fs: introduce sendfd() syscall
  2014-12-03  6:50       ` Eric Dumazet
@ 2014-12-03  8:08         ` Richard Cochran
  2014-12-03  8:17           ` Richard Weinberger
  0 siblings, 1 reply; 28+ messages in thread
From: Richard Cochran @ 2014-12-03  8:08 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Alex Dubov, Al Viro, linux-kernel, Alex Dubov

On Tue, Dec 02, 2014 at 10:50:46PM -0800, Eric Dumazet wrote:
> I think I will ignore your future mails.

And I won't have time to read them either, because I will be too busy
passing fds to my two collies.

Cheers,
Richard

* Re: [PATCH 1/2] fs: introduce sendfd() syscall
  2014-12-03  8:08         ` Richard Cochran
@ 2014-12-03  8:17           ` Richard Weinberger
  2014-12-03 10:41             ` Richard Cochran
  0 siblings, 1 reply; 28+ messages in thread
From: Richard Weinberger @ 2014-12-03  8:17 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Eric Dumazet, Alex Dubov, Al Viro, linux-kernel, Alex Dubov

On Wed, Dec 3, 2014 at 9:08 AM, Richard Cochran
<richardcochran@gmail.com> wrote:
> On Tue, Dec 02, 2014 at 10:50:46PM -0800, Eric Dumazet wrote:
>> I think I will ignore your future mails.
>
> And I won't have time to read them either, because I will be too busy
> passing fds to my two collies.

Come on guys, get a cup of coffee and relax a bit...

-- 
Thanks,
//richard

* Re: [PATCH 1/2] fs: introduce sendfd() syscall
  2014-12-03  8:17           ` Richard Weinberger
@ 2014-12-03 10:41             ` Richard Cochran
  2014-12-03 14:08               ` Alex Dubov
  2014-12-05 13:37               ` One Thousand Gnomes
  0 siblings, 2 replies; 28+ messages in thread
From: Richard Cochran @ 2014-12-03 10:41 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Eric Dumazet, Alex Dubov, Al Viro, linux-kernel, Alex Dubov

On Wed, Dec 03, 2014 at 09:17:37AM +0100, Richard Weinberger wrote:
> Come on guys, get a cup of coffee and relax a bit...

I am relaxed, especially after I had a good laugh reading this:

   On a less related note, I hope you will agree that the simpler
   mechanism for this very in-demand feature is long overdue on Linux
   (every man and his dog are passing fds around these days).

Really, in years and years of unix programming, I have not yet felt
the need to pass a file descriptor. That goes double for my dogs.

In any case, I find it hard to believe that the traditional method is
really so bad. The explanation of why this new way is needed boils
down to: "unix programming is so hard to get right."

Thanks,
Richard

* Re: [PATCH 1/2] fs: introduce sendfd() syscall
  2014-12-03 10:41             ` Richard Cochran
@ 2014-12-03 14:08               ` Alex Dubov
  2014-12-05 13:37               ` One Thousand Gnomes
  1 sibling, 0 replies; 28+ messages in thread
From: Alex Dubov @ 2014-12-03 14:08 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Richard Weinberger, Eric Dumazet, Al Viro, linux-kernel, Alex Dubov

On Wed, Dec 3, 2014 at 9:41 PM, Richard Cochran
<richardcochran@gmail.com> wrote:
> In any case, I find it hard to believe that the traditional method is
> really so bad. The explanation of why this new way is needed boils
> down to: "unix programming is so hard to get right."


Surely, this can be said about any new feature proposed. Why do we
need this new thing called wheel? We lived 50k years without it just
fine! It all boils down to: "walking with legs is so hard to get
right". :-)

* Re: [PATCH 1/2] fs: introduce sendfd() syscall
  2014-12-03 10:41             ` Richard Cochran
  2014-12-03 14:08               ` Alex Dubov
@ 2014-12-05 13:37               ` One Thousand Gnomes
  1 sibling, 0 replies; 28+ messages in thread
From: One Thousand Gnomes @ 2014-12-05 13:37 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Richard Weinberger, Eric Dumazet, Alex Dubov, Al Viro,
	linux-kernel, Alex Dubov

On Wed, 3 Dec 2014 11:41:44 +0100
Richard Cochran <richardcochran@gmail.com> wrote:

> On Wed, Dec 03, 2014 at 09:17:37AM +0100, Richard Weinberger wrote:
> > Come on guys, get a cup of coffee and relax a bit...
> 
> I am relaxed, especially after I had a good laugh reading this:
> 
>    On a less related note, I hope you will agree that the simpler
>    mechanism for this very in-demand feature is long overdue on Linux
>    (every man and his dog are passing fds around these days).
> 
> Really, in years and years of unix programming, I have not yet felt
> the need to pass a file descriptor. That goes double for my dogs.

It's underused in part because you need a pointy hat to do it in Unix, but
it's a very common model elsewhere.

Whether you need the syscall, or can just write sendfd() and acceptfd() in
terms of AF_UNIX sockets in a library and bury the icky bits, is another
question. I think the reality is you'd probably end up doing the library
*anyway* to deal with the fact that it would be 5 or more years before sendfd
percolated everywhere even if it were merged today.

Alan
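
For reference, the library approach mentioned above might look roughly like
the sketch below: two small wrappers that hide the SCM_RIGHTS control-message
handling behind a simple interface. The names send_fd() and recv_fd() are
hypothetical (they are not the proposed syscall and not an existing library
API), and the code assumes an already-connected AF_UNIX socket, for example
one end of a socketpair().

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send descriptor `fd` over connected AF_UNIX socket `sock`. 0 or -1. */
int send_fd(int sock, int fd)
{
	char byte = 0;	/* one byte of ordinary data carries the cmsg along */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {		/* union guarantees cmsghdr alignment */
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	assert(fd >= 0);
	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from `sock`. Returns the new fd, or -1 on error. */
int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) <= 0)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The descriptor returned by recv_fd() is a new entry in the receiver's table
that refers to the same open file as the sender's descriptor, much like the
result of dup() across processes.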

* Re: Minimal effort/low overhead file descriptor duplication over Posix.1b s
  2014-12-02  4:35 Minimal effort/low overhead file descriptor duplication over Posix.1b s Alex Dubov
                   ` (2 preceding siblings ...)
  2014-12-02 15:26 ` Minimal effort/low overhead file descriptor duplication over Posix.1b s Jonathan Corbet
@ 2014-12-17 13:11 ` Kevin Easton
  3 siblings, 0 replies; 28+ messages in thread
From: Kevin Easton @ 2014-12-17 13:11 UTC (permalink / raw)
  To: Alex Dubov; +Cc: linux-kernel

On Tue, Dec 02, 2014 at 03:35:17PM +1100, Alex Dubov wrote:
> Unfortunately, using facilities like Unix domain sockets to merely pass file
> descriptors between "worker" processes is unnecessarily difficult, due to
> the following common consideration:
> 
> 1. Domain sockets and named pipes are persistent objects. Applications must
> manage their lifetime and devise unambiguous access schemes in case multiple
> application instances are to be run within the same OS instance. Usually, they
> would also require a writable file system to be mounted.

I believe this particular issue has long been addressed in Linux, with
the "abstract namespace" domain sockets.

These aren't persistent - they go away when the bound socket is closed -
and they don't need a writable filesystem.

If you derived the name in the abstract namespace from your PID (or better,
application identifier and PID) then you would have exactly the same
"unambiguous access" scheme as your proposal.

>     int sendfd(pid_t pid, int sig, int fd)

PIDs tend to be regarded as a bit of an iffy way to refer to another
process, because they tend to be racy.  If the process you think you're
talking to dies, and has its PID reused by another unrelated sendfd()-aware
process, you've just sent your open file to somewhere unexpected.

You can avoid that if the process is a child of yours, but in that case
you could have set up a no-fuss domain socket connection with socketpair()
too.

    - Kevin
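
A minimal sketch of the abstract-namespace idea described in this message,
with the socket name derived from an application identifier and a PID. The
helper name bind_abstract() and the "app-pid" naming convention are
assumptions made for illustration; the abstract namespace itself is a Linux
extension, selected by a leading NUL byte in sun_path.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a listening AF_UNIX socket bound to the
 * abstract name "<app>-<pid>". Returns the socket fd, or -1 on error.
 */
int bind_abstract(const char *app, pid_t pid)
{
	struct sockaddr_un addr;
	socklen_t len;
	int n, fd;

	assert(app != NULL);
	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;

	/* sun_path[0] stays '\0' (abstract); the name starts at sun_path[1]. */
	n = snprintf(addr.sun_path + 1, sizeof(addr.sun_path) - 1,
		     "%s-%ld", app, (long)pid);
	if (n < 0 || (size_t)n >= sizeof(addr.sun_path) - 1)
		return -1;

	/* The address length covers the leading NUL plus the name bytes. */
	len = (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + (size_t)n);

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	if (bind(fd, (struct sockaddr *)&addr, len) < 0 || listen(fd, 8) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

No file system entry is created, and the name becomes available again once
the bound socket is closed, so there is nothing to unlink on exit.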


Thread overview: 28+ messages
2014-12-02  4:35 Minimal effort/low overhead file descriptor duplication over Posix.1b s Alex Dubov
2014-12-02  4:35 ` [PATCH 1/2] fs: introduce sendfd() syscall Alex Dubov
2014-12-02 12:50   ` Eric Dumazet
2014-12-02 14:47     ` Alex Dubov
2014-12-02 15:33       ` Eric Dumazet
2014-12-02 16:23         ` Alex Dubov
2014-12-02 16:42           ` Eric Dumazet
2014-12-03  2:11             ` Alex Dubov
2014-12-03  6:48               ` Eric Dumazet
2014-12-02 17:00   ` Al Viro
2014-12-03  2:22     ` Alex Dubov
2014-12-03  3:40       ` Al Viro
2014-12-03  4:14         ` Alex Dubov
2014-12-03  6:50       ` Eric Dumazet
2014-12-03  8:08         ` Richard Cochran
2014-12-03  8:17           ` Richard Weinberger
2014-12-03 10:41             ` Richard Cochran
2014-12-03 14:08               ` Alex Dubov
2014-12-05 13:37               ` One Thousand Gnomes
2014-12-02  4:35 ` [PATCH 2/2] fs: Wire up sendfd() syscall (all architectures) Alex Dubov
2014-12-02  8:01   ` Geert Uytterhoeven
2014-12-02  8:31     ` Alex Dubov
2014-12-02 11:42     ` Michal Simek
2014-12-02 14:31       ` Alex Dubov
2014-12-02 14:38         ` Geert Uytterhoeven
2014-12-02 15:26 ` Minimal effort/low overhead file descriptor duplication over Posix.1b s Jonathan Corbet
2014-12-02 16:15   ` Alex Dubov
2014-12-17 13:11 ` Kevin Easton
