LKML Archive on lore.kernel.org
* [GIT PULL] kdbus for 4.1-rc1
@ 2015-04-13 19:03 Greg Kroah-Hartman
  2015-04-13 19:29 ` Eric W. Biederman
                   ` (2 more replies)
  0 siblings, 3 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-13 19:03 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton
  Cc: Arnd Bergmann, ebiederm, gnomes, teg, jkosina, luto,
	linux-kernel, daniel, dh.herrmann, tixxdz

The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:

  Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1

for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:

  kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)

----------------------------------------------------------------
kdbus for 4.1-rc1

Here's the kdbus pull request for 4.1-rc1.

It's been under development for many years now, and been in linux-next
for many months, and has undergone loads of testing and review and even a few
good arguments.  It comes with full documentation and tests.

There have been a few complaints about the code, notably from people who
don't like the use of metadata in the bus messages.  That is actually
one of the main features here, as we can get this data in a secure and
reliable way, and it's something that userspace requires today.  So
while it does look "odd" to people who are not familiar with dbus, this
is something that finally fixes a number of almost unfixable races in
the current dbus implementations.

The rest of this pull request message comes from the kdbus patch posting
messages as sent to lkml previously:

Reasons kdbus should be in the kernel, instead of userspace as it is
currently done today, include the following:

 * Performance: Fewer process context switches, fewer copies, fewer
   syscalls, larger memory chunks via memfd.  This is really important
   for a whole class of userspace programs that are ported from other
   operating systems that are run on tiny ARM systems that rely on
   hundreds of thousands of messages passed at boot time, and at
   "critical" times in their user interaction loops. DBus is not used
   for performance sensitive applications because DBus is slow.
   We want to make it fast so we can finally use it for low-latency,
   high-throughput applications. A simple DBus method-call+reply takes
   200us on an up-to-date test machine, with kdbus it takes 8us (with
   UDS about 2us). If the packet size is increased from 8k to 128k,
   kdbus even beats UDS due to single-copy transfers.

 * Security: The peers which communicate do not have to trust each
   other, as the only trustworthy component in the game is the kernel
   which adds metadata and ensures that all data passed as payload is
   either copied or sealed, so that the receiver can parse the data
   without having to protect against changing memory while parsing
   buffers. Also, all the data transfer is controlled by the kernel,
   so that LSMs can track and control what is going on, without
   involving userspace. Because of the LSM issue, security people are
   much happier with this model than the current scheme of having to
   hook into dbus to mediate things.

 * More types of metadata can be attached to messages than in userspace.

 * Semantics for apps with heavy data payloads (media apps, for
   instance) with optional priority message dequeuing, and global
   message ordering. Some "crazy" people are playing with using kdbus
   for audio data in the system.  I'm not saying that this is the best
   model for this, but until now, there wasn't any other way to do this
   without having to create custom "buses", one for each application
   library.

 * Being in the kernel closes a lot of races which can't be fixed with
   the current userspace solutions.  For example, with kdbus, there is a
   way a client can disconnect from a bus, but do so only if no further
   messages are present in its queue, which is crucial for implementing
   race-free "exit-on-idle" services.

 * Eavesdropping on the kernel level, so privileged users can hook into
   the message stream without hacking support for that into their
   userspace processes

 * A number of smaller benefits: for example kdbus learned a way to peek
   full messages without dequeuing them, which is really useful for
   logging metadata when handling bus-activation requests.

 * dbus-daemon is not available during early-boot or shutdown.
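
For illustration only, here is a minimal sketch of the "sealed" payload
idea mentioned under Security above.  It uses the generic memfd sealing
interface (memfd_create(2) and fcntl(2) F_ADD_SEALS, available since
Linux 3.17), not any kdbus call, and the function name
sealed_payload_demo() is made up for this example.  Once F_SEAL_WRITE is
set, a receiver holding the fd can parse the contents without worrying
that the sender changes them underneath it:

```c
#define _GNU_SOURCE		/* for memfd_create() and F_ADD_SEALS */
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the kdbus patches.
 * Returns 0 if a sealed memfd rejects a later write, -1 otherwise.
 */
int sealed_payload_demo(void)
{
	static const char payload[] = "hello";
	int fd, ret = -1;

	/* Anonymous, memory-backed file that is allowed to carry seals. */
	fd = memfd_create("payload", MFD_ALLOW_SEALING);
	if (fd < 0)
		return -1;

	if (write(fd, payload, sizeof(payload)) != (ssize_t)sizeof(payload))
		goto out;

	/* Freeze both size and contents; seals cannot be removed again. */
	if (fcntl(fd, F_ADD_SEALS,
		  F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE) < 0)
		goto out;

	/* Any further write must now fail with EPERM. */
	if (write(fd, "x", 1) < 0 && errno == EPERM)
		ret = 0;
out:
	close(fd);
	return ret;
}
```

The helper returns 0 when the kernel refuses the write after sealing, so
a caller only needs to check sealed_payload_demo() == 0.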

DBus marshaling is the de-facto standard in all major(!) Linux desktop
systems. It is well established and accepted by many DEs. It also
solves many other problems, including: policy, authentication /
authorization, well-known name registry, efficient broadcasts /
multicasts, peer discovery, bus discovery, metadata transmission, and
more.

It is a shame that we cannot use this well-established protocol for
low-latency applications. We, effectively, have to duplicate all this
code on custom UDS and other transports just because DBus is too slow.
kdbus tries to unify those efforts, so that we don't need multiple
policy implementations, name registries and peer discovery mechanisms.
Furthermore, kdbus implements comprehensive, yet optional, metadata
transmission that allows peers to be identified and authenticated in a
race-free manner (which is *not* possible with UDS).
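
For readers less familiar with the UDS side of this comparison: what a
unix socket offers through SO_PEERCRED (see unix(7)) is the pid, uid and
gid recorded when the socket was connected or created with socketpair().
Other per-process details are typically looked up via /proc using that
pid.  A minimal sketch, with made-up helper names and not part of the
kdbus patches:

```c
#define _GNU_SOURCE		/* for struct ucred */
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: fetch the credentials the kernel recorded for the
 * peer of a connected AF_UNIX socket.  Returns 0 on success, -1 on error.
 */
int uds_peer_cred(int sock, struct ucred *cred)
{
	socklen_t len = sizeof(*cred);

	if (getsockopt(sock, SOL_SOCKET, SO_PEERCRED, cred, &len) < 0)
		return -1;
	return len == sizeof(*cred) ? 0 : -1;
}

/* Self-check: on a socketpair() the peer is the calling process itself. */
int uds_peer_cred_selftest(void)
{
	struct ucred cred;
	int sv[2], ret = -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	if (uds_peer_cred(sv[0], &cred) == 0 &&
	    cred.pid == getpid() && cred.uid == geteuid())
		ret = 0;

	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

The credentials describe the peer as it was at connect time, so this
sketch only shows the baseline that UDS provides; it does not by itself
demonstrate the race.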

Also, kdbus provides a single transport bus with sequential message
numbering. If you use multiple channels, you cannot give any ordering
guarantees across peers (for instance, regarding parallel name-registry
changes).

Of course, some of the bits above could be implemented in userspace
alone, for example with more sophisticated memory management APIs, but
this is usually done by losing out on the other details.  For example,
for many of the memory management APIs, it's hard to not require the
communicating peers to fully trust each other.  And we _really_ don't
want peers to have to trust each other.

Another benefit of having this in the kernel, rather than as a userspace
daemon, is that you can now easily use the bus from the initrd, or up to
the very end when the system shuts down.  On current userspace D-Bus,
this is not really possible, as this requires passing the bus instance
around between initrd and the "real" system.  Such a transition of all
fds also requires keeping full state of what has already been read from
the connection fds.  kdbus makes this much simpler, as we can change the
ownership of the bus, just by passing one fd over from one part to the
other.

Given the theoretical advantages above, here are some real-world
examples:

 * The Tizen developers have been complaining about the high latency
   of DBus for polkit'ish policy queries. That's why their
   authentication framework uses custom UDS sockets (called 'Cynara').
   If a UI-interaction needs multiple authentication-queries, you don't
   want it to take multiple milliseconds, given that you usually want
   to render the result in the same frame.

 * PulseAudio doesn't use DBus for data transmission. They had to
   implement their own marshaling code, transport layer and so on, just
   because DBus1-latency is horrible. With kdbus, we can basically drop
   this code-duplication and unify the IPC layer. Same is true for
   Wayland, btw.

 * By moving broadcast-transmission into the kernel, we can use the
   time-slices of the sender to perform heavy operations. This is also
   true for policy decisions, etc. With a userspace daemon, we cannot
   perform operations in a time-slice of the caller. This makes DoS
   attacks much harder.

 * With priority-inheritance, we can do synchronous calls into trusted
   peers and let them optionally use our time-slice to perform the
   action. This allows syscall-like/binder-like method-calls into other
   processes. Without priority-inheritance, this is not possible in a
   secure manner (see 'priority-inheritance').

 * Logging-daemons often want to attach metadata to log-messages so
   debugging/filtering gets easier. If short-lived programs send
   log-messages, the destination peer might not be able to read such
   metadata from /proc, as the process might no longer be available at
   that time. Same is true for policy-decisions like polkit does. You
   cannot send off method-calls and exit. You have to wait for a reply,
   even though you might not even care for it. If you don't wait, the
   other side might not be able to verify your identity and as such
   reject the request.

 * Even though the dbus traffic on idle-systems might be low, this
   doesn't mean it's not significant at boot-times or under high-load.
   If you run a dbus-monitor of your choice, you will see there is a
   significant number of messages exchanged during VT-switches, startup,
   shutdown, suspend, wakeup, hotplugging and similar situations where
   lots of control-messages are exchanged. We don't want to spend
   hundreds of ms just to transmit those messages.
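
The /proc problem from the logging example above is easy to reproduce.
A rough sketch (plain POSIX plus Linux /proc, helper name made up for
illustration): a child exits right away, and once it has been reaped
there is no /proc/<pid> entry left to read metadata from.  In the real
scenario the sender is reaped by its own parent rather than by the
receiver; the sketch lets the parent do both so the result is
deterministic.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 0 if /proc/<pid>/comm is gone after the child has exited and
 * been reaped, 1 if it is still present, -1 on error.
 */
int proc_entry_gone_after_exit(void)
{
	char path[64];
	pid_t pid;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(0);	/* short-lived "sender" */

	/* Reap the child; after this its /proc entry no longer exists. */
	if (waitpid(pid, NULL, 0) != pid)
		return -1;

	snprintf(path, sizeof(path), "/proc/%d/comm", (int)pid);
	return access(path, F_OK) == 0 ? 1 : 0;
}
```

A receiver that only learns the sender's pid after this point has
nothing left to look up, which is the situation the bullet above
describes.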

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

----------------------------------------------------------------
Arnd Bergmann (1):
      kdbus: avoid the use of struct timespec

Daniel Mack (18):
      kdbus: add documentation
      kdbus: add uapi header file
      kdbus: add driver skeleton, ioctl entry points and utility functions
      kdbus: add connection pool implementation
      kdbus: add connection, queue handling and message validation code
      kdbus: add node and filesystem implementation
      kdbus: add code to gather metadata
      kdbus: add code for notifications and matches
      kdbus: add code for buses, domains and endpoints
      kdbus: add name registry implementation
      kdbus: add policy database implementation
      kdbus: add Makefile, Kconfig and MAINTAINERS entry
      kdbus: add walk-through user space example
      kdbus: add selftests
      Documentation: kdbus: fix location for generated files
      kdbus: connection: fix handling of failed fget()
      kdbus: Fix CONFIG_KDBUS help text
      samples: kdbus: build kdbus-workers conditionally

David Herrmann (5):
      kdbus: samples/kdbus: add -lrt
      samples/kdbus: drop wrong include
      Documentation/kdbus: fix out-of-tree builds
      Documentation/kdbus: support quiet builds
      selftests/kdbus: fix gitignore

Lucas De Marchi (1):
      kdbus: fix header guard name

Lukasz Skalski (1):
      Documentation/kdbus: replace 'reply_cookie' with 'cookie_reply'

Nicolas Iooss (1):
      kdbus: fix minor typo in the walk-through example

Sergei Zviagintsev (5):
      kdbus: uapi: Fix kernel-doc for enum kdbus_send_flags
      Documentation: kdbus: Fix list of KDBUS_CMD_ENDPOINT_UPDATE errors
      Documentation: kdbus: Update list of ioctls which cause writing to receiver's pool
      Documentation: kdbus: Fix description of KDBUS_SEND_SYNC_REPLY flag
      Documentation: kdbus: Fix typos

Tyler Baker (1):
      selftest/kdbus: enable cross compilation

 Documentation/Makefile                            |    2 +-
 Documentation/ioctl/ioctl-number.txt              |    1 +
 Documentation/kdbus/.gitignore                    |    2 +
 Documentation/kdbus/Makefile                      |   40 +
 Documentation/kdbus/kdbus.bus.xml                 |  359 ++++
 Documentation/kdbus/kdbus.connection.xml          | 1250 ++++++++++++
 Documentation/kdbus/kdbus.endpoint.xml            |  429 ++++
 Documentation/kdbus/kdbus.fs.xml                  |  124 ++
 Documentation/kdbus/kdbus.item.xml                |  839 ++++++++
 Documentation/kdbus/kdbus.match.xml               |  555 ++++++
 Documentation/kdbus/kdbus.message.xml             | 1276 ++++++++++++
 Documentation/kdbus/kdbus.name.xml                |  711 +++++++
 Documentation/kdbus/kdbus.policy.xml              |  406 ++++
 Documentation/kdbus/kdbus.pool.xml                |  326 +++
 Documentation/kdbus/kdbus.xml                     | 1012 ++++++++++
 Documentation/kdbus/stylesheet.xsl                |   16 +
 MAINTAINERS                                       |   13 +
 Makefile                                          |    1 +
 include/uapi/linux/Kbuild                         |    1 +
 include/uapi/linux/kdbus.h                        |  979 +++++++++
 include/uapi/linux/magic.h                        |    2 +
 init/Kconfig                                      |   13 +
 ipc/Makefile                                      |    2 +-
 ipc/kdbus/Makefile                                |   22 +
 ipc/kdbus/bus.c                                   |  560 ++++++
 ipc/kdbus/bus.h                                   |  101 +
 ipc/kdbus/connection.c                            | 2214 +++++++++++++++++++++
 ipc/kdbus/connection.h                            |  257 +++
 ipc/kdbus/domain.c                                |  296 +++
 ipc/kdbus/domain.h                                |   77 +
 ipc/kdbus/endpoint.c                              |  275 +++
 ipc/kdbus/endpoint.h                              |   67 +
 ipc/kdbus/fs.c                                    |  510 +++++
 ipc/kdbus/fs.h                                    |   28 +
 ipc/kdbus/handle.c                                |  617 ++++++
 ipc/kdbus/handle.h                                |   85 +
 ipc/kdbus/item.c                                  |  339 ++++
 ipc/kdbus/item.h                                  |   64 +
 ipc/kdbus/limits.h                                |   64 +
 ipc/kdbus/main.c                                  |  125 ++
 ipc/kdbus/match.c                                 |  559 ++++++
 ipc/kdbus/match.h                                 |   35 +
 ipc/kdbus/message.c                               |  616 ++++++
 ipc/kdbus/message.h                               |  133 ++
 ipc/kdbus/metadata.c                              | 1159 +++++++++++
 ipc/kdbus/metadata.h                              |   57 +
 ipc/kdbus/names.c                                 |  772 +++++++
 ipc/kdbus/names.h                                 |   74 +
 ipc/kdbus/node.c                                  |  910 +++++++++
 ipc/kdbus/node.h                                  |   84 +
 ipc/kdbus/notify.c                                |  248 +++
 ipc/kdbus/notify.h                                |   30 +
 ipc/kdbus/policy.c                                |  489 +++++
 ipc/kdbus/policy.h                                |   51 +
 ipc/kdbus/pool.c                                  |  728 +++++++
 ipc/kdbus/pool.h                                  |   46 +
 ipc/kdbus/queue.c                                 |  678 +++++++
 ipc/kdbus/queue.h                                 |   92 +
 ipc/kdbus/reply.c                                 |  257 +++
 ipc/kdbus/reply.h                                 |   68 +
 ipc/kdbus/util.c                                  |  201 ++
 ipc/kdbus/util.h                                  |   74 +
 samples/Kconfig                                   |    7 +
 samples/Makefile                                  |    3 +-
 samples/kdbus/.gitignore                          |    1 +
 samples/kdbus/Makefile                            |    9 +
 samples/kdbus/kdbus-api.h                         |  114 ++
 samples/kdbus/kdbus-workers.c                     | 1326 ++++++++++++
 tools/testing/selftests/Makefile                  |    1 +
 tools/testing/selftests/kdbus/.gitignore          |    1 +
 tools/testing/selftests/kdbus/Makefile            |   48 +
 tools/testing/selftests/kdbus/kdbus-enum.c        |   94 +
 tools/testing/selftests/kdbus/kdbus-enum.h        |   14 +
 tools/testing/selftests/kdbus/kdbus-test.c        |  923 +++++++++
 tools/testing/selftests/kdbus/kdbus-test.h        |   85 +
 tools/testing/selftests/kdbus/kdbus-util.c        | 1615 +++++++++++++++
 tools/testing/selftests/kdbus/kdbus-util.h        |  222 +++
 tools/testing/selftests/kdbus/test-activator.c    |  318 +++
 tools/testing/selftests/kdbus/test-attach-flags.c |  750 +++++++
 tools/testing/selftests/kdbus/test-benchmark.c    |  451 +++++
 tools/testing/selftests/kdbus/test-bus.c          |  175 ++
 tools/testing/selftests/kdbus/test-chat.c         |  122 ++
 tools/testing/selftests/kdbus/test-connection.c   |  616 ++++++
 tools/testing/selftests/kdbus/test-daemon.c       |   65 +
 tools/testing/selftests/kdbus/test-endpoint.c     |  341 ++++
 tools/testing/selftests/kdbus/test-fd.c           |  789 ++++++++
 tools/testing/selftests/kdbus/test-free.c         |   64 +
 tools/testing/selftests/kdbus/test-match.c        |  441 ++++
 tools/testing/selftests/kdbus/test-message.c      |  731 +++++++
 tools/testing/selftests/kdbus/test-metadata-ns.c  |  506 +++++
 tools/testing/selftests/kdbus/test-monitor.c      |  176 ++
 tools/testing/selftests/kdbus/test-names.c        |  194 ++
 tools/testing/selftests/kdbus/test-policy-ns.c    |  632 ++++++
 tools/testing/selftests/kdbus/test-policy-priv.c  | 1269 ++++++++++++
 tools/testing/selftests/kdbus/test-policy.c       |   80 +
 tools/testing/selftests/kdbus/test-sync.c         |  369 ++++
 tools/testing/selftests/kdbus/test-timeout.c      |   99 +
 97 files changed, 34069 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/kdbus/.gitignore
 create mode 100644 Documentation/kdbus/Makefile
 create mode 100644 Documentation/kdbus/kdbus.bus.xml
 create mode 100644 Documentation/kdbus/kdbus.connection.xml
 create mode 100644 Documentation/kdbus/kdbus.endpoint.xml
 create mode 100644 Documentation/kdbus/kdbus.fs.xml
 create mode 100644 Documentation/kdbus/kdbus.item.xml
 create mode 100644 Documentation/kdbus/kdbus.match.xml
 create mode 100644 Documentation/kdbus/kdbus.message.xml
 create mode 100644 Documentation/kdbus/kdbus.name.xml
 create mode 100644 Documentation/kdbus/kdbus.policy.xml
 create mode 100644 Documentation/kdbus/kdbus.pool.xml
 create mode 100644 Documentation/kdbus/kdbus.xml
 create mode 100644 Documentation/kdbus/stylesheet.xsl
 create mode 100644 include/uapi/linux/kdbus.h
 create mode 100644 ipc/kdbus/Makefile
 create mode 100644 ipc/kdbus/bus.c
 create mode 100644 ipc/kdbus/bus.h
 create mode 100644 ipc/kdbus/connection.c
 create mode 100644 ipc/kdbus/connection.h
 create mode 100644 ipc/kdbus/domain.c
 create mode 100644 ipc/kdbus/domain.h
 create mode 100644 ipc/kdbus/endpoint.c
 create mode 100644 ipc/kdbus/endpoint.h
 create mode 100644 ipc/kdbus/fs.c
 create mode 100644 ipc/kdbus/fs.h
 create mode 100644 ipc/kdbus/handle.c
 create mode 100644 ipc/kdbus/handle.h
 create mode 100644 ipc/kdbus/item.c
 create mode 100644 ipc/kdbus/item.h
 create mode 100644 ipc/kdbus/limits.h
 create mode 100644 ipc/kdbus/main.c
 create mode 100644 ipc/kdbus/match.c
 create mode 100644 ipc/kdbus/match.h
 create mode 100644 ipc/kdbus/message.c
 create mode 100644 ipc/kdbus/message.h
 create mode 100644 ipc/kdbus/metadata.c
 create mode 100644 ipc/kdbus/metadata.h
 create mode 100644 ipc/kdbus/names.c
 create mode 100644 ipc/kdbus/names.h
 create mode 100644 ipc/kdbus/node.c
 create mode 100644 ipc/kdbus/node.h
 create mode 100644 ipc/kdbus/notify.c
 create mode 100644 ipc/kdbus/notify.h
 create mode 100644 ipc/kdbus/policy.c
 create mode 100644 ipc/kdbus/policy.h
 create mode 100644 ipc/kdbus/pool.c
 create mode 100644 ipc/kdbus/pool.h
 create mode 100644 ipc/kdbus/queue.c
 create mode 100644 ipc/kdbus/queue.h
 create mode 100644 ipc/kdbus/reply.c
 create mode 100644 ipc/kdbus/reply.h
 create mode 100644 ipc/kdbus/util.c
 create mode 100644 ipc/kdbus/util.h
 create mode 100644 samples/kdbus/.gitignore
 create mode 100644 samples/kdbus/Makefile
 create mode 100644 samples/kdbus/kdbus-api.h
 create mode 100644 samples/kdbus/kdbus-workers.c
 create mode 100644 tools/testing/selftests/kdbus/.gitignore
 create mode 100644 tools/testing/selftests/kdbus/Makefile
 create mode 100644 tools/testing/selftests/kdbus/kdbus-enum.c
 create mode 100644 tools/testing/selftests/kdbus/kdbus-enum.h
 create mode 100644 tools/testing/selftests/kdbus/kdbus-test.c
 create mode 100644 tools/testing/selftests/kdbus/kdbus-test.h
 create mode 100644 tools/testing/selftests/kdbus/kdbus-util.c
 create mode 100644 tools/testing/selftests/kdbus/kdbus-util.h
 create mode 100644 tools/testing/selftests/kdbus/test-activator.c
 create mode 100644 tools/testing/selftests/kdbus/test-attach-flags.c
 create mode 100644 tools/testing/selftests/kdbus/test-benchmark.c
 create mode 100644 tools/testing/selftests/kdbus/test-bus.c
 create mode 100644 tools/testing/selftests/kdbus/test-chat.c
 create mode 100644 tools/testing/selftests/kdbus/test-connection.c
 create mode 100644 tools/testing/selftests/kdbus/test-daemon.c
 create mode 100644 tools/testing/selftests/kdbus/test-endpoint.c
 create mode 100644 tools/testing/selftests/kdbus/test-fd.c
 create mode 100644 tools/testing/selftests/kdbus/test-free.c
 create mode 100644 tools/testing/selftests/kdbus/test-match.c
 create mode 100644 tools/testing/selftests/kdbus/test-message.c
 create mode 100644 tools/testing/selftests/kdbus/test-metadata-ns.c
 create mode 100644 tools/testing/selftests/kdbus/test-monitor.c
 create mode 100644 tools/testing/selftests/kdbus/test-names.c
 create mode 100644 tools/testing/selftests/kdbus/test-policy-ns.c
 create mode 100644 tools/testing/selftests/kdbus/test-policy-priv.c
 create mode 100644 tools/testing/selftests/kdbus/test-policy.c
 create mode 100644 tools/testing/selftests/kdbus/test-sync.c
 create mode 100644 tools/testing/selftests/kdbus/test-timeout.c


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-13 19:03 [GIT PULL] kdbus for 4.1-rc1 Greg Kroah-Hartman
@ 2015-04-13 19:29 ` Eric W. Biederman
  2015-04-13 19:42   ` Greg Kroah-Hartman
                     ` (2 more replies)
  2015-04-13 20:13 ` Andy Lutomirski
  2015-04-23 13:05 ` Greg Kroah-Hartman
  2 siblings, 3 replies; 316+ messages in thread
From: Eric W. Biederman @ 2015-04-13 19:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, gnomes, teg,
	jkosina, luto, linux-kernel, daniel, dh.herrmann, tixxdz

Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes:

> The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
>
>   Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
>
> for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
>
>   kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
>
> ----------------------------------------------------------------
> kdbus for 4.1-rc1
>
> Here's the kdbus pull request for 4.1-rc1.
>
> It's been under development for many years now, and been in linux-next
> for many months, and has undergone loads of testing and review and even a few
> good arguments.  It comes with full documentation and tests.

> There have been a few complaints about the code, notably from people who
> don't like the use of metadata in the bus messages.  That is actually
> one of the main features here, as we can get this data in a secure and
> reliable way, and it's something that userspace requires today.  So
> while it does look "odd" to people who are not familiar with dbus, this
> is something that finally fixes a number of almost unfixable races in
> the current dbus implementations.

And the code that transfers the meta-data is wrong.

It is generally not something that userspace requires today, certainly
userspace is not using it.

You are exporting a weird set of information in a unique way that makes
it race free enough to make ``security'' decisions upon but the data
in general is not appropriate to make those decisions.

I remain opposed to this half thought out trash of an ABI for the
meta-data.

Just because something happens to be exported in a DEBUG api today does
not make it appropriate for userspace to run around making security
decisions with that information.

Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>

I think it is premature to be merging kdbus.  You have fundamental
issues that cannot be fixed once the ABI is frozen.

The semantics of the meta-data you export are extremely poorly defined.

> The rest of this pull request message comes from the kdbus patch posting
> messages as sent to lkml previously:
>
> Reasons kdbus should be in the kernel, instead of userspace as it is
> currently done today, include the following:
>
>  * Performance: Fewer process context switches, fewer copies, fewer
>    syscalls, larger memory chunks via memfd.  This is really important
>    for a whole class of userspace programs that are ported from other
>    operating systems that are run on tiny ARM systems that rely on
>    hundreds of thousands of messages passed at boot time, and at
>    "critical" times in their user interaction loops. DBus is not used
>    for performance sensitive applications because DBus is slow.
>    We want to make it fast so we can finally use it for low-latency,
>    high-throughput applications. A simple DBus method-call+reply takes
>    200us on an up-to-date test machine, with kdbus it takes 8us (with
>    UDS about 2us). If the packet size is increased from 8k to 128k,
>    kdbus even beats UDS due to single-copy transfers.

And with a good design kdbus could be faster.

>  * Security: The peers which communicate do not have to trust each
>    other, as the only trustworthy component in the game is the kernel
>    which adds metadata and ensures that all data passed as payload is
>    either copied or sealed, so that the receiver can parse the data
>    without having to protect against changing memory while parsing
>    buffers. Also, all the data transfer is controlled by the kernel,
>    so that LSMs can track and control what is going on, without
>    involving userspace. Because of the LSM issue, security people are
>    much happier with this model than the current scheme of having to
>    hook into dbus to mediate things.
>  * More types of metadata can be attached to messages than in
>    userspace

The meta-data is poorly thought out and much of it is not appropriate
for making security decisions anywhere except in the kernel.

All I have seen with the meta-data discussion is sticking heads in the
sand and resubmitting and hoping your reviewers go away.

If you won't do a good responsible job on this before the code is merged
how can we possibly expect you to do a good job later?  Or is this going
to be another API where userspace will be broken at arbitrary moments by
arbitrary users?

How are you going to fix the security issues your poor API comes with
when they are eventually spelled out clearly and fixing them means
breaking everyone's desktop environment?

Eric



* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-13 19:29 ` Eric W. Biederman
@ 2015-04-13 19:42   ` Greg Kroah-Hartman
  2015-04-13 19:49     ` Richard Weinberger
  2015-04-13 20:22     ` Al Viro
  2015-04-14  0:19   ` Eric W. Biederman
  2015-04-22  8:58   ` [GIT PULL] kdbus for 4.1-rc1 Borislav Petkov
  2 siblings, 2 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-13 19:42 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, gnomes, teg,
	jkosina, luto, linux-kernel, daniel, dh.herrmann, tixxdz

On Mon, Apr 13, 2015 at 02:29:35PM -0500, Eric W. Biederman wrote:
> Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes:
> 
> > The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
> >
> >   Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
> >
> > are available in the git repository at:
> >
> >   git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
> >
> > for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
> >
> >   kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
> >
> > ----------------------------------------------------------------
> > kdbus for 4.1-rc1
> >
> > Here's the kdbus pull request for 4.1-rc1.
> >
> > It's been under development for many years now, and been in linux-next
> > for many months, and has undergone loads of testing and review and even a few
> > good arguments.  It comes with full documentation and tests.
> 
> > There have been a few complaints about the code, notably from people who
> > don't like the use of metadata in the bus messages.  That is actually
> > one of the main features here, as we can get this data in a secure and
> > reliable way, and it's something that userspace requires today.  So
> > while it does look "odd" to people who are not familiar with dbus, this
> > is something that finally fixes a number of almost unfixable races in
> > the current dbus implementations.
> 
> And the code that transfers the meta-data is wrong.
> 
> It is generally not something that userspace requires today, certainly
> userspace is not using it.
> 
> You are exporting a weird set of information in a unique way that makes
> it race free enough to make ``security'' decisions upon but the data
> in general is not appropriate to make those decisions.

I asked this before but you didn't answer as to why you thought these
decisions were not valid.  It's what userspace does today already.

> I remain opposed to this half thought out trash of an ABI for the
> meta-data.

You don't have to enable the metadata if you don't want to use it, it's
an option :)

> Just because something happens to be exported in a DEBUG api today does
> not make it appropriate for userspace to run around making security
> decisions with that information.

What is exported in a debug api today that is being used here?  I asked
this before but never saw a response.

> >  * Performance: Fewer process context switches, fewer copies, fewer
> >    syscalls, larger memory chunks via memfd.  This is really important
> >    for a whole class of userspace programs that are ported from other
> >    operating systems that are run on tiny ARM systems that rely on
> >    hundreds of thousands of messages passed at boot time, and at
> >    "critical" times in their user interaction loops. DBus is not used
> >    for performance sensitive applications because DBus is slow.
> >    We want to make it fast so we can finally use it for low-latency,
> >    high-throughput applications. A simple DBus method-call+reply takes
> >    200us on an up-to-date test machine, with kdbus it takes 8us (with
> >    UDS about 2us). If the packet size is increased from 8k to 128k,
> >    kdbus even beats UDS due to single-copy transfers.
> 
> And with a good design kdbus could be faster.

Faster than today, sure, we've already found some areas that can be
optimized, but that's all internal changes, to be done later, nothing
affecting the userspace api at all.

Even then, today it's very fast.

> >  * Security: The peers which communicate do not have to trust each
> >    other, as the only trustworthy component in the game is the kernel
> >    which adds metadata and ensures that all data passed as payload is
> >    either copied or sealed, so that the receiver can parse the data
> >    without having to protect against changing memory while parsing
> >    buffers. Also, all the data transfer is controlled by the kernel,
> >    so that LSMs can track and control what is going on, without
> >    involving userspace. Because of the LSM issue, security people are
> >    much happier with this model than the current scheme of having to
> >    hook into dbus to mediate things.
> >  * More types of metadata can be attached to messages than in
> >    userspace
> 
> The meta-data is poorly thought out and much of it is not appropriate
> for making security decisions anywhere except in the kernel.
> 
> All I have seen with the meta-data discussion is sticking heads in the
> sand and resubmitting and hoping your reviewers go away.

No, we have asked for specifics but have gotten none, other than random
complaints like this.  Please be specific as to what is being used
incorrectly.

> If you won't do a good responsible job on this before the code is merged
> how can we possibly expect you to do a good job later.  Or is this going
> to be another API where userspace will be broken at arbitrary moments by
> arbitrary users?
> 
> > How are you going to fix the security issues your poor API comes with
> > when they are eventually spelled out clearly and fixing them means
> > breaking everyone's desktop environment?

What security issues?  There are none that I know of; please be specific
rather than making vague accusations.

thanks,

greg k-h

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-13 19:42   ` Greg Kroah-Hartman
@ 2015-04-13 19:49     ` Richard Weinberger
  2015-04-13 19:54       ` Greg Kroah-Hartman
  2015-04-13 20:22     ` Al Viro
  1 sibling, 1 reply; 316+ messages in thread
From: Richard Weinberger @ 2015-04-13 19:49 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski,
	LKML, daniel, David Herrmann, Djalal Harouni

On Mon, Apr 13, 2015 at 9:42 PM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>> I remain opposed to this half thought out trash of an ABI for the
>> meta-data.
>
> You don't have to enable the metadata if you don't want to use it, it's
> an option :)

Wasn't this also an argument for CONFIG_CGROUPS?
Now we're forced to enable it by default to boot a recent distro
and CONFIG_CGROUPS is still not fixed.

-- 
Thanks,
//richard

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-13 19:49     ` Richard Weinberger
@ 2015-04-13 19:54       ` Greg Kroah-Hartman
  2015-04-13 19:57         ` Richard Weinberger
  0 siblings, 1 reply; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-13 19:54 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski,
	LKML, daniel, David Herrmann, Djalal Harouni

On Mon, Apr 13, 2015 at 09:49:27PM +0200, Richard Weinberger wrote:
> On Mon, Apr 13, 2015 at 9:42 PM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >> I remain opposed to this half thought out trash of an ABI for the
> >> meta-data.
> >
> > You don't have to enable the metadata if you don't want to use it, it's
> > an option :)
> 
> Wasn't this also an argument for CONFIG_CGROUPS?
> Now we're forced to enable it by default to boot a recent distro
> and CONFIG_CGROUPS is still not fixed.

CONFIG_CGROUPS is "not fixed"?  I think Tejun would like to have some
words with you :)

Anyway, yes, it's an option, but given that people are using this
metadata today in userspace just fine, I fail to see how having the
kernel be a transport for this same data is an issue.  When the kernel
is the transport, it can do so in a race-free way, and you can properly
do security tests/logic based on it.

thanks,

greg k-h

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-13 19:54       ` Greg Kroah-Hartman
@ 2015-04-13 19:57         ` Richard Weinberger
  2015-04-13 20:03           ` Greg Kroah-Hartman
  0 siblings, 1 reply; 316+ messages in thread
From: Richard Weinberger @ 2015-04-13 19:57 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Richard Weinberger
  Cc: Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski,
	LKML, daniel, David Herrmann, Djalal Harouni


On 13.04.2015 at 21:54, Greg Kroah-Hartman wrote:
> On Mon, Apr 13, 2015 at 09:49:27PM +0200, Richard Weinberger wrote:
>> On Mon, Apr 13, 2015 at 9:42 PM, Greg Kroah-Hartman
>> <gregkh@linuxfoundation.org> wrote:
>>>> I remain opposed to this half thought out trash of an ABI for the
>>>> meta-data.
>>>
>>> You don't have to enable the metadata if you don't want to use it, it's
>>> an option :)
>>
>> Wasn't this also an argument for CONFIG_CGROUPS?
>> Now we're forced to enable it by default to boot a recent distro
>> and CONFIG_CGROUPS is still not fixed.
> 
> CONFIG_CGROUPS is "not fixed"?  I think Tejun would like to have some
> words with you :)

Tejun is working on it and does a *very* good job. But as long as the unified
hierarchy is not complete/stable we're facing issues.
Ever tried to run systemd in a Linux container? ;)

Thanks,
//richard

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-13 19:57         ` Richard Weinberger
@ 2015-04-13 20:03           ` Greg Kroah-Hartman
  2015-04-13 20:08             ` Richard Weinberger
  0 siblings, 1 reply; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-13 20:03 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Richard Weinberger, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, Andy Lutomirski, LKML, daniel, David Herrmann,
	Djalal Harouni

On Mon, Apr 13, 2015 at 09:57:24PM +0200, Richard Weinberger wrote:
> 
> On 13.04.2015 at 21:54, Greg Kroah-Hartman wrote:
> > On Mon, Apr 13, 2015 at 09:49:27PM +0200, Richard Weinberger wrote:
> >> On Mon, Apr 13, 2015 at 9:42 PM, Greg Kroah-Hartman
> >> <gregkh@linuxfoundation.org> wrote:
> >>>> I remain opposed to this half thought out trash of an ABI for the
> >>>> meta-data.
> >>>
> >>> You don't have to enable the metadata if you don't want to use it, it's
> >>> an option :)
> >>
> >> Wasn't this also an argument for CONFIG_CGROUPS?
> >> Now we're forced to enable it by default to boot a recent distro
> >> and CONFIG_CGROUPS is still not fixed.
> > 
> > CONFIG_CGROUPS is "not fixed"?  I think Tejun would like to have some
> > words with you :)
> 
> > Tejun is working on it and does a *very* good job. But as long as the unified
> > hierarchy is not complete/stable we're facing issues.
> > Ever tried to run systemd in a Linux container? ;)

Works just fine for me, I do it daily.  Here's how I spin up a debian
image on my local filesystem, running systemd within it just swimmingly:
	sudo systemd-nspawn -D debian/ /sbin/init

Also works just fine with gentoo and arch images, both of which I use on
a weekly basis in this manner.

Perhaps you are doing something odd that prevents this from working for
you?

thanks,

greg k-h

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-13 20:03           ` Greg Kroah-Hartman
@ 2015-04-13 20:08             ` Richard Weinberger
  0 siblings, 0 replies; 316+ messages in thread
From: Richard Weinberger @ 2015-04-13 20:08 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Richard Weinberger, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, Andy Lutomirski, LKML, daniel, David Herrmann,
	Djalal Harouni

On 13.04.2015 at 22:03, Greg Kroah-Hartman wrote:
> On Mon, Apr 13, 2015 at 09:57:24PM +0200, Richard Weinberger wrote:
>>
>> On 13.04.2015 at 21:54, Greg Kroah-Hartman wrote:
>>> On Mon, Apr 13, 2015 at 09:49:27PM +0200, Richard Weinberger wrote:
>>>> On Mon, Apr 13, 2015 at 9:42 PM, Greg Kroah-Hartman
>>>> <gregkh@linuxfoundation.org> wrote:
>>>>>> I remain opposed to this half thought out trash of an ABI for the
>>>>>> meta-data.
>>>>>
>>>>> You don't have to enable the metadata if you don't want to use it, it's
>>>>> an option :)
>>>>
>>>> Wasn't this also an argument for CONFIG_CGROUPS?
>>>> Now we're forced to enable it by default to boot a recent distro
>>>> and CONFIG_CGROUPS is still not fixed.
>>>
>>> CONFIG_CGROUPS is "not fixed"?  I think Tejun would like to have some
>>> words with you :)
>>
>> Tejun is working on it and does a *very* good job. But as long as the unified
>> hierarchy is not complete/stable we're facing issues.
>> Ever tried to run systemd in a Linux container? ;)
> 
> Works just fine for me, I do it daily.  Here's how I spin up a debian
> image on my local filesystem, running systemd within it just swimmingly:
> 	sudo systemd-nspawn -D debian/ /sbin/init
> 
> Also works just fine with gentoo and arch images, both of which I use on
> a weekly basis in this manner.
> 
> Perhaps you are doing something odd that prevents this from working for
> you?

systemd-nspawn does not support user namespaces.

But the real issue is that cgroup notification does not work within namespaces.
I.e. systemd within the namespace does not get a notification when all processes
within a cgroup are gone.
You'll notice this when running a container for a long time: systemd gets slower
and slower because a lot of sessions (mostly crond) stay around.
It is known by the systemd folks and I have been told that they need the new
unified cgroup hierarchy to deal with that.
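
To make the missing notification concrete: in the unified hierarchy as it was
later merged (cgroup v2), each cgroup has a `cgroup.events` file with a
`populated` key that goes to 0 once the last process in the subtree is gone,
and a manager can watch that file for changes. A minimal sketch of reading
that state, assuming the v2 file format; the helper name is made up for
illustration and the sample strings stand in for a real file:

```python
def cgroup_is_populated(events_text: str) -> bool:
    """Parse the contents of a cgroup v2 cgroup.events file.

    Each line is "<key> <value>"; "populated 1" means the cgroup or one of
    its descendants still contains processes.
    """
    for line in events_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "populated":
            return value.strip() == "1"
    raise ValueError("no 'populated' key found")

# Sample contents standing in for /sys/fs/cgroup/<group>/cgroup.events
still_running = "populated 1\nfrozen 0\n"
all_gone = "populated 0\nfrozen 0\n"

print(cgroup_is_populated(still_running))
print(cgroup_is_populated(all_gone))
```

A session whose cgroup reports `populated 0` is one that could be cleaned up,
which is the event systemd inside the container is described as not receiving.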

I consult a lot in the linux container hosting area and had a lot of "fun" with issues like
that...

Thanks,
//richard

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-13 19:03 [GIT PULL] kdbus for 4.1-rc1 Greg Kroah-Hartman
  2015-04-13 19:29 ` Eric W. Biederman
@ 2015-04-13 20:13 ` Andy Lutomirski
  2015-04-13 20:45   ` Greg Kroah-Hartman
  2015-04-23 13:05 ` Greg Kroah-Hartman
  2 siblings, 1 reply; 316+ messages in thread
From: Andy Lutomirski @ 2015-04-13 20:13 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Mon, Apr 13, 2015 at 12:03 PM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
>
>   Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
>
> for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
>
>   kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
>
> ----------------------------------------------------------------
> kdbus for 4.1-rc1
>
> Here's the kdbus pull request for 4.1-rc1.
>
> It's been under development for many years now, and been in linux-next
> for many months, and has undergone loads of testing and review and even a few
> good arguments.  It comes with full documentation and tests.
>
> There have been a few complaints about the code, notably from people who
> don't like the use of metadata in the bus messages.  That is actually
> one of the main features here, as we can get this data in a secure and
> reliable way, and it's something that userspace requires today.  So
> while it does look "odd" to people who are not familiar with dbus, this
> is something that finally fixes a number of almost unfixable races in
> the current dbus implementations.

While I generally like the concept of having a better in-kernel IPC
mechanism, after some consideration I don't think this belongs in the
kernel in its current form.  Here's why.

First, the naming is counterintuitive.  There are "endpoints", but you
don't send messages to endpoints.  In fact, a basic kdbus setup will
have exactly one endpoint AFAICT.  Wtf?  This makes talking about it
awkward.

A lot of the design seems to violate the concept of "mechanism,
not policy".  Kdbus is very much a port of userspace dbus to the
kernel, and it appears to be a port designed to preserve some
questionable design decisions instead of learning from them.

For example, kdbus sticks a whole policy database in the kernel, but
that policy database (AFAICT -- holy crap it's overcomplicated) is
*not* a simple set of rules like "if A then allow B".  Instead it has
really weird dependencies not on what name you're sending to but on
what *other* names the thing you're sending to has.  Sorry, but this
way lies (a) the inability for a large set of developers to understand
what's going on and (b) security bugs.  Also, the result probably
can't be reused as part of a non-legacy-filled sensible design.

Kdbus claims to be very fast.  Unfortunately, requests for a broad set
of benchmarks have mostly been ignored, my attempts to benchmark it
(admittedly I didn't try that hard) gave results several times worse than
the published figures, and, most tellingly, *no one* has claimed that
kdbus is faster than AF_UNIX.  In fact, everyone seems to acknowledge
that kdbus is several times slower than AF_UNIX.

The metadata thing is problematic.  It seems to be intended to serve
two purposes: data gathering for logging and authentication.
Unfortunately, it has issues.  There are no fewer than *three*
metadata capture points: creation of a bus, connection to a bus, and
sending of a message.  The kdbus authors like to point out that these
are all optional, but IMO that's bunk.  Someone will write a userspace
library that rejects messages from people who don't enable all of
them, and then we're screwed.

Why are we screwed?  Because any kdbus client *won't know which
metadata matters*.  That means that we automatically have the worst of
all worlds, not the best.  Also, the bus creation metadata is
completely worthless for anything other than logging, but someone will
use it for something other than logging, at which point it's
vulnerable to a DoS.  No one has explained to my satisfaction why this
isn't a problem.

Also, the metadata code captures things that are, in my book,
completely unacceptable, such as cmdline and (!) capabilities.  I bet
that the cmdline capture is extra special fscked up when cgroups and
such are in play because *it reads from the sender's VM*.  IOW it's
insecure and pointless.  (OK, it has a point: logging.  But I really
don't think that belongs in the kernel.)

In summary, the general idea is good, but the implementation isn't
general enough, the policy stuff is too specialized and enshrines bad
design, the performance isn't good enough to justify it, and the
metadata is nasty.

So, for what it's worth, NACK in its present form.  Sorry.

--Andy

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-13 19:42   ` Greg Kroah-Hartman
  2015-04-13 19:49     ` Richard Weinberger
@ 2015-04-13 20:22     ` Al Viro
  2015-04-13 20:37       ` Greg Kroah-Hartman
  2015-04-15  1:36       ` Andy Lutomirski
  1 sibling, 2 replies; 316+ messages in thread
From: Al Viro @ 2015-04-13 20:22 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	gnomes, teg, jkosina, luto, linux-kernel, daniel, dh.herrmann,
	tixxdz

On Mon, Apr 13, 2015 at 09:42:17PM +0200, Greg Kroah-Hartman wrote:
> > I remain opposed to this half thought out trash of an ABI for the
> > meta-data.
> 
> You don't have to enable the metadata if you don't want to use it, it's
> an option :)

OK, _that_ argument needs to be stomped out.  It had been used before,
and it was a deliberate scam.  There is no such thing as an optional kernel
interface, especially when udev/dbus/systemd crowd is nearby.  We'd been
through that excuse before; remember how devtmpfs was pushed in as "optional"?

This is a huge red flag.  On the level of "I need your account information
to transfer $200M you might have inherited from my deceased client".

Just to recap how it went the last time around: Kay kept pushing his piece of
code into the tree, claiming that it was optional, that nobody who doesn't
like it has to enable it, so what's the problem?  OK, in it went.  And pretty
soon udev (maintained by the same... meticulously honorable person) had
stopped working on the kernels that didn't have that enabled.

We had been there before.  To paraphrase another... meticulously honorable
person, "if you didn't want something relied upon, why have you put it into the
kernel?" Said person is on the record as having no problem whatsoever with
adding dependencies to the bottom of userland stack.

IMO either it's OK without "if you don't like it, don't enable it", or it
should not be merged at all.

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-13 20:22     ` Al Viro
@ 2015-04-13 20:37       ` Greg Kroah-Hartman
  2015-04-15  1:36       ` Andy Lutomirski
  1 sibling, 0 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-13 20:37 UTC (permalink / raw)
  To: Al Viro
  Cc: Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	gnomes, teg, jkosina, luto, linux-kernel, daniel, dh.herrmann,
	tixxdz

On Mon, Apr 13, 2015 at 09:22:33PM +0100, Al Viro wrote:
> On Mon, Apr 13, 2015 at 09:42:17PM +0200, Greg Kroah-Hartman wrote:
> > > I remain opposed to this half thought out trash of an ABI for the
> > > meta-data.
> > 
> > You don't have to enable the metadata if you don't want to use it, it's
> > an option :)
> 
> OK, _that_ argument needs to be stomped out.  It had been used before,
> and it was a deliberate scam.  There is no such thing as an optional kernel
> interface, especially when udev/dbus/systemd crowd is nearby.  We'd been
> through that excuse before; remember how devtmpfs was pushed in as "optional"?
> 
> This is a huge red flag.  On the level of "I need your account information
> to transfer $200M you might have inherited from my deceased client".
> 
> Just to recap how it went the last time around: Kay kept pushing his piece of
> code into the tree, claiming that it was optional, that nobody who doesn't
> like it has to enable it, so what's the problem?  OK, in it went.  And pretty
> soon udev (maintained by the same... meticulously honorable person) had
> stopped working on the kernels that didn't have that enabled.
> 
> We had been there before.  To paraphrase another... meticulously honorable
> person, "if you didn't want something relied upon, why have you put it into the
> kernel?" Said person is on the record as having no problem whatsoever with
> adding dependencies to the bottom of userland stack.
> 
> IMO either it's OK without "if you don't like it, don't enable it", or it
> should not be merged at all.

We want it.  I want it.  Andy asked for the option to be disabled as he
didn't want it, so it was made that way.  I'll gladly put that back in,
as I don't know of any problems with it, other than Eric's vague rants
about the issue.

thanks,

greg k-h

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-13 20:13 ` Andy Lutomirski
@ 2015-04-13 20:45   ` Greg Kroah-Hartman
  2015-04-13 21:01     ` Andy Lutomirski
  0 siblings, 1 reply; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-13 20:45 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Mon, Apr 13, 2015 at 01:13:26PM -0700, Andy Lutomirski wrote:
> On Mon, Apr 13, 2015 at 12:03 PM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
> >
> >   Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
> >
> > are available in the git repository at:
> >
> >   git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
> >
> > for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
> >
> >   kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
> >
> > ----------------------------------------------------------------
> > kdbus for 4.1-rc1
> >
> > Here's the kdbus pull request for 4.1-rc1.
> >
> > It's been under development for many years now, and been in linux-next
> > for many months, and has undergone loads of testing and review and even a few
> > good arguments.  It comes with full documentation and tests.
> >
> > There have been a few complaints about the code, notably from people who
> > don't like the use of metadata in the bus messages.  That is actually
> > one of the main features here, as we can get this data in a secure and
> > reliable way, and it's something that userspace requires today.  So
> > while it does look "odd" to people who are not familiar with dbus, this
> > is something that finally fixes a number of almost unfixable races in
> > the current dbus implementations.
> 
> While I generally like the concept of having a better in-kernel IPC
> mechanism, after some consideration I don't think this belongs in the
> kernel in its current form.  Here's why.
> 
> First, the naming is counterintuitive.  There are "endpoints", but you
> don't send messages to endpoints.  In fact, a basic kdbus setup will
> have exactly one endpoint AFAICT.  Wtf?  This makes talking about it
> awkward.

Did you read the documentation?  We've been over this before, and it
should all be addressed in the documentation based on this coming up.

> A lot of the design seems to violate the concept of "mechanism,
> not policy".  Kdbus is very much a port of userspace dbus to the
> kernel, and it appears to be a port designed to preserve some
> questionable design decisions instead of learning from them.
> 
> For example, kdbus sticks a whole policy database in the kernel, but
> that policy database (AFAICT -- holy crap it's overcomplicated) is
> *not* a simple set of rules like "if A then allow B".  Instead it has
> really weird dependencies not on what name you're sending to but on
> what *other* names the thing you're sending to has.  Sorry, but this
> way lies (a) the inability for a large set of developers to understand
> what's going on and (b) security bugs.  Also, the result probably
> can't be reused as part of a non-legacy-filled sensible design.

What policy database?  Matching messages to subscribers?  That's the
same type of "database" that other ipc subsystems need/want, there's
nothing radical here.

And lots of things have changed from userspace, based on a decade of
knowledge of how dbus works, and how dbus itself was implemented.  The
design, and code, has been reviewed by those developers.  Where issues
were raised, they were fixed.

Yes, dbus is "odd", but it serves a real need, and does so quite well,
and now kdbus is the next evolution of that system, fixing and
addressing the issues learned from implementing and designing dbus and
previous versions of this type of IPC (CORBA, DCOM, COM, etc.)

> Kdbus claims to be very fast.  Unfortunately, requests for a broad set
> of benchmarks have mostly been ignored, my attempts to benchmark it
> (admittedly I didn't try that hard) gave results several times worse than
> the published figures, and, most tellingly, *no one* has claimed that
> kdbus is faster than AF_UNIX.  In fact, everyone seems to acknowledge
> that kdbus is several times slower than AF_UNIX.

It does more than AF_UNIX, so of course it's going to be slower.  But
you can't do all the things you need to do with dbus with just AF_UNIX,
it's a different model.  Again, the documentation should explain this.

And the benchmarks and source were posted by David previously, with full
details, this is the first time I've heard you could not reproduce them
using that code.

> The metadata thing is problematic.  It seems to be intended to serve
> two purposes: data gathering for logging and authentication.
> Unfortunately, it has issues.  There are no fewer than *three*
> metadata capture points: creation of a bus, connection to a bus, and
> sending of a message.  The kdbus authors like to point out that these
> are all optional, but IMO that's bunk.  Someone will write a userspace
> library that rejects messages from people who don't enable all of
> them, and then we're screwed.

Remember, you asked for it to be optional, it wasn't in the beginning :)

So let's make it not optional, great.  And the capture points are in
different places as it is different data and entry points.

> Why are we screwed?  Because any kdbus client *won't know which
> metadata matters*.  That means that we automatically have the worst of
> all worlds, not the best.  Also, the bus creation metadata is
> completely worthless for anything other than logging, but someone will
> use it for something other than logging, at which point it's
> vulnerable to a DoS.  No one has explained to my satisfaction why this
> isn't a problem.

I don't think the creation data is worthless, I'm pretty sure the
SELinux people are using it to validate things, but I could be wrong.
Others on the cc: know more about that than I do and can provide
details.

> Also, the metadata code captures things that are, in my book,
> completely unacceptable, such as cmdline and (!) capabilities.  I bet
> that the cmdline capture is extra special fscked up when cgroups and
> such are in play because *it reads from the sender's VM*.  IOW it's
> insecure and pointless.  (OK, it has a point: logging.  But I really
> don't think that belongs in the kernel.)

The sender's vm is what is wanted here.  And cmdline is something that
userspace gets today, and does things with, as does SELinux, and
auditing.  Same for capabilities, it's not insecure and pointless, it's
the same thing that is provided to userspace, and userspace makes
decisions on today, independent of kdbus/dbus.

thanks,

greg k-h

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-13 20:45   ` Greg Kroah-Hartman
@ 2015-04-13 21:01     ` Andy Lutomirski
  2015-04-14 17:50       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 316+ messages in thread
From: Andy Lutomirski @ 2015-04-13 21:01 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Mon, Apr 13, 2015 at 1:45 PM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Mon, Apr 13, 2015 at 01:13:26PM -0700, Andy Lutomirski wrote:
>> On Mon, Apr 13, 2015 at 12:03 PM, Greg Kroah-Hartman
>> <gregkh@linuxfoundation.org> wrote:
>> > The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
>> >
>> >   Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
>> >
>> > are available in the git repository at:
>> >
>> >   git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
>> >
>> > for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
>> >
>> >   kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
>> >
>> > ----------------------------------------------------------------
>> > kdbus for 4.1-rc1
>> >
>> > Here's the kdbus pull request for 4.1-rc1.
>> >
>> > It's been under development for many years now, and been in linux-next
>> > for many months, and has undergone loads of testing and review and even a few
>> > good arguments.  It comes with full documentation and tests.
>> >
>> > There have been a few complaints about the code, notably from people who
>> > don't like the use of metadata in the bus messages.  That is actually
>> > one of the main features here, as we can get this data in a secure and
>> > reliable way, and it's something that userspace requires today.  So
>> > while it does look "odd" to people who are not familiar with dbus, this
>> > is something that finally fixes a number of almost unfixable races in
>> > the current dbus implementations.
>>
>> While I generally like the concept of having a better in-kernel IPC
>> mechanism, after some consideration I don't think this belongs in the
>> kernel in its current form.  Here's why.
>>
>> First, the naming is counterintuitive.  There are "endpoints", but you
>> don't send messages to endpoints.  In fact, a basic kdbus setup will
>> have exactly one endpoint AFAICT.  Wtf?  This makes talking about it
>> awkward.
>
> Did you read the documentation?  We've been over this before, and it
> should all be addressed in the documentation based on this coming up.
>
>> A lot of the design seems to violate the concept of "mechanism,
>> not policy".  Kdbus is very much a port of userspace dbus to the
>> kernel, and it appears to be a port designed to preserve some
>> questionable design decisions instead of learning from them.
>>
>> For example, kdbus sticks a whole policy database in the kernel, but
>> that policy database (AFAICT -- holy crap it's overcomplicated) is
>> *not* a simple set of rules like "if A then allow B".  Instead it has
>> really weird dependencies not on what name you're sending to but on
>> what *other* names the thing you're sending to has.  Sorry, but this
>> way lies (a) the inability for a large set of developers to understand
>> what's going on and (b) security bugs.  Also, the result probably
>> can't be reused as part of a non-legacy-filled sensible design.
>
> What policy database?  Matching messages to subscribers?  That's the
> same type of "database" that other ipc subsystems need/want, there's
> nothing radical here.

Let me quote from the latest version of the kdbus docs:

      Note that TALK access is checked against all names of a connection. For
      example, if a connection owns both 'org.foo.bar' and 'org.blah.baz', and
      the policy database allows 'org.blah.baz' to be talked to by WORLD, then
      this permission is also granted to 'org.foo.bar'. That might sound
      illogical, but after all, we allow messages to be directed to either the
      ID or a well-known name, and policy is applied to the connection, not the
      name. In other words, the effective TALK policy for a connection is the
      most permissive of all names the connection owns.

In my humble opinion, this paragraph speaks for itself.  The design is
bad, full stop.
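
The rule in the quoted paragraph can be sketched in a few lines. This is a
toy model, not kdbus code: the policy table, the `may_talk` helper and the
sender names are all made up for illustration, and only the "most permissive
of all owned names" behaviour is taken from the documentation text.

```python
# Toy policy database: well-known name -> set of senders allowed to TALK.
# "WORLD" stands for "anyone", as in the quoted documentation.
policy = {
    "org.foo.bar": {"alice"},
    "org.blah.baz": {"WORLD"},
}

def may_talk(sender: str, owned_names: list[str]) -> bool:
    """Return True if sender may talk to a connection owning owned_names.

    The check runs against every name the connection owns, so the
    effective policy is the most permissive of all matching entries.
    """
    for name in owned_names:
        allowed = policy.get(name, set())
        if "WORLD" in allowed or sender in allowed:
            return True
    return False

# A connection owning only org.foo.bar rejects mallory...
print(may_talk("mallory", ["org.foo.bar"]))
# ...but once it also owns org.blah.baz, mallory gets through, even if
# the message is addressed to org.foo.bar.
print(may_talk("mallory", ["org.foo.bar", "org.blah.baz"]))
```

In this model, acquiring one world-accessible name opens up every other name
on the same connection, which is the dependency on *other* names described
earlier in the thread.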

[...]

> And the benchmarks and source were posted by David previously, with full
> details, this is the first time I've heard you could not reproduce them
> using that code.

No it's not.  But I got bored and didn't try again.

>
>> The metadata thing is problematic.  It seems to be intended to serve
>> two purposes: data gathering for logging and authentication.
>> Unfortunately, it has issues.  There are no fewer than *three*
>> metadata capture points: creation of a bus, connection to a bus, and
>> sending of a message.  The kdbus authors like to point out that these
>> are all optional, but IMO that's bunk.  Someone will write a userspace
>> library that rejects messages from people who don't enable all of
>> them, and then we're screwed.
>
> Remember, you asked for it to be optional, it wasn't in the beginning :)
>
> So let's make it not optional, great.  And the capture points are in
> different places as it is different data and entry points.

Then I'll have to find a way to embolden my NACK further.  My point is
that capturing garbage like cmdline and capabilities (again, that
latter part is completely unacceptable under any circumstances
whatsoever) on behalf of *all* senders is a disaster.  If it's
optional, then I can at least hope that userspace will honor the
optionality and let everything turn it off.  If it's mandatory, then
kdbus is just unsafe to use to send messages to untrusted parties.

>
>> Why are we screwed?  Because any kdbus client *won't know which
>> metadata matters*.  That means that we automatically have the worst of
>> all worlds, not the best.  Also, the bus creation metadata is
>> completely worthless for anything other than logging, but someone will
>> use it for something other than logging, at which point it's
>> vulnerable to a DoS.  No one has explained to my satisfaction why this
>> isn't a problem.
>
> I don't think the creation data is worthless, I'm pretty sure the
> SELinux people are using it to validate things, but I could be wrong.
> Others on the cc: know more about that than I do and can provide
> details.

Does that code even exist in public form yet?

>
>> Also, the metadata code captures things that are, in my book
>> completely unacceptable, such as cmdline and (!) capabilities.  I bet
>> that the cmdline capture is extra special fscked up when cgroups and
>> such are in play because *it reads from the sender's VM*.  IOW it's
>> insecure and pointless.  (OK, it has a point: logging.  But I really
>> don't think that belongs in the kernel.)
>
> The sender's vm is what is wanted here.  And cmdline is something that
> userspace gets today, and does things with, as does SELinux, and
> auditing.  Same for capabilities, it's not insecure and pointless, it's
> the same thing that is provided to userspace, and userspace makes
> decisions on today, independent of kdbus/dbus.

Is there anything that userspace makes decisions on based on
capabilities?  If so, please tell me and I'll entertain myself by
writing exploits for them.

The fact that some existing userspace does awful things does *not*
justify adding new kernel mechanisms with which to repeat those
mistakes.

--Andy

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-13 19:29 ` Eric W. Biederman
  2015-04-13 19:42   ` Greg Kroah-Hartman
@ 2015-04-14  0:19   ` Eric W. Biederman
  2015-04-14  0:34     ` Andy Lutomirski
  2015-04-14 17:55     ` Greg Kroah-Hartman
  2015-04-22  8:58   ` [GIT PULL] kdbus for 4.1-rc1 Borislav Petkov
  2 siblings, 2 replies; 316+ messages in thread
From: Eric W. Biederman @ 2015-04-14  0:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, gnomes, teg,
	jkosina, luto, linux-kernel, daniel, dh.herrmann, tixxdz

ebiederm@xmission.com (Eric W. Biederman) writes:

> Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes:
>
>> The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
>>
>>   Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
>>
>> are available in the git repository at:
>>
>>   git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
>>
>> for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
>>
>>   kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
>>
>> ----------------------------------------------------------------
>> kdbus for 4.1-rc1
>>
>> Here's the kdbus pull request for 4.1-rc1.
>>
>> It's been under development for many years now, and been in linux-next
>> for many months, and has undergone loads of testing and review and even a few
>> good arguments.  It comes with full documentation and tests.
>
>> There have been a few complaints about the code, notably from people who
>> don't like the use of metadata in the bus messages.  That is actually
>> one of the main features here, as we can get this data in a secure and
>> reliable way, and it's something that userspace requires today.  So
>> while it does look "odd" to people who are not familiar with dbus, this
>> is something that finally fixes a number of almost unfixable races in
>> the current dbus implementations.
>
> And the code that transfers the meta-data is wrong.

In fact it is worse than I thought.

With a userspace application able to give meaning to any of the bits of
meta-data that are passed (capabilities, cgroup, security labels, etc.),
in the fullness of time dropping any of them will grant you more
permissions somewhere.

Which means that it becomes impossible to change anything.  Impossible
to jail anything.  It in fact becomes impossible to do anything right.

Which means the ultimate result of the direction kdbus is going is a
world where nothing can be done without introducing a security issue or
breaking userspace.

So as far as I can tell kdbus has a fundamental design flaw.

My apologies for being the bearer of bad news.

Eric


^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-14  0:19   ` Eric W. Biederman
@ 2015-04-14  0:34     ` Andy Lutomirski
  2015-04-14 17:55     ` Greg Kroah-Hartman
  1 sibling, 0 replies; 316+ messages in thread
From: Andy Lutomirski @ 2015-04-14  0:34 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg Kroah-Hartman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Mon, Apr 13, 2015 at 5:19 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> ebiederm@xmission.com (Eric W. Biederman) writes:
>
>> Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes:
>>
>>> The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
>>>
>>>   Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
>>>
>>> are available in the git repository at:
>>>
>>>   git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
>>>
>>> for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
>>>
>>>   kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
>>>
>>> ----------------------------------------------------------------
>>> kdbus for 4.1-rc1
>>>
>>> Here's the kdbus pull request for 4.1-rc1.
>>>
>>> It's been under development for many years now, and been in linux-next
>>> for many months, and has undergone loads of testing and review and even a few
>>> good arguments.  It comes with full documentation and tests.
>>
>>> There have been a few complaints about the code, notably from people who
>>> don't like the use of metadata in the bus messages.  That is actually
>>> one of the main features here, as we can get this data in a secure and
>>> reliable way, and it's something that userspace requires today.  So
>>> while it does look "odd" to people who are not familiar with dbus, this
>>> is something that finally fixes a number of almost unfixable races in
>>> the current dbus implementations.
>>
>> And the code that transfers the meta-data is wrong.
>
> In fact it is worse than I thought.
>
> With a userspace application able to give meaning to any of the bits of
> meta-data that are passed (capabilities, cgroup, security labels, etc.),
> in the fullness of time dropping any of them will grant you more
> permissions somewhere.
>
> Which means that it becomes impossible to change anything.  Impossible
> to jail anything.  It in fact becomes impossible to do anything right.
>
> Which means the ultimate result of the direction kdbus is going is a
> world where nothing can be done without introducing a security issue or
> breaking userspace.
>
> So as far as I can tell kdbus has a fundamental design flaw.
>
> My apologies for being the bearer of bad news.
>

I agree here.  I cannot overstate the degree to which passing caps
around through metadata is a bad idea.

LSM labels are probably nearly as bad.  Having LSM hooks in kdbus is
one thing, but passing the *raw labels* around and letting userspace
muck with them will cause the policy situation to be incomprehensible.

User code should get simple yes/no answers from LSM policy, not raw data.

--Andy

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-13 21:01     ` Andy Lutomirski
@ 2015-04-14 17:50       ` Greg Kroah-Hartman
  2015-04-14 18:57         ` Andy Lutomirski
  2015-04-14 22:33         ` Jiri Kosina
  0 siblings, 2 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-14 17:50 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Mon, Apr 13, 2015 at 02:01:21PM -0700, Andy Lutomirski wrote:
> On Mon, Apr 13, 2015 at 1:45 PM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > On Mon, Apr 13, 2015 at 01:13:26PM -0700, Andy Lutomirski wrote:
> >> On Mon, Apr 13, 2015 at 12:03 PM, Greg Kroah-Hartman
> >> <gregkh@linuxfoundation.org> wrote:
> >> > The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
> >> >
> >> >   Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
> >> >
> >> > are available in the git repository at:
> >> >
> >> >   git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
> >> >
> >> > for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
> >> >
> >> >   kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
> >> >
> >> > ----------------------------------------------------------------
> >> > kdbus for 4.1-rc1
> >> >
> >> > Here's the kdbus pull request for 4.1-rc1.
> >> >
> >> > It's been under development for many years now, and been in linux-next
> >> > for many months, and has undergone loads of testing and review and even a few
> >> > good arguments.  It comes with full documentation and tests.
> >> >
> >> > There have been a few complaints about the code, notably from people who
> >> > don't like the use of metadata in the bus messages.  That is actually
> >> > one of the main features here, as we can get this data in a secure and
> >> > reliable way, and it's something that userspace requires today.  So
> >> > while it does look "odd" to people who are not familiar with dbus, this
> >> > is something that finally fixes a number of almost unfixable races in
> >> > the current dbus implementations.
> >>
> >> While I generally like the concept of having a better in-kernel IPC
> >> mechanism, after some consideration I don't think this belongs in the
> >> kernel in its current form.  Here's why.
> >>
> >> First, the naming is counterintuitive.  There are "endpoints", but you
> >> don't send messages to endpoints.  In fact, a basic kdbus setup will
> >> have exactly one endpoint AFAICT.  Wtf?  This makes talking about it
> >> awkward.
> >
> > Did you read the documentation?  We've been over this before, and it
> > should all be addressed in the documentation based on this coming up.
> >
> >> A lot of the design seems to be to violate the concept of "mechanism,
> >> not policy".  Kdbus is very much a port of userspace dbus to the
> >> kernel, and it appears to be a port designed to preserve some
> >> questionable design decisions instead of learning from them.
> >>
> >> For example, kdbus sticks a whole policy database in the kernel, but
> >> that policy database (AFAICT -- holy crap it's overcomplicated) is
> >> *not* a simple set of rules like "if A then allow B".  Instead it has
> >> really weird dependencies not on what name you're sending to but on
> >> what *other* names the thing you're sending to has.  Sorry, but this
> >> way lies (a) the inability for a large set of developers to understand
> >> what's going on and (b) security bugs.  Also, the result probably
> >> can't be reused as part of a non-legacy-filled sensible design
> >
> > What policy database?  Matching messages to subscribers?  That's the
> > same type of "database" that other ipc subsystems need/want, there's
> > nothing radical here.
> 
> Let me quote from the latest version of the kdbus docs:
> 
>       Note that TALK access is checked against all names of a connection. For
>       example, if a connection owns both <constant>'org.foo.bar'</constant> and
>       <constant>'org.blah.baz'</constant>, and the policy database allows
>       <constant>'org.blah.baz'</constant> to be talked to by WORLD, then this
>       permission is also granted to <constant>'org.foo.bar'</constant>. That
>       might sound illogical, but after all, we allow messages to be directed to
>       either the ID or a well-known name, and policy is applied to the
>       connection, not the name. In other words, the effective TALK policy for a
>       connection is the most permissive of all names the connection owns.
> 
> In my humble opinion, this paragraph speaks for itself.  The design is
> bad, full stop.

First off, thanks for reading the docs, I appreciate that.  But realize
also, that this is straight from the D-Bus spec.  We aren't doing
anything "radical" here, this is what your desktop uses that you are
typing your email from.

Yes, it's an unfortunate design, but one that we are all stuck with
(think of it as having to implement code for horrid hardware that you
have to get to work properly.)  There are many applications out there
which don't address messages to their well-known name destination but
to the ID which they looked up earlier and cached. In fact, that
behavior is the default in the gdbus library implementation.

If a connection owns two names, and one is more permissive than the
other, an attacker could just as well choose the more openly configured
name to get a message delivered.  That's nothing we can really protect
against.  So ideally you never do that, just like you shouldn't do that
in a network configuration with DNS, if you want to manage access
properly.

The logic here is comparable to IP vs. DNS:
 - A host may have multiple DNS names assigned, just like a service may
   be the owner of multiple well-known names
 - Clients can talk to a service using its unique ID (uint64_t) or its
   well-known name.
 - Clients can as well look up the ID of a well-known name and address
   messages to it directly
 - Hence, we cannot make decisions based on the well-known name that has
   been used to send the message
 - Instead, we have to fall back to the logic described in the docs
 - Firewall rules are applied to IPs, _not_ DNS names!
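
To make the rule above concrete, here is a minimal sketch in Python of
how a "most permissive name wins" check could be modeled.  It is an
illustration only, not kdbus code: the Connection class, the policy_db
dictionary and may_talk() are hypothetical names, and the policy
database is reduced to a map from well-known name to the set of users
allowed to talk to it (with "WORLD" meaning everyone).

```python
# Illustrative model only; none of these names come from kdbus itself.
from dataclasses import dataclass, field

WORLD = "WORLD"  # hypothetical marker meaning "any sender"


@dataclass
class Connection:
    unique_id: int                            # the numeric ID clients may cache
    names: set = field(default_factory=set)   # well-known names it owns


def may_talk(policy_db, sender, dest):
    """Return True if `sender` may talk to connection `dest`.

    Policy is applied to the connection, not to the name the sender
    used, so access granted on any owned name applies to all of them.
    """
    for name in dest.names:
        allowed = policy_db.get(name, set())
        if WORLD in allowed or sender in allowed:
            return True
    return False


policy_db = {
    "org.foo.bar": {"alice"},   # restricted name
    "org.blah.baz": {WORLD},    # open to everyone
}
service = Connection(unique_id=42, names={"org.foo.bar", "org.blah.baz"})
restricted = Connection(unique_id=43, names={"org.foo.bar"})
```

In this model an arbitrary sender can reach `service` because it also
owns the open name, while `restricted`, which owns only 'org.foo.bar',
stays closed to everyone but "alice".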

D-Bus is a specification that has been out there for over a decade, and
we are not designing anything new here, but rather implementing it as
designed.  We have to be compatible with the existing users of the D-Bus
system, and don't have the luxury of being able to change core things
like this and expect the world to be able to change just because the
design is not as clean as it should/could be.

Again, just like getting horrid hardware to work properly, sometimes we
have to write odd code.  Or having to implement a network protocol that
doesn't seem to be designed "perfectly", yet is used by a few hundred
million systems so we have to remain compatible.  This is all that we
are doing here for stuff like this.

Remember, this is called kDBUS, not kGENERICIPC, no matter how much we
would have liked that to happen from a kernel standpoint. :)

> > And the benchmarks and source were posted by David previously, with full
> > details, this is the first time I've heard you could not reproduce them
> > using that code.
> 
> No it's not.  But I got bored and didn't try again.

Sorry, I was not aware of that.

> >> The metadata thing is problematic.  It seems to be intended to serve
> >> two purposes: data gathering for logging and authentication.

You forgot about introspection, more on that below.

> >> Unfortunately, it has issues.  There are no fewer than *three*
> >> metadata capture points: creation of a bus, connection to a bus, and
> >> sending of a message.  The kdbus authors like to point out that these
> >> are all optional, but IMO that's bunk.  Someone will write a userspace
> >> library that rejects messages from people who don't enable all of
> >> them, and then we're screwed.
> >
> > Remember, you asked for it to be optional, it wasn't in the beginning :)
> >
> > So let's make it not optional, great.  And the capture points are in
> > different places as it is different data and entry points.
> 
> Then I'll have to find a way to embolden my NACK further.  My point is
> that capturing garbage like cmdline and capabilities (again, that
> latter part is completely unacceptable under any circumstances
> whatsoever) on behalf of *all* senders is a disaster.  If it's
> optional, then I can at least hope that userspace will honor the
> optionality and let everything turn it off.  If it's mandatory, then
> kdbus is just unsafe to use to send messages to untrusted parties.

It's opted in by the receiving peer if the task implementing a service
wants to access these pieces of information.  It is optional, and the
documentation clearly states that userspace should cope with this, and
also, when they are available we make sure to provide the correct
race-free information.

As said many times before, an application can do so already today with
information from other API file systems, so why is this suddenly a
problem when kdbus optionally offers the exact same information along
with each transmitted message?  Yes, we all "hate" capabilities, but
userspace uses them, and gets access to them all the time through the
POSIX apis (capget(), cap_get_pid(), capgetp(), etc.) and through
/proc/pid/status.  They are something that we have to support and handle
properly.
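
For reference, this is roughly what the userspace lookup through /proc
looks like today.  It is a small sketch in Python, assuming a Linux
/proc/<pid>/status that carries a hexadecimal "CapEff:" line.  The
helper names are made up for illustration, and the bit numbers are the
ones from linux/capability.h (CAP_NET_ADMIN is 12, CAP_SYS_BOOT is 22).

```python
CAP_NET_ADMIN = 12  # bit numbers from include/uapi/linux/capability.h
CAP_SYS_BOOT = 22


def parse_cap_eff(status_text):
    """Extract the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap(mask, cap):
    """Test one capability bit in a mask returned by parse_cap_eff()."""
    return bool(mask & (1 << cap))


def read_cap_eff(pid):
    # Racy by nature: the process may exec, drop privileges, or exit (and
    # the pid may be reused) between sending a message and this read.
    with open(f"/proc/{pid}/status") as f:
        return parse_cap_eff(f.read())


# A hand-written sample, so the parsing can be tried without /proc.
sample = "Name:\tdemo\nCapInh:\t0000000000000000\nCapEff:\t0000000000401000\n"
mask = parse_cap_eff(sample)
```

The sample mask has exactly bits 12 and 22 set, so has_cap() reports
both capabilities for it and nothing else.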

In the very first submission of kdbus, we stated that we want to allow
userspace methods to access these same bits to be able to make decisions
about permissions.  And to do so in a race-free manner, which is very
hard, if not almost impossible, to do from userspace alone.

For instance, if a task has CAP_NET_ADMIN set, we can use that
information in order to allow or disallow certain actions to be taken by
a privileged process.  Or, if a client that has the capability to call
reboot (i.e. has CAP_SYS_BOOT) makes the D-Bus call to reboot the
system, the system daemon listening for that message knows that yes, at
the time that the client made that call, it really did have that
capability so it is ok to actually reboot the system.

Instead of trying to use SCM_CREDENTIALS to get the pid and another
round of cap_get_pid() and the like, all of which are susceptible to
racing and all sorts of other horrors and are therefore insecure, we can
provide this information in an atomic and secure way.
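
The race being described can be modeled in a few lines.  The sketch
below is a pure simulation in Python with invented names (Task,
send_with_pid_only, send_with_snapshot); it does not call any real
kernel interface.  It only contrasts looking a sender up by pid after
the fact with recording the sender's capabilities at send time.

```python
class Task:
    """Toy stand-in for a process: a pid plus a mutable capability set."""

    def __init__(self, pid, caps):
        self.pid = pid
        self.caps = set(caps)


def send_with_pid_only(task):
    # Credentials-only style: the receiver learns just who sent the message.
    return {"pid": task.pid}


def send_with_snapshot(task):
    # Snapshot style: the capability set is copied at send time.
    return {"pid": task.pid, "caps": frozenset(task.caps)}


table = {}  # pid -> Task, standing in for a later lookup by pid

sender = Task(100, {"CAP_NET_ADMIN"})
table[sender.pid] = sender
msg_a = send_with_pid_only(sender)
msg_b = send_with_snapshot(sender)

# Before the receiver gets around to checking, the sender changes state.
sender.caps.clear()

late_lookup = "CAP_NET_ADMIN" in table[msg_a["pid"]].caps  # sees the new state
at_send_time = "CAP_NET_ADMIN" in msg_b["caps"]            # sees the sent state
```

The late lookup answers a question about the sender's current state,
while the snapshot answers the question the receiver actually asked:
what the sender held when it sent the message.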

The kernel today, and userspace, relies on capabilities all the time
(i.e. almost every syscall), how are they something that is somehow not
valid to use and support?


And of course, as Eric will point out, capabilities are not
translatable across user namespaces, which is a problem.  Because of
this, we dispose of that piece of metadata information when a message
crosses a user namespace boundary.  This is the right thing to do, which
is not the case for almost all other kernel APIs, which report bogus
capabilities when user namespaces are crossed.

So we implemented this correctly, and somehow that is a feature so bad
that both you and Eric think the whole baby should be thrown out?  How
else should this be implemented?

As for the command line information, yes, it is "unsafe", and we clearly
state that in the documentation.  However, it is still a very valid
piece of information.  For example, when a service is activated by a
method, getting to know which binary caused that to happen is very
useful when debugging.  It's also very useful when debugging multi-call
binaries because the command line actually tells you argv[0] correctly.

That's why lots of userspace tools use the command line information
today.  Again, providing that information is a help to them; why
wouldn't we provide that help when we have access to it?

Metadata attachment has always been optional, based on the setting of
the receiving peer, but we have added, at your request, the ability to
globally limit what kdbus is able to transport for that metadata,
regardless of the settings on both sides.  It sounds like this option
isn't liked, and I'll be glad to revert it as I do think the metadata is
useful and wanted.
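
Put together, the rules described in this message amount to a small
filter.  The following Python sketch is one reading of them, not the
actual kdbus implementation: the Meta flag names, GLOBAL_MASK and
metadata_to_attach() are hypothetical, the per-side settings are
assumed to be simple bitmasks, and user namespaces are represented by
plain strings.

```python
from enum import Flag


class Meta(Flag):
    """Hypothetical metadata items that could accompany a message."""
    CREDS = 1
    CMDLINE = 2
    CAPS = 4
    SECLABEL = 8


ALL_ITEMS = Meta.CREDS | Meta.CMDLINE | Meta.CAPS | Meta.SECLABEL

# Assumed system-wide limit: the command line is never transported.
GLOBAL_MASK = Meta.CREDS | Meta.CAPS | Meta.SECLABEL


def metadata_to_attach(receiver_wants, sender_allows,
                       sender_userns, receiver_userns,
                       global_mask=GLOBAL_MASK):
    """Return the metadata items that may accompany one message."""
    # Opt-in by the receiver, limited by the sender, capped globally.
    items = receiver_wants & sender_allows & global_mask
    if sender_userns != receiver_userns:
        # Capabilities do not translate across user namespaces: drop them.
        items &= ~Meta.CAPS
    return items


same_ns = metadata_to_attach(Meta.CAPS | Meta.CMDLINE, ALL_ITEMS,
                             "init", "init")
cross_ns = metadata_to_attach(Meta.CAPS | Meta.CREDS, ALL_ITEMS,
                              "init", "container")
```

With these inputs the command line is filtered out by the global mask
in the first call, and the capabilities are dropped in the second call
because the two peers live in different user namespaces.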

> >> Why are we screwed?  Because any kdbus client *won't know which
> >> metadata matters*.  That means that we automatically have the worst of
> >> all worlds, not the best.  Also, the bus creation metadata is
> >> completely worthless for anything other than logging, but someone will
> >> use it for something other than logging, at which point it's
> >> vulnerable to a DoS.  No one has explained to my satisfaction why this
> >> isn't a problem.


Metadata is gathered for logging, authentication and introspection.  Bus
creator metadata is not used for logging or authentication, but for
introspection only. It could be really useful for a service that has a
bus handle to actually know which bus it is connected to, but it's not
supposed to be used as an authentication measure.  So I was wrong to think
that the SELinux people use it, sorry about that.

Remember there are three different places that metadata is collected,
for three different things.  Yeah, we call them all "metadata", which is
probably why the confusion here, but these all are different "things"
entirely, and the documentation does describe this really well.  If not,
please let me know and I will work on it to make it more clear.

The important point here is that you cannot look at this concept without
keeping the dbus spec in scope. Nobody is supposed to write native kdbus
clients directly. You can, of course, but the entire concept of how
services are implemented follows a higher-level logic which is supposed
to be implemented by high-level libraries.

Yes, this isn't the best argument for why you might feel more
comfortable about merging this code, as we kernel developers are used
to stand-alone APIs that we can use without library helpers, but it is
common and needed.  But really, when was the last time you wrote an ALSA
library from scratch?  :)

Again, remember the compatibility requirements for your userspace D-Bus
clients today, we have to ensure this, or this code is pointless.

A word about introspection.  In talking about this with Daniel on IRC
today, he came up with this good example to explain it better to me, as
I didn't quite understand it well.  I'll paraphrase it here, keeping
with the "bus" metaphor that D-Bus requires:

	Imagine we're all taking a little tour, out to the nature, a
	lake or something.  We're taking a bus to get there.  The bus can
	accommodate a large number of people, and we don't know yet who
	will join.  Everybody who enters the bus has to show their
	passport to the conductor (refrain from calling it driver,
	because hell no, it clearly isn't a driver!!  ;)).

	The conductor makes a copy of the passport of each of the people
	entering the bus, because it wants to know who's on the bus.  One property of
	that strange bunch of programmers on the bus is that they don't
	necessarily respond to anything, but whenever anyone in the bus
	talks to another person, they show their passport in order to
	identify themselves, because you know you can't trust anyone.

	Next, the police stops the bus and wants to know who's on it.
	As the programmers usually don't respond when being spoken to,
	especially if it is the police, the bus conductor hands out a
	list of all the passport copies he gathered.  That is called
	introspection that is not backed by cooperative bus members.
	The conductor makes a copy of the passport of each of the people
	entering the bus, to help the police (i.e. debuggers) determine
	who is on the bus.

	It is a property of the bus itself which describes which
	personal data you have to give to the conductor in order to be
	allowed in.  If you're not willing to give out all the bits the
	bus requires you to, you have to stay out.  That's not a problem
	of the system, but rather something to discuss with the owner of
	the bus.  This way, it is totally possible to have a bus that
	does not require anything from its passengers, and passengers
	that do not allow any personal information to be revealed, but
	then the police can't do much of course when it stops the bus in
	order to introspect it.

	Then there is a set of global laws in that world in which all
	the buses live. These laws define which data is allowed to be
	passed around at all in general.  When a bus requires its
	passengers to reveal their hair color, for instance, but passing
	that information around is forbidden by global law, this
	requirement is ignored when buses are created or anyone enters
	any of those buses.

	And to complete the story and outline the differences of the
	passports that were used to make a copy from and the one that is
	used during communication, we'd have to add a story about people
	changing their hair color constantly in the washroom on the back
	of the bus, out of sight of the conductor, but this metaphor is
	getting quite long enough already...

Does that help explain introspection and the need for it here?


> >> Also, the metadata code captures things that are, in my book
> >> completely unacceptable, such as cmdline and (!) capabilities.  I bet
> >> that the cmdline capture is extra special fscked up when cgroups and
> >> such are in play because *it reads from the sender's VM*.  IOW it's
> >> insecure and pointless.  (OK, it has a point: logging.  But I really
> >> don't think that belongs in the kernel.)
> >
> > The sender's vm is what is wanted here.  And cmdline is something that
> > userspace gets today, and does things with, as does SELinux, and
> > auditing.  Same for capabilities, it's not insecure and pointless, it's
> > the same thing that is provided to userspace, and userspace makes
> > decisions on today, independent of kdbus/dbus.
> 
> Is there anything that userspace makes decisions on based on
> capabilities?  If so, please tell me and I'll entertain myself by
> writing exploits for them.
> 
> The fact that some existing userspace does awful things does *not*
> justify adding new kernel mechanisms with which to repeat those
> mistakes.

polkit used to do something like this, but the obvious race conditions
that you know about prevented it from working properly, so other odd
work-arounds had to be created.  However, if we can provide this in a
race free manner, those work-arounds are no longer needed.

As documented in the original email on this thread, Tizen wants to use
this, as it solves a real need that they have.  Their workarounds
involve using custom UDS sockets, but the latency involved is horrid and
unacceptable.  Using a kdbus message solves this issue for them,
allowing UI rendering to work properly/quickly.

Again, capabilities are something we all require and rely on today,
passing the current capability on to a recipient isn't a way to raise
privileges at all, but rather, properly determine if they are present
at sending time, if wanted.  How does that create an insecure system?
What am I missing that is so bad here with the design we have?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-14  0:19   ` Eric W. Biederman
  2015-04-14  0:34     ` Andy Lutomirski
@ 2015-04-14 17:55     ` Greg Kroah-Hartman
  2015-04-21 21:06       ` Issues with capability bits and meta-data in kdbus Eric W. Biederman
  1 sibling, 1 reply; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-14 17:55 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, gnomes, teg,
	jkosina, luto, linux-kernel, daniel, dh.herrmann, tixxdz

On Mon, Apr 13, 2015 at 07:19:49PM -0500, Eric W. Biederman wrote:
> ebiederm@xmission.com (Eric W. Biederman) writes:
> 
> > Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes:
> >
> >> The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
> >>
> >>   Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
> >>
> >> are available in the git repository at:
> >>
> >>   git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
> >>
> >> for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
> >>
> >>   kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
> >>
> >> ----------------------------------------------------------------
> >> kdbus for 4.1-rc1
> >>
> >> Here's the kdbus pull request for 4.1-rc1.
> >>
> >> It's been under development for many years now, and been in linux-next
> >> for many months, and has undergone loads of testing and review and even a few
> >> good arguments.  It comes with full documentation and tests.
> >
> >> There have been a few complaints about the code, notably from people who
> >> don't like the use of metadata in the bus messages.  That is actually
> >> one of the main features here, as we can get this data in a secure and
> >> reliable way, and it's something that userspace requires today.  So
> >> while it does look "odd" to people who are not familiar with dbus, this
> >> is something that finally fixes a number of almost unfixable races in
> >> the current dbus implementations.
> >
> > And the code that transfers the meta-data is wrong.
> 
> In fact it is worse than I thought.

Please see the email response I just wrote to Andy about this, it should
address these misconceptions.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-14 17:50       ` Greg Kroah-Hartman
@ 2015-04-14 18:57         ` Andy Lutomirski
  2015-04-14 19:23           ` Greg Kroah-Hartman
  2015-04-15 12:00           ` Greg Kroah-Hartman
  2015-04-14 22:33         ` Jiri Kosina
  1 sibling, 2 replies; 316+ messages in thread
From: Andy Lutomirski @ 2015-04-14 18:57 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Tue, Apr 14, 2015 at 10:50 AM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Mon, Apr 13, 2015 at 02:01:21PM -0700, Andy Lutomirski wrote:
>> On Mon, Apr 13, 2015 at 1:45 PM, Greg Kroah-Hartman
>> <gregkh@linuxfoundation.org> wrote:
>> > On Mon, Apr 13, 2015 at 01:13:26PM -0700, Andy Lutomirski wrote:
>> >> On Mon, Apr 13, 2015 at 12:03 PM, Greg Kroah-Hartman
>> >> <gregkh@linuxfoundation.org> wrote:
>> >> > The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
>> >> >
>> >> >   Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
>> >> >
>> >> > are available in the git repository at:
>> >> >
>> >> >   git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
>> >> >
>> >> > for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
>> >> >
>> >> >   kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
>> >> >
>> >> > ----------------------------------------------------------------
>> >> > kdbus for 4.1-rc1
>> >> >
>> >> > Here's the kdbus pull request for 4.1-rc1.
>> >> >
>> >> > It's been under development for many years now, and been in linux-next
>> >> > for many months, and has undergone loads of testing and review and even a few
>> >> > good arguments.  It comes with full documentation and tests.
>> >> >
>> >> > There have been a few complaints about the code, notably from people who
>> >> > don't like the use of metadata in the bus messages.  That is actually
>> >> > one of the main features here, as we can get this data in a secure and
>> >> > reliable way, and it's something that userspace requires today.  So
>> >> > while it does look "odd" to people who are not familiar with dbus, this
>> >> > is something that finally fixes a number of almost unfixable races in
>> >> > the current dbus implementations.
>> >>
>> >> While I generally like the concept of having a better in-kernel IPC
>> >> mechanism, after some consideration I don't think this belongs in the
>> >> kernel in its current form.  Here's why.
>> >>
>> >> First, the naming is counterintuitive.  There are "endpoints", but you
>> >> don't send messages to endpoints.  In fact, a basic kdbus setup will
>> >> have exactly one endpoint AFAICT.  Wtf?  This makes talking about it
>> >> awkward.
>> >
>> > Did you read the documentation?  We've been over this before, and it
>> > should all be addressed in the documentation based on this coming up.
>> >
>> >> A lot of the design seems to be to violate the concept of "mechanism,
>> >> not policy".  Kdbus is very much a port of userspace dbus to the
>> >> kernel, and it appears to be a port designed to preserve some
>> >> questionable design decisions instead of learning from them.
>> >>
>> >> For example, kdbus sticks a whole policy database in the kernel, but
>> >> that policy database (AFAICT -- holy crap it's overcomplicated) is
>> >> *not* a simple set of rules like "if A then allow B".  Instead it has
>> >> really weird dependencies not on what name you're sending to but on
>> >> what *other* names the thing you're sending to has.  Sorry, but this
>> >> way lies (a) the inability for a large set of developers to understand
>> >> what's going on and (b) security bugs.  Also, the result probably
>> >> can't be reused as part of a non-legacy-filled sensible design
>> >
>> > What policy database?  Matching messages to subscribers?  That's the
>> > same type of "database" that other ipc subsystems need/want, there's
>> > nothing radical here.
>>
>> Let me quote from the latest version of the kdbus docs:
>>
>>       Note that TALK access is checked against all names of a connection. For
>>       example, if a connection owns both <constant>'org.foo.bar'</constant> and
>>       <constant>'org.blah.baz'</constant>, and the policy database allows
>>       <constant>'org.blah.baz'</constant> to be talked to by WORLD, then this
>>       permission is also granted to <constant>'org.foo.bar'</constant>. That
>>       might sound illogical, but after all, we allow messages to be directed to
>>       either the ID or a well-known name, and policy is applied to the
>>       connection, not the name. In other words, the effective TALK policy for a
>>       connection is the most permissive of all names the connection owns.
>>
>> In my humble opinion, this paragraph speaks for itself.  The design is
>> bad, full stop.
>
> First off, thanks for reading the docs, I appreciate that.  But realize
> also, that this is straight from the D-Bus spec.  We aren't doing
> anything "radical" here, this is what your desktop uses that you are
> typing your email from.
>
> Yes, it's an unfortunate design, but one that we are all stuck with
> (think of it as having to implement code for horrid hardware that you
> have to get to work properly.)

I agree.  You've sent a pull request for an unfortunate design.  I
don't think that unfortunate design belongs in the kernel.  If it stays
in userspace, then user programmers could potentially fix it some day.

>  There are many applications out there
> which don't address messages to their well-known name destination but
> to the ID which they looked up earlier and cached. In fact, that
> behavior is the default in the gdbus library implementation.
>
> If a connection owns two names, and one is more permissive than the
> other one, an attacker could as well choose the more openly configured
> name to get a message delivered.  That's nothing we can protect from
> really.  So ideally you never do that, just like you shouldn't do that
> in a network configuration with DNS, if you want to manage access
> properly.
>
> The logic here is comparable to IP vs. DNS

[snip some]

It's comparable to someone trying to write a firewall that filters on
DNS names.  There's a good reason that people don't do that.
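
For readers following the argument, here is a minimal sketch of the
TALK rule quoted from the kdbus docs earlier in this message.  It is a
toy Python model, not kdbus code: the names Connection, may_talk and
WORLD, and the dict-based policy table, are assumptions made up for
illustration.  It only shows why a connection that owns two names ends
up with the more permissive of the two policies.

```python
# Toy model of "policy is applied to the connection, not the name".
# Everything here is hypothetical; it does not mirror kdbus data structures.
from dataclasses import dataclass, field

WORLD = "WORLD"  # stand-in for "any sender may talk"


@dataclass
class Connection:
    conn_id: int
    names: set = field(default_factory=set)  # well-known names owned


def may_talk(policy, sender_uid, dest):
    """Allow the message if ANY name owned by dest grants TALK to the sender."""
    for name in dest.names:
        allowed = policy.get(name, set())
        if WORLD in allowed or sender_uid in allowed:
            return True
    return False


policy = {
    "org.foo.bar": {0},        # only uid 0 may talk to this name
    "org.blah.baz": {WORLD},   # anyone may talk to this name
}

both = Connection(conn_id=1, names={"org.foo.bar", "org.blah.baz"})
strict = Connection(conn_id=2, names={"org.foo.bar"})

print(may_talk(policy, 1000, both))    # uid 1000 reaches org.foo.bar's owner
print(may_talk(policy, 1000, strict))  # same name, owned alone: denied
```

In this model the answer for 'org.foo.bar' depends on which other names
its owner happens to hold, which is the dependency being objected to.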

[snip]


>>
>> Then I'll have to find a way to embolden my NACK further.  My point is
>> that capturing garbage like cmdline and capabilities (again, that
>> latter part is completely unacceptable under any circumstances
>> whatsoever) on behalf of *all* senders is a disaster.  If it's
>> optional, then I can at least hope that userspace will honor the
>> optionality and let everything turn it off.  If it's mandatory, then
>> kdbus is just unsafe to use to send messages to untrusted parties.
>
> It's opted in by the receiving peer if the task implementing a service
> wants to access these pieces of information.  It is optional, and the
> documentation clearly states that userspace should cope with this, and
> also, when they are available we make sure to provide the correct
> race-free information.
>
> As said many times before, an application can do so already today with
> information from other API file systems, so why is this suddenly a
> problem when kdbus optionally offers the exact same information along
> with each transmitted message?  Yes, we all "hate" capabilities, but
> userspace uses them, and gets access to them all the time through the
> POSIX apis (capget(), cap_get_pid(), capgetp(), etc.) and through
> /proc/pid/status.  They are something that we have to support and handle
> properly.
>
> In the very first submission of kdbus, we stated that we want to allow
> userspace methods to access these same bits to be able to make decisions
> about permissions.  And to do so in a race-free manner, which is very
> hard, if not almost impossible, to do from userspace alone.
>
> For instance, if a task has CAP_NET_ADMIN set, we can use that
> information in order to allow or disallow certain actions to be taken by
> a privileged process.  Or, if a client that has the capability to call
> reboot (i.e. has CAP_SYS_BOOT) makes the D-Bus call to reboot the
> system, the system daemon listening for that message knows that yes, at
> the time that the client made that call, it really did have that
> capability so it is ok to actually reboot the system.
>
> Instead of trying to use SCM_CREDENTIALS to get the pid and another
> round of cap_get_pid() and the like, all of which are susceptible to
> racing and all sorts of other horrors, that are insecure, we can provide
> this information in an atomic, and secure way.
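
The two-step userspace lookup described in the quoted paragraph above
can be sketched roughly as follows.  This is an illustration under
assumptions, not code from dbus or kdbus: the helper names (peer_pid,
effective_caps, has_cap, peer_has_cap) are made up, SO_PEERCRED is used
as a simpler stand-in for SCM_CREDENTIALS, and the capability mask is
read from the CapEff line of /proc/<pid>/status instead of calling
cap_get_pid().  The comment marks where the race window sits.

```python
# Illustrative sketch of "get the peer's PID, then look up its capabilities".
# Linux-only at call time; helper names are hypothetical.
import socket
import struct

CAP_SYS_BOOT = 22  # capability number from <linux/capability.h>


def peer_pid(sock):
    """Step 1: ask the kernel for the peer's credentials on a Unix socket."""
    creds = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                            struct.calcsize("3i"))
    pid, _uid, _gid = struct.unpack("3i", creds)
    return pid


def effective_caps(status_text):
    """Parse the CapEff hex bitmask out of /proc/<pid>/status contents."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap(mask, cap):
    """Test a single capability bit in the mask."""
    return bool((mask >> cap) & 1)


def peer_has_cap(sock, cap):
    pid = peer_pid(sock)
    # Race window: between step 1 and step 2 the peer may exit, exec another
    # binary, or have its PID reused by an unrelated process.
    with open(f"/proc/{pid}/status") as f:  # Step 2: separate lookup by PID
        return has_cap(effective_caps(f.read()), cap)


# Parsing demo on canned text, so it runs without a live peer.
sample = "Name:\tdemo\nCapEff:\t0000000000400000\n"
print(has_cap(effective_caps(sample), CAP_SYS_BOOT))
```

Whether a service should act on such a capability bit at all is the
separate question argued in the rest of this message.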

/me suppresses a long string of expletives.

Please point me at the code that does this with caps.  It's WRONG in
userspace and it's WRONG in the kernel.  I want to know what code that
runs on my system does this so I can send the appropriate bug reports
and get it fixed.  I think the RHEL crowd at least will take it
seriously when I tell them that this is a security hole.

>
> The kernel today, and userspace, relies on capabilities all the time
> (i.e. almost every syscall), how are they something that is somehow not
> valid to use and support?

No.  The *kernel* relies on caps.  Userspace should not.

>
>
> And of course, as Eric will point out, capabilities are not
> translatable across user namespaces, which is a problem.  Because of
> this, we dispose of that piece of metadata information when a message
> crosses a user namespace boundary.  This is the right thing to do, which
> is not the case for almost all other kernel APIs, which report bogus
> capabilities when user namespaces are crossed.

The right thing to do is to not use capabilities for userspace stuff.

>
> So we implemented this correctly, and somehow that is a feature so bad
> that both you and Eric think the whole baby should be thrown out?  How
> else should this be implemented?

It shouldn't be implemented.

>
> As documented in the original email on this thread, Tizen wants to use
> this, as it solves a real need that they have.  Their workarounds
> involve using custom UDS sockets, but the latency involved is horrid and
> unacceptable.  Using a kdbus message solves this issue for them,
> allowing UI rendering to work properly/quickly.
>
> Again, capabilities are something we all require and rely on today,
> passing the current capability on to a recipient isn't a way to raise
> privileges at all, but rather, properly determine if they are present
> at sending time, if wanted.  How does that create an insecure system?
> What am I missing that is so bad here with the design we have?

That, even if the implementation could be made to be useful and
correct, capabilities refer to privileges wrt the kernel, not
userspace.  They're not the right bit of policy to look at here.

For example, the thing that should make it possible to run 'systemctl
reboot' or whatever is not CAP_SYS_BOOT, because CAP_SYS_BOOT is the
permission to hard reboot the system immediately, and that's not what
'systemctl reboot' is for.

I find myself comparing kdbus to win32k, and that's not a good sign...


--Andy


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-14 18:57         ` Andy Lutomirski
@ 2015-04-14 19:23           ` Greg Kroah-Hartman
  2015-04-14 19:24             ` Borislav Petkov
                               ` (2 more replies)
  2015-04-15 12:00           ` Greg Kroah-Hartman
  1 sibling, 3 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-14 19:23 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Tue, Apr 14, 2015 at 11:57:22AM -0700, Andy Lutomirski wrote:
> On Tue, Apr 14, 2015 at 10:50 AM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > On Mon, Apr 13, 2015 at 02:01:21PM -0700, Andy Lutomirski wrote:
> >> On Mon, Apr 13, 2015 at 1:45 PM, Greg Kroah-Hartman
> >> <gregkh@linuxfoundation.org> wrote:
> >> > On Mon, Apr 13, 2015 at 01:13:26PM -0700, Andy Lutomirski wrote:
> >> >> On Mon, Apr 13, 2015 at 12:03 PM, Greg Kroah-Hartman
> >> >> <gregkh@linuxfoundation.org> wrote:
> >> >> > The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
> >> >> >
> >> >> >   Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
> >> >> >
> >> >> > are available in the git repository at:
> >> >> >
> >> >> >   git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
> >> >> >
> >> >> > for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
> >> >> >
> >> >> >   kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
> >> >> >
> >> >> > ----------------------------------------------------------------
> >> >> > kdbus for 4.1-rc1
> >> >> >
> >> >> > Here's the kdbus pull request for 4.1-rc1.
> >> >> >
> >> >> > It's been under development for many years now, and been in linux-next
> >> >> > for many months, and has undergone loads of testing and review and even a few
> >> >> > good arguments.  It comes with full documentation and tests.
> >> >> >
> >> >> > There have been a few complaints about the code, notably from people who
> >> >> > don't like the use of metadata in the bus messages.  That is actually
> >> >> > one of the main features here, as we can get this data in a secure and
> >> >> > reliable way, and it's something that userspace requires today.  So
> >> >> > while it does look "odd" to people who are not familiar with dbus, this
> >> >> > is something that finally fixes a number of almost unfixable races in
> >> >> > the current dbus implementations.
> >> >>
> >> >> While I generally like the concept of having a better in-kernel IPC
> >> >> mechanism, after some consideration I don't think this belongs in the
> >> >> kernel in its current form.  Here's why.
> >> >>
> >> >> First, the naming is counterintuitive.  There are "endpoints", but you
> >> >> don't send messages to endpoints.  In fact, a basic kdbus setup will
> >> >> have exactly one endpoint AFAICT.  Wtf?  This makes talking about it
> >> >> awkward.
> >> >
> >> > Did you read the documentation?  We've been over this before, and it
> >> > should all be addressed in the documentation based on this coming up.
> >> >
> >> >> A lot of the design seems to violate the concept of "mechanism,
> >> >> not policy".  Kdbus is very much a port of userspace dbus to the
> >> >> kernel, and it appears to be a port designed to preserve some
> >> >> questionable design decisions instead of learning from them.
> >> >>
> >> >> For example, kdbus sticks a whole policy database in the kernel, but
> >> >> that policy database (AFAICT -- holy crap it's overcomplicated) is
> >> >> *not* a simple set of rules like "if A then allow B".  Instead it has
> >> >> really weird dependencies not on what name you're sending to but on
> >> >> what *other* names the thing you're sending to has.  Sorry, but this
> >> >> way lies (a) the inability for a large set of developers to understand
> >> >> what's going on and (b) security bugs.  Also, the result probably
> >> >> can't be reused as part of a non-legacy-filled sensible design
> >> >
> >> > What policy database?  Matching messages to subscribers?  That's the
> >> > same type of "database" that other ipc subsystems need/want, there's
> >> > nothing radical here.
> >>
> >> Let me quote from the latest version of the kdbus docs:
> >>
> >>       Note that TALK access is checked against all names of a connection. For
> >>       example, if a connection owns both <constant>'org.foo.bar'</constant> and
> >>       <constant>'org.blah.baz'</constant>, and the policy database allows
> >>       <constant>'org.blah.baz'</constant> to be talked to by WORLD, then this
> >>       permission is also granted to <constant>'org.foo.bar'</constant>. That
> >>       might sound illogical, but after all, we allow messages to be directed to
> >>       either the ID or a well-known name, and policy is applied to the
> >>       connection, not the name. In other words, the effective TALK policy for a
> >>       connection is the most permissive of all names the connection owns.
> >>
> >> In my humble opinion, this paragraph speaks for itself.  The design is
> >> bad, full stop.
> >
> > First off, thanks for reading the docs, I appreciate that.  But realize
> > also, that this is straight from the D-Bus spec.  We aren't doing
> > anything "radical" here, this is what your desktop uses that you are
> > typing your email from.
> >
> > Yes, it's an unfortunate design, but one that we are all stuck with
> > (think of it as having to implement code for horrid hardware that you
> > have to get to work properly.)
> 
> I agree.  You've sent a pull request for an unfortunate design.  I
> don't think that unfortunate design belongs in the kernel.  If it stays
> in userspace, then user programmers could potentially fix it some day.

You might not like the design, but it is a valid design.  Again, we
don't refuse to support hardware that is designed badly.  Or support
protocols we don't necessarily like, that's not the job of a kernel or
operating system.

And here's Havoc's response as to why actually, this is a good design:
	http://lists.freedesktop.org/archives/dbus/2015-April/016651.html

so while we might not think it's nice, maybe we are just not that
knowledgeable in this design space, and need to trust those that are.

I know I do.

I'll respond to the rest after I get some dinner...

thanks,

greg k-h


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-14 19:23           ` Greg Kroah-Hartman
@ 2015-04-14 19:24             ` Borislav Petkov
  2015-04-14 19:32               ` Greg Kroah-Hartman
  2015-04-14 19:35             ` Al Viro
  2015-04-14 20:14             ` John Stoffel
  2 siblings, 1 reply; 316+ messages in thread
From: Borislav Petkov @ 2015-04-14 19:24 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote:
> You might not like the design, but it is a valid design.  Again, we
> don't refuse to support hardware that is designed badly.

Yeah except the small difference that unlike this, we can't change
hardware.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-14 19:24             ` Borislav Petkov
@ 2015-04-14 19:32               ` Greg Kroah-Hartman
  2015-04-14 19:40                 ` Al Viro
  0 siblings, 1 reply; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-14 19:32 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Tue, Apr 14, 2015 at 09:24:29PM +0200, Borislav Petkov wrote:
> On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote:
> > You might not like the design, but it is a valid design.  Again, we
> > don't refuse to support hardware that is designed badly.
> 
> Yeah except the small difference that unlike this, we can't change
> hardware.

And we can't change the design/implementation of many things, again,
it's not the kernel's job to prevent something, just because we don't
like the RFC, from being accepted.

Go read Havoc's email about why the design is the way it is that I just
posted.  Maybe we are the ones that really don't know the issues
involved enough to say that the current design is somehow "wrong".

thanks,

greg k-h


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-14 19:23           ` Greg Kroah-Hartman
  2015-04-14 19:24             ` Borislav Petkov
@ 2015-04-14 19:35             ` Al Viro
  2015-04-14 19:43               ` Greg Kroah-Hartman
  2015-04-14 20:14             ` John Stoffel
  2 siblings, 1 reply; 316+ messages in thread
From: Al Viro @ 2015-04-14 19:35 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote:

> > I agree.  You've sent a pull request for an unfortunate design.  I
> > don't think that unfortunate design belongs in the kernel.  If it stays
> > in userspace, then user programmers could potentially fix it some day.
> 
> You might not like the design, but it is a valid design.  Again, we
> don't refuse to support hardware that is designed badly.  Or support
> protocols we don't necessarily like, that's not the job of a kernel or
> operating system.

Bullshit.  The problem you seem to deliberately ignore is that once it's
in the kernel, it's impossible to eradicate.  It's not just a crap design,
it's a crap design you are taking in as-is.

And no, "the sole consumer of that API knows better, so bend over" is not
a good idea.  We have shitloads of examples when single-consumer APIs
turned into screaming horrors; taking that in over the objections to API
design, merely on "they do it that way, who the hell we are to say they
are wrong?" is insane.


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-14 19:32               ` Greg Kroah-Hartman
@ 2015-04-14 19:40                 ` Al Viro
  2015-04-14 19:48                   ` Greg Kroah-Hartman
  0 siblings, 1 reply; 316+ messages in thread
From: Al Viro @ 2015-04-14 19:40 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Borislav Petkov, Andy Lutomirski, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
	Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Tue, Apr 14, 2015 at 09:32:29PM +0200, Greg Kroah-Hartman wrote:
> On Tue, Apr 14, 2015 at 09:24:29PM +0200, Borislav Petkov wrote:
> > On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote:
> > > You might not like the design, but it is a valid design.  Again, we
> > > don't refuse to support hardware that is designed badly.
> > 
> > Yeah except the small difference that unlike this, we can't change
> > hardware.
> 
> And we can't change the design/implementation of many things, again,
> it's not the kernel's job to prevent something, just because we don't
> like the RFC, from being accepted.

Translate, please.  What exactly will be prevented by NAK on your Fine
Piece Of Software?  Not dbus working as it does, surely?


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-14 19:35             ` Al Viro
@ 2015-04-14 19:43               ` Greg Kroah-Hartman
  2015-04-15 17:59                 ` Austin S Hemmelgarn
  0 siblings, 1 reply; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-14 19:43 UTC (permalink / raw)
  To: Al Viro
  Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Tue, Apr 14, 2015 at 08:35:33PM +0100, Al Viro wrote:
> On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote:
> 
> > > I agree.  You've sent a pull request for an unfortunate design.  I
> > > don't think that unfortunate design belongs in the kernel.  If it stays
> > > in userspace, then user programmers could potentially fix it some day.
> > 
> > You might not like the design, but it is a valid design.  Again, we
> > don't refuse to support hardware that is designed badly.  Or support
> > protocols we don't necessarily like, that's not the job of a kernel or
> > operating system.
> 
> Bullshit.  The problem you seem to deliberately ignore is that once it's
> in the kernel, it's impossible to eradicate.  It's not just a crap design,
> it's a crap design you are taking in as-is.

It is not a crap design.  Go read the link I provided.  Havoc points out
exactly why the design is the way it is, for very valid reasons.  It's
actually much like X11 is as well, but not like "normal" IP connections
at all.

> And no, "the sole consumer of that API knows better, so bend over" is not
> a good idea.  We have shitloads of examples when single-consumer APIs
> turned into screaming horrors; taking that in over the objections to API
> design, merely on "they do it that way, who the hell we are to say they
> are wrong?" is insane.

Again, in this domain, the design is sound.  So much so that everyone
who works in that area moved toward it (KDE, Qt, Go, etc.)  We might not
think it makes sense, and it did take me a while to wrap my head around
it, but to call it "crap" is unfair, sorry.

greg k-h



* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-14 19:40                 ` Al Viro
@ 2015-04-14 19:48                   ` Greg Kroah-Hartman
  2015-04-14 19:53                     ` Borislav Petkov
                                       ` (2 more replies)
  0 siblings, 3 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-14 19:48 UTC (permalink / raw)
  To: Al Viro
  Cc: Borislav Petkov, Andy Lutomirski, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
	Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Tue, Apr 14, 2015 at 08:40:04PM +0100, Al Viro wrote:
> On Tue, Apr 14, 2015 at 09:32:29PM +0200, Greg Kroah-Hartman wrote:
> > On Tue, Apr 14, 2015 at 09:24:29PM +0200, Borislav Petkov wrote:
> > > On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote:
> > > > You might not like the design, but it is a valid design.  Again, we
> > > > don't refuse to support hardware that is designed badly.
> > > 
> > > Yeah except the small difference that unlike this, we can't change
> > > hardware.
> > 
> > And we can't change the design/implementation of many things, again,
> > it's not the kernel's job to prevent something, just because we don't
> > like the RFC, from being accepted.
> 
> Translate, please.  What exactly will be prevented by NAK on your Fine
> Piece Of Software?  Not dbus working as it does, surely?

I don't understand.  You can not like the D-Bus model (and accordingly
the X11 model), but to prevent users from wanting to use it in a more
secure, and faster way by implementing it like we have seems very odd to
me.

It's not going to stop anything from working, it's just going to stop
some programs from being able to do things they really want to do (see
the first email for examples.)

Yes, we could make this live outside the kernel tree, but that's not the
way we work anymore.  We merge things that are useful, that match our
security and coding requirements, and are going to be maintained by
people we trust.  To have the only major objection be "we don't like the
way the protocol is designed because we know better, sorry", isn't ok at
all.

greg k-h


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-14 19:48                   ` Greg Kroah-Hartman
@ 2015-04-14 19:53                     ` Borislav Petkov
  2015-04-15  8:44                       ` Greg Kroah-Hartman
  2015-04-14 20:11                     ` Martin Steigerwald
  2015-04-14 22:39                     ` Jiri Kosina
  2 siblings, 1 reply; 316+ messages in thread
From: Borislav Petkov @ 2015-04-14 19:53 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Al Viro, Andy Lutomirski, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
	Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Tue, Apr 14, 2015 at 09:48:04PM +0200, Greg Kroah-Hartman wrote:
> It's not going to stop anything from working, it's just going to stop
> some programs from being able to do things they really want to do (see
> the first email for examples.)

Until it is made "mandatory" as Al said earlier.

> Yes, we could make this live outside the kernel tree, but that's not the
> way we work anymore.

> We merge things that are useful, that match our
> security and coding requirements, and are going to be maintained by
> people we trust.

We trust? I'm not going to even comment on that.

And frankly, merging a useful piece of code sounds completely different
to me than this serious backlash I'm reading from the sidelines.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-14 19:48                   ` Greg Kroah-Hartman
  2015-04-14 19:53                     ` Borislav Petkov
@ 2015-04-14 20:11                     ` Martin Steigerwald
  2015-04-14 22:39                     ` Jiri Kosina
  2 siblings, 0 replies; 316+ messages in thread
From: Martin Steigerwald @ 2015-04-14 20:11 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Al Viro, Borislav Petkov, Andy Lutomirski, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Tuesday, 14 April 2015, 21:48:04, Greg Kroah-Hartman wrote:
> On Tue, Apr 14, 2015 at 08:40:04PM +0100, Al Viro wrote:
> > On Tue, Apr 14, 2015 at 09:32:29PM +0200, Greg Kroah-Hartman wrote:
> > > On Tue, Apr 14, 2015 at 09:24:29PM +0200, Borislav Petkov wrote:
> > > > On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote:
> > > > > You might not like the design, but it is a valid design.  Again, we
> > > > > don't refuse to support hardware that is designed badly.
> > > > 
> > > > Yeah except the small difference that unlike this, we can't change
> > > > hardware.
> > > 
> > > And we can't change the design/implementation of many things, again,
> > > it's not the kernel's job to prevent something, just because we don't
> > > like the RFC, from being accepted.
> > 
> > Translate, please.  What exactly will be prevented by NAK on your Fine
> > Piece Of Software?  Not dbus working as it does, surely?
> 
> I don't understand.  You can not like the D-Bus model (and accordingly
> the X11 model), but to prevent users from wanting to use it in a more
> secure, and faster way by implementing it like we have seems very odd to
> me.
> 
> It's not going to stop anything from working, it's just going to stop
> some programs from being able to do things they really want to do (see
> the first email for examples.)
> 
> Yes, we could make this live outside the kernel tree, but that's not the
> way we work anymore.  We merge things that are useful, that match our
> security and coding requirements, and are going to be maintained by
> people we trust.  To have the only major objection be "we don't like
> the way the protocol is designed because we know better, sorry", isn't
> ok at all.

Greg, I think I understood Al here.

dbus as it is used in KDE, GNOME, network-manager, systemd, you name it 
does work. Not merging kdbus will not break it.

So the ones who want to see kdbus in the kernel want to do something
better than, or differently from, how it is currently done in dbus. And
yes, I have seen the presentations about the benefits of having dbus in
the kernel.

But if that's the case, what I think Al asks of a *new* kernel component
is a sound design that does not repeat any flaws from the original design,
since the original design is not hardware that cannot be changed anymore
after production.

And as to whether the design of kdbus is sound, there seem to be strongly
differing opinions. I think it is important to accept that and go from
there.

On the other hand, if you do things differently enough from the way
userspace dbus does them in order to have such a sound design, it may be
necessary to adapt all applications to it. But since kdbus is not yet in
the kernel officially, this would not violate the "we never ever break
userspace" rule, because the kernel obviously doesn't guarantee the
stability of the current userspace dbus API; it doesn't yet have such an
API at all. But if kdbus goes in, it has one, and then it needs to
guarantee it until this "never break userspace" rule is changed, *if*
ever.

And also: Even if the kernel API is different in order to be sound, it may 
be possible to adapt userspace dbus to use it to improve upon some of its 
current flaws so that applications using it do not need to be changed at 
all.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-14 19:23           ` Greg Kroah-Hartman
  2015-04-14 19:24             ` Borislav Petkov
  2015-04-14 19:35             ` Al Viro
@ 2015-04-14 20:14             ` John Stoffel
  2015-04-14 21:51               ` Steven Rostedt
  2015-04-15  8:35               ` Greg Kroah-Hartman
  2 siblings, 2 replies; 316+ messages in thread
From: John Stoffel @ 2015-04-14 20:14 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

>>>>> "Greg" == Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes:

Greg> On Tue, Apr 14, 2015 at 11:57:22AM -0700, Andy Lutomirski wrote:
>> On Tue, Apr 14, 2015 at 10:50 AM, Greg Kroah-Hartman
>> <gregkh@linuxfoundation.org> wrote:
>> > On Mon, Apr 13, 2015 at 02:01:21PM -0700, Andy Lutomirski wrote:
>> >> On Mon, Apr 13, 2015 at 1:45 PM, Greg Kroah-Hartman
>> >> <gregkh@linuxfoundation.org> wrote:
>> >> > On Mon, Apr 13, 2015 at 01:13:26PM -0700, Andy Lutomirski wrote:
>> >> >> On Mon, Apr 13, 2015 at 12:03 PM, Greg Kroah-Hartman
>> >> >> <gregkh@linuxfoundation.org> wrote:
>> >> >> > The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
>> >> >> >
>> >> >> >   Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
>> >> >> >
>> >> >> > are available in the git repository at:
>> >> >> >
>> >> >> >   git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
>> >> >> >
>> >> >> > for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
>> >> >> >
>> >> >> >   kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
>> >> >> >
>> >> >> > ----------------------------------------------------------------
>> >> >> > kdbus for 4.1-rc1
>> >> >> >
>> >> >> > Here's the kdbus pull request for 4.1-rc1.
>> >> >> >
>> >> >> > It's been under development for many years now, and been in linux-next
>> >> >> > for many months, and has undergone loads of testing a review and even a few
>> >> >> > good arguments.  It comes with full documentation and tests.
>> >> >> >
>> >> >> > There has been a few complaints about the code, notably from people who
>> >> >> > don't like the use of metadata in the bus messages.  That is actually
>> >> >> > one of the main features here, as we can get this data in a secure and
>> >> >> > reliable way, and it's something that userspace requires today.  So
>> >> >> > while it does look "odd" to people who are not familiar with dbus, this
>> >> >> > is something that finally fixes a number of almost unfixable races in
>> >> >> > the current dbus implementations.
>> >> >>
>> >> >> While I generally like the concept of having a better in-kernel IPC
>> >> >> mechanism, after some consideration I don't think this belongs in the
>> >> >> kernel in its current form.  Here's why.
>> >> >>
>> >> >> First, the naming is counterintuitive.  There are "endpoints", but you
>> >> >> don't send messages to endpoints.  In fact, an basic kdbus setup will
>> >> >> have exactly one endpoint AFAICT.  Wtf?  This makes talking about it
>> >> >> awkward.
>> >> >
>> >> > Did you read the documentation?  We've been over this before, and it
>> >> > should all be addressed in the documentation based on this coming up.
>> >> >
>> >> >> A lot of the design seems to be to violate the concept of "mechanism,
>> >> >> not policy".  Kdbus is very much a port of userspace dbus to the
>> >> >> kernel, and it appears to be a port designed to preserve some
>> >> >> questionable design decisions instead of learning from them.
>> >> >>
>> >> >> For example, kdbus sticks a whole policy database in the kernel, but
>> >> >> that policy database (AFAICT -- holy crap it's overcomplicated) is
>> >> >> *not* a simple set of rules like "if A then allow B".  Instead it has
>> >> >> really weird dependencies not on what name you're sending to but on
>> >> >> what *other* names the thing you're sending to has.  Sorry, but this
>> >> >> way lies (a) the inability for a large set of developers to understand
>> >> >> what's going on and (b) security bugs.  Also, the result probably
>> >> >> can't be reused as part of a non-legacy-filled sensible design
>> >> >
>> >> > What policy database?  Matching messages to subscribers?  That's the
>> >> > same type of "database" that other ipc subsystems need/want, there's
>> >> > nothing radical here.
>> >>
>> >> Let me quote from the latest version of the kdbus docs:
>> >>
>> >>       Note that TALK access is checked against all names of a connection. For
>> >>       example, if a connection owns both <constant>'org.foo.bar'</constant> and
>> >>       <constant>'org.blah.baz'</constant>, and the policy database allows
>> >>       <constant>'org.blah.baz'</constant> to be talked to by WORLD, then this
>> >>       permission is also granted to <constant>'org.foo.bar'</constant>. That
>> >>       might sound illogical, but after all, we allow messages to be directed to
>> >>       either the ID or a well-known name, and policy is applied to the
>> >>       connection, not the name. In other words, the effective TALK policy for a
>> >>       connection is the most permissive of all names the connection owns.
>> >>
>> >> In my humble opinion, this paragraph speaks for itself.  The design is
>> >> bad, full stop.
>> >
>> > First off, thanks for reading the docs, I appreciate that.  But realize
>> > also, that this is straight from the D-Bus spec.  We aren't doing
>> > anything "radical" here, this is what your desktop uses that you are
>> > typing your email from.
>> >
>> > Yes, it's an unfortunate design, but one that we are all stuck with
>> > (think of it as having to implement code for horrid hardware that you
>> > have to get to work properly.)
>> 
>> I agree.  You've sent a pull request for an unfortunate design.  I
>> don't think that unfortunate design belongs in the kernel.  If it stays
>> in userspace, then user programmers could potentially fix it some day.

Greg> You might not like the design, but it is a valid design.  Again, we
Greg> don't refuse to support hardware that is designed badly.  Or support
Greg> protocols we don't necessarily like, that's not the job of a kernel or
Greg> operating system.

Greg> And here's Havoc's response as to why actually, this is a good design:
Greg> 	http://lists.freedesktop.org/archives/dbus/2015-April/016651.html

This is an interesting discussion, and one thing that sticks out to me
is the comments in the URL above talking about how clients are
supposed to use a generic name to bind to a resource, but actually do
a lookup to get the specific name, and then bind to THAT.

So the security concerns raised by Andy do seem to make sense, in that
security needs to be the same across all names of a service, so
that you don't have problems with varying levels once people have
connected.  In terms of the X11 analogy, if I have someone connect,
and then I do 'xhost -' it removes all access.  It's not dependent on
whether I'm bound to a specific or general service.

So the security aspect really needs to be that the most restrictive
takes precedence, not the other way around.  
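
To make the difference concrete, here is a minimal sketch of the two
rules: the "most permissive" behaviour described in the quoted kdbus
docs, and the "most restrictive" alternative suggested here.  This is
only an illustrative model.  The policy table, the WORLD marker, the
sender names and all function names are assumptions made up for this
example; none of it is taken from the kdbus code or its API.

```python
# Hypothetical model of the TALK-policy rules discussed above.
# Nothing here is kdbus code; every name is invented for illustration.

WORLD = "*"  # stand-in for "any sender may talk"

# Per-name policy: which senders may TALK to each well-known name.
policy = {
    "org.foo.bar": {"alice"},
    "org.blah.baz": {WORLD},
}


def name_allows(name, sender):
    """True if the policy entry for this single name admits the sender."""
    allowed = policy.get(name, set())
    return WORLD in allowed or sender in allowed


def talk_most_permissive(owned_names, sender):
    """Rule from the quoted docs: ANY owned name that allows it is enough."""
    return any(name_allows(n, sender) for n in owned_names)


def talk_most_restrictive(owned_names, sender):
    """Alternative rule: EVERY owned name must allow the sender."""
    return bool(owned_names) and all(name_allows(n, sender) for n in owned_names)
```

With a connection owning both 'org.foo.bar' and 'org.blah.baz', an
arbitrary sender is let through by the first rule because of the WORLD
entry on 'org.blah.baz', and refused by the second because
'org.foo.bar' only lists "alice".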

And after having read a bunch of the docs, looked at the FAQ, etc.,
it's still no clearer to me what DBUS and KDBUS provide that's all so
important or critical.  Sure, it might be nice to have, but that's ok.

So I think those are the steps people need to take: give concrete examples
of how DBUS is better than anything else out there and won't cause
more problems down the line.

John

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-14 20:14             ` John Stoffel
@ 2015-04-14 21:51               ` Steven Rostedt
  2015-04-14 22:05                 ` Jiri Kosina
  2015-04-15  8:35               ` Greg Kroah-Hartman
  1 sibling, 1 reply; 316+ messages in thread
From: Steven Rostedt @ 2015-04-14 21:51 UTC (permalink / raw)
  To: John Stoffel
  Cc: Greg Kroah-Hartman, Andy Lutomirski, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni, Paul E. McKenney,
	James Bottomley

On Tue, Apr 14, 2015 at 04:14:34PM -0400, John Stoffel wrote:
> 
> So I think that's the steps people need to take, give concrete example
> of how DBUS is better than anything else out there and won't cause
> more problems down the line.

I believe that Linux Plumbers is still accepting MicroConferences. I wonder
if this would be a good one to have. Try to get everyone face to face and
talk about how exactly kdbus should be implemented in the kernel.

This doesn't look to me like it is going to be solved via electronic
communication. Looks like the old free beer at a convention where everyone
can give their drunken arguments may be quite productive.

Greg, you told me you'll be there. What about everyone else? Want to
write up a MicroConf:

  http://wiki.linuxplumbersconf.org/2015:topics

It's not that far off. Kdbus has waited this long, I'm sure it can wait
till August as well.

I'd really love to see this happen. I'll even supply the popcorn ;-)

-- Steve



^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-14 21:51               ` Steven Rostedt
@ 2015-04-14 22:05                 ` Jiri Kosina
  2015-04-15  6:56                   ` Borislav Petkov
  2015-04-15  8:37                   ` Greg Kroah-Hartman
  0 siblings, 2 replies; 316+ messages in thread
From: Jiri Kosina @ 2015-04-14 22:05 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: John Stoffel, Greg Kroah-Hartman, Andy Lutomirski,
	Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni, Paul E. McKenney,
	James Bottomley

On Tue, 14 Apr 2015, Steven Rostedt wrote:

> I believe that Linux Plumbers is still accepting MicroConferences. I 
> wonder if this would be a good one to have. Try to get everyone face to 
> face and talk about how exactly kdbus should be implemented in the 
> kernel.

I personally would even put more emphasis on a session that would first 
focus on "why", before we look at "how".

I have already asked about this during the earlier RFC submissions, but 
the only "take-home message" I took from that discussion was "because it's 
faster than what we currently have". I don't find that a sufficient 
justification by itself for something so complex (with potential 
implications all over the place for the whole Linux ecosystem), especially 
given the fact we already have sealed memfds, zerocopy, etc. (and I am not 
even talking about the "infinite set-in-stone userspace API" implications 
this has).

So definitely +1 from me for this discussion to happen, be it at either 
LPC (which I will unfortunately probably have to miss due to personal 
reasons this year) or KS. It might help people like me, who have trouble 
understanding why we need it, and for whom LKML discussions don't provide 
enough answers.

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-14 17:50       ` Greg Kroah-Hartman
  2015-04-14 18:57         ` Andy Lutomirski
@ 2015-04-14 22:33         ` Jiri Kosina
  2015-04-15  8:56           ` Greg Kroah-Hartman
  1 sibling, 1 reply; 316+ messages in thread
From: Jiri Kosina @ 2015-04-14 22:33 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni

On Tue, 14 Apr 2015, Greg Kroah-Hartman wrote:

> Yes, it's an unfortunate design, but one that we are all stuck with 
> (think of it as having to implement code for horrid hardware that you 
> have to get to work properly.)  

Greg, I personally consider this a rather defunct analogy. Broken hardware 
comes from "outer space"; we just have to live with it somehow, and 
eventually try to gradually improve it by working with vendors (and you 
yourself have of course made huge improvements in this very area).

Linux userspace is coming, well, from Linux developers. The sole fact that 
someone wrote a daemon that runs on Linux seems like a very poor 
justification for sucking the daemon into the kernel "because we have to 
live with it". 
Userspace has to live with it somehow (and eventually fix itself if 
necessary), yes. Why should the kernel just contribute to this "unfortunate 
design" if it really isn't, in any way, obliged or forced to do so?

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-14 19:48                   ` Greg Kroah-Hartman
  2015-04-14 19:53                     ` Borislav Petkov
  2015-04-14 20:11                     ` Martin Steigerwald
@ 2015-04-14 22:39                     ` Jiri Kosina
  2015-04-15  8:38                       ` Greg Kroah-Hartman
  2015-04-15 10:37                       ` One Thousand Gnomes
  2 siblings, 2 replies; 316+ messages in thread
From: Jiri Kosina @ 2015-04-14 22:39 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Al Viro, Borislav Petkov, Andy Lutomirski, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Tue, 14 Apr 2015, Greg Kroah-Hartman wrote:

> I don't understand.  You can not like the D-Bus model (and accordingly
> the X11 model), 

I thought that the general hatred level of the X11 "model" and the 
protocol led to all the efforts to reimplement this properly ... in 
userspace (for example Wayland, right?).

I don't think anyone was ever seriously suggesting "X11 model is broken, 
so let's push it to kernel" ... ?

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-13 20:22     ` Al Viro
  2015-04-13 20:37       ` Greg Kroah-Hartman
@ 2015-04-15  1:36       ` Andy Lutomirski
  2015-04-15  6:54         ` Richard Weinberger
                           ` (2 more replies)
  1 sibling, 3 replies; 316+ messages in thread
From: Andy Lutomirski @ 2015-04-15  1:36 UTC (permalink / raw)
  To: Al Viro
  Cc: Greg Kroah-Hartman, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Mon, Apr 13, 2015 at 1:22 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Mon, Apr 13, 2015 at 09:42:17PM +0200, Greg Kroah-Hartman wrote:
>> > I remain opposed to this half thought out trash of an ABI for the
>> > meta-data.
>>
>> You don't have to enable the metadata if you don't want to use it, it's
>> an option :)
>
> OK, _that_ argument needs to be stomped out.  It had been used before,
> and it was a deliberate scam.  There is no such thing as optional kernel
> interface, especially when udev/dbus/systemd crowd is nearby.  We'd been
> through that excuse before; remember how devtmpfs was pushed in as "optional"?
>
> This is a huge red flag.  On the level of "I need your account information
> to transfer $200M you might have inherited from my deceased client".
>
> Just to recap how it went the last time around: Kay kept pushing his piece of
> code into the tree, claiming that it was optional, that nobody who doesn't
> like it has to enable it, so what's the problem?  OK, in it went.  And pretty
> soon udev (maintained by the same... meticulously honorable person) had
> stopped working on the kernels that didn't have that enabled.
>
> We had been there before.  To paraphrase another... meticulously honorable
> person, "if you didn't want something relied upon, why have you put it into the
> kernel?" Said person is on the record as having no problem whatsoever with
> adding dependencies to the bottom of userland stack.

It appears that, if kdbus is merged, upstream udev may end up requiring it:

http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html

Grumble.

--Andy

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  1:36       ` Andy Lutomirski
@ 2015-04-15  6:54         ` Richard Weinberger
  2015-04-15  7:31           ` Mike Galbraith
  2015-04-15  8:48           ` Greg Kroah-Hartman
  2015-04-15  8:18         ` Martin Steigerwald
  2015-04-15  8:29         ` Greg Kroah-Hartman
  2 siblings, 2 replies; 316+ messages in thread
From: Richard Weinberger @ 2015-04-15  6:54 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Al Viro, Greg Kroah-Hartman, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Wed, Apr 15, 2015 at 3:36 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> We had been there before.  To paraphrase another... meticulously honorable
>> person, "if you didn't want something relied upon, why have you put it into the
>> kernel?" Said person is on the record as having no problem whatsoever with
>> adding dependencies to the bottom of userland stack.
>
> It appears that, if kdbus is merged, upstream udev may end up requiring it:
>
> http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html

Why so surprised?
kdbus will be a major hard-dependency for every non-trivial userland.
Like cgroups...

-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-14 22:05                 ` Jiri Kosina
@ 2015-04-15  6:56                   ` Borislav Petkov
  2015-04-15  8:37                   ` Greg Kroah-Hartman
  1 sibling, 0 replies; 316+ messages in thread
From: Borislav Petkov @ 2015-04-15  6:56 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Steven Rostedt, John Stoffel, Greg Kroah-Hartman,
	Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni,
	Paul E. McKenney, James Bottomley

On Wed, Apr 15, 2015 at 12:05:01AM +0200, Jiri Kosina wrote:
> So definitely +1 from me for this discussion to happen, being it
> either LPC (which I will unfortunately probably have to miss due to
> personal reasons this year) or KS. It might help people like me, who
> have trouble understanding why we need it, and LKML discussions don't
> provide enough answers for them.

Oh, and then please do a writeup so that people like me can read about
it and find out the answer to that same question.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  6:54         ` Richard Weinberger
@ 2015-04-15  7:31           ` Mike Galbraith
  2015-04-15 14:48             ` Michal Schmidt
  2015-04-15  8:48           ` Greg Kroah-Hartman
  1 sibling, 1 reply; 316+ messages in thread
From: Mike Galbraith @ 2015-04-15  7:31 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Andy Lutomirski, Al Viro, Greg Kroah-Hartman, Eric W. Biederman,
	Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Wed, 2015-04-15 at 08:54 +0200, Richard Weinberger wrote:
> On Wed, Apr 15, 2015 at 3:36 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> > > We had been there before.  To paraphrase another... meticulously honorable
> > > person, "if you didn't want something relied upon, why have you put it into the
> > > kernel?" Said person is on the record as having no problem whatsoever with
> > > adding dependencies to the bottom of userland stack.
> > 
> > It appears that, if kdbus is merged, upstream udev may end up requiring it:
> > 
> > http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html
> 
> Why so surprised?
> kdbus will be a major hard-dependency for every non-trivial userland.
> Like cgroups...

Heh, makes one wonder how we ever survived.

My openSUSE box is thoroughly infested with latest system-disease, and 
it seems the thing has now mandated group scheduling.  Whether you 
need/want it and its size large overhead or not is immaterial.  I'm 
not seeing an on/off switch anyway.

(shrug, axe should work as substitute, say "byebye tentacle").

        -Mike

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  1:36       ` Andy Lutomirski
  2015-04-15  6:54         ` Richard Weinberger
@ 2015-04-15  8:18         ` Martin Steigerwald
  2015-04-15  8:32           ` Greg Kroah-Hartman
  2015-04-15  8:29         ` Greg Kroah-Hartman
  2 siblings, 1 reply; 316+ messages in thread
From: Martin Steigerwald @ 2015-04-15  8:18 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Al Viro, Greg Kroah-Hartman, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

Am Dienstag, 14. April 2015, 18:36:28 schrieb Andy Lutomirski:
> On Mon, Apr 13, 2015 at 1:22 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > On Mon, Apr 13, 2015 at 09:42:17PM +0200, Greg Kroah-Hartman wrote:
> >> > I remain opposed to this half thought out trash of an ABI for the
> >> > meta-data.
> >> 
> >> You don't have to enable the metadata if you don't want to use it,
> >> it's an option :)
> > 
> > OK, _that_ argument needs to be stomped out.  It had been used before,
> > and it was a deliberate scam.  There is no such thing as optional
> > kernel interface, especially when udev/dbus/systemd crowd is nearby. 
> > We'd been through that excuse before; remember how devtmpfs was
> > pushed in as "optional"?
> > 
> > This is a huge red flag.  On the level of "I need your account
> > information to transfer $200M you might have inherited from my
> > deceased client".
> > 
> > Just to recap how it went the last time around: Kay kept pushing his
> > piece of code into the tree, claiming that it was optional, that
> > nobody who doesn't like it has to enable it, so what's the problem? 
> > OK, in it went.  And pretty soon udev (maintained by the same...
> > meticulously honorable person) had stopped working on the kernels
> > that didn't have that enabled.
> > 
> > We had been there before.  To paraphrase another... meticulously
> > honorable person, "if you didn't want something relied upon, why have
> > you put it into the kernel?" Said person is on the record as having
> > no problem whatsoever with adding dependencies to the bottom of
> > userland stack.
> 
> It appears that, if kdbus is merged, upstream udev may end up requiring
> it:
> 
> http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html
> 
> Grumble.

Honestly, I think that tightly coupling systemd and udev to certain kernel 
versions in lock step is crap.

That you require some minimum version after some reasonable time, sure. 
But in lockstep? Seriously.

I certainly do not want a broken system just because I have to load an older 
kernel version for some reason.

And yes, I think it's good not to force just about any userspace idea into 
the kernel.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  1:36       ` Andy Lutomirski
  2015-04-15  6:54         ` Richard Weinberger
  2015-04-15  8:18         ` Martin Steigerwald
@ 2015-04-15  8:29         ` Greg Kroah-Hartman
  2 siblings, 0 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15  8:29 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina,
	linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni

On Tue, Apr 14, 2015 at 06:36:28PM -0700, Andy Lutomirski wrote:
> On Mon, Apr 13, 2015 at 1:22 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > On Mon, Apr 13, 2015 at 09:42:17PM +0200, Greg Kroah-Hartman wrote:
> >> > I remain opposed to this half thought out trash of an ABI for the
> >> > meta-data.
> >>
> >> You don't have to enable the metadata if you don't want to use it, it's
> >> an option :)
> >
> > OK, _that_ argument needs to be stomped out.  It had been used before,
> > and it was a deliberate scam.  There is no such thing as optional kernel
> > interface, especially when udev/dbus/systemd crowd is nearby.  We'd been
> > through that excuse before; remember how devtmpfs was pushed in as "optional"?
> >
> > This is a huge red flag.  On the level of "I need your account information
> > to transfer $200M you might have inherited from my deceased client".
> >
> > Just to recap how it went the last time around: Kay kept pushing his piece of
> > code into the tree, claiming that it was optional, that nobody who doesn't
> > like it has to enable it, so what's the problem?  OK, in it went.  And pretty
> > soon udev (maintained by the same... meticulously honorable person) had
> > stopped working on the kernels that didn't have that enabled.
> >
> > We had been there before.  To paraphrase another... meticulously honorable
> > person, "if you didn't want something relied upon, why have you put it into the
> > kernel?" Said person is on the record as having no problem whatsoever with
> > adding dependencies to the bottom of userland stack.
> 
> It appears that, if kdbus is merged, upstream udev may end up requiring it:
> 
> http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html

Why would anyone propose a kernel api if they didn't actually plan to
use it?  Look at the first email in this thread, it shows the
people/projects that want to use this.  This is a crazy argument to try
to make, people: "stop using the feature that the kernel provides you!"

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  8:18         ` Martin Steigerwald
@ 2015-04-15  8:32           ` Greg Kroah-Hartman
  2015-04-15  8:52             ` Martin Steigerwald
  0 siblings, 1 reply; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15  8:32 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Wed, Apr 15, 2015 at 10:18:46AM +0200, Martin Steigerwald wrote:
> Am Dienstag, 14. April 2015, 18:36:28 schrieb Andy Lutomirski:
> > On Mon, Apr 13, 2015 at 1:22 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > On Mon, Apr 13, 2015 at 09:42:17PM +0200, Greg Kroah-Hartman wrote:
> > >> > I remain opposed to this half thought out trash of an ABI for the
> > >> > meta-data.
> > >> 
> > >> You don't have to enable the metadata if you don't want to use it,
> > >> it's an option :)
> > > 
> > > OK, _that_ argument needs to be stomped out.  It had been used before,
> > > and it was a deliberate scam.  There is no such thing as optional
> > > kernel interface, especially when udev/dbus/systemd crowd is nearby. 
> > > We'd been through that excuse before; remember how devtmpfs was
> > > pushed in as "optional"?
> > > 
> > > This is a huge red flag.  On the level of "I need your account
> > > information to transfer $200M you might have inherited from my
> > > deceased client".
> > > 
> > > Just to recap how it went the last time around: Kay kept pushing his
> > > piece of code into the tree, claiming that it was optional, that
> > > nobody who doesn't like it has to enable it, so what's the problem? 
> > > OK, in it went.  And pretty soon udev (maintained by the same...
> > > meticulously honorable person) had stopped working on the kernels
> > > that didn't have that enabled.
> > > 
> > > We had been there before.  To paraphrase another... meticulously
> > > honorable person, "if you didn't want something relied upon, why have
> > > you put it into the kernel?" Said person is on the record as having
> > > no problem whatsoever with adding dependencies to the bottom of
> > > userland stack.
> > 
> > It appears that, if kdbus is merged, upstream udev may end up requiring
> > it:
> > 
> > http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html
> > 
> > Grumble.
> 
> Honestly, I think that tightly coupling systemd and udev to certain kernel 
> versions in lock step is crap.

Where do you see that happening?

> That you require some minimum version after some reasonable time, sure. 
> But in lockstep? Seriously.

Has that happened in the past?  Look at the minimum requirements of
systemd/udev today, something like the 3.7 kernel release, many years
old.

> I certainly do not want a broken system just because I have to load an older 
> kernel version for some reason.

No one does.  But, work with your distribution if you end up with
something like this.  Remember, the goal is that you can always run
newer kernels on older userspace, as that is something that we kernel
developers can enforce.  Userspace programs have other requirements /
communities; it's up to them to decide the oldest kernel version
they wish to support.  Hint, even glibc makes these kinds of
requirements, it's nothing new at all here, so why is this even an
issue?
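
For illustration only, a userspace program could enforce such a floor
with a check along these lines.  This is a sketch, not how systemd,
udev or glibc actually do it: the MIN_KERNEL value just mirrors the
"something like 3.7" figure above, and the function names are
assumptions made up for the example.

```python
import os
import re

# Illustrative floor only, mirroring the "something like 3.7" remark above.
MIN_KERNEL = (3, 7)


def parse_release(release):
    """Return (major, minor) from a release string such as '4.1.0-rc1'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def kernel_is_supported(release=None, minimum=MIN_KERNEL):
    """Compare a kernel release (default: the running one) to the floor."""
    if release is None:
        release = os.uname().release  # e.g. '4.0.0-1-amd64'
    return parse_release(release) >= minimum
```

A floor like this only says "not older than X"; it does not tie
userspace to one specific kernel release.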

> And yes, I think its good not to force just about any userspace idea into 

> the kernel.

Do you have any technical objections to the patch as proposed?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-14 20:14             ` John Stoffel
  2015-04-14 21:51               ` Steven Rostedt
@ 2015-04-15  8:35               ` Greg Kroah-Hartman
  1 sibling, 0 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15  8:35 UTC (permalink / raw)
  To: John Stoffel
  Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Tue, Apr 14, 2015 at 04:14:34PM -0400, John Stoffel wrote:
> >>>>> "Greg" == Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes:
> 
> Greg> On Tue, Apr 14, 2015 at 11:57:22AM -0700, Andy Lutomirski wrote:
> >> On Tue, Apr 14, 2015 at 10:50 AM, Greg Kroah-Hartman
> >> <gregkh@linuxfoundation.org> wrote:
> >> > On Mon, Apr 13, 2015 at 02:01:21PM -0700, Andy Lutomirski wrote:
> >> >> On Mon, Apr 13, 2015 at 1:45 PM, Greg Kroah-Hartman
> >> >> <gregkh@linuxfoundation.org> wrote:
> >> >> > On Mon, Apr 13, 2015 at 01:13:26PM -0700, Andy Lutomirski wrote:
> >> >> >> On Mon, Apr 13, 2015 at 12:03 PM, Greg Kroah-Hartman
> >> >> >> <gregkh@linuxfoundation.org> wrote:
> >> >> >> > The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
> >> >> >> >
> >> >> >> >   Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
> >> >> >> >
> >> >> >> > are available in the git repository at:
> >> >> >> >
> >> >> >> >   git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
> >> >> >> >
> >> >> >> > for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
> >> >> >> >
> >> >> >> >   kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
> >> >> >> >
> >> >> >> > ----------------------------------------------------------------
> >> >> >> > kdbus for 4.1-rc1
> >> >> >> >
> >> >> >> > Here's the kdbus pull request for 4.1-rc1.
> >> >> >> >
> >> >> >> > It's been under development for many years now, and been in linux-next
> >> >> >> > for many months, and has undergone loads of testing a review and even a few
> >> >> >> > good arguments.  It comes with full documentation and tests.
> >> >> >> >
> >> >> >> > There has been a few complaints about the code, notably from people who
> >> >> >> > don't like the use of metadata in the bus messages.  That is actually
> >> >> >> > one of the main features here, as we can get this data in a secure and
> >> >> >> > reliable way, and it's something that userspace requires today.  So
> >> >> >> > while it does look "odd" to people who are not familiar with dbus, this
> >> >> >> > is something that finally fixes a number of almost unfixable races in
> >> >> >> > the current dbus implementations.
> >> >> >>
> >> >> >> While I generally like the concept of having a better in-kernel IPC
> >> >> >> mechanism, after some consideration I don't think this belongs in the
> >> >> >> kernel in its current form.  Here's why.
> >> >> >>
> >> >> >> First, the naming is counterintuitive.  There are "endpoints", but you
> >> >> >> don't send messages to endpoints.  In fact, an basic kdbus setup will
> >> >> >> have exactly one endpoint AFAICT.  Wtf?  This makes talking about it
> >> >> >> awkward.
> >> >> >
> >> >> > Did you read the documentation?  We've been over this before, and it
> >> >> > should all be addressed in the documentation based on this coming up.
> >> >> >
> >> >> >> A lot of the design seems to be to violate the concept of "mechanism,
> >> >> >> not policy".  Kdbus is very much a port of userspace dbus to the
> >> >> >> kernel, and it appears to be a port designed to preserve some
> >> >> >> questionable design decisions instead of learning from them.
> >> >> >>
> >> >> >> For example, kdbus sticks a whole policy database in the kernel, but
> >> >> >> that policy database (AFAICT -- holy crap it's overcomplicated) is
> >> >> >> *not* a simple set of rules like "if A then allow B".  Instead it has
> >> >> >> really weird dependencies not on what name you're sending to but on
> >> >> >> what *other* names the thing you're sending to has.  Sorry, but this
> >> >> >> way lies (a) the inability for a large set of developers to understand
> >> >> >> what's going on and (b) security bugs.  Also, the result probably
> >> >> >> can't be reused as part of a non-legacy-filled sensible design
> >> >> >
> >> >> > What policy database?  Matching messages to subscribers?  That's the
> >> >> > same type of "database" that other ipc subsystems need/want, there's
> >> >> > nothing radical here.
> >> >>
> >> >> Let me quote from the latest version of the kdbus docs:
> >> >>
> >> >>       Note that TALK access is checked against all names of a connection. For
> >> >>       example, if a connection owns both <constant>'org.foo.bar'</constant> and
> >> >>       <constant>'org.blah.baz'</constant>, and the policy database allows
> >> >>       <constant>'org.blah.baz'</constant> to be talked to by WORLD, then this
> >> >>       permission is also granted to <constant>'org.foo.bar'</constant>. That
> >> >>       might sound illogical, but after all, we allow messages to be directed to
> >> >>       either the ID or a well-known name, and policy is applied to the
> >> >>       connection, not the name. In other words, the effective TALK policy for a
> >> >>       connection is the most permissive of all names the connection owns.
> >> >>
> >> >> In my humble opinion, this paragraph speaks for itself.  The design is
> >> >> bad, full stop.
> >> >
> >> > First off, thanks for reading the docs, I appreciate that.  But realize
> >> > also, that this is straight from the D-Bus spec.  We aren't doing
> >> > anything "radical" here, this is what your desktop uses that you are
> >> > typing your email from.
> >> >
> >> > Yes, it's an unfortunate design, but one that we are all stuck with
> >> > (think of it as having to implement code for horrid hardware that you
> >> > have to get to work properly.)
> >> 
> >> I agree.  You've sent a pull request for an unfortunate design.  I
> >> don't think that unfortunate design belongs in the kernel.  If it stays
> >> in userspace, then user programmers could potentially fix it some day.
> 
> Greg> You might not like the design, but it is a valid design.  Again, we
> Greg> don't refuse to support hardware that is designed badly.  Or support
> Greg> protocols we don't necessarily like, that's not the job of a kernel or
> Greg> operating system.
> 
> Greg> And here's Havoc's response as to why actually, this is a good design:
> Greg> 	http://lists.freedesktop.org/archives/dbus/2015-April/016651.html
> 
> This is an interesting discussion, and one thing that sticks out to me
> is the comments in the URL above talking about how clients are
> supposed to use a generic name to bind to a resource, but actually do
> a lookup to get the specific name, and then bind to THAT.
> 
> So the security concerns raised by Andy do seem to make sense, in that
> security needs to be the same across all names of a service, so
> that you don't have problems with varying levels once people have
> connected.  In terms of the X11 analogy, if I have someone connect,
> and then I do 'xhost -' it removes all access.  It's not dependent on
> whether I'm bound to a specific or general service.  
> 
> So the security aspect really needs to be that the most restrictive
> takes precedence, not the other way around.  

But look at how dbus handles this, isn't this done in the correct way?

> And after having read a bunch of the docs, looked at the FAQ, etc;
> it's still no clearer to me what DBUS and KDBUS provides that's all so
> important or critical.  Sure, it might be nice to have, but that's ok.

The first email I wrote here explains all of this, are those not valid
uses for such a service that the kernel can provide?

> So I think those are the steps people need to take: give concrete examples
> of how DBUS is better than anything else out there and won't cause
> more problems down the line.

D-Bus has been around for over 10 years now, and was the result of many
failed attempts to do something much like this (COM, DCOM, CORBA, and a
few others).  The developers involved had lots of experience in this
area, and created a solution that ended up working very well for the
problem domain.  So well that all other competing technologies in that
area were obsoleted and abandoned and everyone has moved to D-Bus as it
solves the problems they have in a correct manner.

The reason nothing else has come along might just be because nothing
else _needs_ to come along, D-Bus solves the need.

So unless you see a technical reason why the proposed code is somehow
not correct, I don't understand your complaint.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread
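
The TALK rule quoted from the kdbus docs in the message above ("the effective TALK policy for a connection is the most permissive of all names the connection owns") and the alternative raised in the reply (the most restrictive name should take precedence) can be compared in a small model. This is a minimal sketch only: the function `may_talk`, the dictionary layout, and the sender names are hypothetical illustrations and are not taken from the kdbus code or its API.

```python
# Illustrative model only. All names and the data layout are hypothetical;
# this is not how kdbus stores or evaluates its policy database.

def may_talk(policy, owned_names, sender, combine=any):
    """Return True if `sender` may talk to a connection owning `owned_names`.

    `policy` maps a well-known name to the set of senders allowed to TALK
    to it; the special entry "WORLD" allows everyone.
    combine=any models the documented rule (most permissive name wins).
    combine=all models the alternative from the thread (most restrictive wins).
    """
    def allowed(name):
        who = policy.get(name, set())
        return "WORLD" in who or sender in who

    return combine(allowed(name) for name in owned_names)


# The example from the quoted documentation: one connection owns both names,
# and only 'org.blah.baz' is open to WORLD.
policy = {
    "org.blah.baz": {"WORLD"},
    "org.foo.bar": {"alice"},
}
owned = ["org.foo.bar", "org.blah.baz"]

permissive = may_talk(policy, owned, "mallory", combine=any)
restrictive = may_talk(policy, owned, "mallory", combine=all)
print(permissive, restrictive)
```

Under the most-permissive rule an arbitrary sender reaches the connection because one of its names is open to WORLD; under the most-restrictive rule the same sender is refused, which is the difference the two sides of the thread are arguing about.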

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-14 22:05                 ` Jiri Kosina
  2015-04-15  6:56                   ` Borislav Petkov
@ 2015-04-15  8:37                   ` Greg Kroah-Hartman
  2015-04-15 18:12                     ` James Bottomley
  1 sibling, 1 reply; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15  8:37 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Steven Rostedt, John Stoffel, Andy Lutomirski, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni, Paul E. McKenney,
	James Bottomley

On Wed, Apr 15, 2015 at 12:05:01AM +0200, Jiri Kosina wrote:
> On Tue, 14 Apr 2015, Steven Rostedt wrote:
> 
> > I believe that Linux Plumbers is still accepting MicroConferences. I 
> > wonder if this would be a good one to have. Try to get everyone face to 
> > face and talk about how exactly kdbus should be implemented in the 
> > kernel.
> 
> I personally would even put more emphasis on a session that would first 
> focus on "why", before we look at "how".
> 
> I have already asked about this during the earlier RFC submissions, but 
> the only "take-home message" I took from that discussion was "because it's 
> faster than what we currently have". I don't find that a sufficient 
> justification by itself for something so complex (with potential 
> implications all over the place for the whole Linux ecosystem), especially 
> given the fact we already have sealed memfds zerocopy etc (and I am not 
> even talking about the "infinite set-in-stone userspace API" implications 
> this has).

I wrote many many lines of "why" in the patch submissions, and in the
first email in this thread.  Are any of those specific solutions and
"why" reasons not correct in your opinion?  If so, great, please let me
know.

But to say that no one is focusing on "why" is a slight to those of us
who have been providing just that.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-14 22:39                     ` Jiri Kosina
@ 2015-04-15  8:38                       ` Greg Kroah-Hartman
  2015-04-15 10:37                       ` One Thousand Gnomes
  1 sibling, 0 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15  8:38 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Al Viro, Borislav Petkov, Andy Lutomirski, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 12:39:22AM +0200, Jiri Kosina wrote:
> On Tue, 14 Apr 2015, Greg Kroah-Hartman wrote:
> 
> > I don't understand.  You can not like the D-Bus model (and accordingly
> > the X11 model), 
> 
> I thought that the general hatred level of the X11 "model" and the 
> protocol led to all the efforts to reimplement this properly ... in 
> userspace (for example Wayland, right?).
> 
> I don't think anyone was ever seriously suggesting "X11 model is broken, 
> so let's push it to kernel" ... ?

Ok, fine, it's a broken metaphor, see Havoc's email for why I brought
that up here.  It's the issue that a stateful bus is required for
applications that is the main point I'm trying to get across.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-14 19:53                     ` Borislav Petkov
@ 2015-04-15  8:44                       ` Greg Kroah-Hartman
  2015-04-15  8:54                         ` Jiri Kosina
  2015-04-15  9:35                         ` Borislav Petkov
  0 siblings, 2 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15  8:44 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Al Viro, Andy Lutomirski, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
	Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Tue, Apr 14, 2015 at 09:53:36PM +0200, Borislav Petkov wrote:
> On Tue, Apr 14, 2015 at 09:48:04PM +0200, Greg Kroah-Hartman wrote:
> > It's not going to stop anything from working, it's just going to stop
> > some programs from being able to do things they really want to do (see
> > the first email for examples.)
> 
> Until it is made "mandatory" as Al said earlier.

If you really don't like userspace using features the kernel provides
you, well, there's nothing I can say that will change that odd feeling,
sorry.

If we don't want to make the metadata thing optional because everyone
will end up always using it, great, we will go make that change, that's
not an issue at all.  It will then end up looking like the first
proposal that was made many months ago :)

> > Yes, we could make this live outside the kernel tree, but that's not the
> > way we work anymore.
> 
> > We merge things that are useful, that match our
> > security and coding requirements, and are going to be maintained by
> > people we trust.
> 
> We trust? I'm not going to even comment on that.

Really?  Who in that MAINTAINERS file entry do you not trust?
Seriously, if that's the issue here, please let me know.  Do you not
trust me?  Daniel?  David?  Djalal?  All of us have been long-time
kernel developers and maintainers of other portions of the kernel stack
that you rely on every day.  If you have objections to any of us
maintaining this code, let me know.  Otherwise, stop making foolish
statements.

> And frankly, merging a useful piece of code sounds completely different
> to me than this serious backlash I'm reading from the sidelines.

I don't understand what this means.  If you have a technical reason for
why this code shouldn't be merged, great, please let me know and we can
work to address that.  Andy and Al have spent time reviewing and giving
us comments, and that's wonderful and valuable and is why I treat their
comments seriously.  If you are interested in the code, please review
it, otherwise I don't see what this adds to the conversation at all, do
you?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  6:54         ` Richard Weinberger
  2015-04-15  7:31           ` Mike Galbraith
@ 2015-04-15  8:48           ` Greg Kroah-Hartman
  2015-04-15  9:00             ` Richard Weinberger
  2015-04-15 11:25             ` One Thousand Gnomes
  1 sibling, 2 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15  8:48 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Wed, Apr 15, 2015 at 08:54:07AM +0200, Richard Weinberger wrote:
> On Wed, Apr 15, 2015 at 3:36 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> >> We had been there before.  To paraphrase another... meticulously honorable
> >> person, "if you didn't want something relied upon, why have you put it into the
> >> kernel?" Said person is on the record as having no problem whatsoever with
> >> adding dependencies to the bottom of userland stack.
> >
> > It appears that, if kdbus is merged, upstream udev may end up requiring it:
> >
> > http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html
> 
> Why so surprised?
> kdbus will be a major hard-dependency for every non-trivial userland.
> Like cgroups...

Maybe because things like cgroups, and kdbus in the future, solve a
need that the developers in that area have to solve problems and
provide functionality that their users require?

Look, we kernel developers only work on one huge, multithreaded, global
state binary.  Our experience in multi-application interactions with
shared state and permission requirements is usually quite limited.  If
you don't trust the developers of those programs outside the kernel,
don't use them, there are still distros out there that don't require
them.

But if you do trust them, then don't make snide comments about how they
don't know what they are doing, because that's just flat out rude.

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  8:32           ` Greg Kroah-Hartman
@ 2015-04-15  8:52             ` Martin Steigerwald
  2015-04-15  9:02               ` Greg Kroah-Hartman
  0 siblings, 1 reply; 316+ messages in thread
From: Martin Steigerwald @ 2015-04-15  8:52 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

Am Mittwoch, 15. April 2015, 10:32:19 schrieb Greg Kroah-Hartman:
> On Wed, Apr 15, 2015 at 10:18:46AM +0200, Martin Steigerwald wrote:
> > Am Dienstag, 14. April 2015, 18:36:28 schrieb Andy Lutomirski:
> > > > On Mon, Apr 13, 2015 at 1:22 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > > > On Mon, Apr 13, 2015 at 09:42:17PM +0200, Greg Kroah-Hartman wrote:
> > > >> > I remain opposed to this half thought out trash of an ABI for the
> > > >> > meta-data.
> > > >> 
> > > >> You don't have to enable the metadata if you don't want to use it,
> > > >> it's an option :)
> > > > 
> > > > OK, _that_ argument needs to be stomped out.  It had been used before,
> > > > and it was a deliberate scam.  There is no such thing as optional
> > > > kernel interface, especially when udev/dbus/systemd crowd is nearby.
> > > > We'd been through that excuse before; remember how devtmpfs was
> > > > pushed in as "optional"?
> > > > 
> > > > This is a huge red flag.  On the level of "I need your account
> > > > information to transfer $200M you might have inherited from my
> > > > deceased client".
> > > > 
> > > > Just to recap how it went the last time around: Kay kept pushing his
> > > > piece of code into the tree, claiming that it was optional, that
> > > > nobody who doesn't like it has to enable it, so what's the problem?
> > > > OK, in it went.  And pretty soon udev (maintained by the same...
> > > > meticulously honorable person) had stopped working on the kernels
> > > > that didn't have that enabled.
> > > > 
> > > > We had been there before.  To paraphrase another... meticulously
> > > > honorable person, "if you didn't want something relied upon, why have
> > > > you put it into the kernel?" Said person is on the record as having
> > > > no problem whatsoever with adding dependencies to the bottom of
> > > > userland stack.
> > > 
> > > It appears that, if kdbus is merged, upstream udev may end up requiring
> > > it:
> > > 
> > > http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html
> > > 
> > > Grumble.
> > 
> > Honestly, I think that tightly coupling systemd and udev to certain
> > kernel versions in lock step is crap.
> 
> Where do you see that happening?
> 
> > That you require some minimum version after some reasonable time,
> > sure.
> > But in lockstep? Seriously.
> 
> Has that happened in the past?  Look at the minimum requirements of
> systemd/udev today, something like the 3.7 kernel release, many years
> old.

I refer to the linked mailing list post from Lennart as I quote here:

> To make this clear, we expect that systemd and kernels are updated in
> lockstep. We explicitly do not support really old kernels with really
> (which means 3.4 right now), but even that should be taken with a grain
> of salt, as we already made clear that soon after kdbus is merged into
> the kernel we'll probably make a hard requirement on it from the systemd
> side.

That's plenty clear, isn't it? As soon as kdbus is merged into the kernel, 
systemd will depend on it, and then… if I need to go back to an older kernel, 
I have to downgrade systemd as well?

> > I certainly do not want a broken system just cause I have to load an
> > older kernel version for some reason.
> 
> No one does.  But, work with your distribution if you end up with
> something like this.  Remember, the goal is that you can always run
> newer kernels on older userspace, as that is something that we kernel
> developers can enforce.  Userspace programs have other requirements /
> communities, it's up to them to decide what their oldest kernel version
> they wish to support.  Hint, even glibc makes these kinds of
> requirements, it's nothing new at all here, so why is this even an
> issue?

It's no issue for me that systemd required kernel 3.7. But… what Lennart 
announces above regarding kdbus reads quite differently.

> > And yes, I think its good not to force just about any userspace idea
> > into the kernel.
> 
> Do you have any technical objections to the patch as proposed?

If I had, I would have written it. I explained already that I see that 
kernel developers have strong technical objections with kdbus. And that I 
think it is important to acknowledge it, instead of telling them, that the 
API is required from userspace, userspace people know what they do, and 
they should just go away with their concerns.

That's at least how I received quite some of your responses.

Well, and I raised an eyebrow at the busname matching rules and the 
capability stuff. Yet I didn't comment on it, because I didn't look at it 
in depth. I just ask you to take those seriously who did.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  8:44                       ` Greg Kroah-Hartman
@ 2015-04-15  8:54                         ` Jiri Kosina
  2015-04-15  9:09                           ` Greg Kroah-Hartman
  2015-04-15  9:35                         ` Borislav Petkov
  1 sibling, 1 reply; 316+ messages in thread
From: Jiri Kosina @ 2015-04-15  8:54 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Borislav Petkov, Al Viro, Andy Lutomirski, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote:

> If you have a technical reason for why this code shouldn't be merged, 
> great, please let me know and we can work to address that.  Andy and Al 
> have spent time reviewing and giving us comments, and that's wonderful 
> and valuable and is why I treat their comments seriously.  If you are 
> interested in the code, please review it, otherwise I don't see what 
> this adds to the conversation at all, do you?

You've actually touched another issue I see here, and that is -- the code 
is complex like crazy.

I've spent a big part of the past two days trying to get my head around it, but 
I am still far away from getting even the 1000-mile overview of how 
exactly the message passing is designed.

I understand that the primary reason for this complexity is probably the 
dbus protocol specification itself.

But the problem really is that I don't think you've received even a single 
Reviewed-by: from someone who hasn't been directly involved in developing 
the code, right?

For something that's potentially such a core mechanism as a completely 
new, massively-adopted IPC, this does send a warning signal.

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-14 22:33         ` Jiri Kosina
@ 2015-04-15  8:56           ` Greg Kroah-Hartman
  2015-04-15 11:06             ` One Thousand Gnomes
  0 siblings, 1 reply; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15  8:56 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 12:33:30AM +0200, Jiri Kosina wrote:
> On Tue, 14 Apr 2015, Greg Kroah-Hartman wrote:
> 
> > Yes, it's an unfortunate design, but one that we are all stuck with 
> > (think of it as having to implement code for horrid hardware that you 
> > have to get to work properly.)  
> 
> Greg, I personally consider this a rather defunct analogy. Broken hardware 
> comes from "outer space"; we just have to live with it somehow, and 
> eventually try to gradually improve by working with vendors (and you 
> yourself have of course made huge improvements in this very area).
> 
> Linux userspace is coming, well, from Linux developers. The sole fact that 
> someone wrote a daemon that runs on Linux seems like a very poor 
> justification for sucking the daemon into kernel "because we have to live 
> with it". 
> Userspace has to live with it somehow (and eventually fix itself if 
> necessary), yes. Why should kernel just contribute to this "unfortunate 
> design" if it really isn't, in any way, obliged or forced to do so?

I retract my "unfortunate design" statement, as Havoc pointed out
exactly why that design is the way it is, and it makes sense to me.

To quote the email that he wrote:
	The reason is that dbus views the world in a stateful way
	assuming that connections, and name ownership, can be tracked
	reliably.  This is different from say http, and it's one reason
	that people used to Internet-oriented protocols find dbus
	strange.

I'm one of those "people used to internet-oriented protocols", and I bet
that almost all of us kernel developers also fall into that category, as
the kernel for the most part, is one big tool to help implement those
Internet-oriented protocols :)

The very history of D-Bus, where it came from, who is now using it, what
happened to all of the other proposed solutions in this area, is worth
examining if you are interested in it.  This type of protocol solves a
real problem in this area, one that everyone has congregated on as the
best-known solution for that issue.  It's used everywhere, on servers,
embedded systems, desktops, you name it.  All languages have bindings
for it, and it's the underpinning of a modern Linux stack.  For us to
somehow say that it's a "horrible protocol" is terribly unfair, and
unkind, to all of the people who have worked to make it the best
possible solution for this problem space.

And honestly, I don't have a better proposal.  And I seriously doubt
that anyone here does either.  In the many years I've spent working on
this, dbus has seemed to be odd, and strange, to the way that the kernel
has normally worked, because it is.  And that's not a bad thing, it's
just different, and supporting the real needs and requirements of our
users is what is required of the Linux kernel.

Now if there are technical problems or insecurities in the proposed code
submission, wonderful, please let me know and I'll be glad to work to
address them.  But let's just drop the whole "oooh, look, D-Bus is
horrible looking, we can't support that!" argument; it is not a valid
justification.

And I'll defer back to the old AF_DBUS proposal, which was looked at
from a technical point of view of the network developers who said that
they didn't think that putting the D-Bus model into a network stack made
any sense from a technical point of view, and outlined their
objections.  And they were right, hence this different proposal many
years later based on their insight and suggestions.

If you have objections like that, great, please let me know.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  8:48           ` Greg Kroah-Hartman
@ 2015-04-15  9:00             ` Richard Weinberger
  2015-04-15  9:20               ` Greg Kroah-Hartman
  2015-04-15 11:25             ` One Thousand Gnomes
  1 sibling, 1 reply; 316+ messages in thread
From: Richard Weinberger @ 2015-04-15  9:00 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

Am 15.04.2015 um 10:48 schrieb Greg Kroah-Hartman:
> On Wed, Apr 15, 2015 at 08:54:07AM +0200, Richard Weinberger wrote:
>> On Wed, Apr 15, 2015 at 3:36 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>> We had been there before.  To paraphrase another... meticulously honorable
>>>> person, "if you didn't want something relied upon, why have you put it into the
>>>> kernel?" Said person is on the record as having no problem whatsoever with
>>>> adding dependencies to the bottom of userland stack.
>>>
>>> It appears that, if kdbus is merged, upstream udev may end up requiring it:
>>>
>>> http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html
>>
>> Why so surprised?
>> kdbus will be a major hard-dependency for every non-trivial userland.
>> Like cgroups...
> 
> Maybe because things like cgroups, and kdbus in the future, solve a
> need that the developers in that area have to solve problems and
> provide functionality that their users require?

I agree that a high level bus is needed and dbus is not perfect.
But this does not mean that we need an in-kernel dbus in any case.

> Look, we kernel developers only work on one huge, multithreaded, global
> state binary.  Our experience in multi-application interactions with
> shared state and permission requirements is usually quite limited.  If
> you don't trust the developers of those programs outside the kernel,
> don't use them, there are still distros out there that don't require
> them.

We're all forced to use cgroups, systemd, udev unless we want to have busybox
as userland. That's a fact.
systemd and its dependencies are not a bad thing per se.
But we have to be very sure that new hard-dependencies are
in good shape before we push them into the kernel.
IMHO this is also Andy and Eric's point.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  8:52             ` Martin Steigerwald
@ 2015-04-15  9:02               ` Greg Kroah-Hartman
  2015-04-15  9:28                 ` Martin Steigerwald
  0 siblings, 1 reply; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15  9:02 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Wed, Apr 15, 2015 at 10:52:37AM +0200, Martin Steigerwald wrote:
> Am Mittwoch, 15. April 2015, 10:32:19 schrieb Greg Kroah-Hartman:
> > On Wed, Apr 15, 2015 at 10:18:46AM +0200, Martin Steigerwald wrote:
> > > Am Dienstag, 14. April 2015, 18:36:28 schrieb Andy Lutomirski:
> > > > > On Mon, Apr 13, 2015 at 1:22 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > > > > On Mon, Apr 13, 2015 at 09:42:17PM +0200, Greg Kroah-Hartman wrote:
> > > > >> > I remain opposed to this half thought out trash of an ABI for the
> > > > >> > meta-data.
> > > > >> 
> > > > >> You don't have to enable the metadata if you don't want to use it,
> > > > >> it's an option :)
> > > > > 
> > > > > OK, _that_ argument needs to be stomped out.  It had been used before,
> > > > > and it was a deliberate scam.  There is no such thing as optional
> > > > > kernel interface, especially when udev/dbus/systemd crowd is nearby.
> > > > > We'd been through that excuse before; remember how devtmpfs was
> > > > > pushed in as "optional"?
> > > > > 
> > > > > This is a huge red flag.  On the level of "I need your account
> > > > > information to transfer $200M you might have inherited from my
> > > > > deceased client".
> > > > > 
> > > > > Just to recap how it went the last time around: Kay kept pushing his
> > > > > piece of code into the tree, claiming that it was optional, that
> > > > > nobody who doesn't like it has to enable it, so what's the problem?
> > > > > OK, in it went.  And pretty soon udev (maintained by the same...
> > > > > meticulously honorable person) had stopped working on the kernels
> > > > > that didn't have that enabled.
> > > > > 
> > > > > We had been there before.  To paraphrase another... meticulously
> > > > > honorable person, "if you didn't want something relied upon, why have
> > > > > you put it into the kernel?" Said person is on the record as having
> > > > > no problem whatsoever with adding dependencies to the bottom of
> > > > > userland stack.
> > > > 
> > > > It appears that, if kdbus is merged, upstream udev may end up requiring
> > > > it:
> > > > 
> > > > http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html
> > > > 
> > > > Grumble.
> > > 
> > > Honestly, I think that tightly coupling systemd and udev to certain
> > > kernel versions in lock step is crap.
> > 
> > Where do you see that happening?
> > 
> > > That you require some minimum version after some reasonable time,
> > > sure.
> > > But in lockstep? Seriously.
> > 
> > Has that happened in the past?  Look at the minimum requirements of
> > systemd/udev today, something like the 3.7 kernel release, many years
> > old.
> 
> I refer to the linked mailing list post from Lennart as I quote here:
> 
> > To make this clear, we expect that systemd and kernels are updated in
> > lockstep. We explicitly do not support really old kernels with really
> > (which means 3.4 right now), but even that should be taken with a grain
> > of salt, as we already made clear that soon after kdbus is merged into
> > the kernel we'll probably make a hard requirement on it from the systemd
> > side.
> 
> That's plenty clear, isn't it? As soon as kdbus is merged into the kernel, 
> systemd will depend on it, and then… if I need to go back to an older kernel, 
> I have to downgrade systemd as well?
> 
> > > I certainly do not want a broken system just cause I have to load an
> > > older kernel version for some reason.
> > 
> > No one does.  But, work with your distribution if you end up with
> > something like this.  Remember, the goal is that you can always run
> > newer kernels on older userspace, as that is something that we kernel
> > developers can enforce.  Userspace programs have other requirements /
> > communities, it's up to them to decide what their oldest kernel version
> > they wish to support.  Hint, even glibc makes these kinds of
> > requirements, it's nothing new at all here, so why is this even an
> > issue?
> 
> It's no issue for me that systemd required kernel 3.7. But… what Lennart 
> announces above regarding kdbus reads quite differently.

Adding features to the systemd repo, and then having those releases make
it out to your distro is a multi-year timeframe normally, and
multi-month at the least.  If a distro made such a decision to not
support old kernels by accepting such a userspace requirement, take it
up with them.

And there are forks of systemd that keep around older kernel support,
and distros use them for this very reason.  Because they want to use old
kernel versions, and that's great.

It's the same for any kernel feature, programs are free to use them if
they want to.  If glibc were to make the requirement tomorrow that they
are going to use memfd for their internal use and require that everyone
update their kernels for their new release, we would all laugh that that
is pretty funny and their user base would suffer.

But again, that's nothing that the kernel has any control over, take it
up with that project if you object to that.
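
For illustration only, here is a minimal sketch of how a userspace program
can probe for a newer kernel feature at runtime and fall back when it is
missing, instead of making it a hard requirement.  It uses Python's
os.memfd_create (Linux only, Python 3.8+; the underlying syscall arrived in
Linux 3.17).  The function name open_anonymous_buffer and the tempfile
fallback are assumptions made up for this example; this is not how glibc or
systemd actually behave.

```python
import os
import tempfile


def open_anonymous_buffer(name="scratch"):
    """Return (binary file object, backend name).

    Hypothetical helper: prefer memfd when the running kernel and Python
    offer it, otherwise fall back to an ordinary unnamed temporary file.
    """
    memfd_create = getattr(os, "memfd_create", None)
    if memfd_create is not None:
        try:
            fd = memfd_create(name)
        except OSError:
            # Kernel too old, or the syscall is blocked: use the fallback.
            pass
        else:
            return os.fdopen(fd, "w+b"), "memfd"
    return tempfile.TemporaryFile("w+b"), "tempfile"


if __name__ == "__main__":
    buf, backend = open_anonymous_buffer()
    with buf:
        buf.write(b"hello")
        buf.seek(0)
        print(backend, buf.read())
```

Whether a project carries a fallback like this, or simply raises its minimum
kernel version, is exactly the policy decision being argued about here.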

Personally, I want people to use the new code/features I provide them in
the kernel, and get upset when people don't.  Otherwise, why would I
have spent so much time creating them and supporting them in the first
place?

> > > And yes, I think it's good not to force just about any userspace idea
> > > into the kernel.
> > 
> > Do you have any technical objections to the patch as proposed?
> 
> If I had, I would have written it. I explained already that I see that 
> kernel developers have strong technical objections with kdbus. And that I 
> think it is important to acknowledge it, instead of telling them, that the 
> API is required from userspace, userspace people know what they do, and 
> they should just go away with their concerns.
> 
> That's at least how I received quite some of your responses.
> 
> Well and I raised an eyebrow on the busname matching rules and the 
> capability stuff. Yet, I didn't comment on it, cause I didn't look at it 
> in-depth. I just ask you to take those seriously who did.

I take technical comments very seriously, where have I not?  If you have
technical reasons why the current implementation has problems, please
let me know, and I will be glad to address them.

thanks,

greg k-h

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  8:54                         ` Jiri Kosina
@ 2015-04-15  9:09                           ` Greg Kroah-Hartman
  2015-04-15 12:36                             ` Al Viro
  2015-04-15 16:47                             ` Steven Rostedt
  0 siblings, 2 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15  9:09 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Borislav Petkov, Al Viro, Andy Lutomirski, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 10:54:41AM +0200, Jiri Kosina wrote:
> On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote:
> 
> > If you have a technical reason for why this code shouldn't be merged, 
> > great, please let me know and we can work to address that.  Andy and Al 
> > have spent time reviewing and giving us comments, and that's wonderful 
> > and valuable and is why I treat their comments seriously.  If you are 
> > interested in the code, please review it, otherwise I don't see what 
> > this adds to the conversation at all, do you?
> 
> You've actually touched another issue I see here, and that is -- the code 
> is complex like crazy.
> 
> I've spent a big part of the past two days trying to get my head around it,
> but I am still far away from getting even the 1000-mile overview of how
> exactly the message passing is designed.
> 
> I understand that the primary reason for this complexity is probably the 
> dbus protocol specification itself.

Yes it is.

> But the problem really is that I don't think you've received even a single 
> Reviewed-by: from someone who hasn't been directly involved in developing 
> the code, right?

I've asked for it, but finding people to review code is hard, as you
know.  It's only 13k lines long, smaller than a serial port driver (my
unit of code review), so it's not all that big.

It's smaller than the USB3 host controller driver as well, and very few
people ever reviewed that beast :)

> For something that's potentially such a core mechanism as a completely 
> new, massively-adopted IPC, this does send a warning signal.

If you know of a way to force others to review code, please let me know.

thanks,

greg k-h

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  9:00             ` Richard Weinberger
@ 2015-04-15  9:20               ` Greg Kroah-Hartman
  2015-04-15  9:21                 ` Borislav Petkov
  2015-04-15  9:28                 ` Richard Weinberger
  0 siblings, 2 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15  9:20 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Wed, Apr 15, 2015 at 11:00:50AM +0200, Richard Weinberger wrote:
> Am 15.04.2015 um 10:48 schrieb Greg Kroah-Hartman:
> > On Wed, Apr 15, 2015 at 08:54:07AM +0200, Richard Weinberger wrote:
> >> On Wed, Apr 15, 2015 at 3:36 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> >>>> We had been there before.  To paraphrase another... meticulously honorable
> >>>> person, "if you didn't want something relied upon, why have you put it into the
> >>>> kernel?" Said person is on the record as having no problem whatsoever with
> >>>> adding dependencies to the bottom of userland stack.
> >>>
> >>> It appears that, if kdbus is merged, upstream udev may end up requiring it:
> >>>
> >>> http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html
> >>
> >> Why so surprised?
> >> kdbus will be a major hard-dependency for every non-trivial userland.
> >> Like cgroups...
> > 
> > Maybe because things like cgroups, and kdbus in the future, solves a
> > need that the developers in that area have to solve problems and
> > provide functionality that their users require?
> 
> I agree that a high level bus is needed and dbus is not perfect.
> But this does not mean that we need an in-kernel dbus in any case.

So what do you propose to solve the issues presented in my original
email about the usecases that this code addresses?

> > Look, us kernel developers only work on one huge, multithreaded, global
> > state binary.  Our experience in multi-application interactions with
> > shared state and permission requirements is usually quite limited.  If
> > you don't trust the developers of those programs outside the kernel,
> > don't use them, there are still distros out there that don't require
> > them.
> 
> We're all forced to use cgroups, systemd, udev unless we want to have busybox
> as userland. That's a fact.

Is that a problem?

> systemd and its dependencies are not a bad thing per se.
> But we have to be very sure that new hard-dependencies are
> in good shape before we push them into the kernel.

That's fine, and normal, and I expect it.  But please provide technical
reasons why the proposal is not acceptable, like Andy has done in this
thread.

thanks,

greg k-h

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  9:20               ` Greg Kroah-Hartman
@ 2015-04-15  9:21                 ` Borislav Petkov
  2015-04-15  9:27                   ` Greg Kroah-Hartman
  2015-04-15  9:28                 ` Richard Weinberger
  1 sibling, 1 reply; 316+ messages in thread
From: Borislav Petkov @ 2015-04-15  9:21 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Richard Weinberger, Andy Lutomirski, Al Viro, Eric W. Biederman,
	Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 11:20:34AM +0200, Greg Kroah-Hartman wrote:
> > We're all forced to use cgroups, systemd, udev unless we want to have busybox
> > as userland. That's a fact.
> 
> Is that a problem?

I'm amazed that you're really actually asking that question :-(

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  9:21                 ` Borislav Petkov
@ 2015-04-15  9:27                   ` Greg Kroah-Hartman
  2015-04-15  9:30                     ` Richard Weinberger
  2015-04-15  9:44                     ` Borislav Petkov
  0 siblings, 2 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15  9:27 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Richard Weinberger, Andy Lutomirski, Al Viro, Eric W. Biederman,
	Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 11:21:49AM +0200, Borislav Petkov wrote:
> On Wed, Apr 15, 2015 at 11:20:34AM +0200, Greg Kroah-Hartman wrote:
> > > We're all forced to use cgroups, systemd, udev unless we want to have busybox
> > > as userland. That's a fact.
> > 
> > Is that a problem?
> 
> I'm amazed that you're really actually asking that question :-(

Really?  Why can't userspace rely on the features that the kernel
provides them?  If not, why would the feature be created and supported
by us kernel developers in the first place?

That makes no sense at all, please explain.

greg k-h

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  9:20               ` Greg Kroah-Hartman
  2015-04-15  9:21                 ` Borislav Petkov
@ 2015-04-15  9:28                 ` Richard Weinberger
  1 sibling, 0 replies; 316+ messages in thread
From: Richard Weinberger @ 2015-04-15  9:28 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

Am 15.04.2015 um 11:20 schrieb Greg Kroah-Hartman:
> On Wed, Apr 15, 2015 at 11:00:50AM +0200, Richard Weinberger wrote:
>> Am 15.04.2015 um 10:48 schrieb Greg Kroah-Hartman:
>>> On Wed, Apr 15, 2015 at 08:54:07AM +0200, Richard Weinberger wrote:
>>>> On Wed, Apr 15, 2015 at 3:36 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>>>> We had been there before.  To paraphrase another... meticulously honorable
>>>>>> person, "if you didn't want something relied upon, why have you put it into the
>>>>>> kernel?" Said person is on the record as having no problem whatsoever with
>>>>>> adding dependencies to the bottom of userland stack.
>>>>>
>>>>> It appears that, if kdbus is merged, upstream udev may end up requiring it:
>>>>>
>>>>> http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html
>>>>
>>>> Why so surprised?
>>>> kdbus will be a major hard-dependency for every non-trivial userland.
>>>> Like cgroups...
>>>
>>> Maybe because things like cgroups, and kdbus in the future, solves a
>>> need that the developers in that area have to solve problems and
>>> provide functionality that their users require?
>>
>> I agree that a high level bus is needed and dbus is not perfect.
>> But this does not mean that we need an in-kernel dbus in any case.
> 
> So what do you propose to solve the issues presented in my original
> email about the usecases that this code addresses?
> 
>>> Look, us kernel developers only work on one huge, multithreaded, global
>>> state binary.  Our experience in multi-application interactions with
>>> shared state and permission requirements is usually quite limited.  If
>>> you don't trust the developers of those programs outside the kernel,
>>> don't use them, there are still distros out there that don't require
>>> them.
>>
>> We're all forced to use cgroups, systemd, udev unless we want to have busybox
>> as userland. That's a fact.
> 
> Is that a problem?
> 
>> systemd and its dependencies are not a bad thing per se.
>> But we have to be very sure that new hard-dependencies are
>> in good shape before we push them into the kernel.
> 
> That's fine, and normal, and I expect it.  But please provide technical
> reasons why the proposal is not acceptable, like Andy has done in this
> thread.

I did not state that the proposal is not acceptable.
My statement was that we have to be well aware of the fact that
we will be forced to use kdbus in future as it will become a dependency.

Some developers on IRC said they don't care about kdbus at all as long as they
can disable it. This is wrong, we have to use it. And that is fine.
But we all have to be aware of the implications. kdbus will be ABI.

Thanks,
//richard

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  9:02               ` Greg Kroah-Hartman
@ 2015-04-15  9:28                 ` Martin Steigerwald
  2015-04-15 11:52                   ` Greg Kroah-Hartman
  0 siblings, 1 reply; 316+ messages in thread
From: Martin Steigerwald @ 2015-04-15  9:28 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

Am Mittwoch, 15. April 2015, 11:02:12 schrieb Greg Kroah-Hartman:
> > > > And yes, I think it's good not to force just about any userspace idea
> > > > into the kernel.
> > > 
> > > Do you have any technical objections to the patch as proposed?
> > 
> > If I had, I would have written it. I explained already that I see that
> > kernel developers have strong technical objections with kdbus. And that I
> > think it is important to acknowledge it, instead of telling them, that the
> > API is required from userspace, userspace people know what they do, and
> > they should just go away with their concerns.
> > 
> > That's at least how I received quite some of your responses.
> > 
> > Well and I raised an eyebrow on the busname matching rules and the
> > capability stuff. Yet, I didn't comment on it, cause I didn't look at it
> > in-depth. I just ask you to take those seriously who did.
> 
> I take technical comments very seriously, where have I not?  If you have
> technical reasons why the current implementation has problems, please
> let me know, and I will be glad to address them.

From what I read you basically answered all technical comments like in:

The dbus API is like it is for a very good reason, everyone is using it 
and everyone agrees. Capabilities are used in userspace for good reason 
and so on.

But I see, here, not everyone does. 

Most of your answers didn't seem to address the concerns raised about having
this in the *kernel*. Especially the security concerns.

That's what I meant with "And yes, I think it's good not to force just about
any userspace idea into the kernel". I think arguing with the "this is how
userspace does it" pattern, even if it truly is for a very good reason, is
not sufficient as an argument for having it in the kernel.

I am just looking at the argumentative pattern here. If other kernel 
developers complain about how hard it is to review and wrap their mind 
around the kdbus patches… I am scared at just trying to understand the 
patches. So no technical complaints from me. I did not nack it nor do I 
see myself in the position to nack it.

So feel free to do with my argument what you like. I just tried to
understand why the communication in here works in circles as it does, and I
think it will continue to work like that as long as the only arguments are
"userspace does it that way" or "this is optional". For the discussion to
go anywhere it's important to acknowledge each other.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  9:27                   ` Greg Kroah-Hartman
@ 2015-04-15  9:30                     ` Richard Weinberger
  2015-04-15  9:49                       ` Greg Kroah-Hartman
  2015-04-15  9:44                     ` Borislav Petkov
  1 sibling, 1 reply; 316+ messages in thread
From: Richard Weinberger @ 2015-04-15  9:30 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Borislav Petkov
  Cc: Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

Am 15.04.2015 um 11:27 schrieb Greg Kroah-Hartman:
> On Wed, Apr 15, 2015 at 11:21:49AM +0200, Borislav Petkov wrote:
>> On Wed, Apr 15, 2015 at 11:20:34AM +0200, Greg Kroah-Hartman wrote:
>>>> We're all forced to use cgroups, systemd, udev unless we want to have busybox
>>>> as userland. That's a fact.
>>>
>>> Is that a problem?
>>
>> I'm amazed that you're really actually asking that question :-(
> 
> Really?  Why can't userspace rely on the features that the kernel
> provides them?  If not, why would the feature be created and supported
> by us kernel developers in the first place?

This is IMHO not the problem.
But if we add a new component to the kernel which *will* be used
by almost every userland out there (systemd won the "init wars")
we have to make sure that we're all fine with it.
Andy and Eric have some very valid concerns.

Thanks,
//richard

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  8:44                       ` Greg Kroah-Hartman
  2015-04-15  8:54                         ` Jiri Kosina
@ 2015-04-15  9:35                         ` Borislav Petkov
  2015-04-15 11:45                           ` Greg Kroah-Hartman
  1 sibling, 1 reply; 316+ messages in thread
From: Borislav Petkov @ 2015-04-15  9:35 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Al Viro, Andy Lutomirski, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
	Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 10:44:40AM +0200, Greg Kroah-Hartman wrote:
> If you really don't like userspace using features the kernel provides
> you, well, there's nothing I can say that will change that odd feeling,
> sorry.

Are you even reading what people are saying?

I don't like the mandatory(!) aspect of this, which it will eventually
become. There is this thing called "choice", remember?

> Really? Who in that MAINTAINERS file entry do you not trust?

The fact that you're still pushing for this current design *in the face*
of people pointing out serious design flaws with this makes me not
really trust you.

> I don't understand what this means. If you have a technical reason
> for why this code shouldn't be merged, great, please let me know and
> we can work to address that. Andy and Al have spent time reviewing
> and giving us comments, and that's wonderful and valuable and is
> why I treat their comments seriously. If you are interested in the
> code, please review it,

Yeah, I took a brief look at the code. It is overcomplicated.

If I were to review it properly, I'd ask you to split it in small
patchsets. Hell, I'm pretty sure you would do the same for code you
don't know if you were in my shoes.

Also, considering the complexity of this patchset, it doesn't have
a single Reviewed-by by an external party. If this were any other
submission, it would've been kicked to the curb a long time ago.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  9:27                   ` Greg Kroah-Hartman
  2015-04-15  9:30                     ` Richard Weinberger
@ 2015-04-15  9:44                     ` Borislav Petkov
  2015-04-15 11:40                       ` Greg Kroah-Hartman
  1 sibling, 1 reply; 316+ messages in thread
From: Borislav Petkov @ 2015-04-15  9:44 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Richard Weinberger, Andy Lutomirski, Al Viro, Eric W. Biederman,
	Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 11:27:13AM +0200, Greg Kroah-Hartman wrote:
> On Wed, Apr 15, 2015 at 11:21:49AM +0200, Borislav Petkov wrote:
> > On Wed, Apr 15, 2015 at 11:20:34AM +0200, Greg Kroah-Hartman wrote:
> > > > We're all forced to use cgroups, systemd, udev unless we want to have busybox
> > > > as userland. That's a fact.
> > > 
> > > Is that a problem?
> > 
> > I'm amazed that you're really actually asking that question :-(
> 
> Really?  Why can't userspace rely on the features that the kernel
> provides them?

Userspace can do whatever it wants. As long as I'm not being *forced* to
do what userspace thinks is the right thing.

It seems to me that since that whole systemd* debacle started, we're
forgetting the choice aspect.

And dammit, I want my choice. I want to be able to choose what I'm
running. Not run what someone else thought would be good for me to
run. If I wanted that, I'd have long since switched to windoze or äbble.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  9:30                     ` Richard Weinberger
@ 2015-04-15  9:49                       ` Greg Kroah-Hartman
  2015-04-15  9:53                         ` Richard Weinberger
  0 siblings, 1 reply; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15  9:49 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Borislav Petkov, Andy Lutomirski, Al Viro, Eric W. Biederman,
	Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 11:30:52AM +0200, Richard Weinberger wrote:
> Am 15.04.2015 um 11:27 schrieb Greg Kroah-Hartman:
> > On Wed, Apr 15, 2015 at 11:21:49AM +0200, Borislav Petkov wrote:
> >> On Wed, Apr 15, 2015 at 11:20:34AM +0200, Greg Kroah-Hartman wrote:
> >>>> We're all forced to use cgroups, systemd, udev unless we want to have busybox
> >>>> as userland. That's a fact.
> >>>
> >>> Is that a problem?
> >>
> >> I'm amazed that you're really actually asking that question :-(
> > 
> > Really?  Why can't userspace rely on the features that the kernel
> > provides them?  If not, why would the feature be created and supported
> > by us kernel developers in the first place?
> 
> This is IMHO not the problem.
> But if we add a new component to the kernel which *will* be used
> by almost every userland out there (systemd won the "init wars")
> we have to make sure that we're all fine with it.

Sure, but why would this be different from any other kernel feature that
we add?  We have to be sure we are fine with everything we merge, as we
are saying we are going to maintain this stuff for forever.

> Andy and Eric have some very valid concerns.

I've tried to address Andy's concerns, Eric is not being very specific,
so there's nothing I can do there :)

thanks,

greg k-h

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  9:49                       ` Greg Kroah-Hartman
@ 2015-04-15  9:53                         ` Richard Weinberger
  0 siblings, 0 replies; 316+ messages in thread
From: Richard Weinberger @ 2015-04-15  9:53 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Borislav Petkov, Andy Lutomirski, Al Viro, Eric W. Biederman,
	Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

Am 15.04.2015 um 11:49 schrieb Greg Kroah-Hartman:
> On Wed, Apr 15, 2015 at 11:30:52AM +0200, Richard Weinberger wrote:
>> Am 15.04.2015 um 11:27 schrieb Greg Kroah-Hartman:
>>> On Wed, Apr 15, 2015 at 11:21:49AM +0200, Borislav Petkov wrote:
>>>> On Wed, Apr 15, 2015 at 11:20:34AM +0200, Greg Kroah-Hartman wrote:
>>>>>> We're all forced to use cgroups, systemd, udev unless we want to have busybox
>>>>>> as userland. That's a fact.
>>>>>
>>>>> Is that a problem?
>>>>
>>>> I'm amazed that you're really actually asking that question :-(
>>>
>>> Really?  Why can't userspace rely on the features that the kernel
>>> provides them?  If not, why would the feature be created and supported
>>> by us kernel developers in the first place?
>>
>> This is IMHO not the problem.
>> But if we add a new component to the kernel which *will* be used
>> by almost every userland out there (systemd won the "init wars")
>> we have to make sure that we're all fine with it.
> 
> Sure, but why would this be different from any other kernel feature that
> we add?  We have to be sure we are fine with everything we merge, as we
> are saying we are going to maintain this stuff for forever.

There is nothing different. The series has currently two NACKs,
0 ACKs and 0 Reviews.
I don't think that any other series would get merged in such a state.

>> Andy and Eric have some very valid concerns.
> 
> I've tried to address Andy's concerns, Eric is not being very specific,
> so there's nothing I can do there :)

What about Steven's proposal to talk at Plumbers?
I fear the discussion is at a dead end and needs a face to face
resolution.

Thanks,
//richard

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-14 22:39                     ` Jiri Kosina
  2015-04-15  8:38                       ` Greg Kroah-Hartman
@ 2015-04-15 10:37                       ` One Thousand Gnomes
  2015-04-15 11:49                         ` Greg Kroah-Hartman
  1 sibling, 1 reply; 316+ messages in thread
From: One Thousand Gnomes @ 2015-04-15 10:37 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Greg Kroah-Hartman, Al Viro, Borislav Petkov, Andy Lutomirski,
	Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Wed, 15 Apr 2015 00:39:22 +0200 (CEST)
Jiri Kosina <jkosina@suse.cz> wrote:

> On Tue, 14 Apr 2015, Greg Kroah-Hartman wrote:
> 
> > I don't understand.  You can not like the D-Bus model (and accordingly
> > the X11 model), 
> 
> I thought that the general hatred level of the X11 "model" and the 
> protocol led to all the efforts to reimplement this properly ... in 
> userspace (for example Wayland, right?).
> 
> I don't think anyone was ever seriously suggesting "X11 model is broken, 
> so let's push it to kernel" ... ?

The X11 model is *nothing* to do with the dbus/kdbus model. X11 does
properties by attaching them to windows. Those properties can be
monitored for changes and they can be queried. Setting them is
asynchronous, querying them is sync or with the newer event based
libraries can be async. X11 properties are network safe, handled through
the same X11 authority as everything else. Two apps can happily run on
different systems sharing a display over the network and sharing and
responding to changes in X11 properties - and it just works.

The Gnome people tried to re-invent X11 properties and embedding badly
with CORBA, then with dbus, despite the fact the Andrew system could
already do it really fast and cleanly even before Gnome was thought of.

There is no comparison between the elegance of X11 property setting and a
chunk of proposed kernel code that is half the size of a tiny X server!

The dbus model is also flawed in a load of other ways in user space
because message handling in the hands of people with no concept of
systemic performance analysis just leads to disaster. One of the big
reasons dbus is so "slow" isn't that dbus is "slow", it's that the
crapware on top of it makes *thousands* of dbus queries.

If you must do it in kernel why not use the Android binder - it's awful,
broken, and dubiously secure, but at least we'd still only have one awful,
broken dubiously secure rpc/property layer in kernel.

"It's the issue that a stateful bus is required for
applications that is the main point I'm trying to get across."

That would be the "if dbus crashes I have to reboot" design flaw of
Gnome and friends. The only state you need is beyond the endpoints. It's a
message passing system. If you think message passing needs state then I'd
take a look at the internet. State belongs in the end points.

It's telling that I can lose and recover my internet connection without
rebooting but not my desktop's internal messaging.

Alan

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  8:56           ` Greg Kroah-Hartman
@ 2015-04-15 11:06             ` One Thousand Gnomes
  2015-04-15 16:00               ` Rik van Riel
  0 siblings, 1 reply; 316+ messages in thread
From: One Thousand Gnomes @ 2015-04-15 11:06 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jiri Kosina, Andy Lutomirski, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

> To quote the email that he wrote:
> 	The reason is that dbus views the world in a stateful way
> 	assuming that connections, and name ownership, can be tracked
> 	reliably.  This is different from say http, and it's one reason
> 	that people used to Internet-oriented protocols find dbus
> 	strange.
> 
> I'm one of those "people used to internet-oriented protocols", and I bet
> that almost all of us kernel developers also fall into that category, as
> the kernel for the most part, is one big tool to help implement those
> Internet-oriented protocols :)

I worked on protocols with state. I suffered X.25, X.29, coloured book,
ISDN. It's a completely *crap* model. It has unfixable reliability
problems. It has unfixable flow control problems. The only thing it buys
you is the ability to have more traffic in flight between end points than
you have transient memory for at the endpoints.

You don't need a grand unified state to track service locations and access
(ie names), which is fortunate or we'd be rebooting the internet and
all attached computers all the time.

> The very history of D-Bus, where it came from, who is now using it, what
> happened to all of the other proposed solutions in this area, is worth
> examining if you are interested in it.  This type of protocol solves a

History is why you got where you did. The history of Windows 98 explains
how they got there. It doesn't mean that continuing the same mistake is a
good idea.

> embedded systems, desktops, you name it.  All languages have bindings
> for it, and it's the underpinning of a modern Linux stack.  For us to

Everything used to have just a choice of COBOL or FORTRAN bindings. That
was not a good reason to continue to program the world in either of them.

> that anyone here does either.  In the many years I've spent working on
> this, dbus has seemed to be odd, and strange, to the way that the kernel
> has normally worked, because it is.  And that's not a bad thing, it's
> just different, and for us to support real needs and requirements of our
> users, is the requirement of the Linux kernel.

There are I think a set of intertwined problems here

- An efficient delivery system for multicast messages delivered locally
  (be that MPI, dbus whatever - it's not "dbus or nothing")

- A kernel side dynamic namespace to describe what goes where

- A kernel side security model to describe who may receive what, and
  which additional information/tags/cred info

- Something that provides state to stuff that needs it (and probably
  belongs in userspace - dbus name service etc)

- Something that maps dbus and other models onto the kernel security
  model (and we have tools like EBPF which are very powerful)

- Something that maps the kernel layer onto models like MPI-3

> Now if there are technical problems or insecurities in the proposed code
> submission, wonderful, please let me know and I'll be glad to work to
> address them.  But let's just drop the whole "oooh, look, D-Bus is
> horrible looking, we can't support that!", is not a valid justification.

We can however leave it in userspace until we understand the right small
clean way to support it and other needs. At the moment for example
cluster people can't really use this stuff because it's not network aware,
and HPC people can't use it because it's got dbus hardwired into it so
can't speak MPI-3 and the like even though MPI 3 has similar concepts
around DPM, as well as having proper models for parallelism and
collective operations that are lacking in dbus.

If the userspace folks choose to continue to implement dbust over it but
the kernel layer is clean and generic then all is good, because someone
can replace dbust with something better. If it's got dbust hard wired into
it then it's a complete mess.

Alan

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  8:48           ` Greg Kroah-Hartman
  2015-04-15  9:00             ` Richard Weinberger
@ 2015-04-15 11:25             ` One Thousand Gnomes
  2015-04-15 13:20               ` Borislav Petkov
  2015-04-15 15:45               ` Steven Rostedt
  1 sibling, 2 replies; 316+ messages in thread
From: One Thousand Gnomes @ 2015-04-15 11:25 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Richard Weinberger, Andy Lutomirski, Al Viro, Eric W. Biederman,
	Linus Torvalds, Andrew Morton, Arnd Bergmann, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

> Look, us kernel developers only work on one huge, multithreaded, global
> state binary.  Our experience in multi-application interactions with
> shared state and permission requirements is usually quite limited.  If
> you don't trust the developers of those programs outside the kernel,
> don't use them, there are still distros out there that don't require
> them.

Speak for yourself. There are a lot of us here who work and have worked
on low level messaging, on networking, on clusters and on things like
distributed shared memory, infiniband etc. I've worked on networks,
including broken stateful protocols, I've maintained and developed
internet and ISDN router code, I've worked with message passing realtime
systems.

Equally the folks who wrote dbus generally also know sweet fa about
writing a kernel and maintaining it for 25 years. Gtk is on its 3rd
completely incompatible instance (and has incompatibilities even within
major versions), Gnome is on its third major incompatible release -
closer would be to say at least the "second project with the same name",
and neither are as old as the kernel.

dbus is not an appropriate design for a kernel messaging layer for a
variety of reasons. That's not to say dbus shouldn't be able to use a
fast kernel messaging layer, or that one shouldn't exist.

dbus is basically a very large very specialized and somewhat flawed
policy engine on top of what should be simple messaging. The two need
splitting apart.

Abstract low level messaging layers are not a new concept. V7 unix had
one experimentally. It's about getting the separation right.

IMHO that probably involves getting the right people in the right place
together - dbus designers, MPI and realtime people, kernel folks and
possibly also some of the hardware messaging folk.

In filesystem terms

- stop writing a dbus only file system
- figure out what a messaging "vfs" looks like
- figure out what a clean low level kernel model looks like
- figure out what has to be where to put the policy in userspace

What might also be worth review is how much dbus traffic actually ought to
be an object store implemented say with tmpfs and inotify type
functionality (or extensions of that) so that you can
set/read/enumerate/get change notifications on properties.
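
A rough sketch of that idea, assuming properties are plain files in a
directory (a tmpfs mount in practice) and that change notification is
done with inotify. The helper names prop_set(), prop_get(), prop_watch()
and prop_wait() are hypothetical and come from no existing project; only
the inotify and file calls are real. Enumeration would be an ordinary
readdir() over the directory and is left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Build "<dir>/<name>" into path; returns 0 or -ENAMETOOLONG. */
static int prop_path(char *path, size_t size, const char *dir, const char *name)
{
    int n = snprintf(path, size, "%s/%s", dir, name);
    return (n < 0 || (size_t)n >= size) ? -ENAMETOOLONG : 0;
}

/* Set a property by writing its value to <dir>/<name>.
 * A real store would write a temp file and rename() it so that
 * readers never observe a partially written value. */
static int prop_set(const char *dir, const char *name, const char *value)
{
    char path[PATH_MAX];
    int err = prop_path(path, sizeof(path), dir, name);
    if (err < 0)
        return err;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -errno;
    size_t len = strlen(value);
    ssize_t w = write(fd, value, len);
    if (w < 0)
        err = -errno;
    else if ((size_t)w != len)
        err = -EIO;
    if (close(fd) < 0 && err == 0)
        err = -errno;
    return err;
}

/* Read a property into buf (NUL terminated); returns its length or -errno. */
static ssize_t prop_get(const char *dir, const char *name, char *buf, size_t size)
{
    char path[PATH_MAX];
    if (size == 0 || prop_path(path, sizeof(path), dir, name) < 0)
        return -ENAMETOOLONG;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -errno;
    ssize_t r = read(fd, buf, size - 1);
    int saved = errno;
    close(fd);
    if (r < 0)
        return -saved;
    buf[r] = '\0';
    return r;
}

/* Start watching a property directory; returns an inotify fd or -errno. */
static int prop_watch(const char *dir)
{
    int ifd = inotify_init1(IN_CLOEXEC);
    if (ifd < 0)
        return -errno;
    if (inotify_add_watch(ifd, dir, IN_CLOSE_WRITE) < 0) {
        int saved = errno;
        close(ifd);
        return -saved;
    }
    return ifd;
}

/* Block until one property changes and copy its name out. */
static int prop_wait(int ifd, char *name, size_t size)
{
    char buf[sizeof(struct inotify_event) + NAME_MAX + 1]
        __attribute__((aligned(__alignof__(struct inotify_event))));
    ssize_t n = read(ifd, buf, sizeof(buf));
    if (n < 0)
        return -errno;
    if ((size_t)n < sizeof(struct inotify_event))
        return -EIO;
    const struct inotify_event *ev = (const struct inotify_event *)buf;
    if (ev->len == 0)
        return -EIO;
    snprintf(name, size, "%s", ev->name);
    return 0;
}
```

A watcher would call prop_watch() once and then loop on prop_wait() and
prop_get(). Access control in this sketch is nothing more than the file
permissions on the directory.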

Alan

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  9:44                     ` Borislav Petkov
@ 2015-04-15 11:40                       ` Greg Kroah-Hartman
  2015-04-15 13:03                         ` Borislav Petkov
                                           ` (2 more replies)
  0 siblings, 3 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15 11:40 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Richard Weinberger, Andy Lutomirski, Al Viro, Eric W. Biederman,
	Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 11:44:11AM +0200, Borislav Petkov wrote:
> On Wed, Apr 15, 2015 at 11:27:13AM +0200, Greg Kroah-Hartman wrote:
> > On Wed, Apr 15, 2015 at 11:21:49AM +0200, Borislav Petkov wrote:
> > > On Wed, Apr 15, 2015 at 11:20:34AM +0200, Greg Kroah-Hartman wrote:
> > > > > We're all forced to use cgroups, systemd, udev unless we want to have busybox
> > > > > as userland. That's a fact.
> > > > 
> > > > Is that a problem?
> > > 
> > > I'm amazed that you're really actually asking that question :-(
> > 
> > Really?  Why can't userspace rely on the features that the kernel
> > provides them?
> 
> Userspace can do whatever it wants. As long as I'm not being *forced* to
> do what userspace thinks is the right thing.
> 
> It seems to me that since that whole systemd* debacle started, we're
> forgetting the choice aspect.

What "choice" aspect?  Surely you aren't going to make the "Linux is
about choice" argument are you?

> And dammit, I want my choice. I want to be able to choose what I'm
> running. Not run what someone else thought what would be good for me to
> run. If I wanted that, I'd long switched to windoze or äbble.

Oh crap, you went there :)

Take a look at http://www.islinuxaboutchoice.com/ please.

And yes, you can take Linux (the kernel) and do whatever you want with
it (look at Android for an example of no existing userspace code, just
the kernel and everything else new for a "choice".)

You have to trust someone to help make your system work together in a
unified way.  If you can't trust your distro's engineers, then either
start your own distro, or only run busybox on top of a kernel.  You
really don't have much other "choice" than that :)

So stop making this discussion be about "oh those horrid systemd
developers, I don't want their code as my init system" as that's not
what any of this is about at all.  It's about the patches being
proposed, and the API involved in it.  Please stick to that.

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  9:35                         ` Borislav Petkov
@ 2015-04-15 11:45                           ` Greg Kroah-Hartman
  0 siblings, 0 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15 11:45 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Al Viro, Andy Lutomirski, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
	Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 11:35:07AM +0200, Borislav Petkov wrote:
> On Wed, Apr 15, 2015 at 10:44:40AM +0200, Greg Kroah-Hartman wrote:
> > If you really don't like userspace using features the kernel provides
> > you, well, there's nothing I can say that will change that odd feeling,
> > sorry.
> 
> Are you even reading what people are saying?

You aren't reading the patches :)

> I don't like the mandatory(!) aspect of this, which it will eventually
> become. There is this thing called "choice", remember?

See my other response about that.

> > Really? Who in that MAINTAINERS file entry do you not trust?
> 
> The fact that you're still pushing for this current design *in the face*
> of people pointing out serious design flaws with this makes me not
> really trust you.

Please discuss these "serious design flaws".  I have responded to all of
the ones that I have seen so far in this thread.  And in all of the
other threads since this patch series was first posted months ago.  I
would love to discuss the code, so please, let's do that.

> > I don't understand what this means. If you have a technical reason
> > for why this code shouldn't be merged, great, please let me know and
> > we can work to address that. Andy and Al have spent time reviewing
> > and giving us comments, and that's wonderful and valuable and is
> > why I treat their comments seriously. If you are interested in the
> > code, please review it,
> 
> Yeah, I took a brief look at the code. It is overcomplicated.
> 
> If I were to review it properly, I'd ask you to split it in small
> patchsets. Hell, I'm pretty sure you would do the same for code you
> don't know if you were in my shoes.

It has been split into small patchsets, see the original postings.

And really, 13k lines of code is not all that big.  We review driver
submissions larger than that all the time.  Remember, your USB host
controller driver is bigger than that.

> Also, considering the complexity of this patchset, it doesn't have
> a single Reviewed-by by an external party. If this were any other
> submission, it would've been kicked to the curb a long time ago.

Please, review it, I would love for others to do so, and have been
asking for that since the beginning of this whole process months ago.

And I'd like to thank Andy and others for doing that.  Based on their
review comments we have changed the api, redone the infrastructure, and
modified lots of different things.  The code has massively changed for
the better because of this process.  I'm not asking for it to stop, I'm
asking for it to be merged now as everyone seems to have not had any
more comments on the code anymore, other than Andy's specific comments,
and everyone else's vague rants.

I'm addressing Andy's comments, and I would love to address yours, if
you actually made any technical ones here.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 10:37                       ` One Thousand Gnomes
@ 2015-04-15 11:49                         ` Greg Kroah-Hartman
  2015-04-15 12:03                           ` One Thousand Gnomes
                                             ` (2 more replies)
  0 siblings, 3 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15 11:49 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Jiri Kosina, Al Viro, Borislav Petkov, Andy Lutomirski,
	Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Wed, Apr 15, 2015 at 11:37:27AM +0100, One Thousand Gnomes wrote:
> On Wed, 15 Apr 2015 00:39:22 +0200 (CEST)
> Jiri Kosina <jkosina@suse.cz> wrote:
> 
> > On Tue, 14 Apr 2015, Greg Kroah-Hartman wrote:
> > 
> > > I don't understand.  You can not like the D-Bus model (and accordingly
> > > the X11 model), 
> > 
> > I thought that the general hatred level of the X11 "model" and the 
> > > protocol led to all the efforts to reimplement this properly ... in 
> > userspace (for example Wayland, right?).
> > 
> > I don't think anyone was ever seriously suggesting "X11 model is broken, 
> > so let's push it to kernel" ... ?
> 
> The X11 model is *nothing* to do with the dbus/kdbus model. X11 does
> properties by attaching them to windows. Those properties can be
> monitored for changes and they can be queried. Setting them is
> asynchronous, querying them is sync or with the newer event based
> libraries can be async. X11 properties are network safe, handled through
> the same X11 authority as everything else. Two apps can happily run on
> different systems sharing a display over the network and sharing and
> responding to changes in X11 properties - and it just works.
> 
> The Gnome people tried to re-invent X11 properties and embedding badly
> with CORBA, then with dbus, despite the fact the Andrew system could
> already do it really fast and cleanly even before Gnome was thought of.
> 
> There is no comparison between the elegance of X11 property setting and a
> chunk of proposed kernel code that is half the size of a tiny X server!

Hey, take that up with Havoc, he made the comparison :)

> The dbus model is also flawed in a load of other ways in user space
> because message handling in the hands of people with no concept of
> systemic performance analysis just leads to disaster. One of the big
> reasons dbus is so "slow" isn't that dbus is "slow", it's that the
> crapware on top of it makes *thousands* of dbus queries.

There's the issue of thousands of dbus queries, and then there's the
issue that making those queries takes a measurable amount of time.  We
can fix the latter one, the first one, well, not so much, but we can
provide the resources for them to make a faster system if they want to.

> If you must do it in kernel why not use the Android binder - it's awful,
> broken, and dubiously secure, but at least we'd still only have one awful,
> broken dubiously secure rpc/property layer in kernel.

Binder does not match up to the dbus model at all, I've written about
this in the past, and can dig it up again if you want.  And, there is
active research in moving the binder userspace library onto the kdbus
code base, allowing the binder kernel driver to be removed one day.
That would be a good thing to have happen, but I'm not holding my
breath for it.  Using it the other way around isn't going to work.

> "It's the issue that a stateful bus is required for
> applications that is the main point I'm trying to get across."
> 
> That would be the "if dbus crashes I have to reboot" design flaw of
> Gnome and friends. The only state you need is beyond the endpoints. It's a
> message passing system. If you think message passing needs state then I'd
> take a look at the internet. State belongs in the end points.

The internet model with state in the endpoints doesn't always transfer
properly to local applications, see Havoc's email for the details about
that.

> It's telling that I can lose and recover my internet connection without
> rebooting but not my desktops internal messaging.

Yes, as those are totally different things, let's not mix the issue up
here please.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  9:28                 ` Martin Steigerwald
@ 2015-04-15 11:52                   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15 11:52 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Wed, Apr 15, 2015 at 11:28:36AM +0200, Martin Steigerwald wrote:
> Am Mittwoch, 15. April 2015, 11:02:12 schrieb Greg Kroah-Hartman:
> > > > > And yes, I think its good not to force just about any userspace
> > > > > idea
> > > > > into the kernel.
> > > >
> > > > 
> > > >
> > > > Do you have any technical objections to the patch as proposed?
> > >
> > > 
> > >
> > > If I had, I would have written it. I explained already that I see
> > > that 
> > > kernel developers have strong technical objections with kdbus. And
> > > that I  think it is important to acknowledge it, instead of telling
> > > them, that the API is required from userspace, userspace people know
> > > what they do, and they should just go away with their concerns.
> > >
> > > 
> > >
> > > > That's at least how I received quite some of your responses.
> > >
> > > 
> > >
> > > Well and I raised an eyebrow on the busname matching rules and the 
> > > > capability stuff. Yet, I didn't comment on it, cause I didn't look at
> > > it  in-depth. I just ask you to take those seriously who did.
> > 
> > I take technical comments very seriously, where have I not?  If you have
> > technical reasons why the current implementation has problems, please
> > let me know, and I will be glad to address them.
> 
> From what I read you basically answered all technical comments like in:
> 
> The dbus API is like it is for a very good reason, everyone is using it 
> and everyone agrees. Capabilities are used in userspace for good reason 
> and so on.
> 
> But I see, here, not everyone does. 
> 
> Most of your answers didn't seem to address the concerns raised of having 
> this in the *kernel*. Especially the security concerns.

I have responded to the security concerns, please don't say that I did not.

> That's what I meant with "And yes, I think it's good not to force just about 
> any userspace into the kernel". I think arguing with this is how userspace 
> does it pattern, even if it truly is for a very good reason, is not 
> sufficient as argument for having it in the kernel.
> 
> I am just looking at the argumentative pattern here. If other kernel 
> developers complain about how hard it is to review and wrap their mind 
> around the kdbus patches… I am scared at just trying to understand the 
> patches. So no technical complaints from me. I did not nack it nor do I 
> see myself in the position to nack it.

Please take the time to read it, 13k lines isn't much.  To not read the
code and yet complain about the code is total nonsense.

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-14 18:57         ` Andy Lutomirski
  2015-04-14 19:23           ` Greg Kroah-Hartman
@ 2015-04-15 12:00           ` Greg Kroah-Hartman
  2015-04-15 12:09             ` Jiri Kosina
  1 sibling, 1 reply; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15 12:00 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

[Back to the capability discussion]

On Tue, Apr 14, 2015 at 11:57:22AM -0700, Andy Lutomirski wrote:
> >> Then I'll have to find a way to embolden my NACK further.  My point is
> >> that capturing garbage like cmdline and capabilities (again, that
> >> latter part is completely unacceptable under any circumstances
> >> whatsoever) on behalf of *all* senders is a disaster.  If it's
> >> optional, then I can at least hope that userspace will honor the
> >> optionality and let everything turn it off.  If it's mandatory, then
> >> kdbus is just unsafe to use to send messages to untrusted parties.
> >
> > It's opted in by the receiving peer if the task implementing a service
> > wants to access these pieces of information.  It is optional, and the
> > documentation clearly states that userspace should cope with this, and
> > also, when they are available we make sure to provide the correct
> > race-free information.
> >
> > As said many times before, an application can do so already today with
> > information from other API file systems, so why is this suddenly a
> > problem when kdbus optionally offers the exact same information along
> > with each transmitted message?  Yes, we all "hate" capabilities, but
> > userspace uses them, and gets access to them all the time through the
> > POSIX apis (capget(), cap_get_pid(), capgetp(), etc.) and through
> > /proc/pid/status.  They are something that we have to support and handle
> > properly.
> >
> > In the very first submission of kdbus, we stated that we want to allow
> > userspace methods to access these same bits to be able to make decisions
> > about permissions.  And to do so in a race-free manner, which is very
> > hard, if not almost impossible, to do so from userspace alone.
> >
> > For instance, if a task has CAP_NET_ADMIN set, we can use that
> > information in order to allow or disallow certain actions to be taken by
> > a privileged process.  Or, if a client that has the capability to call
> > reboot (i.e. have CAP_SYS_BOOT) makes the D-Bus call to reboot the
> > system, the system daemon listening for that message knows that yes, at
> > the time that the client made that call, it really did have that
> > capability so it is ok to actually reboot the system.
> >
> > Instead of trying to use SCM_CREDENTIALS to get the pid and another
> > round of cap_get_pid() and the like, all of which are susceptible to
> > racing and all sorts of other horrors, that are insecure, we can provide
> > this information in an atomic, and secure way.
> 
> /me suppresses a long string of expletives.
> 
> Please point me at the code that does this with caps.  It's WRONG in
> userspace and it's WRONG in the kernel.  I want to know what code that
> runs on my system does this so I can send the appropriate bug reports
> and get it fixed.  I think the RHEL crowd at least will take it
> seriously when I tell them that this is a security hole.

Look at how polkit and login manager work.  Or anything that uses
SCM_CREDENTIALS.  Also I think PAM does odd things with credentials, but
it's been a long time since I looked at any PAM code, I could be wrong.
Also look at users of SO_PEERCRED, as those are used in places as well,
but you know all about those.

Also look at programs that make those capability calls, they are
obviously using them for some reason, right?  Nothing we can do about
them, and it's not the main issue here at all, sorry for the
side-discussion.
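
For reference, the two-step userspace lookup being argued about here
(get the peer's pid from the socket, then ask for that pid's
capabilities) looks roughly like the sketch below. This is a minimal
illustration and not code taken from polkit, logind or any other
project; the helper names peer_cred(), pid_has_cap() and
peer_has_cap_racy() are hypothetical. It uses the raw capget(2) syscall
rather than libcap's cap_get_pid() so that it builds without extra
libraries.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/capability.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Step 1: ask the kernel who is on the other end of a connected
 * AF_UNIX socket. The credentials are those recorded when the
 * connection (or socketpair) was set up. */
static int peer_cred(int fd, struct ucred *out)
{
    socklen_t len = sizeof(*out);

    if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, out, &len) < 0)
        return -errno;
    return 0;
}

/* Step 2: look up the effective capability set of a pid.
 * Returns 1 if cap is set, 0 if it is not, -errno on failure. */
static int pid_has_cap(pid_t pid, int cap)
{
    struct __user_cap_header_struct hdr = {
        .version = _LINUX_CAPABILITY_VERSION_3,
        .pid = pid,
    };
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3] = { { 0 } };

    assert(cap >= 0 && cap <= CAP_LAST_CAP);
    if (syscall(SYS_capget, &hdr, data) < 0)
        return -errno;
    return (data[CAP_TO_INDEX(cap)].effective & CAP_TO_MASK(cap)) != 0;
}

/* The combined check. Between the two steps the peer can exit, exec
 * something else, change its capabilities, or have its pid reused by
 * an unrelated process, so the answer describes "whoever owns that
 * pid right now", not "the sender at the time it sent the message". */
static int peer_has_cap_racy(int fd, int cap)
{
    struct ucred cred;
    int ret = peer_cred(fd, &cred);

    if (ret < 0)
        return ret;
    /* race window is here */
    return pid_has_cap(cred.pid, cap);
}
```

A service would call something like peer_has_cap_racy(client_fd,
CAP_SYS_BOOT) before acting on a request. Whether userspace should be
making decisions on capabilities at all is a separate question; the
sketch only shows where the race sits.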

> > The kernel today, and userspace, relies on capabilities all the time
> > (i.e. almost every syscall), how are they something that is somehow not
> > valid to use and support?
> 
> No.  The *kernel* relies on caps.  Userspace should not.

Userspace uses caps to have the kernel do things.  Or not do things.  If
not, why do we have things like SCM_CREDENTIALS in the first place?

> > And of course, as Eric will point out, capabilities are not
> > translatable across user namespaces, which is a problem.  Because of
> > this, we dispose of that piece of metadata information when a message
> > crosses a user namespace boundary.  This is the right thing to do, which
> > is not the case for almost all other kernel apis which report bogus
> > capabilities when user namespaces are crossed.
> 
> The right thing to do is to not use capabilities for userspace stuff.

Again, userspace needs them in order to have the kernel do things for
userspace as needed.  Look at the Tizen example in the first email,
where they had to use SCM_CREDENTIALS, and all of the speed/latency
issues that this resulted in.

> > So we implemented this correctly, and somehow that is a feature so bad
> > that both you and Eric think the whole baby should be thrown out?  How
> > else should this be implemented?
> 
> It shouldn't be implemented.

Great, so can we also drop those POSIX functions and the /proc/
information as well?  I didn't think so :)

> > As documented in the original email on this thread, Tizen wants to use
> > this, as it solves a real need that they have.  Their workarounds
> > involve using custom UDS sockets, but the latency involved is horrid and
> > unacceptable.  Using a kdbus message solves this issue for them,
> > allowing UI rendering to work properly/quickly.
> >
> > Again, capabilities are something we all require and rely on today,
> > passing the current capability on to a recipient isn't a way to raise
> > privileges at all, but rather, properly determine if they are present
> > at sending time, if wanted.  How does that create an insecure system?
> > What am I missing that is so bad here with the design we have?
> 
> That, even if the implementation could be made to be useful and
> correct, capabilities refer to privileges wrt the kernel, not
> userspace.  They're not the right bit of policy to look at here.

So what is the right bit of policy to look at then?

> For example, the thing that should make it possible to run 'systemctl
> reboot' or whatever is not CAP_SYS_BOOT, because CAP_SYS_BOOT is the
> permission to hard reboot the system immediately, and that's not what
> 'systemctl reboot' is for.

'systemctl reboot' calls a bunch of other things to determine if you
have local access to the machine, or permissions to reboot the machine
(i.e. CAP_SYS_BOOT), and other things that polkit might allow you to do,
and then, it decides to reboot or not.  That happens today, right?  I
don't understand the argument here.

confused,

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 11:49                         ` Greg Kroah-Hartman
@ 2015-04-15 12:03                           ` One Thousand Gnomes
  2015-04-15 12:41                             ` Greg Kroah-Hartman
  2015-04-15 12:55                           ` Al Viro
  2015-04-15 17:33                           ` Steven Rostedt
  2 siblings, 1 reply; 316+ messages in thread
From: One Thousand Gnomes @ 2015-04-15 12:03 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jiri Kosina, Al Viro, Borislav Petkov, Andy Lutomirski,
	Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

> > There is no comparison between the elegance of X11 property setting and a
> > chunk of proposed kernel code that is half the size of a tiny X server!
> 
> Hey, take that up with Havoc, he made the comparison :)

And it concerns me you blindly repeat it without realising it's wrong.

> > The dbus model is also flawed in a load of other ways in user space
> > because message handling in the hands of people with no concept of
> > systemic performance analysis just leads to disaster. One of the big
> > reasons dbus is so "slow" isn't that dbus is "slow", it's that the
> > crapware on top of it makes *thousands* of dbus queries.
> 
> There's the issue of thousands of dbus queries, and then there's the
> issue that making those queries takes a measurable amount of time.  We
> can fix the later one, the first one, well, not so much, but we can
> provide the resources for them to make a faster system if they want to.

If you fix the thousands of queries problem do you need kernel help at
all?

> The internet model with state in the endpoints doesn't always transfer
> properly to local applications, see Havoc's email for the details about
> that.

URL ?

(note how beautifully btw the stateless network and the URL string will
become a reference to state)

> > It's telling that I can lose and recover my internet connection without
> > rebooting but not my desktops internal messaging.
> 
> Yes, as those are totally different things, let's not mix the issue up
> here please.

They are *NOT* different things. They are fundamental properties of the
underlying architecture. I worked on stateful networks and still have
the scars. It is a fundamental property of stateful network that every
time any key component goes castors up you lose the lot. It is a fairly
fundamental property of stateless networks that equipment going castors
up has no material impact on the network

The internet is built upon three fundamental breakthroughs in technology

- That stateless networks scale and can be reliable while stateful ones
  cannot scale and cannot be fixed to do so

- That flow control is possible over a stateless network

- That efficient data routing is possible over a stateless network

Those are absolutely critical parts of any network or messaging
implementation.

Alan

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 12:00           ` Greg Kroah-Hartman
@ 2015-04-15 12:09             ` Jiri Kosina
  2015-04-15 12:18               ` One Thousand Gnomes
  2015-04-15 12:27               ` Greg Kroah-Hartman
  0 siblings, 2 replies; 316+ messages in thread
From: Jiri Kosina @ 2015-04-15 12:09 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni

On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote:

> 'systemctl reboot' calls a bunch of other things to determine if you
> have local access to the machine, or permissions to reboot the machine
> (i.e. CAP_SYS_BOOT), and other things that polkit might allow you to do,
> and then, it decides to reboot or not.  That happens today, right?  I
> don't understand the argument here.

And what exactly is the argument that this is the way it should be 
implemented?

Why can't it just rely on the kernel to provide final answer to "to reboot 
or not to reboot, that is the question"?

At the end of the day, it's the kernel that decides whether it will really 
ultimately ask the platform to reboot.

If, for whatever reason (which might be completely invisible to userspace) 
kernel decides not to do so, userspace has to be able to recover from such 
failure in any case.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 12:09             ` Jiri Kosina
@ 2015-04-15 12:18               ` One Thousand Gnomes
  2015-04-15 12:30                 ` Greg Kroah-Hartman
  2015-04-15 12:27               ` Greg Kroah-Hartman
  1 sibling, 1 reply; 316+ messages in thread
From: One Thousand Gnomes @ 2015-04-15 12:18 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Greg Kroah-Hartman, Andy Lutomirski, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen,
	linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni

On Wed, 15 Apr 2015 14:09:24 +0200 (CEST)
Jiri Kosina <jkosina@suse.cz> wrote:

> On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote:
> 
> > 'systemctl reboot' calls a bunch of other things to determine if you
> > have local access to the machine, or permissions to reboot the machine
> > (i.e. CAP_SYS_BOOT), and other things that polkit might allow you to do,
> > and then, it decides to reboot or not.  That happens today, right?  I
> > don't understand the argument here.

The first problem with that is that if you run the capability model in
the kernel combined with our distributions through any kind of formal
analysis it'll come out with more holes than a roll of wire netting.

There are lots of capability handling bugs that allow you to get one
capability from another where it should not be possible.  Linux
capabilities were a little ad-hoc and a "neat idea" in their day.

It's not how anyone would do them now. At best they are ok for little
things like network raw access in ping/traceroute.

That's an implementation detail. If we were to adopt something like
capsicum the stuff you pass would look way different and the model would
potentially work.

> And what exactly is the argument that this is the way it should be 
> implemented?

For me the fact that capabilities are known legacy and broken, and the
model will change. Better would be to just pass some "cookie" that can be
used to ask "is the sender allowed to X" via the LSM modules.

That futureproofs the portability I think - and is also actually more
powerful anyway.
 
> Why can't it just rely on the kernel to provide final answer to "to reboot 
> or not to reboot, that is the question"?

It can, however you may want userspace to assert privileges and reboot
even though the user doesn't have the right powers directly (think about
mundane things like ctrl-alt-del or the reboot button on a desktop).

Alan

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 12:09             ` Jiri Kosina
  2015-04-15 12:18               ` One Thousand Gnomes
@ 2015-04-15 12:27               ` Greg Kroah-Hartman
  1 sibling, 0 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15 12:27 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 02:09:24PM +0200, Jiri Kosina wrote:
> On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote:
> 
> > 'systemctl reboot' calls a bunch of other things to determine if you
> > have local access to the machine, or permissions to reboot the machine
> > (i.e. CAP_SYS_BOOT), and other things that polkit might allow you to do,
> > and then, it decides to reboot or not.  That happens today, right?  I
> > don't understand the argument here.
> 
> And what exactly is the argument that this is the way it should be 
> implemented?

I can't answer that, discuss it with the developers of that userspace
code please.

> Why can't it just rely on the kernel to provide final answer to "to reboot 
> or not to reboot, that is the question"?

Usually you want to do a few things before telling the kernel to reboot,
like unmount all filesystems and the like :)

Anyway, we are getting away from the code at hand, please, let's discuss
that.

greg k-h

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 12:18               ` One Thousand Gnomes
@ 2015-04-15 12:30                 ` Greg Kroah-Hartman
  0 siblings, 0 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15 12:30 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Jiri Kosina, Andy Lutomirski, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 01:18:28PM +0100, One Thousand Gnomes wrote:
> On Wed, 15 Apr 2015 14:09:24 +0200 (CEST)
> Jiri Kosina <jkosina@suse.cz> wrote:
> 
> > On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote:
> > 
> > > 'systemctl reboot' calls a bunch of other things to determine if you
> > > have local access to the machine, or permissions to reboot the machine
> > > (i.e. CAP_SYS_BOOT), and other things that polkit might allow you to do,
> > > and then, it decides to reboot or not.  That happens today, right?  I
> > > don't understand the argument here.
> 
> The first problem with that is that if you run the capability model in
> the kernel combined with our distributions through any kind of formal
> analysis it'll come out with more holes than a roll of wire netting.
> 
> There are lots of capability handling bugs that allow you to get one
> capability from another where it should not be possible.  Linux
> capabilities were a little ad-hoc and a "neat idea" in their day.

"formal analysis"?  Heh, yeah, I know all about that, and really, that's
not something we can do anything about here.

> It's not how anyone would do them now. At best they are ok for little
> things like network raw access in ping/traceroute.
> 
> That's an implementation detail. If we were to adopt something like
> capsicum the stuff you pass would look way different and the model would
> potentially work.

True, the capsicum developers seem to have gone quiet on us :(

> > And what exactly is the argument that this is the way it should be 
> > implemented?
> 
> For me the fact that capabilities are known legacy and broken, and the
> model will change. Better would be to just pass some "cookie" that can be
> used to ask "is the sender allowed to X" via the LSM modules.
> 
> That futureproofs the portability I think - and is also actually more
> powerful anyway.

Yes, that would work, but that kind of sounds like the same thing we
have today, just with a different name :)

thanks,

greg k-h

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  9:09                           ` Greg Kroah-Hartman
@ 2015-04-15 12:36                             ` Al Viro
  2015-04-15 13:13                               ` Greg Kroah-Hartman
  2015-04-15 16:47                             ` Steven Rostedt
  1 sibling, 1 reply; 316+ messages in thread
From: Al Viro @ 2015-04-15 12:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jiri Kosina, Borislav Petkov, Andy Lutomirski, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 11:09:48AM +0200, Greg Kroah-Hartman wrote:

> I've asked for it, but finding people to review code is hard, as you
> know.  It's only 13k lines long, smaller than a serial port driver (my
> unit of code review), so it's not all that big.
> 
> It's smaller than the USB3 host controller driver as well, and very few
> people ever reviewed that beast :)
> 
> > For something that's potentially such a core mechanism as a completely 
> > new, massively-adopted IPC, this does send a warning signal.
> 
> If you know of a way to force others to review code, please let me know.

Have it in a less nasty state, perhaps?  Random question:

al@duke:~/linux/trees/vfs$ git grep -n -w kdbus_node_idr_lock
ipc/kdbus/node.c:237:static DECLARE_RWSEM(kdbus_node_idr_lock);
ipc/kdbus/node.c:340:   down_write(&kdbus_node_idr_lock);
ipc/kdbus/node.c:344:   up_write(&kdbus_node_idr_lock);
ipc/kdbus/node.c:444:           down_write(&kdbus_node_idr_lock);
ipc/kdbus/node.c:452:           up_write(&kdbus_node_idr_lock);

Do you see anything wrong with that?  Or with things like that:
                mutex_lock(&pos->lock);
                v_pre = atomic_read(&pos->active);
                if (v_pre >= 0)
                        atomic_add_return(KDBUS_NODE_BIAS, &pos->active);
                else if (v_pre == KDBUS_NODE_NEW)
                        atomic_set(&pos->active, KDBUS_NODE_RELEASE_DIRECT);
                mutex_unlock(&pos->lock);
What are the locking rules for ->active/->waitq/->lock?  Are those the
outermost thing in the hierarchy?  Or is that dependent on the node location?
It sure as hell is outside of (at least) ->mmap_sem (by way of
kdbus_conn_connect() establishing that ->active/->waitq is outside of
->conn_rwlock, which due to kdbus_bus_broadcast() nests outside of anything
taken by kdbus_meta_proc_collect(), which includes ->mmap_sem) and that alone
brings in a lot...

Document your goddamn locking, would you?  It *IS* new code, and you, as you
say, had very few people working on it, so you don't have the excuses for
the mess existing in older parts of the tree.

Locking complexity in there is easily as bad as that of VFS sans the RCU fun;
sure, I can spend a week and (hopefully) document it for you, but I would
really prefer if you guys had done that.  And I *do* appreciate the comments
in node.c, but they are nowhere near enough.

Tracking the call chains in there and trying to derive the locking ordering
from those is quite a bit of work; _verifying_ that it matches the claimed
one would be expected from reviewers, but as it is you are asking to spend
a lot of effort to close the gaps in your documentation.  Sheesh...
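
As a rough illustration of what a documented, checkable lock order could
look like, here is a minimal userspace sketch using pthread mutexes.  The
rank names and the rank_*() helpers are hypothetical; the order only
mirrors the nesting derived above (node ->active outside ->conn_rwlock
outside ->mmap_sem) and is not kdbus's actual documented hierarchy.  In
the kernel, lockdep performs this kind of checking at runtime.

```c
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical lock ranks, outermost first.  A thread may only take a
 * lock whose rank is strictly greater than the innermost one it holds.
 */
enum lock_rank {
	RANK_NONE = 0,
	RANK_NODE_ACTIVE,	/* stands in for node ->active/->waitq */
	RANK_CONN_RWLOCK,	/* stands in for ->conn_rwlock */
	RANK_MMAP_SEM,		/* stands in for ->mmap_sem */
	RANK_MAX
};

struct ranked_lock {
	pthread_mutex_t mutex;
	enum lock_rank rank;
};

/* Per-thread stack of the ranks currently held, innermost last. */
static _Thread_local enum lock_rank held[RANK_MAX];
static _Thread_local int depth;

void rank_init(struct ranked_lock *l, enum lock_rank rank)
{
	pthread_mutex_init(&l->mutex, NULL);
	l->rank = rank;
}

/* Returns false, without locking, if the documented order would be violated. */
bool rank_acquire(struct ranked_lock *l)
{
	enum lock_rank top = depth ? held[depth - 1] : RANK_NONE;

	if (l->rank <= top)
		return false;
	pthread_mutex_lock(&l->mutex);
	held[depth++] = l->rank;
	return true;
}

/* Locks must be released innermost first; anything else is refused. */
bool rank_release(struct ranked_lock *l)
{
	if (!depth || held[depth - 1] != l->rank)
		return false;
	depth--;
	pthread_mutex_unlock(&l->mutex);
	return true;
}
```

With the order written down in one enum, a reviewer only has to verify
that each call site matches it, instead of deriving it from call chains.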

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 12:03                           ` One Thousand Gnomes
@ 2015-04-15 12:41                             ` Greg Kroah-Hartman
  2015-04-15 14:06                               ` One Thousand Gnomes
  0 siblings, 1 reply; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15 12:41 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Jiri Kosina, Al Viro, Borislav Petkov, Andy Lutomirski,
	Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Wed, Apr 15, 2015 at 01:03:54PM +0100, One Thousand Gnomes wrote:
> > > There is no comparison between the elegance of X11 property setting and a
> > > chunk of proposed kernel code that is half the size of a tiny X server!
> > 
> > Hey, take that up with Havoc, he made the comparison :)
> 
> And it concerns me you blindly repeat it without realising it's wrong.

It's a metaphor that makes sense to me given my limited knowledge of the
x11 protocol.  If it's wrong, ok, I'm willing to learn, but I think it's
still relevant here.

> > > The dbus model is also flawed in a load of other ways in user space
> > > because message handling in the hands of people with no concept of
> > > systemic performance analysis just leads to disaster. One of the big
> > > reasons dbus is so "slow" isn't that dbus is "slow", it's that the
> > > crapware on top of it makes *thousands* of dbus queries.
> > 
> > There's the issue of thousands of dbus queries, and then there's the
> > issue that making those queries takes a measurable amount of time.  We
> > can fix the latter one, the first one, well, not so much, but we can
> > provide the resources for them to make a faster system if they want to.
> 
> If you fix the thousands of queries problem do you need kernel help at
> all?

I've worked with developers of such systems, and no, they can't fix that
problem.  They are using "legacy" applications that they have to run on
some type of operating system, and really don't want to use legacy
operating systems anymore.  Those "legacy" oses provide a system bus
that allows them to send thousands of queries just fine, but when moving
to Linux, we don't have anything other than D-Bus, so their library is
ported to use it, and they have to handle their old applications that
need/want the zillions of messages.

Then they throw the thing on a very underpowered ARM processor and
complain about boot time being so slow, but that's a different issue...

> > The internet model with state in the endpoints doesn't always transfer
> > properly to local applications, see Havoc's email for the details about
> > that.
> 
> URL ?
> 
> (note how beautifully btw the stateless network and the URL string will
> become a reference to state)

Heh, yes, but there's very little state here:
	http://lists.freedesktop.org/archives/dbus/2015-April/016651.html

There's also a follow-on message from the current D-Bus maintainer:
	http://lists.freedesktop.org/archives/dbus/2015-April/016653.html

> > > It's telling that I can lose and recover my internet connection without
> > > rebooting but not my desktop's internal messaging.
> > 
> > Yes, as those are totally different things, let's not mix the issue up
> > here please.
> 
> They are *NOT* different things. They are fundamental properties of the
> underlying architecture. I worked on stateful networks and still have
> the scars. It is a fundamental property of a stateful network that every
> time any key component goes castors up you lose the lot. It is a fairly
> fundamental property of stateless networks that equipment going castors
> up has no material impact on the network.
> 
> The internet is built upon three fundamental breakthroughs in technology
> 
> - That stateless networks scale and can be reliable while stateful ones
>   cannot scale and cannot be fixed to do so
> 
> - That flow control is possible over a stateless network
> 
> - That efficient data routing is possible over a stateless network
> 
> Those are absolutely critical parts of any network or messaging
> implementation.

People take those stateless models and build stateful ones on top of
them, yes, it's great.  But you still need a stateful model somewhere in
order to be able to achieve many things (think a shopping cart
application).

Anyway, this is getting off-topic, there is very little "state" in the
kdbus kernel code here, other than a naming database that Havoc and
Simon explain the need for, and the normal lifecycle of kdbus "nodes"
(new, linked, active, inactive, drained, freed).

thanks,

greg k-h

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 11:49                         ` Greg Kroah-Hartman
  2015-04-15 12:03                           ` One Thousand Gnomes
@ 2015-04-15 12:55                           ` Al Viro
  2015-04-15 17:33                           ` Steven Rostedt
  2 siblings, 0 replies; 316+ messages in thread
From: Al Viro @ 2015-04-15 12:55 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: One Thousand Gnomes, Jiri Kosina, Borislav Petkov,
	Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 01:49:36PM +0200, Greg Kroah-Hartman wrote:
> > There is no comparison between the elegance of X11 property setting and a
> > chunk of proposed kernel code that is half the size of a tiny X server!
> 
> Hey, take that up with Havoc, he made the comparison :)

Let me get it straight - you swing the reference to his posting as damn nearly
the main argument, and yet you make _this_ reply when it gets questioned?
Seriously?

There's nothing wrong with "go read $PAPER, a lot of your questions are
addressed there", but only if you are ready to answer the questions and
objections from those who have read it.  "Hey, take that up with
$AUTHOR" doesn't cut it; try anything even remotely similar with e.g.
reviewers of academic paper and see where it ends up.

Havoc isn't submitting that thing; you are.  If you are not qualified to
defend your design and he is, try to talk him into doing that.

Frankly, the longer it goes, the less I like the picture.  It will be up to
Linus, of course, but IMO the whole situation seriously stinks. ;-/

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 11:40                       ` Greg Kroah-Hartman
@ 2015-04-15 13:03                         ` Borislav Petkov
  2015-04-15 15:41                         ` Steven Rostedt
  2015-04-15 19:04                         ` Martin Steigerwald
  2 siblings, 0 replies; 316+ messages in thread
From: Borislav Petkov @ 2015-04-15 13:03 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Richard Weinberger, Andy Lutomirski, Al Viro, Eric W. Biederman,
	Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 01:40:36PM +0200, Greg Kroah-Hartman wrote:
> So stop making this discussion be about "oh those horrid systemd
> developers, I don't want their code as my init system" as that's not
> what any of this is about at all.  It's about the patches being
> proposed, and the API involved in it.  Please stick to that.

Well, you went there by saying that I should simply accept systemd and
whatever other crap people are producing just because Linux is not about
choice. And I'm still amazed that you really and seriously think that -
you must've been drinking the systemd Kool-Aid for too long.

So to get back to kdbust: the design of this thing is flawed, it clearly
needs a lot more discussing and changes and it *absolutely* has no place
upstream in its current form as *no* *one* has reviewed that pile except
Andy and Eric to a certain degree. Oh and I haven't seen them lift their
NAKs yet...

You and I know that's not how stuff is upstreamed.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 12:36                             ` Al Viro
@ 2015-04-15 13:13                               ` Greg Kroah-Hartman
  0 siblings, 0 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15 13:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Jiri Kosina, Borislav Petkov, Andy Lutomirski, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 01:36:33PM +0100, Al Viro wrote:
> al@duke:~/linux/trees/vfs$ git grep -n -w kdbus_node_idr_lock
> ipc/kdbus/node.c:237:static DECLARE_RWSEM(kdbus_node_idr_lock);
> ipc/kdbus/node.c:340:   down_write(&kdbus_node_idr_lock);
> ipc/kdbus/node.c:344:   up_write(&kdbus_node_idr_lock);
> ipc/kdbus/node.c:444:           down_write(&kdbus_node_idr_lock);
> ipc/kdbus/node.c:452:           up_write(&kdbus_node_idr_lock);

Heh, that's a leftover from an older version, I'll go fix that up to be
a simple mutex, which is all that this is doing here anyway.

> Do you see anything wrong with that?  Or with things like that:
>                 mutex_lock(&pos->lock);
>                 v_pre = atomic_read(&pos->active);
>                 if (v_pre >= 0)
>                         atomic_add_return(KDBUS_NODE_BIAS, &pos->active);
>                 else if (v_pre == KDBUS_NODE_NEW)
>                         atomic_set(&pos->active, KDBUS_NODE_RELEASE_DIRECT);
>                 mutex_unlock(&pos->lock);
> What are the locking rules for ->active/->waitq/->lock?  Are those the
> outermost thing in the hierarchy?  Or is that dependent on the node location?
> It sure as hell is outside of (at least) ->mmap_sem (by way of
> kdbus_conn_connect() establishing that ->active/->waitq is outside of
> ->conn_rwlock, which due to kdbus_bus_broadcast() nests outside of anything
> taken by kdbus_meta_proc_collect(), which includes ->mmap_sem) and that alone
> brings in a lot...
> 
> Document your goddamn locking, would you?  It *IS* new code, and you, as you
> say, had very few people working on it, so you don't have the excuses for
> the mess existing in older parts of the tree.

Fair enough, documenting the locking is a good thing, that will make
reviewing this easier, I'll go work on that.

> Locking complexity in there is easily as bad as that of VFS sans the RCU fun;
> sure, I can spend a week and (hopefully) document it for you, but I would
> really prefer if you guys had done that.  And I *do* appreciate the comments
> in node.c, but they are nowhere near enough.

Thanks, it's hard to balance the comment/code level at times.  And yes,
it is complex and should be explained better, will work on that.

greg k-h

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 11:25             ` One Thousand Gnomes
@ 2015-04-15 13:20               ` Borislav Petkov
  2015-04-15 15:45               ` Steven Rostedt
  1 sibling, 0 replies; 316+ messages in thread
From: Borislav Petkov @ 2015-04-15 13:20 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Greg Kroah-Hartman, Richard Weinberger, Andy Lutomirski, Al Viro,
	Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 12:25:55PM +0100, One Thousand Gnomes wrote:
> dbus is not an appropriate design for a kernel messaging layer for a
> variety of reasons. That's not to say dbus shouldn't be able to use a
> fast kernel messaging layer, or that one shouldn't exist.
> 
> dbus is basically a very large very specialized and somewhat flawed
> policy engine on top of what should be simple messaging. The two need
> splitting apart.
> 
> Abstract low level messaging layers are not a new concept. V7 unix had
> one experimentally. It's about getting the separation right.
> 
> IMHO that probably involves getting the right people in the right place
> together - dbus designers, MPI and realtime people, kernel folks and
> possibly also some of the hardware messaging folk.
> 
> In filesystem terms
> 
> - stop writing a dbus only file system
> - figure out what a messaging "vfs" looks like
> - figure out what a clean low level kernel model looks like
> - figure out what has to be where to put the policy in userspace
> 
> What might also be worth review is how much dbus traffic actually ought to
> be an object store implemented say with tmpfs and inotify type
> functionality (or extensions of that) so that you can
> set/read/enumerate/get change notifications on properties.

FWIW, this sounds really sane and makes a lot of sense to me. I'd be
willing to give it some review cycles, as far as I can, when done this
way.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 12:41                             ` Greg Kroah-Hartman
@ 2015-04-15 14:06                               ` One Thousand Gnomes
  2015-04-15 16:27                                 ` Havoc Pennington
  0 siblings, 1 reply; 316+ messages in thread
From: One Thousand Gnomes @ 2015-04-15 14:06 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jiri Kosina, Al Viro, Borislav Petkov, Andy Lutomirski,
	Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

> operating systems anymore.  Those "legacy" oses provide a system bus
> that allows them to send thousands of queries just fine, but when moving
> to Linux, we don't have anything other than D-Bus, so their library is
> ported to use it, and they have to handle their old applications that
> need/want the zillions of messages.

And if you look at those systems, btw, many of them have a very compact,
very clean, very simple message passing interface, often in the hundreds,
not tens of thousands, of lines of code.

> People take those stateless models and build stateful ones on top of
> them, yes, it's great.  But you still need a stateful model somewhere in
> order to be able to achieve many things (think a shopping cart
> application).

We put the IP stack in the kernel not the shopping cart. A good shopping
cart of course only has state on the client.

> Anyway, this is getting off-topic, there is very little "state" in the
> kdbus kernel code here, other than a naming database that Havoc and
> Simon explain the need for, and the normal lifecycle of kdbus "nodes"
> (new, linked, active, inactive, drained, freed).

I'm not convinced the naming data belongs in kernel beyond the simplest
of "node 147". I'd offer a sort of proof by armwaving of this that if you
have


/dev/dbus/014   /dev/dbus/027 etc

you can add a symlink to /dev/dbus/014 of

/dev/dbus-by-name/gnome-wombat-grooming-daemon

or whatever

and we do that today for every other naming database and static
allocation we've spent the past 15 years evicting from the kernel.

That state isn't then held in a daemon that can crash nor is it invisible
to debuggers, user tools and admins.

Alan

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  7:31           ` Mike Galbraith
@ 2015-04-15 14:48             ` Michal Schmidt
  2015-04-15 15:34               ` Mike Galbraith
                                 ` (2 more replies)
  0 siblings, 3 replies; 316+ messages in thread
From: Michal Schmidt @ 2015-04-15 14:48 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Richard Weinberger, Andy Lutomirski, Al Viro, Greg Kroah-Hartman,
	Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On 04/15/2015 09:31 AM, Mike Galbraith wrote:
> it seems [systemd] has now mandated group scheduling.

What makes you think so? Was it the fact that by default you have a
populated /sys/fs/cgroup/cpu/ hierarchy? This is either because some
unit requests the use of the cpu controller using one of the CPU*=
directives from systemd.resource-control(5), or (perhaps more likely)
because there is a privileged unit with Delegate=yes. The most likely
candidate is user@0.service, and so you could try preventing it from
starting:
  systemctl mask user@0.service

Note that systemd still works without group scheduling or any cgroup
subsystems enabled in the kernel:

  $ grep GROUP .config
  CONFIG_CGROUPS=y
  # CONFIG_CGROUP_DEBUG is not set
  # CONFIG_CGROUP_FREEZER is not set
  # CONFIG_CGROUP_DEVICE is not set
  # CONFIG_CGROUP_CPUACCT is not set
  # CONFIG_CGROUP_HUGETLB is not set
  # CONFIG_CGROUP_PERF is not set
  # CONFIG_CGROUP_SCHED is not set
  # CONFIG_BLK_CGROUP is not set
  # CONFIG_SCHED_AUTOGROUP is not set
  # CONFIG_NETFILTER_XT_MATCH_CGROUP is not set
  # CONFIG_NETFILTER_XT_MATCH_DEVGROUP is not set
  # CONFIG_NET_CLS_CGROUP is not set
  # CONFIG_CGROUP_NET_PRIO is not set
  # CONFIG_CGROUP_NET_CLASSID is not set

Michal

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 14:48             ` Michal Schmidt
@ 2015-04-15 15:34               ` Mike Galbraith
  2015-04-15 16:42               ` Mike Galbraith
  2015-04-17 16:53               ` Mike Galbraith
  2 siblings, 0 replies; 316+ messages in thread
From: Mike Galbraith @ 2015-04-15 15:34 UTC (permalink / raw)
  To: Michal Schmidt
  Cc: Richard Weinberger, Andy Lutomirski, Al Viro, Greg Kroah-Hartman,
	Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Wed, 2015-04-15 at 16:48 +0200, Michal Schmidt wrote:
> On 04/15/2015 09:31 AM, Mike Galbraith wrote:
> > it seems [systemd] has now mandated group scheduling.
> 
> What makes you think so?

If group sched is available, systemd decides on its own to use it, 
thus making the decision to eat that overhead for me should I happen 
to boot say an enterprise kernel to do some performance measurements. 

Perhaps there is a way to beg it to please not do that, but if so, I 
didn't find it in time.  The service that started group scheduling was 
explicitly disabled by me, but systemd started it at boot despite 
that.  Perhaps I didn't express my wishes clearly enough, or I need to 
burn a virgin or something to become worthy of its attention, dunno.

Applying my axe to its tentacles fixed the communication issue.

        -Mike

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 11:40                       ` Greg Kroah-Hartman
  2015-04-15 13:03                         ` Borislav Petkov
@ 2015-04-15 15:41                         ` Steven Rostedt
  2015-04-15 16:40                           ` Greg Kroah-Hartman
  2015-04-15 19:04                         ` Martin Steigerwald
  2 siblings, 1 reply; 316+ messages in thread
From: Steven Rostedt @ 2015-04-15 15:41 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Borislav Petkov, Richard Weinberger, Andy Lutomirski, Al Viro,
	Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 01:40:36PM +0200, Greg Kroah-Hartman wrote:
> 
> You have to trust someone to help make your system work together in a
> unified way.  If you can't trust your distro's engineers, then either
> start your own distro, or only run busybox on top of a kernel.  You
> really don't have much other "choice" than that :)

And obviously there is a lack of trust. And once kdbus is in, we must use
it, or support our own distro, for which we just do not have the time.

Personally, I'm fine with getting something in that will help userspace
tools work better. The issue I see, mostly from the sidelines as I haven't
totally submerged myself in the dbus protocol (I think I should spend
some time to do just that), is that this is going too fast. Once it is in the kernel,
whatever ABI we expose is locked in stone. There's no changing it. We need
to make sure that this is well thought out. People seem to be of the impression
that the current dbus design has flaws, but because everything relies on it
we must still push it into the kernel because it mimics what is out there
in user space. I disagree.

As others have said, we do not need to follow the dbus design. If we can supply
a better transport layer than what the kernel supplies today, then tools will
eventually migrate to it, away from dbus. Perhaps the kernel can supply just enough
to have dbus improve its speed, but not with the entire complex solution that
kdbus is presenting today.

This isn't a case of Republicans vs Democrats pushing a health care system within
a window that was rushed. Now the US has a health care system that somewhat works
but due to politics it's not being fixed (the ABI is solidified). I don't want
to have the same thing with kdbus. We are technical people here, let's solve it
with a technical solution, and not rush into things. dbus works today, so what's
the rush to put something into the kernel that must be supported forever? Let's
make sure we do it right.

I'm serious about my Linux Plumbers proposal. If you can make it, and get the dbus
authors there too, and hopefully Andy, Al and Eric can make it too, we should
really sit down and talk about it. Any other kernel developer that wants to
participate should, as a prerequisite, sit down and write a dbus interface, such
that they have an idea of how it works. I plan to. And I hope that I can learn
more about the interface and productively join in this discussion.

I'm willing to moderate the kdbus microconference. I think I'll add it now.

Thoughts?

-- Steve


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 11:25             ` One Thousand Gnomes
  2015-04-15 13:20               ` Borislav Petkov
@ 2015-04-15 15:45               ` Steven Rostedt
  2015-04-15 15:46                 ` Andy Lutomirski
  2015-04-15 16:35                 ` Greg Kroah-Hartman
  1 sibling, 2 replies; 316+ messages in thread
From: Steven Rostedt @ 2015-04-15 15:45 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Greg Kroah-Hartman, Richard Weinberger, Andy Lutomirski, Al Viro,
	Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 12:25:55PM +0100, One Thousand Gnomes wrote:
> 
> IMHO that probably involves getting the right people in the right place
> together - dbus designers, MPI and realtime people, kernel folks and
> possibly also some of the hardware messaging folk.

/me continues on as a broken record

I suggest that we can do this at Linux Plumbers, and then follow up at
Kernel Summit, for those that can't (or won't) attend plumbers.

-- Steve


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 15:45               ` Steven Rostedt
@ 2015-04-15 15:46                 ` Andy Lutomirski
  2015-04-15 16:35                 ` Greg Kroah-Hartman
  1 sibling, 0 replies; 316+ messages in thread
From: Andy Lutomirski @ 2015-04-15 15:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: One Thousand Gnomes, Greg Kroah-Hartman, Richard Weinberger,
	Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 8:45 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Wed, Apr 15, 2015 at 12:25:55PM +0100, One Thousand Gnomes wrote:
>>
>> IMHO that probably involves getting the right people in the right place
>> together - dbus designers, MPI and realtime people, kernel folks and
>> possibly also some of the hardware messaging folk.
>
> /me continues on as a broken record
>
> I suggest that we can do this at Linux Plumbers, and then follow up at
> Kernel Summit, for those that can't (or won't) attend plumbers.

I'm definitely available for KS.  I'm not sure about Plumbers.

--Andy

>
> -- Steve
>



-- 
Andy Lutomirski
AMA Capital Management, LLC

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 11:06             ` One Thousand Gnomes
@ 2015-04-15 16:00               ` Rik van Riel
  2015-04-15 16:44                 ` Havoc Pennington
  0 siblings, 1 reply; 316+ messages in thread
From: Rik van Riel @ 2015-04-15 16:00 UTC (permalink / raw)
  To: One Thousand Gnomes, Greg Kroah-Hartman
  Cc: Jiri Kosina, Andy Lutomirski, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On 04/15/2015 07:06 AM, One Thousand Gnomes wrote:

>> that anyone here does either.  In the many years I've spent working on
>> this, dbus has seemed to be odd, and strange, to the way that the kernel
>> has normally worked, because it is.  And that's not a bad thing, it's
>> just different, and for us to support real needs and requirements of our
>> users, is the requirement of the Linux kernel.
> 
> There are I think a set of intertwined problems here
> 
> - An efficient delivery system for multicast messages delivered locally
>   (be that MPI, dbus whatever - it's not "dbus or nothing")
> 
> - A kernel side dynamic namespace to describe what goes where
> 
> - A kernel side security model to describe who may receive what, and
>   which additional information/tags/cred info
> 
> - Something that provides state to stuff that needs it (and probably
>   belongs in userspace - dbus name service etc)
> 
> - Something that maps dbus and other models onto the kernel security
>   model (and we have tools like EBPF which are very powerful)
> 
> - Something that maps the kernel layer onto models like MPI-3

It is not clear to me why user space applications would
have to change if the kernel bus used for dbus behaves
differently from the userspace dbus daemon.

Can't libdbus take care of the differences, and remove
some of the problems highlighted by Alan (eg. the possibility
of the protocol requiring the kernel to keep more messages
in flight than we have memory for) ?


^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 14:06                               ` One Thousand Gnomes
@ 2015-04-15 16:27                                 ` Havoc Pennington
  0 siblings, 0 replies; 316+ messages in thread
From: Havoc Pennington @ 2015-04-15 16:27 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Greg Kroah-Hartman, Jiri Kosina, Al Viro, Borislav Petkov,
	Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

Hi,

I'm temporarily joining the list if anyone has questions about why
dbus was originally designed the way it is. If you would like answers about its
latest usage, systemd, or the kernel implementation, those are best
answered by others.

I "led" the original design but I was hardly the only person involved.
I was sort of synthesizing previous efforts, lots of ideas from other
people, and mediating the politics of the time.

What I'd like to see in this conversation is: understanding what
exists, and why it exists.

If people understand that then I think they can make good decisions,
using whatever process or timeline you like; I don't pretend to know
much about kdbus, but I see a lot of confusion here about the use-case
and design of dbus itself.

No one should take the design on faith. To improve and maintain
something it must be understood.

Why should you bother to understand dbus as it exists? It's pretty
successful, and I think for a reason. Hundreds of programs are using
dbus, it's become (over a decade) foundational to the most-used Linux
userspaces, there are many different implementations of it, and it's
been quite a stable design over that time without any major changes. I
don't think that's because it's perfect; I do think it's because some
things are right, in ways that previous designs were not. The Linux
userspace community went through a lot of alternatives before dbus,
and dbus was the one that lasted.

The worst-case scenario in my mind would be for the kernel to merge
something dbus-like, but with ill-informed changes that render it
worse. Then you would have a new ABI that nobody wants to use. We have
a design in the wild that's been very successful. People using it for
its intended use-case seem to like it. Step 1 is to try to understand
why that is.

I will try to give my take on some of the reasons.

I can't emphasize enough that the success of dbus was *because of*
many "obvious" criticisms people may have. Why? Tradeoffs. Given
infinite time and resources, many of those tradeoffs can be mitigated
or avoided - and I see kdbus as part of an effort to do so.

The first and most important tradeoff: the central daemon (the hub in
the wheel). A central daemon has several disadvantages. The success of
dbus happened because those disadvantages, in this context, are not as
important as the advantages.

The advantages include:

 * ability to send a broadcast message to all interested processes
 * tracking/discovering well-known and unique names
 * crossing security domains (system-daemon-to-per-user-UIs, in
particular) in an orderly fashion
 * reducing the number of file descriptors needed for N apps to all
talk to each other
 * relatively simple model for application developers to get right
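
As a back-of-the-envelope illustration of the file descriptor point
above (a sketch only: it assumes one connection per pair of apps in the
direct case and one connection per app to the daemon in the hub case;
the function names are made up for this example):

```python
def mesh_connections(n: int) -> int:
    """Connections needed if each of n apps talks directly to every other app."""
    return n * (n - 1) // 2


def hub_connections(n: int) -> int:
    """Connections needed if each of n apps holds one connection to a central daemon."""
    return n


# Compare the two topologies for a few system sizes.
for n in (2, 10, 100):
    print(f"{n} apps: mesh={mesh_connections(n)} hub={hub_connections(n)}")
```

Each connection costs a descriptor at both ends, so in the direct case
every app holds n - 1 descriptors, while in the hub case each app holds
one and only the daemon holds n.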

The disadvantages include:

 * performance (extra context switches, copies, and validations)
 * it's difficult to handle killing/restarting the central daemon;
dbus actually gives clients all the tools to do this, but in practice
if you restart the daemon you are gambling that a hundred clients
connected to it have implemented bug-free restart handling.
 * not a distributed cluster (it's a single bottleneck and point of
failure running on a single machine - the daemon is a source of truth,
which is also its virtue of course)

For dbus to be as useful as it has been, these disadvantages, while
not desirable, were acceptable tradeoffs. So it would be a mistake to
solve any of these disadvantages by breaking the advantages.

Message passing or IPC isn't really the most important part of dbus.
Process lifecycle tracking and discovery are more important. However,
by integrating the IPC system with the lifecycle tracking you can
simplify the overall system and avoid race conditions. For example,
you can have processes that auto-launch race-free when you send them a
message, or more generally you can have an ordering between lifecycle
events and other messages. For example if I send out a broadcast
message and then disconnect, other clients will see first the
broadcast and then the disconnect and won't have to handle the
out-of-order case.

dbus has a lot of semantic guarantees, such as message ordering, that
reduce application complexity and therefore reduce code and reduce
bugs.

When implementing a Linux workstation userspace, ideally you have lots
of little processes that do one thing each; but the tradeoff is that
multi-process adds complexity. If your model for a multi-process
program is that it has to solve a lot of hard distributed system
problems, then it adds a LOT of complexity. But when everyone's on a
single machine, it is not necessary to solve (all of) those problems,
and in fact trying to solve non-problems creates bugs by adding
tricky, rarely-touched codepaths. It is overengineering to treat "tray
icon talking to NetworkManager" the same way you would treat IPC and
shared state within a distributed cluster.

Multi-process is valuable though; an alternative userspace design
could be like Eclipse or Emacs, i.e. one enormous process with
plugins, which would be a mess.

There was some debate over my X11 analogy. One of the "thought
experiments" while figuring out dbus was "why does CORBA seem to be at
the root of endless bug reports, while X11 isn't?"

Here are some things I think dbus has in common with X11:

 * it's a hub-and-spoke design (a central server that all apps connect
to) rather than a design where every process talks directly to every
other process
 * dbus names are directly modeled on X selections (see ICCCM)
 * designed to allow race-free asynchronous usage and minimize the
need for round trips (though apps can certainly design bad APIs, see
http://dbus.freedesktop.org/doc/dbus-api-design.html for advice on
avoiding that)
 * binary protocol rather than text
 * generally assumes a reliable network - assumes all messages will
arrive, as long as the connection is live
 * similar model for discovering and authenticating to the server
 * allows clients to track each other's lifecycle
 * it is stateful; clients connect, fetch the current state, then
track changes to the state via events.

Some differences from X11 of course:

 * X11 is a domain-specific server (about sharing the graphics and
input hardware among multiple clients), while with dbus the
domain-specific API will be in some client and the bus is only an
intermediary.
 * X11 therefore has a bunch more server state than dbus; dbus only
has to track clients, not track the state of the window system.
 * IPC on X11 is sort of bolted on in an ugly way (client messages)
while dbus cleanly maps to the OO model people are used to in the rest
of their code.

Havoc

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 15:45               ` Steven Rostedt
  2015-04-15 15:46                 ` Andy Lutomirski
@ 2015-04-15 16:35                 ` Greg Kroah-Hartman
  2015-04-15 17:06                   ` Steven Rostedt
  1 sibling, 1 reply; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15 16:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: One Thousand Gnomes, Richard Weinberger, Andy Lutomirski,
	Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 11:45:52AM -0400, Steven Rostedt wrote:
> On Wed, Apr 15, 2015 at 12:25:55PM +0100, One Thousand Gnomes wrote:
> > 
> > IMHO that probably involves getting the right people in the right place
> > together - dbus designers, MPI and realtime people, kernel folks and
> > possibly also some of the hardware messaging folk.
> 
> /me continues on as a broken record
> 
> I suggest that we can do this at Linux Plumbers, and then follow up at
> Kernel Summit, for those that can't (or won't) attend Plumbers.

I really doubt this will work for Plumbers, sorry.  And technical things
don't work well, if at all, at Kernel Summit.

We have had meetings about this at the past two Plumbers conferences,
where none of these things came up (i.e. dislike of the D-Bus model).

I'll be glad to discuss this at both places, but let's try to work
through the technical things through email, as really, that's the best
place for it.

Al just proved this by pointing out some issues to be resolved (RW lock
only used as a W lock, odd atomic values and locking without documenting
the lifecycles, etc.)  And that's the way this is supposed to work,
nothing new/different here that I can see.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 15:41                         ` Steven Rostedt
@ 2015-04-15 16:40                           ` Greg Kroah-Hartman
  2015-04-15 16:48                             ` Jiri Kosina
  2015-04-15 17:20                             ` Steven Rostedt
  0 siblings, 2 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15 16:40 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Borislav Petkov, Richard Weinberger, Andy Lutomirski, Al Viro,
	Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 11:41:53AM -0400, Steven Rostedt wrote:
> 
> And obviously there is a lack of trust. And once kdbus is in, we must use
> it, or support our own distro where we just do not have the time.

Just like cgroups, and ftrace :)

> Personally, I'm fine with getting something in that will help userspace
> tools work better. The issue I see, mostly from the side lines as I haven't
> totally submerged myself into the dbus protocol (I think I should spend
> some time to do just that), this is going too fast. Once it is in the kernel,
> whatever ABI we expose is locked in stone. There's no changing it. We need
> to make sure that this is well thought out. People seem to be of the impression
> that the current dbus design has flaws, but because everything relies on it
> we must still push it into the kernel because it mimics what is out there
> in user space. I disagree.

"fast"?  Are you kidding me?  This stuff has been under active, public,
development for over two years.  We have been posting public patches,
asking for review and comments for _months_ now.  Given that there were
no more specific review comments on the patch set, and its success in
linux-next for almost the entire 4.0 development cycle, I asked for it to be
merged.

I don't know too many other kernel features/drivers that have taken this
long, or done this "slowly", do you?

> As others have said. We do not need to follow the dbus design. If we can supply
> a better transport layer than what the kernel supplies today, then tools will
> eventually merge to it away from dbus. Perhaps the kernel can supply just enough
> to have dbus improve its speed, but not with the entire complex solution that
> kdbus is presenting today.

I originally thought this would work too.  8 months of work later, I was
proven wrong: that will not work.  Or it imposes too much additional
work on userspace that really makes no sense at all.  The in-kernel code
isn't a lot (again, 13k lines, smaller than almost all of the drivers
you are using today on an individual basis).  It's also really fast, but
with benchmarks, David and Andy have found some minor bottlenecks that
can make things faster.

Yes it seems complex, but read the documentation to get an idea of what
is happening here.  I think you will get a better appreciation of what
is going on.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 14:48             ` Michal Schmidt
  2015-04-15 15:34               ` Mike Galbraith
@ 2015-04-15 16:42               ` Mike Galbraith
  2015-04-17 16:53               ` Mike Galbraith
  2 siblings, 0 replies; 316+ messages in thread
From: Mike Galbraith @ 2015-04-15 16:42 UTC (permalink / raw)
  To: Michal Schmidt
  Cc: Richard Weinberger, Andy Lutomirski, Al Viro, Greg Kroah-Hartman,
	Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Wed, 2015-04-15 at 16:48 +0200, Michal Schmidt wrote:

>   systemctl mask user@0.service

That off switch may work better, I'll try it when I have time to 
squabble with the thing again, thanks.

user@0 was disabled in yast (suse admin tool) by me, yet found to be 
in state disabled+active upon every boot. Just as yast did, systemctl 
status reported it as being both disabled and active, which led me to 
the conclusion that someone other than me controls this service.

        -Mike

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 16:00               ` Rik van Riel
@ 2015-04-15 16:44                 ` Havoc Pennington
  2015-04-15 18:16                   ` Steven Rostedt
                                     ` (2 more replies)
  0 siblings, 3 replies; 316+ messages in thread
From: Havoc Pennington @ 2015-04-15 16:44 UTC (permalink / raw)
  To: Rik van Riel
  Cc: One Thousand Gnomes, Greg Kroah-Hartman, Jiri Kosina,
	Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 12:00 PM, Rik van Riel <riel@redhat.com> wrote:
> On 04/15/2015 07:06 AM, One Thousand Gnomes wrote:
>
>>> that anyone here does either.  In the many years I've spent working on
>>> this, dbus has seemed to be odd, and strange, to the way that the kernel
>>> has normally worked, because it is.  And that's not a bad thing, it's
>>> just different, and for us to support real needs and requirements of our
>>> users, is the requirement of the Linux kernel.
>>
>> There are I think a set of intertwined problems here
>>
>> - An efficient delivery system for multicast messages delivered locally
>>   (be that MPI, dbus whatever - it's not "dbus or nothing")
>>
>> - A kernel side dynamic namespace to describe what goes where
>>
>> - A kernel side security model to describe who may receive what, and
>>   which additional information/tags/cred info
>>
>> - Something that provides state to stuff that needs it (and probably
>>   belongs in userspace - dbus name service etc)
>>
>> - Something that maps dbus and other models onto the kernel security
>>   model (and we have tools like EBPF which are very powerful)
>>
>> - Something that maps the kernel layer onto models like MPI-3

When trying to split apart problems, for dbus it's important to keep
ordering guarantees.

That is, with dbus if I send a broadcast message, then send a unicast
request to another client, then drop the connection causing the bus to
broadcast that I've dropped; then the other client will see those
things in that order - the broadcast, then the request, and then that
I've dropped the connection.

If you have separate facilities for these things, it could get hard to
keep them in order. dbus uses the simple model that they stay in order
because the bus conceptually has a single dispatch queue.

By pushing everything through one queue, dbus is trying to reduce the
number of codepaths in applications. Apps have a lot of new problems
to solve if messages get their order scrambled.

(dbus does NOT guarantee order across multiple clients, of course -
there's no guarantee that all clients get the broadcast, before anyone
gets the next message - each client has its own buffer on both read
and write. The ordering is only with respect to each client's message
stream.)
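
As a rough illustration of the single-queue model described above (a
toy sketch, not how dbus-daemon or kdbus is implemented; the ToyBus
class and its method names are hypothetical):

```python
from collections import deque


class ToyBus:
    """Toy bus with one dispatch queue and a per-client inbox."""

    def __init__(self):
        self.queue = deque()  # single dispatch queue shared by all senders
        self.inboxes = {}     # per-client receive buffers

    def connect(self, name):
        self.inboxes[name] = []

    def broadcast(self, sender, body):
        self.queue.append((sender, None, body))

    def unicast(self, sender, dest, body):
        self.queue.append((sender, dest, body))

    def disconnect(self, sender):
        # The bus announces the disconnect itself, queued behind
        # everything the sender submitted earlier.
        del self.inboxes[sender]
        self.queue.append((sender, None, "disconnected"))

    def dispatch(self):
        # Drain the queue in FIFO order, so each client's inbox
        # preserves the order in which messages were submitted.
        while self.queue:
            sender, dest, body = self.queue.popleft()
            if dest is None:
                targets = [c for c in self.inboxes if c != sender]
            else:
                targets = [dest]
            for target in targets:
                if target in self.inboxes:
                    self.inboxes[target].append((sender, body))


bus = ToyBus()
for name in ("a", "b", "c"):
    bus.connect(name)

bus.broadcast("a", "state changed")
bus.unicast("a", "b", "request")
bus.disconnect("a")
bus.dispatch()

print(bus.inboxes["b"])
print(bus.inboxes["c"])
```

Client "b" sees the broadcast, then the request, then the disconnect;
client "c" sees the broadcast and then the disconnect. Nothing here
says anything about when "b" receives a message relative to "c", which
matches the per-client scope of the guarantee.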

Ordering is vital for tracking state, because if you're sending out
events to describe changes in state, the order of those changes is
important.

Of course there are more complex ways to handle this over in
distributed-systems-world.

Havoc

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  9:09                           ` Greg Kroah-Hartman
  2015-04-15 12:36                             ` Al Viro
@ 2015-04-15 16:47                             ` Steven Rostedt
  1 sibling, 0 replies; 316+ messages in thread
From: Steven Rostedt @ 2015-04-15 16:47 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jiri Kosina, Borislav Petkov, Al Viro, Andy Lutomirski,
	Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 11:09:48AM +0200, Greg Kroah-Hartman wrote:
> 
> > But the problem really is that I don't think you've received even a single 
> > Reviewed-by: from someone who hasn't been directly involved in developing 
> > the code, right?
> 
> I've asked for it, but finding people to review code is hard, as you

Perhaps try harder. You know more kernel developers than I do. You don't have
anyone you can say "hey, I need this code reviewed, can you spend some time
to review it for me"? I have a few developers that are willing to do that
for me, and I won't push some code (if it is complex) until they give their
reviewed-by for it. I did that with the latest TRACE_DEFINE_ENUM() code, as
well as my ftrace trampoline code and the multi buffer code. None of that
went in until I had their reviewed-by tags.

> know.  It's only 13k lines long, smaller than a serial port driver (my
> unit of code review), so it's not all that big.

Length of code does not determine the complexity of it.

> 
> It's smaller than the USB3 host controller driver as well, and very few
> people ever reviewed that beast :)
> 
> > For something that's potentially such a core mechanism as a completely 
> > new, massively-adopted IPC, this does send a warning signal.
> 
> If you know of a way to force others to review code, please let me know.

Keep asking, that's the best way. That's what I do. Also, I really like Alan's
approach to this. Let me requote it here:

  - stop writing a dbus only file system
  - figure out what a messaging "vfs" looks like
  - figure out what a clean low level kernel model looks like
  - figure out what has to be where to put the policy in userspace

-- Steve


^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 16:40                           ` Greg Kroah-Hartman
@ 2015-04-15 16:48                             ` Jiri Kosina
  2015-04-15 17:33                               ` Greg Kroah-Hartman
  2015-04-15 17:20                             ` Steven Rostedt
  1 sibling, 1 reply; 316+ messages in thread
From: Jiri Kosina @ 2015-04-15 16:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Steven Rostedt, Borislav Petkov, Richard Weinberger,
	Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni

On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote:

> The in-kernel code isn't a lot (again, 13k lines, smaller than almost 
> all of the drivers you are using today on an individual basis)  It's 

I originally didn't want to comment on this, but now that you are making 
this argument for 3rd or 4th time, I can't really resist. What exactly are 
you trying to "prove" by the 13k-lines argument?

mm/vmscan.c is less than 4k lines. Does that sole fact mean that the whole
memory reclaim is trivial to review?

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 16:35                 ` Greg Kroah-Hartman
@ 2015-04-15 17:06                   ` Steven Rostedt
  2015-04-15 17:31                     ` Greg Kroah-Hartman
  0 siblings, 1 reply; 316+ messages in thread
From: Steven Rostedt @ 2015-04-15 17:06 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: One Thousand Gnomes, Richard Weinberger, Andy Lutomirski,
	Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Wed, 15 Apr 2015 18:35:20 +0200
Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:


> > I suggest that we can do this at Linux Plumbers, and then follow up at
> > Kernel Summit, for those that can't (or won't) attend Plumbers.
> 
> I really doubt this will work for Plumbers, sorry.  And technical things
> don't work well, if at all, at Kernel Summit.
> 
> We have had meetings about this at the past two Plumbers conferences,
> where none of these things came up (i.e. dislike of the D-Bus model).

But were the people that are not liking it at those conference sessions?


> 
> I'll be glad to discuss this at both places, but let's try to work
> through the technical things through email, as really, that's the best
> place for it.
> 
> Al just proved this by pointing out some issues to be resolved (RW lock
> only used as a W lock, odd atomic values and locking without documenting
> the lifecycles, etc.)  And that's the way this is supposed to work,
> nothing new/different here that I can see.

But you are missing one of the complaints that I'm reading from
people. The proposed ABI is too complex. Do we really want to jump into
having to support another tty layer?

One thing that I think may be really worth doing is that everyone on
this thread that has not yet done so, write a simple dbus application
to try to understand its design. Break it down to the requirements that
are needed, and discuss that.

Is there a reason that this patch must go in this merge window? Having
something this controversial take place during the merge window
suggests it's a bit premature to push in now. Especially since it
creates a new user space interface. I think we need to really think
hard and long before we add something that can not be modified at a
later date.

I personally think face to face may help, even if it's just hallway
tracks. But at a minimum, I think more kernel developers need to play
with dbus to understand this more. And then be able to give a better
feedback. I'm also thinking that the bare minimum for a transport layer
should go in. Find out the exact requirements (as Alan suggested) and
implement that, instead of just implementing the full layer that is
happening in userspace today.

-- Steve


^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 16:40                           ` Greg Kroah-Hartman
  2015-04-15 16:48                             ` Jiri Kosina
@ 2015-04-15 17:20                             ` Steven Rostedt
  2015-04-15 17:41                               ` Havoc Pennington
                                                 ` (2 more replies)
  1 sibling, 3 replies; 316+ messages in thread
From: Steven Rostedt @ 2015-04-15 17:20 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Borislav Petkov, Richard Weinberger, Andy Lutomirski, Al Viro,
	Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Wed, 15 Apr 2015 18:40:33 +0200
Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:

> On Wed, Apr 15, 2015 at 11:41:53AM -0400, Steven Rostedt wrote:
> > 
> > And obviously there is a lack of trust. And once kdbus is in, we must use
> > it, or support our own distro where we just do not have the time.
> 
> Just like cgroups, and ftrace :)

Exactly.

> 
> > Personally, I'm fine with getting something in that will help userspace
> > tools work better. The issue I see, mostly from the side lines as I haven't
> > totally submerged myself into the dbus protocol (I think I should spend
> > some time to do just that), this is going too fast. Once it is in the kernel,
> > whatever ABI we expose is locked in stone. There's no changing it. We need
> > to make sure that this is well thought out. People seem to be of the impression
> > that the current dbus design has flaws, but because everything relies on it
> > we must still push it into the kernel because it mimics what is out there
> > in user space. I disagree.
> 
> "fast"?  Are you kidding me?  This stuff has been under active, public,
> development for over two years.  We have been posting public patches,
> asking for review and comments for _months_ now.  Given that there were
> no more specific review comments on the patch set, and its success in
> linux-next for almost the entire 4.0 development cycle, I asked it to be
> merged.
> 
> I don't know too many other kernel features/drivers that have taken this
> long, or done this "slowly", do you?

What other features/drivers do you know of that introduce a major new IPC
user space interface that will be a core component of the system?

> 
> > As others have said. We do not need to follow the dbus design. If we can supply
> > a better transport layer than what the kernel supplies today, then tools will
> > eventually merge to it away from dbus. Perhaps the kernel can supply just enough
> > to have dbus improve its speed, but not with the entire complex solution that
> > kdbus is presenting today.
> 
> I originally thought this would work too.  8 months of work later, I was
> proven wrong, that will not work.  Or it imposes too much additional
> work on userspace that really makes no sense at all.  The in-kernel code
> isn't a lot (again, 13k lines, smaller than almost all of the drivers
> you are using today on an individual basis)  It's also really fast, but
> with benchmarks, David and Andy have found some minor bottlenecks that
> can make things faster.
> 
> Yes it seems complex, but read the documentation to get an idea of what
> is happening here.  I think you will get a better appreciation of what
> is going on.

I read a bit of the documentation, but not enough. I really need to sit
down and play with code. That's the way I learn and understand.

-- Steve

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 17:06                   ` Steven Rostedt
@ 2015-04-15 17:31                     ` Greg Kroah-Hartman
  2015-04-15 18:04                       ` Steven Rostedt
  2015-04-15 21:56                       ` One Thousand Gnomes
  0 siblings, 2 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15 17:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: One Thousand Gnomes, Richard Weinberger, Andy Lutomirski,
	Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 01:06:49PM -0400, Steven Rostedt wrote:
> On Wed, 15 Apr 2015 18:35:20 +0200
> Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> 
> 
> > > I suggest that we can do this at Linux Plumbers, and then follow up at
> > > Kernel Summit, for those that can't (or won't) attend Plumbers.
> > 
> > I really doubt this will work for Plumbers, sorry.  And technical things
> > don't work well, if at all, at Kernel Summit.
> > 
> > We have had meetings about this at the past two Plumbers conferences,
> > where none of these things came up (i.e. dislike of the D-Bus model).
> 
> But were the people that are not liking it at those conference sessions?

People who don't like a topic usually don't go to a session about it; why
would they? :)

> > I'll be glad to discuss this at both places, but let's try to work
> > through the technical things through email, as really, that's the best
> > place for it.
> > 
> > Al just proved this by pointing out some issues to be resolved (RW lock
> > only used as a W lock, odd atomic values and locking without documenting
> > the lifecycles, etc.)  And that's the way this is supposed to work,
> > nothing new/different here that I can see.
> 
> But you are missing one of the complaints that I'm reading from
> people. The proposed ABI is too complex. Do we really want to jump into
> having to support another tty layer?

Don't make idle comments, the tty layer is far more complex and larger
than the kdbus code, with much nastier issues and problems.  And we
handle that just fine :)

As far as the "support" issue, we have 4 people who are all experienced,
senior kernel developers who are signed up to maintain this.  There's
more experience here for this one MAINTAINERS entry per line of code
than I have seen in quite some time.

Are people somehow worried that all 4 of us are going to run away?  Do
people not trust the 4 of us to stick around and maintain this and deal
with any issues found for the next few decades?  If so, please let us
know, as it seems like people feel we are dumping this code on them to
maintain, which is anything but true.

> One thing that I think may be really worth doing is that everyone on
> this thread that has not yet done so, write a simple dbus application
> to try to understand its design. Break it down to the requirements that
> are needed, and discuss that.

I've done that, it's hard, use the gdbus interface instead, it makes
your life much easier.

I'll again refer to ALSA here, no one writes a "raw" ALSA program, they
all use the library to interact with the kernel.  Do that here, there
are wonderful dbus libraries out there, for all languages.  Use them
instead.

> Is there a reason that this patch must go in this merge window?

What makes this merge window any different from any other?  Again, I
explained why I asked for it to be merged at this point in time.  If people
have technical issues with it, I'll be more than glad to work them out
and merge it later, there's no "hard and fast deadline" anyone is asking
for here.

> Having something this controversial take place during the merge window
> > suggests it's a bit premature to push in now.

"take place"?  Have you been ignoring these patches posted numerous
times for many months?  This is the point in time to ask for code to be
merged, just like any other code, nothing is special here.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 11:49                         ` Greg Kroah-Hartman
  2015-04-15 12:03                           ` One Thousand Gnomes
  2015-04-15 12:55                           ` Al Viro
@ 2015-04-15 17:33                           ` Steven Rostedt
  2015-04-15 18:11                             ` Greg Kroah-Hartman
  2 siblings, 1 reply; 316+ messages in thread
From: Steven Rostedt @ 2015-04-15 17:33 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: One Thousand Gnomes, Jiri Kosina, Al Viro, Borislav Petkov,
	Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 01:49:36PM +0200, Greg Kroah-Hartman wrote:
> 
> There's the issue of thousands of dbus queries, and then there's the
> issue that making those queries takes a measurable amount of time.  We
> can fix the latter one, the first one, well, not so much, but we can
> provide the resources for them to make a faster system if they want to.

I'll argue that you can't fix the latter one. One thing that I've observed over
the years of having faster computers is, as soon as you make it faster, people
will write slower software.

Currently the issue is that we have thousands of dbus queries, you make dbus
10x faster, I guarantee that people will write software with 10 thousand dbus
queries and we are no better off than we are today.

-- Steve


^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 16:48                             ` Jiri Kosina
@ 2015-04-15 17:33                               ` Greg Kroah-Hartman
  2015-04-15 18:06                                 ` Steven Rostedt
  2015-04-16  8:43                                 ` Jiri Kosina
  0 siblings, 2 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15 17:33 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Steven Rostedt, Borislav Petkov, Richard Weinberger,
	Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 06:48:46PM +0200, Jiri Kosina wrote:
> On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote:
> 
> > The in-kernel code isn't a lot (again, 13k lines, smaller than almost 
> > all of the drivers you are using today on an individual basis)  It's 
> 
> I originally didn't want to comment on this, but now that you are making 
> this argument for 3rd or 4th time, I can't really resist. What exactly are 
> you trying to "prove" by the 13k-lines argument?
> 
> mm/vmscan.c is less than 4k lines. Does that sole fact mean that the whole
> memory reclaim is trivial to review?

I'm trying to say that it's not a ton of code.  lines of code are of
course not a valid way to judge complexity, and I'm not trying to say
that.  I am trying to point out that it isn't "huge" by comparing it to
other chunks of code that we all know and love.

We merge subsystems with new userspace apis that are larger than this all
the time.  I'm trying to say this isn't something "unusual" at all.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 17:20                             ` Steven Rostedt
@ 2015-04-15 17:41                               ` Havoc Pennington
  2015-04-15 17:55                               ` Greg Kroah-Hartman
  2015-04-15 18:12                               ` Greg Kroah-Hartman
  2 siblings, 0 replies; 316+ messages in thread
From: Havoc Pennington @ 2015-04-15 17:41 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Greg Kroah-Hartman, Borislav Petkov, Richard Weinberger,
	Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Wed, Apr 15, 2015 at 1:20 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> I read a bit of the documentation, but not enough. I really need to sit
> down and play with code. That's the way I learn and understand.
>

It might be useful for some of the current devs to post about the best
APIs to play with these days - my old libdbus is pretty painful,
compared to some of the newer stuff.

gdbus nicely shows a callback-based way to handle owning a service,
using a function like g_bus_own_name:

https://developer.gnome.org/gio/stable/gio-Owning-Bus-Names.html#g-bus-own-name

The callback-based approach means the library can handle
reconnection/restart on behalf of the app.

The flip side (the way you use rather than provide a service) looks similar:
 https://developer.gnome.org/gio/stable/gio-Watching-Bus-Names.html#g-bus-watch-name

Here the library can deal with complexities of a service being
restarted, the app only has to write the callbacks so they can be
called more than once (with alternating appeared/vanished handlers).

You can see in those API docs more of the ordering guarantees, in this
case on callback invocation - less for apps to screw up.

Havoc

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 17:20                             ` Steven Rostedt
  2015-04-15 17:41                               ` Havoc Pennington
@ 2015-04-15 17:55                               ` Greg Kroah-Hartman
  2015-04-15 21:55                                 ` One Thousand Gnomes
  2015-04-15 18:12                               ` Greg Kroah-Hartman
  2 siblings, 1 reply; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15 17:55 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Borislav Petkov, Richard Weinberger, Andy Lutomirski, Al Viro,
	Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 01:20:37PM -0400, Steven Rostedt wrote:
> > I don't know too many other kernel features/drivers that have taken this
> > long, or done this "slowly", do you?
> 
> What other features/drivers that you know introduce a major new IPC
> user space interface that will be a core component of the system?

We've been merging these about one every other kernel release for a
while now.  Look at the drivers/misc/mic/ for one such example, there
are many others like this that are dealing with distributed systems and
having the kernel communicate between them through some custom userspace
api.  Usually ioctls :)

We merge a lot of stuff, and unfortunately it's hard to get a view of
everything that happens all the time.  I suggest reading at least the
shortlog summary of every commit if people are curious, I know I do.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-14 19:43               ` Greg Kroah-Hartman
@ 2015-04-15 17:59                 ` Austin S Hemmelgarn
  2015-04-15 18:04                   ` Rik van Riel
                                     ` (2 more replies)
  0 siblings, 3 replies; 316+ messages in thread
From: Austin S Hemmelgarn @ 2015-04-15 17:59 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Al Viro
  Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On 2015-04-14 15:43, Greg Kroah-Hartman wrote:
> On Tue, Apr 14, 2015 at 08:35:33PM +0100, Al Viro wrote:
>> On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote:
>>
>>>> I agree.  You've sent a pull request for an unfortunate design.  I
>>>> don't think that unfortunate design belongs in the kernel.  If it stays
>>>> in userspace, then user programmers could potentially fix it some day.
>>>
>>> You might not like the design, but it is a valid design.  Again, we
>>> don't refuse to support hardware that is designed badly.  Or support
>>> protocols we don't necessarily like, that's not the job of a kernel or
>>> operating system.
>>
>> And no, "the sole consumer of that API knows better, so bend over" is not
>> a good idea.  We have shitloads of examples when single-consumer APIs
>> turned into screaming horrors; taking that in over the objections to API
>> design, merely on "they do it that way, who the hell we are to say they
>> are wrong?" is insane.
>
> Again, in this domain, the design is sound.  So much so that everyone
> who works in that area moved toward it (KDE, Qt, Go, etc.)  We might not
> think it makes sense, and it did take me a while to wrap my head around
> it, but to call it "crap" is unfair, sorry.
>

The reason that 'everyone who works in this area' adopted it is not so much
that the design is sound (I'm not arguing whether it is or isn't in this 
case) as it is that none of them could come up with anything better.



^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 17:59                 ` Austin S Hemmelgarn
@ 2015-04-15 18:04                   ` Rik van Riel
  2015-04-15 22:22                   ` One Thousand Gnomes
  2015-04-21 16:54                   ` Diego Viola
  2 siblings, 0 replies; 316+ messages in thread
From: Rik van Riel @ 2015-04-15 18:04 UTC (permalink / raw)
  To: Austin S Hemmelgarn, Greg Kroah-Hartman, Al Viro
  Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On 04/15/2015 01:59 PM, Austin S Hemmelgarn wrote:
> On 2015-04-14 15:43, Greg Kroah-Hartman wrote:
>> On Tue, Apr 14, 2015 at 08:35:33PM +0100, Al Viro wrote:
>>> On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote:
>>>
>>>>> I agree.  You've sent a pull request for an unfortunate design.  I
>>>>> don't think that unfortunate design belongs in the kernel.  If it stays
>>>>> in userspace, then user programmers could potentially fix it some day.
>>>>
>>>> You might not like the design, but it is a valid design.  Again, we
>>>> don't refuse to support hardware that is designed badly.  Or support
>>>> protocols we don't necessarily like, that's not the job of a kernel or
>>>> operating system.
>>>
>>> And no, "the sole consumer of that API knows better, so bend over" is not
>>> a good idea.  We have shitloads of examples when single-consumer APIs
>>> turned into screaming horrors; taking that in over the objections to API
>>> design, merely on "they do it that way, who the hell we are to say they
>>> are wrong?" is insane.
>>
>> Again, in this domain, the design is sound.  So much so that everyone
>> who works in that area moved toward it (KDE, Qt, Go, etc.)  We might not
>> think it makes sense, and it did take me a while to wrap my head around
>> it, but to call it "crap" is unfair, sorry.
>>
> 
> The reason that 'everyone who works in this area' adopted it is not so much
> that the design is sound (I'm not arguing whether it is or isn't in this
> case) as it is that none of them could come up with anything better.

They are smart people, and I would not underestimate
the usefulness of the user space API (above the
dbus library) that they came up with.

That does not mean the actual in-kernel implementation
needs to follow the same design criteria. It may make
sense to have part of the implementation in kernel space,
part in user space, and allow the userspace part to be
switched out to accommodate other protocols over the same
in-kernel bus...

Moving some of the policy bits into a user space daemon
may make sense. Storing messages that cannot be delivered
right now in user space could make sense, too.

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 17:31                     ` Greg Kroah-Hartman
@ 2015-04-15 18:04                       ` Steven Rostedt
  2015-04-15 21:56                       ` One Thousand Gnomes
  1 sibling, 0 replies; 316+ messages in thread
From: Steven Rostedt @ 2015-04-15 18:04 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: One Thousand Gnomes, Richard Weinberger, Andy Lutomirski,
	Al Viro, Eric W. Biederman, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Wed, 15 Apr 2015 19:31:45 +0200
Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:

> > But were the people that are not liking it at those conference sessions?
> 
> People who don't like a topic usually don't go to a session about it, why
> would they? :)

Exactly, but if you invite those people, and say "hey, here's your
chance to set us straight" maybe they'll come. I would.

But give them a few weeks notice, so that they can study what's out
there.


> > But you are missing one of the complaints that I'm reading from
> > people. The proposed ABI is too complex. Do we really want to jump into
> > having to support another tty layer?
> 
> Don't make idle comments, the tty layer is far more complex and larger

We are all making our own little exaggerated metaphors. ;-)

> than the kdbus code, with much nastier issues and problems.  And we
> handle that just fine :)
> 
> As far as the "support" issue, we have 4 people who are all experienced,
> senior kernel developers who are signed up to maintain this.  There's
> more experience here for this one MAINTAINERS entry per line of code
> than I have seen in quite some time.

No, but people seem to be worried about the complexity. If everyone
understands that there's no other choice but to have it complex (like
RCU is), then everyone will be fine with it. But right now, people are
questioning why it needs to be complex. But we need more people to
spend time on it to make sure it does.


> > One thing that I think may be really worth doing is that everyone on
> > this thread that has not yet done so, write a simple dbus application
> > to try to understand its design. Break it down to the requirements that
> > are needed, and discuss that.
> 
> I've done that, it's hard, use the gdbus interface instead, it makes
> your life much easier.

I still need to play with the code and see exactly what it does. What
goes into the kernel needs to be the raw interface only. Everything
else should be in a library that takes care of the details. Is that
what is here?


> 
> I'll again refer to ALSA here, no one writes a "raw" ALSA program, they
> all use the library to interact with the kernel.  Do that here, there
> are wonderful dbus libraries out there, for all languages.  Use them
> instead.

Is this what is being proposed? (Again, I need to go back and read the
original change log. I did it once, but mostly forgot what was in it.)

> 
> > Is there a reason that this patch must go in this merge window?
> 
> What makes this merge window any different from any other?  Again, I
> explained why I asked it to be merged at this point in time.  If people
> have technical issues with it, I'll be more than glad to work them out
> and merge it later, there's no "hard and fast deadline" anyone is asking
> for here.

Well, there's been a few minor things that have been pointed out (the
locking), and having something as small as that take place during a
merge window, to me, would be cause to wait another merge window.

> 
> > Having something this controversial take place during the merge window
> > suggests it's a bit premature to push in now.
> 
> "take place"?  Have you been ignoring these patches posted numerous
> times for many months?  This is the point in time to ask for code to be
> merged, just like any other code, nothing is special here.

But there are still complaints about it. Perhaps people are just
noticing. We are all busy, and nobody (but perhaps Andrew Morton and
Jon Corbet) reads every LKML message. It's now getting more eyes.
That's a good thing.

I'd like more time to play with it so that I can understand why exactly
it needs to go in as you say it does.

-- Steve

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 17:33                               ` Greg Kroah-Hartman
@ 2015-04-15 18:06                                 ` Steven Rostedt
  2015-04-16  8:43                                 ` Jiri Kosina
  1 sibling, 0 replies; 316+ messages in thread
From: Steven Rostedt @ 2015-04-15 18:06 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jiri Kosina, Borislav Petkov, Richard Weinberger,
	Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni

On Wed, 15 Apr 2015 19:33:57 +0200
Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:

> We merge subsystems with new userspace apis that are larger than this all
> the time.  I'm trying to say this isn't something "unusual" at all.

I believe the difference is that those subsystems are not part of the
core system infrastructure. If they are, can you please tell me what they
are?

People don't use perf and tracing to run their desktops. People will be
using kdbus though.

-- Steve

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 17:33                           ` Steven Rostedt
@ 2015-04-15 18:11                             ` Greg Kroah-Hartman
  0 siblings, 0 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15 18:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: One Thousand Gnomes, Jiri Kosina, Al Viro, Borislav Petkov,
	Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 01:33:27PM -0400, Steven Rostedt wrote:
> On Wed, Apr 15, 2015 at 01:49:36PM +0200, Greg Kroah-Hartman wrote:
> > 
> > There's the issue of thousands of dbus queries, and then there's the
> > issue that making those queries takes a measurable amount of time.  We
> > can fix the latter one, the first one, well, not so much, but we can
> > provide the resources for them to make a faster system if they want to.
> 
> I'll argue that you can't fix the latter one. One thing that I've observed over
> the years of having faster computers is, as soon as you make it faster, people
> will write slower software.
> 
> Currently the issue is that we have thousands of dbus queries, you make dbus
> 10x faster, I guarantee that people will write software with 10 thousand dbus
> queries and we are no better off than we are today.

Then they get to buy a faster machine :)

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 17:20                             ` Steven Rostedt
  2015-04-15 17:41                               ` Havoc Pennington
  2015-04-15 17:55                               ` Greg Kroah-Hartman
@ 2015-04-15 18:12                               ` Greg Kroah-Hartman
  2 siblings, 0 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-15 18:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Borislav Petkov, Richard Weinberger, Andy Lutomirski, Al Viro,
	Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 01:20:37PM -0400, Steven Rostedt wrote:
> > Yes it seems complex, but read the documentation to get an idea of what
> > is happening here.  I think you will get a better appreciation of what
> > is going on.
> 
> I read a bit of the documentation, but not enough. I really need to sit
> down and play with code. That's the way I learn and understand.

Here's a good mapping for C developers that Lennart wrote last year:
	https://lwn.net/Articles/619250/
that should give you a good starting point.

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15  8:37                   ` Greg Kroah-Hartman
@ 2015-04-15 18:12                     ` James Bottomley
  2015-04-16 12:13                       ` David Herrmann
  0 siblings, 1 reply; 316+ messages in thread
From: James Bottomley @ 2015-04-15 18:12 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jiri Kosina, Steven Rostedt, John Stoffel, Andy Lutomirski,
	Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni, Paul E. McKenney

On Wed, 2015-04-15 at 10:37 +0200, Greg Kroah-Hartman wrote:
> On Wed, Apr 15, 2015 at 12:05:01AM +0200, Jiri Kosina wrote:
> > On Tue, 14 Apr 2015, Steven Rostedt wrote:
> > 
> > > I believe that Linux Plumbers is still accepting MicroConferences. I 
> > > wonder if this would be a good one to have. Try to get everyone face to 
> > > face and talk about how exactly kdbus should be implemented in the 
> > > kernel.
> > 
> > I personally would even put more emphasis on a session that would first 
> > focus on "why", before we look at "how".
> > 
> > I have already asked about this during the earlier RFC submissions, but 
> > the only "take-home message" I took from that discussion was "because it's 
> > faster than what we currently have". I don't find that a sufficient 
> > justification by itself for something so complex (with potential 
> > implications all over the place for the whole Linux ecosystem), especially 
> > given the fact we already have sealed memfds zerocopy etc (and I am not 
> > even talking about the "infinite set-in-stone userspace API" implications 
> > this has).
> 
> I wrote many many lines of "why" in the patch submissions, and in the
> first email in this thread.  Are any of those specific solutions and
> "why" reasons not correct in your opinion?  If so, great, please let me
> know.
> 
> But to say that no one is focusing on "why" is a slight to those of us
> who have been providing just that.

Please stop. A debate that degenerates into a disagreement about whether
specific questions have or have not been answered is no debate at all:
it's an ideological showcase.  If both sides are going to do the same
at plumbers (or elsewhere) it will be a waste of time (well, except as a
spectator sport).

To make this work, you need (as the plumbers MC templates tell you) a
list of key attendees from all sides of the debate who'll commit to
coming (mostly what I've heard so far is people committing not to
coming) and a list of guiding topics which people will commit to
discussing honestly.

For me the biggest issue is the container problem: it's really hard to
containerise kdbus because of the stateful nature of the protocol and
the fact that it has a well known system bus.  Separation into domains
works for OS containers, but application containers need more fluidity.
It's not unlike the same problem on Windows: Windows application
containers are very difficult to do because the global registry means
that OLE handlers all have to run inside your container as well
(effectively making it an OS container).  I'm sure, since we already
have a lot of containers people going to plumbers, that we can get them
to turn up for the discussion.

James



^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 16:44                 ` Havoc Pennington
@ 2015-04-15 18:16                   ` Steven Rostedt
  2015-04-15 18:40                     ` Havoc Pennington
  2015-04-15 20:22                   ` Andy Lutomirski
  2015-04-15 22:08                   ` One Thousand Gnomes
  2 siblings, 1 reply; 316+ messages in thread
From: Steven Rostedt @ 2015-04-15 18:16 UTC (permalink / raw)
  To: Havoc Pennington
  Cc: Rik van Riel, One Thousand Gnomes, Greg Kroah-Hartman,
	Jiri Kosina, Andy Lutomirski, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 12:44:44PM -0400, Havoc Pennington wrote:
> 
> By pushing everything through one queue, dbus is trying to reduce the
> number of codepaths in applications. Apps have a lot of new problems
> to solve if messages get their order scrambled.

But can't a dbus library handle this for the apps? Like implementing TCP on
top of UDP. I really doubt the entire dbus protocol needs to be pushed into
the kernel.

I'm going to try to spend some time reading about dbus and playing with the
code (thanks for the links BTW!). Then I can see if I can come up with
something too. Or at least be able to ask the right questions.

-- Steve


^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 18:16                   ` Steven Rostedt
@ 2015-04-15 18:40                     ` Havoc Pennington
  0 siblings, 0 replies; 316+ messages in thread
From: Havoc Pennington @ 2015-04-15 18:40 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Rik van Riel, One Thousand Gnomes, Greg Kroah-Hartman,
	Jiri Kosina, Andy Lutomirski, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 2:16 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> But can't a dbus library handle this for the apps? Like implementing TCP on
> top of UDP. I really doubt the entire dbus protocol needs to be pushed into
> the kernel.

You could probably do something like assign sequence numbers,
temporarily relax ordering, and then reconstruct the order where
needed, but somebody still has to assign the sequence numbers in
order, and the bus has to process requests in order (it can't flip a
subscribe and an unsubscribe, for example). So I don't know whether
you could get anywhere with it or not.
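
As a rough illustration of the "assign sequence numbers, relax ordering,
then reconstruct the order" idea above, here is a minimal receiver-side
sketch in Python. It is not taken from dbus or kdbus; the Resequencer class
and its push() method are hypothetical names, and it assumes some single
party has already stamped each message with a gap-free sequence number,
which is exactly the part that still has to happen in order.

```python
class Resequencer:
    """Deliver payloads in sequence-number order, holding back early arrivals."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq  # next sequence number allowed out
        self.pending = {}          # seq -> payload for messages that arrived early

    def push(self, seq, payload):
        """Accept one message; return the payloads that are now deliverable."""
        if seq < self.next_seq or seq in self.pending:
            return []              # duplicate or already delivered: ignore it
        self.pending[seq] = payload
        ready = []
        # Drain every consecutive message starting at next_seq.
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready
```

With this, an "unsubscribe" stamped 1 that arrives before the "subscribe"
stamped 0 is held back until the subscribe shows up, so the two cannot be
flipped on the receiving side.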

The current model (userspace dbus daemon, don't know about kdbus) is like this:

 - you have a pool of ordered incoming queues from each client, where
each incoming queue conceptually ends with EOF of course

 - the main bus loop does:
    - pick the head message or EOF from any nonempty incoming queue for dispatch
    - route it according to destination address or subscribers
    - if the destination includes the bus itself (e.g. someone wanting
to subscribe or own a name or whatever) then process the request...
note that this will potentially affect how the next message gets
routed
    - for each destination client, write the message to the ordered
outgoing queue for that client
    - if the incoming queue has EOF then send out notifications about
that to interested clients, clean up bus names, etc.

Conceptually, filling and draining the queues could easily be in
separate threads, though the userspace daemon doesn't do that.
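
The loop described above can be sketched roughly as follows. This is a toy
model in Python, not dbus-daemon code: the Bus class, the EOF sentinel, and
the "bus" and "*" destination strings are all made up for illustration, and
real match rules, name ownership and policy checks are left out.

```python
from collections import deque

EOF = object()  # sentinel queued when a client closes its connection


class Bus:
    def __init__(self):
        self.incoming = {}        # client -> ordered queue of (dest, body) or EOF
        self.outgoing = {}        # client -> ordered queue of (sender, body)
        self.subscribers = set()  # clients that asked to receive broadcasts

    def connect(self, client):
        self.incoming[client] = deque()
        self.outgoing[client] = deque()

    def send(self, client, dest, body):
        self.incoming[client].append((dest, body))

    def close(self, client):
        self.incoming[client].append(EOF)

    def step(self):
        """Dispatch one head item; return False when every queue is empty."""
        for client, queue in self.incoming.items():
            if queue:
                break             # pick the head of any nonempty incoming queue
        else:
            return False
        item = queue.popleft()
        if item is EOF:
            # Clean up, then tell interested clients the sender is gone.
            del self.incoming[client]
            del self.outgoing[client]
            self.subscribers.discard(client)
            for other in self.subscribers:
                self.outgoing[other].append(("bus", ("vanished", client)))
            return True
        dest, body = item
        if dest == "bus":
            # Requests to the bus itself change how later messages are routed.
            if body == "subscribe":
                self.subscribers.add(client)
            elif body == "unsubscribe":
                self.subscribers.discard(client)
        elif dest == "*":
            for other in self.subscribers:
                self.outgoing[other].append((client, body))
        elif dest in self.outgoing:
            self.outgoing[dest].append((client, body))
        return True
```

Because every message from one client passes through step() in queue order,
a broadcast, a unicast and the disconnect notice from that client land in
each receiver's outgoing queue in the order they were sent.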

> I'm going to try to spend some time reading about dbus and playing with the
> code (thanks for the links BTW!). Then I can see if I can come up with
> something too. Or at least be able to ask the right questions.
>

Greg may be right to point people to Lennart's C binding which has a
lot less "baggage" than the GLib stuff, which assumes knowledge of the
"glib way".

Havoc

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 11:40                       ` Greg Kroah-Hartman
  2015-04-15 13:03                         ` Borislav Petkov
  2015-04-15 15:41                         ` Steven Rostedt
@ 2015-04-15 19:04                         ` Martin Steigerwald
  2 siblings, 0 replies; 316+ messages in thread
From: Martin Steigerwald @ 2015-04-15 19:04 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Borislav Petkov, Richard Weinberger, Andy Lutomirski, Al Viro,
	Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

Am Mittwoch, 15. April 2015, 13:40:36 schrieb Greg Kroah-Hartman:
> On Wed, Apr 15, 2015 at 11:44:11AM +0200, Borislav Petkov wrote:
> > On Wed, Apr 15, 2015 at 11:27:13AM +0200, Greg Kroah-Hartman wrote:
> > > On Wed, Apr 15, 2015 at 11:21:49AM +0200, Borislav Petkov wrote:
> > > > > On Wed, Apr 15, 2015 at 11:20:34AM +0200, Greg Kroah-Hartman wrote:
> > > > > > We're all forced to use cgroups, systemd, udev unless we want
> > > > > > to have busybox as userland. That's a fact.
> > > > > 
> > > > > Is that a problem?
> > > > 
> > > > I'm amazed that you're really actually asking that question :-(
> > > 
> > > Really?  Why can't userspace rely on the features that the kernel
> > > provides them?
> > 
> > Userspace can do whatever it wants. As long as I'm not being *forced*
> > to do what userspace thinks is the right thing.
> > 
> > It seems to me that since that whole systemd* debacle started, we're
> > forgetting the choice aspect.
> 
> What "choice" aspect?  Surely you aren't going to make the "Linux is
> about choice" argument are you?
> 
> > And dammit, I want my choice. I want to be able to choose what I'm
> > running. Not run what someone else thought what would be good for me to
> > run. If I wanted that, I'd long switched to windoze or äbble.
> 
> Oh crap, you went there :)
> 
> Take a look at http://www.islinuxaboutchoice.com/ please.

Just one question:

In what way is the post of a single kernel developer authoritative for the 
whole community?

Even if I would make a poster of 200x100 meters or so and stick it onto a 
building, it wouldn't be.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 16:44                 ` Havoc Pennington
  2015-04-15 18:16                   ` Steven Rostedt
@ 2015-04-15 20:22                   ` Andy Lutomirski
  2015-04-15 20:41                     ` Al Viro
                                       ` (3 more replies)
  2015-04-15 22:08                   ` One Thousand Gnomes
  2 siblings, 4 replies; 316+ messages in thread
From: Andy Lutomirski @ 2015-04-15 20:22 UTC (permalink / raw)
  To: Havoc Pennington
  Cc: Rik van Riel, One Thousand Gnomes, Greg Kroah-Hartman,
	Jiri Kosina, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 9:44 AM, Havoc Pennington <hp@pobox.com> wrote:
> On Wed, Apr 15, 2015 at 12:00 PM, Rik van Riel <riel@redhat.com> wrote:
>> On 04/15/2015 07:06 AM, One Thousand Gnomes wrote:
>>
>>>> that anyone here does either.  In the many years I've spent working on
>>>> this, dbus has seemed to be odd, and strange, to the way that the kernel
>>>> has normally worked, because it is.  And that's not a bad thing, it's
>>>> just different, and for us to support real needs and requirements of our
>>>> users, is the requirement of the Linux kernel.
>>>
>>> There are I think a set of intertwined problems here
>>>
>>> - An efficient delivery system for multicast messages delivered locally
>>>   (be that MPI, dbus whatever - it's not "dbus or nothing")
>>>
>>> - A kernel side dynamic namespace to describe what goes where
>>>
>>> - A kernel side security model to describe who may receive what, and
>>>   which additional information/tags/cred info
>>>
>>> - Something that provides state to stuff that needs it (and probably
>>>   belongs in userspace - dbus name service etc)
>>>
>>> - Something that maps dbus and other models onto the kernel security
>>>   model (and we have tools like EBPF which are very powerful)
>>>
>>> - Something that maps the kernel layer onto models like MPI-3
>
> When trying to split apart problems, for dbus it's important to keep
> ordering guarantees.
>
> That is, with dbus if I send a broadcast message, then send a unicast
> request to another client, then drop the connection causing the bus to
> broadcast that I've dropped; then the other client will see those
> things in that order - the broadcast, then the request, and then that
> I've dropped the connection.

This leads me to a potentially interesting question: where's the
buffering?  If there's a bus with lots of untrusted clients and one of
them broadcasts data faster than all receivers can process it, where
does it go?

At least with a userspace solution, it's clear what the OOM killer
should kill when this happens.  Unless it's PID 1.  Sigh.

--Andy

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 20:22                   ` Andy Lutomirski
@ 2015-04-15 20:41                     ` Al Viro
  2015-04-15 21:07                     ` Rik van Riel
                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 316+ messages in thread
From: Al Viro @ 2015-04-15 20:41 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Havoc Pennington, Rik van Riel, One Thousand Gnomes,
	Greg Kroah-Hartman, Jiri Kosina, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 01:22:12PM -0700, Andy Lutomirski wrote:

> This leads me to a potentially interesting question: where's the
> buffering?  If there's a bus with lots of untrusted clients and one of
> them broadcasts data faster than all receivers can process it, where
> does it go?
> 
> At least with a userspace solution, it's clear what the OOM killer
> should kill when this happens.  Unless it's PID 1.  Sigh.

... and there is a PID 1 specimen that really likes to spew over dbus.
A lot.  I have never been able to find out _why_ systemd feels like
broadcasting all kinds of stuff from PID 1 - maybe somebody in this
thread can answer that.  For example, what's the point of broadcasting
mount table updates, when
	* it can't hope to catch all individual changes - they _can_ get
lumped together, no matter what it tries.
	* any process can just as easily keep track of that data on its
own as it could by watching those broadcasts; parsing /proc/self/mountinfo
isn't harder than parsing notifications.
	* you need to start with obtaining the original state somehow, or
what would you apply those updates to?
	* if one insists on having a daemon doing such broadcasts, what
the hell is the point of having PID 1 do that?  The exact same logic would
do just fine.  Moreover, you could have one running in a namespace of
your session, which is something PID 1 won't see.
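
A minimal sketch of the second point above, for illustration only: a
process can watch the mount table on its own.  It assumes that
/proc/self/mountinfo is pollable the way proc(5) describes for
/proc/[pid]/mounts, where a mount or unmount makes poll(2) report
POLLPRI (and POLLERR).  mount_point_of() and wait_for_mount_change()
are names made up for this example.

```c
#include <poll.h>
#include <string.h>

/*
 * Copy the mount point (5th space-separated field of a mountinfo line)
 * into out.  Returns 0 on success, -1 if the line is too short or out
 * is too small.  Octal escapes such as \040 are left undecoded here.
 */
int mount_point_of(const char *line, char *out, size_t outlen)
{
	const char *p = line;
	size_t len;
	int field;

	/* Skip the first four fields: ID, parent ID, major:minor, root. */
	for (field = 1; field < 5; field++) {
		p = strchr(p, ' ');
		if (!p)
			return -1;
		p++;
	}
	len = strcspn(p, " \n");
	if (len == 0 || len >= outlen)
		return -1;
	memcpy(out, p, len);
	out[len] = '\0';
	return 0;
}

/*
 * Wait until the mount table changes.  fd is assumed to be an open
 * descriptor for /proc/self/mountinfo.  Returns 1 on a change, 0 on
 * timeout, -1 on error.
 */
int wait_for_mount_change(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLPRI };
	int rc = poll(&pfd, 1, timeout_ms);

	if (rc <= 0)
		return rc;
	return (pfd.revents & (POLLPRI | POLLERR)) ? 1 : 0;
}
```

After a change is reported, a caller would lseek() back to offset 0 and
re-read the whole file.  The same parser then handles both the initial
state and every later update.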

Sure, I understand why it wants to be aware of what's mounted and where it's
mounted.  Just as it wants to know what time it is.  Should it broadcast
a dbus message every second, just to tell everyone what it had found about
the time?

I'm somewhat tempted to propose AF_TWITTER - would match the style... ;-/
And frankly, this really looks like a social media braindamage - complete
with status update broadcast every time a plane flies by...

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 20:22                   ` Andy Lutomirski
  2015-04-15 20:41                     ` Al Viro
@ 2015-04-15 21:07                     ` Rik van Riel
  2015-04-16 18:03                       ` Djalal Harouni
  2015-04-15 21:58                     ` Havoc Pennington
  2015-04-16 13:13                     ` Tom Gundersen
  3 siblings, 1 reply; 316+ messages in thread
From: Rik van Riel @ 2015-04-15 21:07 UTC (permalink / raw)
  To: Andy Lutomirski, Havoc Pennington
  Cc: One Thousand Gnomes, Greg Kroah-Hartman, Jiri Kosina,
	Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On 04/15/2015 04:22 PM, Andy Lutomirski wrote:
> On Wed, Apr 15, 2015 at 9:44 AM, Havoc Pennington <hp@pobox.com> wrote:
>> On Wed, Apr 15, 2015 at 12:00 PM, Rik van Riel <riel@redhat.com> wrote:
>>> On 04/15/2015 07:06 AM, One Thousand Gnomes wrote:
>>>
>>>>> that anyone here does either.  In the many years I've spent working on
>>>>> this, dbus has seemed to be odd, and strange, to the way that the kernel
>>>>> has normally worked, because it is.  And that's not a bad thing, it's
>>>>> just different, and for us to support real needs and requirements of our
>>>>> users, is the requirement of the Linux kernel.
>>>>
>>>> There are I think a set of intertwined problems here
>>>>
>>>> - An efficient delivery system for multicast messages delivered locally
>>>>   (be that MPI, dbus whatever - it's not "dbus or nothing")
>>>>
>>>> - A kernel side dynamic namespace to describe what goes where
>>>>
>>>> - A kernel side security model to describe who may receive what, and
>>>>   which additional information/tags/cred info
>>>>
>>>> - Something that provides state to stuff that needs it (and probably
>>>>   belongs in userspace - dbus name service etc)
>>>>
>>>> - Something that maps dbus and other models onto the kernel security
>>>>   model (and we have tools like EBPF which are very powerful)
>>>>
>>>> - Something that maps the kernel layer onto models like MPI-3
>>
>> When trying to split apart problems, for dbus it's important to keep
>> ordering guarantees.
>>
>> That is, with dbus if I send a broadcast message, then send a unicast
>> request to another client, then drop the connection causing the bus to
>> broadcast that I've dropped; then the other client will see those
>> things in that order - the broadcast, then the request, and then that
>> I've dropped the connection.
> 
> This leads me to a potentially interesting question: where's the
> buffering?  If there's a bus with lots of untrusted clients and one of
> them broadcasts data faster than all receivers can process it, where
> does it go?
> 
> At least with a userspace solution, it's clear what the OOM killer
> should kill when this happens.  Unless it's PID 1.  Sigh.

It may be useful to do the buffering (and general interception
of any message that cannot be delivered) in a userspace program.

Not only to get the buffers out of the kernel and into swappable
memory, but also so people could re-use the same infrastructure
for things like cluster communication (or communication between
different containers) - the userspace daemons could take care of
routing messages to and from the outside.

They could also be useful to keep some of the policy stuff
outside of the kernel, if only to ensure that the kernel side
policy is not set in stone, and people can do things differently
in the future if they want to.


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 17:55                               ` Greg Kroah-Hartman
@ 2015-04-15 21:55                                 ` One Thousand Gnomes
  0 siblings, 0 replies; 316+ messages in thread
From: One Thousand Gnomes @ 2015-04-15 21:55 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Steven Rostedt, Borislav Petkov, Richard Weinberger,
	Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, Tom Gundersen, Jiri Kosina,
	linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni

On Wed, 15 Apr 2015 19:55:15 +0200
Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:

> On Wed, Apr 15, 2015 at 01:20:37PM -0400, Steven Rostedt wrote:
> > > I don't know too many other kernel features/drivers that have taken this
> > > long, or done this "slowly", do you?
> > 
> > What other features/drivers that you know introduce a major new IPC
> > user space interface that will be a core component of the system?
> 
> We've been merging these about one every other kernel release for a
> while now.  Look at the drivers/misc/mic/ 

For a single specific piece of hardware, not a general API that by your
own admission will effectively be mandatory.

Alan

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 17:31                     ` Greg Kroah-Hartman
  2015-04-15 18:04                       ` Steven Rostedt
@ 2015-04-15 21:56                       ` One Thousand Gnomes
  2015-04-15 22:11                         ` Andy Lutomirski
  1 sibling, 1 reply; 316+ messages in thread
From: One Thousand Gnomes @ 2015-04-15 21:56 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Steven Rostedt, Richard Weinberger, Andy Lutomirski, Al Viro,
	Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Wed, 15 Apr 2015 19:31:45 +0200
Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> Don't make idle comments, the tty layer is far more complex and larger
> than the kdbus code, with much nastier issues and problems.  And we
> handle that just fine :)

The tty layer is the way it is because of design decisions dating back 20
years that were (with hindsight) wrong coupled with the fact that POSIX
took a lot of the behavioural guarantees from an armwaving claim about
what Unix(tm) implemented without thinking about how to implement them
(as far as I can tell - given many of the guarantees are broken in Unix!)

> I'll again refer to ALSA here, no one writes a "raw" ALSA program, they
> all use the library to interact with the kernel.  Do that here, there
> are wonderful dbus libraries out there, for all languages.  Use them
> instead.

Agreed entirely - I don't disagree that we need a fast messaging layer.
The question is what bits belong in kernel. Go wants one, JMS wants one,
porting from stuff like QNX wants one (although they use the POSIX API
on QNX), MPI wants one (but with some useful and subtly different
semantics), various embedded things from tiny uKernels want one.

The question is what the kernel bit should actually look like, and how
many we need.

My guess is that we actually have three of the big use cases covered

- futexes and shared memory cover the tiny uKernel emulation bits (and on
  a lawnmower engine sized ARM that's probably the only way to get the
  speed approaching that of a tiny rtos)
- posix queues cover things like QNX porting
- publish/subscribe - via tmpfs

but we don't cover

- multicasting
- some types of credential and authority passing
- scatter/gather without excessive userspace wakes
 
> > Is there a reason that this patch must go in this merge window?
> 
> What makes this merge window any different from any other?  Again, I
> explained why I asked it to be merged at this point in time.  If people
> have technical issues with it, I'll be more than glad to work them out
> and merge it later, there's no "hard and fast deadline" anyone is asking
> for here.

The problem I have is that every time someone points out a fundamental
design issue you simply say "Why haven't you reviewed 13,000 lines of
code".

I haven't given it an in depth review for the same reason as if someone
posted 13,000 lines of "I've got an awesome new file system which uses a
FAT and 8.3 file names". There's some more pressing concerns to sort
first. The fact it's complex and hard to follow also doesn't encourage
review.

And the fact Al tried to read it and is asking for help really worries
me 8)

> 
> > Having something this controversial take place during the merge window
> > suggests it's a bit premature to push in now.
> 
> "take place"?  Have you been ignoring these patches posted numerous
> times for many months?  This is the point in time to ask for code to be
> merged, just like any other code, nothing is special here.

Well - you've asked. I see two NACKs from people with great taste. So I
think the next step is to defer trying to submit it and work through the
fact that Al can't follow the locking, and other people don't believe the
security model is maintainable.

Alan

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 20:22                   ` Andy Lutomirski
  2015-04-15 20:41                     ` Al Viro
  2015-04-15 21:07                     ` Rik van Riel
@ 2015-04-15 21:58                     ` Havoc Pennington
  2015-04-16 13:13                     ` Tom Gundersen
  3 siblings, 0 replies; 316+ messages in thread
From: Havoc Pennington @ 2015-04-15 21:58 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Rik van Riel, One Thousand Gnomes, Greg Kroah-Hartman,
	Jiri Kosina, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, Tom Gundersen, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 4:22 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> This leads me to a potentially interesting question: where's the
> buffering?  If there's a bus with lots of untrusted clients and one of
> them broadcasts data faster than all receivers can process it, where
> does it go?
>
> At least with a userspace solution, it's clear what the OOM killer
> should kill when this happens.  Unless it's PID 1.  Sigh.
>

There's the history and there's the probably-should-happen. I'm sure
this can be improved.

What I think should probably happen is:

 - if a client is trying to send a message and the bus's incoming
buffer from that client is full, the bus should stop reading (forcing
the client to do its own buffering).
 - if a client is not consuming messages fast enough and the bus's
outgoing buffer to that client fills up, the client should be
disconnected.

This would essentially copy how the X server works (again).
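
The two rules above could be sketched roughly as follows.  This is not
how dbus-daemon, kdbus or the X server is actually written: struct
client, bus_should_read_from(), bus_deliver() and the limits are all
hypothetical, and the queues are reduced to counters so that only the
policy is visible.

```c
#include <stdbool.h>
#include <stddef.h>

#define OUT_QUEUE_MAX 4	/* tiny limit, for illustration only */

enum deliver_result { DELIVERED, DISCONNECTED };

struct client {
	bool connected;
	size_t queued;		/* messages waiting in the outgoing queue */
	size_t in_pending;	/* messages read from the client, not yet routed */
};

/*
 * Incoming side: when the buffer is full the bus stops reading, which
 * pushes the buffering back onto the sending client.
 */
bool bus_should_read_from(const struct client *c, size_t in_max)
{
	return c->connected && c->in_pending < in_max;
}

/* Outgoing side: a full queue disconnects the slow reader. */
enum deliver_result bus_deliver(struct client *c)
{
	if (!c->connected)
		return DISCONNECTED;
	if (c->queued >= OUT_QUEUE_MAX) {
		c->connected = false;
		c->queued = 0;	/* the full buffer is freed at once */
		return DISCONNECTED;
	}
	c->queued++;
	return DELIVERED;
}
```

With this policy a receiver either sees every message queued for it or
sees a disconnect; it never silently loses one in the middle.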

The original userspace implementation has configurable buffer size
limits and also limits on resources (such as number of connections and
match rules) used by a single user, but I don't think it does the
right things when limits are reached.

When the incoming queue is full for a client, I'm not sure whether it
stops reading from that client or sends the client errors, I don't
remember.

When the outgoing-from-the-daemon queue is full (a client isn't
reading messages fast enough), if I remember right messages to that
client are dropped with an error reply to the sender - this error
probably gets ignored much of the time in practice, but in theory the
sender could retry.

A full outgoing queue for one client doesn't affect other clients, who
are still able to receive messages. For broadcast messages, a full
queue means a client will miss those broadcasts.

Disconnecting might be better than this drop-the-message behavior,
because clients could then assume that *either* they got all messages
that were broadcast, *or* they got disconnected - they won't ever
silently miss broadcasts and end up in a weird confused state.

Xserver does this - if I'm reading the code correctly just now
(xserver/os/io.c, FlushClient()), it buffers outgoing messages until
realloc fails, and then it disconnects the client.

If X didn't do this, then clients could miss events and become
confused about the state of the server.  The same will often apply in
dbus scenarios.

In practice right now APIs are designed and limits are configured to
try to avoid ever hitting the limits (unless something is malicious or
badly broken), because if you hit them things go to hell - much like
running out of memory, or hitting file descriptor limits.

Disconnecting slow-reading clients would probably improve this; the
full buffer would be instantly freed, and the client could reconnect
and re-establish all state it cares about, if it wants to. So it might
gracefully recover sometimes, if the problem was transient.

Havoc

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 16:44                 ` Havoc Pennington
  2015-04-15 18:16                   ` Steven Rostedt
  2015-04-15 20:22                   ` Andy Lutomirski
@ 2015-04-15 22:08                   ` One Thousand Gnomes
  2015-04-16 13:14                     ` Daniel Mack
  2 siblings, 1 reply; 316+ messages in thread
From: One Thousand Gnomes @ 2015-04-15 22:08 UTC (permalink / raw)
  To: Havoc Pennington
  Cc: Rik van Riel, Greg Kroah-Hartman, Jiri Kosina, Andy Lutomirski,
	Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	Tom Gundersen, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

> When trying to split apart problems, for dbus it's important to keep
> ordering guarantees.

Yes I assumed that - minus disconnection/reconnect and running out of
queue space. Some users also want priority queueing (with or without the
guarantee for the same priority). Many of the other systems that can use
a fast multicast messaging system have priority queues - which is one
reason the existing POSIX messaging has priority.

> That is, with dbus if I send a broadcast message, then send a unicast
> request to another client, then drop the connection causing the bus to
> broadcast that I've dropped; then the other client will see those
> things in that order - the broadcast, then the request, and then that
> I've dropped the connection.

That's a simple matter of refcounting the buffers 8). I'm not really
concerned about the low level queue side of things. The proposed
implementation looks horribly convoluted for what the sk_buff layer can
already do standing on one leg. We know how to implement that part
cleanly, and it's probably not hard to nail onto AF_UNIX or to expand
posix message queues to provide that service (and maybe then even
convince POSIX about it)
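
To make the refcounting remark concrete, a rough userspace sketch
follows.  It is not kernel code and not the sk_buff API: one buffer is
shared by every receiver queue, and each queue is a FIFO, so ordering
is kept per receiver.  All names (struct msg, queue_push(),
broadcast() and so on) are invented for the example, and it is
single-threaded, so the counter is a plain integer where real code
would need atomics or locking.

```c
#include <stdlib.h>
#include <string.h>

#define QUEUE_MAX 8

struct msg {
	unsigned int refs;	/* sender's reference plus one per queue */
	size_t len;
	char data[];
};

struct queue {
	struct msg *slots[QUEUE_MAX];
	size_t head, count;	/* ring buffer, oldest message at head */
};

struct msg *msg_new(const void *data, size_t len)
{
	struct msg *m = malloc(sizeof(*m) + len);

	if (!m)
		return NULL;
	m->refs = 1;
	m->len = len;
	memcpy(m->data, data, len);
	return m;
}

/* Drop one reference; the last one frees the buffer. */
void msg_put(struct msg *m)
{
	if (--m->refs == 0)
		free(m);
}

/* Queue one more reference to m; returns -1 if the queue is full. */
int queue_push(struct queue *q, struct msg *m)
{
	size_t tail;

	if (q->count == QUEUE_MAX)
		return -1;
	tail = (q->head + q->count) % QUEUE_MAX;
	q->slots[tail] = m;
	q->count++;
	m->refs++;
	return 0;
}

/* Pop the oldest message; the caller now owns that reference. */
struct msg *queue_pop(struct queue *q)
{
	struct msg *m;

	if (q->count == 0)
		return NULL;
	m = q->slots[q->head];
	q->head = (q->head + 1) % QUEUE_MAX;
	q->count--;
	return m;
}

/* Broadcast: every queue shares the same buffer, nothing is copied. */
size_t broadcast(struct queue *qs, size_t nqueues, struct msg *m)
{
	size_t i, delivered = 0;

	for (i = 0; i < nqueues; i++)
		if (queue_push(&qs[i], m) == 0)
			delivered++;
	return delivered;
}
```

The sender calls msg_put() once after broadcasting, and each receiver
calls it after consuming its copy of the pointer.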

If it was just "here's a general purpose multicast message service" in a
small clean chunk of code I'd be cheering it into the tree. Even if you
need complicated filter rules because we can use EBPF to allow the client
library to do really sophisticated filtering and avoid wakeups for noise.

It's the complexity, the attachment to a lot of state in kernel and the
fact it doesn't appear to solve the general purpose problems that bothers
me.

> By pushing everything through one queue, dbus is trying to reduce the
> number of codepaths in applications. Apps have a lot of new problems
> to solve if messages get their order scrambled.

And I assume any user space solution for that purpose would end up
re-ordering messages if they could get shuffled so its

> (dbus does NOT guarantee order across multiple clients, of course -
> there's no guarantee that all clients get the broadcast, before anyone
> gets the next message - each client has its own buffer on both read
> and write. The ordering is only with respect to each client's message
> stream.)
> 
> Ordering is vital for tracking state, because if you're sending out
> events to describe changes in state, the order of those changes is
> important.

Most of the time IMHO you don't want to listen to changes in state, you
want to notice that the state wasn't the value it was before and adapt.

> Of course there are more complex ways to handle this over in
> distributed-systems-world.

And publish/subscribe models - which for certain uses scale better, are
easier to make reliable and avoid a lot of the mess.

Alan

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 21:56                       ` One Thousand Gnomes
@ 2015-04-15 22:11                         ` Andy Lutomirski
  2015-04-15 22:18                           ` Al Viro
  2015-04-16 10:31                           ` Daniel Mack
  0 siblings, 2 replies; 316+ messages in thread
From: Andy Lutomirski @ 2015-04-15 22:11 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Greg Kroah-Hartman, Steven Rostedt, Richard Weinberger, Al Viro,
	Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 2:56 PM, One Thousand Gnomes
<gnomes@lxorguk.ukuu.org.uk> wrote:
> On Wed, 15 Apr 2015 19:31:45 +0200
>> Don't make idle comments, the tty layer is far more complex and larger
>> than the kdbus code, with much nastier issues and problems.  And we
>> handle that just fine :)
>
> The tty layer is the way it is because of design decisions dating back 20
> years that were (with hindsight) wrong coupled with the fact that POSIX
> took a lot of the behavioural guarantees from an armwaving claim about
> what Unix(tm) implemented without thinking about how to implement them
> (as far as I can tell - given many of the guarantees are broken in Unix!)
>
>> I'll again refer to ALSA here, no one writes a "raw" ALSA program, they
>> all use the library to interact with the kernel.  Do that here, there
>> are wonderful dbus libraries out there, for all languages.  Use them
>> instead.
>
> Agreed entirely - I don't disagree that we need a fast messaging layer.
> The question is what bits belong in kernel. Go wants one, JMS wants one,
> porting from stuff like QNX wants one (although they use the POSIX API
> on QNX), MPI wants one (but with some useful and subtly different
> semantics), various embedded things from tiny uKernels want one.
>
> The question is what the kernel bit should actually look like, and how
> many we need.
>
> My guess is that we actually have three of the big use cases covered
>
> - futexes and shared memory cover the tiny uKernel emulation bits (and on
>   a lawnmower engine sized ARM that's probably the only way to get the
>   speed approaching that of a tiny rtos)
> - posix queues cover things like QNX porting
> - publish/subscribe - via tmpfs
>
> but we don't cover
>
> - multicasting
> - some types of credential and authority passing
> - scatter/gather without excessive userspace wakes

I would really like to see a very lightweight capability-based
messaging system.  By "capability-based" I don't mean Linux
capabilities.  I mean that a user program could give some very
lightweight token to a peer authorizing that peer to use some service
(by reference to the same token), and the peer could pass it on to
other peers as an introduction mechanism.  (Search for
"capability-based security".)

This is functionally identical to passing AF_UNIX socket fds over
SCM_RIGHTS, but I want something much lighter weight.
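
For reference, the existing mechanism looks roughly like this.  The
sketch follows the usual cmsg(3) pattern for SCM_RIGHTS; send_fd() and
recv_fd() are helper names chosen for the example, and error handling
is kept to a minimum.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one descriptor over a connected AF_UNIX socket.  0 or -1. */
int send_fd(int sock, int fd)
{
	char byte = 0;	/* at least one byte of real data must be sent */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;	/* forces correct alignment */
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = u.buf,
		.msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor.  Returns the new fd, or -1 on failure. */
int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = u.buf,
		.msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;
	int fd;

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The receiver ends up with its own descriptor number for the same open
file, and it can forward that descriptor to a third party the same way.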

Also, getting the really high performance stuff right would be nice.
Binder has one thing going for it (IIRC -- I've talked about it to
some of the authors, but I've never so much as glanced at the code):
it has a primitive to send and wait for a reply.  This reduces the
load on the scheduler.

I wish kdbus were blazingly fast, but I don't think it is :(  I think
the bar should be either similar performance to (peer-to-peer) AF_UNIX
or something possibly more complex but considerably faster.

--Andy

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 22:11                         ` Andy Lutomirski
@ 2015-04-15 22:18                           ` Al Viro
  2015-04-15 22:28                             ` Andy Lutomirski
  2015-04-16 10:31                           ` Daniel Mack
  1 sibling, 1 reply; 316+ messages in thread
From: Al Viro @ 2015-04-15 22:18 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: One Thousand Gnomes, Greg Kroah-Hartman, Steven Rostedt,
	Richard Weinberger, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, Tom Gundersen, Jiri Kosina,
	linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 03:11:17PM -0700, Andy Lutomirski wrote:

> This is functionally identical to passing AF_UNIX socket fds over
> SCM_RIGHTS, but I want something much lighter weight.

Most of the weight in SCM_RIGHTS comes from the fact that you can
pass AF_UNIX sockets over it, which requires a garbage collector.
Exclude that and suddenly it becomes very cheap...

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 17:59                 ` Austin S Hemmelgarn
  2015-04-15 18:04                   ` Rik van Riel
@ 2015-04-15 22:22                   ` One Thousand Gnomes
  2015-04-16 16:02                     ` Havoc Pennington
  2015-04-16 16:37                     ` Robert Schwebel
  2015-04-21 16:54                   ` Diego Viola
  2 siblings, 2 replies; 316+ messages in thread
From: One Thousand Gnomes @ 2015-04-15 22:22 UTC (permalink / raw)
  To: Austin S Hemmelgarn
  Cc: Greg Kroah-Hartman, Al Viro, Andy Lutomirski, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, Eric W. Biederman, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

> The reason that 'everyone who works in this area' adopted it is not as much 
> that the design is sound (I'm not arguing whether it is or isn't in this 
> case) as it is that none of them could come up with anything better.

Actually most message passing code uses things like JMS and the various
MQ libraries. Most IoT uses things other than dbus, small deep embedded
never uses dbus.

In the desktop space dbus wins because it's very very easy to use and by
network effects. Everything else related already talks via dbus, so you
are going to have to talk dbus anyway to get anything done.

Alan

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 22:18                           ` Al Viro
@ 2015-04-15 22:28                             ` Andy Lutomirski
  2015-04-15 22:48                               ` Al Viro
  0 siblings, 1 reply; 316+ messages in thread
From: Andy Lutomirski @ 2015-04-15 22:28 UTC (permalink / raw)
  To: Al Viro
  Cc: One Thousand Gnomes, Greg Kroah-Hartman, Steven Rostedt,
	Richard Weinberger, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, Tom Gundersen, Jiri Kosina,
	linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 3:18 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Wed, Apr 15, 2015 at 03:11:17PM -0700, Andy Lutomirski wrote:
>
>> This is functionally identical to passing AF_UNIX socket fds over
>> SCM_RIGHTS, but I want something much lighter weight.
>
> Most of the weight in SCM_RIGHTS comes from the fact that you can
> pass AF_UNIX sockets over it, which requires a garbage collector.
> Exclude that and suddenly it becomes very cheap...

I should have been more specific.  I don't mean the performance of
SCM_RIGHTS itself; I mean the memory overhead of keeping tons of fds
around, each with their socket data structures and buffers.

I think that dbus could be quite efficiently implemented with a
userspace daemon that just introduces peers to each other, but the fd
explosion could be rather bad for some use cases.

I'll be the first to admit that I don't have a clean API in mind.
There was a lightweight fd proposal way back when, but it never went
anywhere, and it might not be suitable anyway.

--Andy

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 22:28                             ` Andy Lutomirski
@ 2015-04-15 22:48                               ` Al Viro
  2015-04-15 22:54                                 ` Andy Lutomirski
  2015-04-15 22:56                                 ` Eric Dumazet
  0 siblings, 2 replies; 316+ messages in thread
From: Al Viro @ 2015-04-15 22:48 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: One Thousand Gnomes, Greg Kroah-Hartman, Steven Rostedt,
	Richard Weinberger, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, Tom Gundersen, Jiri Kosina,
	linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 03:28:58PM -0700, Andy Lutomirski wrote:
> On Wed, Apr 15, 2015 at 3:18 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > On Wed, Apr 15, 2015 at 03:11:17PM -0700, Andy Lutomirski wrote:
> >
> >> This is functionally identical to passing AF_UNIX socket fds over
> >> SCM_RIGHTS, but I want something much lighter weight.
> >
> > Most of the weight in SCM_RIGHTS comes from the fact that you can
> > pass AF_UNIX sockets over it, which requires a garbage collector.
> > Exclude that and suddenly it becomes very cheap...
> 
> I should have been more specific.  I don't mean the performance of
> SCM_RIGHTS itself; I mean the memory overhead of keeping tons of fds
> around, each with their socket data structures and buffers.
> 
> I think that dbus could be quite efficiently implemented with a
> userspace daemon that just introduces peers to each other, but the fd
> explosion could be rather bad for some use cases.
> 
> I'll be the first to admit that I don't have a clean API in mind.
> There was a lightweight fd proposal way back when, but it never went
> anywhere, and it might not be suitable anyway.

Wait, are you talking about the overhead of descriptors used for capability
tokens (essentially zero - one system-wide struct file per capability +
one pointer in descriptor table of anyone who holds it + two bits in
bitmaps in the same descriptor tables) or about the overhead of descriptors
used to send/receive those over?  The latter don't have to be sockets
at all - they could bloody well be files on some ipcfs, or character device,
or FIFOs, etc.

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 22:48                               ` Al Viro
@ 2015-04-15 22:54                                 ` Andy Lutomirski
  2015-04-15 23:27                                   ` Al Viro
  2015-04-15 22:56                                 ` Eric Dumazet
  1 sibling, 1 reply; 316+ messages in thread
From: Andy Lutomirski @ 2015-04-15 22:54 UTC (permalink / raw)
  To: Al Viro
  Cc: One Thousand Gnomes, Greg Kroah-Hartman, Steven Rostedt,
	Richard Weinberger, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, Tom Gundersen, Jiri Kosina,
	linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 3:48 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Wed, Apr 15, 2015 at 03:28:58PM -0700, Andy Lutomirski wrote:
>> On Wed, Apr 15, 2015 at 3:18 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> > On Wed, Apr 15, 2015 at 03:11:17PM -0700, Andy Lutomirski wrote:
>> >
>> >> This is functionally identical to passing AF_UNIX socket fds over
>> >> SCM_RIGHTS, but I want something much lighter weight.
>> >
>> > Most of the weight in SCM_RIGHTS comes from the fact that you can
>> > pass AF_UNIX sockets over it, which requires a garbage collector.
>> > Exclude that and suddenly it becomes very cheap...
>>
>> I should have been more specific.  I don't mean the performance of
>> SCM_RIGHTS itself; I mean the memory overhead of keeping tons of fds
>> around, each with their socket data structures and buffers.
>>
>> I think that dbus could be quite efficiently implemented with a
>> userspace daemon that just introduces peers to each other, but the fd
>> explosion could be rather bad for some use cases.
>>
>> I'll be the first to admit that I don't have a clean API in mind.
>> There was a lightweight fd proposal way back when, but it never went
>> anywhere, and it might not be suitable anyway.
>
> Wait, are you talking about the overhead of descriptors used for capability
> tokens (essentially zero - one system-wide struct file per capability +
> one pointer in descriptor table of anyone who holds it + two bits in
> bitmaps in the same descriptor tables) or about the overhead of descriptors
> used to send/receive those over?  The latter don't have to be sockets
> at all - they could bloody well be files on some ipcfs, or character device,
> or FIFOs, etc.

Huh, interesting.

I was imagining that each of a server's peers (capability holders)
would have a fresh struct file, but maybe this wouldn't be needed at
all.  You'd still need a way to get replies to your request, but the
API could just as easily be:

int send_to_capability(int dest, int source, const void *data, size_t len, ...);

where dest would be the destination's fd and source would be whatever
receive queue I expect the response on.

So maybe this is feasible.  It doesn't solve broadcasts, but dbus
unicast could easily layer over a facility like this and the context
switch problem would go away for unicast.

Heck, I'd use it for my own proprietary stuff, too.  It would be way
easier than the absurd tangle of socketpairs I currently use.

--Andy

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 22:48                               ` Al Viro
  2015-04-15 22:54                                 ` Andy Lutomirski
@ 2015-04-15 22:56                                 ` Eric Dumazet
  1 sibling, 0 replies; 316+ messages in thread
From: Eric Dumazet @ 2015-04-15 22:56 UTC (permalink / raw)
  To: Al Viro
  Cc: Andy Lutomirski, One Thousand Gnomes, Greg Kroah-Hartman,
	Steven Rostedt, Richard Weinberger, Eric W. Biederman,
	Linus Torvalds, Andrew Morton, Arnd Bergmann, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Wed, 2015-04-15 at 23:48 +0100, Al Viro wrote:
> On Wed, Apr 15, 2015 at 03:28:58PM -0700, Andy Lutomirski wrote:
> > On Wed, Apr 15, 2015 at 3:18 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > On Wed, Apr 15, 2015 at 03:11:17PM -0700, Andy Lutomirski wrote:
> > >
> > >> This is functionally identical to passing AF_UNIX socket fds over
> > >> SCM_RIGHTS, but I want something much lighter weight.
> > >
> > > Most of the weight in SCM_RIGHTS comes from the fact that you can
> > > pass AF_UNIX sockets over it, which requires a garbage collector.
> > > Exclude that and suddenly it becomes very cheap...
> > 
> > I should have been more specific.  I don't mean the performance of
> > SCM_RIGHTS itself; I mean the memory overhead of keeping tons of fds
> > around, each with their socket data structures and buffers.
> > 
> > I think that dbus could be quite efficiently implemented with a
> > userspace daemon that just introduces peers to each other, but the fd
> > explosion could be rather bad for some use cases.
> > 
> > I'll be the first to admit that I don't have a clean API in mind.
> > There was a lightweight fd proposal way back when, but it never went
> > anywhere, and it might not be suitable anyway.
> 
> Wait, are you talking about the overhead of descriptors used for capability
> tokens (essentially zero - one system-wide struct file per capability +
> one pointer in descriptor table of anyone who holds it + two bits in
> bitmaps in the same descriptor tables) or about the overhead of descriptors
> used to send/receive those over?  The latter don't have to be sockets
> at all - they could bloody well be files on some ipcfs, or character device,
> or FIFOs, etc.

This kind of reminds me of futex: from an apparently simple idea we got to
the point of having more than 3000 lines of code in kernel/futex.c.

It is sad that af_unix was chosen to support fd passing in the first
place. This is a serious DoS vector.



^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 22:54                                 ` Andy Lutomirski
@ 2015-04-15 23:27                                   ` Al Viro
  2015-04-16  0:47                                     ` Andy Lutomirski
  0 siblings, 1 reply; 316+ messages in thread
From: Al Viro @ 2015-04-15 23:27 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: One Thousand Gnomes, Greg Kroah-Hartman, Steven Rostedt,
	Richard Weinberger, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, Tom Gundersen, Jiri Kosina,
	linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 03:54:10PM -0700, Andy Lutomirski wrote:
> Huh, interesting.
> 
> I was imagining that each of a server's peers (capability holders)
> would have a fresh struct file, but maybe this wouldn't be needed at
> all.  You'd still need a way to get replies to your request, but the
> API could just as easily be:
> 
> int send_to_capability(int dest, int source, const void *data, size_t len, ...);
> 
> where dest would be the destination's fd and source would be whatever
> receive queue I expect the response on.
> 
> So maybe this is feasible.  It doesn't solve broadcasts, but dbus
> unicast could easily layer over a facility like this and the context
> switch problem would go away for unicast.
> 
> Heck, I'd use it for my own proprietary stuff, too.  It would be way
> easier than the absurd tangle of socketpairs I currently use.

BTW, the main issue with AF_UNIX passing is that the recipient isn't asleep
waiting for descriptors - they are thrown by the sender at whoever's receiving
and sit there until somebody gets around to picking them up.

_IF_ we had
client: I want a descriptor <goes to sleep, interruptibly>
kernel: assign it a sequence number
server: sees request (including sequence number)
server: give this fd to originator of request #N
kernel: check if originator is still there, insert the damn thing into their
descriptor table if they still are and return the obtained number
or
server: tell the originator of request #N to fuck off
kernel: check if originator is still there and gleefully pass the "fuck off" if
they still are

we wouldn't have the in-flight state at all, and there goes the garbage
collection shite.  With some elaboration, it could even carry the
authentication traffic - "fuck off" might be "answer this challenge", with
the next "I want a descriptor" carrying reply...
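
The exchange above can be modelled in userspace as a small table of
pending requests keyed by sequence number. This is only a sketch of the
idea, not kernel code and not an existing API: struct broker,
request_fd(), cancel_request(), grant_fd() and refuse() are hypothetical
names, and a plain int stands in for the descriptor being handed over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8

/* Hypothetical model: one slot per client asleep in "I want a descriptor". */
enum req_state { REQ_FREE = 0, REQ_WAITING, REQ_GRANTED, REQ_REFUSED };

struct request {
    enum req_state state;
    unsigned long seq;  /* sequence number assigned by the "kernel" */
    int fd;             /* meaningful only in REQ_GRANTED */
};

struct broker {
    struct request slots[MAX_PENDING];
    unsigned long next_seq;
};

/* client: "I want a descriptor". Returns the sequence number, 0 if full. */
unsigned long request_fd(struct broker *b)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (b->slots[i].state == REQ_FREE) {
            b->slots[i].state = REQ_WAITING;
            b->slots[i].seq = ++b->next_seq;
            b->slots[i].fd = -1;
            return b->slots[i].seq;
        }
    }
    return 0;
}

/* Find a request that is still waiting; NULL if the originator is gone. */
static struct request *find_waiting(struct broker *b, unsigned long seq)
{
    for (size_t i = 0; i < MAX_PENDING; i++)
        if (b->slots[i].state == REQ_WAITING && b->slots[i].seq == seq)
            return &b->slots[i];
    return NULL;
}

/* client was interrupted while asleep: the request simply disappears. */
void cancel_request(struct broker *b, unsigned long seq)
{
    struct request *r = find_waiting(b, seq);

    if (r)
        r->state = REQ_FREE;
}

/* server: "give this fd to originator of request #seq". */
bool grant_fd(struct broker *b, unsigned long seq, int fd)
{
    struct request *r = find_waiting(b, seq);

    assert(fd >= 0);
    if (!r)
        return false;   /* originator gone: nothing is left queued */
    r->fd = fd;
    r->state = REQ_GRANTED;
    return true;
}

/* server: refuse request #seq; fails if nobody is waiting on it. */
bool refuse(struct broker *b, unsigned long seq)
{
    struct request *r = find_waiting(b, seq);

    if (!r)
        return false;
    r->state = REQ_REFUSED;
    return true;
}
```

The point of the model is that a descriptor is only ever attached to a
request whose originator is still waiting, so there is no queue of
descriptors "in flight" that would need collecting later.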

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 23:27                                   ` Al Viro
@ 2015-04-16  0:47                                     ` Andy Lutomirski
  2015-04-16  1:04                                       ` Al Viro
  0 siblings, 1 reply; 316+ messages in thread
From: Andy Lutomirski @ 2015-04-16  0:47 UTC (permalink / raw)
  To: Al Viro
  Cc: One Thousand Gnomes, Greg Kroah-Hartman, Steven Rostedt,
	Richard Weinberger, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, Tom Gundersen, Jiri Kosina,
	linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 4:27 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Wed, Apr 15, 2015 at 03:54:10PM -0700, Andy Lutomirski wrote:
>> Huh, interesting.
>>
>> I was imagining that each of a server's peers (capability holders)
>> would have a fresh struct file, but maybe this wouldn't be needed at
>> all.  You'd still need a way to get replies to your request, but the
>> API could just as easily be:
>>
>> int send_to_capability(int dest, int source, const void *data, size_t len, ...);
>>
>> where dest would be the destination's fd and source would be whatever
>> receive queue I expect the response on.
>>
>> So maybe this is feasible.  It doesn't solve broadcasts, but dbus
>> unicast could easily layer over a facility like this and the context
>> switch problem would go away for unicast.
>>
>> Heck, I'd use it for my own proprietary stuff, too.  It would be way
>> easier than the absurd tangle of socketpairs I currently use.
>
> BTW, the main issue with AF_UNIX passing is that the recipient isn't asleep
> waiting for descriptors - they are thrown by the sender at whoever's receiving
> and sit there until somebody gets around to picking them up.
>
> _IF_ we had
> client: I want a descriptor <goes to sleep, interruptibly>
> kernel: assign it a sequence number
> server: sees request (including sequence number)
> server: give this fd to originator of request #N
> kernel: check if originator is still there, insert the damn thing into their
> descriptor table if they still are and return the obtained number
> or
> server: tell the originator of request #N to fuck off
> kernel: check if originator is still there and gleefully pass the "fuck off" if
> they still are
>
> we wouldn't have the in-flight state at all, and there goes the garbage
> collection shite.  With some elaboration, it could even carry the
> authentication traffic - "fuck off" might be "answer this challenge", with
> the next "I want a descriptor" carrying reply...

I wonder if we could get away with having the receiver pre-allocate
some placeholder fds and then have the kernel replace a placeholder
with a passed fd immediately when the fd is sent and enqueue *that* in
the cmsg data.  If you send an fd to someone who hasn't assigned any
placeholders to the receiving socket, then you get an error.

To keep the accounting sane, a placeholder would be a bona fide fd,
presumably a reference to a global placeholder anon_inode.

--Andy

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-16  0:47                                     ` Andy Lutomirski
@ 2015-04-16  1:04                                       ` Al Viro
  2015-04-16  5:53                                         ` Andy Lutomirski
  0 siblings, 1 reply; 316+ messages in thread
From: Al Viro @ 2015-04-16  1:04 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: One Thousand Gnomes, Greg Kroah-Hartman, Steven Rostedt,
	Richard Weinberger, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, Tom Gundersen, Jiri Kosina,
	linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 05:47:18PM -0700, Andy Lutomirski wrote:

> I wonder if we could get away with having the receiver pre-allocate
> some placeholder fds and then have the kernel replace a placeholder
> with a passed fd immediately when the fd is sent and enqueue *that* in
> the cmsg data.  If you send an fd to someone who hasn't assigned any
> placeholders to the receiving socket, then you get an error.

*UGH*

It's a really bad idea.  The thing is, descriptor table that isn't shared
is assumed to be unchanged.  So when fdget() looks a file up, it doesn't
have to bump its refcount - the reference in descriptor table itself will
stay.  Conversely, fdput() doesn't have to drop it in such case (we encode
whether we need to drop into struct fd returned by fdget() and passed to
fdput()).

That relies on no third-party modifications of the descriptor table and yes,
the effect _is_ noticeable - playing with struct file refcounts does result
in considerable overhead.

If the recipient sits in "gimme a descriptor", we are fine - if the descriptor
table was shared, the other users would be doing the full refcount song and
dance and if it wasn't, the recipient is the sole user _and_ it isn't between
fdget() and fdput() at the moment.  With your "replace the dummies when
sending" trick we break all of that - we don't know what the recipient is doing
at the moment and for all we know they might be in the middle of something like
e.g. fstat() on your placeholder.  With rather unpleasant effects...
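
The borrow-or-pin rule described above can be shown with a simplified
userspace model. This is a sketch under stated assumptions, not the
kernel's code: struct files_table, model_fdget() and model_fdput() are
hypothetical stand-ins, the table is a fixed array, and plain ints replace
the atomic counters a real implementation would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TABLE_SIZE 16

struct file {
    int refcount;               /* references held on this file */
};

struct files_table {
    int users;                  /* number of tasks sharing this table */
    struct file *slots[TABLE_SIZE];
};

/* Result of a lookup: the file plus "do we owe a refcount drop?". */
struct fd {
    struct file *file;
    bool need_put;
};

struct fd model_fdget(struct files_table *t, unsigned int n)
{
    struct fd f = { NULL, false };

    if (n >= TABLE_SIZE || !t->slots[n])
        return f;
    f.file = t->slots[n];
    if (t->users > 1) {
        /* Shared table: another user may close the slot under us,
         * so take a real reference and remember to drop it. */
        f.file->refcount++;
        f.need_put = true;
    }
    /* Unshared table: borrow the table's own reference. This is safe
     * only as long as nobody else modifies the table meanwhile. */
    return f;
}

void model_fdput(struct fd f)
{
    if (f.need_put) {
        assert(f.file->refcount > 1);   /* table's reference + ours */
        f.file->refcount--;
    }
}
```

In this model, a sender that swapped t->slots[n] and dropped the table's
reference while a sole user sat between model_fdget() and model_fdput()
would leave that user holding a pointer with no reference behind it, which
is the hazard described above.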

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-16  1:04                                       ` Al Viro
@ 2015-04-16  5:53                                         ` Andy Lutomirski
  0 siblings, 0 replies; 316+ messages in thread
From: Andy Lutomirski @ 2015-04-16  5:53 UTC (permalink / raw)
  To: Al Viro
  Cc: Arnd Bergmann, linux-kernel, Jiri Kosina, Andrew Morton,
	Daniel Mack, One Thousand Gnomes, Linus Torvalds, Tom Gundersen,
	Richard Weinberger, Steven Rostedt, Greg Kroah-Hartman,
	David Herrmann, Eric W. Biederman, Djalal Harouni

On Apr 15, 2015 6:04 PM, "Al Viro" <viro@zeniv.linux.org.uk> wrote:
>
> On Wed, Apr 15, 2015 at 05:47:18PM -0700, Andy Lutomirski wrote:
>
> > I wonder if we could get away with having the receiver pre-allocate
> > some placeholder fds and then have the kernel replace a placeholder
> > with a passed fd immediately when the fd is sent and enqueue *that* in
> > the cmsg data.  If you send an fd to someone who hasn't assigned any
> > placeholders to the receiving socket, then you get an error.
>
> *UGH*
>
> It's a really bad idea.  The thing is, descriptor table that isn't shared
> is assumed to be unchanged.  So when fdget() looks a file up, it doesn't
> have to bump its refcount - the reference in descriptor table itself will
> stay.  Conversely, fdput() doesn't have to drop it in such case (we encode
> whether we need to drop into struct fd returned by fdget() and passed to
> fdput()).
>
> That relies on no third-party modifications of the descriptor table and yes,
> the effect _is_ noticeable - playing with struct file refcounts does result
> in considerable overhead.
>
> If the recipient sits in "gimme a descriptor", we are fine - if the descriptor
> table was shared, the other users would be doing the full refcount song and
> dance and if it wasn't, the recipient is the sole user _and_ it isn't between
> fdget() and fdput() at the moment.  With your "replace the dummies when
> sending" trick we break all of that - we don't know what the recipient is doing
> at the moment and for all we know they might be in the middle of something like
> e.g. fstat() on your placeholder.  With rather unpleasant effects...

Hmm.

I don't love the special blocking call either -- it breaks polling loops.

We could have the existence of a placeholderfd count as an extra
reference to the descriptor table, with the associated performance
hit.  Or we could allow each placeholderfd to collect one received fd
but not actually switch over.  The latter is ugly and still has minor
DoS issues -- we'd have to prevent placeholderfds from being passed
through this mechanism or SCM_RIGHTS.

But wait... what about an evil trick?  What if all placeholderfds are
the *same* struct file and that struct file is never deleted?  Then
fdget on a placeholderfd is safe, since it's implicitly pinned.

--Andy

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 17:33                               ` Greg Kroah-Hartman
  2015-04-15 18:06                                 ` Steven Rostedt
@ 2015-04-16  8:43                                 ` Jiri Kosina
  1 sibling, 0 replies; 316+ messages in thread
From: Jiri Kosina @ 2015-04-16  8:43 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Steven Rostedt, Borislav Petkov, Richard Weinberger,
	Andy Lutomirski, Al Viro, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	linux-kernel, Daniel Mack, David Herrmann, Djalal Harouni

On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote:

> > I originally didn't want to comment on this, but now that you are 
> > making this argument for 3rd or 4th time, I can't really resist. What 
> > exactly are you trying to "prove" by the 13k-lines argument?
> > 
> > mm/vmscan.c is less than 4k lines. Does that sole fact mean that the whole 
> > memory reclaim is trivial to review?
> 
> I'm trying to say that it's not a ton of code.  lines of code are of
> course not a valid way to judge complexity, and I'm not trying to say
> that.  I am trying to point out that it isn't "huge" by comparing it to
> other chunks of code that we all know and love.
> 
> We merge subsystems with new userspace apis that are larger than this all
> the time.  I'm trying to say this isn't something "unusual" at all.

I agree with you on that point. Merging 13k lines isn't a big deal, we do 
that all the time.

But I don't think anyone in this (or previous) thread brought up the 
number of lines of kdbus as an ultimate argument for questioning or even 
NACKing it.

So I completely fail to see why this is so relevant that you keep 
repeating it.

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 22:11                         ` Andy Lutomirski
  2015-04-15 22:18                           ` Al Viro
@ 2015-04-16 10:31                           ` Daniel Mack
  2015-04-16 12:02                             ` Tom Gundersen
  1 sibling, 1 reply; 316+ messages in thread
From: Daniel Mack @ 2015-04-16 10:31 UTC (permalink / raw)
  To: Andy Lutomirski, One Thousand Gnomes
  Cc: Greg Kroah-Hartman, Steven Rostedt, Richard Weinberger, Al Viro,
	Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Tom Gundersen, Jiri Kosina, linux-kernel, David Herrmann,
	Djalal Harouni

On 04/16/2015 12:11 AM, Andy Lutomirski wrote:
> Also, getting the really high performance stuff right would be nice.
> Binder has one thing going for it (IIRC -- I've talked about it to
> some of the authors, but I've never so much as glanced at the code):
> it has a primitive to send and wait for a reply.  This reduces the
> load on scheduler.

kdbus has the same thing, we call it a synchronous reply. That concept
is actually comprehensively explained in kdbus.message(7):

  By default, all calls to kdbus are considered asynchronous,
  non-blocking. However, as there are many use cases that need
  to wait for a remote peer to answer a method call, there's a
  way to send a message and wait for a reply in a synchronous
  fashion. This is what the KDBUS_SEND_SYNC_REPLY controls. The
  KDBUS_CMD_SEND ioctl will block until the reply has arrived,
  the timeout limit is reached, the remote connection is shut
  down, or the call is interrupted by a signal before any reply;
  see signal(7). The offset of the reply message in the sender's
  pool is stored in offset_reply when the ioctl has returned
  without error. Hence, there is no need for another KDBUS_CMD_RECV
  ioctl or anything else to receive the reply.



Thanks,
Daniel


^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-16 10:31                           ` Daniel Mack
@ 2015-04-16 12:02                             ` Tom Gundersen
  2015-04-16 12:15                               ` Olaf Hering
  2015-04-21 16:36                               ` Eric W. Biederman
  0 siblings, 2 replies; 316+ messages in thread
From: Tom Gundersen @ 2015-04-16 12:02 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Greg Kroah-Hartman, Andy Lutomirski, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Wed, Apr 15, 2015 at 2:09 PM, Jiri Kosina <jkosina@suse.cz> wrote:
> On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote:
>
>> 'systemctl reboot' calls a bunch of other things to determine if you
>> have local access to the machine, or permissions to reboot the machine
>> (i.e. CAP_SYS_BOOT), and other things that polkit might allow you to do,
>> and then, it decides to reboot or not.  That happens today, right?  I
>> don't understand the argument here.
>
> And what exactly is the argument that this is the way it should be
> implemented?
>
> Why can't it just rely on the kernel to provide final answer to "to reboot
> or not to reboot, that is the question"?
>
> At the end of the day, it's the kernel that decides whether it will really
> ultimately ask the platform to reboot.
>
> If, for whatever reason (which might be completely invisible to userspace)
> kernel decides not to do so, userspace has to be able to recover from such
> failure in any case.

This is not how shutting down a general purpose operating system
works. If a system is shut down, all user sessions are terminated, all
services are stopped in the right order, all remaining processes
killed, all file systems are unmounted, all storage devices
disassembled, and so on. All this is implemented entirely in userspace
and involves a number of complex transitions from the normal init
system, to a shutdown PID 1 process and finally a transition back to
the initial ramdisk so that we can unmount the root file system even.
After all that is done, in the right order, following dependencies,
while enforcing timeouts, then the very last step is actually the
reboot() system call that then brings the kernel to a halt, and
possibly turns off power.

Thus I don't see how your suggestion can be applied in any way to how
system shutdown works: the shutdown procedure includes these
non-trivial preparation steps described above, and it is essential
that this preparation is not begun unless the client requesting it
actually has sufficient rights to do so. Or to put this another way:
if the system went all the way down, so that everything is killed,
unmounted, disassembled, to the point even that we transitioned away
from the root file system, then the reboot() system call is really
just the tiniest bit of it. And you should not be able to get there if
you originally didn't even possess the capability to execute that last
step...

Moreover, the daemon performing the shutdown tasks is necessarily
always privileged enough to do so, so calling into the kernel and seeing
what happens is completely the wrong thing to do (it would simply
succeed). What matters is whether the client calling the daemon is
sufficiently privileged. If the client has the capabilities necessary
to call the reboot syscall directly, it makes no sense to disallow
them from doing a clean reboot. It would be like giving someone access
to pull the power plug, but not allowing them to shut down the machine
cleanly.

To conclude, the kernel makes the decision for allowing reboot() to
succeed based on CAP_SYS_BOOT, so when we decide whether or not to
perform the preparation steps, we really must also use CAP_SYS_BOOT.
If we are more restrictive, it does not gain us anything as people
with CAP_SYS_BOOT can just circumvent our logic and "pull the plug" by
calling reboot() directly. If we are less restrictive and for instance
check for uid==0 it would essentially mean that we have added a way to
circumvent the dropping of CAP_SYS_BOOT.
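
As a rough illustration of the check argued for above, the sketch below
tests the CAP_SYS_BOOT bit in a "CapEff:" line of the kind found in
/proc/<pid>/status. The function names are hypothetical, and reading
/proc for another process is inherently racy, so this only shows the
capability test itself, not how a daemon should obtain a client's
credentials.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Bit number of CAP_SYS_BOOT, as defined in <linux/capability.h>. */
#define CAP_SYS_BOOT_BIT 22

/* Parse a "CapEff:\t<hex mask>" line; returns false for any other line. */
bool parse_cap_eff(const char *line, uint64_t *mask)
{
    assert(line != NULL && mask != NULL);
    return sscanf(line, "CapEff: %" SCNx64, mask) == 1;
}

/* The decision itself: does the effective set contain CAP_SYS_BOOT? */
bool may_reboot(uint64_t cap_eff)
{
    return (cap_eff >> CAP_SYS_BOOT_BIT) & 1;
}

/* Hypothetical helper: look up a client's effective set via /proc.
 * Returns false if the file or the CapEff line cannot be read. */
bool client_may_reboot(int pid)
{
    char path[64], line[256];
    uint64_t mask = 0;
    bool found = false;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%d/status", pid);
    f = fopen(path, "r");
    if (!f)
        return false;
    while (fgets(line, sizeof(line), f)) {
        if (parse_cap_eff(line, &mask)) {
            found = true;
            break;
        }
    }
    fclose(f);
    return found && may_reboot(mask);
}
```

A check on uid == 0 instead of may_reboot() would accept a root process
that has dropped CAP_SYS_BOOT, which is the circumvention described above.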

Cheers,

Tom

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 18:12                     ` James Bottomley
@ 2015-04-16 12:13                       ` David Herrmann
  2015-04-17 19:27                         ` James Bottomley
  0 siblings, 1 reply; 316+ messages in thread
From: David Herrmann @ 2015-04-16 12:13 UTC (permalink / raw)
  To: James Bottomley
  Cc: Greg Kroah-Hartman, Jiri Kosina, Steven Rostedt, John Stoffel,
	Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	linux-kernel, Daniel Mack, Djalal Harouni, Paul E. McKenney

Hi

On Wed, Apr 15, 2015 at 8:12 PM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> For me the biggest issue is the container problem: it's really hard to
> containerise kdbus because of the stateful nature of the protocol and
> the fact that it has a well known system bus.  Separation into domains
> works for OS containers, but application containers need more fluidity.
> It's not unlike the same problem on windows: Windows application
> containers are very difficult to do because the global registry means
> that OLE handlers all have to run inside your container as well
> (effectively making it an OS container).  I'm sure, since we already
> have a lot of containers people going to plumbers, that we can get them
> to turn up for the discussion.

kdbus actually works very well in OS containers that mount a new
kdbusfs inside the container. This new instance of kdbus will be
entirely separated from any other on the system. We've designed it
that way especially with OS containers in mind. This is explained in
kdbus.fs(7). It's very similar to devpts' container support, where you
mount a new instance of devpts into each container instance you run.

For Docker-style (i.e. app-focused) containers, it's a more complex
story. kdbus will not solve this for you, but at least one thing
deserves being mentioned: for this kind of sandboxing kdbus certainly
makes things *easier*, compared to dbus1. Why? Because the kernel
gains a notion of individual messages and method call transactions,
something that is completely unavailable if you stick to dbus1 where
all the kernel sees is a raw stream of AF_UNIX/SOCK_STREAM bytes. In
fact, kdbus as it is right now even contains minimal but explicit
support for sandboxing, by allowing creation of multiple bus endpoints
to the same bus that carry additional, more restrictive policy.

Thanks
David

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-16 12:02                             ` Tom Gundersen
@ 2015-04-16 12:15                               ` Olaf Hering
  2015-04-16 12:43                                 ` Harald Hoyer
  2015-04-21 16:36                               ` Eric W. Biederman
  1 sibling, 1 reply; 316+ messages in thread
From: Olaf Hering @ 2015-04-16 12:15 UTC (permalink / raw)
  To: Tom Gundersen
  Cc: Jiri Kosina, Greg Kroah-Hartman, Andy Lutomirski, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Thu, Apr 16, Tom Gundersen wrote:

> to a shutdown PID 1 process and finally a transition back to
> the initial ramdisk so that we can unmount the root file system even.

Is that wishful thinking or actually implemented somewhere?

Olaf

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-16 12:15                               ` Olaf Hering
@ 2015-04-16 12:43                                 ` Harald Hoyer
  0 siblings, 0 replies; 316+ messages in thread
From: Harald Hoyer @ 2015-04-16 12:43 UTC (permalink / raw)
  To: Olaf Hering, Tom Gundersen
  Cc: Jiri Kosina, Greg Kroah-Hartman, Andy Lutomirski, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

Am 16.04.2015 um 14:15 schrieb Olaf Hering:
> On Thu, Apr 16, Tom Gundersen wrote:
> 
>> to a shutdown PID 1 process and finally a transition back to
>> the initial ramdisk so that we can unmount the root file system even.
> 
> Is that wishful thinking or actually implemented somewhere?

This has been done for a long time now on any system that uses dracut and systemd.

As SUSE switched to dracut recently, it should be the same as on RHEL-7/Fedora
now.

If /run/initramfs/shutdown exists and is executable,
/usr/lib/systemd/systemd-shutdown switches root to /run/initramfs/ and executes
shutdown.
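
The test described above could be written roughly as follows. This is a
sketch with hypothetical names (is_executable(), choose_final_step()), not
code taken from systemd; the helper path is passed in by the caller, and
access(2) with X_OK is used for the "exists and is executable" check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* How the last stage of shutdown proceeds. */
enum final_step {
    FINAL_DIRECT,           /* no helper: finish from the current root */
    FINAL_VIA_INITRAMFS     /* switch root to the initramfs and run it */
};

/* True if path exists and the caller is allowed to execute it. */
bool is_executable(const char *path)
{
    assert(path != NULL);
    return access(path, X_OK) == 0;
}

/* Pick the final step based on whether the shutdown helper is usable. */
enum final_step choose_final_step(const char *helper)
{
    return is_executable(helper) ? FINAL_VIA_INITRAMFS : FINAL_DIRECT;
}
```

A caller would pass "/run/initramfs/shutdown" as the helper path.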

The shutdown script umounts the old real root (after umounting
/oldroot/{proc,sys,run,dev}), then if the old real root was living on an
assembled device, like mdraid, the device is disassembled and the script
waits for the device to be clean.

See
http://git.kernel.org/cgit/boot/dracut/dracut.git/tree/modules.d/99shutdown/shutdown.sh
and for example for mdraid:
http://git.kernel.org/cgit/boot/dracut/dracut.git/tree/modules.d/90mdraid/md-shutdown.sh

This solved quite a lot of problems for unsynced raids.

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 20:22                   ` Andy Lutomirski
                                       ` (2 preceding siblings ...)
  2015-04-15 21:58                     ` Havoc Pennington
@ 2015-04-16 13:13                     ` Tom Gundersen
  2015-04-16 14:34                       ` Andy Lutomirski
  2015-04-16 19:01                       ` Havoc Pennington
  3 siblings, 2 replies; 316+ messages in thread
From: Tom Gundersen @ 2015-04-16 13:13 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Havoc Pennington, Rik van Riel, One Thousand Gnomes,
	Greg Kroah-Hartman, Jiri Kosina, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, Eric W. Biederman, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On 04/15/2015 10:22 PM, Andy Lutomirski wrote:
> On Wed, Apr 15, 2015 at 9:44 AM, Havoc Pennington <hp@pobox.com> wrote:
>> That is, with dbus if I send a broadcast message, then send a unicast
>> request to another client, then drop the connection causing the bus to
>> broadcast that I've dropped; then the other client will see those
>> things in that order - the broadcast, then the request, and then that
>> I've dropped the connection.
>
> This leads me to a potentially interesting question: where's the
> buffering?  If there's a bus with lots of untrusted clients and one of
> them broadcasts data faster than all receivers can process it, where
> does it go?

The concepts implemented in kdbus are actually quite different from dbus1:

Every connection to the bus has a memory pool assigned to store
incoming messages and variably sized runtime data returned by kdbus.
The pool memory is swappable, backed by a shmem file which is
associated with the bus connection.

Also, broadcasts are opt-in, so you only receive them if you
subscribed for the specific signal. It is either sent by another
userspace task, or by the kernel itself for things like name owner
changes. In order to receive those, a connection must install a match.
By default, no-one will receive any broadcasts.

All types of messages (unicast and broadcast) are directly stored into
a pool slice of the receiving connection, and this slice is not reused
by the kernel until userspace is finished with it and frees it. Hence,
a client which doesn't process its incoming messages will, at some
point, run out of pool space. If that happens for unicast messages,
the sender will get an EXFULL error. If it happens for a multicast
message, all we can do is drop the message, and tell the receiver how
many messages have been lost when it issues KDBUS_CMD_RECV the next
time. There's more on that in kdbus.message(7).
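
The accounting described above can be modelled in a few lines. This is a
userspace sketch with hypothetical names (struct pool, send_unicast(),
send_broadcast(), recv_dropped()), not the kdbus implementation: a pool is
reduced to a byte counter, and EXFULL is the Linux errno value mentioned
above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of one receiver's pool; all sizes are in bytes. */
struct pool {
    size_t size;        /* total pool size */
    size_t used;        /* slices the receiver has not freed yet */
    unsigned dropped;   /* broadcasts lost since the last receive */
};

/* Reserve a slice in the receiver's pool; -1 if it does not fit. */
static int pool_alloc(struct pool *p, size_t len)
{
    if (len > p->size - p->used)
        return -1;
    p->used += len;
    return 0;
}

/* Unicast: a full pool is reported back to the sender. */
int send_unicast(struct pool *dst, size_t len)
{
    return pool_alloc(dst, len) == 0 ? 0 : -EXFULL;
}

/* Broadcast: there is no single sender to fail, so drop and count. */
void send_broadcast(struct pool *dst, size_t len)
{
    if (pool_alloc(dst, len) != 0)
        dst->dropped++;
}

/* Receiver is done with a slice and hands it back to the pool. */
void pool_free(struct pool *p, size_t len)
{
    assert(len <= p->used);
    p->used -= len;
}

/* Receiver's next receive: report how many broadcasts were lost. */
unsigned recv_dropped(struct pool *p)
{
    unsigned n = p->dropped;

    p->dropped = 0;
    return n;
}
```

Because each receiver owns its pool, a slow receiver only ever fills its
own memory; the sender is not charged for messages nobody has read.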

Also note that there is quota logic in kdbus which protects against
a single connection conducting a DoS against another one. Together
with the policy code, this logic prevents one peer from flooding the
pool of another peer. Communication with a 3rd party is not affected
by this, due to the fair allocation scheme of the pool logic.

All this is explained in detail in kdbus.pool(7), but please let us
know if anything there is unclear.

> At least with a userspace solution, it's clear what the OOM killer
> should kill when this happens.  Unless it's PID 1.  Sigh.

No, if the buffering was done in the sender, the OOM killer would
catch the sending peer, which is of course the wrong thing to do,
because one connection could blow up a task simply by not responding
to the messages it sends. This is the reason why the pool concept was
a design principle in kdbus from the very beginning.

Cheers,

Tom

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 22:08                   ` One Thousand Gnomes
@ 2015-04-16 13:14                     ` Daniel Mack
  2015-04-16 17:15                       ` One Thousand Gnomes
  0 siblings, 1 reply; 316+ messages in thread
From: Daniel Mack @ 2015-04-16 13:14 UTC (permalink / raw)
  To: One Thousand Gnomes, Havoc Pennington
  Cc: Rik van Riel, Greg Kroah-Hartman, Jiri Kosina, Andy Lutomirski,
	Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	Tom Gundersen, linux-kernel, David Herrmann, Djalal Harouni

On 04/16/2015 12:08 AM, One Thousand Gnomes wrote:
>> When trying to split apart problems, for dbus it's important to keep
>> ordering guarantees.
> 
> Yes I assumed that - minus disconnection/reconnect and running out of
> queue space. Some users also want priority queueing (with or without the
> guarantee for the same priority). Many of the other systems that can use
> a fast multicast messaging system have priority queues - which is one
> reason the existing POSIX messaging has priority.

And so does kdbus. By default, strict ordering is enforced when messages
are received, but optionally, that action may be constrained to messages
of a minimal priority. This allows for use cases where timing critical
data is interleaved with control data on the same connection. That's
described in kdbus.message(7), and is also covered by test cases.
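
A minimal userspace sketch of that receive rule, not the kdbus
implementation: messages are kept in arrival order, and a receiver may
optionally skip anything below a threshold. The names (struct msg,
recv_next) are hypothetical, and the convention that a larger number is
more urgent is an assumption made only for this example.

```c
#include <stdbool.h>
#include <stddef.h>

struct msg {
	int id;
	int priority;	/* assumption: larger value = more urgent */
	bool consumed;
};

/*
 * Return the oldest unconsumed message. If use_priority is set, only
 * messages with priority >= min_priority are eligible; the others stay
 * queued, still in arrival order, for a later unrestricted receive.
 */
static struct msg *recv_next(struct msg *q, size_t n,
			     bool use_priority, int min_priority)
{
	for (size_t i = 0; i < n; i++) {
		if (q[i].consumed)
			continue;
		if (use_priority && q[i].priority < min_priority)
			continue;
		q[i].consumed = true;
		return &q[i];
	}
	return NULL;	/* nothing eligible is queued */
}
```

A receiver that handles timing-critical data would call recv_next() with
the threshold first, then fall back to an unrestricted call for the
control traffic.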


Thanks,
Daniel


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-16 13:13                     ` Tom Gundersen
@ 2015-04-16 14:34                       ` Andy Lutomirski
  2015-04-16 15:01                         ` David Herrmann
  2015-04-16 19:01                       ` Havoc Pennington
  1 sibling, 1 reply; 316+ messages in thread
From: Andy Lutomirski @ 2015-04-16 14:34 UTC (permalink / raw)
  To: Tom Gundersen
  Cc: Havoc Pennington, Rik van Riel, One Thousand Gnomes,
	Greg Kroah-Hartman, Jiri Kosina, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, Eric W. Biederman, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Thu, Apr 16, 2015 at 6:13 AM, Tom Gundersen <teg@jklm.no> wrote:
> On 04/15/2015 10:22 PM, Andy Lutomirski wrote:
>> On Wed, Apr 15, 2015 at 9:44 AM, Havoc Pennington <hp@pobox.com> wrote:
>>> That is, with dbus if I send a broadcast message, then send a unicast
>>> request to another client, then drop the connection causing the bus to
>>> broadcast that I've dropped; then the other client will see those
>>> things in that order - the broadcast, then the request, and then that
>>> I've dropped the connection.
>>
>> This leads me to a potentially interesting question: where's the
>> buffering?  If there's a bus with lots of untrusted clients and one of
>> them broadcasts data faster than all receivers can process it, where
>> does it go?
>
> The concepts implemented in kdbus are actually quite different from dbus1:
>
> Every connection to the bus has a memory pool assigned to store
> incoming messages and variably sized runtime data returned by kdbus.
> The pool memory is swappable, backed by a shmem file which is
> associated with the bus connection.
>
> Also, broadcasts are opt-in, so you only receive them if you
> subscribed for the specific signal. It is either sent by another
> userspace task, or by the kernel itself for things like name owner
> changes. In order to receive those, a connection must install a match.
> By default, no-one will receive any broadcasts.
>
> All types of messages (unicast and broadcast) are directly stored into
> a pool slice of the receiving connection, and this slice is not reused
> by the kernel until userspace is finished with it and frees it. Hence,
> a client which doesn't process its incoming messages will, at some
> point, run out of pool space. If that happens for unicast messages,
> the sender will get an EXFULL error. If it happens for a multicast
> message, all we can do is drop the message, and tell the receiver how
> many messages have been lost when it issues KDBUS_CMD_RECV the next
> time. There's more on that in kdbus.message(7).
>
> Also note that there is a quota logic in kdbus which protects against
> a single connection conducting a DOS against another one. Together
> with the policy code, this logic prevents one peer from flooding the
> pool of another peer. Communication with a 3rd party is not affected
> by this, due to the fair allocation scheme of the pool logic.
>
> All this is explained in detail in kdbus.pool(7), but please let us
> know if anything there is unclear.
>

This is neat, but it sounds like it will potentially add large amounts
of latency under even mild memory pressure.

Whose memcg does the pool use?  If it's the receiver's, and if the
receiver can configure a memcg, then it seems that even a single
receiver could probably cause the sender to block for an unlimited
amount of time.

(And yes, I really hope that some day the cgroupns issues get resolved
and some programs really will be able to create their own cgroups,
even on systemd-using systems using the systemd-blessed
configuration.)

--Andy

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-16 14:34                       ` Andy Lutomirski
@ 2015-04-16 15:01                         ` David Herrmann
  2015-04-16 17:04                           ` Andy Lutomirski
  0 siblings, 1 reply; 316+ messages in thread
From: David Herrmann @ 2015-04-16 15:01 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Tom Gundersen, Havoc Pennington, Rik van Riel,
	One Thousand Gnomes, Greg Kroah-Hartman, Jiri Kosina,
	Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	linux-kernel, Daniel Mack, Djalal Harouni

Hi

On Thu, Apr 16, 2015 at 4:34 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> Whose memcg does the pool use?

The pool-owner's (i.e., the receiver's).

> If it's the receiver's, and if the
> receiver can configure a memcg, then it seems that even a single
> receiver could probably cause the sender to block for an unlimited
> amount of time.

How? Which of those calls can block? I don't see how that can happen.

Thanks
David

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 22:22                   ` One Thousand Gnomes
@ 2015-04-16 16:02                     ` Havoc Pennington
  2015-04-16 17:31                       ` David Herrmann
  2015-04-16 16:37                     ` Robert Schwebel
  1 sibling, 1 reply; 316+ messages in thread
From: Havoc Pennington @ 2015-04-16 16:02 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Austin S Hemmelgarn, Greg Kroah-Hartman, Al Viro,
	Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 6:22 PM, One Thousand Gnomes
<gnomes@lxorguk.ukuu.org.uk> wrote:
> Actually most message passing code uses things like JMS and the various
> MQ libraries. Most IoT uses things other than dbus, small deep embedded
> never uses dbus.

fwiw, to me it's a mistake to think of dbus as "the same space" as
something like JMS, or even small deep embedded uses.
The use cases and appropriate tradeoffs are different enough that it's
hard for me to think about them as one thing.

If different uses can share some common kernel mechanisms then great,
but one does have to be careful about
one-size-fits-all-actually-fits-nobody.

> In the desktop space dbus wins because it's very, very easy to use and by
> network effects. Everything else related already talks via dbus, so you
> are going to have to talk dbus anyway to get anything done.

You may agree with me, but to me "easy to use" is necessary to dbus's
utility - it's not a cosmetic feature.  I was on the receiving end of
the Linux desktop bug firehose both pre-dbus and post-dbus, and having
IPC that's easy to use *correctly* means there are fewer bugs in that
firehose. At least, fewer bugs caused by IPC.

Of course, the thing that needs to be easy is the library API; it's OK
if an underlying kernel API is hard, as long as it gives the library
developers what they need to implement the easier API.

It is OK to push complexity onto userspace, but it's a mistake to push
it onto apps (as opposed to libraries that can be gotten right once
for all apps). If you push complexity onto apps you get buggier apps,
because application developers are experts in their app domain but
aren't experts in every underlying platform feature.

Why is dbus relatively easy to use? Some important pieces:

 - the semantic guarantees such as ordering that we've already mentioned
 - completeness - solves locating and tracking other processes, solves
both unicast and broadcast, etc.
 - defines a mapping to objects-with-methods OO model

Can it be even better - for sure.

Havoc

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 22:22                   ` One Thousand Gnomes
  2015-04-16 16:02                     ` Havoc Pennington
@ 2015-04-16 16:37                     ` Robert Schwebel
  2015-04-17 13:45                       ` Greg Kroah-Hartman
  1 sibling, 1 reply; 316+ messages in thread
From: Robert Schwebel @ 2015-04-16 16:37 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Austin S Hemmelgarn, Greg Kroah-Hartman, Al Viro,
	Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Wed, Apr 15, 2015 at 11:22:18PM +0100, One Thousand Gnomes wrote:
> > The reason that 'everyone who works in this area' adopted is not as much
> > that the design is sound (I'm not arguing whether it is or isn't in this
> > case) as it is that none of them could come up with anything better.
>
> Actually most message passing code uses things like JMS and the various
> MQ libraries. Most IoT uses things other than dbus, small deep embedded
> never uses dbus.

For what it's worth: we more and more use dbus for small deep embedded
systems, IoT, loosely coupled industrial control applications etc.

rsc
-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-16 15:01                         ` David Herrmann
@ 2015-04-16 17:04                           ` Andy Lutomirski
  2015-04-17  9:19                             ` Michal Hocko
  0 siblings, 1 reply; 316+ messages in thread
From: Andy Lutomirski @ 2015-04-16 17:04 UTC (permalink / raw)
  To: David Herrmann
  Cc: Tom Gundersen, Havoc Pennington, Rik van Riel,
	One Thousand Gnomes, Greg Kroah-Hartman, Jiri Kosina,
	Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	linux-kernel, Daniel Mack, Djalal Harouni

On Thu, Apr 16, 2015 at 8:01 AM, David Herrmann <dh.herrmann@gmail.com> wrote:
> Hi
>
> On Thu, Apr 16, 2015 at 4:34 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> Whose memcg does the pool use?
>
> The pool-owner's (i.e., the receiver's).
>
>> If it's the receiver's, and if the
>> receiver can configure a memcg, then it seems that even a single
>> receiver could probably cause the sender to block for an unlimited
>> amount of time.
>
> How? Which of those calls can block? I don't see how that can happen.

I admit I don't fully understand memcg, but vfs_iter_write is
presumably going to need to get write access to the target pool page,
and that, in turn, will need that page to exist in memory and to be
writable, which may need to page it in and/or allocate a page.  If
that uses the receiver's memcg (as it should), then the receiver can
make it block.  Even if it doesn't use the receiver's memcg, it can
trigger direct reclaim, I think.

--Andy

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-16 13:14                     ` Daniel Mack
@ 2015-04-16 17:15                       ` One Thousand Gnomes
  0 siblings, 0 replies; 316+ messages in thread
From: One Thousand Gnomes @ 2015-04-16 17:15 UTC (permalink / raw)
  To: Daniel Mack
  Cc: Havoc Pennington, Rik van Riel, Greg Kroah-Hartman, Jiri Kosina,
	Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, Tom Gundersen, linux-kernel, David Herrmann,
	Djalal Harouni

> And so does kdbus. By default, strict ordering is enforced when messages
> are received, but optionally, that action may be constrained to messages
> of a minimal priority. This allows for use cases where timing critical
> data is interleaved with control data on the same connection. That's
> described in kdbus.message(7), and is also covered by test cases.

More to the point "and so do POSIX message queues". They are also a
standard, a cross OS feature and relatively cleanly implemented in
kernel, ditto some classes of socket behaviour are similar and SYS5 IPC
(of which we shall not speak further I hope  8) ). I'm not saying that
they solve the problem but they might avoid some of the complexities.

Filtering is generalizable in Linux with a few lines of code, so rather
than hardcoding dbus semantics EBPF can express pretty much any
uni/multi/broadcast filtering policy rule for dbus or anything else.

I agree entirely with Havoc that the ease of use wants to be preserved
and semantics at the top of the dbus library shouldn't change. Dbus does
have the problem of being too easy to use badly, but that's hard to fix
technically 8)

Alan



* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-16 16:02                     ` Havoc Pennington
@ 2015-04-16 17:31                       ` David Herrmann
  2015-04-16 20:55                         ` Al Viro
  0 siblings, 1 reply; 316+ messages in thread
From: David Herrmann @ 2015-04-16 17:31 UTC (permalink / raw)
  To: Al Viro
  Cc: Greg Kroah-Hartman, Jiri Kosina, Borislav Petkov,
	Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	linux-kernel, Daniel Mack, Djalal Harouni

Hi

On Wed, Apr 15, 2015 at 2:36 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Wed, Apr 15, 2015 at 11:09:48AM +0200, Greg Kroah-Hartman wrote:
>
>> I've asked for it, but finding people to review code is hard, as you
>> know.  It's only 13k lines long, smaller than a serial port driver (my
>> unit of code review), so it's not all that big.
>>
>> It's smaller than the USB3 host controller driver as well, and very few
>> people ever reviewed that beast :)
>>
>> > For something that's potentially such a core mechanism as a completely
>> > new, massively-adopted IPC, this does send a warning signal.
>>
>> If you know of a way to force others to review code, please let me know.
>
> Have it in a less nasty state, perhaps?  Random question:
>
> al@duke:~/linux/trees/vfs$ git grep -n -w kdbus_node_idr_lock
> ipc/kdbus/node.c:237:static DECLARE_RWSEM(kdbus_node_idr_lock);
> ipc/kdbus/node.c:340:   down_write(&kdbus_node_idr_lock);
> ipc/kdbus/node.c:344:   up_write(&kdbus_node_idr_lock);
> ipc/kdbus/node.c:444:           down_write(&kdbus_node_idr_lock);
> ipc/kdbus/node.c:452:           up_write(&kdbus_node_idr_lock);

As Greg said, this is a leftover from times we actually needed a
lookup here. Nice catch, I have a local patch to convert the whole IDR
into an IDA and drop the lock entirely (like kernfs does right now,
for kernfs_node->ino).

> Do you see anything wrong with that?  Or with things like that:
>                 mutex_lock(&pos->lock);
>                 v_pre = atomic_read(&pos->active);
>                 if (v_pre >= 0)
>                         atomic_add_return(KDBUS_NODE_BIAS, &pos->active);
>                 else if (v_pre == KDBUS_NODE_NEW)
>                         atomic_set(&pos->active, KDBUS_NODE_RELEASE_DIRECT);
>                 mutex_unlock(&pos->lock);
> What are the locking rules for ->active/->waitq/->lock?  Are those the
> outermost thing in the hierarchy?  Or is that dependent on the node location?
> It sure as hell is outside of (at least) ->mmap_sem (by way of
> kdbus_conn_connect() establishing that ->active/->waitq is outside of
> ->conn_rwlock, which due to kdbus_bus_broadcast() nests outside of anything
> taken by kdbus_meta_proc_collect(), which includes ->mmap_sem) and that alone
> brings in a lot...

I'm working on patches to add more comments similar to how we did in
node.c. For now, please see my explanations below:

node->lock is the _innermost_ lock. node->active implements revoke
support for nodes. It follows what kernfs->active does and isn't a
lock in particular. We kinda treat it as rwsem, where down_write() is
the outer-most lock in kdbus and _only_ called without any other lock
held (kdbus_node_deactivate()). Read-side, we never ever block on the
"lock", but only use try-lock. If it fails, the node is dead/revoked.
Therefore, the read-side of 'active' nests almost arbitrarily. We hold
'active'-references almost everywhere, to make sure a node is not
destroyed while we use it. However, we never sleep for an indefinite
time while holding it.
Given that the write-side is the outer-most lock in kdbus, it doesn't
dead-lock against the try-lock readers.
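
The try-lock read side and the bias-based deactivation described above
can be sketched in userspace with C11 atomics. This is only an
illustration of the general technique (the same idea kernfs uses), not
the code in node.c; struct node, NODE_BIAS and the function names are
hypothetical, and waiting for readers to drain is reduced to a simple
predicate.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bias: large and negative, so a biased counter is < 0. */
#define NODE_BIAS (INT_MIN + 1)

struct node {
	atomic_int active;	/* >= 0: live, counts readers; < 0: revoked */
};

/* Read side: never blocks. Fails once the node has been deactivated. */
static bool node_acquire(struct node *n)
{
	int v = atomic_load(&n->active);

	while (v >= 0) {
		/* On failure v is reloaded and re-checked by the loop. */
		if (atomic_compare_exchange_weak(&n->active, &v, v + 1))
			return true;
	}
	return false;
}

static void node_release(struct node *n)
{
	int old = atomic_fetch_sub(&n->active, 1);

	/* Both values would mean a release without a matching acquire. */
	assert(old != 0 && old != NODE_BIAS);
}

/* Write side: call once. All later node_acquire() calls will fail. */
static void node_deactivate(struct node *n)
{
	atomic_fetch_add(&n->active, NODE_BIAS);
}

/* True once the node is deactivated and the last reader has left. */
static bool node_drained(struct node *n)
{
	return atomic_load(&n->active) == NODE_BIAS;
}
```

Because readers only ever try-lock, the writer cannot deadlock against
them; it only has to wait until node_drained() becomes true, which is
why readers must not sleep indefinitely while holding a reference.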

> Document your goddamn locking, would you?  It *IS* new code, and you, as you
> say, had very few people working on it, so you don't have the excuses for
> the mess existing in older parts of the tree.

Locking order (outer-most to inner-most):
 1) domain->lock
 2) names->rwlock
 3) endpoint->lock
 4) bus->conn_rwlock
 5) policy->entries_rwlock
 6) connection->lock
 7) metadata->lock

mmap_sem nests below metadata->lock. With the rcu-protected exe_file
patches by Davidlohr Bueso, we can even drop that dependency. They
have kinda stalled, though.

Then we have a bunch of data structure protection, which can be called
from any context:
 * bus->notify_lock
 * pool->lock
 * match->mdb_rwlock
 * node->lock

Lastly, there're 2 locks which nest around everything and must not be
taken with any lock held:
 * handle->rwlock (taken in ioctl-entry)
 * bus->notify_flush_lock (taken in work-queue)

General object stacking is:
domain -> bus -> endpoint -> policy -> connection -> {metadata,pool,match,node}
The conn_rwlock protection of the conn-list locks on kdbus_bus is the
only lock that doesn't follow this ordering.
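
One way to make such an ordering checkable is to give every lock a rank
and refuse any acquisition that goes from an inner rank to an outer one.
The following is a userspace sketch with pthread mutexes, assuming the
seven-level order listed above; it is not how kdbus or lockdep do it,
and all identifiers (lock_rank, ranked_lock, rank_acquire, ...) are made
up for the example.

```c
#include <assert.h>
#include <pthread.h>

/* Ranks mirror the documented order, outer-most first. */
enum lock_rank {
	RANK_NONE = 0,
	RANK_DOMAIN,
	RANK_NAMES,
	RANK_ENDPOINT,
	RANK_BUS_CONN,
	RANK_POLICY,
	RANK_CONNECTION,
	RANK_METADATA,
	RANK_MAX
};

struct ranked_lock {
	pthread_mutex_t mutex;
	enum lock_rank rank;
};

/* Per-thread count of held locks, indexed by rank. */
static _Thread_local int held[RANK_MAX];

static enum lock_rank innermost_held(void)
{
	for (int r = RANK_MAX - 1; r > RANK_NONE; r--)
		if (held[r])
			return (enum lock_rank)r;
	return RANK_NONE;
}

/* Returns 0 on success, -1 if the acquisition would violate the order. */
static int rank_acquire(struct ranked_lock *l)
{
	if (l->rank <= innermost_held())
		return -1;
	pthread_mutex_lock(&l->mutex);
	held[l->rank]++;
	return 0;
}

static void rank_release(struct ranked_lock *l)
{
	assert(held[l->rank] > 0);
	held[l->rank]--;
	pthread_mutex_unlock(&l->mutex);
}
```

Taking domain->lock and then connection->lock passes the check; taking
them in the opposite order is rejected before the mutex is touched. The
"callable from any context" locks and the two outermost locks would need
ranks of their own at either end of the enum.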

Thanks
David

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 21:07                     ` Rik van Riel
@ 2015-04-16 18:03                       ` Djalal Harouni
  0 siblings, 0 replies; 316+ messages in thread
From: Djalal Harouni @ 2015-04-16 18:03 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andy Lutomirski, Havoc Pennington, One Thousand Gnomes,
	Greg Kroah-Hartman, Jiri Kosina, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, Eric W. Biederman, Tom Gundersen, linux-kernel,
	Daniel Mack, David Herrmann

Hi,

On Wed, Apr 15, 2015 at 05:07:28PM -0400, Rik van Riel wrote:
[...]
> > This leads me to a potentially interesting question: where's the
> > buffering?  If there's a bus with lots of untrusted clients and one of
> > them broadcasts data faster than all receivers can process it, where
> > does it go?
> >
> > At least with a userspace solution, it's clear what the OOM killer
> > should kill when this happens.  Unless it's PID 1.  Sigh
> 
> It may be useful to do the buffering (and general interception
> of any message that cannot be delivered) in a userspace program.
> 
> Not only to get the buffers out of the kernel and into swappable
> memory, but also so people could re-use the same infrastructure
> for things like cluster communication (or communication between
> different containers) - the userspace daemons could take care of
> routing messages to and from the outside.
> 
> They could also be useful to keep some of the policy stuff
> outside of the kernel, if only to ensure that the kernel side
> policy is not set in stone, and people can do things differently
> in the future if they want to.
> 
kdbus connections have memory pools; please check kdbus.pool(7). The
pool has its own quota accounting to prevent bad scenarios, and the
memory is attributed to the connection.

Messages that can't be delivered are not stored in the pool, but senders
will get an appropriate error code. For further details on how this
works, please see kdbus.message(7). If you are aware of any corner-cases
we overlooked, please let us know.

Regarding the policy, the implementation is hardly more complex than
traditional UNIX file permissions. Bus names may have multiple
permissions assigned, each of which consists of a bit-mask to denote OWN,
TALK and SEE flags which are applied to UIDs, GIDs or "world". This
policy has to be enforced by the kernel, therefore the information it
acts upon also needs to be stored there. For further details, please see
kdbus.policy(7).
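
To show how small such a check can be, here is a sketch of a policy
lookup in plain C. It follows the description above (a bit-mask of OWN,
TALK and SEE per entry, each entry applying to a UID, a GID or "world"),
but the types, constants and the policy_allows() helper are invented for
the example and are not the kdbus data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical access bits and subject kinds. */
enum { ACCESS_SEE = 1 << 0, ACCESS_TALK = 1 << 1, ACCESS_OWN = 1 << 2 };
enum subject_type { SUBJECT_UID, SUBJECT_GID, SUBJECT_WORLD };

struct policy_entry {
	enum subject_type type;
	unsigned int id;	/* uid or gid; ignored for SUBJECT_WORLD */
	unsigned int access;	/* mask of ACCESS_* bits */
};

/* Does the caller (uid, gid) hold every bit in 'wanted' for this name? */
static bool policy_allows(const struct policy_entry *entries, size_t n,
			  unsigned int uid, unsigned int gid,
			  unsigned int wanted)
{
	unsigned int granted = 0;

	for (size_t i = 0; i < n; i++) {
		const struct policy_entry *e = &entries[i];

		if (e->type == SUBJECT_WORLD ||
		    (e->type == SUBJECT_UID && e->id == uid) ||
		    (e->type == SUBJECT_GID && e->id == gid))
			granted |= e->access;
	}
	return (granted & wanted) == wanted;
}
```

With one entry granting everything to UID 1000 and a second granting
SEE to "world", UID 1000 may own the name while every other user may
only see it.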

The concept of a name policy originates from dbus1 [1], however we
simplified it substantially, removing features which we believe rather
belong into userspace.

[1] http://dbus.freedesktop.org/doc/dbus-daemon.1.html


-- 
Djalal Harouni
http://opendz.org

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-16 13:13                     ` Tom Gundersen
  2015-04-16 14:34                       ` Andy Lutomirski
@ 2015-04-16 19:01                       ` Havoc Pennington
  2015-04-17 13:23                         ` Daniel Mack
  1 sibling, 1 reply; 316+ messages in thread
From: Havoc Pennington @ 2015-04-16 19:01 UTC (permalink / raw)
  To: Tom Gundersen
  Cc: Andy Lutomirski, Rik van Riel, One Thousand Gnomes,
	Greg Kroah-Hartman, Jiri Kosina, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, Eric W. Biederman, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Thu, Apr 16, 2015 at 9:13 AM, Tom Gundersen <teg@jklm.no> wrote:
> All types of messages (unicast and broadcast) are directly stored into
> a pool slice of the receiving connection, and this slice is not reused
> by the kernel until userspace is finished with it and frees it. Hence,
> a client which doesn't process its incoming messages will, at some
> point, run out of pool space. If that happens for unicast messages,
> the sender will get an EXFULL error. If it happens for a multicast
> message, all we can do is drop the message, and tell the receiver how
> many messages have been lost when it issues KDBUS_CMD_RECV the next
> time. There's more on that in kdbus.message(7).
>

Have you guys already grappled with what libraries/apps should do with
this information?

To handle the knowledge that "N messages have been lost," it seems
like the client must answer "are there any messages that, if lost,
would put any code using this connection into a confused state" and
then the client has to recover from said confused state.

A library probably can't do this - it doesn't know what state matters
or how to recover it - so each app would have to...  and are
connections ever shared between modules of an app? (for example: could
a library such as GTK+ or pulseaudio be using the connection, and then
application code is also using the connection, so none of those code
modules has the whole picture... at that point, none of the modules
knows what to do about lost messages... to try to handle lost messages
in a module, you'd need a private connection(?)... which might be fine
as long as each app having a number of connections isn't too bloated.)

How to handle a send error depends a lot on what's being sent... but
if I were writing a general-purpose library wrapper, I'd be very
tempted to hide EXFULL behind an unbounded (or very-high-bounded)
userspace send buffer, which of course is what you were trying to
avoid, but I am skeptical that the average app will handle this error
sensibly.

The traditional userspace bus isn't any better than what you've
described here, of course - it's even worse - and it works well
enough. The limits are simply set high enough that they won't be hit
unless someone's broken or evil. Which is also the traditional
approach to say file descriptor limits or swap space: set the limit
high and hope you won't reach it. For the case of the X server, the
limit on message buffers appears to be "until malloc fails," so they
have the limit quite high, higher than userspace dbus does. "set high
limits and don't hit them" is a tried-and-true approach.

With either the existing userspace bus or kdbus, I bet you could come
up with ways to use limit exhaustion to get various services and apps
into confused states as they miss messages they were relying on,
simply because this is too hard for apps to reliably get right. The
lower the limits, the easier it would be to cause trouble by forcing
them to be hit.

In a perfect world we could figure out which client is "at fault" for
filling a buffer - the slow receiver or the overzealous sender - so we
could throttle or disconnect the guilty party instead of throwing
errors that won't be handled well ... but not sure that's practical.

Havoc

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-16 17:31                       ` David Herrmann
@ 2015-04-16 20:55                         ` Al Viro
  2015-04-18 11:44                           ` David Herrmann
  0 siblings, 1 reply; 316+ messages in thread
From: Al Viro @ 2015-04-16 20:55 UTC (permalink / raw)
  To: David Herrmann
  Cc: Greg Kroah-Hartman, Jiri Kosina, Borislav Petkov,
	Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	linux-kernel, Daniel Mack, Djalal Harouni

On Thu, Apr 16, 2015 at 07:31:22PM +0200, David Herrmann wrote:

> I'm working on patches to add more comments similar to how we did in
> node.c. For now, please see my explanations below:
> 
> node->lock is the _innermost_ lock.
> node->active implements revoke
> support for nodes. It follows what kernfs->active does and isn't a
> lock in particular. We kinda treat it as rwsem, where down_write() is
> the outer-most lock in kdbus and _only_ called without any other lock
> held (kdbus_node_deactivate()). Read-side, we never ever block on the
> "lock", but only use try-lock. If it fails, the node is dead/revoked.
> Therefore, the read-side of 'active' nests almost arbitrarily. We hold
> 'active'-references almost everywhere, to make sure a node is not
> destroyed while we use it. However, we never sleep for an indefinite
> time while holding it.

Umm...  Theoretically, but ->mmap_sem being under it means that it might
involve something like an NFS server timing out, so the latency might
suck very badly.

> Given that the write-side is the outer-most lock in kdbus, it doesn't
> dead-lock against the try-lock readers.

Huh?  I see at least this call chain:
kdbus_handle_ioctl_control()
	kdbus_node_acquire()
	kdbus_cmd_bus_make()
		kdbus_node_deactivate()
Granted, it won't be the _same_ node (otherwise you'd deadlock solid
right there and then), but it means that your locking order is sensitive
to something about nodes; it's not entirely determined by the lock type.

> Locking order (outer-most to inner-most):
>  1) domain->lock
>  2) names->rwlock
>  3) endpoint->lock
>  4) bus->conn_rwlock
>  5) policy->entries_rwlock
>  6) connection->lock
>  7) metadata->lock
> 
> mmap_sem nests below metadata->lock. With the rcu-protected exe_file
> patches by Davidlohr Bueso, we can even drop that dependency. They
> have kinda stalled, though.
> 
> Then we have a bunch of data structure protection, which can be called
> from any context:
>  * bus->notify_lock
>  * pool->lock
>  * match->mdb_rwlock
>  * node->lock
> 
> Lastly, there're 2 locks which nest around everything and must not be
> taken with any lock held:
>  * handle->rwlock (taken in ioctl-entry)

as well as in ->poll(), for completeness' sake.  The latter, BTW, isn't
nice - kdbus is far from being the only thing that does it, but having
->poll() block can be somewhat surprising...

>  * bus->notify_flush_lock (taken in work-queue)

Hmm...  That needs some care - it means that it nests inside anything held
by callers of cancel_delayed_work_sync() on the corresponding work.  AFAICS,
there's at least one call chain leading to that from kdbus_node_deactivate()
(via ->release_cb == kdbus_ep_release -> kdbus_conn_disconnect ->
cancel_delayed_work_sync(&conn->work), which waits for
kdbus_reply_list_scan_work -> kdbus_notify_flush, which grabs
->notify_flush_lock).  Tracking back further is
harder - not all call sites of kdbus_node_deactivate() can lead to that...

BTW, it's not only done in wq callbacks - there's a direct chain from
kdbus_conn_disconnect() as well (both through kdbus_name_release_all ->
kdbus_notify_flush and directly through kdbus_notify_flush()).  And from
ioctl(), by many paths, while we are at it, but that only means that it
nests inside handle->rwlock, and _that_ is really the outermost.

What nests inside that one?  It is definitely a part of the hierarchy - it can't
be excluded from deadlock analysis as effectively outermost.  As for the
stuff under it...  registry->rwlock is obvious, what else?

> General object stacking is:
> domain -> bus -> endpoint -> policy -> connection -> {metadata,pool,match,node}
> The conn_rwlock protection of the conn-list locks on kdbus_bus is the
> only lock that doesn't follow this ordering.

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-16 17:04                           ` Andy Lutomirski
@ 2015-04-17  9:19                             ` Michal Hocko
  2015-04-17 18:54                               ` Andy Lutomirski
  0 siblings, 1 reply; 316+ messages in thread
From: Michal Hocko @ 2015-04-17  9:19 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: David Herrmann, Tom Gundersen, Havoc Pennington, Rik van Riel,
	One Thousand Gnomes, Greg Kroah-Hartman, Jiri Kosina,
	Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	linux-kernel, Daniel Mack, Djalal Harouni

On Thu 16-04-15 10:04:17, Andy Lutomirski wrote:
> On Thu, Apr 16, 2015 at 8:01 AM, David Herrmann <dh.herrmann@gmail.com> wrote:
> > Hi
> >
> > On Thu, Apr 16, 2015 at 4:34 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> >> Whose memcg does the pool use?
> >
> > The pool-owner's (i.e., the receiver's).
> >
> >> If it's the receiver's, and if the
> >> receiver can configure a memcg, then it seems that even a single
> >> receiver could probably cause the sender to block for an unlimited
> >> amount of time.
> >
> > How? Which of those calls can block? I don't see how that can happen.
> 
> I admit I don't fully understand memcg, but vfs_iter_write is
> presumably going to need to get write access to the target pool page,
> and that, in turn, will need that page to exist in memory and to be
> writable, which may need to page it in and/or allocate a page.  If
> that uses the receiver's memcg (as it should), then the receiver can
> make it block.  Even if it doesn't use the receiver's memcg, it can
> trigger direct reclaim, I think.

Yes, memcg direct reclaim might trigger but we are no longer waiting for
the OOM victim from non page fault paths so the time is bounded. It
still might take quite some time, though, depending on the amount of work
done in the direct reclaim.
-- 
Michal Hocko
SUSE Labs

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-16 19:01                       ` Havoc Pennington
@ 2015-04-17 13:23                         ` Daniel Mack
  2015-04-17 14:54                           ` Havoc Pennington
  0 siblings, 1 reply; 316+ messages in thread
From: Daniel Mack @ 2015-04-17 13:23 UTC (permalink / raw)
  To: Havoc Pennington, Tom Gundersen
  Cc: Andy Lutomirski, Rik van Riel, One Thousand Gnomes,
	Greg Kroah-Hartman, Jiri Kosina, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, Eric W. Biederman, linux-kernel, David Herrmann,
	Djalal Harouni

Hi Havoc,

On 04/16/2015 09:01 PM, Havoc Pennington wrote:
> On Thu, Apr 16, 2015 at 9:13 AM, Tom Gundersen <teg@jklm.no> wrote:
>> All types of messages (unicast and broadcast) are directly stored into
>> a pool slice of the receiving connection, and this slice is not reused
>> by the kernel until userspace is finished with it and frees it. Hence,
>> a client which doesn't process its incoming messages will, at some
>> point, run out of pool space. If that happens for unicast messages,
>> the sender will get an EXFULL error. If it happens for a multicast
>> message, all we can do is drop the message, and tell the receiver how
>> many messages have been lost when it issues KDBUS_CMD_RECV the next
>> time. There's more on that in kdbus.message(7).
> 
> Have you guys already grappled with what libraries/apps should do with
> this information?
> 
> To handle the knowledge that "N messages have been lost," it seems
> like the client must answer "are there any messages that, if lost,
> would put any code using this connection into a confused state" and
> then the client has to recover from said confused state.

This can only happen with user-originated DBus signal messages. For
unicast messages such as method calls, the sender will actually see
-EXFULL, and no part of the message is transmitted, leaving neither side
in a confused state. But yes, for broadcast signal messages, we can't
reject the sender because one single peer is out of buffer space, and we
can't allow boundless allocations on the receiver either, so informing
the other side is the best we can do.

Note that dbus-daemon just drops such signals silently. So with this
counter we simply add a debug mechanism for now. There hasn't been a
consensus on how to react to such errors on the application level. The
easiest way is obviously to re-sync all your state with the peer (which
could be as easy as calling ObjectManager.GetManagedObjects() or
Properties.GetAll()).
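
The behaviour described above can be sketched as a toy model. This is not kdbus code: the class and method names are hypothetical, the pool is reduced to a byte counter, and PoolFull merely stands in for the -EXFULL error a unicast sender would see. A unicast that does not fit is rejected towards the sender, while a broadcast that does not fit is dropped and counted, and the receiver learns the count on its next receive.

```python
import errno

# Hypothetical stand-in for the -EXFULL error reported to a unicast sender.
EXFULL = getattr(errno, "EXFULL", 54)


class PoolFull(OSError):
    """Raised towards the sender when a unicast message does not fit."""


class Receiver:
    """Toy receiver with a bounded pool (illustrative only, not kdbus)."""

    def __init__(self, pool_size):
        self.pool_size = pool_size  # total bytes in the receive pool
        self.used = 0               # bytes held by slices not yet freed
        self.queue = []             # payloads waiting for recv()
        self.lost = 0               # broadcasts dropped since last recv()

    def _store(self, payload):
        # A slice stays allocated until the receiver frees it explicitly.
        if self.used + len(payload) > self.pool_size:
            return False
        self.used += len(payload)
        self.queue.append(payload)
        return True

    def deliver_unicast(self, payload):
        # Nothing is transmitted on failure; the sender gets an error.
        if not self._store(payload):
            raise PoolFull(EXFULL, "receiver pool is full")

    def deliver_broadcast(self, payload):
        # The sender cannot be rejected, so the message is dropped and counted.
        if not self._store(payload):
            self.lost += 1

    def recv(self):
        """Return (payload or None, number of broadcasts lost) and reset the count."""
        lost, self.lost = self.lost, 0
        payload = self.queue.pop(0) if self.queue else None
        return payload, lost

    def free(self, payload):
        # Userspace is done with the slice; its space can be reused.
        self.used -= len(payload)
```

A receiver that sees a non-zero lost count from recv() would then re-sync its state with the peer, for example by re-reading the properties it tracks.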

> A library probably can't do this - it doesn't know what state matters
> or how to recover it - so each app would have to...  and are
> connections ever shared between modules of an app? (for example: could
> a library such as GTK+ or pulseaudio be using the connection, and then
> application code is also using the connection, so none of those code
> modules has the whole picture... at that point, none of the modules
> knows what to do about lost messages... to try to handle lost messages
> in a module, you'd need a private connection(?)... which might be fine
> as long as each app having a number of connections isn't too bloated.)
> 
> How to handle a send error depends a lot on what's being sent... but
> if I were writing a general-purpose library wrapper, I'd be very
> tempted to hide EXFULL behind an unbounded (or very-high-bounded)
> userspace send buffer, which of course is what you were trying to
> avoid, but I am skeptical that the average app will handle this error
> sensibly.

Actually, we see no real difference between constrained outgoing or
incoming buffers. Even with a very-high-bounded send-buffer, you still
need to deal with it running full.

> The traditional userspace bus isn't any better than what you've
> described here, of course - it's even worse - and it works well
> enough. The limits are simply set high enough that they won't be hit
> unless someone's broken or evil. Which is also the traditional
> approach to say file descriptor limits or swap space: set the limit
> high and hope you won't reach it. For the case of the X server, the
> limit on message buffers appears to be "until malloc fails," so they
> have the limit quite high, higher than userspace dbus does. "set high
> limits and don't hit them" is a tried-and-true approach.
> 
> With either the existing userspace bus or kdbus, I bet you could come
> up with ways to use limit exhaustion to get various services and apps
> into confused states as they miss messages they were relying on,
> simply because this is too hard for apps to reliably get right. The
> lower the limits, the easier it would be to cause trouble by forcing
> them to be hit.
> 
> In a perfect world we could figure out which client is "at fault" for
> filling a buffer - the slow receiver or the overzealous sender - so we
> could throttle or disconnect the guilty party instead of throwing
> errors that won't be handled well ... but not sure that's practical.

Exactly, you need heuristics for that. It's non-trivial to figure out
whether the receiver or sender is to blame.

We've thought about how to address that for a while and came up with a
quota logic that is similar to what dbus-daemon implements in order to
prevent single connections from overflowing the pool of a receiver. The
limits that apply to that are currently hard-coded, and they work well
on our systems. In the future, they can easily be made a bus-wide
property that can be configured at bus creation time.
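
As a rough illustration of the idea only: the sketch below caps how much of a receiver's pool any single sender may occupy. The actual kdbus limits are not given in this mail, so the 50% share, the class name and the method names are all assumptions made up for the example.

```python
from collections import defaultdict


class QuotaPool:
    """Toy receiver pool with a per-sender cap (hypothetical numbers)."""

    def __init__(self, pool_size, max_share=0.5):
        self.pool_size = pool_size
        # Assumed policy: one sender may hold at most this many bytes.
        self.max_per_sender = int(pool_size * max_share)
        self.used_by = defaultdict(int)  # sender id -> bytes currently held

    def try_store(self, sender, nbytes):
        """Account nbytes to sender; return False if a limit would be exceeded."""
        total = sum(self.used_by.values())
        if total + nbytes > self.pool_size:
            return False  # the pool itself is exhausted
        if self.used_by[sender] + nbytes > self.max_per_sender:
            return False  # this sender alone would take too much
        self.used_by[sender] += nbytes
        return True

    def free(self, sender, nbytes):
        """Receiver released a slice that had been accounted to sender."""
        self.used_by[sender] -= nbytes
```

With such a cap, a single misbehaving sender runs into its own limit before it can fill the whole pool, so other senders can still deliver to the same receiver.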


Thanks,
Daniel



* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-16 16:37                     ` Robert Schwebel
@ 2015-04-17 13:45                       ` Greg Kroah-Hartman
  0 siblings, 0 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-17 13:45 UTC (permalink / raw)
  To: Robert Schwebel
  Cc: One Thousand Gnomes, Austin S Hemmelgarn, Al Viro,
	Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Thu, Apr 16, 2015 at 06:37:45PM +0200, Robert Schwebel wrote:
> On Wed, Apr 15, 2015 at 11:22:18PM +0100, One Thousand Gnomes wrote:
> > > The reason that 'everyone who works in this area' adopted is not as much
> > > that the design is sound (I'm not arguing whether it is or isn't in this
> > > case) as it is that none of them could come up with anything better.
> >
> > Actually most message passing code uses things like JMS and the various
> > MQ libraries. Most IoT uses things other than dbus, small deep embedded
> > never uses dbus.
> 
> For what it's worth: we more and more use dbus for small deep embedded
> systems, IoT, loosely coupled industrial control applications etc.

Thanks for confirming this, I thought I had seen it used in IoT devices
already.

greg k-h


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-17 13:23                         ` Daniel Mack
@ 2015-04-17 14:54                           ` Havoc Pennington
  0 siblings, 0 replies; 316+ messages in thread
From: Havoc Pennington @ 2015-04-17 14:54 UTC (permalink / raw)
  To: Daniel Mack
  Cc: Tom Gundersen, Andy Lutomirski, Rik van Riel,
	One Thousand Gnomes, Greg Kroah-Hartman, Jiri Kosina,
	Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	linux-kernel, David Herrmann, Djalal Harouni

On Fri, Apr 17, 2015 at 9:23 AM, Daniel Mack <daniel@zonque.org> wrote:
>
> This can only happen with user-originated DBus signal messages. For
> unicast messages such as method calls, the sender will actually see
> -EXFULL, and no part of the message is transmitted, leaving neither side
> in a confused state.

Well - big asterisk, * no confused state IF the sender handles EXFULL
in a reasonable way.  Which it probably doesn't most of the time :-)
but as you say it's no worse than it ever was.

> But yes, for broadcast signal messages, we can't
> reject the sender because one single peer is out of buffer space, and we
> can't allow boundless allocations on the receiver either, so informing
> the other side is the best we can do.

If this was ever going to happen (if the limits weren't high), I do
think it would be better to disconnect or throttle/backpressure
somehow, instead of breaking semantics. But the trouble is figuring
out how to do that... I don't know how. So the alternative is to set
high limits.

I think you're fine, it obviously works OK with the current userspace
daemon that punts in a similar way, and unix has a long tradition of
limits like this plus applications sucking at handling the "limit
reached" errors. It'll all work out...

> Note that dbus-daemon just drops such signals silently. So with this
> counter we simply add a debug mechanism for now. There hasn't been a
> consensus on how to react to such errors on the application level. The
> easiest way is obviously to re-sync all your state with the peer (which
> could be as easy as calling ObjectManager.GetManagedObjects() or
> Properties.GetAll()).

It's not realistic to expect the bulk of apps to handle this thing.
For special system services such as pid 1, you probably have the expertise
and time to try to carefully restore all state. Regular old apps will
get confused if limits are actually hit, but people
will configure the limits such that they're only hit if there's some
pathology going on.

>> How to handle a send error depends a lot on what's being sent... but
>> if I were writing a general-purpose library wrapper, I'd be very
>> tempted to hide EXFULL behind an unbounded (or very-high-bounded)
>> userspace send buffer, which of course is what you were trying to
>> avoid, but I am skeptical that the average app will handle this error
>> sensibly.
>
> Actually, we see no real difference between constrained outgoing or
> incoming buffers. Even with a very-high-bounded send-buffer, you still
> need to deal with it running full.

What I'm saying is that there's a practical difference between limits
low enough to be hit in normal operation, and limits high enough that
somebody has to be evil/broken before you hit them. With the "throw
an error" setup, if you set the limits low enough to be hit in
practice, then userspace will be buggy and break - that's my
prediction at least.

It's not different from say the file descriptor limits. If you crank
down your allowed open descriptors such that a user session actually
hits the limit, pretty much the session isn't usable. That's all I'm
saying.

If you wanted to be able to configure the limits low, where they'd be
hit in practice, then I think you'd want to look at some solution
other than tossing these errors that people will fail to handle
correctly, even if that solution were complex and/or heuristic. If you
set the limits high, it doesn't really matter so you can KISS.

It's sort of an academic point ... tons of kernel features already
have this issue. So carry on, you're good. :-)

Havoc


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 14:48             ` Michal Schmidt
  2015-04-15 15:34               ` Mike Galbraith
  2015-04-15 16:42               ` Mike Galbraith
@ 2015-04-17 16:53               ` Mike Galbraith
  2 siblings, 0 replies; 316+ messages in thread
From: Mike Galbraith @ 2015-04-17 16:53 UTC (permalink / raw)
  To: Michal Schmidt
  Cc: Richard Weinberger, Andy Lutomirski, Al Viro, Greg Kroah-Hartman,
	Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Wed, 2015-04-15 at 16:48 +0200, Michal Schmidt wrote:
> On 04/15/2015 09:31 AM, Mike Galbraith wrote:
> > it seems [systemd] has now mandated group scheduling.
> 
> What makes you think so? Was it the fact that by default you have a
> populated /sys/fs/cgroup/cpu/ hierarchy? This is either because some
> unit requests the use of the cpu controller using one of the CPU*=
> directives from systemd.resource-control(5), or (perhaps more likely)
> because there is a privileged unit with Delegate=yes. The most likely
> candidate is user@0.service, and so you could try preventing it from
> starting:
>   systemctl mask user@0.service

BTW, asking it to symlink its disabled service to /dev/null did
indeed convince it to stop running said disabled service.

> Note that systemd still works without group scheduling or any cgroup
> subsystems enabled in the kernel:
> 
>   $ grep GROUP .config
>   CONFIG_CGROUPS=y

Yup.  CONFIG_CGROUPS=y all by itself isn't useless either, as that 
allows the user to use his box for something other than a doorstop.

Hohum, 'nuff of that ;-)

Thanks for the hint, it seems a tad dainbramaged, but it works.

        -Mike


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-17  9:19                             ` Michal Hocko
@ 2015-04-17 18:54                               ` Andy Lutomirski
  2015-04-20 12:43                                 ` Michal Hocko
  0 siblings, 1 reply; 316+ messages in thread
From: Andy Lutomirski @ 2015-04-17 18:54 UTC (permalink / raw)
  To: Michal Hocko
  Cc: David Herrmann, Tom Gundersen, Havoc Pennington, Rik van Riel,
	One Thousand Gnomes, Greg Kroah-Hartman, Jiri Kosina,
	Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	linux-kernel, Daniel Mack, Djalal Harouni

On Fri, Apr 17, 2015 at 2:19 AM, Michal Hocko <mhocko@suse.cz> wrote:
> On Thu 16-04-15 10:04:17, Andy Lutomirski wrote:
>> On Thu, Apr 16, 2015 at 8:01 AM, David Herrmann <dh.herrmann@gmail.com> wrote:
>> > Hi
>> >
>> > On Thu, Apr 16, 2015 at 4:34 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> >> Whose memcg does the pool use?
>> >
>> > The pool-owner's (i.e., the receiver's).
>> >
>> >> If it's the receiver's, and if the
>> >> receiver can configure a memcg, then it seems that even a single
>> >> receiver could probably cause the sender to block for an unlimited
>> >> amount of time.
>> >
>> > How? Which of those calls can block? I don't see how that can happen.
>>
>> I admit I don't fully understand memcg, but vfs_iter_write is
>> presumably going to need to get write access to the target pool page,
>> and that, in turn, will need that page to exist in memory and to be
>> writable, which may need to page it in and/or allocate a page.  If
>> that uses the receiver's memcg (as it should), then the receiver can
>> make it block.  Even if it doesn't use the receiver's memcg, it can
>> trigger direct reclaim, I think.
>
> Yes, memcg direct reclaim might trigger but we are no longer waiting for
> the OOM victim from non page fault paths so the time is bounded. It
> still might take quite some time, though, depending on the amount of work
> done in the direct reclaim.

Is that still true if OOM notifiers are involved?  I've lost track of
what changed there.

In any event, I'm not entirely convinced that having a broadcast send
cause, say, PID 1 to block until an unbounded number of pages in a
potentially unbounded number of memcgs are reclaimed is a good idea.

In the kdbus model's favor, I think that allowing pages of data in the
receive queue to be swapped out is potentially quite nice, but I'm
less convinced about non-full pages in the receive queue.  There's a
resource management tradeoff here, and one nice thing about AF_UNIX is
that sends are genuinely non-blocking.

--Andy


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-16 12:13                       ` David Herrmann
@ 2015-04-17 19:27                         ` James Bottomley
  2015-04-17 20:27                           ` Havoc Pennington
  0 siblings, 1 reply; 316+ messages in thread
From: James Bottomley @ 2015-04-17 19:27 UTC (permalink / raw)
  To: David Herrmann
  Cc: Greg Kroah-Hartman, Jiri Kosina, Steven Rostedt, John Stoffel,
	Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	linux-kernel, Daniel Mack, Djalal Harouni, Paul E. McKenney

On Thu, 2015-04-16 at 14:13 +0200, David Herrmann wrote:
> Hi
> 
> On Wed, Apr 15, 2015 at 8:12 PM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> > For me the biggest issue is the container problem: it's really hard to
> > containerise kdbus because of the stateful nature of the protocol and
> > the fact that it has a well known system bus.  Separation into domains
> > works for OS containers, but application containers need more fluidity.
> > It's not unlike the same problem on windows: Windows application
> > containers are very difficult to do because the global registry means
> > that OLE handlers all have to run inside your container as well
> > (effectively making it an OS container).  I'm sure, since we already
> > have a lot of containers people going to plumbers, that we can get them
> > to turn up for the discussion.
> 
> kdbus actually works very well in OS containers that mount a new
> kdbusfs inside the container. This new instance of kdbus will be
> entirely separated from any other on the system. We've designed it
> that way especially with OS containers in mind. This is explained in
> kdbus.fs(7). It's very similar to devpts' container support, where you
> mount a new instance of devpts into each container instance you run.
> 
> For Docker-style (i.e. app-focused) containers, it's a more complex
> story.

Well, no, docker-style is just one flavour of application containers.
I'm actually much more interested in something very different:
applications that use container features (like docker, rocket and
systemd).  Facilitating them is an interesting exercise.

Also, applications inside containers were around long before docker in
the PaaS space at least.

>  kdbus will not solve this for you, but at least one thing
> deserves being mentioned: for this kind of sandboxing kdbus certainly
> makes things *easier*, compared to dbus1.

So slightly better than really difficult isn't terribly useful.

>  Why? because the kernel
> gains a notion of individual messages and method call transactions,
> something that is completely unavailable if you stick to dbus1 where
> all the kernel sees is a raw stream of AF_UNIX/SOCK_STREAM bytes. In
> fact, kdbus as it is right now even contains minimal but explicit
> support for sandboxing, by allowing creation of multiple bus endpoints
> to the same bus that carry additional, more restrictive policy.

Sandboxing is a minor (albeit very useful) use of containers.

You nicely ignored the actual problem I listed, which is the system bus.
And the specific example of what happens.  Let me try again.  Just to
provide the context, Virtuozzo has long supported containers on both
Windows and Linux.  We have been doing application containers on Linux
for a long time, but we've been having issues doing the same thing on
windows (in spite of the fact that our windows container system is very
similar to the Linux one).

In windows, OLE + the global registry is dbus on steroids.  The idea
seems simple and elegant: remote system elements are provided to you via
an IPC interaction instead of being directly dynamically linked into
your virtual address space.  It allows windows applications to deal with
arbitrary objects of unknown type because the type handlers are provided
by the system via OLE.  It's really elegant in a single user desktop
environment because the system's job is to serve and protect only that
user.  In a multi user environment (as MS found with VDI) it's a lot
more problematic because now either the type handlers are global
(meaning local users can't modify them unlike in the single user case)
or they're all local, meaning we're back to OS containers again.  If you
think abstractly of containers as a way to bring multi-user features to
single user environments (essentially that's what OS virtualization is)
you can see immediately why we're having such issues with non-os
containers on Windows because the single bus/global namespace idea
doesn't play well with multi-user.

This is why I think kdbus is a bad idea: it solidifies as a linux kernel
API something which runs counter to granular OS virtualization (and
something which caused Windows to fall behind Linux in the container
space).  Splitting out the acceleration problem and leaving the rest to
user space currently looks fine because the ideas Al and Andy are
kicking around don't cause problems with OS virtualization.

James




* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-17 19:27                         ` James Bottomley
@ 2015-04-17 20:27                           ` Havoc Pennington
  2015-04-17 21:45                             ` Alex Elsayed
  2015-04-20 18:01                             ` James Bottomley
  0 siblings, 2 replies; 316+ messages in thread
From: Havoc Pennington @ 2015-04-17 20:27 UTC (permalink / raw)
  To: James Bottomley
  Cc: David Herrmann, Greg Kroah-Hartman, Jiri Kosina, Steven Rostedt,
	John Stoffel, Andy Lutomirski, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
	Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni,
	Paul E. McKenney

Hi,

On Fri, Apr 17, 2015 at 3:27 PM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
>
> This is why I think kdbus is a bad idea: it solidifies as a linux kernel
> API something which runs counter to granular OS virtualization (and
> something which caused Windows to fall behind Linux in the container
> space).  Splitting out the acceleration problem and leaving the rest to
> user space currently looks fine because the ideas Al and Andy are
> kicking around don't cause problems with OS virtualization.
>

I'm interested in understanding this problem (if only for my own
curiosity) but I'm not confident I understand what you're saying
correctly.

Can I try to explain back / ask questions and see what I have right?

I think you are saying that if an application relies on a system
service (= any other process that runs on the system bus) then to
virtualize that app by itself in a dedicated container, the system bus
and the system service need to also be in the container. So the
container ends up with a bunch of stuff in it beyond only the
application.  Right / wrong / confused?

I also think you're saying that userspace dbus has the same issue
(this isn't a userspace vs. kernel thing per se), the objection to
kdbus is that it makes this issue more solidified / harder to fix?

Do you have ideas on how to go about fixing it, whether in userspace
or kernel dbus?

Havoc


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-17 20:27                           ` Havoc Pennington
@ 2015-04-17 21:45                             ` Alex Elsayed
  2015-04-20 18:01                             ` James Bottomley
  1 sibling, 0 replies; 316+ messages in thread
From: Alex Elsayed @ 2015-04-17 21:45 UTC (permalink / raw)
  To: linux-kernel

Havoc Pennington wrote:

> Hi,
> 
> On Fri, Apr 17, 2015 at 3:27 PM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
>>
>> This is why I think kdbus is a bad idea: it solidifies as a linux kernel
>> API something which runs counter to granular OS virtualization (and
>> something which caused Windows to fall behind Linux in the container
>> space).  Splitting out the acceleration problem and leaving the rest to
>> user space currently looks fine because the ideas Al and Andy are
>> kicking around don't cause problems with OS virtualization.
>>
> 
> I'm interested in understanding this problem (if only for my own
> curiosity) but I'm not confident I understand what you're saying
> correctly.
> 
> Can I try to explain back / ask questions and see what I have right?
> 
> I think you are saying that if an application relies on a system
> service (= any other process that runs on the system bus) then to
> virtualize that app by itself in a dedicated container, the system bus
> and the system service need to also be in the container. So the
> container ends up with a bunch of stuff in it beyond only the
> application.  Right / wrong / confused?
> 
> I also think you're saying that userspace dbus has the same issue
> (this isn't a userspace vs. kernel thing per se), the objection to
> kdbus is that it makes this issue more solidified / harder to fix?
> 
> Do you have ideas on how to go about fixing it, whether in userspace
> or kernel dbus?
> 
> Havoc

So far as I understand (and this may be wrong), this is the use case of 
kdbus "endpoints" - you'd create a (constrained) kdbus endpoint on the host, 
and then expose it to the application, such that the application uses it as 
if it were the system bus.
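
A minimal sketch of that idea, under stated assumptions: the Bus and Endpoint classes below are hypothetical, and the "policy" is reduced to an allow-list of service names. It is not the kdbus endpoint API, only an illustration of an application talking to a restricted view of a shared bus.

```python
class Bus:
    """Toy bus: maps well-known names to handler callables."""

    def __init__(self):
        self.services = {}

    def register(self, name, handler):
        self.services[name] = handler

    def call(self, name, message):
        return self.services[name](message)


class Endpoint:
    """Restricted view of a Bus (hypothetical allow-list policy)."""

    def __init__(self, bus, allowed_names):
        self.bus = bus
        self.allowed = frozenset(allowed_names)

    def call(self, name, message):
        # The application uses this object as if it were the bus itself,
        # but only names permitted by the endpoint's policy are reachable.
        if name not in self.allowed:
            raise PermissionError(f"{name} is not visible on this endpoint")
        return self.bus.call(name, message)
```

The host would create the endpoint with a policy naming only the services the application needs and hand that endpoint, rather than the full bus, to the container.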



* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-16 20:55                         ` Al Viro
@ 2015-04-18 11:44                           ` David Herrmann
  0 siblings, 0 replies; 316+ messages in thread
From: David Herrmann @ 2015-04-18 11:44 UTC (permalink / raw)
  To: Al Viro
  Cc: Greg Kroah-Hartman, Jiri Kosina, Borislav Petkov,
	Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	linux-kernel, Daniel Mack, Djalal Harouni

Hi

On Thu, Apr 16, 2015 at 10:55 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Thu, Apr 16, 2015 at 07:31:22PM +0200, David Herrmann wrote:
>
>> I'm working on patches to add more comments similar to how we did in
>> node.c. For now, please see my explanations below:
>>
>> node->lock is the _innermost_ lock.
>> node->active implements revoke
>> support for nodes. It follows what kernfs->active does and isn't a
>> lock in particular. We kinda treat it as rwsem, where down_write() is
>> the outer-most lock in kdbus and _only_ called without any other lock
>> held (kdbus_node_deactivate()). Read-side, we never ever block on the
>> "lock", but only use try-lock. If it fails, the node is dead/revoked.
>> Therefore, the read-side of 'active' nests almost arbitrarily. We hold
>> 'active'-references almost everywhere, to make sure a node is not
>> destroyed while we use it. However, we never sleep for an indefinite
>> time while holding it.
>
> Umm...  Theoretically, but ->mmap_sem being under it means that it might
> involve something like an NFS server timing out, so the latency might
> suck very badly.

Fixed! [1]
Linus just pulled akpm#3, which includes the rcu-protection for
exe-file. No more direct mmap_sem access in kdbus.

>> Given that the write-side is the outer-most lock in kdbus, it doesn't
>> dead-lock against the try-lock readers.
>
> Huh?  I see at least this call chain:
> kdbus_handle_ioctl_control()
>         kdbus_node_acquire()
>         kdbus_cmd_bus_make()
>                 kdbus_node_deactivate()
> Granted, it won't be the _same_ node (otherwise you'd deadlock solid
> right there and then), but it means that your locking order is sensitive
> to something about nodes; it's not entirely determined by the lock type.

Indeed. We do allow pinning parent objects when deactivating its
children. I updated my doc-drafts accordingly.
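
To illustrate the node->active semantics discussed above (try-lock readers, a single deactivating writer), here is a toy, single-threaded model. It is only a sketch: the Node class and its methods are hypothetical, and where the real code would sleep until all active references are dropped, this model just reports whether the node has drained.

```python
class Node:
    """Toy model of a revocable node with 'active' references."""

    def __init__(self):
        self.active = 0           # read-side references currently held
        self.deactivated = False  # set once by deactivate(), never cleared

    def acquire(self):
        """Read-side try-lock: never blocks. False means dead/revoked."""
        if self.deactivated:
            return False
        self.active += 1
        return True

    def release(self):
        """Drop a reference obtained through a successful acquire()."""
        assert self.active > 0, "release() without matching acquire()"
        self.active -= 1

    def deactivate(self):
        """Write side: revoke the node so that new acquire() calls fail."""
        self.deactivated = True

    def is_drained(self):
        """True once the node is revoked and no reader still holds it."""
        return self.deactivated and self.active == 0
```

A caller that fails to acquire a node treats it as gone and bails out, which is why the read side can nest almost anywhere without risking a wait.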

>> Locking order (outer-most to inner-most):
>>  1) domain->lock
>>  2) names->rwlock
>>  3) endpoint->lock
>>  4) bus->conn_rwlock
>>  5) policy->entries_rwlock
>>  6) connection->lock
>>  7) metadata->lock
>>
>> mmap_sem nests below metadata->lock. With the rcu-protected exe_file
>> patches by Davidlohr Bueso, we can even drop that dependency. They
>> have kinda stalled, though.
>>
>> Then we have a bunch of data structure protection, which can be called
>> from any context:
>>  * bus->notify_lock
>>  * pool->lock
>>  * match->mdb_rwlock
>>  * node->lock
>>
>> Lastly, there're 2 locks which nest around everything and must not be
>> taken with any lock held:
>>  * handle->rwlock (taken in ioctl-entry)
>
> as well as in ->poll(), for completeness sake.  The latter, BTW, isn't
> nice - kdbus is far from being the only thing that does it, but having
> ->poll() block can be somewhat surprising...

I have a patch to fix this [2]. But it's more complex than the rwsem,
and requires some more review. However, it reduces the handle-locking
to a minimum, such that we only lock it during setup and can reduce it
to a mutex.

>>  * bus->notify_flush_lock (taken in work-queue)
>
> Hmm...  That needs some care - it means that it nests inside anything held
> by callers of cancel_delayed_work_sync() on the corresponding work.  AFAICS,
> there's at least one call chain leading to that from kdbus_node_deactivate()
> (via ->release_cb == kdbus_ep_release -> kdbus_conn_disconnect ->
> cancel_delayed_work_sync(&conn->work)) wait for kdbus_reply_list_scan_work
> -> kdbus_notify_flush grabs ->notify_flush_lock).  Tracking back further is
> harder - not all call sites of kdbus_node_deactivate() can lead to that...
>
> BTW, it's not only done in wq callbacks - there's a direct chain from
> kdbus_conn_disconnect() as well (both through kdbus_name_release_all ->
> kdbus_notify_flush and directly through kdbus_notify_flush()).  And from
> ioctl(), by many paths, while we are at it, but that only means that it
> nests inside handle->rwlock, and _that_ is really the outermost.

Sorry, this was a mistake on my side. We do call kdbus_notify_flush()
directly quite often. And it nests underneath the handle, correct. I
noted this down.

I did have patches to actually move the kdbus_notify_flush() call to
the end of kdbus_handle_ioctl() and friends, so that we flush all
collected notifications on return to user-space, which would make the
locking more obvious. However, it didn't make it much simpler, imo, so
it was never applied.

> What nests inside that one?  It's definitely a part of the hierarchy - it can't
> be excluded from deadlock analysis as effectively outermost.  As for the
> stuff under it...  registry->rwlock is obvious, what else?

(Updated) Data-structure locks:
  * bus->notify_lock
  * pool->lock
  * match->mdb_rwlock
  * node->lock

Updated locking order:
  1) handle->rwlock
  2) bus->notify_flush_lock
  3) domain->lock
  4) names->rwlock
  5) endpoint->lock
  6) bus->conn_rwlock
  7) policy->entries_rwlock
  8) connection->lock
  9) metadata->lock

 * node->active read-side locks arbitrarily underneath handle->rwlock.

 * node->active write-side nests underneath handle->rwlock, and
underneath read-side of any parent-node->active.
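
An ordering like the one listed above can be checked mechanically. The following is a small sketch, not kdbus or lockdep code: it ranks the nine ordered locks by their position in the list and raises an error when a lock is taken while an equal or inner lock is already held. The LockTracker class is hypothetical, and the data-structure locks and node->active rules are deliberately left out.

```python
# Outer-most first, exactly as in the "Updated locking order" list.
LOCK_ORDER = [
    "handle->rwlock",
    "bus->notify_flush_lock",
    "domain->lock",
    "names->rwlock",
    "endpoint->lock",
    "bus->conn_rwlock",
    "policy->entries_rwlock",
    "connection->lock",
    "metadata->lock",
]
RANK = {name: i for i, name in enumerate(LOCK_ORDER)}


class LockOrderError(RuntimeError):
    """Raised when a lock is taken out of the documented order."""


class LockTracker:
    """Tracks the locks one context holds and validates nesting."""

    def __init__(self):
        self.held = []

    def acquire(self, name):
        rank = RANK[name]
        # Every lock already held must be strictly further out than the new one.
        innermost = max((RANK[h] for h in self.held), default=-1)
        if innermost >= rank:
            raise LockOrderError(f"{name} taken while holding {self.held}")
        self.held.append(name)

    def release(self, name):
        self.held.remove(name)
```

For example, taking names->rwlock while connection->lock is held is flagged, whereas handle->rwlock followed by domain->lock passes.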

Thanks! Much appreciated!
David

[1] http://cgit.freedesktop.org/~dvdhrm/linux/commit/?h=kdbus&id=f396c12ecfda1717e5f76d6b4ab11e4db232e60d
[2] http://cgit.freedesktop.org/~dvdhrm/linux/commit/?h=kdbus&id=61875e1abd38a965c9f7dfca28068dd0a871961c


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-17 18:54                               ` Andy Lutomirski
@ 2015-04-20 12:43                                 ` Michal Hocko
  2015-04-20 20:03                                   ` Andy Lutomirski
  0 siblings, 1 reply; 316+ messages in thread
From: Michal Hocko @ 2015-04-20 12:43 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: David Herrmann, Tom Gundersen, Havoc Pennington, Rik van Riel,
	One Thousand Gnomes, Greg Kroah-Hartman, Jiri Kosina,
	Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	linux-kernel, Daniel Mack, Djalal Harouni

On Fri 17-04-15 11:54:42, Andy Lutomirski wrote:
> On Fri, Apr 17, 2015 at 2:19 AM, Michal Hocko <mhocko@suse.cz> wrote:
> > On Thu 16-04-15 10:04:17, Andy Lutomirski wrote:
> >> On Thu, Apr 16, 2015 at 8:01 AM, David Herrmann <dh.herrmann@gmail.com> wrote:
> >> > Hi
> >> >
> >> > On Thu, Apr 16, 2015 at 4:34 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> >> >> Whose memcg does the pool use?
> >> >
> >> > The pool-owner's (i.e., the receiver's).
> >> >
> >> >> If it's the receiver's, and if the
> >> >> receiver can configure a memcg, then it seems that even a single
> >> >> receiver could probably cause the sender to block for an unlimited
> >> >> amount of time.
> >> >
> >> > How? Which of those calls can block? I don't see how that can happen.
> >>
> >> I admit I don't fully understand memcg, but vfs_iter_write is
> >> presumably going to need to get write access to the target pool page,
> >> and that, in turn, will need that page to exist in memory and to be
> >> writable, which may need to page it in and/or allocate a page.  If
> >> that uses the receiver's memcg (as it should), then the receiver can
> >> make it block.  Even if it doesn't use the receiver's memcg, it can
> >> trigger direct reclaim, I think.
> >
> > Yes, memcg direct reclaim might trigger but we are no longer waiting for
> > the OOM victim from non page fault paths so the time is bounded. It
> > still might take quite some time, though, depending on the amount of work
> > done in the direct reclaim.
> 
> Is that still true if OOM notifiers are involved?  I've lost track of
> what changed there.

memcg OOM is not triggered from get_user_pages. See 519e52473ebe (mm:
memcg: enable memcg OOM killer only for user faults)
 
> In any event, I'm not entirely convinced that having a broadcast send
> cause, say, PID 1 to block until an unbounded number of pages in a
> potentially unbounded number of memcgs are reclaimed is a good idea.

This deserves a clarification I guess. It is the memcg of the current
task which gets charged during the page fault normally. So if PID1 tries
to fault the memory in it will be its (most probably root) memcg which
gets charged. If the memory was already charged to a different task's
memcg and then it got swapped out, though, the PID1 would indeed wait
for the reclaim in the target memcg to swap the page back in.

In either case this sounds like a potential problem, because tasks
could hide their memory charges from the limit or PID1 context could
be blocked. But maybe I just misunderstood, and uncharged memory
cannot be used for the buffer.

> In the kdbus model's favor, I think that allowing pages of data in the
> receive queue to be swapped out is potentially quite nice, but I'm
> less convinced about non-full pages in the receive queue.  There's a
> resource management tradeoff here, and one nice thing about AF_UNIX is
> that sends are genuinely non-blocking.
> 
> --Andy

-- 
Michal Hocko
SUSE Labs


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-17 20:27                           ` Havoc Pennington
  2015-04-17 21:45                             ` Alex Elsayed
@ 2015-04-20 18:01                             ` James Bottomley
  2015-04-21  8:09                               ` Daniel Mack
  1 sibling, 1 reply; 316+ messages in thread
From: James Bottomley @ 2015-04-20 18:01 UTC (permalink / raw)
  To: Havoc Pennington
  Cc: David Herrmann, Greg Kroah-Hartman, Jiri Kosina, Steven Rostedt,
	John Stoffel, Andy Lutomirski, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
	Tom Gundersen, linux-kernel, Daniel Mack, Djalal Harouni,
	Paul E. McKenney

On Fri, 2015-04-17 at 16:27 -0400, Havoc Pennington wrote:
> Hi,
> 
> On Fri, Apr 17, 2015 at 3:27 PM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> >
> > This is why I think kdbus is a bad idea: it solidifies as a linux kernel
> > API something which runs counter to granular OS virtualization (and
> > something which caused Windows to fall behind Linux in the container
> > space).  Splitting out the acceleration problem and leaving the rest to
> > user space currently looks fine because the ideas Al and Andy are
> > kicking around don't cause problems with OS virtualization.
> >
> 
> I'm interested in understanding this problem (if only for my own
> curiosity) but I'm not confident I understand what you're saying
> correctly.
> 
> Can I try to explain back / ask questions and see what I have right?
> 
> I think you are saying that if an application relies on a system
> service (= any other process that runs on the system bus) then to
> virtualize that app by itself in a dedicated container, the system bus
> and the system service need to also be in the container. So the
> container ends up with a bunch of stuff in it beyond only the
> application.  Right / wrong / confused?

Right.  Consider named as the unix equivalent.  In most application
containers, it's provided from outside.  However, any container that
wants it provided inside simply intercepts and overrides the well known
socket.  We can do this in UNIX because there's no global bus handling
these queries, it's simply a matter of knowing where the socket is.  In
windows you can't pick and choose the services you consume from outside.
Either you pull the whole OLE namespace into the container, and thus
have to provide everything from within, or try to run with none of it
provided by the container.  It's this everything or nothing that's the
problem.  Container virtualisation is about being granular and a system
bus (or global OLE namespace) is about being monolithic.

> I also think you're saying that userspace dbus has the same issue
> (this isn't a userspace vs. kernel thing per se), the objection to
> kdbus is that it makes this issue more solidified / harder to fix?

Yes, it does.  We have problems containerising Linux desktops as well.
However, most of our server stuff is daemon and socket based, so that
containerises nicely.  In windows, OLE has been absorbed even into the
server model which is why they have a bigger problem.

> Do you have ideas on how to go about fixing it, whether in userspace
> or kernel dbus?

Well, I've always suspected the solution would be for dbus to have a
hierarchical namespace of its own, with the default policy being to pass
messages to the parent namespace.  This would allow a container to determine
which services were serviced outside and which inside the container (if
you attach as a provider to the system bus in the container, that
attachment supersedes the parent).

However, this doesn't solve the security problem: just because a
container hasn't attached an interior provider doesn't mean it should be
allowed complete access to all services provided from outside. This is
the nasty problem because it involves some type of filter on busses
which pass through containers.
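
To make the two paragraphs above concrete, here is a minimal Python
sketch of the lookup rule: a provider attached inside the container
wins, anything else falls through to the parent, and an optional
allow-list stands in for the filter.  Nothing like this exists in dbus
or kdbus today; BusNamespace, allowed_from_parent and the org.example.*
names are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class BusNamespace:
    """Hypothetical model of one level in a hierarchical bus namespace."""
    name: str
    parent: Optional["BusNamespace"] = None
    # service name -> label of the provider attached at this level
    providers: Dict[str, str] = field(default_factory=dict)
    # Services this namespace may reach in its parent; None means "no filter".
    allowed_from_parent: Optional[Set[str]] = None

    def attach(self, service: str, provider: str) -> None:
        self.providers[service] = provider

    def resolve(self, service: str) -> Optional[str]:
        # A local attachment supersedes whatever the parent provides.
        if service in self.providers:
            return self.providers[service]
        if self.parent is None:
            return None
        # The filter: not everything provided outside is reachable from inside.
        if (self.allowed_from_parent is not None
                and service not in self.allowed_from_parent):
            return None
        # Default policy: pass the request to the parent namespace.
        return self.parent.resolve(service)


host = BusNamespace("host")
host.attach("org.example.Resolver", "host-resolver")
host.attach("org.example.Power", "host-power")

container = BusNamespace("container", parent=host,
                         allowed_from_parent={"org.example.Resolver"})
container.attach("org.example.Log", "container-log")

for svc in ("org.example.Resolver", "org.example.Power", "org.example.Log"):
    print(svc, "->", container.resolve(svc))
```

The hard part is, of course, deciding what goes into the allow-list and
who gets to set it; the sketch only shows where such a filter would sit.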

James




* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-20 12:43                                 ` Michal Hocko
@ 2015-04-20 20:03                                   ` Andy Lutomirski
  0 siblings, 0 replies; 316+ messages in thread
From: Andy Lutomirski @ 2015-04-20 20:03 UTC (permalink / raw)
  To: Michal Hocko
  Cc: David Herrmann, Tom Gundersen, Havoc Pennington, Rik van Riel,
	One Thousand Gnomes, Greg Kroah-Hartman, Jiri Kosina,
	Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	linux-kernel, Daniel Mack, Djalal Harouni

On Mon, Apr 20, 2015 at 5:43 AM, Michal Hocko <mhocko@suse.cz> wrote:
> On Fri 17-04-15 11:54:42, Andy Lutomirski wrote:
>> On Fri, Apr 17, 2015 at 2:19 AM, Michal Hocko <mhocko@suse.cz> wrote:
>> > On Thu 16-04-15 10:04:17, Andy Lutomirski wrote:
>> >> On Thu, Apr 16, 2015 at 8:01 AM, David Herrmann <dh.herrmann@gmail.com> wrote:
>> >> > Hi
>> >> >
>> >> > On Thu, Apr 16, 2015 at 4:34 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> >> >> Whose memcg does the pool use?
>> >> >
>> >> > The pool-owner's (i.e., the receiver's).
>> >> >
>> >> >> If it's the receiver's, and if the
>> >> >> receiver can configure a memcg, then it seems that even a single
>> >> >> receiver could probably cause the sender to block for an unlimited
>> >> >> amount of time.
>> >> >
>> >> > How? Which of those calls can block? I don't see how that can happen.
>> >>
>> >> I admit I don't fully understand memcg, but vfs_iter_write is
>> >> presumably going to need to get write access to the target pool page,
>> >> and that, in turn, will need that page to exist in memory and to be
>> >> writable, which may need to page it in and/or allocate a page.  If
>> >> that uses the receiver's memcg (as it should), then the receiver can
>> >> make it block.  Even if it doesn't use the receiver's memcg, it can
>> >> trigger direct reclaim, I think.
>> >
>> > Yes, memcg direct reclaim might trigger but we are no longer waiting for
>> > the OOM victim from non page fault paths so the time is bounded. It
>> > still might take quite some time, though, depending on the amount of work
>> > done in the direct reclaim.
>>
>> Is that still true if OOM notifiers are involved?  I've lost track of
>> what changed there.
>
> memcg OOM is not triggered from get_user_pages. See 519e52473ebe (mm:
> memcg: enable memcg OOM killer only for user faults)
>
>> In any event, I'm not entirely convinced that having a broadcast send
>> cause, say, PID 1 to block until an unbounded number of pages in a
>> potentially unbounded number of memcgs are reclaimed is a good idea.
>
> This deserves a clarification I guess. It is the memcg of the current
> task which gets charged during the page fault normally. So if PID1 tries
> to fault the memory in it will be its (most probably root) memcg which
> gets charged. If the memory was already charged to a different task's
> memcg and then it got swapped out, though, the PID1 would indeed wait
> for the reclaim in the target memcg to swap the page back in.
>
> In either case this sounds like a potential problem, because tasks
> could hide their memory charges from the limit or PID1 context could
> be blocked. But maybe I just misunderstood, and uncharged memory
> cannot be used for the buffer.
>

Hmm.  One of the explicit design goals of kdbus is for sandboxing,
i.e. creating a restricted view ("endpoint") and letting sandboxed
things talk to non-sandboxed things outside through that restricted
view.

Given that, the ability for a broadcast receiver to cause a sender
(PID 1?) to allocate root-memcg pages seems like it could be a
problem.

--Andy


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-20 18:01                             ` James Bottomley
@ 2015-04-21  8:09                               ` Daniel Mack
  2015-04-21 18:25                                 ` Andy Lutomirski
  0 siblings, 1 reply; 316+ messages in thread
From: Daniel Mack @ 2015-04-21  8:09 UTC (permalink / raw)
  To: James Bottomley, Havoc Pennington
  Cc: David Herrmann, Greg Kroah-Hartman, Jiri Kosina, Steven Rostedt,
	John Stoffel, Andy Lutomirski, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
	Tom Gundersen, linux-kernel, Djalal Harouni, Paul E. McKenney

Hi,

On 04/20/2015 08:01 PM, James Bottomley wrote:
> On Fri, 2015-04-17 at 16:27 -0400, Havoc Pennington wrote:

>> Do you have ideas on how to go about fixing it, whether in userspace
>> or kernel dbus?
> 
> Well, I've always suspected the solution would be for dbus to have a
> hierarchical namespace of its own, with the default policy being to pass
> messages to the parent namespace.  This would allow a container to determine
> which services were serviced outside and which inside the container (if
> you attach as a provider to the system bus in the container, that
> attachment supersedes the parent).
> 
> However, this doesn't solve the security problem: just because a
> container hasn't attached an interior provider doesn't mean it should be
> allowed complete access to all services provided from outside. This is
> the nasty problem because it involves some type of filter on busses
> which pass through containers.

Fair point, we've been thinking about that as well. What we implemented
for that is something we call 'custom endpoints', which is described in
kdbus.endpoint(7).

In short, an endpoint is an entry point to the bus. Each bus provides a
default endpoint node that enforces the bus-wide policy rules that
define which well-known names a peer may own, see, or talk to. Custom
endpoints can be added to carry additional policy rules for peers
connected through it, and redirecting a task or container to the custom
endpoint instead of the default one is as easy as bind-mounting the
node. systemd units have actually had support for that for a while, which
is how we tested this feature. This implementation doesn't even add much
code to kdbus, because we do have the policy code around anyway, so
that's just a matter of which policy database to look at during runtime.
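
As a rough illustration of the "which policy database to look at" idea,
here is a small Python model.  It is not the kdbus implementation: the
Endpoint class, the action names and the assumption that a custom
endpoint's rules are checked in addition to the bus-wide ones are all
simplifications made for this sketch; kdbus.endpoint(7) and
kdbus.policy(7) describe the real semantics.

```python
from typing import Dict, Optional, Set

# well-known name -> actions a peer may perform ("own", "see", "talk")
Policy = Dict[str, Set[str]]


def allowed(policy: Policy, name: str, action: str) -> bool:
    """True if the policy database grants `action` on `name`."""
    return action in policy.get(name, set())


class Endpoint:
    """Hypothetical model of an entry point to a bus."""

    def __init__(self, bus_policy: Policy,
                 extra_policy: Optional[Policy] = None) -> None:
        self.bus_policy = bus_policy
        # None models the default endpoint; a dict models a custom endpoint.
        self.extra_policy = extra_policy

    def check(self, name: str, action: str) -> bool:
        # Bus-wide rules are enforced for every peer.
        if not allowed(self.bus_policy, name, action):
            return False
        # The default endpoint adds nothing on top of the bus-wide rules.
        if self.extra_policy is None:
            return True
        # Peers connected through a custom endpoint must also pass its rules.
        return allowed(self.extra_policy, name, action)


bus_policy: Policy = {
    "org.example.Resolver": {"see", "talk"},
    "org.example.Power": {"see", "talk"},
}
default_ep = Endpoint(bus_policy)
sandbox_ep = Endpoint(bus_policy, {"org.example.Resolver": {"talk"}})

print(default_ep.check("org.example.Power", "talk"))
print(sandbox_ep.check("org.example.Power", "talk"))
```

In this model, redirecting a container to sandbox_ep instead of
default_ep is the only thing that changes; the checking code is shared.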

That said, it would actually even be easy to implement a way to allow
overriding names on custom endpoints too, so that services inside a
container can replace those that already exist on the bus. It's just that
so far, we haven't yet seen a use case for this.


Thanks,
Daniel




* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-16 12:02                             ` Tom Gundersen
  2015-04-16 12:15                               ` Olaf Hering
@ 2015-04-21 16:36                               ` Eric W. Biederman
  2015-04-21 19:38                                 ` Matthew Garrett
  1 sibling, 1 reply; 316+ messages in thread
From: Eric W. Biederman @ 2015-04-21 16:36 UTC (permalink / raw)
  To: Tom Gundersen
  Cc: Jiri Kosina, Greg Kroah-Hartman, Andy Lutomirski, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

Tom Gundersen <teg@jklm.no> writes:

> Moreover, the daemon performing the shutdown tasks is necessarily
> always privileged enough to do so, so calling into the kernel and see
> what happens is completely the wrong thing to do (it would simply
> succeed). What matters is if the client calling the daemon is
> sufficiently privileged. If the client has the capabilities necessary
> to call the reboot syscall directly, it makes no sense to disallow
> them from doing a clean reboot. It would be like giving someone access
> to pull the power plug, but not allow them to shutdown the machine
> cleanly.
>
> To conclude, the kernel makes the decision for allowing reboot() to
> succeed based on CAP_SYS_BOOT, so when we decide whether or not to
> perform the preparation steps, we really must also use CAP_SYS_BOOT.
> If we are more restrictive, it does not gain us anything as people
> with CAP_SYS_BOOT can just circumvent our logic and "pull the plug" by
> calling reboot() directly. If we are less restrictive and for instance
> check for uid==0 it would essentially mean that we have added a way to
> circumvent the dropping of CAP_SYS_BOOT.

*Blink*  Privilege escalation via CAP_SYS_BOOT *Blink*

*Puts on black hat*

HeHeHe.  You mean all I need to do to get around all of the logging servers is
capture CAP_SYS_BOOT?  Say like just capture this crazy watchdog program
that doesn't run as root so that it can only reboot the system? HeHeHe
So I can just trigger a clean reboot, wait for journald, auditd, and
syslog all to shut down and then do evil things to the machine without
having to worry about erasing forensic evidence?

Bahahaha! This looks like fun I should play with this.

*Takes black hat off* 

Seriously, it does not make sense to reuse these bits for a purpose for
which they were not designed.  A reboot preceded by a clean shutdown
is something different from a reboot that skips all of those steps.

I can understand the concerns about not wanting to allow circumventing
dropping CAP_SYS_BOOT but even with that concern in place I think it is
silly.  That isn't what CAP_SYS_BOOT means.

Over the long term userspace doing weird things like this will mean that
we will have to change the kernel to add
CAP_SYS_BOOT_THIS_TIME_I_MEAN_IT.  And have that control the reboot
system call and have the existing CAP_SYS_BOOT be some kind of token for
userspace.

Instead of going down that rat hole it would be much better for
userspace to figure out a token of their own.  Perhaps a file descriptor
certain privileged processes can pass, perhaps something else.
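
One way such a file-descriptor token could look, purely as a sketch: the
daemon creates an object only it can hand out (here the read end of a
fresh pipe), authorized clients hold a duplicate, and a request is
accepted only if the descriptor passed over the AF_UNIX socket refers to
that same object.  The function names and the "reboot" message are
hypothetical, the pipe is just a convenient stand-in for any kernel
object with a stable identity, and socket.send_fds()/recv_fds() need
Python 3.9 or later.

```python
import os
import socket


def new_token() -> int:
    """Create a token: the read end of a fresh pipe (write end closed)."""
    r, w = os.pipe()
    os.close(w)
    return r


def same_file(fd_a: int, fd_b: int) -> bool:
    """True if both descriptors refer to the same underlying inode."""
    a, b = os.fstat(fd_a), os.fstat(fd_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def present_token(sock: socket.socket, fd: int) -> None:
    """Client side: send a request with the token attached (SCM_RIGHTS)."""
    socket.send_fds(sock, [b"reboot"], [fd])


def request_is_authorized(sock: socket.socket, expected_fd: int) -> bool:
    """Daemon side: accept only if the passed fd matches our token."""
    _msg, fds, _flags, _addr = socket.recv_fds(sock, 64, 1)
    try:
        return len(fds) == 1 and same_file(fds[0], expected_fd)
    finally:
        # Descriptors received via SCM_RIGHTS are ours to close.
        for fd in fds:
            os.close(fd)


daemon_sock, client_sock = socket.socketpair()
token = new_token()
client_copy = os.dup(token)      # stands in for handing the token out

present_token(client_sock, client_copy)
print("authorized:", request_is_authorized(daemon_sock, token))
```

The point of the design is that the token is independent of any kernel
capability bit: holding CAP_SYS_BOOT does not produce it, and dropping
CAP_SYS_BOOT does not revoke it.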

The bottom line is that I tend to suck at figuring out how to exploit
systems and I saw an exploit possibility with the extended privileges
you granted to CAP_SYS_BOOT nearly instantly.

I can't imagine how kernel capabilities are the right tool for this kind
of job.

Eric



* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-15 17:59                 ` Austin S Hemmelgarn
  2015-04-15 18:04                   ` Rik van Riel
  2015-04-15 22:22                   ` One Thousand Gnomes
@ 2015-04-21 16:54                   ` Diego Viola
  2015-04-21 17:06                     ` Greg Kroah-Hartman
  2 siblings, 1 reply; 316+ messages in thread
From: Diego Viola @ 2015-04-21 16:54 UTC (permalink / raw)
  To: Austin S Hemmelgarn
  Cc: Greg Kroah-Hartman, Al Viro, Andy Lutomirski, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

I'd like to see D-Bus in the kernel (kdbus), if that's going to make
D-Bus faster.

See this application taking 15 seconds to start just because D-Bus is too slow.

https://bugs.kde.org/show_bug.cgi?id=342682

Hopefully kdbus solves problems such as this one.

Diego

On Wed, Apr 15, 2015 at 2:59 PM, Austin S Hemmelgarn
<ahferroin7@gmail.com> wrote:
> On 2015-04-14 15:43, Greg Kroah-Hartman wrote:
>>
>> On Tue, Apr 14, 2015 at 08:35:33PM +0100, Al Viro wrote:
>>>
>>> On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote:
>>>
>>>>> I agree.  You've sent a pull request for an unfortunate design.  I
>>>>> don't think that unfortunate design belongs in the kernel.  If it stays
>>>>> in userspace, then user programmers could potentially fix it some day.
>>>>
>>>>
>>>> You might not like the design, but it is a valid design.  Again, we
>>>> don't refuse to support hardware that is designed badly.  Or support
>>>> protocols we don't necessarily like, that's not the job of a kernel or
>>>> operating system.
>>>
>>>
>>> And no, "the sole consumer of that API knows better, so bend over" is not
>>> a good idea.  We have shitloads of examples when single-consumer APIs
>>> turned into screaming horrors; taking that in over the objections to API
>>> design, merely on "they do it that way, who the hell we are to say they
>>> are wrong?" is insane.
>>
>>
>> Again, in this domain, the design is sound.  So much so that everyone
>> who works in that area moved toward it (KDE, Qt, Go, etc.)  We might not
>> think it makes sense, and it did take me a while to wrap my head around
>> it, but to call it "crap" is unfair, sorry.
>>
>
> The reason that 'everyone who works in this area' adopted it is not so much
> that the design is sound (I'm not arguing whether it is or isn't in this
> case) as it is that none of them could come up with anything better.
>


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-21 16:54                   ` Diego Viola
@ 2015-04-21 17:06                     ` Greg Kroah-Hartman
  2015-04-21 17:25                       ` Diego Viola
  0 siblings, 1 reply; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-21 17:06 UTC (permalink / raw)
  To: Diego Viola
  Cc: Austin S Hemmelgarn, Al Viro, Andy Lutomirski, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Tue, Apr 21, 2015 at 01:54:54PM -0300, Diego Viola wrote:
> I'd like to see D-Bus in the kernel (kdbus), if that's going to make
> D-Bus faster.
> 
> See this application taking 15 seconds to start just because D-Bus is too slow.
> 
> https://bugs.kde.org/show_bug.cgi?id=342682
> 
> Hopefully kdbus solves problems such as this one.

That bug really doesn't look like it would be solved by kdbus, I don't
see a ton of messages being sent as the issue, do you?  It seems like
something is timing out and then continuing on with the application
startup.

But, you can try it out, grab the kernel patch, enable it in systemd,
and try it for yourself and let us know!

thanks,

greg k-h


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-21 17:06                     ` Greg Kroah-Hartman
@ 2015-04-21 17:25                       ` Diego Viola
  0 siblings, 0 replies; 316+ messages in thread
From: Diego Viola @ 2015-04-21 17:25 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Austin S Hemmelgarn, Al Viro, Andy Lutomirski, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

I'm not exactly sure what the problem is. It might not even be a
problem with D-Bus, and it's probably a timeout issue as you said.

I'll give kdbus a try anyway and report back.

Thanks,

Diego

On Tue, Apr 21, 2015 at 2:06 PM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Tue, Apr 21, 2015 at 01:54:54PM -0300, Diego Viola wrote:
>> I'd like to see D-Bus in the kernel (kdbus), if that's going to make
>> D-Bus faster.
>>
>> See this application taking 15 seconds to start just because D-Bus is too slow.
>>
>> https://bugs.kde.org/show_bug.cgi?id=342682
>>
>> Hopefully kdbus solves problems such as this one.
>
> That bug really doesn't look like it would be solved by kdbus, I don't
> see a ton of messages being sent as the issue, do you?  It seems like
> something is timing out and then continuing on with the application
> startup.
>
> But, you can try it out, grab the kernel patch, enable it in systemd,
> and try it for yourself and let us know!
>
> thanks,
>
> greg k-h


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-21  8:09                               ` Daniel Mack
@ 2015-04-21 18:25                                 ` Andy Lutomirski
  0 siblings, 0 replies; 316+ messages in thread
From: Andy Lutomirski @ 2015-04-21 18:25 UTC (permalink / raw)
  To: Daniel Mack
  Cc: James Bottomley, Havoc Pennington, David Herrmann,
	Greg Kroah-Hartman, Jiri Kosina, Steven Rostedt, John Stoffel,
	Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, linux-kernel, Djalal Harouni,
	Paul E. McKenney

On Tue, Apr 21, 2015 at 1:09 AM, Daniel Mack <daniel@zonque.org> wrote:
> Hi,
>
> On 04/20/2015 08:01 PM, James Bottomley wrote:
>> On Fri, 2015-04-17 at 16:27 -0400, Havoc Pennington wrote:
>
>>> Do you have ideas on how to go about fixing it, whether in userspace
>>> or kernel dbus?
>>
>> Well, I've always suspected the solution would be for dbus to have a
>> hierarchical namespace of its own, with the default policy being to pass
>> messages to the parent namespace.  This would allow a container to determine
>> which services were serviced outside and which inside the container (if
>> you attach as a provider to the system bus in the container, that
>> attachment supersedes the parent).
>>
>> However, this doesn't solve the security problem: just because a
>> container hasn't attached an interior provider doesn't mean it should be
>> allowed complete access to all services provided from outside. This is
>> the nasty problem because it involves some type of filter on busses
>> which pass through containers.
>
> Fair point, we've been thinking about that as well. What we implemented
> for that is something we call 'custom endpoints', which is described in
> kdbus.endpoint(7).
>
> In short, an endpoint is an entry point to the bus. Each bus provides a
> default endpoint node that enforces the bus-wide policy rules that
> define which well-known names a peer may own, see, or talk to. Custom
> endpoints can be added to carry additional policy rules for peers
> connected through it, and redirecting a task or container to the custom
> endpoint instead of the default one is as easy as bind-mounting the
> node. systemd units have actually had support for that for a while, which
> is how we tested this feature. This implementation doesn't even add much
> code to kdbus, because we do have the policy code around anyway, so
> that's just a matter of which policy database to look at during runtime.
>
> That said, it would actually even be easy to implement a way to allow
> overriding names on custom endpoints too, so that services inside a
> container can replace those that already exist on the bus. It's just that
> so far, we haven't yet seen a use case for this.

This is part of why I think that kdbus is the wrong design.  All of
this is great, but this is the kind of policy that IMO belongs in
userspace.  If nothing else, it means that you can add things like
this in the future without any kernel changes.

dbus-daemon can do all of this (in principle, anyway) already -- just
stick another dbus-daemon-like program in the container that proxies
things as appropriate.  I think that a good kernel-accelerated design
could do the same thing without having to put any of this type of
policy in the kernel.

(As an example, capability-based IPC gets all of this for free.)

--Andy


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-21 16:36                               ` Eric W. Biederman
@ 2015-04-21 19:38                                 ` Matthew Garrett
  2015-04-21 19:55                                   ` Austin S Hemmelgarn
  0 siblings, 1 reply; 316+ messages in thread
From: Matthew Garrett @ 2015-04-21 19:38 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Tom Gundersen, Jiri Kosina, Greg Kroah-Hartman, Andy Lutomirski,
	Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Tue, Apr 21, 2015 at 11:36:54AM -0500, Eric W. Biederman wrote:
> 
> HeHeHe.  You mean all I need to do to get around all of the logging servers is
> capture CAP_SYS_BOOT?  Say like just capture this crazy watchdog program
> that doesn't run as root so that it can only reboot the system? HeHeHe
> So I can just trigger a clean reboot, wait for journald, auditd, and
> syslog all to shut down and then do evil things to the machine without
> having to worry about erasing forensic evidence?

CAP_SYS_BOOT gives you kexec, and kexec with init=/bin/sh lets you do 
anything. You added that in dc009d92435f99498cbc579ce76bf28e837e2c14 and 
now the horse is long gone. Don't give CAP_SYS_BOOT to anything you 
don't trust with full privileges.

-- 
Matthew Garrett | mjg59@srcf.ucam.org


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-21 19:38                                 ` Matthew Garrett
@ 2015-04-21 19:55                                   ` Austin S Hemmelgarn
  0 siblings, 0 replies; 316+ messages in thread
From: Austin S Hemmelgarn @ 2015-04-21 19:55 UTC (permalink / raw)
  To: Matthew Garrett, Eric W. Biederman
  Cc: Tom Gundersen, Jiri Kosina, Greg Kroah-Hartman, Andy Lutomirski,
	Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni


On 2015-04-21 15:38, Matthew Garrett wrote:
> On Tue, Apr 21, 2015 at 11:36:54AM -0500, Eric W. Biederman wrote:
>>
>> HeHeHe.  You mean all I need to do to get around all of the logging servers is
>> capture CAP_SYS_BOOT?  Say like just capture this crazy watchdog program
>> that doesn't run as root so that it can only reboot the system? HeHeHe
>> So I can just trigger a clean reboot, wait for journald, auditd, and
>> syslog all to shut down and then do evil things to the machine without
>> having to worry about erasing forensic evidence?
>
> CAP_SYS_BOOT gives you kexec, and kexec with init=/bin/sh lets you do
> anything. You added that in dc009d92435f99498cbc579ce76bf28e837e2c14 and
> now the horse is long gone. Don't give CAP_SYS_BOOT to anything you
> don't trust with full privileges.
>
The point is that Eric's suggestion works even on kernels without
kexec(), which matters because a significant number of
security-minded people (myself included) explicitly disable kexec in
their kernel configuration.




* Issues with capability bits and meta-data in kdbus
  2015-04-14 17:55     ` Greg Kroah-Hartman
@ 2015-04-21 21:06       ` Eric W. Biederman
  2015-04-22  1:30         ` Linus Torvalds
  0 siblings, 1 reply; 316+ messages in thread
From: Eric W. Biederman @ 2015-04-21 21:06 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, gnomes, teg,
	jkosina, luto, linux-kernel, daniel, dh.herrmann, tixxdz

Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes:

> On Mon, Apr 13, 2015 at 07:19:49PM -0500, Eric W. Biederman wrote:
>> ebiederm@xmission.com (Eric W. Biederman) writes:
>> 
>> > And the code that transfers the meta-data is wrong.
>> 
>> In fact it is worse than I thought.
>
> Please see the email response I just wrote to Andy about this, it should
> address these misconceptions.

Nothing has changed my analysis of the kdbus code.

Hopefully now that I have made some semblance of getting some rest and
have a little bit of time I can explain the issues that I see.  I will
be focusing on user namespaces and capability bits as that is my area of
the kernel.

- The userspace interface for capability bits has a version number that
  determines the quantity and the meaning of the bits; kdbus does not
  pass that number to userspace.

- There is a well-defined translation of capability bits between user
  namespaces; kdbus does not perform that translation.

- Access to the capability bits is guarded with PTRACE_MAY_READ;
  kdbus does not honor that and thus leaks information.

- Usage of the capability bits by userspace is a layering violation.

- The layering violation results in a privilege escalation (by
  definition) whenever userspace uses the presence of the capability bits
  to allow anything.

- Another kind of privilege escalation happens when userspace makes a
  decision based on the absence of in-kernel capability bits.

The only safe way for userspace to use the kernel capability bits is to
decide which small handful of processes need the bits.  Let those
processes be what implements userspace policy and drop the capabilities
from everything else.

With bugs at every layer of the implementation stack from implementation
to design I concluded earlier that the code that transfers these
capabilities is not at all mature or ready to be merged.  Which is why I
asked for the code to be left for later until someone would pay
attention to it properly.


I will add that when playing with the unix security mechanism the design
has to be done very carefully, as the system is fragile and certain
kinds of changes modify existing tested designs into insecure code.  I
fail to see any of that kind of needed care being applied to the design
of the kdbus mechanisms.


My conclusion is that the design of the meta-data passing code is
fundamentally broken.  The issues I have observed with the kernel
capability bits apply to all of the other meta-data except that
meta-data that unix domain sockets already pass.  The privilege
escalation issues are fundamentally unfixable, and apply in general.

As this is fundamental to kdbus, kdbus is apparently fundamentally
broken by design.

Eric


* Re: Issues with capability bits and meta-data in kdbus
  2015-04-21 21:06       ` Issues with capability bits and meta-data in kdbus Eric W. Biederman
@ 2015-04-22  1:30         ` Linus Torvalds
  2015-04-22  1:54           ` Andy Lutomirski
                             ` (2 more replies)
  0 siblings, 3 replies; 316+ messages in thread
From: Linus Torvalds @ 2015-04-22  1:30 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski,
	Linux Kernel Mailing List, Daniel Mack, David Herrmann,
	Djalal Harouni

On Tue, Apr 21, 2015 at 2:06 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> - The userspace interface for capability bits has a version number that
>   determines the quantity and the meaning of the bits; kdbus does not
>   pass that number to userspace.

Well, realistically, we'll just have to freeze the version anyway.
There's no sane way not to - if anybody believes that you can change
the version without breaking all kdbus users, they are living in some
drug-induced happy-land.

So we could make the version number available to match other
interfaces, but in practice that will just make people think that the
capabilities are versioned, which in reality they are not.

> - There is a well-defined translation of capability bits between user
>   namespaces; kdbus does not perform that translation.

Now this looks like a big oversight, and serious.

If you have a capability inside a lower namespace, and can fool the
other end of a kdbus connection into thinking that you have that
capability globally, then that sounds like a very obvious and bad
security issue. This needs fixing. That said, it's likely a fairly
simple fix.

> - Access to the capability bits is guarded with PTRACE_MAY_READ
>   kdbus does not honor that and thus leaks information.

Now, this is likely not a real problem.

Yes, when you try to read other processes' capabilities, you need
PTRACE_MAY_READ to see them. HOWEVER, that's not really what a kdbus
message would do - it doesn't "read somebody else's capabilities". When
you do a kdbus write, you export your *own* capabilities. If you don't
want others to know what privileges you have, then you shouldn't be
using kdbus.

It's like saying "you need PTRACE_MAY_READ" to be able to read the
process image of another process. True. But if that other process does
a "write()" system call, then the written data will contain the data
from that process. That's how write works.

> - Usage of the capability bits by userspace is a layering violation.
>
> - The layering violation results in a privilege escalation (by
>   definition) whenever userspace uses the presence of the capability bits
>   to allow anything.

Well, but that's a user space decision thing. If some system daemon
uses the capabilities of the other end to decide to accept some
message or not, that's a policy decision by that (privileged) system
daemon. The daemon itself obviously still needs to have the proper
capabilities in order to do any action that needs such capabilities.

So it's not a privilege escalation. It's intentional.

> - Another kind of privilege escalation happens when userspace makes a
>   decision based on the absence of in kernel capability bits.

I don't see that this is any different from the above.

So I agree that kdbus clearly *must* translate the capabilities when
passing messages from one namespace to another. The fact that you may
have setuid rights in one container, does *not* necessarily mean that
the recipient in another namespace should think you have that
capability - because in the receiving namespace the originator may not
have that capability at all.

So I think that one is a real and serious bug. But the other
complaints seem to be off the mark. It seems quite reasonable to me to
say that a recipient should be able to distinguish between *root*
sending it a dbus message to take down the system, and some random
luser doing the same.

               Linus

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: Issues with capability bits and meta-data in kdbus
  2015-04-22  1:30         ` Linus Torvalds
@ 2015-04-22  1:54           ` Andy Lutomirski
  2015-04-22  2:32             ` Linus Torvalds
  2015-04-22 10:45           ` One Thousand Gnomes
  2015-04-22 11:41           ` David Herrmann
  2 siblings, 1 reply; 316+ messages in thread
From: Andy Lutomirski @ 2015-04-22  1:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric W. Biederman, Greg Kroah-Hartman, Andrew Morton,
	Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina,
	Linux Kernel Mailing List, Daniel Mack, David Herrmann,
	Djalal Harouni

On Tue, Apr 21, 2015 at 6:30 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Apr 21, 2015 at 2:06 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> - Access to the capability bits is guarded with PTRACE_MAY_READ
>>   kdbus does not honor that and thus leaks information.
>
> Now, this is likely not a real problem.
>
> Yes, when you try to read other processes' capabilities, you need
> PTRACE_MAY_READ to see them. HOWEVER, that's not really what a kdbus
> message would do - it doesn't "read somebody else's capabilities". When
> you do a kdbus write, you export your *own* capabilities. If you don't
> want others to know what privileges you have, then you shouldn't be
> using kdbus.
>
> It's like saying "you need PTRACE_MAY_READ" to be able to read the
> process image of another process. True. But if that other process does
> a "write()" system call, then the written data will contain the data
> from that process. That's how write works.

There's an interesting philosophical question here.

If kdbus were a general purpose IPC tool and if the libraries would
expose nice knobs like "set this flag if and only if you want to
assert CAP_WHATEVER to the server", then maybe this would be okay.

But I don't believe that for a second.  AFAICS sd-bus (maybe the
primary implementation) will always set that flag if for no other
reason than that it *doesn't know* when the client is trying to assert
a capability.  So we'd be giving users a gun which is, in practice,
only ever pointed at the users' feet.

It's a little worse than that, since this gun also shoots
cap_permitted, cap_inheritable, and bset bullets, none of which seem
usable for anything other than foot-shooting even in the best case.
And don't even get me started about some of the other metadata items.

All of this completely ignores the fact that selinux can and does
restrict access to /proc and kdbus will never respect those
restrictions.  Also, I'd like to see us move closer to a world where
real distros set hidepid=1, and this is a big step in the opposite
direction.

--Andy

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: Issues with capability bits and meta-data in kdbus
  2015-04-22  1:54           ` Andy Lutomirski
@ 2015-04-22  2:32             ` Linus Torvalds
  2015-04-22  3:19               ` Andy Lutomirski
  2015-04-22 11:40               ` Austin S Hemmelgarn
  0 siblings, 2 replies; 316+ messages in thread
From: Linus Torvalds @ 2015-04-22  2:32 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Eric W. Biederman, Greg Kroah-Hartman, Andrew Morton,
	Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina,
	Linux Kernel Mailing List, Daniel Mack, David Herrmann,
	Djalal Harouni

On Tue, Apr 21, 2015 at 6:54 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> If kdbus were a general purpose IPC tool

 .. but it's not ..

>                                     and if the libraries would
> expose nice knobs like "set this flag if and only if you want to
> assert CAP_WHATEVER to the server", then maybe this would be okay.

I really don't agree.

The whole notion that you should be able to be anonymous when
communicating is *wrong*.

Now, when you talk across machines using TCP, there's no identity that
you can trust, so anonymity is kind of enforced. But locally, if you
want to connect with somebody else, I actually think that trying to be
anonymous is just stupid - and more than that - plain wrong.

Yeah, if you do a pipe, that's one thing. You don't "connect" to
somebody else with a pipe, you just create both end points. So there
is little point in having identifying information for pipes. But when
you connect to a service, it just *makes sense* for the other end to
know about you. They should know your user ID, they should know your
identity (pid or whatever), and they should know your capabilities.

If you don't want that, then you use some anonymizing service, and it
becomes *your* problem. But a server that gets connected by different
people should know who it gets connected by.

Unix domain sockets simply got this wrong.

I don't think this is the problem of kdbus. There may be *other*
problems, but I think it's very reasonable to just have as a basic
*requirement* that when you get connected, you get to know who
connects you, and what their rights are. Because it really *is*
something fundamental, and something important to know. Is it some
random nobody, or is it a system service?

If I was a service writer, that would be *the* most basic requirement
I would have. No ifs, buts or maybes about it.

           Linus

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: Issues with capability bits and meta-data in kdbus
  2015-04-22  2:32             ` Linus Torvalds
@ 2015-04-22  3:19               ` Andy Lutomirski
  2015-04-22 13:46                 ` David Herrmann
  2015-04-22 11:40               ` Austin S Hemmelgarn
  1 sibling, 1 reply; 316+ messages in thread
From: Andy Lutomirski @ 2015-04-22  3:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric W. Biederman, Greg Kroah-Hartman, Andrew Morton,
	Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina,
	Linux Kernel Mailing List, Daniel Mack, David Herrmann,
	Djalal Harouni

On Tue, Apr 21, 2015 at 7:32 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Apr 21, 2015 at 6:54 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>> If kdbus were a general purpose IPC tool
>
>  .. but it's not ..
>
>>                                     and if the libraries would
>> expose nice knobs like "set this flag if and only if you want to
>> assert CAP_WHATEVER to the server", then maybe this would be okay.
>
> I really don't agree.
>
> The whole notion that you should be able to be anonymous when
> communicating is *wrong*.
>
> Now, when you talk across machines using TCP, there's no identity that
> you can trust, so anonymity is kind of enforced. But locally, if you
> want to connect with somebody else, I actually think that trying to be
> anonymous is just stupid - and more than that - plain wrong.
>
> Yeah, if you do a pipe, that's one thing. You don't "connect" to
> somebody else with a pipe, you just create both end points. So there
> is little point in having identifying information for pipes. But when
> you connect to a service, it just *makes sense* for the other end to
> know about you. They should know your user ID, they should know your
> identity (pid or whatever), and they should know your capabilities.
>
> If you don't want that, then you use some anonymizing service, and it
> becomes *your* problem. But a server that gets connected by different
> people should know who it gets connected by.
>
> Unix domain sockets simply got this wrong.
>
> I don't think this is the problem of kdbus. There may be *other*
> problems, but I think it's very reasonable to just have as a basic
> *requirement* that when you get connected, you get to know who
> connects you, and what their rights are. Because it really *is*
> something fundamental, and something important to know. Is it some
> random nobody, or is it a system service?

Where do you draw the line?

In the dbus dream world, if I type "wget
http://www.example.com/foobar", then my DNS resolver knows the whole
URL I'm downloading (cmdline), that I'm wget (or that I'm pretending
to be wget), that I inherited these capabilities from my caller (using
the totally broken capability inheritance model, but whatever), that I
started at time such-and-such (breaking CRIU), that my gid is
such-and-such, etc.  That DNS resolver ideally should sandbox itself,
but it now *necessarily* has this information leak in to its sandbox.

It gets worse.  My little tray notifier widget that shows the progress
bar also learns all this information.  Heck, my tray notifier probably
also finds out what capabilities systemd is holding every time I plug
in a USB stick, because of broadcasts.

It also conflates information-for-the-hell-of-it, auditing, and
authorization.  When I write a program that tries to drop privileges,
I need to know *what those privileges are*.  When every damn attribute
of my process might be seen as a privilege by whatever daft daemon,
system or otherwise, I'm talking to, it's *really hard* to figure out
what's exposed if I get compromised.

>
> If I was a service writer, that would be *the* most basic requirement
> I would have. No ifs, buts or maybes about it.

Sorry, but I don't believe you.  Do you really need to know all this
random information about everyone who connects?  Why only for local
users?  Why is it different over a network?  Which of those pieces of
information are you going to use for authentication?  Which are
potential information leaks for a reason that didn't even exist when
you wrote your service?

If I were a service writer, I would define what privileges are needed
to do what, and I would require those and no more.  I don't want to
know more, because everything else I learn for no good reason is a
potential security problem.

As a concrete example, remember all those lovely euid and caps bugs we
found in write(2) a couple years ago?  The only reason those bugs were
possible is because write(2) is implemented in the kernel, and the
kernel can do whatever the hell it wants including incorrectly looking
at caps in write(2) and those bugs were IMO especially embarrassing
because all of the code needed for the stupid interaction that caused
those bugs is in the kernel tree.  Everyone "knows" that write(2)
*must not honor effective caps*, but a lot of people forgot.

I'd much rather have dbus not tell the peer random stupid things about
me that the peer can fuck up than have dbus tell the peer all this
stuff for the hell of it, because the peer *will* fuck it up, and that
fuckup might only be discoverable when you look at the interaction of
several different programs supplied by several different vendors and
written years apart.

--Andy

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-13 19:29 ` Eric W. Biederman
  2015-04-13 19:42   ` Greg Kroah-Hartman
  2015-04-14  0:19   ` Eric W. Biederman
@ 2015-04-22  8:58   ` Borislav Petkov
  2015-04-23 19:14     ` Greg Kroah-Hartman
  2 siblings, 1 reply; 316+ messages in thread
From: Borislav Petkov @ 2015-04-22  8:58 UTC (permalink / raw)
  To: Eric W. Biederman, Greg Kroah-Hartman
  Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, gnomes, teg,
	jkosina, luto, linux-kernel, daniel, dh.herrmann, tixxdz

On Mon, Apr 13, 2015 at 02:29:35PM -0500, Eric W. Biederman wrote:
> And the code that transfers the meta-data is wrong.
> 
> It is generally not something that userspace requires today, certainly
> userspace is not using it.
> 
> You are exporting a weird set of information in a unique way that makes
> it race free enough to make ``security'' decisions upon but the data
> in general is not appropriate to make those decisions.
> 
> I remain opposed to this half thought out trash of an ABI for the
> meta-data.
> 
> Just because something happens to be exported in a DEBUG api today does
> not make it appropriate for userspace to run around making security
> decisions with that information.
> 
> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> I think it is premature to be merging kdbus.  You have fundamental
> issues that can not be fixed once the ABI is frozen.
> 
> The semantics of the meta-data you export are extremely poorly defined.

Not only that - it looks like a serious amount of work on each sent
packet. So I did some staring, correct me if I missed something:

kdbus_cmd_send	- KDBUS_CMD_SEND, ioctl cmd, copy stuff from userspace
|-> kdbus_kmsg_new_from_cmd(), kmalloc+memset + prepare a *lot* of stuff like:
    |-> m->proc_meta = kdbus_meta_proc_new();
	m->conn_meta = kdbus_meta_conn_new();
	...
    |-> kdbus_bus_broadcast(conn->ep->bus, conn, kmsg); let's look at the broadcast mode
        |-> hash_for_each(bus->conn_hash, i, conn_dst, hentry) { 	iterate over hash buckets, O(256)
	    |-> kdbus_meta_proc_collect(kmsg->proc_meta, attach_flags);	collect a *lot* of stuff from current etc
	    |-> kdbus_meta_conn_collect(kmsg->conn_meta, kmsg, conn_src, attach_flags); collect more stuff

and this happens on *every* send. A *lot* of work.

Now multiply that by the amount of messages this thing is going to send
per second. It piles up. So you have the overhead right then and there
in the design without even being able to fix it. Or at least pretty damn
hard to fix.

So unless I'm missing something, this right there is a design problem.

Why can't this messaging be done with a nifty O(1) scheme like sending
parties issuing auth tokens and whatever and the kernel doing the
arbitration and distribution of those tokens?

That gets you sandboxing, dropping privileges and whatever else fancy
containers people wanna do for free. Token recipient has the token -
that's all that counts.

Again, this is from a short staring only, I might just as well be
missing something but you'll tell me :-)

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: Issues with capability bits and meta-data in kdbus
  2015-04-22  1:30         ` Linus Torvalds
  2015-04-22  1:54           ` Andy Lutomirski
@ 2015-04-22 10:45           ` One Thousand Gnomes
  2015-04-22 11:41           ` David Herrmann
  2 siblings, 0 replies; 316+ messages in thread
From: One Thousand Gnomes @ 2015-04-22 10:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric W. Biederman, Greg Kroah-Hartman, Andrew Morton,
	Arnd Bergmann, Tom Gundersen, Jiri Kosina, Andy Lutomirski,
	Linux Kernel Mailing List, Daniel Mack, David Herrmann,
	Djalal Harouni

> > - Access to the capability bits is guarded with PTRACE_MAY_READ
> >   kdbus does not honor that and thus leaks information.
> 
> Now, this is likely not a real problem.
> 
> Yes, when you try to read other processes' capabilities, you need
> PTRACE_MAY_READ to see them. HOWEVER, that's not really what a kdbus
> message would do - it doesn't "read somebody else's capabilities". When
> you do a kdbus write, you export your *own* capabilities. If you don't
> want others to know what privileges you have, then you shouldn't be
> using kdbus.

That's broken but fixable.

It should not share any capability information *unless* you pass a flag
which says "flash my security badges around".

That fails safe (descriptor passed to another process), and gives a
default behaviour which is non surprising, non leaky and useful for
general purposes. This is also mirroring AF_LOCAL/AF_UNIX where you have
to choose to wave your bits in public.
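
For reference, the AF_UNIX behaviour can be sketched roughly as below. This
is only an illustration, assuming Linux and Python's socket module: the
per-message credentials (pid, uid, gid, and no capability bits) show up
only once the receiving socket has enabled SO_PASSCRED. The helper name
recv_with_creds is made up for this example.

```python
import os
import socket
import struct

# struct ucred as delivered in SCM_CREDENTIALS: pid_t pid, uid_t uid, gid_t gid
UCRED = struct.Struct("iII")


def recv_with_creds(sock, bufsize=64):
    """Receive one datagram; return (data, (pid, uid, gid)) or (data, None)."""
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_SPACE(UCRED.size)
    )
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_CREDENTIALS:
            return data, UCRED.unpack(cdata[:UCRED.size])
    return data, None


sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

# Default behaviour: no credentials travel with the message.
sender.send(b"anonymous")
plain_data, plain_creds = recv_with_creds(receiver)

# Opt in on the receiving side, then the kernel attaches pid/uid/gid.
receiver.setsockopt(socket.SOL_SOCKET, socket.SO_PASSCRED, 1)
sender.send(b"identified")
data, creds = recv_with_creds(receiver)

sender.close()
receiver.close()
```

Note that only pid, uid and gid are passed this way; none of the
capability sets or other kdbus-style metadata items are involved.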

(again it's showing that kdbus really should be done by adding multicast
reliable delivery to AF_LOCAL sockets)

> So I think that one is a real and serious bug. But the other
> complaints seem to be off the mark. It seems quite reasonable to me to
> say that a recipient should be able to distinguish between *root*
> sending it a dbus message to take down the system, and some random
> luser doing the same.

Agreed but there are better ways to do this including opening some
kind of capability object and passing it as proof.

Also do I need to be root when I send the message or root when you ask ...


Alan

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: Issues with capability bits and meta-data in kdbus
  2015-04-22  2:32             ` Linus Torvalds
  2015-04-22  3:19               ` Andy Lutomirski
@ 2015-04-22 11:40               ` Austin S Hemmelgarn
  2015-04-22 13:07                 ` Greg Kroah-Hartman
  2015-04-22 13:27                 ` Havoc Pennington
  1 sibling, 2 replies; 316+ messages in thread
From: Austin S Hemmelgarn @ 2015-04-22 11:40 UTC (permalink / raw)
  To: Linus Torvalds, Andy Lutomirski
  Cc: Eric W. Biederman, Greg Kroah-Hartman, Andrew Morton,
	Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina,
	Linux Kernel Mailing List, Daniel Mack, David Herrmann,
	Djalal Harouni

On 2015-04-21 22:32, Linus Torvalds wrote:
> On Tue, Apr 21, 2015 at 6:54 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>> If kdbus were a general purpose IPC tool
>
>   .. but it's not ..
>

Except, IIRC, that was one of the stated design goals in the original 
patch set.  I'm pretty sure that I remember a rather verbose exposition 
that pretty much could be summarized as "Linux has no general purpose 
IPC in the kernel, this fixes that"




^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: Issues with capability bits and meta-data in kdbus
  2015-04-22  1:30         ` Linus Torvalds
  2015-04-22  1:54           ` Andy Lutomirski
  2015-04-22 10:45           ` One Thousand Gnomes
@ 2015-04-22 11:41           ` David Herrmann
  2 siblings, 0 replies; 316+ messages in thread
From: David Herrmann @ 2015-04-22 11:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric W. Biederman, Greg Kroah-Hartman, Andrew Morton,
	Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina,
	Andy Lutomirski, Linux Kernel Mailing List, Daniel Mack,
	Djalal Harouni

Hi

On Wed, Apr 22, 2015 at 3:30 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Apr 21, 2015 at 2:06 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> - There is a well defined translation of capability bits between user
>>   namespaces kdbus does not perform that translation.
>
> Now this looks like a big oversight, and serious.
>
> If you have a capability inside a lower namespace, and can fool the
> other end of a kdbus connection into thinking that you have that
> capability globally, then that sounds like a very obvious and bad
> security issue. This needs fixing. That said, it's likely a fairly
> simple fix.

kdbus drops capability items if we cross user-namespaces. This
security issue was fixed in v2.

I think the translation Eric was referring to, is what cap_capable()
(security/commoncap.c) does. That is, it translates capabilities of
parent namespaces into its child namespaces (making the parent
privileged in the child namespace). Right now, we don't do this.
Instead, we drop the item, so the receiver knows that the information
could not be gathered (unlike a zeroed capability item, which means
the sender does not have the capabilities).
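
As a rough illustration of that walk, here is a toy model in Python. It is
not kernel code: the UserNS and Cred classes and the capable_in() helper
are made up for this sketch, and the logic is a simplified reading of the
namespace loop in cap_capable().

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class UserNS:
    """Toy user namespace: who created it and where it hangs in the tree."""
    owner_uid: int
    parent: Optional["UserNS"] = None


@dataclass(eq=False)
class Cred:
    """Toy credentials: effective uid, home namespace, effective caps."""
    euid: int
    user_ns: UserNS
    cap_effective: frozenset


def capable_in(cred: Cred, target_ns: UserNS, cap: str) -> bool:
    """Does cred hold cap when judged from target_ns?"""
    ns = target_ns
    while True:
        # Same namespace: the effective set decides.
        if ns is cred.user_ns:
            return cap in cred.cap_effective
        # Reached the root without meeting cred's namespace: no privilege.
        if ns.parent is None:
            return False
        # The creator of a namespace has all caps in it, seen from its parent.
        if ns.parent is cred.user_ns and ns.owner_uid == cred.euid:
            return True
        ns = ns.parent


init_ns = UserNS(owner_uid=0)
child_ns = UserNS(owner_uid=1000, parent=init_ns)

host_root = Cred(0, init_ns, frozenset({"CAP_SYS_ADMIN"}))
ns_creator = Cred(1000, init_ns, frozenset())
container_root = Cred(1000, child_ns, frozenset({"CAP_SYS_ADMIN"}))

down = capable_in(host_root, child_ns, "CAP_SYS_ADMIN")     # parent -> child
owner = capable_in(ns_creator, child_ns, "CAP_SYS_ADMIN")   # creator of child
up = capable_in(container_root, init_ns, "CAP_SYS_ADMIN")   # child -> parent
```

In this model privilege flows downwards only: down and owner come out
true, while up is false, which is the case where passing the raw bits
across the namespace boundary would mislead the receiver.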

I have an experimental patch to support translation [1]. However, I
dislike copying the code from security/ into kdbus. So if we introduce
translation later on, I'd like to figure out with LSM developers how
to do this best.

Also note that /proc does not do any translation on its own, which
makes cap_get_pid() tricky to use.

>> - Access to the capability bits is guarded with PTRACE_MAY_READ
>>   kdbus does not honor that and thus leaks information.
>
> Now, this is likely not a real problem.
>
> Yes, when you try to read other processes' capabilities, you need
> PTRACE_MAY_READ to see them. HOWEVER, that's not really what a kdbus
> message would do - it doesn't "read somebody else's capabilities". When
> you do a kdbus write, you export your *own* capabilities. If you don't
> want others to know what privileges you have, then you shouldn't be
> using kdbus.

I fully agree.

And to clear things up: there is no such PTRACE_MODE_READ protection
for capability bits in /proc.
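
This can be checked from userspace. A minimal sketch, assuming a Linux
/proc mounted without hidepid or LSM restrictions (the read_caps() helper
is only illustrative): the capability sets are plain hex fields in
/proc/<pid>/status.

```python
import os

CAP_FIELDS = ("CapInh", "CapPrm", "CapEff", "CapBnd")


def read_caps(pid="self"):
    """Return the capability bitmasks listed in /proc/<pid>/status."""
    caps = {}
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            key, _, value = line.partition(":")
            if key in CAP_FIELDS:
                # Each field is a hexadecimal bitmask, one bit per capability.
                caps[key] = int(value.strip(), 16)
    return caps


own_caps = read_caps(os.getpid())
for name, mask in own_caps.items():
    print(f"{name}: {mask:#018x}")
```

Passing another pid instead of os.getpid() reads that process's status
file in the same way, subject to whatever restrictions the local /proc
mount applies.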

Thanks!
David

[1] http://cgit.freedesktop.org/~dvdhrm/linux/commit/?h=kdbus&id=894085ff39afc653ed711102fa698d937818ce1f

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: Issues with capability bits and meta-data in kdbus
  2015-04-22 11:40               ` Austin S Hemmelgarn
@ 2015-04-22 13:07                 ` Greg Kroah-Hartman
  2015-04-22 14:05                   ` Austin S Hemmelgarn
  2015-04-22 13:27                 ` Havoc Pennington
  1 sibling, 1 reply; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-22 13:07 UTC (permalink / raw)
  To: Austin S Hemmelgarn
  Cc: Linus Torvalds, Andy Lutomirski, Eric W. Biederman,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, Linux Kernel Mailing List, Daniel Mack,
	David Herrmann, Djalal Harouni

On Wed, Apr 22, 2015 at 07:40:25AM -0400, Austin S Hemmelgarn wrote:
> On 2015-04-21 22:32, Linus Torvalds wrote:
> >On Tue, Apr 21, 2015 at 6:54 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> >>
> >>If kdbus were a general purpose IPC tool
> >
> >  .. but it's not ..
> >
> 
> Except, IIRC, that was one of the stated design goals in the original patch
> set.  I'm pretty sure that I remember a rather verbose exposition that
> pretty much could be summarized as "Linux has no general purpose IPC in the
> kernel, this fixes that"

Did I say that somewhere?  Here's what the patchset has always started
with every time I have posted it for review, starting back last year in
October:

	kdbus is a kernel-level IPC implementation that aims for
	resemblance to the protocol layer with the existing
	userspace D-Bus daemon while enabling some features that
	couldn't be implemented before in userspace.

2+ years ago, I had the dream that maybe we could make kdbus into the
"general purpose IPC layer for the kernel", but in working through all
of the issues, and the requirements of the userspace users and
protocols, it just really didn't work out that way, sorry.

I know some people would like such a "general purpose IPC", but perhaps
because no one has ever done it, maybe it either can't be done, or that
no one really wants such a thing. :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: Issues with capability bits and meta-data in kdbus
  2015-04-22 11:40               ` Austin S Hemmelgarn
  2015-04-22 13:07                 ` Greg Kroah-Hartman
@ 2015-04-22 13:27                 ` Havoc Pennington
  2015-04-22 14:35                   ` Michele Curti
  1 sibling, 1 reply; 316+ messages in thread
From: Havoc Pennington @ 2015-04-22 13:27 UTC (permalink / raw)
  To: Austin S Hemmelgarn
  Cc: Linus Torvalds, Andy Lutomirski, Eric W. Biederman,
	Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina,
	Linux Kernel Mailing List, Daniel Mack, David Herrmann,
	Djalal Harouni

On Wed, Apr 22, 2015 at 7:40 AM, Austin S Hemmelgarn
<ahferroin7@gmail.com> wrote:
> Except, IIRC, that was one of the stated design goals in the original patch
> set.  I'm pretty sure that I remember a rather verbose exposition that
> pretty much could be summarized as "Linux has no general purpose IPC in the
> kernel, this fixes that"
>

This is probably just debating definitions and technicalities, but
what I'd say is that dbus is pretty universally applicable *within*
the case of connecting apps and services on the local machine. That's
what it's for. Right now it probably works for I don't know, 85-90% of
that, with kdbus trying to take it closer to 100% by removing
performance concerns and early boot concerns that currently rule out
certain uses.

If we say it isn't "general purpose" we could mean more than one thing -

 - it's a complete system / batteries-included, with a defined
protocol, vs. a "make your own protocol" kit
 - it isn't especially appropriate as a cross-machine protocol,
whether you mean within a cluster or across the internet
 - it isn't portable in a very useful way (it kind of runs on
windows/mac but isn't the native way of doing things there)

On the other hand, it is "general purpose" in the sense that so many
apps and services are using it for so many purposes already (i.e. it
isn't tied to a particular kind of app or service).

I just opened d-feet on my workstation which is default-ish Fedora 21,
I have ~35 well-known names available on the system bus, and ~100 (got
tired of counting) on the session bus. These are all kinds of
different apps and services.

d-feet incidentally is a good way to explore current usage of dbus -
you can list names, list objects within names, list methods/properties
on objects, and even call methods from d-feet. (There are command line
alternatives too like `gdbus`.)

Havoc

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: Issues with capability bits and meta-data in kdbus
  2015-04-22  3:19               ` Andy Lutomirski
@ 2015-04-22 13:46                 ` David Herrmann
  0 siblings, 0 replies; 316+ messages in thread
From: David Herrmann @ 2015-04-22 13:46 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Eric W. Biederman, Greg Kroah-Hartman,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, Linux Kernel Mailing List, Daniel Mack,
	Djalal Harouni

Hi

On Wed, Apr 22, 2015 at 5:19 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> Where do you draw the line?

User-space draws _this_ line.

A bus creator can set the "mandatory metadata mask" of a bus. It
defines a mask all senders (!) have to use as base. The bus creator
can thus mandate a policy for its bus and force everyone who wants to
communicate via this bus to at least agree to transmit the requested
set of information. Using UIDs+GIDs+PIDs+seclabel+names as masks works
just fine.

To be clear, kdbus only transmits metadata that sender and receiver
both agreed on. Both peers have to opt-in for an item to be
transmitted.
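
A toy model of that negotiation, as one plausible reading of the
description above. The flag names and the may_join()/delivered() helpers
are hypothetical and do not match the real kdbus constants or ioctl
interface.

```python
from enum import IntFlag


class Attach(IntFlag):
    """Hypothetical metadata item flags, for illustration only."""
    CREDS = 1
    PIDS = 2
    SECLABEL = 4
    NAMES = 8
    CAPS = 16
    CMDLINE = 32


def may_join(bus_required: Attach, sender_send: Attach) -> bool:
    """A sender must offer at least what the bus creator mandates."""
    return (sender_send & bus_required) == bus_required


def delivered(sender_send: Attach, receiver_recv: Attach) -> Attach:
    """Only items both peers opted in to are attached to a message."""
    return sender_send & receiver_recv


bus_required = Attach.CREDS | Attach.PIDS | Attach.SECLABEL | Attach.NAMES
sender_mask = bus_required | Attach.CMDLINE
receiver_mask = Attach.CREDS | Attach.PIDS | Attach.CAPS

attached = delivered(sender_mask, receiver_mask)
```

Here the receiver asks for CAPS but the sender never offered it, and the
sender offers CMDLINE but the receiver never asked for it, so only CREDS
and PIDS end up attached.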

Thanks
David

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: Issues with capability bits and meta-data in kdbus
  2015-04-22 13:07                 ` Greg Kroah-Hartman
@ 2015-04-22 14:05                   ` Austin S Hemmelgarn
  0 siblings, 0 replies; 316+ messages in thread
From: Austin S Hemmelgarn @ 2015-04-22 14:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linus Torvalds, Andy Lutomirski, Eric W. Biederman,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, Linux Kernel Mailing List, Daniel Mack,
	David Herrmann, Djalal Harouni

On 2015-04-22 09:07, Greg Kroah-Hartman wrote:
> On Wed, Apr 22, 2015 at 07:40:25AM -0400, Austin S Hemmelgarn wrote:
>> On 2015-04-21 22:32, Linus Torvalds wrote:
>>> On Tue, Apr 21, 2015 at 6:54 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>>>
>>>> If kdbus were a general purpose IPC tool
>>>
>>>   .. but it's not ..
>>>
>>
>> Except, IIRC, that was one of the stated design goals in the original patch
>> set.  I'm pretty sure that I remember a rather verbose exposition that
>> pretty much could be summarized as "Linux has no general purpose IPC in the
>> kernel, this fixes that"
>
> Did I say that somewhere?  Here's what the patchset has always started
> with every time I have posted it for review, starting back last year in
> October:
>
> 	kdbus is a kernel-level IPC implementation that aims for
> 	resemblance to the protocol layer with the existing
> 	userspace D-Bus daemon while enabling some features that
> 	couldn't be implemented before in userspace.
>
> 2+ years ago, I had the dream that maybe we could make kdbus into the
> "general purpose IPC layer for the kernel", but in working through all
> of the issues, and the requirements of the userspace users and
> protocols, it just really didn't work out that way, sorry.
>
I think it may have been someone else elaborating on this ideal that I 
was remembering.  Personally, I couldn't care less whether it is considered 
'general purpose', as far as I'm concerned, POSIX semaphores, shm, and 
UDS fit all the IPC I ever need.  On that note, I have considered trying 
to implement SOCK_SEQPACKET support for AF_LOCAL, although I've gotten 
by just fine using SCTP over the loop-back interface.



^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: Issues with capability bits and meta-data in kdbus
  2015-04-22 13:27                 ` Havoc Pennington
@ 2015-04-22 14:35                   ` Michele Curti
  2015-04-22 20:02                     ` Havoc Pennington
  0 siblings, 1 reply; 316+ messages in thread
From: Michele Curti @ 2015-04-22 14:35 UTC (permalink / raw)
  To: Havoc Pennington
  Cc: Austin S Hemmelgarn, Linus Torvalds, Andy Lutomirski,
	Eric W. Biederman, Greg Kroah-Hartman, Andrew Morton,
	Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina,
	Linux Kernel Mailing List, Daniel Mack, David Herrmann,
	Djalal Harouni

Hi Havoc.

On Wed, Apr 22, 2015 at 09:27:56AM -0400, Havoc Pennington wrote:
>
> If we say it isn't "general purpose" we could mean more than one thing -
>
>  - it's a complete system / batteries-included, with a defined
> protocol, vs. a "make your own protocol" kit
>  - it isn't especially appropriate as a cross-machine protocol,
> whether you mean within a cluster or across the internet
>  - it isn't portable in a very useful way (it kind of runs on
> windows/mac but isn't the native way of doing things there)
>
> On the other hand, it is "general purpose" in the sense that so many
> apps and services are using it for so many purposes already (i.e. it
> isn't tied to a particular kind of app or service).
>

Just out of curiosity, would you like to change something in dbus design,
if you didn't have to worry about ABI breaks and the like?

Thanks,
Michele


^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: Issues with capability bits and meta-data in kdbus
  2015-04-22 14:35                   ` Michele Curti
@ 2015-04-22 20:02                     ` Havoc Pennington
  2015-04-22 21:48                       ` Linus Torvalds
  2015-04-23  8:38                       ` Michele Curti
  0 siblings, 2 replies; 316+ messages in thread
From: Havoc Pennington @ 2015-04-22 20:02 UTC (permalink / raw)
  To: Michele Curti
  Cc: Austin S Hemmelgarn, Linus Torvalds, Andy Lutomirski,
	Eric W. Biederman, Greg Kroah-Hartman, Andrew Morton,
	Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina,
	Linux Kernel Mailing List, Daniel Mack, David Herrmann,
	Djalal Harouni

On Wed, Apr 22, 2015 at 10:35 AM, Michele Curti <michele.curti@gmail.com> wrote:
>
> Just out of curiosity, would you like to change something in dbus design,
> if you didn't have to worry about ABI breaks and the like?
>

Good question. I can't remember any big-picture things, I'm sure the
current maintainers and users have a longer list. :-) There are a
variety of little small things, some examples I can immediately think
of:

 * the ad hoc authentication protocol is sort of ugly
 * the byte order marker in every message is silly
 * protocol version in every message is useless
 * Ryan Lortie's nice fixes in GVariant, which I think kdbus adopts (
https://people.gnome.org/~ryanl/gvariant-serialisation.pdf ), for the
most part these are 'cleanups' but nullable types ("maybe" types for
Haskell fans) are a notable semantic addition
 * specify how it works on Windows, the Windows port last I checked
(years ago) didn't do things in a Windows-sensible way
 * specify what happens when resource limits are reached
 * wouldn't use XML for introspection data these days
http://dbus.freedesktop.org/doc/dbus-specification.html#introspection-format

The implementation has more problems:

 * libdbus had a flawed goal (be the underlying implementation used by
higher-level libs), it turns out it's better to implement the protocol
in every lib, libdbus was trying to serve too many masters. libdbus is
slow and has an annoying API, and the protocol is simple enough for
every "stack" (glib, python, etc.) to implement it themselves.
 * rethink what happens when hitting resource limits in the bus
daemon, as discussed in an earlier sub-thread
 * OOM handling code in the daemon is quite a burden, maybe there's a
better way http://blog.ometer.com/2008/02/04/out-of-memory-handling-d-bus-experience/
 * config file format, security policy stuff... work to do here

Havoc

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: Issues with capability bits and meta-data in kdbus
  2015-04-22 20:02                     ` Havoc Pennington
@ 2015-04-22 21:48                       ` Linus Torvalds
  2015-04-23  5:35                         ` Havoc Pennington
  2015-04-24 14:32                         ` Olaf Hering
  2015-04-23  8:38                       ` Michele Curti
  1 sibling, 2 replies; 316+ messages in thread
From: Linus Torvalds @ 2015-04-22 21:48 UTC (permalink / raw)
  To: Havoc Pennington
  Cc: Michele Curti, Austin S Hemmelgarn, Andy Lutomirski,
	Eric W. Biederman, Greg Kroah-Hartman, Andrew Morton,
	Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina,
	Linux Kernel Mailing List, Daniel Mack, David Herrmann,
	Djalal Harouni

On Wed, Apr 22, 2015 at 1:02 PM, Havoc Pennington <hp@pobox.com> wrote:
>  * the byte order marker in every message is silly

It's worse than that.

Conditional byte order is worse than silly - it's terminally stupid.

This is not a "per connection" thing or an "every message" thing. It's
more fundamental than that. Protocols that have dynamic byte orders
are pure and utter crap.

The only sane model is to specify one fixed byte order. Seriously.
It's equally portable, it generates better code - even on
architectures that then have to unconditionally do byte order swapping
- and it's simpler to add static type checks for, etc. It's literally
less code and faster to do a "bswap" instruction than to do a
conditional test of some variable (even if you can then avoid the
bswap dynamically).

In other words, think networking, which statically just decided to use
big-endian. Sure, that was the wrong choice in the end, but even
picking the wrong endianness - but picking it statically - is better
than the horrible mistake of thinking that you should have some
variable byte order.
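
To illustrate the code-generation point, here is a minimal sketch,
assuming a hypothetical 32-bit field in a message header.  The helper
names load_le32() and load_dyn32() are made up for this example and come
from neither D-Bus nor kdbus, and little-endian is picked as the fixed
order purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Fixed wire order: assemble the value byte by byte.  This is correct on
 * any host and has no branch; compilers typically reduce it to a plain
 * load, plus a byte swap on big-endian hosts.
 */
static inline uint32_t load_le32(const unsigned char *p)
{
	return (uint32_t)p[0] |
	       (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 |
	       (uint32_t)p[3] << 24;
}

/*
 * Dynamic wire order: every field access has to consult a per-message
 * flag, so both code paths exist and a branch sits in front of them.
 */
static inline uint32_t load_dyn32(const unsigned char *p, int msg_is_big_endian)
{
	if (msg_is_big_endian)
		return (uint32_t)p[3] |
		       (uint32_t)p[2] << 8 |
		       (uint32_t)p[1] << 16 |
		       (uint32_t)p[0] << 24;
	return load_le32(p);
}

/* Usage: the same four bytes decode differently depending on the flag. */
static void byte_order_selfcheck(void)
{
	const unsigned char wire[4] = { 0x78, 0x56, 0x34, 0x12 };

	assert(load_le32(wire) == 0x12345678u);
	assert(load_dyn32(wire, 0) == 0x12345678u);
	assert(load_dyn32(wire, 1) == 0x78563412u);
}
```

The fixed-order reader also gives type checkers something to hold on to:
a field can be declared as "little-endian 32-bit" once, instead of
"32-bit in whatever order this message says".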

              Linus

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: Issues with capability bits and meta-data in kdbus
  2015-04-22 21:48                       ` Linus Torvalds
@ 2015-04-23  5:35                         ` Havoc Pennington
  2015-04-24 14:32                         ` Olaf Hering
  1 sibling, 0 replies; 316+ messages in thread
From: Havoc Pennington @ 2015-04-23  5:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Michele Curti, Austin S Hemmelgarn, Andy Lutomirski,
	Eric W. Biederman, Greg Kroah-Hartman, Andrew Morton,
	Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina,
	Linux Kernel Mailing List, Daniel Mack, David Herrmann,
	Djalal Harouni

On Wed, Apr 22, 2015 at 5:48 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Conditional byte order is worse than silly - it's terminally stupid.
>

Hey, usually I write a long rant myself, but I was trying to keep it
to one bullet point for once in my life. Way to ruin it, geez.

Havoc

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: Issues with capability bits and meta-data in kdbus
  2015-04-22 20:02                     ` Havoc Pennington
  2015-04-22 21:48                       ` Linus Torvalds
@ 2015-04-23  8:38                       ` Michele Curti
  1 sibling, 0 replies; 316+ messages in thread
From: Michele Curti @ 2015-04-23  8:38 UTC (permalink / raw)
  To: Havoc Pennington
  Cc: Austin S Hemmelgarn, Linus Torvalds, Andy Lutomirski,
	Eric W. Biederman, Greg Kroah-Hartman, Andrew Morton,
	Arnd Bergmann, One Thousand Gnomes, Tom Gundersen, Jiri Kosina,
	Linux Kernel Mailing List, Daniel Mack, David Herrmann,
	Djalal Harouni

On Wed, Apr 22, 2015 at 04:02:34PM -0400, Havoc Pennington wrote:
> On Wed, Apr 22, 2015 at 10:35 AM, Michele Curti <michele.curti@gmail.com> wrote:
> >
> > Just out of curiosity, would you like to change something in dbus design,
> > if you didn't have to worry about ABI breaks and the like?
> >
> 
> Good question. I can't remember any big-picture things, I'm sure the
> current maintainers and users have a longer list. :-) There are a
> variety of little small things, some examples I can immediately think
> of:
> 
>  * the ad hoc authentication protocol is sort of ugly
>  * the byte order marker in every message is silly
>  * protocol version in every message is useless
>  * Ryan Lortie's nice fixes in GVariant, which I think kdbus adopts (
> https://people.gnome.org/~ryanl/gvariant-serialisation.pdf ), for the
> most part these are 'cleanups' but nullable types ("maybe" types for
> Haskell fans) are a notable semantic addition
>  * specify how it works on Windows, the Windows port last I checked
> (years ago) didn't do things in a Windows-sensible way
>  * specify what happens when resource limits are reached
>  * wouldn't use XML for introspection data these days
> http://dbus.freedesktop.org/doc/dbus-specification.html#introspection-format
>

Nice, thanks!

It seems that all of these are userspace related only.  Yes, I saw a "gvariant
readme" in the systemd sources, and now I understand what it is (I'm not an expert) :D

My only fear was that kdbus was trying to keep something that even dbus itself
doesn't want.  But it seems that this is not the case.

Thanks, regards,
Michele


^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-13 19:03 [GIT PULL] kdbus for 4.1-rc1 Greg Kroah-Hartman
  2015-04-13 19:29 ` Eric W. Biederman
  2015-04-13 20:13 ` Andy Lutomirski
@ 2015-04-23 13:05 ` Greg Kroah-Hartman
  2015-04-23 13:06   ` [PATCH] kdbus: pool: use __vfs_read() Greg Kroah-Hartman
                     ` (4 more replies)
  2 siblings, 5 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-23 13:05 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton
  Cc: Arnd Bergmann, ebiederm, gnomes, teg, jkosina, luto,
	linux-kernel, daniel, dh.herrmann, tixxdz

On Mon, Apr 13, 2015 at 09:03:50PM +0200, Greg Kroah-Hartman wrote:
> The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
> 
>   Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
> 
> for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
> 
>   kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
> 
> ----------------------------------------------------------------

Given this has been a crazy email thread, let's try to figure out what
the status is here.

Al Viro pointed out some odd locking (r/w lock only used in write mode),
and asked for some more documentation / description of the object model
used here.  David provided that, and will send a minor fix for the rw
lock, so I think that issue is now resolved.  David has created a few
other minor changes based on Al's review that I will forward on later.

Andy's concerns about the capability stuff have been hashed out in
multiple threads here.  The kernel code isn't buggy as-designed or
implemented from what we can all tell, it's just that the new
functionality isn't liked by everyone, which is totally fair, but not a
reason to declare that the function isn't useful.

Alan, and others, want a tiny, generic, multi-cast IPC method that also
works across networks.  They feel that this is something that D-Bus
might be able to use in the future in userspace to build on top of. 
Lots of people have said they want something like this for years, but
that doesn't address the issue here with kdbus, which is a very specific
solution for a very common and wide-spread usage model that Linux
userspace relies on today.  I too would love to see such an IPC be
created, and two years ago thought it would be possible to achieve
here.  But over time, and in working with the D-Bus model and
requirements, it just didn't happen here.  The fact that no one has ever
been able to accomplish such a thing in the past means that it's either
impossible to do, or that no one really wants such a thing badly enough to
actually do the work :)

Did I miss anything else here?  Are there any technical reasons I'm
forgetting about for why this can't be pulled in as-is for this merge
window?

As for merging this: due to some changes in the vfs tree, specifically
5d5d56897530 ("make new_sync_{read,write}() static"), the kdbus code can
cause problems once it is merged with your latest tree, as reported by
Sergei Zviagintsev.  I didn't want to rebase anything, and solving the
issue against 3.19 would require us to export __vfs_read(), as Al
already did in your tree.  So you can just merge it, and then apply the
patch I'll send in response to this message, which resolves the issue.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* [PATCH] kdbus: pool: use __vfs_read()
  2015-04-23 13:05 ` Greg Kroah-Hartman
@ 2015-04-23 13:06   ` Greg Kroah-Hartman
  2015-04-23 14:17   ` [GIT PULL] kdbus for 4.1-rc1 One Thousand Gnomes
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-23 13:06 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton
  Cc: Arnd Bergmann, ebiederm, gnomes, teg, jkosina, luto,
	linux-kernel, daniel, dh.herrmann, tixxdz

From: Sergei Zviagintsev <sergei@s15v.net>

After commit 5d5d56897530 ("make new_sync_{read,write}() static")
->read() cannot be called directly.

kdbus_pool_slice_copy() leads to oops, which can be reproduced by
launching tools/testing/selftests/kdbus/kdbus-test -t message-quota:

[ 1167.146793] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 1167.147554] IP: [<          (null)>]           (null)
[ 1167.148670] PGD 3a9dd067 PUD 3a841067 PMD 0
[ 1167.149611] Oops: 0010 [#1] SMP
[ 1167.150088] Modules linked in: nfsv3 nfs kdbus lockd grace sunrpc
[ 1167.150771] CPU: 0 PID: 518 Comm: kdbus-test Not tainted 4.0.0-next-20150420-kdbus #62
[ 1167.150771] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 1167.150771] task: ffff88003daed120 ti: ffff88003a800000 task.ti: ffff88003a800000
[ 1167.150771] RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
[ 1167.150771] RSP: 0018:ffff88003a803bc0  EFLAGS: 00010286
[ 1167.150771] RAX: ffff8800377fb000 RBX: 00000000000201e8 RCX: ffff88003a803c00
[ 1167.150771] RDX: 0000000000000b40 RSI: ffff8800377fb4c0 RDI: ffff88003d815700
[ 1167.150771] RBP: ffff88003a803c48 R08: ffffffff8139e380 R09: ffff880039d80490
[ 1167.150771] R10: ffff88003a803a90 R11: 00000000000004c0 R12: 00000000002a24c0
[ 1167.150771] R13: 0000000000000b40 R14: ffff88003d815700 R15: ffffffff8139e460
[ 1167.150771] FS:  00007f41dccd4740(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[ 1167.150771] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1167.150771] CR2: 0000000000000000 CR3: 000000003ccdf000 CR4: 00000000000007b0
[ 1167.150771] Stack:
[ 1167.150771]  ffffffffa0065497 ffff88003a803c10 00007ffffffff000 ffff88003aaa67c0
[ 1167.150771]  00000000000004c0 ffff88003aaa6870 ffff88003ca83300 ffffffffa006537d
[ 1167.150771]  00000000000201e8 ffffea0000ddfec0 ffff88003a803c20 0000000000000018
[ 1167.150771] Call Trace:
[ 1167.150771]  [<ffffffffa0065497>] ? kdbus_pool_slice_copy+0x127/0x200 [kdbus]
[ 1167.150771]  [<ffffffffa006537d>] ? kdbus_pool_slice_copy+0xd/0x200 [kdbus]
[ 1167.150771]  [<ffffffffa006670a>] kdbus_queue_entry_move+0xaa/0x180 [kdbus]
[ 1167.150771]  [<ffffffffa0059e64>] kdbus_conn_move_messages+0x1e4/0x2c0 [kdbus]
[ 1167.150771]  [<ffffffffa006234e>] kdbus_name_acquire+0x31e/0x390 [kdbus]
[ 1167.150771]  [<ffffffffa00625c5>] kdbus_cmd_name_acquire+0x125/0x130 [kdbus]
[ 1167.150771]  [<ffffffffa005db5d>] kdbus_handle_ioctl+0x4ed/0x610 [kdbus]
[ 1167.150771]  [<ffffffff811040e0>] do_vfs_ioctl+0x2e0/0x4e0
[ 1167.150771]  [<ffffffff81389750>] ? preempt_schedule_common+0x1f/0x3f
[ 1167.150771]  [<ffffffff8110431c>] SyS_ioctl+0x3c/0x80
[ 1167.150771]  [<ffffffff8138c36e>] system_call_fastpath+0x12/0x71
[ 1167.150771] Code:  Bad RIP value.
[ 1167.150771] RIP  [<          (null)>]           (null)
[ 1167.150771]  RSP <ffff88003a803bc0>
[ 1167.150771] CR2: 0000000000000000
[ 1167.168756] ---[ end trace a676bcfa75db5a96 ]---

Use __vfs_read() instead.

Signed-off-by: Sergei Zviagintsev <sergei@s15v.net>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 ipc/kdbus/pool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ipc/kdbus/pool.c b/ipc/kdbus/pool.c
index 139bb77056b3..45dcdea505f4 100644
--- a/ipc/kdbus/pool.c
+++ b/ipc/kdbus/pool.c
@@ -675,7 +675,7 @@ int kdbus_pool_slice_copy(const struct kdbus_pool_slice *slice_dst,
 		}
 
 		kaddr = (char __force __user *)kmap(page) + page_off;
-		n_read = f_src->f_op->read(f_src, kaddr, copy_len, &off_src);
+		n_read = __vfs_read(f_src, kaddr, copy_len, &off_src);
 		kunmap(page);
 		mark_page_accessed(page);
 		flush_dcache_page(page);
-- 
2.3.6
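
The failure mode can be modelled in userspace.  The sketch below is a
simplified model under stated assumptions, not the kernel's code: struct
model_fops, struct model_file and model_vfs_read() are hypothetical
stand-ins.  It shows a file whose operations table leaves ->read unset,
which is what the NULL RIP in the oops above suggests, and a dispatcher
that checks the pointers before calling through them.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

struct model_file;

/* Hypothetical, cut-down stand-in for a file operations table. */
struct model_fops {
	ssize_t (*read)(struct model_file *f, char *buf, size_t len, off_t *pos);
	ssize_t (*read_iter)(struct model_file *f, char *buf, size_t len, off_t *pos);
};

struct model_file {
	const struct model_fops *f_op;
	const char *data;
	size_t size;
};

/* Copy up to len bytes starting at *pos and advance *pos. */
static ssize_t model_read_iter(struct model_file *f, char *buf, size_t len,
			       off_t *pos)
{
	size_t off = (size_t)*pos;

	if (off >= f->size)
		return 0;
	if (len > f->size - off)
		len = f->size - off;
	memcpy(buf, f->data + off, len);
	*pos += (off_t)len;
	return (ssize_t)len;
}

/*
 * Dispatcher: never jumps through a NULL pointer.  Calling
 * f->f_op->read() directly, as the old line in the patch did, would
 * crash for a table like iter_only_fops below.
 */
static ssize_t model_vfs_read(struct model_file *f, char *buf, size_t len,
			      off_t *pos)
{
	assert(f && f->f_op);

	if (f->f_op->read)
		return f->f_op->read(f, buf, len, pos);
	if (f->f_op->read_iter)
		return f->f_op->read_iter(f, buf, len, pos);
	return -EINVAL;
}

/* A table that only provides read_iter; ->read stays NULL. */
static const struct model_fops iter_only_fops = {
	.read = NULL,
	.read_iter = model_read_iter,
};
```

Reading through model_vfs_read() works for such a file, whereas an
unconditional call through ->read would branch to address zero.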


^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-23 13:05 ` Greg Kroah-Hartman
  2015-04-23 13:06   ` [PATCH] kdbus: pool: use __vfs_read() Greg Kroah-Hartman
@ 2015-04-23 14:17   ` One Thousand Gnomes
  2015-04-23 16:36   ` Greg Kroah-Hartman
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 316+ messages in thread
From: One Thousand Gnomes @ 2015-04-23 14:17 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, ebiederm, teg,
	jkosina, luto, linux-kernel, daniel, dh.herrmann, tixxdz

> Alan, and others, want a tiny, generic, multi-cast IPC method that also
> works across networks.  They feel that this is something that D-Bus

I never said - across networks. And locally it has been done, even
microcontrollers have done it.

> Lots of people have said they want something like this for years, but
> that doesn't address the issue here with kdbus, which is a very specific
> solution for a very common and wide-spread usage model that Linux

You've missed off a variety of important points that have been raised

- whether it's a dumb model performance-wise compared with using it to set
  up a memfd or similar
- cgroup interactions
- the heavyweight nature of going via get_user_pages and __vfs_read rather
  than just assuming message sizes are sensibly constrained and could far
  better just be allocated and copied to a refcounted kernel buffer
- exposure of capabilities and how you futureproof it
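
The refcounted-buffer alternative in the third point can be sketched
roughly as follows.  This is a userspace model under assumptions: struct
msg_buf, MSG_MAX and the msg_buf_*() helpers are hypothetical names, the
4096-byte cap is arbitrary, and the plain counter stands in for what
kernel code would do with kref or an atomic type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MSG_MAX 4096	/* arbitrary cap standing in for "sensibly constrained" */

struct msg_buf {
	unsigned int refs;	/* not atomic: single-threaded model only */
	size_t len;
	unsigned char data[];	/* payload, copied exactly once at send time */
};

/* Allocate a buffer and copy the payload in; the caller owns one reference. */
static struct msg_buf *msg_buf_new(const void *src, size_t len)
{
	struct msg_buf *m;

	if (len > MSG_MAX)
		return NULL;
	m = malloc(sizeof(*m) + len);
	if (!m)
		return NULL;
	m->refs = 1;
	m->len = len;
	memcpy(m->data, src, len);
	return m;
}

/* Take an extra reference, e.g. one per receiver queue of a broadcast. */
static struct msg_buf *msg_buf_get(struct msg_buf *m)
{
	assert(m->refs > 0);
	m->refs++;
	return m;
}

/* Drop a reference; returns 1 if this was the last one and m was freed. */
static int msg_buf_put(struct msg_buf *m)
{
	assert(m->refs > 0);
	if (--m->refs)
		return 0;
	free(m);
	return 1;
}
```

A multicast message would then be copied once and shared by reference
among all receivers, and freed when the last receiver drops it.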

> userspace relies on today.  I too would love to see such an IPC be
> created, and two years ago thought it would be possible to achieve
> here.  But over time, and in working with the D-Bus model and
> requirements, it just didn't happen here.  The fact that no one has ever
> been able to accomplish such a thing in the past means that it's either
> impossible to do, or that no one really wants such a thing badly enough to
> actually do the work :)
> 
> Did I miss anything else here?  Are there any technical reasons I'm
> forgetting about for why this can't be pulled in as-is for this merge
> window?

Like the outstanding NACKS ?

Greg - you are sounding like you have some kind of special entitlement to
ignore the way this works for everyone else. If you are feeling
frustrated, annoyed and led up several avenues at once then welcome to
the world of every other submitter who doesn't have some kind of
magic stage door pass to get their crap in the kernel when there
are core maintainers asking hard and unanswered questions and who have
nacked it.

There's no huge hurry. There are a bunch of things like the interactions
with cgroups, and the privilege and capability model which need careful
examination. Slipping it one release to get that right isn't a big deal -
it's not even as if you can't use hardware without it as with a driver
missing a merge - this is just a performance tweak.

Alan

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-23 13:05 ` Greg Kroah-Hartman
  2015-04-23 13:06   ` [PATCH] kdbus: pool: use __vfs_read() Greg Kroah-Hartman
  2015-04-23 14:17   ` [GIT PULL] kdbus for 4.1-rc1 One Thousand Gnomes
@ 2015-04-23 16:36   ` Greg Kroah-Hartman
  2015-04-23 16:46     ` Andy Lutomirski
  2015-04-23 18:33   ` Richard Weinberger
  2015-04-23 18:57   ` Kdbus needs meaningful review (was: Re: [GIT PULL] kdbus for 4.1-rc1) Eric W. Biederman
  4 siblings, 1 reply; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-23 16:36 UTC (permalink / raw)
  To: Linus Torvalds, luto, Andrew Morton
  Cc: Arnd Bergmann, ebiederm, gnomes, teg, jkosina, linux-kernel,
	daniel, dh.herrmann, tixxdz

On Thu, Apr 23, 2015 at 03:05:48PM +0200, Greg Kroah-Hartman wrote:
> 
> Andy's concerns about the capability stuff have been hashed out in
> multiple threads here.  The kernel code isn't buggy as-designed or
> implemented from what we can all tell, it's just that the new
> functionality isn't liked by everyone, which is totally fair, but not a
> reason to declare that the function isn't useful.

Andy, did I capture your existing position correctly?  If we drop the
caps metadata, I'm guessing that you are ok with the code as you have
reviewed it and tested it out.  So should I just add a small patch that
removes this for now?  After that, we can discuss the addition of
capabilities to the metadata as an add-on feature with a future patch
and not hold up this larger merge request?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-23 16:36   ` Greg Kroah-Hartman
@ 2015-04-23 16:46     ` Andy Lutomirski
  2015-04-23 17:16       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 316+ messages in thread
From: Andy Lutomirski @ 2015-04-23 16:46 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Thu, Apr 23, 2015 at 9:36 AM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Thu, Apr 23, 2015 at 03:05:48PM +0200, Greg Kroah-Hartman wrote:
>>
>> Andy's concerns about the capability stuff have been hashed out in
>> multiple threads here.  The kernel code isn't buggy as-designed or
>> implemented from what we can all tell, it's just that the new
>> functionality isn't liked by everyone, which is totally fair, but not a
>> reason to declare that the function isn't useful.
>
> Andy, did I capture your existing position correctly?  If we drop the
> caps metadata, I'm guessing that you are ok with the code as you have
> reviewed it and tested it out.  So should I just add a small patch that
> removes this for now?  After that, we can discuss the addition of
> capabilities to the metadata as an add-on feature with a future patch
> and not hold up this larger merge request?

No.  I can fish out lists I've posted of what I personally dislike.
To repeat from my not-yet-awake memory, briefly:

 - starttime, cmdline, and possibly other pieces of metadata are also
problematic.  I think starttime is especially bad because it both
breaks CRIU and is IMO completely unnecessary -- I sent out draft
"highpid" patches a while ago to give a much better alternative that
isn't racy and won't break CRIU.  But cmdline is also IMO ridiculous.

 - There's still an open performance question.  Namely: is kdbus performant?

 - The policy system still sucks.  Now, if we give up on the idea of
anyone ever using it for anything other than dbus as it currently
works, maybe this isn't a real problem.

 - Someone should probably convince someone who understands memory
accounting that the pool mechanism accounts memory acceptably.  I
don't know much about mm stuff, but I think it's subject to all kinds
of nasty latency and accounting abuses, some of which might even be
exploited by accident.

I haven't reviewed most of it.  I've reviewed the metadata code (and
not recently) and the pool *docs*.

Shouldn't the bulk of this code have actual review before it gets
merged?  I've only reviewed some of it, and I didn't like what I found
in that small fraction, hence my objections to caps.

--Andy

>
> thanks,
>
> greg k-h



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-23 16:46     ` Andy Lutomirski
@ 2015-04-23 17:16       ` Greg Kroah-Hartman
  2015-04-23 17:34         ` Andy Lutomirski
                           ` (3 more replies)
  0 siblings, 4 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-23 17:16 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Thu, Apr 23, 2015 at 09:46:22AM -0700, Andy Lutomirski wrote:
> On Thu, Apr 23, 2015 at 9:36 AM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > On Thu, Apr 23, 2015 at 03:05:48PM +0200, Greg Kroah-Hartman wrote:
> >>
> >> Andy's concerns about the capability stuff have been hashed out in
> >> multiple threads here.  The kernel code isn't buggy as-designed or
> >> implemented from what we can all tell, it's just that the new
> >> functionality isn't liked by everyone, which is totally fair, but not a
> >> reason to declare that the function isn't useful.
> >
> > Andy, did I capture your existing position correctly?  If we drop the
> > caps metadata, I'm guessing that you are ok with the code as you have
> > reviewed it and tested it out.  So should I just add a small patch that
> > removes this for now?  After that, we can discuss the addition of
> > capabilities to the metadata as an add-on feature with a future patch
> > and not hold up this larger merge request?
> 
> No.  I can fish out lists I've posted of what I personally dislike.
> To repeat from my not-yet-awake memory, briefly:
> 
>  - starttime, cmdline, and possibly other pieces of metadata are also
> problematic.  I think starttime is especially bad because it both
> breaks CRIU and is IMO completely unnecessary -- I sent out draft
> "highpid" patches a while ago to give a much better alternative that
> isn't racy and won't break CRIU.  But cmdline is also IMO ridiculous.

starttime was removed a while ago, are you sure you are looking at the
latest code?

cmdline has been discussed and it really helps with debugging.
Decisions aren't being made based on it.

>  - There's still an open performance question.  Namely: is kdbus performant?

Yes, I thought that was already answered.  Tizen posted some numbers
with a much older version of the code, before David fixed a bunch of
issues that he and you found, and that averaged between 25-50% faster.
Details are in this presentation:
	http://download.tizen.org/misc/media/conference2014/slides/tdc2014-kdbus-in-tizen3.pdf

The Tizen and GENIVI developers are off running numbers with the latest
code, or so they told me through emails, but I don't know when/if that
will ever happen, so I can't promise more than what is already here.

>  - The policy system still sucks.  Now, if we give up on the idea of
> anyone ever using it for anything other than dbus as it currently
> works, maybe this isn't a real problem.

As designed, it's for D-Bus, so there's not much I can suggest here,
this isn't a "generic IPC" :)

The binder developers at Samsung have stated that the implementation we
have here works for their model as well, so I guess that is some kind of
verification it's not entirely tied to D-Bus.  They have plans on
dropping the existing binder kernel code and using the kdbus code
instead when it is merged.

>  - Someone should probably convince someone who understands memory
> accounting that the pool mechanism accounts memory acceptably.  I
> don't know much about mm stuff, but I think it's subject to all kinds
> of nasty latency and accounting abuses, some of which might even be
> exploited by accident.

Michal and David agree that this all works properly.  I don't know of
anyone else to ask about it, do you?

> I haven't reviewed most of it.  I've reviewed the metadata code (and
> not recently) and the pool *docs*.
> 
> Shouldn't the bulk of this code have actual review before it gets
> merged?  I've only reviewed some of it, and I didn't like what I found
> in that small fraction, hence my objections to caps.

I'd love more review, and we have been asking for it since last October.
You provided a lot of it a while ago, and that helped immensely.

I can't force anyone to read the code, I can only go on what people
offer to do.  We have 3 signed-off-bys on the main kdbus patches, and
numerous other different developers have provided fixes / tweaks that
are in this tree, so it's not like this is unread/unposted code here at
all.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-23 17:16       ` Greg Kroah-Hartman
@ 2015-04-23 17:34         ` Andy Lutomirski
  2015-04-23 17:42         ` Stephen Smalley
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 316+ messages in thread
From: Andy Lutomirski @ 2015-04-23 17:34 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Thu, Apr 23, 2015 at 10:16 AM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Thu, Apr 23, 2015 at 09:46:22AM -0700, Andy Lutomirski wrote:
>> On Thu, Apr 23, 2015 at 9:36 AM, Greg Kroah-Hartman
>> <gregkh@linuxfoundation.org> wrote:
>> > On Thu, Apr 23, 2015 at 03:05:48PM +0200, Greg Kroah-Hartman wrote:
>> >>
>> >> Andy's concerns about the capability stuff have been hashed out in
>> >> multiple threads here.  The kernel code isn't buggy as-designed or
>> >> implemented from what we can all tell, it's just that the new
>> >> functionality isn't liked by everyone, which is totally fair, but not a
>> >> reason to declare that the function isn't useful.
>> >
>> > Andy, did I capture your existing position correctly?  If we drop the
>> > caps metadata, I'm guessing that you are ok with the code as you have
>> > reviewed it and tested it out.  So should I just add a small patch that
>> > removes this for now?  After that, we can discuss the addition of
>> > capabilities to the metadata as an add-on feature with a future patch
>> > and not hold up this larger merge request?
>>
>> No.  I can fish out lists I've posted of what I personally dislike.
>> To repeat from my not-yet-awake memory, briefly:
>>
>>  - starttime, cmdline, and possibly other pieces of metadata are also
>> problematic.  I think starttime is especially bad because it both
>> breaks CRIU and is IMO completely unnecessary -- I sent out draft
>> "highpid" patches a while ago to give a much better alternative that
>> isn't racy and won't break CRIU.  But cmdline is also IMO ridiculous.
>
> starttime was removed a while ago, are you sure you are looking at the
> latest code?

No, I'm sure I haven't.  I looked at the latest code just long enough
to see that caps were still there.  So the latest code is unreviewed
by me or, as far as I can tell, by anyone else who should review it.

>
> cmdline has been discussed and it really helps with debugging.
> Decisions aren't being made based on it.

This might be addressed by the module parameter.  Haven't checked
recent versions.

None of this addresses the fact that metadata is captured both at send
and connect time.  I still think that this is asking for tons of
security problems down the line.
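
The kdbus interface itself is not shown here.  As an analogy only,
AF_UNIX already has both flavours of credential capture: SO_PEERCRED
returns what was recorded when the connection (or socket pair) was set
up, while SCM_CREDENTIALS is attached to an individual message at send
time.  The sketch below reads both; the function names are made up for
this illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for struct ucred */
#endif
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Connect-time: credentials recorded when the socket pair was created. */
static int peer_creds_at_connect(int fd, struct ucred *out)
{
	socklen_t len = sizeof(*out);

	return getsockopt(fd, SOL_SOCKET, SO_PEERCRED, out, &len);
}

/* Send-time: credentials the kernel attached to one specific message. */
static int recv_creds_at_send(int fd, struct ucred *out)
{
	char byte;
	union {
		struct cmsghdr hdr;	/* forces correct alignment */
		char buf[CMSG_SPACE(sizeof(struct ucred))];
	} ctl;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = ctl.buf,
		.msg_controllen = sizeof(ctl.buf),
	};
	struct cmsghdr *c;

	if (recvmsg(fd, &msg, 0) < 0)
		return -1;
	for (c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
		if (c->cmsg_level == SOL_SOCKET &&
		    c->cmsg_type == SCM_CREDENTIALS) {
			memcpy(out, CMSG_DATA(c), sizeof(*out));
			return 0;
		}
	}
	return -1;
}

/* Fetch both kinds of credentials over a local socket pair. */
static int demo_creds(struct ucred *at_connect, struct ucred *at_send)
{
	int sv[2], on = 1, ret = -1;

	assert(at_connect && at_send);
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	/* SO_PASSCRED must be enabled on the receiver before the send. */
	if (setsockopt(sv[1], SOL_SOCKET, SO_PASSCRED, &on, sizeof(on)) == 0 &&
	    peer_creds_at_connect(sv[1], at_connect) == 0 &&
	    write(sv[0], "x", 1) == 1 &&
	    recv_creds_at_send(sv[1], at_send) == 0)
		ret = 0;
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

In a single process the two results agree.  They can diverge when the
sender changes credentials, or hands the socket to another process,
between connecting and sending, which is the kind of ambiguity a
receiver then has to reason about.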

>
>>  - There's still an open performance question.  Namely: is kdbus performant?
>
> Yes, I thought that was already answered.  Tizen posted some numbers
> with a much older version of the code, before David fixed a bunch of
> issues that he and you found, and that averaged between 25-50% faster.
> Details are in this presentation:
>         http://download.tizen.org/misc/media/conference2014/slides/tdc2014-kdbus-in-tizen3.pdf
>

AFAICS no one has ever even tried to address whether the kdbus design
(shmem pools, send-time metadata, plus optional memfd) gives as good
performance as plain ol' sockets.  A lot of the complexity of kdbus is
due to its novel buffering scheme, and that scheme AFAICS has only
been seriously benchmarked against userspace dbus, which is a poor
reference.  I neither see any compelling a priori reason to think that
the buffering scheme is a performance win, nor do I see good numbers.
Instead, I've seen numbers suggesting that it's much slower than
AF_UNIX peer to peer.

I realize that it looks like I'm comparing apples (peer to peer) to
oranges (bus), but that's just because AF_UNIX really is the best
comparison in the absence of a serious attempt at a socket-like bus
with benchmarks.

> The Tizen and GENIVI developers are off running numbers with the latest
> code, or so they told me through emails, but I don't know when/if that
> will ever happen, so I can't promise more than what is already here.
>
>>  - The policy system still sucks.  Now, if we give up on the idea of
>> anyone ever using it for anything other than dbus as it currently
>> works, maybe this isn't a real problem.
>
> As designed, it's for D-Bus, so there's not much I can suggest here,
> this isn't a "generic IPC" :)

Move it to userspace with a daemon that answers policy questions and
makes introductions?

>
>>  - Someone should probably convince someone who understands memory
>> accounting that the pool mechanism accounts memory acceptably.  I
>> don't know much about mm stuff, but I think it's subject to all kinds
>> of nasty latency and accounting abuses, some of which might even be
>> exploited by accident.
>
> Michal and David agree that this all works properly.  I don't know of
> anyone else to ask about it, do you?

I thought Michal was a little less convinced.  I really don't see
why pages allocated due to sends would be charged to the receiver, nor
do I see why, even if that were fixed, it wouldn't be a serious
performance problem with memcgs and memory pressure in play.

I'm really surprised that GENIVI is okay with this.  The latency seems
like it will be highly unpredictable.

>
>> I haven't reviewed most of it.  I've reviewed the metadata code (and
>> not recently) and the pool *docs*.
>>
>> Shouldn't the bulk of this code have actual review before it gets
>> merged?  I've only reviewed some of it, and I didn't like what I found
>> in that small fraction, hence my objections to caps.
>
> I'd love more review, and we have been asking for it since last October.
> You provided a lot of it a while ago, and that helped immensely.
>
> I can't force anyone to read the code, I can only go on what people
> offer to do.  We have 3 signed-off-bys on the main kdbus patches, and
> numerous other different developers have provided fixes / tweaks that
> are in this tree, so it's not like this is unread/unposted code here at
> all.

I think it doesn't help that reviewing the code can be a painful
exercise when threads about a single review point drag on for hundreds
of posts.  Also, it's discouraging that, after a single review point
results in hundreds of posts, reviewers get asked whether everything's
okay now.

--Andy


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-23 17:16       ` Greg Kroah-Hartman
  2015-04-23 17:34         ` Andy Lutomirski
@ 2015-04-23 17:42         ` Stephen Smalley
  2015-04-23 19:30           ` Greg Kroah-Hartman
  2015-04-23 17:57         ` Linus Torvalds
  2015-04-24 13:50         ` Lukasz Skalski
  3 siblings, 1 reply; 316+ messages in thread
From: Stephen Smalley @ 2015-04-23 17:42 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Andy Lutomirski
  Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On 04/23/2015 01:16 PM, Greg Kroah-Hartman wrote:
> The binder developers at Samsung have stated that the implementation we
> have here works for their model as well, so I guess that is some kind of
> verification it's not entirely tied to D-Bus.  They have plans on
> dropping the existing binder kernel code and using the kdbus code
> instead when it is merged.

Where do things stand wrt LSM hooks for kdbus?  I don't see any security
hook calls in the kdbus tree except for the purpose of metadata
collection of process security labels.  But nothing for enforcing MAC
over kdbus IPC.  binder has a set of security hooks for that purpose, so
it would be a regression wrt MAC enforcement to switch from binder to
kdbus without equivalent checking there.


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-23 17:16       ` Greg Kroah-Hartman
  2015-04-23 17:34         ` Andy Lutomirski
  2015-04-23 17:42         ` Stephen Smalley
@ 2015-04-23 17:57         ` Linus Torvalds
  2015-04-23 18:04           ` Linus Torvalds
  2015-04-23 18:48           ` Linus Torvalds
  2015-04-24 13:50         ` Lukasz Skalski
  3 siblings, 2 replies; 316+ messages in thread
From: Linus Torvalds @ 2015-04-23 17:57 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andy Lutomirski, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Thu, Apr 23, 2015 at 10:16 AM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>>
>>  - starttime, cmdline, and possibly other pieces of metadata are also
>> problematic.  I think starttime is especially bad because it both
>> breaks CRIU and is IMO completely unnecessary -- I sent out draft
>> "highpid" patches a while ago to give a much better alternative that
>> isn't racy and won't break CRIU.  But cmdline is also IMO ridiculous.
>
> starttime was removed a while ago, are you sure you are looking at the
> latest code?
>
> cmdline has been discussed and it really helps with debugging.
> Decisions aren't being made based on it.

Quite frankly, I personally find cmdline/comm etc *much* worse than
sending the capabilities.

The whole notion of knowing "the other end is root" (or more
specifically some capability like "the other end can access raw
hardware") I think is a thing that absolutely makes sense in any
communication channel. I really don't even see why it would be
conditional. I mean, it's not exactly a secret anyway, and it just
makes *sense* for any protocol that may end up doing operations _for_
the recipient.

Same goes for uid etc - if you are implementing a service daemon, the
uid of the requester sure as hell makes a ton of difference in what
you might want to expose. Things like "does this user have access
rights to the printer?" are very natural questions to ask.

So I really don't understand why that part is even controversial.
kdbus wasn't meant to be some generic IPC mechanism. It is meant as a
way to talk to system daemons.

So the whole "capabilities and user information" is really to me a
non-issue. It's clearly required information, and if you don't want to
expose it, you damn well have absolutely *zero* business talking to
system daemons.

Really, it's that simple.

But things like "comm" and the cmdline? That makes me nervous. There
are real privacy issues there. Sure, maybe you think it's useful for
debugging, but the very fact that you think it's useful for debugging
makes me suspect you might be logging it (for future debugging). And
quite frankly, I don't think you should be logging things like that.
Yes, yes, if you're a system admin, you can find those things out, but
they should *not* be something that you just end up logging by mistake
or because "it's easy and all the information is right there".

If somebody is printing something, it shouldn't matter if it's "lpr"
or "firefox http://horses.and.trannyporn.my.little.pony.com/" that
does the printing.

And you can go "but we don't log it" all you want. It's still a bad
idea. Sane people should refuse to allow a system service to see those
kinds of things by default, for a very simple reason: it's none of
their business.

So I'd suggest just getting rid of "tid_comm/pid_comm/cmdline". There
is no possible valid excuse for them. They aren't trustworthy anyway
(ie a real attacker can obfuscate them easily), and they *are*
potentially sensitive.

[ Side note: the tid_comm/pid_comm ones depend on TASK_COMM_LEN
anyway, which might change. A 16-byte command name used to be insanely
long in the traditional unix environment, but these days it's actually
regularly a truncated name due to programs called things like
"gnome-shell-extension-prefs" or
"abrt-action-generate-core-backtrace". ]

                         Linus


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-23 17:57         ` Linus Torvalds
@ 2015-04-23 18:04           ` Linus Torvalds
  2015-04-23 18:56             ` Greg Kroah-Hartman
  2015-04-23 18:48           ` Linus Torvalds
  1 sibling, 1 reply; 316+ messages in thread
From: Linus Torvalds @ 2015-04-23 18:04 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andy Lutomirski, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Thu, Apr 23, 2015 at 10:57 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> If somebody is printing something, it shouldn't matter if it's "lpr"
> or "firefox http://horses.and.trannyporn.my.little.pony.com/" that
> does the printing.

And btw, it's not just "this is information that shouldn't be logged".

It's literally "information that should not *ever* be used". I can
easily see some phone manufacturer deciding to do "value add" by
adding a special case where a special vendor system manager program
gets a back door to some service, because it needs to access the
camera for user identification at login time, so there's some magic

   if (!strcmp(client->pid_comm, "vendor-login-pr"))
       return ACCESS_OK;

because "it was the simplest way to do this", and the programmer knew
it was a hack, but he needed to get it working because he had a
deadline yesterday.

And then somebody figures this out, and makes an app that takes
pictures on your phone surreptitiously.

No, we can't protect against vendors doing stupid things, but we very
much also shouldn't make the kernel have interfaces that basically
encourage people to do stupid things because they make irrelevant and
wrongheaded data available.

                       Linus


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-23 13:05 ` Greg Kroah-Hartman
                     ` (2 preceding siblings ...)
  2015-04-23 16:36   ` Greg Kroah-Hartman
@ 2015-04-23 18:33   ` Richard Weinberger
  2015-04-23 19:01     ` Greg Kroah-Hartman
  2015-04-23 18:57   ` Kdbus needs meaningful review (was: Re: [GIT PULL] kdbus for 4.1-rc1) Eric W. Biederman
  4 siblings, 1 reply; 316+ messages in thread
From: Richard Weinberger @ 2015-04-23 18:33 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski,
	LKML, Daniel Mack, David Herrmann, Djalal Harouni,
	Borislav Petkov, Steven Rostedt

On Thu, Apr 23, 2015 at 3:05 PM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> Did I miss anything else here?  Are there any technical reasons I'm
> forgetting about for why this can't be pulled in as-is for this merge
> window?

Maybe I will again be accused of ``being a jerk'', but I still dare to ask about
Boris' unanswered question:
http://marc.info/?l=linux-kernel&m=142969313220781

In fact Boris and I are currently reviewing the code but it is a slow
process as we both have day jobs...
AFAICT Steven is also looking at it.

-- 
Thanks,
//richard


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-23 17:57         ` Linus Torvalds
  2015-04-23 18:04           ` Linus Torvalds
@ 2015-04-23 18:48           ` Linus Torvalds
  1 sibling, 0 replies; 316+ messages in thread
From: Linus Torvalds @ 2015-04-23 18:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andy Lutomirski, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Thu, Apr 23, 2015 at 10:57 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Same goes for uid etc - if you are implementing a service daemon, the
> uid of the requester sure as hell makes a ton of difference in what
> you might want to expose. Things like "does this user have access
> rights to the printer?" are very natural questions to ask.

Hmm. Looking at the code, it strikes me that not only does
kdbus_meta_proc_collect() collect too much, but some of what it
collects it just seems to do *wrong*.

So I agree with collecting user and credential information (obviously
unlike some people ;), but I think the code that does it is just
wrong.

The way to collect user and credential information is very simple: you
look at "file->f_cred".

That's _it_. Nothing more. Maybe you do "get_cred(file->f_cred);" if
you have lifetimes of this after the "struct file" is gone. But you
don't copy the fields individually or willy-nilly.

That "struct cred" reference gets you all you need. It gets you the
supplementary groups. It gets you the capabilities. It gets you the
user and group id's.

And equally importantly, it gets you the namespace so that you can do
conversions to random target namespaces later, when you actually *use*
the information.

There might be some question about whether you should use
"current->cred" or "file->f_cred", but the latter is almost always the
right thing to use when you are doing file operations. The unix
filesystem security model is about permissions at open time, not at
use time.

                         Linus


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-23 18:04           ` Linus Torvalds
@ 2015-04-23 18:56             ` Greg Kroah-Hartman
  2015-04-23 19:22               ` Andy Lutomirski
  2015-04-23 20:51               ` Linus Torvalds
  0 siblings, 2 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-23 18:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Thu, Apr 23, 2015 at 11:04:36AM -0700, Linus Torvalds wrote:
> On Thu, Apr 23, 2015 at 10:57 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > If somebody is printing something, it shouldn't matter if it's "lpr"
> > or "firefox http://horses.and.trannyporn.my.little.pony.com/" that
> > does the printing.
> 
> And btw, it's not just "this is information that shouldn't be logged".
> 
> It's literally "information that should not *ever* be used". I can
> easily see some phone manufacturer deciding to do "value add" by
> adding a special case where a special vendor system manager program
> gets a back door to some service, because it needs to access the
> camera for user identification at login time, so there's some magic
> 
>    if (!strcmp(client->pid_comm, "vendor-login-pr"))
>        return ACCESS_OK;
> 
> because "it was the simplest way to do this", and the programmer knew
> it was a hack, but he needed to get it working because he had a
> deadline yesterday.
> 
> And then somebody figures this out, and makes an app that takes
> pictures on your phone surreptitiously.
> 
> No, we can't protect against vendors doing stupid things, but we very
> much also shouldn't make the kernel have interfaces that basically
> encourage people to do stupid things because they make irrelevant and
> wrongheaded data available.

Doing access control based on comm and cmdline is horrid, I totally
agree.  But any process in the system can read any other
process's comm and cmdline value out of /proc today.  So removing it
from the metadata is fine for kdbus, I can live with that, but it really
isn't "preventing" anything that's not already visible to everyone, so
someone wanting to be "bad" could always still log it or do anything
else they wanted with it.

Doesn't syslog use it today all over the place for logging stuff that
happens in the system?

Or am I missing something here?

thanks,

greg k-h


* Kdbus needs meaningful review (was: Re: [GIT PULL] kdbus for 4.1-rc1)
  2015-04-23 13:05 ` Greg Kroah-Hartman
                     ` (3 preceding siblings ...)
  2015-04-23 18:33   ` Richard Weinberger
@ 2015-04-23 18:57   ` Eric W. Biederman
  4 siblings, 0 replies; 316+ messages in thread
From: Eric W. Biederman @ 2015-04-23 18:57 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, gnomes, teg,
	jkosina, luto, linux-kernel, daniel, dh.herrmann, tixxdz

Greg Kroah-Hartman <gregkh@linuxfoundation.org> writes:

> On Mon, Apr 13, 2015 at 09:03:50PM +0200, Greg Kroah-Hartman wrote:
>> The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
>> 
>>   Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
>> 
>> are available in the git repository at:
>> 
>>   git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
>> 
>> for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
>> 
>>   kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
>> 
>> ----------------------------------------------------------------
>
> Given this has been a crazy email thread, let's try to figure out what
> the status is here.
>
> Al Viro pointed out some odd locking (r/w lock only used in write mode),
> and asked for some more documentation / description of the object model
> used here.  David provided that, and will send a minor fix for the rw
> lock, so I think that issue is now resolved.  David has created a few
> other minor changes based on Al's review that I will forward on later.

As I recall Al could not even trace through all of the locking, and that
prevented him from looking at any of the bigger issues.  That does not
qualify as something that is fixed with a little patch or two.

Perhaps with the minor patch or two Al could potentially actually review
the code.

> Andy's concerns about the capability stuff has been hashed out in
> multiple threads here.  The kernel code isn't buggy as-designed or
> implemented from what we can all tell, it's just that the new
> functionality isn't liked by everyone, which is totally fair, but not a
> reason to declare that the function isn't useful.

There are in fact implementation bugs, and your assertion that there are
none is the largest reason this code should not be merged.  You are
turning a blind eye to problems.

> Alan, and others, want a tiny, generic, multi-cast IPC method that also
> works across networks.  They feel that this is something that D-Bus
> might be able to use in the future in userspace to build on top of. 
> Lots of people have said they want something like this for years, but
> that doesn't address the issue here with kdbus, which is a very specific
> solution for a very common and wide-spread usage model that Linux
> userspace relies on today.  I too would love to see such an IPC be
> created, and two years ago thought it would be possible to achieve
> here.  But over time, and in working with the D-Bus model and
> requirements, it just didn't happen here.  Given that no one has ever
> been able to accomplish such a thing in the past means that it's either
> impossible to do, or that no one really wants such a thing bad enough to
> actually do the work :)

What is the rush?

> Did I miss anything else here?  Are there any technical reasons I'm
> forgetting about for why this can't be pulled in as-is for this merge
> window?

****The code has not been meaningfully or properly reviewed.****


Greg, you are pushing entirely too hard for this code to get in.  When
someone pushes as hard as you are doing, problems inevitably get
through.

Greg, it is my professional opinion that the code smells.  There are all
sorts of missteps and oversights that indicate that something important
has almost certainly been overlooked.

I do not believe this code is yet up to the standards we want for core
kernel code.

This code has astonishingly complex interactions with all kinds of other
kernel subsystems and concerns.  As a community we should understand
them and accept them before letting them in.


The only way I have seen anything make meaningful progress with those
kinds of interactions is for the pieces to be teased apart.  And then
the code incrementally added to with all of the right people being
pulled into the discussion.

I suspect removing all of the extensions to the capabilities of dbus-1
would be a good start for getting a piece of code that could be
meaningfully reviewed.  Then the controversial bits can be addressed on
their own.  As it is, there is too much to properly address any one
issue.


Greg this process has fundamentally not given people time to understand
the code, the interactions or the complexities.  The discussions show
that.


Further, by refusing to tease apart the pieces, by refusing to allow
other people time to understand this code, and by refusing to give an
inch and admit anyone else has a valid point, you ensure that real
problems and real issues can not be revealed and fixed.

With such a pig-headed direction I do not believe that kdbus is in any
sense ready for merging.


Eric

p.s. One of the smells that I have been talking about is that I see
in kdbus patterns of code construction that have caused real-world
performance problems and real-world security issues.  And those issues
get ignored when brought up.

p.p.s. Note that this complaint is not in any sense new; you have been
ignoring people who try to bring up meaningful issues for a long time.
The fact that when people bring up uncomfortable points about the kdbus
code they get routinely blown off certainly contributes to the lack of
meaningful review as it is not rewarding to work with someone who does
not listen to criticism.  At this point the strongest possible language
and the strongest possible push back are being used because everything
else is routinely swept under the rug.




* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-23 18:33   ` Richard Weinberger
@ 2015-04-23 19:01     ` Greg Kroah-Hartman
  0 siblings, 0 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-23 19:01 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski,
	LKML, Daniel Mack, David Herrmann, Djalal Harouni,
	Borislav Petkov, Steven Rostedt

On Thu, Apr 23, 2015 at 08:33:47PM +0200, Richard Weinberger wrote:
> On Thu, Apr 23, 2015 at 3:05 PM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > Did I miss anything else here?  Are there any technical reasons I'm
> > forgetting about for why this can't be pulled in as-is for this merge
> > window?
> 
> Maybe I get again accused of  ``being a jerk'' but I still dare to ask about
> Boris' unanswered question:
> http://marc.info/?l=linux-kernel&m=142969313220781

No, I'm not going to say you are being a jerk for asking for a response,
only when you say things like "the code is too big" :)

I'll go reply now; I didn't really understand it the first time around,
and still don't the second time either...

> In fact Boris and I are currently reviewing the code but it is a slow
> process as we both have day jobs...

That's great, thanks for doing it.

greg k-h


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-22  8:58   ` [GIT PULL] kdbus for 4.1-rc1 Borislav Petkov
@ 2015-04-23 19:14     ` Greg Kroah-Hartman
  2015-04-23 20:56       ` Borislav Petkov
  0 siblings, 1 reply; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-23 19:14 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	gnomes, teg, jkosina, luto, linux-kernel, daniel, dh.herrmann,
	tixxdz

On Wed, Apr 22, 2015 at 10:58:28AM +0200, Borislav Petkov wrote:
> On Mon, Apr 13, 2015 at 02:29:35PM -0500, Eric W. Biederman wrote:
> > And the code that transfers the meta-data is wrong.
> > 
> > It is generally not something that userspace requires today, certainly
> > userspace is not using it.
> > 
> > You are exporting a weird set of information in a unique way that makes
> > it race free enough to make ``security'' decisions upon but the data
> > in general is not appropriate to make those decisions.
> > 
> > I remain opposed to this half thought out trash of an ABI for the
> > meta-data.
> > 
> > Just because something happens to be exported in a DEBUG api today does
> > not make it appropriate for userspace to run around making security
> > decisions with that information.
> > 
> > Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> > 
> > I think it is premature to be merging kdbus.  You have fuddamental
> > issues that can not be fixed once the ABI is frozen.
> > 
> > The semantics of the meta-data you export are extremely poorly defined.
> 
> Not only that - it looks like a serious amount of work on each sent
> packet. So I did some staring, correct me if I missed something:
> 
> kdbus_cmd_send	- KDBUS_CMD_SEND, ioctl cmd, copy stuff from userspace
> |-> kdbus_kmsg_new_from_cmd(), kmalloc+memset + prepare a *lot* of stuff like:
>     |-> m->proc_meta = kdbus_meta_proc_new();
> 	m->conn_meta = kdbus_meta_conn_new();
> 	...
>     |-> kdbus_bus_broadcast(conn->ep->bus, conn, kmsg); let's look at the broadcast mode
>         |-> hash_for_each(bus->conn_hash, i, conn_dst, hentry) { 	iterate over hash buckets, O(256)

I don't know what O(256) means here, O notation usually is used to
show the complexity of a function, so this really is almost always the
same amount of time, based on using the hash function.  I've never seen
a number in O() before, but I went to school a long time ago, and
probably forgot something...

Or am I misunderstanding your note here?

> 	    |-> kdbus_meta_proc_collect(kmsg->proc_meta, attach_flags);	collect a *lot* of stuff from current etc
> 	    |-> kdbus_meta_conn_collect(kmsg->conn_meta, kmsg, conn_src, attach_flags); collect more stuff
> 
> and this happens on *every* send. A *lot* of work.

Yes, this looks like a lot of stuff, but it's still really fast.  And we
need it.

> Now multiply that by the amount of messages this thing is going to send
> per second. It piles up. So you have the overhead right then and there
> in the design without even being able to fix it. Or at least pretty damn
> hard to fix.

It's way faster than what we have today, and David has found a few
areas that can go faster, so I don't really understand the objection.
If you can come up with a faster way to do this, that would be great and
most appreciated.

> So unless I'm missing something, this right there is a design problem.
> 
> Why can't this messaging be done with a nifty O(1) scheme like sending
> parties issuing auth tokens and whatever and the kernel doing the
> arbitration and distribution of those tokens?

Hm, this seems to me to be O(1), pretty constant, we do the same amount
of work all the time.  Then we send the message to the people listening
to it (so that is O(n) depending on the number of listeners, really the
best that I think you can get).

Or am I misunderstanding what you are asking for here?

> That gets you sandboxing, dropping privileges and whatever else fancy
> containers people wanna do for free. Token recipient has the token -
> that's all that counts.

I don't understand what a token provides that is different from what is
happening here, please explain.  How can that be faster than what we do
today?

confused,

greg k-h


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-23 18:56             ` Greg Kroah-Hartman
@ 2015-04-23 19:22               ` Andy Lutomirski
  2015-04-23 19:33                 ` Greg KH
  2015-04-23 20:51               ` Linus Torvalds
  1 sibling, 1 reply; 316+ messages in thread
From: Andy Lutomirski @ 2015-04-23 19:22 UTC (permalink / raw)
  To: Greg KH
  Cc: One Thousand Gnomes, Arnd Bergmann, Linus Torvalds,
	Tom Gundersen, linux-kernel, Jiri Kosina, David Herrmann,
	Eric W. Biederman, Andrew Morton, Djalal Harouni, Daniel Mack

On Apr 23, 2015 11:56 AM, "Greg Kroah-Hartman"
<gregkh@linuxfoundation.org> wrote:
>
> On Thu, Apr 23, 2015 at 11:04:36AM -0700, Linus Torvalds wrote:
> > On Thu, Apr 23, 2015 at 10:57 AM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > If somebody is printing something, it shouldn't matter if it's "lpr"
> > > or "firefox http://horses.and.trannyporn.my.little.pony.com/" that
> > > does the printing.
> >
> > And btw, it's not just "this is information that shouldn't be logged".
> >
> > It's literally "information that should not *ever* be used". I can
> > easily see some phone manufacturer deciding to do "value add" by
> > adding a special case where a special vendor system manager program
> > gets a back door to some service, because it needs to access the
> > camera for user identification at login time, so there's some magic
> >
> >    if (!strcmp(client->pid_comm, "vendor-login-pr"))
> >        return ACCESS_OK;
> >
> > because "it was the simplest way to do this", and the programmer knew
> > it was a hack, but he needed to get it working because he had a
> > deadline yesterday.
> >
> > And then somebody figures this out, and makes an app that takes
> > pictures on your phone surreptitiously.
> >
> > No, we can't protect against vendors doing stupid things, but we very
> > much also shouldn't make the kernel have interfaces that basically
> > encourage people to do stupid things because they make irrelevant and
> > wrongheaded data available.
>
> Doing access control based on comm and cmdline is horrid, I totally
> agree.  But right now, any process in the system can read any other
> process's comm and cmdline value out of /proc today.  So removing it
> from the metadata is fine for kdbus, I can live with that, but it really
> isn't "preventing" anything that's not already visible to everyone, so
> if someone wanting to be "bad" could always still log it or do anything
> else they wanted with it.

I feel like a broken record.  This isn't true in general.  SELinux can
and, I believe, often does prevent this.

--Andy


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-23 17:42         ` Stephen Smalley
@ 2015-04-23 19:30           ` Greg Kroah-Hartman
  2015-04-24  2:08             ` Karol Lewandowski
  0 siblings, 1 reply; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-23 19:30 UTC (permalink / raw)
  To: Stephen Smalley, Karol Lewandowski
  Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Thu, Apr 23, 2015 at 01:42:25PM -0400, Stephen Smalley wrote:
> On 04/23/2015 01:16 PM, Greg Kroah-Hartman wrote:
> > The binder developers at Samsung have stated that the implementation we
> > have here works for their model as well, so I guess that is some kind of
> > verification it's not entirely tied to D-Bus.  They have plans on
> > dropping the existing binder kernel code and using the kdbus code
> > instead when it is merged.
> 
> Where do things stand wrt LSM hooks for kdbus?  I don't see any security
> hook calls in the kdbus tree except for the purpose of metadata
> collection of process security labels.  But nothing for enforcing MAC
> over kdbus IPC.  binder has a set of security hooks for that purpose, so
> it would be a regression wrt MAC enforcement to switch from binder to
> kdbus without equivalent checking there.

There was a set of LSM hooks proposed for kdbus posted by Karol
Lewandowski last October, and it also included SELinux and Smack patches.
They were going to be refreshed based on the latest code changes, but I
haven't seen them posted, or I can't seem to find them in my limited
email archive.

Karol, what's the status of them?

thanks,

greg k-h


^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-23 19:22               ` Andy Lutomirski
@ 2015-04-23 19:33                 ` Greg KH
  2015-04-23 20:53                   ` Linus Torvalds
  0 siblings, 1 reply; 316+ messages in thread
From: Greg KH @ 2015-04-23 19:33 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: One Thousand Gnomes, Arnd Bergmann, Linus Torvalds,
	Tom Gundersen, linux-kernel, Jiri Kosina, David Herrmann,
	Eric W. Biederman, Andrew Morton, Djalal Harouni, Daniel Mack

On Thu, Apr 23, 2015 at 12:22:10PM -0700, Andy Lutomirski wrote:
> On Apr 23, 2015 11:56 AM, "Greg Kroah-Hartman"
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Thu, Apr 23, 2015 at 11:04:36AM -0700, Linus Torvalds wrote:
> > > On Thu, Apr 23, 2015 at 10:57 AM, Linus Torvalds
> > > <torvalds@linux-foundation.org> wrote:
> > > >
> > > > If somebody is printing something, it shouldn't matter if it's "lpr"
> > > > or "firefox http://horses.and.trannyporn.my.little.pony.com/" that
> > > > does the printing.
> > >
> > > And btw, it's not just "this is information that shouldn't be logged".
> > >
> > > It's literally "information that should not *ever* be used". I can
> > > easily see some phone manufacturer deciding to do "value add" by
> > > adding a special case where a special vendor system manager program
> > > gets a back door to some service, because it needs to access the
> > > camera for user identification at login time, so there's some magic
> > >
> > >    if (!strcmp(client->pid_comm, "vendor-login-pr"))
> > >        return ACCESS_OK;
> > >
> > > because "it was the simplest way to do this", and the programmer knew
> > > it was a hack, but he needed to get it working because he had a
> > > deadline yesterday.
> > >
> > > And then somebody figures this out, and makes an app that takes
> > > pictures on your phone surreptitiously.
> > >
> > > No, we can't protect against vendors doing stupid things, but we very
> > > much also shouldn't make the kernel have interfaces that basically
> > > encourage people to do stupid things because they make irrelevant and
> > > wrongheaded data available.
> >
> > Doing access control based on comm and cmdline is horrid, I totally
> > agree.  But right now, any process in the system can read any other
> > process's comm and cmdline value out of /proc today.  So removing it
> > from the metadata is fine for kdbus, I can live with that, but it really
> > isn't "preventing" anything that's not already visible to everyone, so
> > if someone wanting to be "bad" could always still log it or do anything
> > else they wanted with it.
> 
> I feel like a broken record.  This isn't true in general.

Works on my box :)

> Selinux can and, I believe, often does prevent this.

Ok, then the LSM patches for kdbus should be able to also mediate this
as well if needed.  I haven't looked at the LSM kdbus patches in a long
time, so I don't remember exactly what they were looking at.

Again, I don't object to dropping this in kdbus, just confused as this
seemed to me to be something that is always available to all processes
anyway, we weren't adding something previously "hidden".

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-23 18:56             ` Greg Kroah-Hartman
  2015-04-23 19:22               ` Andy Lutomirski
@ 2015-04-23 20:51               ` Linus Torvalds
  1 sibling, 0 replies; 316+ messages in thread
From: Linus Torvalds @ 2015-04-23 20:51 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andy Lutomirski, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Thu, Apr 23, 2015 at 11:56 AM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> Doing access control based on comm and cmdline is horrid, I totally
> agree.  But right now, any process in the system can read any other
> process's comm and cmdline value out of /proc today.

You have to work extra hard for it, and it's preventable anyway (ie selinux).

In contrast, with the information in the kdbus message, it's almost
certain that any random "enable debugging for dbus" patch will start
logging it, because "it's just there".

That's a big difference. Most bugs and security issues come because
people make trivial mistakes, not because people
explicitly go out of their way to make them.

> Doesn't syslog uses it today all over the place for logging stuff that
> happens in the system?

Hell no.

Sure, if an application explicitly says "log this message", then we
save the application name. But not for random system interactions.

The example Andy gave about doing things like name lookup is a good
one. Doesn't systemd already do a dns cache module?

Doing a name lookup is some *seriously* different thing than using
"syslog()" to explicitly log messages.

And if kdbus people can't see that difference, I don't see what we can
discuss here. Do you really not see the privacy implications? It turns
privacy violations from "you have to actually work at it" to "they
happen pretty much by mistake".

                           Linus

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-23 19:33                 ` Greg KH
@ 2015-04-23 20:53                   ` Linus Torvalds
  0 siblings, 0 replies; 316+ messages in thread
From: Linus Torvalds @ 2015-04-23 20:53 UTC (permalink / raw)
  To: Greg KH
  Cc: Andy Lutomirski, One Thousand Gnomes, Arnd Bergmann,
	Tom Gundersen, linux-kernel, Jiri Kosina, David Herrmann,
	Eric W. Biederman, Andrew Morton, Djalal Harouni, Daniel Mack

On Thu, Apr 23, 2015 at 12:33 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Thu, Apr 23, 2015 at 12:22:10PM -0700, Andy Lutomirski wrote:
>
>> Selinux can and, I believe, often does prevent this.
>
> Ok, then the LSM patches for kdbus should be able to also mediate this
> as well if needed.

No Greg.

Just remove the shit. Really. Take out the command line and the task
name. You already admitted that there is no actual valid use for it.

We don't add crap that then has to be disabled with security rules
just because it was a bad interface. Just make the interface not do it
in the first place. It's that simple.

                        Linus

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-23 19:14     ` Greg Kroah-Hartman
@ 2015-04-23 20:56       ` Borislav Petkov
  2015-04-23 21:22         ` David Herrmann
  2015-04-24  6:36         ` Greg Kroah-Hartman
  0 siblings, 2 replies; 316+ messages in thread
From: Borislav Petkov @ 2015-04-23 20:56 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	gnomes, teg, jkosina, luto, linux-kernel, daniel, dh.herrmann,
	tixxdz

On Thu, Apr 23, 2015 at 09:14:33PM +0200, Greg Kroah-Hartman wrote:
> I don't know what O(256) means here, O notation usually is used to
> show the complexity of a function, so this really is almost always the
> same amount of time, based on using the hash function.

This is iterating over 256 hash buckets. So O(n) complexity. Better?

> Yes, these looks like a lot of stuff but it's still really fast.

"really fast" - that's the right way to quantify things, right? Let me
reply in your terms: "no, it is dumb and slow".

> And we need it.

*Of* *course* you need it, what else. Lemme guess: there's no other
way to do this than the way it was done now, right? And we should stop
asking such stupid questions and accept it... Yeah, of course.

> Hm, this seems to be to be O(1), pretty constant, we do the same amount
> of work all the time.

The same *pile* of unnecessary and needless work. You go and collect
*all* that data on *every* packet send?!

How many packets per second are we talking here? 100, 1000, 10000...?

Let's say you're "really fast" because you've bought a "bigger machine"
and do that information collection per packet for, say 10 microseconds
(I'm probably too generous here but whatever).

So at peak rates of 10000 packets per second, and 10µs preparation time
per packet, you're wasting 100000 µs == 100 msec, i.e. for 1/10th of
every second you're busy only with sending packets.
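
The arithmetic above can be written out as a small sketch. The function
names are made up for illustration, and the inputs are the hypothetical
figures from the paragraph above, not measurements:

```c
#include <stdint.h>

/*
 * Microseconds of CPU time spent on per-packet preparation during one
 * second of wall time. Both inputs are illustrative assumptions.
 */
static uint64_t prep_usec_per_sec(uint64_t packets_per_sec,
				  uint64_t usec_per_packet)
{
	return packets_per_sec * usec_per_packet;
}

/* The same figure as a percentage of one CPU (1 s == 1000000 us). */
static double prep_cpu_percent(uint64_t packets_per_sec,
			       uint64_t usec_per_packet)
{
	return 100.0 * (double)prep_usec_per_sec(packets_per_sec,
						 usec_per_packet) / 1e6;
}
```

With 10000 packets per second and 10 µs per packet this gives 100000 µs,
i.e. 10% of one CPU, before the receiving side has done anything.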

Hmm, but then the receiving side needs CPU time too...

Oh yeah, and then those pesky userspace processes need some CPU time
too...

Are you really serious or is this some tactic of deliberately asking
dumb questions? Let me know now so that I can stop wasting my time.

> I don't understand what a token provides that is different from what is
> happening here, please explain.  How can that be faster than what we do
> today?

A token-based scheme would give you significantly less traffic, would
let you distribute tokens into sandboxes, containers, etc. for free, and
would let you throw the metadata collecting in the garbage can:

Example:

* A daemon issues a token, say, a capability to reboot.

* It gives that token (with the kernel as intermediary) to a recipient
  which should be allowed to reboot.

* recipient can drop privileges, run in a sandbox, whatever, it still
  has that token.

That's exactly one packet sent *without* any information collection.
Recipient has to authenticate itself to the kernel when requesting the
packet.

Clean and simple.
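
A minimal userspace sketch of that flow, assuming a made-up token
structure and helper names (none of this is kdbus or kernel API; it only
illustrates that the permission check looks at the token and not at any
sender metadata):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical right carried by a token. */
enum token_right { TOKEN_REBOOT = 1 << 0 };

struct token {
	uint64_t id;      /* opaque identifier chosen by the issuer */
	uint32_t rights;  /* bitmask of enum token_right */
	int holder;       /* hypothetical peer id holding it, -1 if none */
};

/* The daemon issues a token; nobody holds it yet. */
static struct token token_issue(uint64_t id, uint32_t rights)
{
	struct token t = { .id = id, .rights = rights, .holder = -1 };
	return t;
}

/* The intermediary (the kernel, in the text above) records the holder. */
static void token_grant(struct token *t, int peer)
{
	t->holder = peer;
}

/* The check depends only on who holds the token and what it grants. */
static bool token_allows(const struct token *t, int peer, uint32_t right)
{
	return t->holder == peer && (t->rights & right) == right;
}
```

The recipient keeps the token across privilege drops or sandboxing,
because nothing in token_allows() depends on the recipient's credentials
at the time of the check.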

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-23 20:56       ` Borislav Petkov
@ 2015-04-23 21:22         ` David Herrmann
  2015-04-23 21:33           ` Richard Weinberger
  2015-04-23 21:41           ` Borislav Petkov
  2015-04-24  6:36         ` Greg Kroah-Hartman
  1 sibling, 2 replies; 316+ messages in thread
From: David Herrmann @ 2015-04-23 21:22 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Greg Kroah-Hartman, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, Andy Lutomirski, linux-kernel, Daniel Mack,
	Djalal Harouni

Hi

On Thu, Apr 23, 2015 at 10:56 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Apr 23, 2015 at 09:14:33PM +0200, Greg Kroah-Hartman wrote:
>> I don't know what O(256) means here, O notation usually is used to
>> show the complexity of a function, so this really is almost always the
>> same amount of time, based on using the hash function.
>
> This is iterating over 256 hash buckets. So O(n) complexity. Better?

No it's not. O(256) equals O(1).

Thanks
David

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-23 21:22         ` David Herrmann
@ 2015-04-23 21:33           ` Richard Weinberger
  2015-04-24 14:02             ` Steven Rostedt
  2015-04-23 21:41           ` Borislav Petkov
  1 sibling, 1 reply; 316+ messages in thread
From: Richard Weinberger @ 2015-04-23 21:33 UTC (permalink / raw)
  To: David Herrmann
  Cc: Borislav Petkov, Greg Kroah-Hartman, Eric W. Biederman,
	Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski,
	linux-kernel, Daniel Mack, Djalal Harouni

On Thu, Apr 23, 2015 at 11:22 PM, David Herrmann <dh.herrmann@gmail.com> wrote:
> Hi
>
> On Thu, Apr 23, 2015 at 10:56 PM, Borislav Petkov <bp@alien8.de> wrote:
>> On Thu, Apr 23, 2015 at 09:14:33PM +0200, Greg Kroah-Hartman wrote:
>>> I don't know what O(256) means here, O notation usually is used to
>>> show the complexity of a function, so this really is almost always the
>>> same amount of time, based on using the hash function.
>>
>> This is iterating over 256 hash buckets. So O(n) complexity. Better?
>
> No it's not. O(256) equals O(1).

Yeah, that's absolutely correct.
I think Boris wanted to say that iterating over all hash buckets
can be costly.

-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-23 21:22         ` David Herrmann
  2015-04-23 21:33           ` Richard Weinberger
@ 2015-04-23 21:41           ` Borislav Petkov
  2015-04-24  5:02             ` Steven Noonan
  1 sibling, 1 reply; 316+ messages in thread
From: Borislav Petkov @ 2015-04-23 21:41 UTC (permalink / raw)
  To: David Herrmann
  Cc: Greg Kroah-Hartman, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, Andy Lutomirski, linux-kernel, Daniel Mack,
	Djalal Harouni

On Thu, Apr 23, 2015 at 11:22:39PM +0200, David Herrmann wrote:
> No it's not. O(256) equals O(1).

Ok, you're right. Maybe O() was not the right thing to use when trying
to point out that iterating over 256 hash buckets and then following the
chain in each bucket per packet broadcast looks like a lot.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-23 19:30           ` Greg Kroah-Hartman
@ 2015-04-24  2:08             ` Karol Lewandowski
  2015-04-29 21:16               ` Paul Moore
  0 siblings, 1 reply; 316+ messages in thread
From: Karol Lewandowski @ 2015-04-24  2:08 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Paul Osmialowski
  Cc: Stephen Smalley, Karol Lewandowski, Andy Lutomirski,
	Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni, k.lewandowsk

On Thu, Apr 23, 2015 at 09:30:13PM +0200, Greg Kroah-Hartman wrote:
> On Thu, Apr 23, 2015 at 01:42:25PM -0400, Stephen Smalley wrote:
> > On 04/23/2015 01:16 PM, Greg Kroah-Hartman wrote:
> > > The binder developers at Samsung have stated that the implementation we
> > > have here works for their model as well, so I guess that is some kind of
> > > verification it's not entirely tied to D-Bus.  They have plans on
> > > dropping the existing binder kernel code and using the kdbus code
> > > instead when it is merged.
> > 
> > Where do things stand wrt LSM hooks for kdbus?  I don't see any security
> > hook calls in the kdbus tree except for the purpose of metadata
> > collection of process security labels.  But nothing for enforcing MAC
> > over kdbus IPC.  binder has a set of security hooks for that purpose, so
> > it would be a regression wrt MAC enforcement to switch from binder to
> > kdbus without equivalent checking there.
> 
> There was a set of LSM hooks proposed for kdbus posted by Karol
> Lewandowski last October, and it also included SELinux and Smack patches.
> They were going to be refreshed based on the latest code changes, but I
> haven't seen them posted, or I can't seem to find them in my limited
> email archive.

We have been waiting for the right moment with these. :-)

> Karol, what's the status of them?

I have handed the patchset over to Paul Osmialowski, who started reworking it
for v4 relatively recently.  I think it shouldn't be that hard to post an
updated version...

Paul?


^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-23 21:41           ` Borislav Petkov
@ 2015-04-24  5:02             ` Steven Noonan
  2015-04-24  9:04               ` Borislav Petkov
  0 siblings, 1 reply; 316+ messages in thread
From: Steven Noonan @ 2015-04-24  5:02 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: David Herrmann, Greg Kroah-Hartman, Eric W. Biederman,
	Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski,
	linux-kernel, Daniel Mack, Djalal Harouni

On Thu, Apr 23, 2015 at 2:41 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Thu, Apr 23, 2015 at 11:22:39PM +0200, David Herrmann wrote:
>> No it's not. O(256) equals O(1).
>
> Ok, you're right. Maybe O() was not the right thing to use when trying
> to point out that iterating over 256 hash buckets and then following the
> chain in each bucket per packet broadcast looks like a lot.
>

Heh. I guess you could call it an "expensive O(1)". While big-O
notation is useful for describing algorithm scalability with respect
to input size, it falls flat on its face when trying to articulate
impact in measurable units.

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-23 20:56       ` Borislav Petkov
  2015-04-23 21:22         ` David Herrmann
@ 2015-04-24  6:36         ` Greg Kroah-Hartman
  2015-04-24  6:45           ` Greg Kroah-Hartman
  1 sibling, 1 reply; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-24  6:36 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	gnomes, teg, jkosina, luto, linux-kernel, daniel, dh.herrmann,
	tixxdz

On Thu, Apr 23, 2015 at 10:56:40PM +0200, Borislav Petkov wrote:
> > Hm, this seems to be to be O(1), pretty constant, we do the same amount
> > of work all the time.
> 
> The same *pile* of unnecessary and needless work. You go and collect
> *all* that data on *every* packet send?!

No, not at all, the metadata is cached, we only collect that for the
first message sent, if we didn't know it already, or we do it on the
"open" of the connection, depending on what we are gathering metadata
for.

The mc->collected test right before collecting the specific metadata is
that "cached or not" test.

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-24  6:36         ` Greg Kroah-Hartman
@ 2015-04-24  6:45           ` Greg Kroah-Hartman
  2015-04-24  7:27             ` Martin Steigerwald
  2015-04-24  8:35             ` Greg Kroah-Hartman
  0 siblings, 2 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-24  6:45 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	gnomes, teg, jkosina, luto, linux-kernel, daniel, dh.herrmann,
	tixxdz

On Fri, Apr 24, 2015 at 08:36:03AM +0200, Greg Kroah-Hartman wrote:
> On Thu, Apr 23, 2015 at 10:56:40PM +0200, Borislav Petkov wrote:
> > > Hm, this seems to be to be O(1), pretty constant, we do the same amount
> > > of work all the time.
> > 
> > The same *pile* of unnecessary and needless work. You go and collect
> > *all* that data on *every* packet send?!
> 
> No, not at all, the metadata is cached, we only collect that for the
> first message sent, if we didn't know it already, or we do it on the
> "open" of the connection, depending on what we are gathering metadata
> for.
> 
> The mc->collected test right before collecting the specific metadata is
> that "cached or not" test.

Oh wait, no, there is some send-time metadata that is collected for
every message, see Linus's email for more details about that.  Maybe
this can be changed to cache things even more than we currently do.

it's early, shouldn't write emails before coffee...

David had some flamegraphs floating around that showed where all the
time on transmit / receive was being spent, and I don't think that the
metadata area was all that relevant, but I can't find them anymore to
say for sure.  There are other areas that can be sped up on the send
path, but perf data is the best way to verify this.

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-24  6:45           ` Greg Kroah-Hartman
@ 2015-04-24  7:27             ` Martin Steigerwald
  2015-04-24  8:35             ` Greg Kroah-Hartman
  1 sibling, 0 replies; 316+ messages in thread
From: Martin Steigerwald @ 2015-04-24  7:27 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Borislav Petkov, Eric W. Biederman, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, gnomes, teg, jkosina, luto,
	linux-kernel, daniel, dh.herrmann, tixxdz

Am Freitag, 24. April 2015, 08:45:15 schrieb Greg Kroah-Hartman:
> On Fri, Apr 24, 2015 at 08:36:03AM +0200, Greg Kroah-Hartman wrote:
> > On Thu, Apr 23, 2015 at 10:56:40PM +0200, Borislav Petkov wrote:
> > > > Hm, this seems to be to be O(1), pretty constant, we do the same
> > > > amount
> > > > of work all the time.
> > > 
> > > The same *pile* of unnecessary and needless work. You go and collect
> > > *all* that data on *every* packet send?!
> > 
> > No, not at all, the metadata is cached, we only collect that for the
> > first message sent, if we didn't know it already, or we do it on the
> > "open" of the connection, depending on what we are gathering metadata
> > for.
> > 
> > The mc->collected test right before collecting the specific metadata
> > is
> > that "cached or not" test.
> 
> Oh wait, no, there are some send-time metadata that is collected for
> every message, see Linus's email for more details about that.  Maybe
> this can be changed to cache things even more than we currently do.
> 
> it's early, shouldn't write emails before coffee...
> 
> David had some flamegraphs floating around that showed where all the
> time on transmit / receive was being spent, and I don't think that the
> metadata area was all that relevant, but I can't find them anymore to
> say for sure.  There are other areas that can be sped up on the send
> path, but perf data is the best way to verify this.

I think that's exactly the data that others have asked for several times, 
so I think it would be good to find it again.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-24  6:45           ` Greg Kroah-Hartman
  2015-04-24  7:27             ` Martin Steigerwald
@ 2015-04-24  8:35             ` Greg Kroah-Hartman
  1 sibling, 0 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-24  8:35 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	gnomes, teg, jkosina, luto, linux-kernel, daniel, dh.herrmann,
	tixxdz

On Fri, Apr 24, 2015 at 08:45:15AM +0200, Greg Kroah-Hartman wrote:
> On Fri, Apr 24, 2015 at 08:36:03AM +0200, Greg Kroah-Hartman wrote:
> > On Thu, Apr 23, 2015 at 10:56:40PM +0200, Borislav Petkov wrote:
> > > > Hm, this seems to be to be O(1), pretty constant, we do the same amount
> > > > of work all the time.
> > > 
> > > The same *pile* of unnecessary and needless work. You go and collect
> > > *all* that data on *every* packet send?!
> > 
> > No, not at all, the metadata is cached, we only collect that for the
> > first message sent, if we didn't know it already, or we do it on the
> > "open" of the connection, depending on what we are gathering metadata
> > for.
> > 
> > The mc->collected test right before collecting the specific metadata is
> > that "cached or not" test.
> 
> Oh wait, no, there are some send-time metadata that is collected for
> every message, see Linus's email for more details about that.  Maybe
> this can be changed to cache things even more than we currently do.
> 
> it's early, shouldn't write emails before coffee...
> 
> David had some flamegraphs floating around that showed where all the
> time on transmit / receive was being spent, and I don't think that the
> metadata area was all that relevant, but I can't find them anymore to
> say for sure.  There are other areas that can be sped up on the send
> path, but perf data is the best way to verify this.

Here are the graphs that he posted during the last code review cycle that
are relevant here:
	http://lkml.iu.edu/hypermail/linux/kernel/1503.2/02624.html

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-24  5:02             ` Steven Noonan
@ 2015-04-24  9:04               ` Borislav Petkov
  2015-04-24 10:28                 ` Daniel Mack
  0 siblings, 1 reply; 316+ messages in thread
From: Borislav Petkov @ 2015-04-24  9:04 UTC (permalink / raw)
  To: Steven Noonan
  Cc: David Herrmann, Greg Kroah-Hartman, Eric W. Biederman,
	Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski,
	linux-kernel, Daniel Mack, Djalal Harouni

On Thu, Apr 23, 2015 at 10:02:52PM -0700, Steven Noonan wrote:
> On Thu, Apr 23, 2015 at 2:41 PM, Borislav Petkov <bp@alien8.de> wrote:
> > On Thu, Apr 23, 2015 at 11:22:39PM +0200, David Herrmann wrote:
> >> No it's not. O(256) equals O(1).
> >
> > Ok, you're right. Maybe O() was not the right thing to use when trying
> > to point out that iterating over 256 hash buckets and then following the
> > chain in each bucket per packet broadcast looks like a lot.
> >
> 
> Heh. I guess you could call it an "expensive O(1)". While big-O
> notation is useful for describing algorithm scalability with respect
> to input size, it falls flat on its face when trying to articulate
> impact in measurable units.

Right, so in thinking about this more today, on a fresh head, it still
is O(n) because we do broadcast the packet to n recipients - the
hash_for_each() thing iterates over 256 hash buckets and also follows
the linked list chain in each bucket. Its length depends on how
many connections are in the bucket, i.e. recipients. And I'd guess that
number changes dynamically so probably linear.
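
To make the cost concrete, here is a plain C stand-in for that walk. It
is not the kernel's hash_for_each() and the struct names are invented;
it only counts how many steps a full walk takes for a fixed 256-bucket
table holding n connections:

```c
#include <stddef.h>

#define NBUCKETS 256

struct conn {
	struct conn *next;  /* chain within one bucket */
	unsigned int id;
};

struct bus {
	struct conn *bucket[NBUCKETS];
};

/* Insert a connection at the head of the bucket selected by its id. */
static void bus_add(struct bus *b, struct conn *c, unsigned int id)
{
	c->id = id;
	c->next = b->bucket[id % NBUCKETS];
	b->bucket[id % NBUCKETS] = c;
}

/*
 * Walk every bucket and every chain, as a broadcast would.
 * Returns the number of steps taken; *n_conns gets the connection count.
 */
static size_t broadcast_walk(const struct bus *b, size_t *n_conns)
{
	size_t steps = 0, conns = 0;

	for (size_t i = 0; i < NBUCKETS; i++) {
		steps++;  /* inspect the bucket head, even if empty */
		for (const struct conn *c = b->bucket[i]; c; c = c->next) {
			steps++;
			conns++;
		}
	}
	*n_conns = conns;
	return steps;
}
```

The step count is always 256 + n: a constant term for the buckets plus a
term that grows linearly with the number of connections.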

And then there's the collection of, let's call it metadata of
questionable use, *per* packet which is pretty expensive in my book.
It becomes even more expensive if it is completely useless as in, the
receiving side doesn't need it all.

Now, one might argue that you have to do O(n) work when broadcasting
to n recipients anyway and you can't get that cheaper but maybe the
design is not optimal. Maybe it could be made to not broadcast at all,
or broadcast to a subset of recipients, only those which are actually
interested in the broadcast.

That's why I was looking at some simple token-based schemes. And that's
why I think Andy has some very cool ideas which we should definitely pay
attention to:

https://lkml.kernel.org/r/CALCETrXXUiYKAhsXsdqH2uZMddDhK5hX6V9%2BrZcHwa1X5WC%2B1g@mail.gmail.com

before we go and commit this thing and cast it in stone. Because if it goes
in, there's no changing it because we'll then be breaking userspace and
that's a no-no.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-24  9:04               ` Borislav Petkov
@ 2015-04-24 10:28                 ` Daniel Mack
  2015-04-24 10:50                   ` Borislav Petkov
  0 siblings, 1 reply; 316+ messages in thread
From: Daniel Mack @ 2015-04-24 10:28 UTC (permalink / raw)
  To: Borislav Petkov, Steven Noonan
  Cc: David Herrmann, Greg Kroah-Hartman, Eric W. Biederman,
	Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski,
	linux-kernel, Djalal Harouni

Hi,

On 04/24/2015 11:04 AM, Borislav Petkov wrote:
> On Thu, Apr 23, 2015 at 10:02:52PM -0700, Steven Noonan wrote:
>> On Thu, Apr 23, 2015 at 2:41 PM, Borislav Petkov <bp@alien8.de> wrote:
>>> On Thu, Apr 23, 2015 at 11:22:39PM +0200, David Herrmann wrote:
>>>> No it's not. O(256) equals O(1).
>>>
>>> Ok, you're right. Maybe O() was not the right thing to use when trying
>>> to point out that iterating over 256 hash buckets and then following the
>>> chain in each bucket per packet broadcast looks like a lot.
>>>
>>
>> Heh. I guess you could call it an "expensive O(1)". While big-O
>> notation is useful for describing algorithm scalability with respect
>> to input size, it falls flat on its face when trying to articulate
>> impact in measurable units.
> 
> Right, so in thinking about this more today, on a fresh head, it still
> is O(n) because we do broadcast the packet to n recipients - the
> hash_for_each() thing iterates over 256 hash buckets and also follows
> the linked list chain in each bucket. Its length is depending on how
> many connections are in the bucket, i.e. recipients. And I'd guess that
> number changes dynamically so probably linear.

Sure, for broadcasts, we have to walk the list of peers connected to the
bus and see which one is interested in a particular message. We do that
by looking at the match rules of each of them, which are based on
well-known names, IDs, notification types or bloom filters. The policy
logic limits this further, as receivers of a broadcast must have TALK
access to the sender.

If these rules let a message pass, all the metadata that the receiving
peer asked for (by setting a flag at connect time) is collected, unless
it has been collected already for some other peer for the same message.
In other words, in worst case, we collect all the metadata items exactly
once per message.

If none of the connections with permissive match/policy rules for a
message is interested in any metadata items, nothing will be collected
at all.
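
The "at most once per message" behaviour described above can be sketched
with a bitmask of already collected items. The flag names and the struct
are assumptions for illustration, not the actual kdbus definitions:

```c
#include <stdint.h>

/* Hypothetical metadata item flags. */
#define META_CREDS  (UINT64_C(1) << 0)
#define META_PIDS   (UINT64_C(1) << 1)
#define META_CGROUP (UINT64_C(1) << 2)

struct meta {
	uint64_t collected;          /* items already gathered for this message */
	unsigned int collect_calls;  /* how many items were actually gathered */
};

/*
 * Gather the items in 'what' that are not collected yet. Items already
 * collected for an earlier receiver of the same message are skipped.
 */
static void meta_collect(struct meta *m, uint64_t what)
{
	uint64_t missing = what & ~m->collected;

	for (uint64_t bit = 1; bit != 0; bit <<= 1) {
		if (missing & bit)
			m->collect_calls++;  /* stand-in for the real work */
	}
	m->collected |= missing;
}
```

Calling meta_collect() once per interested receiver means the work is
bounded by the number of distinct items, not by the number of receivers,
and a receiver that asks for nothing triggers no collection.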

The reason why the peers are organized in a hash table is that we have
to look them up by ID for unicast messages.

> And then there's the collection of, let's call it metadata of
> questionable use, *per* packet which is pretty expensive in my book.
> It becomes even more expensive if it is completely useless as in, the
> receiving side doesn't need it all.

If the receiving side doesn't need it, it shouldn't opt-in for that
piece of information.

The metadata logic is really only there so receiving peers are directly
supplied with information that they would otherwise look up themselves
from /proc or something. Also, we collect metadata at send time and for
every message intentionally, so that it reflects the state of the sender
at the time of sending. This way, the information is not subject to
races of asynchronous lookups.

> Now, one might argue that you have to do O(n) work when broadcasting
> to n recipients anyway and you can't get that cheaper but maybe the
> design is not optimal. Maybe it could be made to not broadcast at all,
> or broadcast to a subset of recipients, only those which are actually
> interested in the broadcast.

That's exactly what happens :) There are some more details on this in
kdbus.match(7).


Thanks,
Daniel


^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-24 10:28                 ` Daniel Mack
@ 2015-04-24 10:50                   ` Borislav Petkov
  2015-04-24 11:26                     ` Daniel Mack
  0 siblings, 1 reply; 316+ messages in thread
From: Borislav Petkov @ 2015-04-24 10:50 UTC (permalink / raw)
  To: Daniel Mack
  Cc: Steven Noonan, David Herrmann, Greg Kroah-Hartman,
	Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski,
	linux-kernel, Djalal Harouni

Hi,

On Fri, Apr 24, 2015 at 12:28:54PM +0200, Daniel Mack wrote:
> Sure, for broadcasts, we have to walk the list of peers connected to the
> bus and see which one is interested in a particular message. We do that

And this "... we have to walk the list ..." right there raises the
alarm. Can this walking of elements where you know they wouldn't match
be avoided?

> by looking at the match rules of each of them, which are based on
> well-known names, IDs, notification types or bloom filters. The policy
> logic limits this further, as receivers of a broadcast must have TALK
> access to the sender.

So it sounds to me like there are characteristics which can already
prepare lists of recipients interested in some sort of message. So
would it be possible for recipients to "register" for such messages
and the sending side would simply iterate a list of solely interested
recipients?

This will definitely save you the iteration over all n connections and
would make the metadata collection probably not needed (or at least a
subset of it) because recipients will have to establish eligibility for
receiving a certain message at register time and once they're on the
list, you implicitly know why they're there.

I don't know whether that fits all use cases but it definitely does only
the *necessary* work for message transfer and not more.
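
The registration idea can be sketched roughly as follows. This is only an illustration of per-kind subscriber lists, not how kdbus is implemented; the names (struct registry, registry_add, registry_broadcast) and the fixed limits are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TYPES 4	/* hypothetical number of message kinds */
#define MAX_SUBS  8	/* hypothetical per-kind subscriber limit */

struct registry {
	uint64_t subs[MAX_TYPES][MAX_SUBS];	/* connection IDs per kind */
	size_t count[MAX_TYPES];		/* used slots per kind */
};

/* Register connection @id for messages of @type; returns 0 or -1. */
static int registry_add(struct registry *r, unsigned int type, uint64_t id)
{
	if (type >= MAX_TYPES || r->count[type] >= MAX_SUBS)
		return -1;
	r->subs[type][r->count[type]++] = id;
	return 0;
}

/*
 * Broadcast: visit only the connections registered for @type and
 * return how many were visited. @deliver may be NULL.
 */
static size_t registry_broadcast(const struct registry *r, unsigned int type,
				 void (*deliver)(uint64_t id, void *ctx),
				 void *ctx)
{
	size_t i;

	if (type >= MAX_TYPES)
		return 0;
	for (i = 0; i < r->count[type]; i++)
		if (deliver)
			deliver(r->subs[type][i], ctx);
	return r->count[type];
}
```

The sender's cost then scales with the number of registered recipients for that kind rather than with the number of connections on the bus.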

> If these rules let a message pass, all the metadata that the receiving
> peer asked for (by setting a flag at connect time) is collected, unless
> it has been collected already for some other peer for the same message.
> In other words, in worst case, we collect all the metadata items exactly
> once per message.

Right.

> If none of the connections with permissive match/policy rules for a
> message is interested in any metadata items, nothing will be collected
> at all.

But we still iterate through there and look at the arg @what and
->collected. And this is useless work which can be avoided IMHO.

> If the receiving side doesn't need it, it shouldn't opt-in for that
> piece of information.
> 
> The metadata logic is really only there so receiving peers are directly
> supplied with information that they would otherwise look up themselves
> from /proc or something. Also, we collect metadata at send time and for
> every message intentionally, so that it reflects the state of the sender
> at the time of sending. This way, the information is not subject to
> races of asynchronous lookups.

Ok.

> > Now, one might argue that you have to do O(n) work when broadcasting
> > to n recipients anyway and you can't get that cheaper but maybe the
> > design is not optimal. Maybe it could be made to not broadcast at all,
> > or broadcast to a subset of recipients, only those which are actually
> > interested in the broadcast.
> 
> That's exactly what happens :) There are some more details on this in
> kdbus.match(7).

But this is not for KDBUS_DST_ID_BROADCAST types, right? Because there
you have to iterate over *all* recipients in the connection hash.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-24 10:50                   ` Borislav Petkov
@ 2015-04-24 11:26                     ` Daniel Mack
  0 siblings, 0 replies; 316+ messages in thread
From: Daniel Mack @ 2015-04-24 11:26 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Steven Noonan, David Herrmann, Greg Kroah-Hartman,
	Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski,
	linux-kernel, Djalal Harouni

Hi,

On 04/24/2015 12:50 PM, Borislav Petkov wrote:
> On Fri, Apr 24, 2015 at 12:28:54PM +0200, Daniel Mack wrote:
>> Sure, for broadcasts, we have to walk the list of peers connected to the
>> bus and see which one is interested in a particular message. We do that
> 
> And this "... we have to walk the list ..." right there raises the
> alarm. Can this walking of elements where you know they wouldn't match
> be avoided?

Yes, see below.

>> by looking at the match rules of each of them, which are based on
>> well-known names, IDs, notification types or bloom filters. The policy
>> logic limits this further, as receivers of a broadcast must have TALK
>> access to the sender.
> 
> So it sounds to me like there are characteristics which can already
> prepare lists of recipients interested in some sort of message. So
> would it be possible for recipients to "register" for such messages
> and the sending side would simply iterate a list of solely interested
> recipients?
> 
> This will definitely save you the iteration over all n connections and
> would make the metadata collection probably not needed (or at least a
> subset of it) because recipients will have to establish eligibility for
> receiving a certain message at register time and once they're on the
> list, you implicitly know why they're there.

David is working on patches that store hashes of the matches in trees so
we can look them up more efficiently. We'd still need to check the bloom
filter for all remaining candidates though.
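
For reference, the bloom filter check as described in kdbus.match(7) amounts to a subset test: a message is a candidate only if every bit set in the receiver's mask is also set in the filter attached to the message. A minimal sketch in plain C, where the function name and the fixed 64-byte size are assumptions (the real size is a per-bus parameter):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOOM_WORDS 8	/* hypothetical: 8 x 64 bits = 64-byte filter */

/*
 * Return true if every bit set in the receiver's mask is also set in
 * the filter attached to the message. False positives are possible,
 * false negatives are not.
 */
static bool bloom_mask_matches(const uint64_t filter[BLOOM_WORDS],
			       const uint64_t mask[BLOOM_WORDS])
{
	size_t i;

	for (i = 0; i < BLOOM_WORDS; i++)
		if ((filter[i] & mask[i]) != mask[i])
			return false;
	return true;
}
```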

These are, however, implementation details which potentially make the
code harder to read. We are well aware of certain spots that can be made
more efficient, but we were hoping for more reviews by keeping the
implementation simple for now.

>> If none of the connections with permissive match/policy rules for a
>> message is interested in any metadata items, nothing will be collected
>> at all.
> 
> But we still iterate through there and look at the arg @what and
> ->collected. And this is useless work which can be avoided IMHO.

Not sure if it really matters, but we can probably add an early bail
there, yes. Something like

	what &= ~mp->collected;
	if (!what)
		return;

Noted down, thanks!
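
A self-contained sketch of that early bail, assuming a simplified stand-in for the metadata object. The flag names, struct meta and the lookups counter are hypothetical and exist only to make the effect visible:

```c
#include <stdint.h>

/* Hypothetical metadata flags, standing in for the real attach flags. */
#define META_CREDS (1ULL << 0)
#define META_PIDS  (1ULL << 1)
#define META_COMM  (1ULL << 2)

struct meta {
	uint64_t collected;	/* items already gathered for this message */
	unsigned int lookups;	/* number of (expensive) lookups performed */
};

/* Gather only the items in @what that have not been collected yet. */
static void meta_collect(struct meta *mp, uint64_t what)
{
	uint64_t bit;

	what &= ~mp->collected;
	if (!what)
		return;	/* early bail: nothing new was requested */

	/* One simulated lookup per newly requested item. */
	for (bit = 1; bit <= META_COMM; bit <<= 1)
		if (what & bit)
			mp->lookups++;

	mp->collected |= what;
}
```

A second call asking for an item that is already present returns before touching any of the per-item checks.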

>>> Now, one might argue that you have to do O(n) work when broadcasting
>>> to n recipients anyway and you can't get that cheaper but maybe the
>>> design is not optimal. Maybe it could be made to not broadcast at all,
>>> or broadcast to a subset of recipients, only those which are actually
>>> interested in the broadcast.
>>
>> That's exactly what happens :) There are some more details on this in
>> kdbus.match(7).
> 
> But this is not for KDBUS_DST_ID_BROADCAST types, right?

Yes it is - all broadcast messages are subject to opt-in filters
installed by the receiving peer.


Thanks,
Daniel





^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-23 17:16       ` Greg Kroah-Hartman
                           ` (2 preceding siblings ...)
  2015-04-23 17:57         ` Linus Torvalds
@ 2015-04-24 13:50         ` Lukasz Skalski
  2015-04-24 14:19           ` Havoc Pennington
  2015-04-27 21:32           ` Linus Torvalds
  3 siblings, 2 replies; 316+ messages in thread
From: Lukasz Skalski @ 2015-04-24 13:50 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Andy Lutomirski
  Cc: Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni, linux-kernel

Hi All,

On 04/23/2015 07:16 PM, Greg Kroah-Hartman wrote:
> On Thu, Apr 23, 2015 at 09:46:22AM -0700, Andy Lutomirski wrote:
>>  - There's still an open performance question.  Namely: is kdbus performant?
> 
> Yes, I thought that was already answered.  Tizen posted some numbers
> with a much older version of the code, before David fixed a bunch of
> issues that he and you found, and that averaged between 25-50% faster.
> Details are in this presentation:
> 	http://download.tizen.org/misc/media/conference2014/slides/tdc2014-kdbus-in-tizen3.pdf
> 
> The Tizen and GENIVI developers are off running numbers with the latest
> code, or so they told me through emails, but I don't know when/if that
> will ever happen, so I can't promise more than what is already here.
> 

I'm working on kdbus support for GLib ([1],[2]). I saw some questions
about kdbus performance, so I've prepared a simple benchmark. Because
David has already posted some comparison results between kdbus and UDS,
I decided to use my GLib port with native kdbus support. Note that this
port is not finished yet and there is still room for improvement, so
please do not treat these test results as final.

To perform the tests, I created two simple apps:

- server: http://fpaste.org/215157/
- client: http://fpaste.org/215156/

The first one (server) registers itself on the bus under the well-known
name "com.test.app" and waits for calls to its objects and methods.
The second one (client) makes calls and records the time from the
moment a call is prepared to the moment its answer is received. The
measurement performs 20000 calls and sums the duration of every call,
for two message payload sizes: 1000 and 10000 bytes. After successful
execution, the client program returns the total time of the performed
calls. All tests were run on VirtualBox with Arch Linux and the latest
versions of systemd and kdbus.
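
The measurement loop described above can be sketched as follows. This is not the code from the pastes; time_calls, elapsed_ns and the noop_call stand-in are hypothetical names, and a real client would issue a blocking D-Bus method call in the callback:

```c
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() */
#include <stdint.h>
#include <time.h>

/* Nanoseconds elapsed between two CLOCK_MONOTONIC samples. */
static int64_t elapsed_ns(const struct timespec *start,
			  const struct timespec *end)
{
	return (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL +
	       (end->tv_nsec - start->tv_nsec);
}

/* Stand-in for one blocking method call; returns 0 on success. */
static int noop_call(void *arg)
{
	(void)arg;
	return 0;
}

/*
 * Run @count blocking calls and return the summed round-trip time in
 * nanoseconds, or -1 if a call fails.
 */
static int64_t time_calls(int (*call)(void *), void *arg, unsigned int count)
{
	struct timespec t0, t1;
	int64_t total = 0;
	unsigned int i;

	for (i = 0; i < count; i++) {
		clock_gettime(CLOCK_MONOTONIC, &t0);
		if (call(arg) != 0)
			return -1;
		clock_gettime(CLOCK_MONOTONIC, &t1);
		total += elapsed_ns(&t0, &t1);
	}
	return total;
}
```

Calling time_calls(noop_call, NULL, 20000) mirrors the shape of the benchmark; summing per-call durations excludes any work done between calls.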

The test results are as follows:

+--------------+--------------------+--------------------+
|              |    Elapsed time    |    Elapsed time    |
| Message size |  GLIB WITH NATIVE  | GLIB + DBUS-DAEMON |
|   [bytes]    |    KDBUS SUPPORT*  |                    |
+--------------+--------------------+--------------------+
|              |    1) 2.874264 s   |    1) 4.624631 s   |
|     1000     |    2) 2.932835 s   |    2) 4.669730 s   |
|              |    3) 2.899634 s   |    3) 4.747275 s   |
|              |    4) 2.970106 s   |    4) 4.725723 s   |
+--------------+--------------------+--------------------+
|              |    1) 3.182379 s   |    1) 5.469663 s   |
|    10000     |    2) 3.334170 s   |    2) 5.520757 s   |
|              |    3) 3.353305 s   |    3) 5.556374 s   |
|              |    4) 3.367732 s   |    4) 5.597758 s   |
+--------------+--------------------+--------------------+

*all tests performed without using memfd mechanism.

I hope it will be useful for someone :)

[1] https://github.com/lukasz-skalski/glib
[2] https://bugzilla.gnome.org/show_bug.cgi?id=721861

Cheers,
-- 
Lukasz Skalski
Samsung R&D Institute Poland
Samsung Electronics
l.skalski@samsung.com

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-23 21:33           ` Richard Weinberger
@ 2015-04-24 14:02             ` Steven Rostedt
  0 siblings, 0 replies; 316+ messages in thread
From: Steven Rostedt @ 2015-04-24 14:02 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: David Herrmann, Borislav Petkov, Greg Kroah-Hartman,
	Eric W. Biederman, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, Andy Lutomirski,
	linux-kernel, Daniel Mack, Djalal Harouni

On Thu, Apr 23, 2015 at 11:33:19PM +0200, Richard Weinberger wrote:
> > No it's not. O(256) equals O(1).
> 
> Yeah, that's absolutely correct.
> I think Boris wanted to say that iterating over all hash buckets
> can be costly.

You are thinking of 'k' (the constant), where you usually have k*O(1), where k
does matter when comparing two algorithms with the same Big O value. And
sometimes even different O() values if the 'n' is small enough. 100*O(1) vs
1*O(n), the latter is better if n < 100.
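
The crossover in that example can be written down directly. A toy sketch with hypothetical cost models in arbitrary time units:

```c
/* Cost of the O(1) algorithm with constant factor k = 100. */
static unsigned long cost_constant(unsigned long n)
{
	(void)n;	/* independent of input size */
	return 100;
}

/* Cost of the O(n) algorithm with constant factor k = 1. */
static unsigned long cost_linear(unsigned long n)
{
	return n;
}

/* Nonzero if the O(n) algorithm is the cheaper one for this @n. */
static int linear_is_cheaper(unsigned long n)
{
	return cost_linear(n) < cost_constant(n);
}
```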

Something that runs at O(n) but takes 1ms per n is a much worse algorithm than
something that runs at O(n) and takes 1us per n.

Both have the same O() notation, but which algorithm you use is obvious.

But Greg is right, O notation isn't applicable here.

-- Steve


^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-24 13:50         ` Lukasz Skalski
@ 2015-04-24 14:19           ` Havoc Pennington
  2015-04-24 14:34             ` Lukasz Skalski
  2015-04-27 21:32           ` Linus Torvalds
  1 sibling, 1 reply; 316+ messages in thread
From: Havoc Pennington @ 2015-04-24 14:19 UTC (permalink / raw)
  To: Lukasz Skalski
  Cc: Greg Kroah-Hartman, Andy Lutomirski, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Fri, Apr 24, 2015 at 9:50 AM, Lukasz Skalski <l.skalski@samsung.com> wrote:
> - client: http://fpaste.org/215156/
>

Cool - it might also be interesting to try this without blocking round
trips, i.e. send requests as quickly as you can, and collect replies
asynchronously. That's how people ideally use dbus. It should
certainly reduce the total benchmark time, but just wondering if this
usage increases or decreases the delta between userspace daemon and
kdbus.

Havoc

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: Issues with capability bits and meta-data in kdbus
  2015-04-22 21:48                       ` Linus Torvalds
  2015-04-23  5:35                         ` Havoc Pennington
@ 2015-04-24 14:32                         ` Olaf Hering
  2015-04-24 14:39                           ` Michele Curti
                                             ` (2 more replies)
  1 sibling, 3 replies; 316+ messages in thread
From: Olaf Hering @ 2015-04-24 14:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Havoc Pennington, Michele Curti, Austin S Hemmelgarn,
	Andy Lutomirski, Eric W. Biederman, Greg Kroah-Hartman,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, Linux Kernel Mailing List, Daniel Mack,
	David Herrmann, Djalal Harouni

On Wed, Apr 22, Linus Torvalds wrote:

> Conditional byte order is worse than silly - it's terminally stupid.

> In other words, think networking, which statically just decided to use
> big-endian. Sure, that was the wrong choice in the end, but even

Why was that wrong? Any pointers to further details?

Olaf

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-24 14:19           ` Havoc Pennington
@ 2015-04-24 14:34             ` Lukasz Skalski
  2015-04-24 19:25               ` Greg Kroah-Hartman
  0 siblings, 1 reply; 316+ messages in thread
From: Lukasz Skalski @ 2015-04-24 14:34 UTC (permalink / raw)
  To: Havoc Pennington
  Cc: Greg Kroah-Hartman, Andy Lutomirski, Linus Torvalds,
	Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On 04/24/2015 04:19 PM, Havoc Pennington wrote:
> On Fri, Apr 24, 2015 at 9:50 AM, Lukasz Skalski <l.skalski@samsung.com> wrote:
>> - client: http://fpaste.org/215156/
>>
> 
> Cool - it might also be interesting to try this without blocking round
> trips, i.e. send requests as quickly as you can, and collect replies
> asynchronously. That's how people ideally use dbus. It should
> certainly reduce the total benchmark time, but just wondering if this
> usage increases or decreases the delta between userspace daemon and
> kdbus.

No problem - I'll also prepare an asynchronous version.

> 
> Havoc
> 

BR,
-- 
Lukasz Skalski
Samsung R&D Institute Poland
Samsung Electronics
l.skalski@samsung.com

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: Issues with capability bits and meta-data in kdbus
  2015-04-24 14:32                         ` Olaf Hering
@ 2015-04-24 14:39                           ` Michele Curti
  2015-04-24 15:02                             ` Olaf Hering
  2015-04-24 14:41                           ` Jiri Kosina
  2015-04-24 17:52                           ` Linus Torvalds
  2 siblings, 1 reply; 316+ messages in thread
From: Michele Curti @ 2015-04-24 14:39 UTC (permalink / raw)
  To: Olaf Hering
  Cc: Linus Torvalds, Havoc Pennington, Austin S Hemmelgarn,
	Andy Lutomirski, Eric W. Biederman, Greg Kroah-Hartman,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, Linux Kernel Mailing List, Daniel Mack,
	David Herrmann, Djalal Harouni

On Fri, Apr 24, 2015 at 04:32:12PM +0200, Olaf Hering wrote:
> On Wed, Apr 22, Linus Torvalds wrote:
> 
> > Conditional byte order is worse than silly - it's terminally stupid.
> 
> > In other words, think networking, which statically just decided to use
> > big-endian. Sure, that was the wrong choice in the end, but even
> 
> Why was that wrong? Any pointers to further details?
>

http://www.barrgroup.com/Embedded-Systems/How-To/Big-Endian-Little-Endian

"Serious run-time performance penalties occur when using TCP/IP on a little
endian processor."

The Intel x86 and x86-64 processor families use the little-endian format.

I think that is the reason.

Michele

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: Issues with capability bits and meta-data in kdbus
  2015-04-24 14:32                         ` Olaf Hering
  2015-04-24 14:39                           ` Michele Curti
@ 2015-04-24 14:41                           ` Jiri Kosina
  2015-04-24 15:04                             ` Olaf Hering
  2015-04-24 17:52                           ` Linus Torvalds
  2 siblings, 1 reply; 316+ messages in thread
From: Jiri Kosina @ 2015-04-24 14:41 UTC (permalink / raw)
  To: Olaf Hering
  Cc: Linus Torvalds, Havoc Pennington, Michele Curti,
	Austin S Hemmelgarn, Andy Lutomirski, Eric W. Biederman,
	Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Linux Kernel Mailing List,
	Daniel Mack, David Herrmann, Djalal Harouni

On Fri, 24 Apr 2015, Olaf Hering wrote:

> > Conditional byte order is worse than silly - it's terminally stupid.
> 
> > In other words, think networking, which statically just decided to use
> > big-endian. Sure, that was the wrong choice in the end, but even
> 
> Why was that wrong? Any pointers to further details?

Because the architecture running on the overwhelming majority of the
world's computers today is little-endian, and therefore has to convert
all the time.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: Issues with capability bits and meta-data in kdbus
  2015-04-24 14:39                           ` Michele Curti
@ 2015-04-24 15:02                             ` Olaf Hering
  2015-04-24 15:14                               ` Michele Curti
  0 siblings, 1 reply; 316+ messages in thread
From: Olaf Hering @ 2015-04-24 15:02 UTC (permalink / raw)
  To: Michele Curti
  Cc: Linus Torvalds, Havoc Pennington, Austin S Hemmelgarn,
	Andy Lutomirski, Eric W. Biederman, Greg Kroah-Hartman,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, Linux Kernel Mailing List, Daniel Mack,
	David Herrmann, Djalal Harouni

On Fri, Apr 24, Michele Curti wrote:

> http://www.barrgroup.com/Embedded-Systems/How-To/Big-Endian-Little-Endian
> "Serious run-time performance penalties occur when using TCP/IP on a little
> endian processor."

This URL lacks the numbers to prove such a claim.

Olaf

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: Issues with capability bits and meta-data in kdbus
  2015-04-24 14:41                           ` Jiri Kosina
@ 2015-04-24 15:04                             ` Olaf Hering
  0 siblings, 0 replies; 316+ messages in thread
From: Olaf Hering @ 2015-04-24 15:04 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Linus Torvalds, Havoc Pennington, Michele Curti,
	Austin S Hemmelgarn, Andy Lutomirski, Eric W. Biederman,
	Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann,
	One Thousand Gnomes, Tom Gundersen, Linux Kernel Mailing List,
	Daniel Mack, David Herrmann, Djalal Harouni

On Fri, Apr 24, Jiri Kosina wrote:

> Because the architecture running on the overwhelming majority of the
> world's computers today is little-endian, and therefore has to convert
> all the time.

There is no swap-on-load/store to compensate for the odd decision?

Olaf

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: Issues with capability bits and meta-data in kdbus
  2015-04-24 15:02                             ` Olaf Hering
@ 2015-04-24 15:14                               ` Michele Curti
  0 siblings, 0 replies; 316+ messages in thread
From: Michele Curti @ 2015-04-24 15:14 UTC (permalink / raw)
  To: Olaf Hering
  Cc: Linus Torvalds, Havoc Pennington, Austin S Hemmelgarn,
	Andy Lutomirski, Eric W. Biederman, Greg Kroah-Hartman,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, Linux Kernel Mailing List, Daniel Mack,
	David Herrmann, Djalal Harouni

On Fri, Apr 24, 2015 at 05:02:32PM +0200, Olaf Hering wrote:
> On Fri, Apr 24, Michele Curti wrote:
> 
> > http://www.barrgroup.com/Embedded-Systems/How-To/Big-Endian-Little-Endian
> > "Serious run-time performance penalties occur when using TCP/IP on a little
> > endian processor."
> 
> This URL lacks the numbers to prove such a claim.
> 

?

It may be negligible compared to the operations of the whole stack, but
swapping bytes requires more instructions than not swapping bytes.

Michele


^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: Issues with capability bits and meta-data in kdbus
  2015-04-24 14:32                         ` Olaf Hering
  2015-04-24 14:39                           ` Michele Curti
  2015-04-24 14:41                           ` Jiri Kosina
@ 2015-04-24 17:52                           ` Linus Torvalds
  2015-04-24 18:00                             ` Linus Torvalds
  2 siblings, 1 reply; 316+ messages in thread
From: Linus Torvalds @ 2015-04-24 17:52 UTC (permalink / raw)
  To: Olaf Hering
  Cc: Havoc Pennington, Michele Curti, Austin S Hemmelgarn,
	Andy Lutomirski, Eric W. Biederman, Greg Kroah-Hartman,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, Linux Kernel Mailing List, Daniel Mack,
	David Herrmann, Djalal Harouni

On Fri, Apr 24, 2015 at 7:32 AM, Olaf Hering <olaf@aepfle.de> wrote:
> On Wed, Apr 22, Linus Torvalds wrote:
>
>> Conditional byte order is worse than silly - it's terminally stupid.
>
>> In other words, think networking, which statically just decided to use
>> big-endian. Sure, that was the wrong choice in the end, but even
>
> Why was that wrong? Any pointers to further details?

Just because BE is effectively dead these days.  Every BE architecture
is either gone, or is slowly (or not so slowly) converting to LE.

But more importantly, even when you pick the byte order that history
then relegates to the losing position, and you end up doing byte
swapping on most machines, that is *still* better than conditionally
*not* doing byte swapping.

So even today, by all means make your protocols or disk images use
big-endian byte formats. But do it unconditionally. Don't make the
mistake of encoding the byte order as part of the data, and then
dynamically switching things (or not) around.

In fact, even today a BE byte order can make sense, if only exactly
because of network byte order - thanks to network byte order, there
are all those nice mostly-portable and often well-optimized "ntohl()"
functions available to you. So rather than introduce your own helper
functions for LE byte order access, you may be better off just using
BE because of existing network infrastructure.
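
A minimal sketch of that approach: a 32-bit field that is always stored big-endian, accessed through the standard htonl()/ntohl() helpers. The function names read_be32 and write_be32 are hypothetical; memcpy is used so the field may sit at any alignment:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit field that is unconditionally stored big-endian. */
static uint32_t read_be32(const unsigned char *p)
{
	uint32_t raw;

	memcpy(&raw, p, sizeof(raw));	/* safe for unaligned input */
	return ntohl(raw);		/* no-op on big-endian CPUs */
}

/* Store a CPU-order value as a big-endian 32-bit field. */
static void write_be32(unsigned char *p, uint32_t value)
{
	uint32_t raw = htonl(value);

	memcpy(p, &raw, sizeof(raw));
}
```

There is no byte-order flag in the data and no branch in the reader; the conversion is the same on every machine.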

                        Linus

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: Issues with capability bits and meta-data in kdbus
  2015-04-24 17:52                           ` Linus Torvalds
@ 2015-04-24 18:00                             ` Linus Torvalds
  0 siblings, 0 replies; 316+ messages in thread
From: Linus Torvalds @ 2015-04-24 18:00 UTC (permalink / raw)
  To: Olaf Hering
  Cc: Havoc Pennington, Michele Curti, Austin S Hemmelgarn,
	Andy Lutomirski, Eric W. Biederman, Greg Kroah-Hartman,
	Andrew Morton, Arnd Bergmann, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, Linux Kernel Mailing List, Daniel Mack,
	David Herrmann, Djalal Harouni

On Fri, Apr 24, 2015 at 10:52 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So even today, by all means make your protocols or disk images use
> big-endian byte formats. But do it unconditionally. Don't make the
> mistake of encoding the byte order as part of the data, and then
> dynamically switching things (or not) around.

Side note: even if you pick the "wrong" byte order, an unconditional
byte ordering choice can often avoid the bswap, just because many
operations are byte order independent.

For example, you can still test specific bits in bitfields etc. If you
have a constant mask you want to check, instead of converting the
field to the CPU byte order at runtime, convert that constant _mask_
to the data byte order at compile-time, and use it without any dynamic
byte swapping of the data.
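
As a userspace sketch of the constant-mask trick: the flag below is defined in CPU byte order, and the mask is swapped once at compile time so the raw big-endian field is never converted at runtime. The macro and flag names are hypothetical, and the byte-order test relies on the GCC/Clang predefined __BYTE_ORDER__ macros:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Byte swap of a 32-bit constant, foldable at compile time. */
#define CONST_SWAB32(x) ((uint32_t)(			\
	(((uint32_t)(x) & 0x000000ffU) << 24) |		\
	(((uint32_t)(x) & 0x0000ff00U) <<  8) |		\
	(((uint32_t)(x) & 0x00ff0000U) >>  8) |		\
	(((uint32_t)(x) & 0xff000000U) >> 24)))

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define CONST_CPU_TO_BE32(x) CONST_SWAB32(x)
#else
#define CONST_CPU_TO_BE32(x) ((uint32_t)(x))
#endif

#define HDR_FLAG_URGENT 0x00000100U	/* hypothetical flag, CPU order */

/* @flags_be is the raw field as loaded from the wire, still big-endian. */
static bool hdr_is_urgent(uint32_t flags_be)
{
	return (flags_be & CONST_CPU_TO_BE32(HDR_FLAG_URGENT)) != 0;
}
```

The only swap happens on the constant, so the test compiles down to a single AND against an immediate.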

And generally, avoid byte swapping until you really need it. Some
people seem to think that if the data is in one particular byte order
on disk, you should byteswap as you read it in, and as you write it
out. No, it's often best to actually keep it in the original format,
and only byte swap when actually using the value. Then you can mmap
files and not generate extra copies, or - as per above - you may be
able to structure your code to never need the byte swap at all.

So this is why (for example) the kernel byte swapping helper functions
do odd things like this:

  #define __swab32(x)                             \
          (__builtin_constant_p((__u32)(x)) ?     \
          ___constant_swab32(x) :                 \
          __fswab32(x))

just because the "___constant_swab32()" is designed to do everything
at compile-time.

                   Linus

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-24 14:34             ` Lukasz Skalski
@ 2015-04-24 19:25               ` Greg Kroah-Hartman
  2015-04-27  8:57                 ` Lukasz Skalski
  0 siblings, 1 reply; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-24 19:25 UTC (permalink / raw)
  To: Lukasz Skalski
  Cc: Havoc Pennington, Andy Lutomirski, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
	Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Fri, Apr 24, 2015 at 04:34:34PM +0200, Lukasz Skalski wrote:
> On 04/24/2015 04:19 PM, Havoc Pennington wrote:
> > On Fri, Apr 24, 2015 at 9:50 AM, Lukasz Skalski <l.skalski@samsung.com> wrote:
> >> - client: http://fpaste.org/215156/
> >>
> > 
> > Cool - it might also be interesting to try this without blocking round
> > trips, i.e. send requests as quickly as you can, and collect replies
> > asynchronously. That's how people ideally use dbus. It should
> > certainly reduce the total benchmark time, but just wondering if this
> > usage increases or decreases the delta between userspace daemon and
> > kdbus.
> 
> No problem - I'll also prepare an asynchronous version.

That would be great to see as well.  Many thanks for doing this work.

greg k-h

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-24 19:25               ` Greg Kroah-Hartman
@ 2015-04-27  8:57                 ` Lukasz Skalski
  2015-04-27 17:18                   ` Greg Kroah-Hartman
  2015-04-27 22:29                   ` David Lang
  0 siblings, 2 replies; 316+ messages in thread
From: Lukasz Skalski @ 2015-04-27  8:57 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Havoc Pennington
  Cc: Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On 04/24/2015 09:25 PM, Greg Kroah-Hartman wrote:
> On Fri, Apr 24, 2015 at 04:34:34PM +0200, Lukasz Skalski wrote:
>> On 04/24/2015 04:19 PM, Havoc Pennington wrote:
>>> On Fri, Apr 24, 2015 at 9:50 AM, Lukasz Skalski <l.skalski@samsung.com> wrote:
>>>> - client: http://fpaste.org/215156/
>>>>
>>>
>>> Cool - it might also be interesting to try this without blocking round
>>> trips, i.e. send requests as quickly as you can, and collect replies
>>> asynchronously. That's how people ideally use dbus. It should
>>> certainly reduce the total benchmark time, but just wondering if this
>>> usage increases or decreases the delta between userspace daemon and
>>> kdbus.
>>
>> No problem - I'll also prepare an asynchronous version.
> 
> That would be great to see as well.  Many thanks for doing this work.

As proposed by Havoc and Greg, I've created a simple benchmark for
asynchronous calls:

- server: http://fpaste.org/215157/ (the same as in the previous test)
- client: http://fpaste.org/215724/ (asynchronous version)

For the asynchronous version of the client I had to decrease the number
of calls to 128 (the synchronous version made 20000 calls); otherwise we
can exceed the maximum number of pending replies per connection.

The test results are as follows:

+--------------+--------------------+--------------------+
|              |    Elapsed time    |    Elapsed time    |
| Message size |  GLIB WITH NATIVE  | GLIB + DBUS-DAEMON |
|   [bytes]    |    KDBUS SUPPORT*  |                    |
+--------------+--------------------+--------------------+
|              |    1) 0.018639 s   |    1) 0.029947 s   |
|     1000     |    2) 0.017045 s   |    2) 0.032812 s   |
|              |    3) 0.017490 s   |    3) 0.029971 s   |
|              |    4) 0.018001 s   |    4) 0.026485 s   |
+--------------+--------------------+--------------------+
|              |    1) 0.019898 s   |    1) 0.040914 s   |
|    10000     |    2) 0.022187 s   |    2) 0.033604 s   |
|              |    3) 0.020854 s   |    3) 0.037616 s   |
|              |    4) 0.020020 s   |    4) 0.033772 s   |
+--------------+--------------------+--------------------+
*all tests performed without using memfd mechanism.

And as I wrote in my previous mail, the kdbus transport for GLib is not
finished yet and there is still room for improvement, so please do not
treat these test results as final.

> 
> greg k-h
> 

Cheers,
-- 
Lukasz Skalski
Samsung R&D Institute Poland
Samsung Electronics
l.skalski@samsung.com

^ permalink raw reply	[flat|nested] 316+ messages in thread

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-27  8:57                 ` Lukasz Skalski
@ 2015-04-27 17:18                   ` Greg Kroah-Hartman
  2015-04-27 22:29                   ` David Lang
  1 sibling, 0 replies; 316+ messages in thread
From: Greg Kroah-Hartman @ 2015-04-27 17:18 UTC (permalink / raw)
  To: Lukasz Skalski
  Cc: Havoc Pennington, Andy Lutomirski, Linus Torvalds, Andrew Morton,
	Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
	Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Mon, Apr 27, 2015 at 10:57:45AM +0200, Lukasz Skalski wrote:
> On 04/24/2015 09:25 PM, Greg Kroah-Hartman wrote:
> > On Fri, Apr 24, 2015 at 04:34:34PM +0200, Lukasz Skalski wrote:
> >> On 04/24/2015 04:19 PM, Havoc Pennington wrote:
> >>> On Fri, Apr 24, 2015 at 9:50 AM, Lukasz Skalski <l.skalski@samsung.com> wrote:
> >>>> - client: http://fpaste.org/215156/
> >>>>
> >>>
> >>> Cool - it might also be interesting to try this without blocking round
> >>> trips, i.e. send requests as quickly as you can, and collect replies
> >>> asynchronously. That's how people ideally use dbus. It should
> >>> certainly reduce the total benchmark time, but just wondering if this
> >>> usage increases or decreases the delta between userspace daemon and
> >>> kdbus.
> >>
> >> No problem - I'll also prepare an asynchronous version.
> > 
> > That would be great to see as well.  Many thanks for doing this work.
> 
> As proposed by Havoc and Greg, I've created a simple benchmark for
> asynchronous calls:
> 
> - server: http://fpaste.org/215157/ (the same as in the previous test)
> - client: http://fpaste.org/215724/ (asynchronous version)
> 
> For the asynchronous version of the client I had to decrease the number
> of calls to 128 (the synchronous version made 20000 calls); otherwise we
> can exceed the maximum number of pending replies per connection.
> 
> The test results are as follows:
> 
> +--------------+--------------------+--------------------+
> |              |    Elapsed time    |    Elapsed time    |
> | Message size |  GLIB WITH NATIVE  | GLIB + DBUS-DAEMON |
> |   [bytes]    |    KDBUS SUPPORT*  |                    |
> +--------------+--------------------+--------------------+
> |              |    1) 0.018639 s   |    1) 0.029947 s   |
> |     1000     |    2) 0.017045 s   |    2) 0.032812 s   |
> |              |    3) 0.017490 s   |    3) 0.029971 s   |
> |              |    4) 0.018001 s   |    4) 0.026485 s   |
> +--------------+--------------------+--------------------+
> |              |    1) 0.019898 s   |    1) 0.040914 s   |
> |    10000     |    2) 0.022187 s   |    2) 0.033604 s   |
> |              |    3) 0.020854 s   |    3) 0.037616 s   |
> |              |    4) 0.020020 s   |    4) 0.033772 s   |
> +--------------+--------------------+--------------------+
> *all tests performed without using memfd mechanism.
> 
> And as I wrote in my previous mail, the kdbus transport for GLib is not
> finished yet and there is still room for improvement, so please
> do not treat these test results as final.

Very nice, thanks.  Any chance you can bump those message sizes up to
over 512k?  I think that will show a huge difference.  Even just under
512k should be faster, as you have shown, but I have been told that for
messages larger than 512k, the D-Bus daemon has "issues", which has kept
people from wanting to use messages that large before now.

thanks again,

greg k-h

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-24 13:50         ` Lukasz Skalski
  2015-04-24 14:19           ` Havoc Pennington
@ 2015-04-27 21:32           ` Linus Torvalds
  2015-04-27 21:40             ` Andy Lutomirski
  2015-04-28 10:39             ` Lukasz Skalski
  1 sibling, 2 replies; 316+ messages in thread
From: Linus Torvalds @ 2015-04-27 21:32 UTC (permalink / raw)
  To: Lukasz Skalski
  Cc: Greg Kroah-Hartman, Andy Lutomirski, Andrew Morton,
	Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
	Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On Fri, Apr 24, 2015 at 6:50 AM, Lukasz Skalski <l.skalski@samsung.com> wrote:
>
> To perform tests I've created two simple apps:
>
> - server: http://fpaste.org/215157/
> - client: http://fpaste.org/215156/

So since Andy reported that dbus seems to be a few orders of magnitude
too slow, I tried to build these apps to see what it even does.

They don't build on F21. You seem to be using features that are too
new to exist even in fairly modern distros:

    server.c:47:24: error: ‘G_BUS_TYPE_USER’ undeclared

so I can't even see what dbus does *now*.

That said, either you're running your test on a potato, or dbus is
seriously screwed up. No way should it take 4+ seconds to send a 1000b
message back and forth 20k times. But as mentioned, I can't even
see what it's doing right now.

                      Linus

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-27 21:32           ` Linus Torvalds
@ 2015-04-27 21:40             ` Andy Lutomirski
  2015-04-27 22:00               ` Linus Torvalds
  2015-04-28 10:39             ` Lukasz Skalski
  1 sibling, 1 reply; 316+ messages in thread
From: Andy Lutomirski @ 2015-04-27 21:40 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Mon, Apr 27, 2015 at 2:32 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Apr 24, 2015 at 6:50 AM, Lukasz Skalski <l.skalski@samsung.com> wrote:
>>
>> To perform tests I've created two simple apps:
>>
>> - server: http://fpaste.org/215157/
>> - client: http://fpaste.org/215156/
>
> So since Andy reported that dbus seems to be a few orders of magnitude
> too slow, I tried to build these apps to see what it even does.
>
> They don't build on F21. You seem to be using features that are too
> new to exist even in fairly modern distros:
>
>     server.c:47:24: error: ‘G_BUS_TYPE_USER’ undeclared
>
> so I can't even see what dbus does *now*.

Change "USER" to "SESSION".  Build with:

gcc -Wall -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include \
    -o client client.c -lglib-2.0 -ldbus-glib-1 -ldbus-1 -lgobject-2.0 \
    -lglib-2.0 -ldbus-1 -lgio-2.0

Then again with s/client/server/

For all I know, the USER vs SESSION distinction matters, but I can't
imagine why.

>
> That said, either you're running your test on a potato, or dbus is
> seriously screwed up. No way should it take 4+ seconds to send a 1000b
> message back and forth 20k times. But as mentioned, I can't even
> see what it's doing right now.

Whee!  I'm typing this email on a potato!

--Andy

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-27 21:40             ` Andy Lutomirski
@ 2015-04-27 22:00               ` Linus Torvalds
  2015-04-27 22:14                 ` Linus Torvalds
  2015-04-28 12:49                 ` Havoc Pennington
  0 siblings, 2 replies; 316+ messages in thread
From: Linus Torvalds @ 2015-04-27 22:00 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Mon, Apr 27, 2015 at 2:40 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> Change "USER" to "SESSION".

That works.

>  Build with:

Hell no. I used

  gcc client.c -o client $(pkg-config --cflags --libs gtk+-2.0)

instead. That worked.

>> That said, either you're running your test on a potato, or dbus is
>> seriously screwed up. No way should it take 4+ seconds to send a 1000b
>> message back and forth 20k times. But as mentioned, I can't even
>> see what it's doing right now.
>
> Whee!  I'm typing this email on a potato!

No, I think you're right, there's the other non-potato choice: "dbus
is seriously screwed up".

That thing has almost no kernel footprint. It's spending all its time
in user space overhead.

Quite frankly, the whole "kdbus is important for performance" seems to
be *totally* invalidated by even a minimal look at profiles for that
thing. Here's the top-15 offender list:

   2.62%  gdbus    libc-2.20.so                [.] _int_malloc
   2.43%  gdbus    libc-2.20.so                [.] free
   2.31%  server   libc-2.20.so                [.] free
   2.12%  gdbus    libc-2.20.so                [.] malloc
   1.77%  gdbus    libglib-2.0.so.0.4200.2     [.] g_utf8_validate
   1.43%  gdbus    libglib-2.0.so.0.4200.2     [.] g_slice_alloc
   1.41%  gdbus    libglib-2.0.so.0.4200.2     [.] g_hash_table_lookup
   1.28%  server   libc-2.20.so                [.] _int_malloc
   1.27%  gdbus    libglib-2.0.so.0.4200.2     [.] g_mutex_lock
   1.22%  gdbus    libglib-2.0.so.0.4200.2     [.] g_variant_unref
   1.16%  server   libc-2.20.so                [.] malloc
   1.14%  gdbus    libglib-2.0.so.0.4200.2     [.] g_bit_lock
   1.07%  gdbus    libglib-2.0.so.0.4200.2     [.] g_slice_free1
   1.05%  gdbus    libglib-2.0.so.0.4200.2     [.] g_bit_unlock
   1.01%  gdbus    libglib-2.0.so.0.4200.2     [.] g_mutex_unlock

there's not a kernel function in sight in the top-15, and it's all
just overhead. The above is from the server side, but the client looks
similar.

If somebody wants to speed up dbus, they should likely look at the
user-space code, not the kernel side.

My guess is that pretty much the entirety of the quoted kdbus
"speedup" isn't because it speeds up any kernel side thing, it's
because it avoids the user-space crap in the dbus server.

IOW, all the people who say that it's about avoiding context switches
are probably just full of shit. It's not about context switches, it's
about bad user-level code.

                     Linus

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-27 22:00               ` Linus Torvalds
@ 2015-04-27 22:14                 ` Linus Torvalds
  2015-04-28 13:44                   ` Havoc Pennington
                                     ` (3 more replies)
  2015-04-28 12:49                 ` Havoc Pennington
  1 sibling, 4 replies; 316+ messages in thread
From: Linus Torvalds @ 2015-04-27 22:14 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Mon, Apr 27, 2015 at 3:00 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> IOW, all the people who say that it's about avoiding context switches
> are probably just full of shit. It's not about context switches, it's
> about bad user-level code.

Just to make sure, I did a system-wide profile (so that you can
actually see the overhead of context switching better), and that
didn't change the picture.

The scheduler overhead *might* be 1% or so.

So really. The people who talk about how kdbus improves performance
are just full of sh*t. Yes, it improves things, but the improvement
seems to be 100% "incidental", in that it avoids a few trips down the
user-space problems.

The real problems seem to be in dbus memory management (suggestion:
keep a small per-thread cache of those message allocations) and to a
smaller degree in the crazy utf8 validation (why the f*ck does it do
that anyway?), with some locking problems thrown in for good measure.
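
As a rough illustration of the per-thread cache suggestion above, here is
a minimal C sketch. It is not taken from any dbus implementation: the
names msg_buf_get/msg_buf_put, the fixed buffer size and the cache depth
are all assumptions made for the example. The idea is only that a thread
which frees a message buffer and soon needs another one can skip
malloc/free and any allocator locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MSG_BUF_SIZE    1024  /* hypothetical fixed message buffer size */
#define MSG_CACHE_SLOTS 8     /* hypothetical per-thread cache depth */

/* Per-thread stack of recently released buffers; thread-local, so no lock. */
static _Thread_local void *msg_cache[MSG_CACHE_SLOTS];
static _Thread_local size_t msg_cache_len;

/* Return a message buffer, reusing a cached one when available. */
static void *msg_buf_get(void)
{
    assert(msg_cache_len <= MSG_CACHE_SLOTS);
    if (msg_cache_len > 0)
        return msg_cache[--msg_cache_len];
    return malloc(MSG_BUF_SIZE);
}

/* Release a buffer: keep it in this thread's cache, or free it if full. */
static void msg_buf_put(void *buf)
{
    if (buf == NULL)
        return;
    if (msg_cache_len < MSG_CACHE_SLOTS)
        msg_cache[msg_cache_len++] = buf;
    else
        free(buf);
}
```

A caller would pair every msg_buf_get() with a msg_buf_put() on the same
thread. Buffers still cached when a thread exits are not freed in this
sketch; a real implementation would need a cleanup hook for that.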

                              Linus

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-27  8:57                 ` Lukasz Skalski
  2015-04-27 17:18                   ` Greg Kroah-Hartman
@ 2015-04-27 22:29                   ` David Lang
  2015-04-28 10:53                     ` Lukasz Skalski
  1 sibling, 1 reply; 316+ messages in thread
From: David Lang @ 2015-04-27 22:29 UTC (permalink / raw)
  To: Lukasz Skalski
  Cc: Greg Kroah-Hartman, Havoc Pennington, Andy Lutomirski,
	Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Mon, 27 Apr 2015, Lukasz Skalski wrote:

> Subject: Re: [GIT PULL] kdbus for 4.1-rc1
> 
> On 04/24/2015 09:25 PM, Greg Kroah-Hartman wrote:
>> On Fri, Apr 24, 2015 at 04:34:34PM +0200, Lukasz Skalski wrote:
>>> On 04/24/2015 04:19 PM, Havoc Pennington wrote:
>>>> On Fri, Apr 24, 2015 at 9:50 AM, Lukasz Skalski <l.skalski@samsung.com> wrote:
>>>>> - client: http://fpaste.org/215156/
>>>>>
>>>>
>>>> Cool - it might also be interesting to try this without blocking round
>>>> trips, i.e. send requests as quickly as you can, and collect replies
>>>> asynchronously. That's how people ideally use dbus. It should
>>>> certainly reduce the total benchmark time, but just wondering if this
>>>> usage increases or decreases the delta between userspace daemon and
>>>> kdbus.
>>>
>>> No problem - I'll prepare also asynchronous version.
>>
>> That would be great to see as well.  Many thanks for doing this work.
>
> As proposed by Havoc and Greg, I've created a simple benchmark for
> asynchronous calls:
>
> - server: http://fpaste.org/215157/ (the same as in the previous test)
> - client: http://fpaste.org/215724/ (asynchronous version)
>
> For the asynchronous version of the client I had to decrease the number
> of calls to 128 (the synchronous version made 20000 calls); otherwise we
> can exceed the maximum number of pending replies per connection.

aren't we being told that part of the reason for needing kdbus is that 
thousands, or tens of thousands of messages are being spewed out? how does 
limiting it to 128 messages represent real-life if this is the case?

David Lang

> The test results are as follows:
>
> +--------------+--------------------+--------------------+
> |              |    Elapsed time    |    Elapsed time    |
> | Message size |  GLIB WITH NATIVE  | GLIB + DBUS-DAEMON |
> |   [bytes]    |    KDBUS SUPPORT*  |                    |
> +--------------+--------------------+--------------------+
> |              |    1) 0.018639 s   |    1) 0.029947 s   |
> |     1000     |    2) 0.017045 s   |    2) 0.032812 s   |
> |              |    3) 0.017490 s   |    3) 0.029971 s   |
> |              |    4) 0.018001 s   |    4) 0.026485 s   |
> +--------------+--------------------+--------------------+
> |              |    1) 0.019898 s   |    1) 0.040914 s   |
> |    10000     |    2) 0.022187 s   |    2) 0.033604 s   |
> |              |    3) 0.020854 s   |    3) 0.037616 s   |
> |              |    4) 0.020020 s   |    4) 0.033772 s   |
> +--------------+--------------------+--------------------+
> *all tests performed without using memfd mechanism.
>
> And as I wrote in my previous mail, the kdbus transport for GLib is not
> finished yet and there is still room for improvement, so please
> do not treat these test results as final.
>
>>
>> greg k-h
>>
>
> Cheers,
>

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-27 21:32           ` Linus Torvalds
  2015-04-27 21:40             ` Andy Lutomirski
@ 2015-04-28 10:39             ` Lukasz Skalski
  1 sibling, 0 replies; 316+ messages in thread
From: Lukasz Skalski @ 2015-04-28 10:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Greg Kroah-Hartman, Andy Lutomirski, Andrew Morton,
	Arnd Bergmann, Eric W. Biederman, One Thousand Gnomes,
	Tom Gundersen, Jiri Kosina, linux-kernel, Daniel Mack,
	David Herrmann, Djalal Harouni

On 04/27/2015 11:32 PM, Linus Torvalds wrote:
> On Fri, Apr 24, 2015 at 6:50 AM, Lukasz Skalski <l.skalski@samsung.com> wrote:
>>
>> To perform tests I've created two simple apps:
>>
>> - server: http://fpaste.org/215157/
>> - client: http://fpaste.org/215156/
> 
> So since Andy reported that dbus seems to be a few orders of magnitude
> too slow, I tried to build these apps to see what it even does.
> 
> They don't build on F21. You seem to be using features that are too
> new to exist even in fairly modern distros:
> 
>     server.c:47:24: error: ‘G_BUS_TYPE_USER’ undeclared
> 
> so I can't even see what dbus does *now*.
>

I've just explained it in my mail to Andy. As discussed some
time ago with the GLib developers, we introduced two new bus
types called "user" (G_BUS_TYPE_USER) and "machine"
(G_BUS_TYPE_MACHINE). At the moment, these are only available on the GLib
devel branch, so I should have replaced G_BUS_TYPE_USER with
G_BUS_TYPE_SESSION before posting my benchmark apps - sorry for that.

> That said, either you're running your test on a potato, or dbus is
> seriously screwed up. No way should it take 4+ seconds to send a 1000b
> message back and forth 20k times. But as mentioned, I can't even
> see what it's doing right now.
> 
>                       Linus
> 

-- 
Lukasz Skalski
Samsung R&D Institute Poland
Samsung Electronics
l.skalski@samsung.com

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-27 22:29                   ` David Lang
@ 2015-04-28 10:53                     ` Lukasz Skalski
  0 siblings, 0 replies; 316+ messages in thread
From: Lukasz Skalski @ 2015-04-28 10:53 UTC (permalink / raw)
  To: David Lang
  Cc: Greg Kroah-Hartman, Havoc Pennington, Andy Lutomirski,
	Linus Torvalds, Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On 04/28/2015 12:29 AM, David Lang wrote:
> On Mon, 27 Apr 2015, Lukasz Skalski wrote:
> 
> aren't we being told that part of the reason for needing kdbus is that
> thousands, or tens of thousands of messages are being spewed out? how
> does limiting it to 128 messages represent real-life if this is the case?
> 

AFAIK, at the moment some limits (for example, the maximum number of
queued requests waiting for a reply) are the same (or at least quite
similar) for both the DBus daemon and kdbus.

> David Lang
> 
> 

-- 
Lukasz Skalski
Samsung R&D Institute Poland
Samsung Electronics
l.skalski@samsung.com

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-27 22:00               ` Linus Torvalds
  2015-04-27 22:14                 ` Linus Torvalds
@ 2015-04-28 12:49                 ` Havoc Pennington
  1 sibling, 0 replies; 316+ messages in thread
From: Havoc Pennington @ 2015-04-28 12:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman,
	Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Mon, Apr 27, 2015 at 6:00 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> If somebody wants to speed up dbus, they should likely look at the
> user-space code, not the kernel side.

To be more precise, your profile seems to show a lot of the gdbus
(glib bindings) user space code. (And the blocking version of this
benchmark *is* doing something ridiculous, by blocking for every round
trip, which is the one performance mistake the docs say over and over
not to make.)

There are at least two other C bindings (a plain-C one in systemd and
the original libdbus).

If someone wanted to get the noise out of the picture I imagine the
plain-C bindings in systemd might have a lot less in the way of
allocations and locks than gdbus, though I haven't looked at them.
Those systemd bindings are also the ones people asking for more
performance are probably using (because they are talking about early
boot, system services, etc.)

> My guess is that pretty much the entirety of the quoted kdbus
> "speedup" isn't because it speeds up any kernel side thing, it's
> because it avoids the user-space crap in the dbus server.

The dbus bus daemon doesn't link to any g_ functions, fwiw, when
interpreting these profiles. Though nobody would claim the bus daemon
is fast, it is on the order of a few times slower than a raw socket
last I checked (which was a long time ago) ... here are some old
threads:

http://lists.freedesktop.org/pipermail/dbus/2004-November/001779.html
http://lists.freedesktop.org/archives/dbus/2012-March/015024.html

In 2004, the libdbus parsing/validation/malloc/etc. overhead was 2.5x
a raw socket without the bus daemon, and about twice that with the bus
daemon (since the daemon adds another read and another write per
message). I'm not aware of any reason this would have changed
dramatically, though it doesn't mean there isn't one.

Havoc

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-27 22:14                 ` Linus Torvalds
@ 2015-04-28 13:44                   ` Havoc Pennington
  2015-04-28 14:48                     ` Havoc Pennington
  2015-06-22 17:33                   ` Jindrich Makovicka
                                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 316+ messages in thread
From: Havoc Pennington @ 2015-04-28 13:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman,
	Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Mon, Apr 27, 2015 at 6:14 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> The real problems seem to be in dbus memory management (suggestion:
> keep a small per-thread cache of those message allocations) and to a
> smaller degree in the crazy utf8 validation (why the f*ck does it do
> that anyway?), with some locking problems thrown in for good measure.
>

I would say there are two distinct performance topics here.

A. is the fixed overhead of various bindings (which may well vary a
lot by binding). This is parsing, validation, allocation, locking,
whatever. It tends to be "per message operation" (read/parse or
marshal/write of a message).

B. is how many of these "message operations" (read/parse,
marshal/write) are happening.

To make A*B smaller, one can reduce either A or B.

The kdbus idea seems to be mostly about B, eliminating the bus
daemon's read/parse and marshal/write, and reducing it to only one
marshal/write by the sender and one read/parse by the recipient
without the daemon in between.
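
To make the A*B framing concrete, here is a small C sketch of the cost
model. It is only an illustration: the function name delivery_cost and
the enum names are made up for this example, and it treats every message
operation as costing the same A, which real bindings will not do. The
operation counts (2 without the daemon, 4 with it) follow the
read/parse and marshal/write counting used in this thread.

```c
#include <assert.h>

/* Message operations (read/parse or marshal/write) per delivered message. */
enum { OPS_DIRECT = 2, OPS_VIA_DAEMON = 4 };

/* Total cost = A (cost per operation) * B (number of operations). */
static double delivery_cost(double cost_per_op, int ops_per_msg, long n_msgs)
{
    assert(cost_per_op >= 0.0 && ops_per_msg > 0 && n_msgs >= 0);
    return cost_per_op * ops_per_msg * (double)n_msgs;
}
```

With the same A on both paths, delivery_cost(a, OPS_VIA_DAEMON, n) is
always twice delivery_cost(a, OPS_DIRECT, n), whatever a and n are.
Reducing A (a faster binding) scales both paths; only removing the
daemon changes B.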

People have worked on A for clients, by doing the systemd binding for
example, but perhaps they have been reluctant to work on the bus
daemon itself to improve A for the bus because they felt solving B
would involve eliminating the bus daemon anyway. If you are planning
to solve B via kdbus, then optimizing the bus daemon itself would be a
waste of time (A only matters for clients, not the bus, in kdbus
world).

That email I linked earlier
(http://lists.freedesktop.org/archives/dbus/2012-March/015024.html )
has many suggestions on A for the bus daemon itself, but of course
taking the bus daemon out of the equation would be more effective than
any amount of optimizing it.

A. is kind of a realm of many choices - there are tons of bindings,
and people can decide if they want the convenient-but-malloc-happy
glib ones, or the more traditional C style of systemd, or Python or
Java or JavaScript or whatever ... this is an area where people can
make the tradeoff they want. But everyone is "stuck" with the bus
daemon (or kdbus) since it has to be shared among clients, of course.

Havoc

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-28 13:44                   ` Havoc Pennington
@ 2015-04-28 14:48                     ` Havoc Pennington
  2015-04-28 17:18                       ` Theodore Ts'o
  2015-04-28 17:19                       ` David Lang
  0 siblings, 2 replies; 316+ messages in thread
From: Havoc Pennington @ 2015-04-28 14:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman,
	Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

btw if I can make a suggestion, it's quite confusing to talk about
"dbus" unqualified when we are talking about implementation issues,
since it muddles bus daemon vs. clients, and also since there are lots
of implementations of the client bindings:

  http://www.freedesktop.org/wiki/Software/DBusBindings/

For the bus daemon, the only two implementations I know of are the
original one (which uses libdbus as its binding) and kdbus, though.

I would expect there's no question the bus daemon can be faster, maybe
say 1.5x raw sockets instead of 2.5x, or whatever - something on that
order. Should probably simply stipulate this for discussion purposes:
"someone could optimize the crap out of the bus daemon". The kdbus
question is about whether to eliminate this daemon entirely.

Havoc

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-28 14:48                     ` Havoc Pennington
@ 2015-04-28 17:18                       ` Theodore Ts'o
  2015-04-28 20:25                         ` Havoc Pennington
  2015-04-28 17:19                       ` David Lang
  1 sibling, 1 reply; 316+ messages in thread
From: Theodore Ts'o @ 2015-04-28 17:18 UTC (permalink / raw)
  To: Havoc Pennington
  Cc: Linus Torvalds, Andy Lutomirski, Lukasz Skalski,
	Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Tue, Apr 28, 2015 at 10:48:10AM -0400, Havoc Pennington wrote:
> btw if I can make a suggestion, it's quite confusing to talk about
> "dbus" unqualified when we are talking about implementation issues,
> since it muddles bus daemon vs. clients, and also since there are lots
> of implementations of the client bindings:
> 
>   http://www.freedesktop.org/wiki/Software/DBusBindings/
> 
> For the bus daemon, the only two implementations I know of are the
> original one (which uses libdbus as its binding) and kdbus, though.
> 
> I would expect there's no question the bus daemon can be faster, maybe
> say 1.5x raw sockets instead of 2.5x, or whatever - something on that
> order. Should probably simply stipulate this for discussion purposes:
> "someone could optimize the crap out of the bus daemon". The kdbus
> question is about whether to eliminate this daemon entirely.

So the question is if one of the justifications for moving the daemon
into kernel space is that its performance is crap, then I think it is
useful to determine whether a fully optimized userspace daemon would
be good enough.

After all, we can go down the Novell Netware path and push arbitrary
web servers, ldap servers, etc. all into the kernel on the excuse of
"the performance would be faster".  But that begs the question of how
much performance improvements can be made purely in userspace, and
ignores all of the security and stability costs of moving more and
more code into the kernel.

So the question I have is why in the world do we want to be able to
support 1.5x raw sockets for a bus speed?  What's the use case where
that kind of performance is required for a bus based system, and is
that a world we really want to live in?  I find dbus to be extremely
hard to debug when my desktop starts doing things I don't want it to
do.  The fact that it might be flinging around hundreds of thousands
of messages, and that this is something we want to encourage, doesn't
make me feel any more kindly inclined towards dbus or kdbus....

     	     	      	     	      	      - Ted

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-28 14:48                     ` Havoc Pennington
  2015-04-28 17:18                       ` Theodore Ts'o
@ 2015-04-28 17:19                       ` David Lang
  2015-04-28 19:19                         ` Havoc Pennington
  1 sibling, 1 reply; 316+ messages in thread
From: David Lang @ 2015-04-28 17:19 UTC (permalink / raw)
  To: Havoc Pennington
  Cc: Linus Torvalds, Andy Lutomirski, Lukasz Skalski,
	Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Tue, 28 Apr 2015, Havoc Pennington wrote:

> btw if I can make a suggestion, it's quite confusing to talk about
> "dbus" unqualified when we are talking about implementation issues,
> since it muddles bus daemon vs. clients, and also since there are lots
> of implementations of the client bindings:
>
>  http://www.freedesktop.org/wiki/Software/DBusBindings/
>
> For the bus daemon, the only two implementations I know of are the
> original one (which uses libdbus as its binding) and kdbus, though.
>
> I would expect there's no question the bus daemon can be faster, maybe
> say 1.5x raw sockets instead of 2.5x, or whatever - something on that
> order. Should probably simply stipulate this for discussion purposes:
> "someone could optimize the crap out of the bus daemon". The kdbus
> question is about whether to eliminate this daemon entirely.

As I'm seeing things, we aren't talking about 1.5x vs 2.5x, we're talking about 
1000x

If the examples that are being used to show the performance advantage of kdbus 
vs normal dbus are doing the wrong thing, then we need to get some other 
examples available to people who don't live and breathe dbus that 'do things 
right' so that the kernel developers can see what you think is the real problem 
and how kdbus addresses it.

So far, this 'wrong' example is the only thing that's been posted to show the 
performance advantage of kdbus.

David Lang

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-28 17:19                       ` David Lang
@ 2015-04-28 19:19                         ` Havoc Pennington
  2015-04-28 20:34                           ` David Lang
  2015-04-28 20:43                           ` Linus Torvalds
  0 siblings, 2 replies; 316+ messages in thread
From: Havoc Pennington @ 2015-04-28 19:19 UTC (permalink / raw)
  To: David Lang
  Cc: Linus Torvalds, Andy Lutomirski, Lukasz Skalski,
	Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Tue, Apr 28, 2015 at 1:19 PM, David Lang <david@lang.hm> wrote:
> If the examples that are being used to show the performance advantage of
> kdbus vs normal dbus are doing the wrong thing, then we need to get some
> other examples available to people who don't live and breathe dbus that 'do
> things right' so that the kernel developers can see what you think is the
> real problem and how kdbus addresses it.
>
> So far, this 'wrong' example is the only thing that's been posted to show
> the performance advantage of kdbus.

I'm hopeful someone will do that.

fwiw, I would be suspicious of a broken benchmark if it didn't show:

* the bus daemon means an extra read/parse and marshal/write per
message, so 4 vs. 2
* the existence of the bus daemon therefore makes a message
send/receive take roughly twice as long

https://lwn.net/Articles/580194/ has a bit more elaboration about
number of copies, validations, and context switches in each case.

From what I can tell, the core performance claim for kdbus is that for
a userspace daemon to be a routing intermediary, it has to receive and
re-send messages. If the baseline performance of IPC is the cost to
send once and receive once, adding the daemon means there's twice as
much to do (1 more receive, 1 more send). However fast you make
send/receive, the daemon always means there are twice as many
send/receives as there would be with no daemon.

If that isn't what a benchmark shows, then there's a mystery to
explain... (one disruption to the ratio of course could be if the
clients use a much faster or slower dbus lib than the daemon)

As noted many times, of course this 2x penalty for the daemon was a
conscious tradeoff - kdbus is trying to escape the tradeoff in order
to extend usage of dbus to more use cases. Given the tradeoff,
_existing_ uses of dbus seem to prefer the performance hit to the loss
of useful semantics, but potential new users would like to or need to
have both.

That LWN article lists some other non-performance rationales for kdbus
too, of course.

Aside: earlier I referred to the systemd dbus client binding without a
link, the link appears to be:
http://cgit.freedesktop.org/systemd/systemd/tree/src/libsystemd/sd-bus

Havoc

* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-28 17:18                       ` Theodore Ts'o
@ 2015-04-28 20:25                         ` Havoc Pennington
  2015-04-28 23:12                           ` John Stoffel
  0 siblings, 1 reply; 316+ messages in thread
From: Havoc Pennington @ 2015-04-28 20:25 UTC (permalink / raw)
  To: Theodore Ts'o, Havoc Pennington, Linus Torvalds,
	Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman,
	Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Tue, Apr 28, 2015 at 1:18 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> So the question is if one of the justifications for moving the daemon
> into kernel space is that its performance is crap, then I think it is
> useful to determine whether a fully optimized userspace daemon would
> be good enough.
>

Yeah. I don't know how you answer that, because the answer is probably
"it would be good enough for some things and not for other things." It
depends on whether an app is sending enough data to be too slow, and
it depends on the hardware, right.

What I think we might know: the userspace:kernel time-to-send ratio
should always be around 2:1, if both of them are
similarly-implemented, because the userspace version has about 2x the
work to do.

The actual wall-clock time of course depends on the hardware and
what's being sent.

If there was a deviation from 2:1 in a benchmark, it might be because
of implementation issues - so for example libdbus+dbus-daemon might be
3:1 or 5:1 to sd-dbus+kdbus, because sd-dbus isn't as bloated as
libdbus, say. That isn't telling you anything about kernel vs.
userspace architecture, the extra ratio above 2:1 is only telling you
about userspace implementation quality.
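
As a rough sketch of that reasoning, assuming the two effects simply multiply
(that is my simplification, not a measured fact), a benchmark ratio can be
split into the fixed 2x for the extra hop and whatever is left over for
implementation quality. The function name is hypothetical.

```python
def implementation_factor(measured_ratio: float,
                          architectural_ratio: float = 2.0) -> float:
    """Return the part of a measured slowdown not explained by the extra hop.

    Assumes measured_ratio = architectural_ratio * implementation_factor.
    """
    if measured_ratio <= 0 or architectural_ratio <= 0:
        raise ValueError("ratios must be positive")
    return measured_ratio / architectural_ratio


if __name__ == "__main__":
    for measured in (2.0, 3.0, 5.0):
        print(measured, implementation_factor(measured))
```

Under that assumption a 5:1 benchmark would mean the userspace stack is about
2.5x slower than it needs to be, independent of where the router lives.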

For purposes of deciding what to put in kernel - the differences
between dbus client implementations (sd-dbus, libdbus, gdbus, etc.)
seem like irrelevant noise to me.

Re: the slippery slope to LDAP in the kernel - my questions would be
things like 1) what are non-performance reasons to have dbus in the
kernel, such as early boot or security considerations; 2) does LDAP in
kernel give these kind of 2:1 gains; 3) is there a simpler way to get
the 2:1 gain for dbus...

Others can answer those better than I can.

I _would_ say that dbus is more "generic" than something like LDAP;
dbus is specific to the use-case of coordinating processes on a single
machine, but it isn't specific to any particular application, and it's
been used for lots of different applications. On my laptop, which is a
pretty normal fedora 21 as far as I know:

$ rpm -q --whatrequires 'libdbus-1.so.3()(64bit)' | wc -l
113

This omits anyone using a different binding; it only counts libdbus users.

> I find dbus to be extremely hard to debug when my desktop starts doing
> things I don't want it to do.  The fact that it might be flinging around hundreds
> of thousands of messages, and that this is something we want to encourage,

This particular argument doesn't resonate with me ... if dbus is hard
to debug, it's not as if "ad hoc application-specific sidechannel
somebody cooked up" is going to be easier.

People aren't usually making up data to send around just because they
can. If they need to send an audio stream, and dbus is too slow,
they'll send it another ad hoc way, but it ultimately has to get sent.
Same for most data, it is the size it is and it needs to go where it
needs to go, for some what-the-user-wants-to-do kind of reason.

If apps have to, they say "I'm sorry Dave I can't do that - you can't
software-decode 4K video on your 300 MHz ARM" - of course.

Havoc


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-28 19:19                         ` Havoc Pennington
@ 2015-04-28 20:34                           ` David Lang
  2015-04-28 20:42                             ` Andy Lutomirski
  2015-04-28 20:43                           ` Linus Torvalds
  1 sibling, 1 reply; 316+ messages in thread
From: David Lang @ 2015-04-28 20:34 UTC (permalink / raw)
  To: Havoc Pennington
  Cc: Linus Torvalds, Andy Lutomirski, Lukasz Skalski,
	Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Tue, 28 Apr 2015, Havoc Pennington wrote:

> On Tue, Apr 28, 2015 at 1:19 PM, David Lang <david@lang.hm> wrote:
>> If the examples that are being used to show the performance advantage of
>> kdbus vs normal dbus are doing the wrong thing, then we need to get some
>> other examples available to people who don't live and breathe dbus that 'do
>> things right' so that the kernel developers can see what you think is the
>> real problem and how kdbus addresses it.
>>
>> So far, this 'wrong' example is the only thing that's been posted to show
>> the performance advantage of kdbus.
>
> I'm hopeful someone will do that.
>
> fwiw, I would be suspicious of a broken benchmark if it didn't show:
>
> * the bus daemon means an extra read/parse and marshal/write per
> message, so 4 vs. 2
> * the existence of the bus daemon therefore makes a message
> send/receive take roughly twice as long
>
> https://lwn.net/Articles/580194/ has a bit more elaboration about
> number of copies, validations, and context switches in each case.
>
> From what I can tell, the core performance claim for kdbus is that for
> a userspace daemon to be a routing intermediary, it has to receive and
> re-send messages. If the baseline performance of IPC is the cost to
> send once and receive once, adding the daemon means there's twice as
> much to do (1 more receive, 1 more send). However fast you make
> send/receive, the daemon always means there are twice as many
> send/receives as there would be with no daemon.

There are twice as many context switches; nobody disputes that. The question is
whether it matters.

Whether the message router is in kernel space or user space, it still needs to
read/parse and marshal/write the data, so you aren't saving that time by moving
it into the kernel.

> If that isn't what a benchmark shows, then there's a mystery to
> explain... (one disruption to the ratio of course could be if the
> clients use a much faster or slower dbus lib than the daemon)
>
> As noted many times, of course this 2x penalty for the daemon was a
> conscious tradeoff - kdbus is trying to escape the tradeoff in order
> to extend usage of dbus to more use cases. Given the tradeoff,
> _existing_ uses of dbus seem to prefer the performance hit to the loss
> of useful semantics, but potential new users would like to or need to
> have both.

If there is a 2x performance improvement for being in the kernel, but a 100x 
performance improvement from fixing the userspace code, the effort should be 
spent on the userspace code, not on moving things to kernel space.

Remember the Tux in-kernel webserver? It showed performance improvements from
putting the http daemon in the kernel, and a lot of the arguments about it sound
very similar (reduced context switches, etc.)

David Lang


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-28 20:34                           ` David Lang
@ 2015-04-28 20:42                             ` Andy Lutomirski
  0 siblings, 0 replies; 316+ messages in thread
From: Andy Lutomirski @ 2015-04-28 20:42 UTC (permalink / raw)
  To: David Lang
  Cc: Havoc Pennington, Linus Torvalds, Lukasz Skalski,
	Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Tue, Apr 28, 2015 at 1:34 PM, David Lang <david@lang.hm> wrote:
> On Tue, 28 Apr 2015, Havoc Pennington wrote:
>
>> On Tue, Apr 28, 2015 at 1:19 PM, David Lang <david@lang.hm> wrote:
>>>
>>> If the examples that are being used to show the performance advantage of
>>> kdbus vs normal dbus are doing the wrong thing, then we need to get some
>>> other examples available to people who don't live and breathe dbus that
>>> 'do things right' so that the kernel developers can see what you think is the
>>> real problem and how kdbus addresses it.
>>>
>>> So far, this 'wrong' example is the only thing that's been posted to show
>>> the performance advantage of kdbus.
>>
>>
>> I'm hopeful someone will do that.
>>
>> fwiw, I would be suspicious of a broken benchmark if it didn't show:
>>
>> * the bus daemon means an extra read/parse and marshal/write per
>> message, so 4 vs. 2
>> * the existence of the bus daemon therefore makes a message
>> send/receive take roughly twice as long
>>
>> https://lwn.net/Articles/580194/ has a bit more elaboration about
>> number of copies, validations, and context switches in each case.
>>
>> From what I can tell, the core performance claim for kdbus is that for
>> a userspace daemon to be a routing intermediary, it has to receive and
>> re-send messages. If the baseline performance of IPC is the cost to
>> send once and receive once, adding the daemon means there's twice as
>> much to do (1 more receive, 1 more send). However fast you make
>> send/receive, the daemon always means there are twice as many
>> send/receives as there would be with no daemon.
>
>
> there are twice as many context switches, nobody disputes that, the question
> is if it matters.
>
> It doesn't matter if the message router is in kernel space or user space, it
> still needs to read/parse, marshal/write the data, so you aren't saving that
> time due to it being in the kernel.
>
>> If that isn't what a benchmark shows, then there's a mystery to
>> explain... (one disruption to the ratio of course could be if the
>> clients use a much faster or slower dbus lib than the daemon)
>>
>> As noted many times, of course this 2x penalty for the daemon was a
>> conscious tradeoff - kdbus is trying to escape the tradeoff in order
>> to extend usage of dbus to more use cases. Given the tradeoff,
>> _existing_ uses of dbus seem to prefer the performance hit to the loss
>> of useful semantics, but potential new users would like to or need to
>> have both.
>
>
> If there is a 2x performance improvement for being in the kernel, but a 100x
> performance improvement from fixing the userspace code, the effort should be
> spent on the userspace code, not on moving things to kernel space.

I would guess that, if we compared a highly optimized userspace
implementation to a kernel implementation, we'd see less than 2x
difference.  After all, a userspace daemon doesn't really need to
unmarshal and re-marshal anything except headers.  For large messages,
we could use splice and avoid a couple of copies, too.
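
A minimal sketch of the header-only idea, using a made-up frame layout (a
4-byte destination id and a 4-byte payload length, then an opaque body). This
is not the D-Bus wire format, and the names are hypothetical; the point is
only that the router can pick a destination without touching the body.

```python
import struct

# Hypothetical fixed header: destination id, payload length (network order).
HEADER = struct.Struct("!II")


def make_frame(dest: int, payload: bytes) -> bytes:
    """Client side: marshal the header once and append the opaque body."""
    return HEADER.pack(dest, len(payload)) + payload


def route(frame: bytes):
    """Router side: read only the header, forward the frame unchanged."""
    view = memoryview(frame)
    if len(view) < HEADER.size:
        raise ValueError("frame shorter than header")
    dest, length = HEADER.unpack_from(view, 0)
    if len(view) != HEADER.size + length:
        raise ValueError("length field does not match frame size")
    # The body is never parsed or re-marshalled; the view shares the
    # original buffer, so no copy is made here either.
    return dest, view


if __name__ == "__main__":
    dest, out = route(make_frame(7, b"opaque body"))
    print(dest, bytes(out[HEADER.size:]))
```

A real daemon would still have to validate whatever it promises receivers
about the message, so how much parsing can be skipped depends on those
guarantees.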

If the scheduler became a bottleneck, it could be interesting to add
something like a send-and-poll primitive.  I suspect that some
workloads currently do unnecessary context switches with only standard
POSIX primitives.  If A sends a message to B, then there's a brief
window in which both A and B are runnable.  Ideally we wouldn't
context switch until A calls poll or epoll_wait, but I don't know how
well that works in practice.

There's more room for generic improvements than just that.  At LSF/MM
we were talking about more scalable epoll variants that would allow a
multithreaded daemon to be woken up on the core that received incoming
data.  That would allow an efficient multi-queue dbus with fewer
migrations and IPIs.

At some point, I'd like to implement PCID on x86 (if no one beats me
to it, and this is a low priority for me), which will allow us to skip
expensive TLB flushes while context switching.  I have no idea whether
ARM can do something similar.

--Andy


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-28 19:19                         ` Havoc Pennington
  2015-04-28 20:34                           ` David Lang
@ 2015-04-28 20:43                           ` Linus Torvalds
  1 sibling, 0 replies; 316+ messages in thread
From: Linus Torvalds @ 2015-04-28 20:43 UTC (permalink / raw)
  To: Havoc Pennington
  Cc: David Lang, Andy Lutomirski, Lukasz Skalski, Greg Kroah-Hartman,
	Andrew Morton, Arnd Bergmann, Eric W. Biederman,
	One Thousand Gnomes, Tom Gundersen, Jiri Kosina, linux-kernel,
	Daniel Mack, David Herrmann, Djalal Harouni

On Tue, Apr 28, 2015 at 12:19 PM, Havoc Pennington <hp@pobox.com> wrote:
>
> From what I can tell, the core performance claim for kdbus is that for
> a userspace daemon to be a routing intermediary, it has to receive and
> re-send messages. If the baseline performance of IPC is the cost to
> send once and receive once, adding the daemon means there's twice as
> much to do (1 more receive, 1 more send). However fast you make
> send/receive, the daemon always means there are twice as many
> send/receives as there would be with no daemon.

HOWEVER.

That's only a good optimization strategy if the code is optimized to begin with.

If the code spends 10x as much time in user space in "overhead" as it
actually spends in the kernel, the proper place to optimize is to get
rid of the 10x. That will make things much faster.

Once user space is lean and mean, at that point do I believe that "ok,
let's add kernel code for the last bit of performance". But as it is
right now, anybody who works on kdbus and claims that _performance_ is
the reason for their work is just looking at the wrong piece of the
puzzle.

Now, there may be *other* reasons why kdbus is a good idea. But quite
frankly, every time somebody asks "why", performance seems to be one
of the main answers.

And quite frankly, that *stinks*.

Do proper optimizations of the actual real costs before starting to
work on kernel stuff. It's *stupid* to add a kernel driver to get 2x
improvement, when there's a 10x bloat in user space.
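
The arithmetic can be sketched as below. The numbers are purely illustrative:
10 units of user-space overhead per 1 unit of actual transport work, with the
kernel driver assumed to halve only the transport part. The function name and
the split into two cost buckets are assumptions, not measurements.

```python
def speedup(user_overhead: float, transport: float,
            user_factor: float = 1.0, transport_factor: float = 1.0) -> float:
    """Overall speedup when each cost bucket is divided by its own factor."""
    before = user_overhead + transport
    after = user_overhead / user_factor + transport / transport_factor
    return before / after


if __name__ == "__main__":
    # Halve the transport cost only (the in-kernel router case).
    print(round(speedup(10, 1, transport_factor=2), 2))   # 1.05
    # Remove the 10x user-space overhead only.
    print(round(speedup(10, 1, user_factor=10), 2))       # 5.5
```

With those assumed inputs, fixing the bloat is worth about 5.5x overall, while
the kernel change alone is worth about 5%.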

Is that really so hard to see? I don't think it is at *all*
appropriate to say "we're a f*cking bloated pig, but we're too lazy to
fix the bloat and the primary performance problems, so we'll add a
kernel interface to partially hide the issue".

That is particularly true because if you fix the user-level
performance problems, you may notice that there was something stupid
in the interfaces, and some of the kernel interface design was wrong.

                        Linus


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-28 20:25                         ` Havoc Pennington
@ 2015-04-28 23:12                           ` John Stoffel
  2015-04-29  0:45                             ` Havoc Pennington
                                               ` (2 more replies)
  0 siblings, 3 replies; 316+ messages in thread
From: John Stoffel @ 2015-04-28 23:12 UTC (permalink / raw)
  To: Havoc Pennington
  Cc: Theodore Ts'o, Linus Torvalds, Andy Lutomirski,
	Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

>>>>> "Havoc" == Havoc Pennington <hp@pobox.com> writes:

Havoc> On Tue, Apr 28, 2015 at 1:18 PM, Theodore Ts'o <tytso@mit.edu> wrote:
>> So the question is if one of the justifications for moving the daemon
>> into kernel space is that its performance is crap, then I think it is
>> useful to determine whether a fully optimized userspace daemon would
>> be good enough.
>> 

Havoc> Yeah. I don't know how you answer that, because the answer is
Havoc> probably "it would be good enough for some things and not for
Havoc> other things." It depends on whether an app is sending enough
Havoc> data to be too slow, and it depends on the hardware, right.

So what happens if we put kdbus into the kernel and it's still too
slow?  What then?  

Havoc> What I think we might know: the userspace:kernel time-to-send
Havoc> ratio should always be around 2:1, if both of them are
Havoc> similarly-implemented, because the userspace version has about
Havoc> 2x the work to do.

I'm not sure I agree with this statement: just putting something into
the kernel doesn't magically make the work go away, and the overhead
people are talking about won't change if applications and libraries
keep opening/closing the connection to the bus all the time.

Havoc> The actual wall-clock time of course depends on the hardware
Havoc> and what's being sent.

Havoc> If there was a deviation from 2:1 in a benchmark, it might be
Havoc> because of implementation issues - so for example
Havoc> libdbus+dbus-daemon might be 3:1 or 5:1 to sd-dbus+kdbus,
Havoc> because sd-dbus isn't as bloated as libdbus, say. That isn't
Havoc> telling you anything about kernel vs.  userspace architecture,
Havoc> the extra ratio above 2:1 is only telling you about userspace
Havoc> implementation quality.

Which is also telling you that maybe userspace could be improved more,
before it needs to even think about going into the kernel?  

Havoc> For purposes of deciding what to put in kernel - the
Havoc> differences between dbus client implementations (sd-dbus,
Havoc> libdbus, gdbus, etc.)  seem like irrelevant noise to me.

Havoc> Re: the slippery slope to LDAP in the kernel - my questions
Havoc> would be things like 1) what are non-performance reasons to
Havoc> have dbus in the kernel, such as early boot or security
Havoc> considerations; 2) does LDAP in kernel give these kind of 2:1
Havoc> gains; 3) is there a simpler way to get the 2:1 gain for
Havoc> dbus...

Havoc> Others can answer those better than I can.

Havoc> I _would_ say that dbus is more "generic" than something like
Havoc> LDAP; dbus is specific to the use-case of coordinating
Havoc> processes on a single machine, but it isn't specific to any
Havoc> particular application, and it's been used for lots of
Havoc> different applications. On my laptop, which is a pretty normal
Havoc> fedora 21 as far as I know:

LDAP is pretty damn generic, in that you can put pretty large objects
into it, and pretty large OUs, etc.  So why would it be a candidate
for going into the kernel?  And why is kdbus so important in the
kernel as well?  People have talked about it needing to be there for
bootup, but isn't that why we ripped out RAID detection and such from
the kernel and built initramfs, so that there's LESS in the kernel,
and more in an early userspace?  Same idea with dbus in my opinion.

Havoc> $ rpm -q --whatrequires 'libdbus-1.so.3()(64bit)' | wc -l
Havoc> 113

Havoc> this omits anyone using a different binding, it's only libdbus users.

>> I find dbus to be extremely hard to debug when my desktop starts doing
>> things I don't want it to do.  The fact that it might be flinging around hundreds
>> of thousands of messages, and that this is something we want to encourage,

Havoc> This particular argument doesn't resonate with me ... if dbus
Havoc> is hard to debug, it's not as if "ad hoc application-specific
Havoc> sidechannel somebody cooked up" is going to be easier.

When Ted is saying it's hard to debug... then maybe it's a bit crappy
in design or implementation?  

Havoc> People aren't usually making up data to send around just because they
Havoc> can. If they need to send an audio stream, and dbus is too slow,
Havoc> they'll send it another ad hoc way, but it ultimately has to get sent.
Havoc> Same for most data, it is the size it is and it needs to go where it
Havoc> needs to go, for some what-the-user-wants-to-do kind of reason.

Havoc> If apps have to, they say "I'm sorry Dave I can't do that - you
Havoc> can't software-decode 4K video on your 300 MHz ARM" - of course.

So why DOES audio need to go via DBUS?  What about video?  Why
shouldn't that go via dbus as well?  

If one userspace implementation is so crappy, why can't that
implementation be tossed and a better one done?  Or why can't they
just optimize/tune it in userspace instead?  

John



* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-28 23:12                           ` John Stoffel
@ 2015-04-29  0:45                             ` Havoc Pennington
  2015-04-29 11:33                             ` Harald Hoyer
  2015-04-29 12:47                             ` Harald Hoyer
  2 siblings, 0 replies; 316+ messages in thread
From: Havoc Pennington @ 2015-04-29  0:45 UTC (permalink / raw)
  To: John Stoffel
  Cc: Theodore Ts'o, Linus Torvalds, Andy Lutomirski,
	Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On Tue, Apr 28, 2015 at 7:12 PM, John Stoffel <john@stoffel.org> wrote:
> Havoc> Yeah. I don't know how you answer that, because the answer is
> Havoc> probably "it would be good enough for some things and not for
> Havoc> other things." It depends on whether an app is sending enough
> Havoc> data to be too slow, and it depends on the hardware, right.
>
> So what happens if we put kdbus into the kernel and it's still too
> slow?  What then?

What my above paragraph was intended to mean is: I don't understand
what it means to ask about a "too slow" fixed line here. Every time
you make it substantively faster, it works for more apps or on slower
hardware, presumably. You dial the speed, and you include or exclude
certain app ideas accordingly.

I think dbus works for lots of purposes now, despite being slow. Lots
of people are using it. In many uses, super-slow-dbus might be 1% of
the profile of whatever the user-visible functionality is, and nobody
cares how fast dbus is. In other uses, they might.

Some people are saying they would use it in more ways if it were
faster and/or available in early boot and/or whatever else. I'm not
those people, because right now I'm not working on dbus or anything
using dbus. They would have to say what's 'fast enough' for them.

"What happens if unix sockets are too slow? what then?" - it's not a
coherent question. It's always relative to what you're trying to do,
surely.

> I'm not sure I agree with this statement, just putting something into
> the kernel doesn't magically make the work go away

The kdbus guys should really explain this. I have my understanding of
it but theirs will be more accurate.

> Which is also telling you that maybe userspace could be improved more,
> before it needs to even think about going into the kernel?

I imagine people have already improved the part of userspace they are
thinking of keeping (sd-dbus, replacing libdbus) and they don't want
to rewrite dbus-daemon only to immediately discard it. (The part of
the "dbus" overall system which hasn't been rewritten and optimized is
the daemon, which could be dropped completely in kdbus-world.)

It's not especially mysterious what's slow about the existing daemon
implementation, in my opinion; it's been the same for 10 years. The
rough outline of speeding it up would be to replace libdbus with
something sd-dbus-like, and then do a round of profiling and tuning.
The 2012 email I linked to earlier had some other ideas.  But this is
a lot of work; it isn't "just" a port to sd-dbus, as the daemon is strongly
entangled with libdbus right now. I don't blame people for being
unmotivated on this if they believe it's a dead end.

In that same 2012 email you'll notice I advised doing exactly what
Linus suggests: do the userspace tuning rather than, quote, "arguing
with kernel developers":
 http://lists.freedesktop.org/archives/dbus/2012-March/015024.html

But I do admire that people felt kdbus was the right answer and so have
gone for it anyway, and I do think Linux as a complete OS
(kernel+userspace) deserves a great answer in this problem space.

> LDAP is pretty damn generic, in that you can put pretty large objects
> into it, and pretty large OUs, etc.  So why would it be a candidate
> for going into the kernel?  And why is kdbus so important in the
> kernel as well?  People have talked about it needing to be there for
> bootup, but isn't that why we ripped out RAID detection and such from
> the kernel and built initramfs, so that there's LESS in the kernel,
> and more in an early userspace?  Same idea with dbus in my opinion.
>

I don't have a well-developed philosophy on what should be in the
kernel or not. That is something the kernel maintainers have to
answer. My main concern here is that people understand what dbus is
about historically, so they don't do silly stuff - whether cargo cult
keeping a 'feature' that was always a bad idea, or speeding it up by
breaking intentional and important semantics, or whatever.

When I see people saying they don't understand what dbus is because
they have no idea how a Linux workstation userspace is put together,
that's something I can help with.

When I see people saying maybe it isn't worth the complexity to put
this in the kernel if it's only an N% speedup, I can see that, I'm not
going to say that's wrong or right. It depends to me on what apps are
enabled by the N%, or whether early boot and other factors are
important.

> When Ted is saying it's hard to debug... then maybe it's a bit crappy
> in design or implementation?

Or maybe he just doesn't know how to debug it, honestly. I find the
kernel hard to debug because I know very little about it. I find the
desktop simple to debug, at least as simple as debugging millions of
lines of code can be. The difference is that I have never done kernel
debugging and I'm already familiar with how the desktop works.

dbus has tools that log every message and let you explore and
introspect everything on it, etc. - it works for me.

> So why DOES audio need to go via DBUS?  What about video?  Why
> shouldn't that go via dbus as well?
>
> If one userspace implementation is so crappy, why can't that
> implementation be tossed and a better one done?  Or why can't they
> just optimize/tune it in userspace instead?

In this email I listed what I could remember app developers bringing
up when told to use a sidechannel instead of dbus:
http://article.gmane.org/gmane.linux.kernel/1935002

I can't speak to what makes sense for audio or video, but I'm sure
people who work on those things could.

Re: why can't it be done in userspace, the only thing I'd repeat again
here is that when people mention ways to speed up the bus daemon in
userspace, they often sound like they would abandon one or more of the
semantic guarantees of dbus (usually ordering, sometimes things like
the guaranteed-correct sender information or whatever). And _maybe_
some of those guarantees are worth abandoning, but I'd be very careful
with it.

Havoc


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-28 23:12                           ` John Stoffel
  2015-04-29  0:45                             ` Havoc Pennington
@ 2015-04-29 11:33                             ` Harald Hoyer
  2015-04-29 12:47                             ` Harald Hoyer
  2 siblings, 0 replies; 316+ messages in thread
From: Harald Hoyer @ 2015-04-29 11:33 UTC (permalink / raw)
  To: John Stoffel, Havoc Pennington
  Cc: Theodore Ts'o, Linus Torvalds, Andy Lutomirski,
	Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

On 29.04.2015 01:12, John Stoffel wrote:
>>>>>> "Havoc" == Havoc Pennington <hp@pobox.com> writes:
> 
> Havoc> On Tue, Apr 28, 2015 at 1:18 PM, Theodore Ts'o <tytso@mit.edu> wrote:
>>> I find dbus to be extremely hard to debug when my desktop starts doing
>>> things I don't want it to do.  The fact that it might be flinging around hundreds
>>> of thousands of messages, and that this is something we want to encourage,
> 
> Havoc> This particular argument doesn't resonate with me ... if dbus
> Havoc> is hard to debug, it's not as if "ad hoc application-specific
> Havoc> sidechannel somebody cooked up" is going to be easier.
> 
> When Ted is saying it's hard to debug... then maybe it's a bit crappy
> in design or implementation?  

There is a very nice tool to debug the traffic for kdbus.

http://lists.freedesktop.org/archives/dbus/2014-March/016178.html

Also, the patched Wireshark makes it as easy as analyzing network traffic.


* Re: [GIT PULL] kdbus for 4.1-rc1
  2015-04-28 23:12                           ` John Stoffel
  2015-04-29  0:45                             ` Havoc Pennington
  2015-04-29 11:33                             ` Harald Hoyer
@ 2015-04-29 12:47                             ` Harald Hoyer
  2015-04-29 13:33                               ` Richard Weinberger
                                                 ` (5 more replies)
  2 siblings, 6 replies; 316+ messages in thread
From: Harald Hoyer @ 2015-04-29 12:47 UTC (permalink / raw)
  To: John Stoffel, Havoc Pennington
  Cc: Theodore Ts'o, Linus Torvalds, Andy Lutomirski,
	Lukasz Skalski, Greg Kroah-Hartman, Andrew Morton, Arnd Bergmann,
	Eric W. Biederman, One Thousand Gnomes, Tom Gundersen,
	Jiri Kosina, linux-kernel, Daniel Mack, David Herrmann,
	Djalal Harouni

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 29.04.2015 01:12, John Stoffel wrote:
> LDAP is pretty damn generic, in that you can put pretty large objects into
> it, and pretty large OUs, etc.  So why would it be a candidate for going
> into the kernel?  And why is kdbus so important in the kernel as well?
> People have talked about it needing to be there for bootup, but isn't that
> why we ripped out RAID detection and such from the kernel and built
> initramfs, so that there's LESS in the kernel, and more in an early
> userspace?  Same idea with dbus in my opinion.

Let me elaborate on the initramfs/shutdown situation a little bit more,
because I have to deal with that every day.

Because of the "let's move everything to userspace" sentiment, we nowadays
have the situation that we need a lot of tools to set up the root device.

Be it LVM on IMSM or iSCSI multipath, the initramfs has to set up the network
(with bridging, bonding, etc.) and the iSCSI connection, assemble the RAID and
the LVM, open crypto devices, etc...
And if something goes wrong, you want to have a shell, see all the logs and
debug things.

Over time we have moved away from simple shell scripts (without any
logging) and statically compiled special versions for the initramfs to a mini
distribution in the initramfs, which simplifies maintenance and improves
reliability.

Basically you want to use the same tools in the initramfs (and shutdown)
which you already have and use in your real root, with the same configuration
files and the same interfaces and the same code paths.

Therefore, systemd is started in the dracut-created initramfs, which starts
journald for logging. The same basic