LKML Archive on lore.kernel.org
From: Frederic Weisbecker <fweisbec@gmail.com>
To: Eric Paris <eparis@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>,
	Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
	Eric Paris <eparis@parisplace.org>,
	linux-kernel@vger.kernel.org, agl@google.com, tzanussi@gmail.com,
	Jason Baron <jbaron@redhat.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	2nddept-manager@sdl.hitachi.co.jp,
	Steven Rostedt <rostedt@goodmis.org>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: Using ftrace/perf as a basis for generic seccomp
Date: Thu, 3 Feb 2011 20:06:45 +0100
Message-ID: <20110203190643.GC1769@nowhere>
In-Reply-To: <1296665124.3145.17.camel@localhost.localdomain>

On Wed, Feb 02, 2011 at 11:45:22AM -0500, Eric Paris wrote:
> On Wed, 2011-02-02 at 13:26 +0100, Ingo Molnar wrote:
> > * Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote:
> > 
> > > Hi Eric,
> > > 
> > > (2011/02/01 23:58), Eric Paris wrote:
> > > > On Wed, Jan 12, 2011 at 4:28 PM, Eric Paris <eparis@redhat.com> wrote:
> > > >> Some time ago Adam posted a patch to allow for a generic seccomp
> > > >> implementation (unlike the current seccomp where your choice is all
> > > >> syscalls or only read, write, sigreturn, and exit) which got little
> > > >> traction and it was suggested he instead do the same thing somehow using
> > > >> the tracing code:
> > > >> http://thread.gmane.org/gmane.linux.kernel/833556
> > > 
> > > Hm, interesting idea :)
> > > But why would you like to use tracing code? just for hooking?
> > 
> > What I suggested before was to reuse the scripting engine and the tracepoints.
> > 
> > I.e. the "seccomp restrictions" can be implemented via a filter expression - and the 
> > scripting engine could be generalized so that such 'sandboxing' code can make use of 
> > it.
> > 
> > For example, if you want to restrict a process to only allow open() syscalls to fd 4 
> > (a very restrictive sandbox), it could be done via this filter expression:
> > 
> > 	'fd == 4'
> > 
> > etc. Note that obviously the scripting engine needs to be abstracted out somewhat - 
> > but this is the basic idea, to reuse the callbacks and reuse the scripting engine 
> > for runtime filtering of syscall parameters.
> 
> Any pointers on what is involved in this abstraction?  I can work out
> the details, but I don't know the big picture well enough to even start
> to move forwards.....

In the big picture, the filtering code is tightly coupled to the tracing code.
Creation, initialization and removal of filters are all done on top of the
trace event structures (struct ftrace_event_call) because we apply and
interpret filters on the fields of trace events, which are what we save
in a trace.

Example:

If you look at the sched switch trace events, we have several fields
like prev_comm and next_comm. These are defined in the TRACE_EVENT()
macro calls. So when we apply a filter like "prev_comm == firefox-bin",
we enter the filtering code with the trace_event structure for sched
switch events and iterate through its fields to find one called
prev_comm, and then we work on top of that.

I think you won't work with trace events, so you need to make the
filtering code more tracing-agnostic.
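
To make the lookup above concrete, here is a minimal userspace sketch
of it. It is not the kernel's code: struct sched_switch_rec,
struct field_desc, find_field() and filter_str_eq() are hypothetical
names invented for this illustration, and only the "==" operator on
string fields is handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record standing in for a saved sched switch event. */
struct sched_switch_rec {
	char prev_comm[16];
	int  prev_pid;
	char next_comm[16];
	int  next_pid;
};

/* Hypothetical field descriptor: a name plus where the field lives. */
struct field_desc {
	const char *name;
	size_t offset;
	size_t size;
	int is_string;
};

static const struct field_desc sched_switch_fields[] = {
	{ "prev_comm", offsetof(struct sched_switch_rec, prev_comm), 16, 1 },
	{ "prev_pid",  offsetof(struct sched_switch_rec, prev_pid), sizeof(int), 0 },
	{ "next_comm", offsetof(struct sched_switch_rec, next_comm), 16, 1 },
	{ "next_pid",  offsetof(struct sched_switch_rec, next_pid), sizeof(int), 0 },
};

#define NR_FIELDS (sizeof(sched_switch_fields) / sizeof(sched_switch_fields[0]))

/* Iterate through the fields and return the one called 'name', or NULL. */
static const struct field_desc *find_field(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < NR_FIELDS; i++)
		if (strcmp(sched_switch_fields[i].name, name) == 0)
			return &sched_switch_fields[i];
	return NULL;
}

/* Evaluate "<field> == <value>" for a string field.
 * Returns 0 if the field is unknown or is not a string. */
static int filter_str_eq(const struct sched_switch_rec *rec,
			 const char *field, const char *value)
{
	const struct field_desc *f = find_field(field);

	if (!f || !f->is_string)
		return 0;
	return strncmp((const char *)rec + f->offset, value, f->size) == 0;
}
```

With a record whose prev_comm is "firefox-bin",
filter_str_eq(&rec, "prev_comm", "firefox-bin") matches. The point of
the sketch is that the predicate only needs a table of named fields,
which is the part a tracing-agnostic backend would have to be handed
by its caller.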

But I think it's quite workable and shouldn't be too hard to split that
into a filtering backend. Many parts are already pretty standalone.

Also I suspect the tracepoints are not what you need. Or maybe
they are. But as Masami said, the syscall tracepoint is called late.
It's workable though. The other problem is that preemption is disabled
when tracepoints are called, which is probably not what you want.
One day I think we'll need to unify the tracepoints and notifier
code, but until then, better to keep tracepoints for tracing.

Now once you have made the filtering code more generic, you still
need an arch backend to map register contents and layout into syscall
argument names and types, on top of which you can finally use the
filtering code. For that you can reuse, again, some code we use for
tracing: the syscalls metadata, which is information generated at
build time that describes syscall fields and types.
That also needs to be split up, but it's more trivial
than the filtering part.
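
A rough userspace sketch of how those two pieces could fit together,
under stated assumptions: struct fake_regs, struct syscall_meta,
get_syscall_args(), arg_index() and filter_arg_eq() are hypothetical
names made up here, not the kernel's syscall metadata API. The register
order is the x86-64 syscall convention, and the read() entry mirrors
that syscall's three arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYSCALL_ARGS 6

/* Hypothetical register snapshot. x86-64 syscall convention:
 * number in rax, arguments in rdi, rsi, rdx, r10, r8, r9. */
struct fake_regs {
	unsigned long rax, rdi, rsi, rdx, r10, r8, r9;
};

/* Hypothetical stand-in for build-time syscall metadata. */
struct syscall_meta {
	const char *name;
	int nr_args;
	const char *arg_names[MAX_SYSCALL_ARGS];
	const char *arg_types[MAX_SYSCALL_ARGS];
};

static const struct syscall_meta read_meta = {
	"read", 3,
	{ "fd", "buf", "count" },
	{ "unsigned int", "char *", "size_t" },
};

/* Arch backend: turn the register layout into an ordered argument array. */
static void get_syscall_args(const struct fake_regs *regs,
			     unsigned long args[MAX_SYSCALL_ARGS])
{
	assert(regs != NULL && args != NULL);
	args[0] = regs->rdi;
	args[1] = regs->rsi;
	args[2] = regs->rdx;
	args[3] = regs->r10;
	args[4] = regs->r8;
	args[5] = regs->r9;
}

/* Return the position of the argument called 'name', or -1 if unknown. */
static int arg_index(const struct syscall_meta *meta, const char *name)
{
	for (int i = 0; i < meta->nr_args; i++)
		if (strcmp(meta->arg_names[i], name) == 0)
			return i;
	return -1;
}

/* Evaluate "<arg> == <value>" on the raw argument value.
 * An unknown argument name never matches. */
static int filter_arg_eq(const struct syscall_meta *meta,
			 const struct fake_regs *regs,
			 const char *arg, unsigned long value)
{
	unsigned long args[MAX_SYSCALL_ARGS];
	int idx = arg_index(meta, arg);

	if (idx < 0)
		return 0;
	get_syscall_args(regs, args);
	return args[idx] == value;
}
```

For a read() whose first argument register holds 4,
filter_arg_eq(&read_meta, &regs, "fd", 4) matches. Only
get_syscall_args() is arch specific in this sketch; the metadata table
and the comparison stay generic.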

Note that for now, filtering + syscalls metadata only works on top
of raw argument values. Syscalls metadata doesn't know much
about type semantics and won't help you dereference
syscall argument pointers. It only gives you raw syscall parameter values.
Similarly, the filtering code can't evaluate pointer-dereferencing
expressions, only comparisons on direct values.
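
A tiny illustration of what "raw values only" means for a pointer
argument, as a userspace sketch with hypothetical helper names
(raw_arg_eq() and same_string_different_raw() are invented here): a
predicate on the raw value compares addresses, so two buffers holding
the same path string are still seen as different.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* All a raw-value filter sees is the register-sized number. */
static int raw_arg_eq(uintptr_t raw, uintptr_t value)
{
	return raw == value;
}

/* Returns 1 when 'a' and 'b' hold the same string while the raw-value
 * comparison of their addresses still reports a mismatch. */
static int same_string_different_raw(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);
	return strcmp(a, b) == 0 &&
	       !raw_arg_eq((uintptr_t)a, (uintptr_t)b);
}
```

So a rule about a file name would need dereferencing support, whereas
a rule about an fd or a flags value works on the raw value directly.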

But please note these are all features we want in the long term
anyway: using the kprobe expression code to interpret dereferencing,
and having more type introspection into kernel structures for
smarter syscalls metadata. And we can do all that gradually
without breaking backward compatibility.

Now with the current features you'll already have access to
a much more powerful seccomp implementation.

And if you have questions about anything, please don't hesitate to ask.
