LKML Archive on lore.kernel.org
From: Ingo Molnar <mingo@elte.hu>
To: "Eric Paris" <eparis@redhat.com>,
	"Tom Zanussi" <tzanussi@gmail.com>,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	"Arnaldo Carvalho de Melo" <acme@redhat.com>,
	"Li Zefan" <lizf@cn.fujitsu.com>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Thomas Gleixner" <tglx@linutronix.de>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
	Eric Paris <eparis@parisplace.org>,
	linux-kernel@vger.kernel.org, agl@google.com, fweisbec@gmail.com,
	tzanussi@gmail.com, Jason Baron <jbaron@redhat.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	2nddept-manager@sdl.hitachi.co.jp
Subject: Re: Using ftrace/perf as a basis for generic seccomp
Date: Wed, 2 Feb 2011 18:55:56 +0100
Message-ID: <20110202175556.GA13948@elte.hu>
In-Reply-To: <1296665124.3145.17.camel@localhost.localdomain>


* Eric Paris <eparis@redhat.com> wrote:

> On Wed, 2011-02-02 at 13:26 +0100, Ingo Molnar wrote:
> > * Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote:
> > 
> > > Hi Eric,
> > > 
> > > (2011/02/01 23:58), Eric Paris wrote:
> > > > On Wed, Jan 12, 2011 at 4:28 PM, Eric Paris <eparis@redhat.com> wrote:
> > > >> Some time ago Adam posted a patch to allow for a generic seccomp
> > > >> implementation (unlike the current seccomp where your choice is all
> > > >> syscalls or only read, write, sigreturn, and exit) which got little
> > > >> traction and it was suggested he instead do the same thing somehow using
> > > >> the tracing code:
> > > >> http://thread.gmane.org/gmane.linux.kernel/833556
> > > 
> > > Hm, interesting idea :)
> > > But why would you like to use tracing code? just for hooking?
> > 
> > What I suggested before was to reuse the scripting engine and the tracepoints.
> > 
> > I.e. the "seccomp restrictions" can be implemented via a filter expression - and the 
> > scripting engine could be generalized so that such 'sandboxing' code can make use of 
> > it.
> > 
> > For example, if you want to restrict a process to only allow open() syscalls to fd 4 
> > (a very restrictive sandbox), it could be done via this filter expression:
> > 
> > 	'fd == 4'
> > 
> > etc. Note that obviously the scripting engine needs to be abstracted out somewhat - 
> > but this is the basic idea, to reuse the callbacks and reuse the scripting engine 
> > for runtime filtering of syscall parameters.
> 
> Any pointers on what is involved in this abstraction?  I can work out
> the details, but I don't know the big picture well enough to even start
> to move forwards.....

perf has support for these filters, so would it work for you if I gave you some 
example usage?

First you identify an interesting tracepoint - look at the list of:

   perf list | grep Tracepoint

Say we want to filter sys_close() events, so we pick:

  syscalls:sys_enter_close                     [Tracepoint event]

And record all sys_close (enter) events in the system, for one second:

   perf record -e syscalls:sys_enter_close -a sleep 1

All the recorded data will be in perf.data in cwd.

'perf report' will show a profile, and 'perf script' will show the trace output:

            perf-30558 [002] 117691.065243: sys_enter_close: fd: 0x00000016
            perf-30558 [002] 117691.065406: sys_enter_close: fd: 0x00000016
            perf-30558 [002] 117691.065443: sys_enter_close: fd: 0x00000017
            perf-30558 [002] 117691.065444: sys_enter_close: fd: 0x00000016
            [...]

Now, to record a 'filtered' event, use the --filter parameter when recording. The 
filter refers to event fields by name; the available field names can be found in 
the 'format' file:

 cat /debug/tracing/events/syscalls/sys_enter_close/format 

 name: sys_enter_close
 ID: 402
 format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;
	field:int common_lock_depth;	offset:8;	size:4;	signed:1;

	field:int nr;	offset:12;	size:4;	signed:1;
	field:unsigned int fd;	offset:16;	size:8;	signed:0;

 print fmt: "fd: 0x%08lx", ((unsigned long)(REC->fd))

The interesting one is:

	field:unsigned int fd;	offset:16;	size:8;	signed:0;

This is the field that represents the fd of the close(fd) call. To filter it, simply 
use it symbolically:

   perf record -e syscalls:sys_enter_close --filter 'fd==3' ./hackbench 5

As you can see in the 'perf script' output:

       hackbench-30576 [008] 117802.180002: sys_enter_close: fd: 0x00000003
       hackbench-30576 [008] 117802.222056: sys_enter_close: fd: 0x00000003
       hackbench-30576 [008] 117802.222064: sys_enter_close: fd: 0x00000003
       hackbench-30576 [008] 117802.222065: sys_enter_close: fd: 0x00000003
       hackbench-30576 [008] 117802.222067: sys_enter_close: fd: 0x00000003
       hackbench-30576 [008] 117802.222069: sys_enter_close: fd: 0x00000003
       hackbench-30576 [008] 117802.222070: sys_enter_close: fd: 0x00000003
       hackbench-30576 [008] 117802.222071: sys_enter_close: fd: 0x00000003
       hackbench-30576 [008] 117802.222073: sys_enter_close: fd: 0x00000003

Only fd==3 events were recorded.
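
To check a result like this mechanically rather than by eye, the trace text can be 
parsed in user space. The following is only a minimal sketch: it assumes the exact 
line layout quoted in this message ('perf script' output differs between versions 
and events), and the helper names parse_close_events() and all_fds_equal() are 
made up for this illustration.

```python
import re

# Matches lines shaped like the 'perf script' output quoted in this message:
#   hackbench-30576 [008] 117802.180002: sys_enter_close: fd: 0x00000003
# Assumption: other perf versions may print a different layout.
LINE_RE = re.compile(
    r"^\s*(?P<comm>\S+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+"
    r"(?P<time>\d+\.\d+):\s+(?P<event>\w+):\s+fd:\s+(?P<fd>0x[0-9a-fA-F]+)\s*$"
)


def parse_close_events(text):
    """Return one dict per recognised trace line; skip anything else."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # e.g. blank lines or a '[...]' elision marker
        events.append({
            "comm": match["comm"],
            "pid": int(match["pid"]),
            "cpu": int(match["cpu"]),
            "time": float(match["time"]),
            "event": match["event"],
            "fd": int(match["fd"], 16),
        })
    return events


def all_fds_equal(events, wanted_fd):
    """True if every parsed event carries the wanted fd value."""
    return all(event["fd"] == wanted_fd for event in events)
```

Feeding it the saved 'perf script' text from the filtered run and calling 
all_fds_equal(events, 3) should then confirm that no other fd slipped through.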

The filter expression engine executes in the kernel, when the event happens. The 
user-space perf tool parses the --filter parameter and passes it to the kernel, in 
essence as a string. The kernel parses this into atomic predicates which are linked 
to the event structure. When the event happens the predicates are executed by the 
filter engine.

The expressions are simple, but rather flexible, so you can do 'fd==0||fd==1' and 
more complex expressions, etc. The engine could also be extended.
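
To make the "string in, predicates out, predicates run per event" flow concrete, 
here is a small user-space sketch in Python. It is not the kernel's implementation 
(that lives in C and handles more than this): it assumes integer fields only, no 
parentheses, and '&&' binding tighter than '||'. All names in it (parse_predicate, 
parse_filter, event_matches) are hypothetical.

```python
import operator
import re

# Comparison operators supported by this sketch (an illustrative subset).
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
}

PRED_RE = re.compile(r"^\s*(\w+)\s*(==|!=|<=|>=|<|>)\s*(\w+)\s*$")


def parse_predicate(text):
    """Turn 'fd==3' into a (field, compare_function, value) tuple."""
    match = PRED_RE.match(text)
    if match is None:
        raise ValueError(f"cannot parse predicate: {text!r}")
    field, op, value = match.groups()
    return field, OPS[op], int(value, 0)  # base 0 accepts both 3 and 0x3


def parse_filter(expr):
    """Parse into OR-of-ANDs: a list of groups, each a list of predicates."""
    return [
        [parse_predicate(p) for p in group.split("&&")]
        for group in expr.split("||")
    ]


def event_matches(parsed, event):
    """Run the predicates against one event record (a dict of field values)."""
    return any(
        all(compare(event[field], value) for field, compare, value in group)
        for group in parsed
    )


if __name__ == "__main__":
    flt = parse_filter("fd==0||fd==1")
    events = [{"fd": 0}, {"fd": 1}, {"fd": 3}]
    print([e["fd"] for e in events if event_matches(flt, e)])  # prints [0, 1]
```

The point of the split is that parsing happens once, when the filter is set, while 
the per-event work is only the cheap predicate evaluation.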

The kernel code is mostly in kernel/trace/trace_events_filter.c.

I've Cc:-ed Tom, Frederic, Steve, Li Zefan and Arnaldo who have worked on the filter 
engine, in case something is broken with this functionality or if there are other 
questions :)

Thanks,

	Ingo
