LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* tracepoints for kernel/mutex.c
@ 2008-10-16 21:04 Jason Baron
2008-10-16 21:34 ` Peter Zijlstra
2008-10-16 22:10 ` Mathieu Desnoyers
0 siblings, 2 replies; 13+ messages in thread
From: Jason Baron @ 2008-10-16 21:04 UTC (permalink / raw)
To: linux-kernel; +Cc: compudj, fche, fweisbec, edwintorok, mingo, peterz
hi,
Below are 3 tracepoints I've been playing with in kernel/mutex.c using
a SystemTap script. The idea is to detect and determine the cause of
lock contention. Currently I get the following output:
<contended mutex nam> <process name and pid of the contention> <time of
contention> <pid that woke me up(caused the contention?)>
for example:
name: &inode->i_mutex, pid name: udevd, pid: 26567, jiffies: 1 cause:
26566
Using SystemTap I can now modify my script to figure out the inode number, the
file and pathname, or the backtrace and get to the bottom of the
contention.
I think this approach has a number of advantages. It has low
overhead in the off case, since its based on tracepoints. It is
minimally invasive in the code path (3 tracepoints). It also allows me
to explore data structures and parts of the kernel by simply modifying
the SystemTap script. I do not need to re-compile the kernel and reboot.
You can run the script via: $stap mutex.stp
thanks,
-Jason
diff --git a/include/trace/mutex.h b/include/trace/mutex.h
new file mode 100644
index 0000000..8d365d6
--- /dev/null
+++ b/include/trace/mutex.h
@@ -0,0 +1,10 @@
+#ifndef _TRACE_MUTEX_H
+#define _TRACE_MUTEX_H
+
+#include <linux/tracepoint.h>
+
+DEFINE_TRACE(mutex_lock_contention, TPPROTO(int pid, unsigned long time), TPARGS(pid, time));
+DEFINE_TRACE(mutex_lock_contention_length, TPPROTO(char *name, int pid, unsigned long time), TPARGS(name, pid, time));
+DEFINE_TRACE(mutex_wakeup, TPPROTO(int target_pid, int sender_pid), TPARGS(target_pid, sender_pid));
+
+#endif
diff --git a/kernel/mutex-debug.c b/kernel/mutex-debug.c
index 1d94160..847a9fb 100644
--- a/kernel/mutex-debug.c
+++ b/kernel/mutex-debug.c
@@ -97,6 +97,7 @@ void debug_mutex_init(struct mutex *lock, const char *name,
#endif
lock->owner = NULL;
lock->magic = lock;
+ lock->name = name;
}
/***
diff --git a/kernel/mutex.c b/kernel/mutex.c
index 12c779d..62c2aab 100644
--- a/kernel/mutex.c
+++ b/kernel/mutex.c
@@ -18,6 +18,7 @@
#include <linux/spinlock.h>
#include <linux/interrupt.h>
#include <linux/debug_locks.h>
+#include <trace/mutex.h>
/*
* In the DEBUG case we are using the "NULL fastpath" for mutexes,
@@ -147,6 +148,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
goto done;
lock_contended(&lock->dep_map, ip);
+ trace_mutex_lock_contention((int)task->pid, jiffies);
for (;;) {
/*
@@ -182,7 +184,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
schedule();
spin_lock_mutex(&lock->wait_lock, flags);
}
-
+ trace_mutex_lock_contention_length(lock->name, (int)task->pid, jiffies);
done:
lock_acquired(&lock->dep_map);
/* got the lock - rejoice! */
@@ -256,7 +258,7 @@ __mutex_unlock_common_slowpath(atomic_t *lock_count, int nested)
struct mutex_waiter, list);
debug_mutex_wake_waiter(lock, waiter);
-
+ trace_mutex_wakeup(waiter->task->pid, current->pid);
wake_up_process(waiter->task);
}
diff --git a/ltt/probes/Makefile b/ltt/probes/Makefile
index 2ee8a46..0c54d01 100644
--- a/ltt/probes/Makefile
+++ b/ltt/probes/Makefile
@@ -10,7 +10,8 @@ CFLAGS_REMOVE_rcu-trace.o = -pg
endif
obj-$(CONFIG_LTT_TRACEPROBES) += kernel-trace.o mm-trace.o fs-trace.o \
- ipc-trace.o lockdep-trace.o rcu-trace.o
+ ipc-trace.o lockdep-trace.o rcu-trace.o \
+ mutex-trace.o
ifeq ($(CONFIG_NET),y)
ifdef CONFIG_FTRACE
diff --git a/ltt/probes/mutex-trace.c b/ltt/probes/mutex-trace.c
new file mode 100644
index 0000000..3b1ebeb
--- /dev/null
+++ b/ltt/probes/mutex-trace.c
@@ -0,0 +1,43 @@
+/*
+ * ltt/probes/fs-trace.c
+ *
+ * Mutex tracepoint probes.
+ */
+
+#include <linux/module.h>
+#include <trace/mutex.h>
+
+static unsigned long mutex_times[PID_MAX_DEFAULT+1];
+static int mutex_wakeup[PID_MAX_DEFAULT+1];
+
+void probe_mutex_lock_contention(int pid, unsigned long time)
+{
+ mutex_times[pid] = (unsigned long)time;
+
+ trace_mark_tp(mutex_lock_contention, mutex_lock_contention,
+ probe_mutex_lock_contention, "pid %d time %lu", pid, time);
+}
+
+void probe_mutex_lock_contention_length(char *name, int pid, unsigned long time)
+{
+ unsigned long delay = time - mutex_times[pid];
+ int reason = mutex_wakeup[pid];
+
+ /* FIXME: why is name sometimes null? */
+ if (!name)
+ return;
+
+ trace_mark_tp(mutex_lock_contention_length, mutex_lock_contention_length,
+ probe_mutex_lock_contention_length,
+ "name %s pid %d delay %lu cause %d",
+ name, pid, delay, reason);
+}
+
+void probe_mutex_wakeup(int target_pid, int sender_pid)
+{
+ mutex_wakeup[target_pid] = sender_pid;
+
+ trace_mark_tp(mutex_wakeup, mutex_wakeup,
+ probe_mutex_wakeup, "pid %d pid %d", target_pid, sender_pid);
+}
+
# mutex.stp
probe kernel.mark("mutex_lock_contention")
{
}
probe kernel.mark("mutex_wakeup")
{
}
probe kernel.mark("mutex_lock_contention_length")
{
name = $arg1
pid = $arg2
delay = $arg3
cause = $arg4
printf("name: %s, pid name: %s, pid: %d, jiffies: %d cause: %d\n",
name, execname(), pid, delay, cause);
}
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: tracepoints for kernel/mutex.c
2008-10-16 21:04 tracepoints for kernel/mutex.c Jason Baron
@ 2008-10-16 21:34 ` Peter Zijlstra
2008-10-16 22:14 ` Thomas Gleixner
` (2 more replies)
2008-10-16 22:10 ` Mathieu Desnoyers
1 sibling, 3 replies; 13+ messages in thread
From: Peter Zijlstra @ 2008-10-16 21:34 UTC (permalink / raw)
To: Jason Baron
Cc: linux-kernel, compudj, fche, fweisbec, edwintorok, mingo, Steven Rostedt
On Thu, 2008-10-16 at 17:04 -0400, Jason Baron wrote:
> Below are 3 tracepoints I've been playing with in kernel/mutex.c using
> a SystemTap script. The idea is to detect and determine the cause of
> lock contention. Currently I get the following output:
>
> <contended mutex nam> <process name and pid of the contention> <time of
> contention> <pid that woke me up(caused the contention?)>
> I think this approach has a number of advantages. It has low
> overhead in the off case, since its based on tracepoints. It is
> minimally invasive in the code path (3 tracepoints). It also allows me
> to explore data structures and parts of the kernel by simply modifying
> the SystemTap script. I do not need to re-compile the kernel and reboot.
*sigh* this is why I hate markers and all related things...
_IFF_ you want to place tracepoints, get them in the same place as the
lock-dep/stat hooks, that way you get all the locks, not only mutexes.
This is the same reason I absolutely _hate_ Edwin's rwsem tracer.
Folks, lets please start by getting the tracing infrastructure in and
those few high-level trace-points google proposed.
Until we get the basics in, I think I'm going to NAK any and all
tracepoint/marker patches.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: tracepoints for kernel/mutex.c
2008-10-16 21:04 tracepoints for kernel/mutex.c Jason Baron
2008-10-16 21:34 ` Peter Zijlstra
@ 2008-10-16 22:10 ` Mathieu Desnoyers
1 sibling, 0 replies; 13+ messages in thread
From: Mathieu Desnoyers @ 2008-10-16 22:10 UTC (permalink / raw)
To: Jason Baron; +Cc: linux-kernel, fche, fweisbec, edwintorok, mingo, peterz
* Jason Baron (jbaron@redhat.com) wrote:
> hi,
>
> Below are 3 tracepoints I've been playing with in kernel/mutex.c using
> a SystemTap script. The idea is to detect and determine the cause of
> lock contention. Currently I get the following output:
>
> <contended mutex nam> <process name and pid of the contention> <time of
> contention> <pid that woke me up(caused the contention?)>
>
> for example:
>
> name: &inode->i_mutex, pid name: udevd, pid: 26567, jiffies: 1 cause:
> 26566
>
>
> Using SystemTap I can now modify my script to figure out the inode number, the
> file and pathname, or the backtrace and get to the bottom of the
> contention.
>
> I think this approach has a number of advantages. It has low
> overhead in the off case, since its based on tracepoints. It is
> minimally invasive in the code path (3 tracepoints). It also allows me
> to explore data structures and parts of the kernel by simply modifying
> the SystemTap script. I do not need to re-compile the kernel and reboot.
>
> You can run the script via: $stap mutex.stp
>
> thanks,
>
> -Jason
>
>
> diff --git a/include/trace/mutex.h b/include/trace/mutex.h
> new file mode 100644
> index 0000000..8d365d6
> --- /dev/null
> +++ b/include/trace/mutex.h
> @@ -0,0 +1,10 @@
> +#ifndef _TRACE_MUTEX_H
> +#define _TRACE_MUTEX_H
> +
> +#include <linux/tracepoint.h>
> +
> +DEFINE_TRACE(mutex_lock_contention, TPPROTO(int pid, unsigned long time), TPARGS(pid, time));
> +DEFINE_TRACE(mutex_lock_contention_length, TPPROTO(char *name, int pid, unsigned long time), TPARGS(name, pid, time));
> +DEFINE_TRACE(mutex_wakeup, TPPROTO(int target_pid, int sender_pid), TPARGS(target_pid, sender_pid));
> +
> +#endif
> diff --git a/kernel/mutex-debug.c b/kernel/mutex-debug.c
> index 1d94160..847a9fb 100644
> --- a/kernel/mutex-debug.c
> +++ b/kernel/mutex-debug.c
> @@ -97,6 +97,7 @@ void debug_mutex_init(struct mutex *lock, const char *name,
> #endif
> lock->owner = NULL;
> lock->magic = lock;
> + lock->name = name;
> }
>
> /***
> diff --git a/kernel/mutex.c b/kernel/mutex.c
> index 12c779d..62c2aab 100644
> --- a/kernel/mutex.c
> +++ b/kernel/mutex.c
> @@ -18,6 +18,7 @@
> #include <linux/spinlock.h>
> #include <linux/interrupt.h>
> #include <linux/debug_locks.h>
> +#include <trace/mutex.h>
>
> /*
> * In the DEBUG case we are using the "NULL fastpath" for mutexes,
> @@ -147,6 +148,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> goto done;
>
> lock_contended(&lock->dep_map, ip);
> + trace_mutex_lock_contention((int)task->pid, jiffies);
>
> for (;;) {
> /*
> @@ -182,7 +184,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> schedule();
> spin_lock_mutex(&lock->wait_lock, flags);
> }
> -
> + trace_mutex_lock_contention_length(lock->name, (int)task->pid, jiffies);
> done:
> lock_acquired(&lock->dep_map);
> /* got the lock - rejoice! */
> @@ -256,7 +258,7 @@ __mutex_unlock_common_slowpath(atomic_t *lock_count, int nested)
> struct mutex_waiter, list);
>
> debug_mutex_wake_waiter(lock, waiter);
> -
> + trace_mutex_wakeup(waiter->task->pid, current->pid);
> wake_up_process(waiter->task);
> }
>
> diff --git a/ltt/probes/Makefile b/ltt/probes/Makefile
> index 2ee8a46..0c54d01 100644
> --- a/ltt/probes/Makefile
> +++ b/ltt/probes/Makefile
> @@ -10,7 +10,8 @@ CFLAGS_REMOVE_rcu-trace.o = -pg
> endif
>
> obj-$(CONFIG_LTT_TRACEPROBES) += kernel-trace.o mm-trace.o fs-trace.o \
> - ipc-trace.o lockdep-trace.o rcu-trace.o
> + ipc-trace.o lockdep-trace.o rcu-trace.o \
> + mutex-trace.o
>
> ifeq ($(CONFIG_NET),y)
> ifdef CONFIG_FTRACE
> diff --git a/ltt/probes/mutex-trace.c b/ltt/probes/mutex-trace.c
> new file mode 100644
> index 0000000..3b1ebeb
> --- /dev/null
> +++ b/ltt/probes/mutex-trace.c
> @@ -0,0 +1,43 @@
> +/*
> + * ltt/probes/fs-trace.c
> + *
Perhaps you meant mutex-trace.c ? :)
> + * Mutex tracepoint probes.
> + */
> +
> +#include <linux/module.h>
> +#include <trace/mutex.h>
> +
> +static unsigned long mutex_times[PID_MAX_DEFAULT+1];
> +static int mutex_wakeup[PID_MAX_DEFAULT+1];
> +
What happens if PID_MAX_LIMIT > PID_MAX_DEFAULT ?
What happens when a process ID gets reused (exit/fork reusing a PID) ?
Could you add this information to the task_structs or does it really
have to be separate ?
Mathieu
> +void probe_mutex_lock_contention(int pid, unsigned long time)
> +{
> + mutex_times[pid] = (unsigned long)time;
> +
> + trace_mark_tp(mutex_lock_contention, mutex_lock_contention,
> + probe_mutex_lock_contention, "pid %d time %lu", pid, time);
> +}
> +
> +void probe_mutex_lock_contention_length(char *name, int pid, unsigned long time)
> +{
> + unsigned long delay = time - mutex_times[pid];
> + int reason = mutex_wakeup[pid];
> +
> + /* FIXME: why is name sometimes null? */
> + if (!name)
> + return;
> +
> + trace_mark_tp(mutex_lock_contention_length, mutex_lock_contention_length,
> + probe_mutex_lock_contention_length,
> + "name %s pid %d delay %lu cause %d",
> + name, pid, delay, reason);
> +}
> +
> +void probe_mutex_wakeup(int target_pid, int sender_pid)
> +{
> + mutex_wakeup[target_pid] = sender_pid;
> +
> + trace_mark_tp(mutex_wakeup, mutex_wakeup,
> + probe_mutex_wakeup, "pid %d pid %d", target_pid, sender_pid);
> +}
> +
>
>
> # mutex.stp
>
> probe kernel.mark("mutex_lock_contention")
> {
> }
>
> probe kernel.mark("mutex_wakeup")
> {
> }
>
> probe kernel.mark("mutex_lock_contention_length")
> {
> name = $arg1
> pid = $arg2
> delay = $arg3
> cause = $arg4
>
> printf("name: %s, pid name: %s, pid: %d, jiffies: %d cause: %d\n",
> name, execname(), pid, delay, cause);
> }
>
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: tracepoints for kernel/mutex.c
2008-10-16 21:34 ` Peter Zijlstra
@ 2008-10-16 22:14 ` Thomas Gleixner
2008-10-17 5:09 ` KOSAKI Motohiro
2008-10-17 14:48 ` Jason Baron
2 siblings, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2008-10-16 22:14 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Jason Baron, linux-kernel, compudj, fche, fweisbec, edwintorok,
mingo, Steven Rostedt
On Thu, 16 Oct 2008, Peter Zijlstra wrote:
> On Thu, 2008-10-16 at 17:04 -0400, Jason Baron wrote:
> Folks, lets please start by getting the tracing infrastructure in and
> those few high-level trace-points google proposed.
>
> Until we get the basics in, I think I'm going to NAK any and all
> tracepoint/marker patches.
Ditto.
tglx
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: tracepoints for kernel/mutex.c
2008-10-16 21:34 ` Peter Zijlstra
2008-10-16 22:14 ` Thomas Gleixner
@ 2008-10-17 5:09 ` KOSAKI Motohiro
2008-10-17 8:15 ` Peter Zijlstra
2008-10-17 16:36 ` Mathieu Desnoyers
2008-10-17 14:48 ` Jason Baron
2 siblings, 2 replies; 13+ messages in thread
From: KOSAKI Motohiro @ 2008-10-17 5:09 UTC (permalink / raw)
To: Peter Zijlstra
Cc: kosaki.motohiro, Jason Baron, linux-kernel, compudj, fche,
fweisbec, edwintorok, mingo, Steven Rostedt
Hi
> Folks, lets please start by getting the tracing infrastructure in and
> those few high-level trace-points google proposed.
>
> Until we get the basics in, I think I'm going to NAK any and all
> tracepoint/marker patches.
Could you please tell me how to get these patches?
I'd like to review it.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: tracepoints for kernel/mutex.c
2008-10-17 5:09 ` KOSAKI Motohiro
@ 2008-10-17 8:15 ` Peter Zijlstra
2008-10-17 8:22 ` KOSAKI Motohiro
2008-10-17 16:36 ` Mathieu Desnoyers
1 sibling, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2008-10-17 8:15 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Jason Baron, linux-kernel, compudj, fche, fweisbec, edwintorok,
mingo, Steven Rostedt
On Fri, 2008-10-17 at 14:09 +0900, KOSAKI Motohiro wrote:
> Hi
>
> > Folks, lets please start by getting the tracing infrastructure in and
> > those few high-level trace-points google proposed.
> >
> > Until we get the basics in, I think I'm going to NAK any and all
> > tracepoint/marker patches.
>
> Could you please tell me how to get these patches?
> I'd like to review it.
I think writing is more the issue than reviewing here ;-)
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: tracepoints for kernel/mutex.c
2008-10-17 8:15 ` Peter Zijlstra
@ 2008-10-17 8:22 ` KOSAKI Motohiro
0 siblings, 0 replies; 13+ messages in thread
From: KOSAKI Motohiro @ 2008-10-17 8:22 UTC (permalink / raw)
To: Peter Zijlstra
Cc: kosaki.motohiro, Jason Baron, linux-kernel, compudj, fche,
fweisbec, edwintorok, mingo, Steven Rostedt
> On Fri, 2008-10-17 at 14:09 +0900, KOSAKI Motohiro wrote:
> > Hi
> >
> > > Folks, lets please start by getting the tracing infrastructure in and
> > > those few high-level trace-points google proposed.
> > >
> > > Until we get the basics in, I think I'm going to NAK any and all
> > > tracepoint/marker patches.
> >
> > Could you please tell me how to get these patches?
> > I'd like to review it.
>
> I think writing is more the issue than reviewing here ;-)
yup. I understand Jason's patch issue.
therefore, I interest to alternative googler approach as you said :)
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: tracepoints for kernel/mutex.c
2008-10-16 21:34 ` Peter Zijlstra
2008-10-16 22:14 ` Thomas Gleixner
2008-10-17 5:09 ` KOSAKI Motohiro
@ 2008-10-17 14:48 ` Jason Baron
2008-10-17 16:43 ` Mathieu Desnoyers
2008-10-17 16:48 ` Peter Zijlstra
2 siblings, 2 replies; 13+ messages in thread
From: Jason Baron @ 2008-10-17 14:48 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, compudj, fche, fweisbec, edwintorok, mingo, Steven Rostedt
On Thu, Oct 16, 2008 at 11:34:38PM +0200, Peter Zijlstra wrote:
> On Thu, 2008-10-16 at 17:04 -0400, Jason Baron wrote:
>
> > Below are 3 tracepoints I've been playing with in kernel/mutex.c using
> > a SystemTap script. The idea is to detect and determine the cause of
> > lock contention. Currently I get the following output:
> >
> > <contended mutex nam> <process name and pid of the contention> <time of
> > contention> <pid that woke me up(caused the contention?)>
>
> > I think this approach has a number of advantages. It has low
> > overhead in the off case, since its based on tracepoints. It is
> > minimally invasive in the code path (3 tracepoints). It also allows me
> > to explore data structures and parts of the kernel by simply modifying
> > the SystemTap script. I do not need to re-compile the kernel and reboot.
>
> *sigh* this is why I hate markers and all related things...
>
> _IFF_ you want to place tracepoints, get them in the same place as the
> lock-dep/stat hooks, that way you get all the locks, not only mutexes.
makes sense. So we could layer lock-dep/stat on top of tracepoints? That
would potentially also make lock-dep/stat more dynamic.
>
> This is the same reason I absolutely _hate_ Edwin's rwsem tracer.
>
i'm trying to get some consensus on these types of patches. Do we
want to create a new tracer for each thing we want to trace, or add
tracepoints?
> Folks, lets please start by getting the tracing infrastructure in and
> those few high-level trace-points google proposed.
>
> Until we get the basics in, I think I'm going to NAK any and all
> tracepoint/marker patches.
>
I think that core locking functions are pretty basic...
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: tracepoints for kernel/mutex.c
2008-10-17 5:09 ` KOSAKI Motohiro
2008-10-17 8:15 ` Peter Zijlstra
@ 2008-10-17 16:36 ` Mathieu Desnoyers
1 sibling, 0 replies; 13+ messages in thread
From: Mathieu Desnoyers @ 2008-10-17 16:36 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Peter Zijlstra, Jason Baron, linux-kernel, fche, fweisbec,
edwintorok, mingo, Steven Rostedt, mbligh, ltt-dev
* KOSAKI Motohiro (kosaki.motohiro@jp.fujitsu.com) wrote:
> Hi
>
> > Folks, lets please start by getting the tracing infrastructure in and
> > those few high-level trace-points google proposed.
> >
> > Until we get the basics in, I think I'm going to NAK any and all
> > tracepoint/marker patches.
>
> Could you please tell me how to get these patches?
> I'd like to review it.
>
Hi Kosaki,
I am focusing on the timestamping/data relay/tracing control
infrastructure for now, so I think it is too soon to post the
instrumentation patches selelected from LTTng which correspond to the
Google instrumentation before we get something that makes that data
useful within the kernel, but if you still want to have a look, comments
are welcome.
See
git://git.kernel.org/pub/scm/linux/kernel/git/compudj/linux-2.6-lttng.git
or
http://www.kernel.org/pub/linux/kernel/people/compudj/patches/2.6/2.6.27/patch-2.6.27-lttng-0.43.tar.bz2
Commits :
irq-handle-prepare-for-instrumentation.patch
lttng-instrumentation-irq.patch
lttng-instrumentation-scheduler.patch
lttng-instrumentation-timer.patch
lttng-instrumentation-kernel.patch
lttng-instrumentation-filemap.patch
lttng-instrumentation-swap.patch
lttng-instrumentation-socket.patch
Mathieu
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: tracepoints for kernel/mutex.c
2008-10-17 14:48 ` Jason Baron
@ 2008-10-17 16:43 ` Mathieu Desnoyers
2008-10-17 16:58 ` Frank Ch. Eigler
2008-10-17 16:48 ` Peter Zijlstra
1 sibling, 1 reply; 13+ messages in thread
From: Mathieu Desnoyers @ 2008-10-17 16:43 UTC (permalink / raw)
To: Jason Baron
Cc: Peter Zijlstra, linux-kernel, fche, fweisbec, edwintorok, mingo,
Steven Rostedt
* Jason Baron (jbaron@redhat.com) wrote:
> On Thu, Oct 16, 2008 at 11:34:38PM +0200, Peter Zijlstra wrote:
> > On Thu, 2008-10-16 at 17:04 -0400, Jason Baron wrote:
> >
> > > Below are 3 tracepoints I've been playing with in kernel/mutex.c using
> > > a SystemTap script. The idea is to detect and determine the cause of
> > > lock contention. Currently I get the following output:
> > >
> > > <contended mutex nam> <process name and pid of the contention> <time of
> > > contention> <pid that woke me up(caused the contention?)>
> >
> > > I think this approach has a number of advantages. It has low
> > > overhead in the off case, since its based on tracepoints. It is
> > > minimally invasive in the code path (3 tracepoints). It also allows me
> > > to explore data structures and parts of the kernel by simply modifying
> > > the SystemTap script. I do not need to re-compile the kernel and reboot.
> >
> > *sigh* this is why I hate markers and all related things...
> >
> > _IFF_ you want to place tracepoints, get them in the same place as the
> > lock-dep/stat hooks, that way you get all the locks, not only mutexes.
>
> makes sense. So we could layer lock-dep/stat on top of tracepoints? That
> would potentially also make lock-dep/stat more dynamic.
>
> >
> > This is the same reason I absolutely _hate_ Edwin's rwsem tracer.
> >
>
> i'm trying to get some consensus on these types of patches. Do we
> want to create a new tracer for each thing we want to trace, or add
> tracepoints?
>
> > Folks, lets please start by getting the tracing infrastructure in and
> > those few high-level trace-points google proposed.
> >
> > Until we get the basics in, I think I'm going to NAK any and all
> > tracepoint/marker patches.
> >
>
> I think that core locking functions are pretty basic...
>
>
Guys, please, let's focus on the infrastructure to manage trace data
(timestamping, buffering, event ID, event type management) before going
any further in the instrumentation direction. Otherwise we will end up
adding instrumentation in the Linux kernel without any in-kernel user
(or various tiny users without overall management insfrastructure), and
this sounds like a no-go. This is what Peter and Thomas are saying by
talking of NACK. Their position is actually nothing new : Linus has been
the first to say that at the Kernel Summit 2008, and I think it makes
sense.
We will have plenty of time to optimize
markers/tracepoints/lockdep/function trace _once_ we get the data
collection right.
Mathieu
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: tracepoints for kernel/mutex.c
2008-10-17 14:48 ` Jason Baron
2008-10-17 16:43 ` Mathieu Desnoyers
@ 2008-10-17 16:48 ` Peter Zijlstra
2008-10-17 16:54 ` Mathieu Desnoyers
1 sibling, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2008-10-17 16:48 UTC (permalink / raw)
To: Jason Baron
Cc: linux-kernel, compudj, fche, fweisbec, edwintorok, mingo, Steven Rostedt
On Fri, 2008-10-17 at 10:48 -0400, Jason Baron wrote:
> On Thu, Oct 16, 2008 at 11:34:38PM +0200, Peter Zijlstra wrote:
> > On Thu, 2008-10-16 at 17:04 -0400, Jason Baron wrote:
> >
> > > Below are 3 tracepoints I've been playing with in kernel/mutex.c using
> > > a SystemTap script. The idea is to detect and determine the cause of
> > > lock contention. Currently I get the following output:
> > >
> > > <contended mutex nam> <process name and pid of the contention> <time of
> > > contention> <pid that woke me up(caused the contention?)>
> >
> > > I think this approach has a number of advantages. It has low
> > > overhead in the off case, since its based on tracepoints. It is
> > > minimally invasive in the code path (3 tracepoints). It also allows me
> > > to explore data structures and parts of the kernel by simply modifying
> > > the SystemTap script. I do not need to re-compile the kernel and reboot.
> >
> > *sigh* this is why I hate markers and all related things...
> >
> > _IFF_ you want to place tracepoints, get them in the same place as the
> > lock-dep/stat hooks, that way you get all the locks, not only mutexes.
>
> makes sense. So we could layer lock-dep/stat on top of tracepoints? That
> would potentially also make lock-dep/stat more dynamic.
I'm afraid that won't work. Both lockdep and lockstat rely on added data
to the lock structure. But what you can do is expose the hooks as
tracepoints when lockdep/lockstat is configured.
> >
> > This is the same reason I absolutely _hate_ Edwin's rwsem tracer.
> >
>
> i'm trying to get some consensus on these types of patches. Do we
> want to create a new tracer for each thing we want to trace, or add
> tracepoints?
The only thing I'd consider is one lock-tracer that exposes all
lockdep/lockstat hooks. Any half-assed partial solution won't fly.
> > Folks, lets please start by getting the tracing infrastructure in and
> > those few high-level trace-points google proposed.
> >
> > Until we get the basics in, I think I'm going to NAK any and all
> > tracepoint/marker patches.
> >
>
> I think that core locking functions are pretty basic...
For kernel developers, yes. For userspace stuff like latencytop should
be good enough to notice something is up.
And kernel developers can recompile their kernel - that's the only way
you're going to do anything about lock contention anyway.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: tracepoints for kernel/mutex.c
2008-10-17 16:48 ` Peter Zijlstra
@ 2008-10-17 16:54 ` Mathieu Desnoyers
0 siblings, 0 replies; 13+ messages in thread
From: Mathieu Desnoyers @ 2008-10-17 16:54 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Jason Baron, linux-kernel, fche, fweisbec, edwintorok, mingo,
Steven Rostedt
* Peter Zijlstra (peterz@infradead.org) wrote:
> On Fri, 2008-10-17 at 10:48 -0400, Jason Baron wrote:
> > On Thu, Oct 16, 2008 at 11:34:38PM +0200, Peter Zijlstra wrote:
> > > On Thu, 2008-10-16 at 17:04 -0400, Jason Baron wrote:
> > >
> > > > Below are 3 tracepoints I've been playing with in kernel/mutex.c using
> > > > a SystemTap script. The idea is to detect and determine the cause of
> > > > lock contention. Currently I get the following output:
> > > >
> > > > <contended mutex nam> <process name and pid of the contention> <time of
> > > > contention> <pid that woke me up(caused the contention?)>
> > >
> > > > I think this approach has a number of advantages. It has low
> > > > overhead in the off case, since its based on tracepoints. It is
> > > > minimally invasive in the code path (3 tracepoints). It also allows me
> > > > to explore data structures and parts of the kernel by simply modifying
> > > > the SystemTap script. I do not need to re-compile the kernel and reboot.
> > >
> > > *sigh* this is why I hate markers and all related things...
> > >
> > > _IFF_ you want to place tracepoints, get them in the same place as the
> > > lock-dep/stat hooks, that way you get all the locks, not only mutexes.
> >
> > makes sense. So we could layer lock-dep/stat on top of tracepoints? That
> > would potentially also make lock-dep/stat more dynamic.
>
> I'm afraid that won't work. Both lockdep and lockstat rely on added data
> to the lock structure. But what you can do is expose the hooks as
> tracepoints when lockdep/lockstat is configured.
>
> > >
> > > This is the same reason I absolutely _hate_ Edwin's rwsem tracer.
> > >
> >
> > i'm trying to get some consensus on these types of patches. Do we
> > want to create a new tracer for each thing we want to trace, or add
> > tracepoints?
>
> The only thing I'd consider is one lock-tracer that exposes all
> lockdep/lockstat hooks. Any half-assed partial solution won't fly.
>
> > > Folks, lets please start by getting the tracing infrastructure in and
> > > those few high-level trace-points google proposed.
> > >
> > > Until we get the basics in, I think I'm going to NAK any and all
> > > tracepoint/marker patches.
> > >
> >
> > I think that core locking functions are pretty basic...
>
> For kernel developers, yes. For userspace stuff like latencytop should
> be good enough to notice something is up.
>
> And kernel developers can recompile their kernel - that's the only way
> you're going to do anything about lock contention anyway.
>
We also have to consider if a less specific instrumentation can extract
this kind of data. Scheduler instrumentation can do a big part of that
job, namely identifying the wakeup cause. We would not have the
information about it being mutex-related unless we activate lockdep
tracing, potentially with tracepoints in the lockdep handlers. I think
this would fly. But let's keep this for later.
Mathieu
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: tracepoints for kernel/mutex.c
2008-10-17 16:43 ` Mathieu Desnoyers
@ 2008-10-17 16:58 ` Frank Ch. Eigler
0 siblings, 0 replies; 13+ messages in thread
From: Frank Ch. Eigler @ 2008-10-17 16:58 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Jason Baron, Peter Zijlstra, linux-kernel, fweisbec, edwintorok,
mingo, Steven Rostedt
Hi -
On Fri, Oct 17, 2008 at 12:43:52PM -0400, Mathieu Desnoyers wrote:
> [...]
> > > _IFF_ you want to place tracepoints, get them in the same place as the
> > > lock-dep/stat hooks, that way you get all the locks, not only mutexes.
> >
> > makes sense. So we could layer lock-dep/stat on top of tracepoints? That
> > would potentially also make lock-dep/stat more dynamic.
> Guys, please, let's focus on the infrastructure to manage trace data
> (timestamping, buffering, event ID, event type management) before
> going any further in the instrumentation direction.
Any trace data management widget design that precludes connection to
an event source as simple as tracepoints or markers is going to be a
disappointment.
> Otherwise we will end up adding instrumentation in the Linux kernel
> without any in-kernel user [...]
Connecting markers to /proc style text files has been demonstrated in
less than a hundred lines of code.
Plus, Jason's note clearly referred to another in-kernel use of this
instrumentation: the possibility of connecting lockdep via generic
tracepoints in the lock-related code rather than special-purpose
hooks. One benefit could be being able to compile in lockdep and/or
lockstat by default (activating it via a boot option). The other
would be of course the concurrent/alternative of the instrumentation
for performance-related purposes.
- FChE
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2008-10-17 16:59 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-10-16 21:04 tracepoints for kernel/mutex.c Jason Baron
2008-10-16 21:34 ` Peter Zijlstra
2008-10-16 22:14 ` Thomas Gleixner
2008-10-17 5:09 ` KOSAKI Motohiro
2008-10-17 8:15 ` Peter Zijlstra
2008-10-17 8:22 ` KOSAKI Motohiro
2008-10-17 16:36 ` Mathieu Desnoyers
2008-10-17 14:48 ` Jason Baron
2008-10-17 16:43 ` Mathieu Desnoyers
2008-10-17 16:58 ` Frank Ch. Eigler
2008-10-17 16:48 ` Peter Zijlstra
2008-10-17 16:54 ` Mathieu Desnoyers
2008-10-16 22:10 ` Mathieu Desnoyers
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).