LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* Using sched_clock for mmio-trace
@ 2007-02-16  1:30 Jeff Muizelaar
  2007-02-16 16:30 ` Frank Ch. Eigler
  2007-02-16 20:02 ` Andi Kleen
  0 siblings, 2 replies; 20+ messages in thread
From: Jeff Muizelaar @ 2007-02-16  1:30 UTC (permalink / raw)
  To: linux-kernel

I've built a tool with the goal of logging mmio writes and reads by
device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace.

I'd like to add support for recording a time stamp on each read and
write. Unfortunately, I am not sure which clock api I should use.

I had a look at blktrace and saw that it uses 'sched_clock()' for time
stamps. However, this symbol is not exported to modules, and from what
I've read it sounds like its use is discouraged.

The question is, what api should I be using? I need something that can
be called from inside interrupt handlers, and obviously the more
accurate and the lower the overhead the better.

-Jeff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Using sched_clock for mmio-trace
  2007-02-16  1:30 Using sched_clock for mmio-trace Jeff Muizelaar
@ 2007-02-16 16:30 ` Frank Ch. Eigler
  2007-02-16 17:45   ` Daniel Walker
                     ` (2 more replies)
  2007-02-16 20:02 ` Andi Kleen
  1 sibling, 3 replies; 20+ messages in thread
From: Frank Ch. Eigler @ 2007-02-16 16:30 UTC (permalink / raw)
  To: Jeff Muizelaar; +Cc: linux-kernel

Jeff Muizelaar <jeff@infidigm.net> writes:

> I've built a tool with the goal of logging mmio writes and reads by
> device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace.

FWIW, this is exactly a type of add-on trace patch that could be
mooted by adoption of the ltt/systemtap "marker" facility.  With it,
you would not need so much code (e.g. no new user-space tools at all,
reuse of common tracing buffer logic, permanently placed hooks) and
would probably get more utility.

> [...]  The question is, what [timer] api should I be using? I need
> something that can be called from inside interrupt handlers, and
> obviously the more accurate and the lower the overhead the better.

We in systemtap land have the same problem, and so far made do with
slightly postprocessed per-cpu TSC values.


- FChE

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Using sched_clock for mmio-trace
  2007-02-16 16:30 ` Frank Ch. Eigler
@ 2007-02-16 17:45   ` Daniel Walker
  2007-02-16 18:10     ` Jeff Muizelaar
  2007-02-16 18:30   ` Jeff Muizelaar
  2007-02-16 20:03   ` Andi Kleen
  2 siblings, 1 reply; 20+ messages in thread
From: Daniel Walker @ 2007-02-16 17:45 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Jeff Muizelaar, linux-kernel

On Fri, 2007-02-16 at 11:30 -0500, Frank Ch. Eigler wrote:
> Jeff Muizelaar <jeff@infidigm.net> writes:
> 
> > I've built a tool with the goal of logging mmio writes and reads by
> > device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace.
> 
> FWIW, this is exactly a type of add-on trace patch that could be
> mooted by adoption of the ltt/systemtap "marker" facility.  With it,
> you would not need so much code (e.g. no new user-space tools at all,
> reuse of common tracing buffer logic, permanently placed hooks) and
> would probably get more utility.
> 
> > [...]  The question is, what [timer] api should I be using? I need
> > something that can be called from inside interrupt handlers, and
> > obviously the more accurate and the lower the overhead the better.
> 
> We in systemtap land have the same problem, and so far made do with
> slightly postprocessed per-cpu TSC values.

I've been working on a patch set (below), to expose the clocksources
used by generic time to multiple users . It would allow timestamps from
different clocks in a generic way. It's not merged, but I'd appreciate
any input either of you might have..

ftp://source.mvista.com/pub/dwalker/clocksource/

Daniel


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Using sched_clock for mmio-trace
  2007-02-16 17:45   ` Daniel Walker
@ 2007-02-16 18:10     ` Jeff Muizelaar
  2007-02-16 18:28       ` Daniel Walker
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff Muizelaar @ 2007-02-16 18:10 UTC (permalink / raw)
  To: Daniel Walker; +Cc: Frank Ch. Eigler, linux-kernel

On Fri, Feb 16, 2007 at 09:45:21AM -0800, Daniel Walker wrote:
> I've been working on a patch set (below), to expose the clocksources
> used by generic time to multiple users . It would allow timestamps from
> different clocks in a generic way. It's not merged, but I'd appreciate
> any input either of you might have..
> 
> ftp://source.mvista.com/pub/dwalker/clocksource/

Is it possible to see the resulting clocksource.h and maybe
clocksource.c after the patch set? That would make looking at it much
easier.

-Jeff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Using sched_clock for mmio-trace
  2007-02-16 18:10     ` Jeff Muizelaar
@ 2007-02-16 18:28       ` Daniel Walker
  2007-02-16 19:34         ` Jeff Muizelaar
  0 siblings, 1 reply; 20+ messages in thread
From: Daniel Walker @ 2007-02-16 18:28 UTC (permalink / raw)
  To: Jeff Muizelaar; +Cc: Frank Ch. Eigler, linux-kernel

On Fri, 2007-02-16 at 13:10 -0500, Jeff Muizelaar wrote:
> On Fri, Feb 16, 2007 at 09:45:21AM -0800, Daniel Walker wrote:
> > I've been working on a patch set (below), to expose the clocksources
> > used by generic time to multiple users . It would allow timestamps from
> > different clocks in a generic way. It's not merged, but I'd appreciate
> > any input either of you might have..
> > 
> > ftp://source.mvista.com/pub/dwalker/clocksource/
> 
> Is it possible to see the resulting clocksource.h and maybe
> clocksource.c after the patch set? That would make looking at it much
> easier.

Well you could just apply the patch set, but I stuck them in the same
directory as above .. I'll delete them in 24 hours or so ..

At one point I replaced sched_clock() , 

ftp://source.mvista.com/pub/dwalker/clocksource/clocksource-v10/add_generic_sched_clock.patch

The API is similar to that version, and sched_clock was the simplest
user of the API that I've done.

Daniel


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Using sched_clock for mmio-trace
  2007-02-16 16:30 ` Frank Ch. Eigler
  2007-02-16 17:45   ` Daniel Walker
@ 2007-02-16 18:30   ` Jeff Muizelaar
  2007-02-16 18:44     ` Randy Dunlap
  2007-02-16 20:03   ` Andi Kleen
  2 siblings, 1 reply; 20+ messages in thread
From: Jeff Muizelaar @ 2007-02-16 18:30 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: linux-kernel

On Fri, Feb 16, 2007 at 11:30:56AM -0500, Frank Ch. Eigler wrote:
> Jeff Muizelaar <jeff@infidigm.net> writes:
> 
> > I've built a tool with the goal of logging mmio writes and reads by
> > device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace.
> 
> FWIW, this is exactly a type of add-on trace patch that could be
> mooted by adoption of the ltt/systemtap "marker" facility.  With it,
> you would not need so much code (e.g. no new user-space tools at all,
> reuse of common tracing buffer logic, permanently placed hooks) and
> would probably get more utility.

Is there more information on this "marker" facility? e.g. what is a
marker? Are they just like tracepoints?

> > [...]  The question is, what [timer] api should I be using? I need
> > something that can be called from inside interrupt handlers, and
> > obviously the more accurate and the lower the overhead the better.
> 
> We in systemtap land have the same problem, and so far made do with
> slightly postprocessed per-cpu TSC values.

Thanks, I'll have a look at systemtap does. It's the stuff
src/runtime/time.c right?

Also, you might want to move this stuff from the README to someplace
more prominent on the systemtap website. It's nice not to have to dig
far to get the source...

Download systemtap sources snapshot or from CVS:
ftp://sources.redhat.com/pub/systemtap/snapshots/ 
(or)
cvs -d :pserver:anoncvs@sources.redhat.com:/cvs/systemtap login
# enter "anoncvs" as the password
cvs -d :pserver:anoncvs@sources.redhat.com:/cvs/systemtap co src

-Jeff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Using sched_clock for mmio-trace
  2007-02-16 18:30   ` Jeff Muizelaar
@ 2007-02-16 18:44     ` Randy Dunlap
  2007-02-16 19:55       ` Jeff Muizelaar
  0 siblings, 1 reply; 20+ messages in thread
From: Randy Dunlap @ 2007-02-16 18:44 UTC (permalink / raw)
  To: Jeff Muizelaar; +Cc: Frank Ch. Eigler, linux-kernel

On Fri, 16 Feb 2007 13:30:14 -0500 Jeff Muizelaar wrote:

> On Fri, Feb 16, 2007 at 11:30:56AM -0500, Frank Ch. Eigler wrote:
> > Jeff Muizelaar <jeff@infidigm.net> writes:
> > 
> > > I've built a tool with the goal of logging mmio writes and reads by
> > > device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace.
> > 
> > FWIW, this is exactly a type of add-on trace patch that could be
> > mooted by adoption of the ltt/systemtap "marker" facility.  With it,
> > you would not need so much code (e.g. no new user-space tools at all,
> > reuse of common tracing buffer logic, permanently placed hooks) and
> > would probably get more utility.
> 
> Is there more information on this "marker" facility? e.g. what is a
> marker? Are they just like tracepoints?

On lkml for the past few days/months.
Look for "Linux Kernel Markers" in the subject line.

Mathieu, do you have a web site for LK Markers?

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Using sched_clock for mmio-trace
  2007-02-16 18:28       ` Daniel Walker
@ 2007-02-16 19:34         ` Jeff Muizelaar
  2007-02-16 21:06           ` Daniel Walker
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff Muizelaar @ 2007-02-16 19:34 UTC (permalink / raw)
  To: Daniel Walker; +Cc: Frank Ch. Eigler, linux-kernel

On Fri, Feb 16, 2007 at 10:28:50AM -0800, Daniel Walker wrote:
> On Fri, 2007-02-16 at 13:10 -0500, Jeff Muizelaar wrote:
> > On Fri, Feb 16, 2007 at 09:45:21AM -0800, Daniel Walker wrote:
> > > I've been working on a patch set (below), to expose the clocksources
> > > used by generic time to multiple users . It would allow timestamps from
> > > different clocks in a generic way. It's not merged, but I'd appreciate
> > > any input either of you might have..
> > > 
> > > ftp://source.mvista.com/pub/dwalker/clocksource/
> > 
> > Is it possible to see the resulting clocksource.h and maybe
> > clocksource.c after the patch set? That would make looking at it much
> > easier.
> 
> Well you could just apply the patch set, but I stuck them in the same
> directory as above .. I'll delete them in 24 hours or so ..
> 
> At one point I replaced sched_clock() , 
> 
> ftp://source.mvista.com/pub/dwalker/clocksource/clocksource-v10/add_generic_sched_clock.patch
> 
> The API is similar to that version, and sched_clock was the simplest
> user of the API that I've done.

Ok, so it would basically be:

init()
{
	clocksource *clock = clocksource_get_best_clock();
}

trace()
{
	trace.time = cyc2ns(clock, clocksource_read(clock));
}

This seems pretty sane to me. 

The blk-trace code calibrates a per cpu offset for the sched_clock()
time (see blk_trace_check_cpu_time and blk_trace_set_ht_offsets). Does
the clocksource stuff help me with this or would I still need to do
something like that?

I also noticed that you have a clocksource_get_masked_clock() call. This
seems like a pretty awkward API to me. The first thing that came to mind
when I read the name was 'what is a masked clock'. When I realized that
it meant 'a clock w/o this flag', I still found it awkward that one has
to specify what that don't want. e.g 'I don't want a clock that is not
continuous.'

It think it would be better if you had sometime like
'clocksource_get_clock_with_features()' that took flags describing the
needed characteristics instead of the unwanted ones.

e.g.
clocksource_get_clock_with_features(CLOCKSOURCE_STABLE)
or
clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED)


instead of:

clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE)
clocksource_get_clock_masked(CLOCKSOURCE_PM_AFFECTED)

Especially awkward is the CLOCKSOURCE_64BIT flag, as there isn't really
anyway to specify that I want a 64bit timer, only a way to specify that
I don't.

It also isn't clear what the implications of some of the flags are:
e.g:
	NOT_CONTINUOUS - don't really have any idea what this means.
	UNSTABLE - this means the frequency can change right?
		Does PM_AFFECTED imply UNSTABLE?
	NOT_ATOMIC - does this affect me as user?
	PM_AFFECTED - it looks like the stp code deals with cpu speed
		changing. Does the clocksource code do this for me with
		cyc2ns?  If it does are there any reason I would want to
		avoid PM_AFFECTED clocks? If it doesn't how do I know
		that I need to correct it myself.

Hope this helps,

-Jeff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Using sched_clock for mmio-trace
  2007-02-16 20:02 ` Andi Kleen
@ 2007-02-16 19:40   ` Jeff Muizelaar
  0 siblings, 0 replies; 20+ messages in thread
From: Jeff Muizelaar @ 2007-02-16 19:40 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

On Fri, Feb 16, 2007 at 09:02:44PM +0100, Andi Kleen wrote:
> Jeff Muizelaar <jeff@infidigm.net> writes:
> > 
> > The question is, what api should I be using? I need something that can
> > be called from inside interrupt handlers, and obviously the more
> > accurate and the lower the overhead the better.
> 
> Use do_gettimeofday(). sched_clock() is not for general use
> and only for some very limited use cases and will give you
> unexpected results in several cases.
> 
> There are a few cases where gtod is still a little slow, but these
> are being addressed. In many cases it is fast. In some hardware
> it stays slow, but there is not much that can be done about that
> because of the hardware design.
> 
> It works fine from interrupt handlers and other strange contexts.

Ok, I'll probably use it then. How does the overhead of calling
do_gettimteofday() compare to doing an mmio read/write over PCI express?
i.e. is it going to be a performance problem if I call do_gettimeofday
for every mmio read/write?

Also, is there any good reason blk-trace doesn't use do_gettimeofday()? 

-Jeff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Using sched_clock for mmio-trace
  2007-02-16 18:44     ` Randy Dunlap
@ 2007-02-16 19:55       ` Jeff Muizelaar
  0 siblings, 0 replies; 20+ messages in thread
From: Jeff Muizelaar @ 2007-02-16 19:55 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: Frank Ch. Eigler, linux-kernel

On Fri, Feb 16, 2007 at 10:44:15AM -0800, Randy Dunlap wrote:
> On Fri, 16 Feb 2007 13:30:14 -0500 Jeff Muizelaar wrote:
> 
> > On Fri, Feb 16, 2007 at 11:30:56AM -0500, Frank Ch. Eigler wrote:
> > > Jeff Muizelaar <jeff@infidigm.net> writes:
> > > 
> > > > I've built a tool with the goal of logging mmio writes and reads by
> > > > device drivers. See http://nouveau.freedesktop.org/wiki/MmioTrace.
> > > 
> > > FWIW, this is exactly a type of add-on trace patch that could be
> > > mooted by adoption of the ltt/systemtap "marker" facility.  With it,
> > > you would not need so much code (e.g. no new user-space tools at all,
> > > reuse of common tracing buffer logic, permanently placed hooks) and
> > > would probably get more utility.
> > 
> > Is there more information on this "marker" facility? e.g. what is a
> > marker? Are they just like tracepoints?
> 
> On lkml for the past few days/months.
> Look for "Linux Kernel Markers" in the subject line.
> 
> Mathieu, do you have a web site for LK Markers?
> 

Yeah, so if I understand correctly, markers are basically compile time locations
that you can attach function calls to at run time, right?. If so, I
don't think they are of much use to me.

What I am doing is page-faulting on every read or write to an mmio
region, decoding the faulting instruction and passing the decoded
information up to userspace through relayfs.

-Jeff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Using sched_clock for mmio-trace
  2007-02-16  1:30 Using sched_clock for mmio-trace Jeff Muizelaar
  2007-02-16 16:30 ` Frank Ch. Eigler
@ 2007-02-16 20:02 ` Andi Kleen
  2007-02-16 19:40   ` Jeff Muizelaar
  1 sibling, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2007-02-16 20:02 UTC (permalink / raw)
  To: Jeff Muizelaar; +Cc: linux-kernel

Jeff Muizelaar <jeff@infidigm.net> writes:
> 
> The question is, what api should I be using? I need something that can
> be called from inside interrupt handlers, and obviously the more
> accurate and the lower the overhead the better.

Use do_gettimeofday(). sched_clock() is not for general use
and only for some very limited use cases and will give you
unexpected results in several cases.

There are a few cases where gtod is still a little slow, but these
are being addressed. In many cases it is fast. In some hardware
it stays slow, but there is not much that can be done about that
because of the hardware design.

It works fine from interrupt handlers and other strange contexts. 

-Andi

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Using sched_clock for mmio-trace
  2007-02-16 16:30 ` Frank Ch. Eigler
  2007-02-16 17:45   ` Daniel Walker
  2007-02-16 18:30   ` Jeff Muizelaar
@ 2007-02-16 20:03   ` Andi Kleen
  2007-02-16 21:26     ` Frank Ch. Eigler
  2 siblings, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2007-02-16 20:03 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Jeff Muizelaar, linux-kernel

fche@redhat.com (Frank Ch. Eigler) writes:
> 
> We in systemtap land have the same problem, and so far made do with
> slightly postprocessed per-cpu TSC values.

90+% likely you're not solving your problem correctly this way.

-Andi

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Using sched_clock for mmio-trace
  2007-02-16 19:34         ` Jeff Muizelaar
@ 2007-02-16 21:06           ` Daniel Walker
  2007-02-16 22:10             ` Jeff Muizelaar
  0 siblings, 1 reply; 20+ messages in thread
From: Daniel Walker @ 2007-02-16 21:06 UTC (permalink / raw)
  To: Jeff Muizelaar; +Cc: Frank Ch. Eigler, linux-kernel

On Fri, 2007-02-16 at 14:34 -0500, Jeff Muizelaar wrote:
> On Fri, Feb 16, 2007 at 10:28:50AM -0800, Daniel Walker wrote:
> > On Fri, 2007-02-16 at 13:10 -0500, Jeff Muizelaar wrote:
> > > On Fri, Feb 16, 2007 at 09:45:21AM -0800, Daniel Walker wrote:
> > > > I've been working on a patch set (below), to expose the clocksources
> > > > used by generic time to multiple users . It would allow timestamps from
> > > > different clocks in a generic way. It's not merged, but I'd appreciate
> > > > any input either of you might have..
> > > > 
> > > > ftp://source.mvista.com/pub/dwalker/clocksource/
> > > 
> > > Is it possible to see the resulting clocksource.h and maybe
> > > clocksource.c after the patch set? That would make looking at it much
> > > easier.
> > 
> > Well you could just apply the patch set, but I stuck them in the same
> > directory as above .. I'll delete them in 24 hours or so ..
> > 
> > At one point I replaced sched_clock() , 
> > 
> > ftp://source.mvista.com/pub/dwalker/clocksource/clocksource-v10/add_generic_sched_clock.patch
> > 
> > The API is similar to that version, and sched_clock was the simplest
> > user of the API that I've done.
> 
> Ok, so it would basically be:
> 
> init()
> {
> 	clocksource *clock = clocksource_get_best_clock();
> }
> 
> trace()
> {
> 	trace.time = cyc2ns(clock, clocksource_read(clock));
> }
> 
> This seems pretty sane to me. 

sched_clock has the down side of not taking into account rollover .. The
scheduler doesn't care if it gets a few bad timestamps now and then, but
that might screw up other code..

Also you could read cycles , and then later convert to nanoseconds when
it's needed .. Which makes taking the timestamps a little faster.

> The blk-trace code calibrates a per cpu offset for the sched_clock()
> time (see blk_trace_check_cpu_time and blk_trace_set_ht_offsets). Does
> the clocksource stuff help me with this or would I still need to do
> something like that?

Most of the clocksources are not per cpu .. But the tsc clocksource
doesn't keep any per cpu offset information. There has been some work on
syncing the TSC across cpus so I'm not sure to what extend an offset
would be needed.

> I also noticed that you have a clocksource_get_masked_clock() call. This
> seems like a pretty awkward API to me. The first thing that came to mind
> when I read the name was 'what is a masked clock'. When I realized that
> it meant 'a clock w/o this flag', I still found it awkward that one has
> to specify what that don't want. e.g 'I don't want a clock that is not
> continuous.'

yeah it is a little strange . 

> It think it would be better if you had sometime like
> 'clocksource_get_clock_with_features()' that took flags describing the
> needed characteristics instead of the unwanted ones.
> 
> e.g.
> clocksource_get_clock_with_features(CLOCKSOURCE_STABLE)
> or
> clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED)
> 

One reason I did it the other way is because it's easier to specify what
you know you don't want, than to specify all the things that you know
you do want.

Almost everyone would want every flags listed,

clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED|CLOCKSOURCE_STABLE
|CLOCKSOURC_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_CONTINUOUS);

Gets pretty ugly .. The clocksource interface already has a positive
rating to describe the "best" clocks in the system, which is used to
return the "best" clock .. Where the maintainers of the system give each
clock a rating. I would imagine most people would just get the so called
"best" clock which has the best rating..

I'm starting to think this long flags stringing effect could happen with
negative flags also, but it's seems a lot less likely.

> instead of
> 
> clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE)
> clocksource_get_clock_masked(CLOCKSOURCE_PM_AFFECTED)
> 
> Especially awkward is the CLOCKSOURCE_64BIT flag, as there isn't really
> anyway to specify that I want a 64bit timer, only a way to specify that
> I don't.

I might add a way to get specific flags, but I still think the flags
should be mostly negative features.

> It also isn't clear what the implications of some of the flags are:
> e.g:
> 	NOT_CONTINUOUS - don't really have any idea what this means.

It means the clock uses an interrupt to extend it's precision, or it's
not a real cycle counter and depends only on an interrupt for timing.

> 	UNSTABLE - this means the frequency can change right?
> 		Does PM_AFFECTED imply UNSTABLE?

PM_AFFECTED means it could become unstable, and CLOCKSOURC_UNSTABLE
means it's already become unstable.

> 	NOT_ATOMIC - does this affect me as user?

It could .. The PIT clock for example takes a spinlock. Some users might
not want to use that clock cause they call from a context where that
isn't acceptable (LTT for example). It's also slower to read from.

> 	PM_AFFECTED - it looks like the stp code deals with cpu speed
> 		changing. Does the clocksource code do this for me with
> 		cyc2ns?  If it does are there any reason I would want to
> 		avoid PM_AFFECTED clocks? If it doesn't how do I know
> 		that I need to correct it myself.

There is a block notification system that lets you know when a clock
becomes unstable . cyc2ns doesn't compensate for frequency changes . The
TSC for example could fluctuate frequency pretty often. sched_clock for
example uses the clock no matter what the state is , which is why I
leave unstable clock in the list.

> Hope this helps,

Yeah, thanks for the feedback .

Daniel


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Using sched_clock for mmio-trace
  2007-02-16 20:03   ` Andi Kleen
@ 2007-02-16 21:26     ` Frank Ch. Eigler
  2007-02-17 14:56       ` Andi Kleen
  0 siblings, 1 reply; 20+ messages in thread
From: Frank Ch. Eigler @ 2007-02-16 21:26 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Jeff Muizelaar, linux-kernel

Hi -

On Fri, Feb 16, 2007 at 09:03:23PM +0100, Andi Kleen wrote:
> > We in systemtap land have the same problem, and so far made do with
> > slightly postprocessed per-cpu TSC values.
> 
> 90+% likely you're not solving your problem correctly this way.

Yes, it was done as a last resort.

We need facility that we can call from even more demanding contexts
than interrupt handlers, considering that kprobes can be placed nearly
anywhere.  This is one of the reasons why we don't just use good old
do_gettimeofday(), since it takes locks and can lead to lock recursion
if parts of itself are probed.

- FChE

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Using sched_clock for mmio-trace
  2007-02-16 21:06           ` Daniel Walker
@ 2007-02-16 22:10             ` Jeff Muizelaar
  2007-02-16 22:47               ` Daniel Walker
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff Muizelaar @ 2007-02-16 22:10 UTC (permalink / raw)
  To: Daniel Walker; +Cc: Frank Ch. Eigler, linux-kernel

On Fri, Feb 16, 2007 at 01:06:19PM -0800, Daniel Walker wrote:
> On Fri, 2007-02-16 at 14:34 -0500, Jeff Muizelaar wrote:
> > It think it would be better if you had sometime like
> > 'clocksource_get_clock_with_features()' that took flags describing the
> > needed characteristics instead of the unwanted ones.
> > 
> > e.g.
> > clocksource_get_clock_with_features(CLOCKSOURCE_STABLE)
> > or
> > clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED)
> > 
> 
> One reason I did it the other way is because it's easier to specify what
> you know you don't want, than to specify all the things that you know
> you do want.
> 
> Almost everyone would want every flags listed,
> 
> clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED|CLOCKSOURCE_STABLE
> |CLOCKSOURC_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_CONTINUOUS);

I still meant for _with_features to have same semantics so calling:

clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED|CLOCKSOURCE_STABLE
 |CLOCKSOURC_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_CONTINUOUS);

would be equivalent to calling:

clocksource_get_masked_clock(CLOCKSOURCE_PM_AFFECTED|CLOCKSOURCE_UNSTABLE
 |CLOCKSOURC_NOT_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_NOT_CONTINUOUS);

The only difference is that the naming is reversed. i.e. it isn't a
doulbe negative.

perhaps a better name would clocksource_get_clock_must_have() or
clocksource_get_clock_must_be()
 
> Gets pretty ugly .. The clocksource interface already has a positive
> rating to describe the "best" clocks in the system, which is used to
> return the "best" clock .. Where the maintainers of the system give each
> clock a rating. I would imagine most people would just get the so called
> "best" clock which has the best rating..
> 
> I'm starting to think this long flags stringing effect could happen with
> negative flags also, but it's seems a lot less likely.

The amount of flag stringing should be the same.

> 
> > instead of
> > 
> > clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE)
> > clocksource_get_clock_masked(CLOCKSOURCE_PM_AFFECTED)
> > 
> > Especially awkward is the CLOCKSOURCE_64BIT flag, as there isn't really
> > anyway to specify that I want a 64bit timer, only a way to specify that
> > I don't.
> 
> I might add a way to get specific flags, but I still think the flags
> should be mostly negative features.

Yeah, the problem is that all of the features are negative except for
CLOCKSOURCE_64BIT, so you can't mask for it.

> > It also isn't clear what the implications of some of the flags are:
> > e.g:
> > 	NOT_CONTINUOUS - don't really have any idea what this means.
> 
> It means the clock uses an interrupt to extend it's precision, or it's
> not a real cycle counter and depends only on an interrupt for timing.
> 
> > 	UNSTABLE - this means the frequency can change right?
> > 		Does PM_AFFECTED imply UNSTABLE?
> 
> PM_AFFECTED means it could become unstable, and CLOCKSOURC_UNSTABLE
> means it's already become unstable.
> 
> > 	NOT_ATOMIC - does this affect me as user?
> 
> It could .. The PIT clock for example takes a spinlock. Some users might
> not want to use that clock cause they call from a context where that
> isn't acceptable (LTT for example). It's also slower to read from.
> 
> > 	PM_AFFECTED - it looks like the stp code deals with cpu speed
> > 		changing. Does the clocksource code do this for me with
> > 		cyc2ns?  If it does are there any reason I would want to
> > 		avoid PM_AFFECTED clocks? If it doesn't how do I know
> > 		that I need to correct it myself.
> 
> There is a block notification system that lets you know when a clock
> becomes unstable . cyc2ns doesn't compensate for frequency changes . The
> TSC for example could fluctuate frequency pretty often. sched_clock for
> example uses the clock no matter what the state is , which is why I
> leave unstable clock in the list.

It might be good if something like these explanations could be added as
comments beside the feature flags.

So, if I was a systemtap style user of the clocksource api, I'd still have to
do something like:

init() {
	// assume this gives me the tsc
	clock = clocksource_get_masked_clock(CLOCKSOURCE_NOT_ATOMIC);
	register a cpu_freq notifier
}

trace() {
	trace.time =
	compute_cyc2_ns_by_hand_using_info_from_cpu_freq(clocksource_read(clock))
}

In this case it doesn't look like using the clocksource stuff helps this
style of user much as all that is abstracted away is really the reading
the tsc. On the other hand it sounds like if you want all this stuff
taken care of for you, most people should be using do_gettimeofday()
instead.

-Jeff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Using sched_clock for mmio-trace
  2007-02-16 22:10             ` Jeff Muizelaar
@ 2007-02-16 22:47               ` Daniel Walker
  2007-02-17  4:36                 ` Jeff Muizelaar
  0 siblings, 1 reply; 20+ messages in thread
From: Daniel Walker @ 2007-02-16 22:47 UTC (permalink / raw)
  To: Jeff Muizelaar; +Cc: Frank Ch. Eigler, linux-kernel

On Fri, 2007-02-16 at 17:10 -0500, Jeff Muizelaar wrote:

> 
> I still meant for _with_features to have same semantics so calling:
> 
> clocksource_get_clock_with_features(CLOCKSOURCE_PM_UNAFFECTED|CLOCKSOURCE_STABLE
>  |CLOCKSOURC_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_CONTINUOUS);
> 
> would be equivalent to calling:
> 
> clocksource_get_masked_clock(CLOCKSOURCE_PM_AFFECTED|CLOCKSOURCE_UNSTABLE
>  |CLOCKSOURC_NOT_ATOMIC|CLOCKSOURCE_64BITS|CLOCKSOURCE_NOT_CONTINUOUS);
> 
> The only difference is that the naming is reversed. i.e. it isn't a
> doulbe negative.

Yes, I assumed that ..

> perhaps a better name would clocksource_get_clock_must_have() or
> clocksource_get_clock_must_be()
>  
> > Gets pretty ugly .. The clocksource interface already has a positive
> > rating to describe the "best" clocks in the system, which is used to
> > return the "best" clock .. Where the maintainers of the system give each
> > clock a rating. I would imagine most people would just get the so called
> > "best" clock which has the best rating..
> > 
> > I'm starting to think this long flags stringing effect could happen with
> > negative flags also, but it's seems a lot less likely.
> 
> The amount of flag stringing should be the same.

I don't think so .. The common case with negative flags is no flags,
then next would be CLOCKSOURCE_UNSTABLE. At most I would guess two
flags .. The other direction your likely to have people using all flags
most of the time. That's why I showed a function call with all the flags
listed.

To be clear the clocksources are already sorted with a rating (no flags
involved). 

So for example, On i386 the clocks are sorts as follows,

tsc (300)
acpi_pm (200)
pit (110)
jiffies (1)

So the bulk of the positive information is already there for you. The
tsc clock is by far the best in the system if it's stable. If you ignore
the flags, then you will always be given the tsc .

> > > instead of
> > > 
> > > clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE)
> > > clocksource_get_clock_masked(CLOCKSOURCE_PM_AFFECTED)
> > > 
> > > Especially awkward is the CLOCKSOURCE_64BIT flag, as there isn't really
> > > anyway to specify that I want a 64bit timer, only a way to specify that
> > > I don't.
> > 
> > I might add a way to get specific flags, but I still think the flags
> > should be mostly negative features.
> 
> Yeah, the problem is that all of the features are negative except for
> CLOCKSOURCE_64BIT, so you can't mask for it.

It's meant as a negative feature. So you can mask it if you can't handle
the math .. The only 64bit clock I know off is the tsc, and it's got the
highest rating of all clocks.


> It might be good if something like these explanations could be added as
> comments beside the feature flags.

Ok .. On my TODO list ..

> So, if I was a systemtap style user of the clocksource api, I'd still have to
> do something like:
> 
> init() {
> 	// assume this gives me the tsc
> 	clock = clocksource_get_masked_clock(CLOCKSOURCE_NOT_ATOMIC);
> 	register a cpu_freq notifier
> }
> 
> trace() {
> 	trace.time =
> 	compute_cyc2_ns_by_hand_using_info_from_cpu_freq(clocksource_read(clock))
> }
> 
> In this case it doesn't look like using the clocksource stuff helps this
> style of user much as all that is abstracted away is really the reading
> the tsc. On the other hand it sounds like if you want all this stuff
> taken care of for you, most people should be using do_gettimeofday()
> instead.

Instead of using an unstable clock and trying to compensate for it, the
solution is not to use unstable clocks. The clocksource stuff does help
with that.. You can either use the tsc until it becomes unstable, or use
the acpi_pm clock which never becomes unstable. Both are used in the
same way.

> init() {
	  // gives you the acpi_pm
	  clock = clocksource_get_masked_clock(CLOCKSOURCE_PM_AFFECTED|CLOCKSOURCE_NOT_ATOMIC);
> 	register a cpu_freq notifier
> }
> 
> trace() {
> 	trace.time =
	  cyc2ns(clock, clocksource_read(clock))
> }
> 


Daniel


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Using sched_clock for mmio-trace
  2007-02-16 22:47               ` Daniel Walker
@ 2007-02-17  4:36                 ` Jeff Muizelaar
  0 siblings, 0 replies; 20+ messages in thread
From: Jeff Muizelaar @ 2007-02-17  4:36 UTC (permalink / raw)
  To: Daniel Walker; +Cc: Frank Ch. Eigler, linux-kernel

On Fri, Feb 16, 2007 at 02:47:45PM -0800, Daniel Walker wrote:
> > > Gets pretty ugly .. The clocksource interface already has a positive
> > > rating to describe the "best" clocks in the system, which is used to
> > > return the "best" clock .. Where the maintainers of the system give each
> > > clock a rating. I would imagine most people would just get the so called
> > > "best" clock which has the best rating..
> > > 
> > > I'm starting to think this long flags stringing effect could happen with
> > > negative flags also, but it's seems a lot less likely.
> > 
> > The amount of flag stringing should be the same.
> 
> I don't think so .. The common case with negative flags is no flags,
> then next would be CLOCKSOURCE_UNSTABLE. At most I would guess two
> flags .. The other direction your likely to have people using all flags
> most of the time. That's why I showed a function call with all the flags
> listed.

I think you still misunderstand me. The common case is still no flags.

clocksource_get_clock_must_have(0) would return clocks that are stable
and unstable. clocksource_get_clock_must_have(CLOCKSOURCE_STABLE) would
only return clocks that are stable, just like
clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE) only returns clocks
that are stable.

> > > > instead of
> > > > 
> > > > clocksource_get_clock_masked(CLOCKSOURCE_UNSTABLE)
> > > > clocksource_get_clock_masked(CLOCKSOURCE_PM_AFFECTED)
> > > > 
> > > > Especially awkward is the CLOCKSOURCE_64BIT flag, as there isn't really
> > > > anyway to specify that I want a 64bit timer, only a way to specify that
> > > > I don't.
> > > 
> > > I might add a way to get specific flags, but I still think the flags
> > > should be mostly negative features.
> > 
> > Yeah, the problem is that all of the features are negative except for
> > CLOCKSOURCE_64BIT, so you can't mask for it.
> 
> It's meant as a negative feature. So you can mask it if you can't handle
> the math .. The only 64bit clock I know off is the tsc, and it's got the
> highest rating of all clocks.

Ah ok, I see that now. Maybe CLOCKSOURCE_OVER_32BITS would be a better
name? It might convey the negativity better...

-Jeff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Using sched_clock for mmio-trace
  2007-02-16 21:26     ` Frank Ch. Eigler
@ 2007-02-17 14:56       ` Andi Kleen
  2007-02-17 15:19         ` Thomas Gleixner
  0 siblings, 1 reply; 20+ messages in thread
From: Andi Kleen @ 2007-02-17 14:56 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Andi Kleen, Jeff Muizelaar, linux-kernel

> This is one of the reasons why we don't just use good old
> do_gettimeofday(), since it takes locks and can lead to lock recursion
> if parts of itself are probed.

do_gettimeofday doesn't take locks.

Only restriction is that you can't single step it with long 
pauses between instructions.

-Andi

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Using sched_clock for mmio-trace
  2007-02-17 14:56       ` Andi Kleen
@ 2007-02-17 15:19         ` Thomas Gleixner
  2007-02-18 17:20           ` Andi Kleen
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2007-02-17 15:19 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Frank Ch. Eigler, Jeff Muizelaar, linux-kernel

On Sat, 2007-02-17 at 15:56 +0100, Andi Kleen wrote:
> > This is one of the reasons why we don't just use good old
> > do_gettimeofday(), since it takes locks and can lead to lock recursion
> > if parts of itself are probed.
> 
> do_gettimeofday doesn't take locks.
> 
> Only restriction is that you can't single step it with long 
> pauses between instructions.

Err, it uses read side of xtime lock, so you can not call it from a
place which write locks xtime lock.

	tglx



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Using sched_clock for mmio-trace
  2007-02-17 15:19         ` Thomas Gleixner
@ 2007-02-18 17:20           ` Andi Kleen
  0 siblings, 0 replies; 20+ messages in thread
From: Andi Kleen @ 2007-02-18 17:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andi Kleen, Frank Ch. Eigler, Jeff Muizelaar, linux-kernel

On Sat, Feb 17, 2007 at 04:19:58PM +0100, Thomas Gleixner wrote:
> On Sat, 2007-02-17 at 15:56 +0100, Andi Kleen wrote:
> > > This is one of the reasons why we don't just use good old
> > > do_gettimeofday(), since it takes locks and can lead to lock recursion
> > > if parts of itself are probed.
> > 
> > do_gettimeofday doesn't take locks.
> > 
> > Only restriction is that you can't single step it with long 
> > pauses between instructions.
> 
> Err, it uses read side of xtime lock, so you can not call it from a
> place which write locks xtime lock.

Err, you can -- seqlocks never deadlock.

The only thing that doesn't work is to single step with long enough
pauses with interrupts on inbetween that the sequence numbers increase:
you get a livelock then.

-Andi

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2007-02-18 17:20 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-16  1:30 Using sched_clock for mmio-trace Jeff Muizelaar
2007-02-16 16:30 ` Frank Ch. Eigler
2007-02-16 17:45   ` Daniel Walker
2007-02-16 18:10     ` Jeff Muizelaar
2007-02-16 18:28       ` Daniel Walker
2007-02-16 19:34         ` Jeff Muizelaar
2007-02-16 21:06           ` Daniel Walker
2007-02-16 22:10             ` Jeff Muizelaar
2007-02-16 22:47               ` Daniel Walker
2007-02-17  4:36                 ` Jeff Muizelaar
2007-02-16 18:30   ` Jeff Muizelaar
2007-02-16 18:44     ` Randy Dunlap
2007-02-16 19:55       ` Jeff Muizelaar
2007-02-16 20:03   ` Andi Kleen
2007-02-16 21:26     ` Frank Ch. Eigler
2007-02-17 14:56       ` Andi Kleen
2007-02-17 15:19         ` Thomas Gleixner
2007-02-18 17:20           ` Andi Kleen
2007-02-16 20:02 ` Andi Kleen
2007-02-16 19:40   ` Jeff Muizelaar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).