LKML Archive on lore.kernel.org
From: Mikael Pettersson <mikpe@csd.uu.se>
To: hch@infradead.org
Cc: akpm@osdl.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH][1/7] perfctr-2.7.2 for 2.6.6-mm2: core
Date: Sat, 15 May 2004 16:39:10 +0200 (MEST)
Message-ID: <200405151439.i4FEdAQq001360@harpo.it.uu.se>

On Fri, 14 May 2004 15:40:05 +0100, Christoph Hellwig wrote:
>  And even without that it's a really horrible
>interface.  Any chance to do a proper fs-based interface ala oprofile?

I object to equating "proper" with "fs-based". If that were
universally true, we'd only have 3 file-system related system
calls (open, read, write), and _everything_ else would
be expressed using those on various special filesystems.

Several 64-bit archs already have low-level performance
counter interfaces in the kernel (not fs-based, I might add).
This interface is no worse than those.

The per-process counters API needs to express:
- Open the perfctr state belonging to a given process
  (a real kernel process, not that process group thing)
  returning a handle (file descriptor).
  The fd is used with mmap() for low-overhead sampling,
  and as an access-rights token in the other operations.
- Alternatively, create the state and return a handle.
- Unlink the state represented by a handle from the
  process it is attached to.
- Update the state's control data. This involves CPU-specific
  data, the signal you want on overflow interrupts, and a
  mask indicating which counter sums you want to preserve
  (otherwise they're reset).
- Resume the counters. The counters are temporarily suspended
  when the overflow signal handler is invoked; the handler
  uses this operation to tell the kernel to resume the counters.
  The handler can of course choose not to do this.
- Read the counter sums from the state. This is used when
  user-space can't or doesn't want to use the mmap()ed state.
  (Old P5 and WinChip processors must do this.)
- Read the control data from the state. Used e.g. when
  the counters are accessed from a different process.
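To make the per-process operations concrete, here is a minimal user-space model of the state they act on. It is a sketch under stated assumptions, not perfctr's code: the struct layouts, NR_COUNTERS, and every pp_* name are hypothetical, and the overflow path is simulated by a plain function call rather than a real interrupt and signal. It shows how a control update resets the sums not selected by the preserve mask, and how the suspend/resume handshake around the overflow handler might look.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch; names and layouts are NOT perfctr's real ABI. */
#define NR_COUNTERS 4

/* Control data supplied by user-space on an update. */
struct pp_control {
    unsigned int nr_active;     /* number of counters enabled */
    int overflow_signal;        /* signal sent on overflow interrupts */
    uint32_t preserve_mask;     /* bit i set: keep sums[i] across the update */
};

/* Per-process counter state, reachable through a file descriptor. */
struct pp_state {
    struct pp_control control;
    uint64_t sums[NR_COUNTERS];
    int suspended;              /* nonzero while the overflow handler runs */
};

/* Update the control data; sums not named in preserve_mask are reset. */
static void pp_update_control(struct pp_state *st, const struct pp_control *ctl)
{
    for (unsigned int i = 0; i < NR_COUNTERS; ++i)
        if (!(ctl->preserve_mask & (1u << i)))
            st->sums[i] = 0;
    st->control = *ctl;
    st->suspended = 0;
}

/* Simulated overflow interrupt: counting stops before the signal is sent. */
static void pp_overflow(struct pp_state *st)
{
    st->suspended = 1;
}

/* Called from the signal handler if it wants counting to continue. */
static void pp_resume(struct pp_state *st)
{
    st->suspended = 0;
}

/* For callers that can't (or won't) sample through the mmap()ed state. */
static void pp_read_sums(const struct pp_state *st, uint64_t out[NR_COUNTERS])
{
    memcpy(out, st->sums, sizeof st->sums);
}

/* Lets another process inspect what the counters are set up to count. */
static void pp_read_control(const struct pp_state *st, struct pp_control *out)
{
    *out = st->control;
}
```

For example, an update with preserve_mask 0x5 keeps sums 0 and 2 and zeroes sums 1 and 3.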

The global-mode counters API needs to express:
- Stop all counters on all CPUs.
- Write control data to a given CPU.
- Start the counters, with a given sampling interval.
- Read the control data and counter sums from a given CPU.
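The global-mode sequence (stop, write per-CPU control, start, read) can be modelled the same way. This is a hypothetical sketch: the CPU and counter counts, the g_* names, and the rule that control data may only be written while the counters are stopped are all assumptions, and sampling is simulated by explicit g_sample() calls instead of a timer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of global-mode state; sizes and names are assumptions. */
#define G_NR_CPUS 4
#define G_NR_CTRS 2

struct g_cpu_control {
    uint32_t evntsel[G_NR_CTRS];   /* per-counter event selection */
};

struct g_state {
    struct g_cpu_control control[G_NR_CPUS];
    uint64_t sums[G_NR_CPUS][G_NR_CTRS];
    unsigned int interval_ms;      /* sampling interval, 0 while stopped */
    int running;
};

/* Stop all counters on all CPUs. */
static void g_stop(struct g_state *g)
{
    g->running = 0;
    g->interval_ms = 0;
}

/* Write control data to one CPU (assumed: only allowed while stopped). */
static int g_write_control(struct g_state *g, unsigned int cpu,
                           const struct g_cpu_control *c)
{
    if (cpu >= G_NR_CPUS || g->running)
        return -1;
    g->control[cpu] = *c;
    for (unsigned int i = 0; i < G_NR_CTRS; ++i)
        g->sums[cpu][i] = 0;
    return 0;
}

/* Start the counters with a given sampling interval. */
static int g_start(struct g_state *g, unsigned int interval_ms)
{
    if (interval_ms == 0)
        return -1;
    g->interval_ms = interval_ms;
    g->running = 1;
    return 0;
}

/* Simulated periodic sample: fold one CPU's counter deltas into its sums. */
static void g_sample(struct g_state *g, unsigned int cpu, const uint64_t *delta)
{
    if (!g->running || cpu >= G_NR_CPUS)
        return;
    for (unsigned int i = 0; i < G_NR_CTRS; ++i)
        g->sums[cpu][i] += delta[i];
}

/* Read the control data and counter sums from one CPU. */
static int g_read(const struct g_state *g, unsigned int cpu,
                  struct g_cpu_control *c, uint64_t sums[G_NR_CTRS])
{
    if (cpu >= G_NR_CPUS)
        return -1;
    *c = g->control[cpu];
    for (unsigned int i = 0; i < G_NR_CTRS; ++i)
        sums[i] = g->sums[cpu][i];
    return 0;
}
```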

The CPU-specific control data needs to express:
- Which physical CPU counter to map a given counter to. This is
  rarely a 1-to-1 mapping because processors tend to have
  asymmetric counters, and sometimes a large set of which
  only a few are to be used.
  User-space needs to be in charge of this mapping. This
  is NOT something the kernel should be doing behind the
  user's back, precisely because HW isn't symmetric.
  This mapping also affects the user-space sampling code.
- The per-counter control data to associate with a given counter.
  The amount of this varies considerably.
- The global control data shared by all counters.
  The amount of this varies considerably.
- The initial and restart values for interrupt-on-overflow
  counters.
- Whether to also sample the CPU's clock-like counter.
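The list above maps naturally onto a struct. The sketch below is hypothetical (limits, field names, and the allowed[] constraint table are assumptions, not a real CPU's or perfctr's layout) and shows how a receiver of this data might validate, not choose, the user-supplied counter mapping on asymmetric hardware. It also assumes x86-style overflow semantics, where an interrupt-on-overflow counter is loaded with -N and interrupts after N events.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limits and field names; not a real CPU's or perfctr's layout. */
#define MAX_VCTRS 8
#define NR_PMCS 4

struct cpu_control_sketch {
    unsigned int nr_actrs;           /* plain accumulating counters */
    unsigned int nr_ictrs;           /* interrupt-on-overflow counters, after the a-ctrs */
    unsigned int pmc_map[MAX_VCTRS]; /* counter i -> physical counter, chosen by user-space */
    uint32_t evntsel[MAX_VCTRS];     /* per-counter control data */
    uint32_t global_ctrl;            /* control data shared by all counters */
    int32_t ireset[MAX_VCTRS];       /* initial/restart value for i-mode counters */
    int tsc_on;                      /* also sample the clock-like counter */
};

/*
 * Check a user-supplied mapping rather than inventing one.
 * allowed[i] is a bitmask of physical counters that can count the event in
 * evntsel[i]; in this sketch user-space derives it from the CPU type.
 */
static int check_control(const struct cpu_control_sketch *c,
                         const uint32_t allowed[MAX_VCTRS])
{
    unsigned int n = c->nr_actrs + c->nr_ictrs;
    uint32_t used = 0;

    if (n > MAX_VCTRS || n > NR_PMCS)
        return -1;
    for (unsigned int i = 0; i < n; ++i) {
        unsigned int pmc = c->pmc_map[i];
        if (pmc >= NR_PMCS || (used & (1u << pmc)))
            return -1;  /* out of range, or two counters on one PMC */
        if (!(allowed[i] & (1u << pmc)))
            return -1;  /* asymmetric HW: this PMC can't count that event */
        used |= 1u << pmc;
        /* Assumed x86-style semantics: an i-mode counter starts at -N and
           interrupts when it wraps past zero, so the value must be negative. */
        if (i >= c->nr_actrs && c->ireset[i] >= 0)
            return -1;
    }
    return 0;
}
```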

Doing all of this via file-system operations would require
either a big hierarchy of directories and files, or a smaller
hierarchy plus parsers for the textual data written to it.

Passing structs works, except for binary compatibility
issues. (And since the structures must be updated to match
newer CPUs, these issues are very real.)

>Haven't looked over much of the code yet, but the people who support
>32bit userspace on 64bit architectures will probably kill you for
>the multiplexer syscall.

The previous ioctl()-based perfctr-2.6 version supports i386
binaries on x86_64 kernels, as should this syscall() version.

Key to this is the structure marshalling code, which does
two things:
- allows the kernel to add fields (e.g. for new processors)
  without affecting older user-space code
- allows user-space code to work on an older kernel whose
  structures have fewer fields (supports fewer processors),
  as long as user-space does CPU type detection and doesn't
  attempt to use e.g. P4-only fields on a P6 or K7

The pass-binary-structures-via-marshalling approach works,
but I admit it is uncommon. Converting to a pseudo-fs
interface will require a substantial amount of work and code.
Of course, I will do that if I have no choice...

/Mikael
