LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
To: Ulrich Drepper <drepper@gmail.com>
Cc: lkml <linux-kernel@vger.kernel.org>,
	David Miller <davem@davemloft.net>,
	Ulrich Drepper <drepper@redhat.com>,
	Andrew Morton <akpm@osdl.org>, netdev <netdev@vger.kernel.org>,
	Zach Brown <zach.brown@oracle.com>,
	Christoph Hellwig <hch@infradead.org>,
	Chase Venters <chase.venters@clientec.com>,
	Johann Borck <johann.borck@densedata.com>
Subject: Re: [take19 1/4] kevent: Core files.
Date: Thu, 5 Oct 2006 12:57:50 +0400	[thread overview]
Message-ID: <20061005085750.GA1015@2ka.mipt.ru> (raw)
In-Reply-To: <a36005b50610041057g67dcaf73wd48d9fef88187ec6@mail.gmail.com>

On Wed, Oct 04, 2006 at 10:57:32AM -0700, Ulrich Drepper (drepper@gmail.com) wrote:
> On 10/3/06, Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> >http://tservice.net.ru/~s0mbre/archive/kevent/evserver_kevent.c
> >http://tservice.net.ru/~s0mbre/archive/kevent/evtest.c
> 
> These are simple programs which by themselves have problems.  For
> instance, I consider a very bad idea to hardcode the size of the ring
> buffer.  Specifying macros in the header file counts as hardcoding.
> Systems grow over time and so will the demand of connections.  I have
> no problem with the kernel hardcoding the value internally (or having
> a /proc entry to select it) but programs should be able to dynamically
> learn about the value so they don't have to be recompiled.

Well, it is possible to create /sys/proc entry for that, and even now 
userspace can grow mapping ring until it is forbiden by kernel, which
means limit is reached.

Actually the whole idea with global limit of kevents does not sound very
good to me, but it is required to remove overflow in mapped buffer.

> But more problematic is that I don't see how the interfaces can be
> efficiently used in multi-threaded (or multi-process) programs.  How
> would multiple threads using the same kevent queue and running in the
> same kevent_get_events() loop work out?  How do they guarantee that
> each request is only handled once?

kqueue_dequeue_ready() is atomic and this function removes kevent from
ready queue so other thread can not get it.

> From what I see now this means a second data structure is needed to
> keep track of the state of each entry.  But even then, how do we even
> recognized used ring buffer entries?
> 
> For instance, assume two threads.  Both call get_events, one event is
> reported, both threads are woken up (which is another thing to
> consider, more later).  One thread uses ring buffer entry, the other
> goes back to sleep in get_events.  Now, how does the kernel know when
> the other thread is done working on the ring buffer entry?  There
> might be lots of entries coming in overflowing the entire buffer.
> Heck, you don't even need two threads for this scenario.

Are you talking about mapped buffer or syscall interface?
The former has special syscall kevent_wait(), which reports number of
'processed' events and first processed number, so kernel can remove all
appropriate events. The latter is described above -
kqueue_dequeue_ready() is atomic, so that event will be removed from the
ready queue and optionally from the whole kevent tree.

It is possible to work with both interfaces at the same time, since
mapped buffer contains a copy of the event, which is potentially freed
and processed by other thread. 

Actually I do not like idea of mapped ring anyway, since if application 
uses a lot of events, it will batch them into big chunks, so syscall 
overhead is negligible, if application uses small number of events, 
syscalls will be rare and will not hurt performance.

> When I was thinking about this (and discussing it in Ottawa) I was
> always assuming that we have a status field in the ring buffer entry
> which lets the userlevel code indicate whether the entry is free again
> or not.  This requires a writable mapping, yes, and potentially causes
> cache line ping-pong.  I think Zach mentioned he has some ideas about
> this.

As far as I can see, there are no other ideas on how to implement ring
buffer, so I did it like I wanted. It has some limitation indeed, but
since I do not see any other code, how can I say what is better or
worse?
 
> As for the multiple thread wakeup, I mentioned this before.  We have
> to avoid the trampling herd problem.  We cannot wakeup all waiters.
> But we also cannot assume that, without protocols, waking up just one
> for each available entry is sufficient.  So the first question is:
> what is the current policy?

It is a good practice to _not_ share the same queue between a lot of
threads. Currently all waiters are awakened.

> >AIO was removed from patchset by request of Cristoph.
> >Timers, network AIO, fs AIO, socket nortifications and poll/select
> >events work well with existing structures.
> 
> Well, excuse me if I don't take your word for it.  I agree, the AIO
> code should not be submitted along with this.  The same for any other
> code using the event handling.  But we need to check whether the
> interface is generic enough to accomodate them in a way which actually
> makes sense.  Again, think highly threaded processes or multiple
> processes sharing the same event queue.

You missed the point.
I implemented _all_ above and it does work.
Although it was removed from submission patchset.
You can find all patches on kevent homepage, they were posted to lkml@
and netdev@ too many times to miss them.
 
> >It is even possible to create variable sized kevents - each kevent
> >contain pointer to user's data, which can be considered as pointer to
> >additional area (it's size kernel implementation for given kevent type
> >can determine from other parameters or use predefined one and fetch
> >additional data in ->enqueue() callback).
> 
> That sounds interesting and certainly helps with securing the
> interface for the future.  But if there is anything we can do to avoid
> unnecessary costs we should do it, even if this means investigation
> all this further.

Ulrich, do _you_ have any ideas on how to change data structures?
Not talks about investigations and the like, but real design which
covers today's and tomorrow's needs?

Existing structures were not taken from the universe, but are result of
quite long thoughts about requests for AIO/epoll and networking
development...

-- 
	Evgeniy Polyakov

  reply	other threads:[~2006-10-05  8:58 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <115a6230591036@2ka.mipt.ru>
2006-09-12  8:41 ` [take18 0/4] kevent: Generic event handling mechanism Evgeniy Polyakov
2006-09-12  8:41   ` [take18 1/4] kevent: Core files Evgeniy Polyakov
2006-09-12  8:41     ` [take18 2/4] kevent: poll/select() notifications Evgeniy Polyakov
2006-09-12  8:41       ` [take18 3/4] kevent: Socket notifications Evgeniy Polyakov
2006-09-12  8:41         ` [take18 4/4] kevent: Timer notifications Evgeniy Polyakov
2006-09-20  9:35 ` [take19 0/4] kevent: Generic event handling mechanism Evgeniy Polyakov
2006-09-20  9:35   ` [take19 1/4] kevent: Core files Evgeniy Polyakov
2006-09-20  9:35     ` [take19 2/4] kevent: poll/select() notifications Evgeniy Polyakov
2006-09-20  9:35       ` [take19 3/4] kevent: Socket notifications Evgeniy Polyakov
2006-09-20  9:35         ` [take19 4/4] kevent: Timer notifications Evgeniy Polyakov
2006-10-04  6:34     ` [take19 1/4] kevent: Core files Ulrich Drepper
2006-10-04  6:48       ` Evgeniy Polyakov
2006-10-04 17:57         ` Ulrich Drepper
2006-10-05  8:57           ` Evgeniy Polyakov [this message]
2006-10-05  9:56             ` Eric Dumazet
2006-10-05 10:21               ` Evgeniy Polyakov
2006-10-05 10:45                 ` Eric Dumazet
2006-10-05 10:55                   ` Evgeniy Polyakov
2006-10-05 12:09                     ` Eric Dumazet
2006-10-05 12:37                       ` Evgeniy Polyakov
2006-10-15 23:22                         ` Ulrich Drepper
2006-10-16  7:33                           ` Evgeniy Polyakov
2006-10-16 10:16                             ` Ulrich Drepper
2006-10-16 11:23                               ` Evgeniy Polyakov
2006-10-17  5:10                           ` Johann Borck
2006-10-17  5:59                             ` Chase Venters
2006-10-17 10:42                               ` Evgeniy Polyakov
2006-10-17 13:12                                 ` Chase Venters
2006-10-17 13:35                                   ` Evgeniy Polyakov
2006-10-17 10:39                             ` Evgeniy Polyakov
2006-10-17 13:19                               ` Eric Dumazet
2006-10-17 13:42                                 ` Evgeniy Polyakov
2006-10-17 13:52                                   ` Eric Dumazet
2006-10-17 14:07                                     ` Evgeniy Polyakov
2006-10-17 14:25                                       ` Eric Dumazet
2006-10-17 15:09                                         ` Evgeniy Polyakov
2006-10-17 15:32                                           ` Eric Dumazet
2006-10-17 16:01                                             ` Evgeniy Polyakov
2006-10-17 16:26                                               ` Eric Dumazet
2006-10-17 16:35                                                 ` Evgeniy Polyakov
2006-10-17 16:45                                                   ` Eric Dumazet
2006-10-18  4:10                                                     ` Evgeniy Polyakov
2006-10-18  4:45                                                       ` Eric Dumazet
2006-10-17 15:33                                         ` Hans Henrik Happe
2006-10-05 14:01                 ` Hans Henrik Happe
2006-10-05 14:15                   ` Evgeniy Polyakov
2006-10-05 15:07                     ` Hans Henrik Happe
2006-09-22 19:22   ` [take19 0/4] kevent: Generic event handling mechanism Andrew Morton
2006-09-23  4:23     ` Evgeniy Polyakov
2006-10-04  6:09       ` Ulrich Drepper
2006-10-04  6:10         ` Ulrich Drepper
2006-10-04  6:27           ` Evgeniy Polyakov
2006-10-04  6:24         ` Evgeniy Polyakov
2006-09-26 15:54     ` Christoph Hellwig
2006-09-27  4:46       ` Evgeniy Polyakov
2006-09-27 15:09   ` Evgeniy Polyakov
2006-10-04  4:50     ` Ulrich Drepper
2006-10-04  4:55       ` Evgeniy Polyakov
2006-10-04  7:33         ` Ulrich Drepper
2006-10-04  7:48           ` Evgeniy Polyakov
2006-10-04 17:20             ` Ulrich Drepper
2006-10-05  9:02               ` Evgeniy Polyakov
2006-10-05 14:45                 ` Ulrich Drepper
2006-10-06  8:36                   ` Evgeniy Polyakov
2006-10-15 22:43                     ` Ulrich Drepper
2006-10-16  7:23                       ` Evgeniy Polyakov
2006-10-16  9:59                         ` Ulrich Drepper
2006-10-16 10:38                           ` Evgeniy Polyakov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20061005085750.GA1015@2ka.mipt.ru \
    --to=johnpol@2ka.mipt.ru \
    --cc=akpm@osdl.org \
    --cc=chase.venters@clientec.com \
    --cc=davem@davemloft.net \
    --cc=drepper@gmail.com \
    --cc=drepper@redhat.com \
    --cc=hch@infradead.org \
    --cc=johann.borck@densedata.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=zach.brown@oracle.com \
    --subject='Re: [take19 1/4] kevent: Core files.' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).