From: Eric Dumazet <dada1@cosmosbay.com>
To: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: Ulrich Drepper <drepper@gmail.com>,
lkml <linux-kernel@vger.kernel.org>,
David Miller <davem@davemloft.net>,
Ulrich Drepper <drepper@redhat.com>,
Andrew Morton <akpm@osdl.org>, netdev <netdev@vger.kernel.org>,
Zach Brown <zach.brown@oracle.com>,
Christoph Hellwig <hch@infradead.org>,
Chase Venters <chase.venters@clientec.com>,
Johann Borck <johann.borck@densedata.com>
Subject: Re: [take19 1/4] kevent: Core files.
Date: Thu, 5 Oct 2006 12:45:03 +0200
Message-ID: <200610051245.03880.dada1@cosmosbay.com>
In-Reply-To: <20061005102106.GE1015@2ka.mipt.ru>
On Thursday 05 October 2006 12:21, Evgeniy Polyakov wrote:
> On Thu, Oct 05, 2006 at 11:56:24AM +0200, Eric Dumazet (dada1@cosmosbay.com) wrote:
> > I may be wrong, but what is currently missing for me is :
> >
> > - No hardcoded limit on the max number of events. (A process that can
> > open XXX.XXX files should be allowed to open a kevent queue with at least
> > XXX.XXX events). Right now it's not clear what happens if the current
> > limit is reached.
>
> This forces overflows in the fixed-size memory-mapped buffer.
> If we remove the memory-mapped buffer, or allow overflows (and
> thus skipped entries), kevent can easily scale to those limits (tested with
> xx.xxx events though).
What is missing or not obvious is: if events are skipped because of
overflows, what happens? Are connections stuck forever? Do we hope that
everything will restore itself? Is the kernel able to signal this problem to
user land?
>
> > - In order to avoid touching the whole ring buffer, it might be good to
> > be able to reset the indexes to the beginning when ring buffer is empty.
> > (So if the user land is responsive enough to consume events, only first
> > pages of the mapping would be used : that saves L1/L2 cpu caches)
>
> And what happens when there are 3 empty at the beginning and we need to
> put there 4 ready events?
Re-read what I said: when the ring buffer is empty.
When the ring buffer is empty, the kernel can reset the index right before
adding XX new events. You read it as 3 events consumed; I said that when the
whole ring buffer is empty, because all previous events were consumed by user
land, then we can reset the indexes to 0.
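
A minimal sketch of the reset-on-empty idea, not taken from the kevent patches. The names `struct ring`, `ring_put`, `ring_get` and `RING_SIZE` are hypothetical, events are reduced to plain ints, and the locking that a buffer shared between kernel and user land would need is left out. The producer rewinds both indexes to 0 only when every earlier event has been consumed; otherwise it behaves like an ordinary ring.

```c
#include <assert.h>

#define RING_SIZE 1024  /* hypothetical capacity, in events */

struct ring {
    unsigned int head;   /* next slot the producer (kernel) fills */
    unsigned int tail;   /* next slot the consumer (user land) reads */
    unsigned int count;  /* events produced but not yet consumed */
    int slots[RING_SIZE];
};

/* Add one event. Returns the slot index used, or -1 if the ring is full. */
static int ring_put(struct ring *r, int ev)
{
    assert(r->count <= RING_SIZE);

    if (r->count == RING_SIZE)
        return -1;

    if (r->count == 0) {
        /* Ring is completely empty: restart at slot 0 so that a
         * responsive consumer only ever touches the first pages. */
        r->head = 0;
        r->tail = 0;
    }

    unsigned int slot = r->head;
    r->slots[slot] = ev;
    r->head = (r->head + 1) % RING_SIZE;
    r->count++;
    return (int)slot;
}

/* Consume one event into *ev. Returns 0 on success, -1 if the ring is empty. */
static int ring_get(struct ring *r, int *ev)
{
    if (r->count == 0)
        return -1;

    *ev = r->slots[r->tail];
    r->tail = (r->tail + 1) % RING_SIZE;
    r->count--;
    return 0;
}
```

With three events queued and only two consumed, the next `ring_put()` uses slot 3, so nothing is overwritten. Once the consumer has drained everything, the next `ring_put()` lands in slot 0 again.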
>
> > A plus would be
> >
> > - A working/usable mmap ring buffer implementation, but I think it's not
> > mandatory. System calls are not that expensive, especially if you can
> > batch XX events per syscall (like epoll). The nice thing with a ring buffer
> > is that we touch fewer cache lines than, say, epoll, which has a lot of
> > linked structures.
> >
> > About mmap, I think you might want a hybrid thing :
> >
> > One writable page where userland can write its index, (and hold one or
> > more futex shared by kernel) (with appropriate thread locking in case
> > multiple threads want to dequeue events). In fast path, no syscalls are
> > needed to maintain this user index.
> >
> > XXX read-only pages (for the user, but r/w for the kernel), where the
> > kernel writes its own index, and events of course.
>
> The problem is in that xxx pages - how many can we eat per kevent
> descriptor? It is pinned memory and thus it is possible to have a DoS.
> If xxx above is not enough to store all events, we will have
> yet-another-broken behaviour like rt-signal queue overflow.
>
Re-read: I have a process that has the right to open XXX.XXX handles,
allocating XXX.XXX TCP sockets, dentries, file structures, inodes and epoll
events. It's obviously already a DoS risk, but one controlled by 'ulimit -n'.
Allocating XXX.XXX * (32 or 64) bytes is a win if I can zap the epoll
structures (currently more than 256 bytes per event).
epoll structures are pinned too... what's wrong with that?
# egrep "filp|poll|TCP|dentries|sock_inode" /proc/slabinfo |cut -c1-50
tw_sock_TCP         1302   2200    192   20    1 :
request_sock_TCP    2046   4260    128   30    1 :
TCP               151509 196910   1472    5    2 :
eventpoll_pwq     146718 199439     72   53    1 :
eventpoll_epi     146718 199360    192   20    1 :
sock_inode_cache  149182 197940    640    6    1 :
filp              149537 202515    256   15    1 :
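
The "more than 256 bytes per event" figure can be checked against the slabinfo output above. This is a rough sketch that assumes the third numeric column is the object size in bytes and that each epoll event costs one eventpoll_pwq plus one eventpoll_epi object, which the equal active counts suggest. The function names are hypothetical.

```c
#include <assert.h>

/* Object sizes in bytes, read from the slabinfo output above. */
enum {
    EVENTPOLL_PWQ_SIZE = 72,
    EVENTPOLL_EPI_SIZE = 192,
};

/* Pinned memory used by epoll for nr_events registered events,
 * assuming one pwq and one epi object per event. */
static unsigned long epoll_bytes(unsigned long nr_events)
{
    return nr_events * (EVENTPOLL_PWQ_SIZE + EVENTPOLL_EPI_SIZE);
}

/* Pinned memory used by a ring buffer holding nr_events entries
 * of entry_size bytes each (32 or 64 in the estimate above). */
static unsigned long ring_bytes(unsigned long nr_events,
                                unsigned long entry_size)
{
    return nr_events * entry_size;
}
```

For the 146718 active events shown, `epoll_bytes()` gives 38733552 bytes (264 bytes per event), while `ring_bytes()` gives 4694976 bytes with 32-byte entries and 9389952 bytes with 64-byte entries.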
If you want protection from DoS, just use 'ulimit -n 100'.
Eric