LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Davide Libenzi <davidel@xmailserver.org>
To: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: =?X-UNKNOWN?Q?Chris_=22=A5=AF=22_Heath?= <chris@heathens.co.nz>,
	"David Schwartz" <davids@webmaster.com>,
	dada1@cosmosbay.com,
	"Linux-Kernel@Vger. Kernel. Org" <linux-kernel@vger.kernel.org>,
	linux-man@vger.kernel.org
Subject: Re: epoll design problems with common fork/exec patterns
Date: Fri, 29 Feb 2008 11:19:37 -0800 (PST)	[thread overview]
Message-ID: <Pine.LNX.4.64.0802291111320.31028@alien.or.mcafeemobile.com> (raw)
In-Reply-To: <cfd18e0f0802290746p3cb7efc9j72394cd77ff37829@mail.gmail.com>

On Fri, 29 Feb 2008, Michael Kerrisk wrote:

> As I think is clear, I've only given it very limited thought ;-).
> 
> The point is that the existing implementation actually supports
> "different *processes* sharing a single epoll fd and doing
> epoll_wait() over it", but the semantics are unintuitive.  It may be
> that the existing implementation was the best way of doing things.
> But when I see the strange corner cases in the semantics, I can't help
> but wonder (way too late), whether there might have been some other
> way of implementing things that led to more intuitive semantics.

Oh boy. The fact that you can have an epoll fd cross the fork boundary, 
does not mean that any indiscriminate use of it leads to sane results:

	efd = epoll_create();
	fork();
					pipe(fds);
					epoll_ctl(efd, ADD, fds[0]);
	epoll_wait(); ????
	...
	pipe(fds);
	epoll_ctl(efd, ADD, fds[0]);
					epoll_wait(); ????


It is *NOT* a matter of semantics.



> >  If the next question is "But then why we made the epoll fd inheritable?",
> >  the answer is, because it makes sense in many cases for a parent to hand
> >  over an fd set to a child.
> 
> Fair enough.
> 
> So here's an idea about how things might alternatively have been done:
> 
> a) The key for epoll entries could have been [file *, fd, PID]
> 
> b) an epoll_wait() only returns events for fds where the PID maps that
> of the caller.
> 
> c) a close of a file descriptor removes the corresponding  [file *,
> fd, PID] from the epoll set.
> 
> d) when a fork() is done, then the epoll set has a new set of keys
> added.  These are duplicates of the  [file *, fd, PID] entries for the
> parent, but with the PID of the child substituted into the new keys.
> Say the parent had PID 1000, and the child has PID 2000.  If the epoll
> set initially contained:
> 
> [X, 3, 1000]
> [Y, 4, 1000]
> 
> then after fork() we'd have:
> 
> [X, 3, 1000]
> [Y, 4, 1000]
> [X, 3, 2000]
> [Y, 4, 2000]
> 
> There is of course room for debate about the efficiency of this
> approach, I suppose.

There sure is :)



> You said elsewhere:
> 
> [[
> That'd mean placing an eventpoll custom hook into sys_close(). Looks very
> bad to me, and probably will look even worse to other kernel folks.
> Is not much a performance issue (a check to see if a file* is an eventpoll
> file is as easy as comparing the f_op pointer), but a design/style issue.
> ]]
> 
> But that wasn't very clear to me actually.  I note that filp_close()
> already has special case handling for dnotify (R.I.P.) and fcntl()
> )aka POSIX) file locks, so there was already precedent for a custom
> hook, AFAICS, and epoll is at least as worthy of special treatment as
> either of those cases.

I guess that over the time, Al became software WRT junk going there :)



- Davide



  reply	other threads:[~2008-02-29 19:19 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-10-27  6:22 Marc Lehmann
2007-10-27  8:23 ` Eric Dumazet
2007-10-27  8:51   ` Marc Lehmann
2007-10-27  9:22     ` Eric Dumazet
2007-10-27  9:34       ` Marc Lehmann
2007-10-27 10:23         ` Eric Dumazet
2007-10-27 10:46           ` Marc Lehmann
2007-10-27 16:59     ` Davide Libenzi
2007-10-27 17:38       ` Willy Tarreau
2007-10-27 18:01         ` Davide Libenzi
2007-10-29 22:36         ` Mark Lord
2007-10-28  4:47       ` David Schwartz
2007-10-28  9:33         ` Eric Dumazet
2007-10-28 21:04           ` David Schwartz
2007-10-29 18:55             ` Davide Libenzi
2008-02-26 15:13               ` Michael Kerrisk
2008-02-26 18:51                 ` Davide Libenzi
2008-02-27  1:30                   ` Chris "ク" Heath
2008-02-27 19:35                     ` Davide Libenzi
2008-02-28 13:12                       ` Michael Kerrisk
2008-02-28 13:23                         ` Michael Kerrisk
2008-02-28 19:34                           ` Davide Libenzi
2008-02-28 19:23                         ` Davide Libenzi
2008-02-29 15:46                           ` Michael Kerrisk
2008-02-29 19:19                             ` Davide Libenzi [this message]
2008-02-29 19:54                               ` Michael Kerrisk
2008-03-02 15:11                                 ` Sam Varshavchik
2008-03-02 21:44                                   ` Davide Libenzi
2007-10-28 18:48         ` Davide Libenzi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.0802291111320.31028@alien.or.mcafeemobile.com \
    --to=davidel@xmailserver.org \
    --cc=chris@heathens.co.nz \
    --cc=dada1@cosmosbay.com \
    --cc=davids@webmaster.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-man@vger.kernel.org \
    --cc=mtk.manpages@googlemail.com \
    --subject='Re: epoll design problems with common fork/exec patterns' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).