LKML Archive on lore.kernel.org
* unexpected extra pollout events from epoll
@ 2008-10-26 14:42 Paul P
  2008-10-26 22:07 ` Davide Libenzi
  0 siblings, 1 reply; 10+ messages in thread
From: Paul P @ 2008-10-26 14:42 UTC (permalink / raw)
  To: linux-kernel

I am programming a server using the epoll interface and have the receive portion of the server working fine, but for some reason as I implement the send portion, I noticed a few things that seem like strange behaviors in the implementation of epoll in the kernel.

I'm running Opensuse 11 and it has a 2.6.25 kernel.

The behavior that I am seeing is that when I do a full read on an edge-triggered fd, it seems to trigger an EPOLLOUT event after each loop of the read events on a socket (before I've done any writes at all to the socket).

This is very strange behavior, as I would expect the EPOLLOUT event to be triggered only if I did a write and the socket received an ACK that cleared out the send buffer.

The documentation on epollout is really sparse, so any help at all from the list would be very much appreciated.  Do I need to manually arm the epollout flag after a write?  I thought this was only necessary for level triggered epoll.

I was hoping someone more knowledgeable on the subject here might be able to help explain the epollout behavior and whether or not the extra events are normal and if so, what is the traditional way to handle these extra events in an edge triggered scenario.

Thanks!

Paul


      


* Re: unexpected extra pollout events from epoll
  2008-10-26 14:42 unexpected extra pollout events from epoll Paul P
@ 2008-10-26 22:07 ` Davide Libenzi
  2008-10-26 22:48   ` Paul P
  0 siblings, 1 reply; 10+ messages in thread
From: Davide Libenzi @ 2008-10-26 22:07 UTC (permalink / raw)
  To: Paul P; +Cc: Linux Kernel Mailing List

On Sun, 26 Oct 2008, Paul P wrote:

> I am programming a server using the epoll interface and have the receive portion of the server working fine, but for some reason as I implement the send portion, I noticed a few things that seem like strange behaviors in the implementation of epoll in the kernel.
> 
> I'm running Opensuse 11 and it has a 2.6.25 kernel.
> 
> The behavior that I am seeing is when I do a full read on an edge 
> triggered fd, for some reason, it seems to be triggering an epollout 
> event after each loop of the read events on a socket. (before I've done 
> any writes at all to the socket)
> 
> This is very strange behavior as I would expect that the epollout event 
> would only be triggered if I did a write and the socket received an ack 
> which cleared out the send buffer.
> 
> The documentation on epollout is really sparse, so any help at all from 
> the list would be very much appreciated.  Do I need to manually arm the 
> epollout flag after a write?  I thought this was only necessary for 
> level triggered epoll.

The way epoll works is by hooking into the existing kernel poll 
subsystem. It hooks into the poll wakeups, via a callback, and in that way 
it knows that "something" has changed. Then it reads the status of the file 
via f_op->poll() to find out what that status is.
What happens is that, if you listen for EPOLLIN|EPOLLOUT, when a packet 
arrives the callback hook is hit, and the file is put into a maybe-ready 
list. Maybe-ready because at the time of the callback, epoll has no clue 
of what happened.
After that, via epoll_wait(), f_op->poll() is called to get the status of 
the file, and since POLLIN|POLLOUT is returned (and since you're listening 
for EPOLLIN|EPOLLOUT), that gets reported back to you.
The POLLOUT event, in the sense of a buffer-full->buffer-avail transition, 
did not really happen, but since POLLOUT is true, it gets reported back too.
This, again, is because epoll has no clue of what happened at the time the 
callback was hit.
I'm working on changes that will make epoll aware (by using the existing 
support for the "key" parameter of wakeups) of events at callback time, 
but this is something that is still up for discussion and definitely won't 
be in .28.
The best way to do it ATM, is to wait for POLLOUT only when really needed.




- Davide
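
A minimal sketch of the advice above ("wait for POLLOUT only when really
needed"), assuming a Linux system with <sys/epoll.h>. The helper name
events_after_receive() is hypothetical and not from the thread. It registers
one end of an AF_UNIX socketpair with the given interest mask (edge
triggered), lets one byte arrive, and returns the mask that epoll_wait()
reports. With EPOLLIN alone, EPOLLOUT cannot show up, because epoll only
reports bits present in the interest mask (plus error and hang-up
conditions). With EPOLLIN|EPOLLOUT, the returned mask would typically carry
both bits even though nothing was written, which is the behavior discussed
in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns the event mask reported for a socket after
 * one byte arrives on it, or 0 if any step fails. */
uint32_t events_after_receive(uint32_t interest)
{
    int sv[2];
    uint32_t result = 0;

    /* An empty interest mask would never report the incoming byte. */
    assert(interest != 0);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return 0;

    int epfd = epoll_create(1);
    if (epfd >= 0) {
        struct epoll_event ev, ready;

        ev.events = interest | EPOLLET;   /* edge triggered, as in the thread */
        ev.data.fd = sv[0];

        if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) == 0 &&
            write(sv[1], "x", 1) == 1 &&          /* data arrives on sv[0] */
            epoll_wait(epfd, &ready, 1, 1000) == 1)
            result = ready.events;

        close(epfd);
    }
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

Calling events_after_receive(EPOLLIN) and then
events_after_receive(EPOLLIN | EPOLLOUT) and printing both results is one
way to see the difference on a given kernel.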




* Re: unexpected extra pollout events from epoll
  2008-10-26 22:07 ` Davide Libenzi
@ 2008-10-26 22:48   ` Paul P
  2008-10-26 23:12     ` Davide Libenzi
  0 siblings, 1 reply; 10+ messages in thread
From: Paul P @ 2008-10-26 22:48 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Linux Kernel Mailing List

> After that, via epoll_wait(), f_op->poll() is called to get the status of  the file, and since POLLIN|POLLOUT is returned (and since you're listening for EPOLLIN|EPOLLOUT), that gets reported back to you. The POLLOUT event, by meaning a buffer-full->buffer-avail transition, did not really happen, but since POLLOUT is true, that gets reported back too.

OK, to make sure I understand you correctly: you're saying that currently the kernel isn't aware of the difference between EPOLLIN and EPOLLOUT events because, at the time of the event, both EPOLLIN and EPOLLOUT are returned from the kernel, and that at least for the near term that's not going to change.  At some point we can expect EPOLLOUT to give the correct event, but not until after .28.

> The best way to do it ATM, is to wait for POLLOUT only when
> really needed.

I'm a little unclear how to do this.  If I set the epoll_wait call to wait for just epollin events, that's fine.  But when I send a large buffer of data and use epoll_ctl to look for epollin|epollout events, don't I have the same problem?  

Let's say I'm sending a large buffer of data and I arm the fd to epollin|epollout (I'm adding an epollin flag because a message could come in while I'm sending)

If an event gets triggered on an fd, then I have no way of knowing if the event is from the socket being available to send data or if there is data waiting to be received since the epollin|epollout flag could be either one.  So what am I to do when I get an event?

Are you saying that I can't do sending and receiving simultaneously with epoll?  If that's the case, then is everyone simply setting the epollout flag when sending and ignoring the possibility of data coming in while data is being sent?

I didn't want to have to manually set fd's with epoll_ctl, but now I guess the EPOLLONESHOT flag makes more sense.  

Paul


      


* Re: unexpected extra pollout events from epoll
  2008-10-26 22:48   ` Paul P
@ 2008-10-26 23:12     ` Davide Libenzi
  0 siblings, 0 replies; 10+ messages in thread
From: Davide Libenzi @ 2008-10-26 23:12 UTC (permalink / raw)
  To: Paul P; +Cc: Linux Kernel Mailing List

[Can you try to trim lines at 80 chars or so?]


On Sun, 26 Oct 2008, Paul P wrote:

> > After that, via epoll_wait(), f_op->poll() is called to get the status 
> > of  the file, and since POLLIN|POLLOUT is returned (and since you're 
> > listening for EPOLLIN|EPOLLOUT), that gets reported back to you. The 
> > POLLOUT event, by meaning a buffer-full->buffer-avail transition, did 
> > not really happen, but since POLLOUT is true, that gets reported back 
> > too.
> 
> Ok, so make sure I understand you correctly, you're saying that 
> currently the kernel doesn't have awareness of the difference between 
> EPOLLIN and EPOLLOUT events because at the time of the event, both 
> EPOLLIN/EPOLLOUT are returned from the kernel and that at least for the 
> near term that's not going to change.  At some point, we can expect the 
> EPOLLOUT to give the correct event, but not till later than .28.

The kernel knows the difference between EPOLLIN and EPOLLOUT, of course. 
At the moment, though, that condition is not reported during wakeups, and 
this is what is going to change.



> > The best way to do it ATM, is to wait for POLLOUT only when
> > really needed.
> 
> I'm a little unclear how to do this.  If I set the epoll_wait call to 
> wait for just epollin events, that's fine.  But when I send a large 
> buffer of data and use epoll_ctl to look for epollin|epollout events, 
> don't I have the same problem?  

You do that by writing data until it's finished, or you get EAGAIN. If you 
get EAGAIN, you listen for EPOLLOUT.
Reading is the same, but you'd wait for EPOLLIN.



- Davide
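
The write-until-EAGAIN pattern described above can be sketched roughly as
follows. This is only an illustration under stated assumptions: the function
name send_or_arm() and its return convention are hypothetical, the fd is
assumed to be already registered with the epoll instance, and send() with
MSG_DONTWAIT is used in place of write() on a non-blocking socket so the
sketch does not depend on the fd's O_NONBLOCK setting.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper. Sends buf[*off .. len) on fd.
 * Returns  1 if everything was sent,
 *          0 if the socket buffer filled up and EPOLLOUT is now armed,
 *         -1 on any other error. *off records how far the send got. */
int send_or_arm(int epfd, int fd, const char *buf, size_t len, size_t *off)
{
    assert(buf != NULL && off != NULL && *off <= len);

    while (*off < len) {
        ssize_t n = send(fd, buf + *off, len - *off,
                         MSG_DONTWAIT | MSG_NOSIGNAL);
        if (n > 0) {
            *off += (size_t)n;            /* partial sends are normal */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;                     /* interrupted, just retry */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            /* Only now ask for EPOLLOUT; keep EPOLLIN so that incoming
             * data is still reported while the send is pending. */
            struct epoll_event ev;
            ev.events = EPOLLIN | EPOLLOUT | EPOLLET;
            ev.data.fd = fd;
            return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev) == 0 ? 0 : -1;
        }
        return -1;
    }
    return 1;
}
```

When a later EPOLLOUT event arrives, the caller would invoke send_or_arm()
again with the same offset, and once it returns 1, switch the interest mask
back to EPOLLIN alone.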




* Re: unexpected extra pollout events from epoll
       [not found] ` <fa.KeDEgbYh8k5LzzH6uv7u00N5twU@ifi.uio.no>
@ 2008-10-27  5:59   ` Robert Hancock
  0 siblings, 0 replies; 10+ messages in thread
From: Robert Hancock @ 2008-10-27  5:59 UTC (permalink / raw)
  To: ppak_98; +Cc: Davide Libenzi, Linux Kernel Mailing List

Paul P wrote:
>> Which version of epoll do you have? The epoll_wait()
>> function does not 
>> accept an event mask (like you write above,
>> EPOLLIN|EPOLLOUT). 
> 
> lol, I was a bit tired when I wrote that.  Ok, ignore the stuff related 
> to epoll_wait in my previous post.
>  
>> As optimization, if the EPOLLOUT bit is already set, you
>> don't need to 
>> keep calling epoll_ctl(fd,MOD,EPOLLOUT).
> 
> This is good to know.
> 
> So, I've got a few questions about what happens to data that accumulates 
> while I am sending and the fd is set to EPOLLOUT?  If I am sending out a 
> large buffer and incoming data wants to stream in on a full duplex 
> connection, what happens to that data when I am processing the socket 
> while it is in epollout mode?  
> 
> Is the following accurate?  When data comes in while I am sending, I guess 
> the data fills up the receive buffers until they are full and then it 
> stops accepting data until it is cleared out?  When I switch back to 
> EPOLLIN, I'm guessing that I will get a notification on that fd that there 
> is data waiting.
> 
> The other question I have is there a way to do full-duplex networking so 
> that I can receive network messages while I am sending or vice versa?  It 
> seems that the method of switching the socket between EPOLLIN and EPOLLOUT 
> means that I can't do both operations simultaneously.  Thanks

I don't quite follow. You shouldn't be switching back and forth if 
you're trying to both send and receive, you can be registered for both 
notifications at the same time and respond to whatever notifications 
that you get. If you're not trying to write anything at the moment then 
you shouldn't be registered for EPOLLOUT though, same for reading and 
EPOLLIN.
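
As a rough sketch of "respond to whatever notifications you get": each
struct epoll_event returned by epoll_wait() carries its own events mask, so
readable and writable conditions on the same fd can be told apart and
handled in the same pass. The names actions_for() and the ACT_* constants
are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>

/* Hypothetical action flags derived from one returned event. */
enum { ACT_READ = 1, ACT_WRITE = 2, ACT_CLOSE = 4 };

/* Maps the events mask of one returned epoll_event to the work to do. */
int actions_for(const struct epoll_event *ev)
{
    int acts = 0;

    assert(ev != NULL);

    /* Error or hang-up: the connection should be torn down, although
     * remaining input may still be worth draining first. */
    if (ev->events & (EPOLLERR | EPOLLHUP))
        acts |= ACT_CLOSE;
    if (ev->events & EPOLLIN)
        acts |= ACT_READ;     /* read until EAGAIN */
    if (ev->events & EPOLLOUT)
        acts |= ACT_WRITE;    /* resume a pending send until EAGAIN */
    return acts;
}
```

An event loop would call actions_for() on each entry of the array filled in
by epoll_wait() and run the read path, the write path, or both for that fd.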


* Re: unexpected extra pollout events from epoll
  2008-10-27  1:18 ` Davide Libenzi
  2008-10-27  1:23   ` Davide Libenzi
@ 2008-10-27  3:48   ` Paul P
  1 sibling, 0 replies; 10+ messages in thread
From: Paul P @ 2008-10-27  3:48 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Linux Kernel Mailing List

> Which version of epoll do you have? The epoll_wait()
> function does not 
> accept an event mask (like you write above,
> EPOLLIN|EPOLLOUT). 

lol, I was a bit tired when I wrote that.  Ok, ignore the stuff related 
to epoll_wait in my previous post.
 
> As optimization, if the EPOLLOUT bit is already set, you
> don't need to 
> keep calling epoll_ctl(fd,MOD,EPOLLOUT).

This is good to know.

So, I've got a few questions about what happens to data that accumulates 
while I am sending and the fd is set to EPOLLOUT?  If I am sending out a 
large buffer and incoming data wants to stream in on a full duplex 
connection, what happens to that data when I am processing the socket 
while it is in epollout mode?  

Is the following accurate?  When data comes in while I am sending, I guess 
the data fills up the receive buffers until they are full and then it 
stops accepting data until it is cleared out?  When I switch back to 
EPOLLIN, I'm guessing that I will get a notification on that fd that there 
is data waiting.

The other question I have is there a way to do full-duplex networking so 
that I can receive network messages while I am sending or vice versa?  It 
seems that the method of switching the socket between EPOLLIN and EPOLLOUT 
means that I can't do both operations simultaneously.  Thanks

Paul


      


* Re: unexpected extra pollout events from epoll
  2008-10-27  1:18 ` Davide Libenzi
@ 2008-10-27  1:23   ` Davide Libenzi
  2008-10-27  3:48   ` Paul P
  1 sibling, 0 replies; 10+ messages in thread
From: Davide Libenzi @ 2008-10-27  1:23 UTC (permalink / raw)
  To: Paul P; +Cc: Linux Kernel Mailing List

On Sun, 26 Oct 2008, Davide Libenzi wrote:

> On Sun, 26 Oct 2008, Paul P wrote:
> 
> > However, I get strange behavior when I tried adding fd's with only the 
> > EPOLLIN interest mask. If I use epoll_wait with both the EPOLLIN and 
> > EPOLLOUT interest mask, but add fd's with only the EPOLLIN interest mask,
> > I still seem to get EPOLLOUT events on the fd.
> 
> Again, how the heck do you "use epoll_wait with both the EPOLLIN and 
> EPOLLOUT"?!? There is not such a thing.

Wait, are you passing EPOLLIN or EPOLLOUT as the "maxevents" parameter of 
epoll_wait()?
That's the maximum event count you want to fetch, not an event mask.



- Davide
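
To make the distinction concrete, here is a small sketch showing where each
value goes: the event mask is passed to epoll_ctl() inside struct
epoll_event, while the fourth-from-last argument of epoll_wait() is only the
capacity of the output array. The function names wait_for_events() and
demo_one_readable_pipe() are hypothetical. Note that on Linux
EPOLLIN | EPOLLOUT evaluates to 5, so passing it as maxevents compiles and
silently means "return at most five events", which may be why the mistake
goes unnoticed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical wrapper: waits for up to maxevents events, retrying if a
 * signal interrupts the call (the timeout restarts on each retry). */
int wait_for_events(int epfd, struct epoll_event *evs, int maxevents,
                    int timeout_ms)
{
    int n;

    /* maxevents is a count and must be positive; it is not a mask. */
    assert(evs != NULL && maxevents > 0);

    do {
        n = epoll_wait(epfd, evs, maxevents, timeout_ms);
    } while (n < 0 && errno == EINTR);
    return n;
}

/* Hypothetical demo: watches the read end of a pipe and returns the number
 * of ready events after one byte is written, or -1 on failure. */
int demo_one_readable_pipe(void)
{
    int p[2];
    int n = -1;

    if (pipe(p) < 0)
        return -1;

    int epfd = epoll_create(1);
    if (epfd >= 0) {
        struct epoll_event ev, ready[8];

        ev.events = EPOLLIN;      /* the event mask belongs here ... */
        ev.data.fd = p[0];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
            write(p[1], "x", 1) == 1)
            n = wait_for_events(epfd, ready, 8, 1000); /* ... a count here */
        close(epfd);
    }
    close(p[0]);
    close(p[1]);
    return n;
}
```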




* Re: unexpected extra pollout events from epoll
  2008-10-27  0:58 Paul P
@ 2008-10-27  1:18 ` Davide Libenzi
  2008-10-27  1:23   ` Davide Libenzi
  2008-10-27  3:48   ` Paul P
  0 siblings, 2 replies; 10+ messages in thread
From: Davide Libenzi @ 2008-10-27  1:18 UTC (permalink / raw)
  To: Paul P; +Cc: Linux Kernel Mailing List

On Sun, 26 Oct 2008, Paul P wrote:

> > You do that by writing data until it's finished, or you
> > get EAGAIN. If you
> > get EAGAIN, you listen for EPOLLOUT.
> > Reading is same, but you'd wait for EPOLLIN.
> 
> I've got a few questions about this approach.  The most logical 
> way to do this seems to be:
> 
> 1) Leave the epoll_wait with the EPOLLIN|EPOLLOUT event flags and
>  use epoll_ctl to switch the interest mask for each fd between EPOLLIN 
> and EPOLLOUT on a per fd basis.

Which version of epoll do you have? The epoll_wait() function does not 
accept an event mask (like you write above, EPOLLIN|EPOLLOUT). It never 
did.
But yes, you'd switch interest with epoll_ctl().



> 2) When I'm ready to write, I do a write and if it does not fully 
> write and I get the EAGAIN flag, I switch the fd with epoll_ctl(fd,MOD,EPOLLOUT). 

As an optimization, if the EPOLLOUT bit is already set, you don't need to 
keep calling epoll_ctl(fd,MOD,EPOLLOUT).



> However, I get strange behavior when I tried adding fd's with only the 
> EPOLLIN interest mask. If I use epoll_wait with both the EPOLLIN and 
> EPOLLOUT interest mask, but add fd's with only the EPOLLIN interest mask,
> I still seem to get EPOLLOUT events on the fd.

Again, how the heck do you "use epoll_wait with both the EPOLLIN and 
EPOLLOUT"?!? There is no such thing.




> So, I'm a little confused.

From the wording above, that doesn't seem like a wrong guess.



- Davide
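
One possible way to apply the optimization mentioned above (not calling
epoll_ctl() again when EPOLLOUT is already set) is to remember the last
interest mask per connection and skip the system call when nothing changes.
This is a sketch under assumptions: struct conn and conn_set_interest() are
hypothetical names, and the fd is assumed to have been added to the epoll
instance already with the mask stored in armed. It would not be appropriate
with EPOLLONESHOT, where the fd has to be re-armed after every event.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/epoll.h>

/* Hypothetical per-connection state. */
struct conn {
    int fd;
    uint32_t armed;   /* interest mask last handed to epoll_ctl() */
};

/* Switches the interest mask of c->fd to want.
 * Returns 1 if EPOLL_CTL_MOD was issued, 0 if the mask was already armed
 * (no system call made), and -1 if epoll_ctl() failed. */
int conn_set_interest(int epfd, struct conn *c, uint32_t want)
{
    struct epoll_event ev;

    assert(c != NULL);

    if (c->armed == want)
        return 0;                 /* already set, nothing to do */

    ev.events = want;
    ev.data.ptr = c;              /* lets the event loop find the conn */
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, c->fd, &ev) < 0)
        return -1;

    c->armed = want;
    return 1;
}
```

A write path that hits EAGAIN would call
conn_set_interest(epfd, c, EPOLLIN | EPOLLOUT), and once the pending data
has been flushed, conn_set_interest(epfd, c, EPOLLIN).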




* Re: unexpected extra pollout events from epoll
@ 2008-10-27  0:58 Paul P
  2008-10-27  1:18 ` Davide Libenzi
  0 siblings, 1 reply; 10+ messages in thread
From: Paul P @ 2008-10-27  0:58 UTC (permalink / raw)
  To: linux-kernel

> You do that by writing data until it's finished, or you
> get EAGAIN. If you
> get EAGAIN, you listen for EPOLLOUT.
> Reading is same, but you'd wait for EPOLLIN.

I've got a few questions about this approach.  The most logical 
way to do this seems to be:

1) Leave the epoll_wait with the EPOLLIN|EPOLLOUT event flags and
 use epoll_ctl to switch the interest mask for each fd between EPOLLIN 
and EPOLLOUT on a per fd basis.

2) When I'm ready to write, I do a write and if it does not fully 
write and I get the EAGAIN flag, I switch the fd with epoll_ctl(fd,MOD,EPOLLOUT). 

However, I get strange behavior when I tried adding fd's with only the 
EPOLLIN interest mask. If I use epoll_wait with both the EPOLLIN and 
EPOLLOUT interest mask, but add fd's with only the EPOLLIN interest mask,
I still seem to get EPOLLOUT events on the fd.

Am I supposed to change the main loop with epoll_wait so that when one 
socket is reading that I switch the main loop to get EPOLLOUT events?  
That means that I'm not receiving on any fd while I'm sending, so this 
probably isn't right.

So, I'm a little confused.

Thanks in advance.

Paul



      


* Re: unexpected extra pollout events from epoll
       [not found] <fa.YTdIGxaBsWvyaUnxBGIS1f8F2BM@ifi.uio.no>
@ 2008-10-26 16:43 ` Robert Hancock
  0 siblings, 0 replies; 10+ messages in thread
From: Robert Hancock @ 2008-10-26 16:43 UTC (permalink / raw)
  To: ppak_98; +Cc: linux-kernel

Paul P wrote:
> I am programming a server using the epoll interface and have the receive portion of the server working fine, but for some reason as I implement the send portion, I noticed a few things that seem like strange behaviors in the implementation of epoll in the kernel.
> 
> I'm running Opensuse 11 and it has a 2.6.25 kernel.
> 
> The behavior that I am seeing is when I do a full read on an edge triggered fd, for some reason, it seems to be triggering an epollout event after each loop of the read events on a socket. (before I've done any writes at all to the socket)
> 
> This is very strange behavior as I would expect that the epollout event would only be triggered if I did a write and the socket received an ack which cleared out the send buffer.
> 
> The documentation on epollout is really sparse, so any help at all from the list would be very much appreciated.  Do I need to manually arm the epollout flag after a write?  I thought this was only necessary for level triggered epoll.
> 
> I was hoping someone more knowledgeable on the subject here might be able to help explain the epollout behavior and whether or not the extra events are normal and if so, what is the traditional way to handle these extra events in an edge triggered scenario.

I'm not too familiar with the edge triggered mode, but you shouldn't be 
requesting EPOLLOUT notifications if you don't care about them (i.e. if 
you are not trying to write anything).

