LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [patch 13/13] signal/timer/event fds v9 - KAIO eventfd support example ...
@ 2007-03-31 20:09 Davide Libenzi
  2007-04-01 15:06 ` Avi Kivity
  0 siblings, 1 reply; 9+ messages in thread
From: Davide Libenzi @ 2007-03-31 20:09 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Andrew Morton, Linus Torvalds, Ingo Molnar, Suparna Bhattacharya,
	Zach Brown, Benjamin LaHaise

This is an example about how to add eventfd support to the current KAIO code,
in order to enable KAIO to post readiness events to a pollable fd
(hence compatible with POSIX select/poll). The KAIO code simply signals
the eventfd fd when events are ready, and this triggers a POLLIN in the fd.
This patch uses a reserved for future use member of the struct iocb to pass
an eventfd file descriptor, that KAIO will use to post events every time
a request completes. At that point, an aio_getevents() will return the
completed result to a struct io_event.
I made a quick test program to verify the patch, and it runs fine here:

http://www.xmailserver.org/eventfd-aio-test.c

The test program uses poll(2), but it'd, of course, work with select and
epoll too.
This can allow to schedule both block I/O and other poll-able devices requests,
and wait for results using select/poll/epoll.
In a typical scenario, an application would submit KAIO request using aio_submit(),
and will also use epoll_ctl() on the whole other class of devices (that
with the addition of signals, timers and user events, now it's pretty much
complete), and then would:

	epoll_wait(...);
	for_each_event {
		if (curr_event_is_kaiofd) {
			aio_getevents();
			dispatch_aio_events();
		} else {
			dispatch_epoll_event();
		}
	}



Signed-off-by: Davide Libenzi <davidel@xmailserver.org>



- Davide



Index: linux-2.6.21-rc5.fds/fs/aio.c
===================================================================
--- linux-2.6.21-rc5.fds.orig/fs/aio.c	2007-03-31 12:45:30.000000000 -0700
+++ linux-2.6.21-rc5.fds/fs/aio.c	2007-03-31 12:46:31.000000000 -0700
@@ -30,6 +30,7 @@
 #include <linux/highmem.h>
 #include <linux/workqueue.h>
 #include <linux/security.h>
+#include <linux/eventfd.h>
 
 #include <asm/kmap_types.h>
 #include <asm/uaccess.h>
@@ -421,6 +422,7 @@
 	req->private = NULL;
 	req->ki_iovec = NULL;
 	INIT_LIST_HEAD(&req->ki_run_list);
+	req->ki_eventfd = ERR_PTR(-EINVAL);
 
 	/* Check if the completion queue has enough free space to
 	 * accept an event from this io.
@@ -462,6 +464,8 @@
 {
 	assert_spin_locked(&ctx->ctx_lock);
 
+	if (!IS_ERR(req->ki_eventfd))
+		fput(req->ki_eventfd);
 	if (req->ki_dtor)
 		req->ki_dtor(req);
 	if (req->ki_iovec != &req->ki_inline_vec)
@@ -946,6 +950,14 @@
 		return 1;
 	}
 
+	/*
+	 * Check if the user asked us to deliver the result through an
+	 * eventfd. The eventfd_signal() function is safe to be called
+	 * from IRQ context.
+	 */
+	if (!IS_ERR(iocb->ki_eventfd))
+		eventfd_signal(iocb->ki_eventfd, 1);
+
 	info = &ctx->ring_info;
 
 	/* add a completion event to the ring buffer.
@@ -1555,6 +1567,19 @@
 		fput(file);
 		return -EAGAIN;
 	}
+	if (iocb->aio_resfd != 0) {
+		/*
+		 * If the aio_resfd field of the iocb is not zero, get an
+		 * instance of the file* now. The file descriptor must be
+		 * an eventfd() fd, and will be signaled for each completed
+		 * event using the eventfd_signal() function.
+		 */
+		req->ki_eventfd = eventfd_fget((int) iocb->aio_resfd);
+		if (IS_ERR(req->ki_eventfd)) {
+			ret = PTR_ERR(req->ki_eventfd);
+			goto out_put_req;
+		}
+	}
 
 	req->ki_filp = file;
 	ret = put_user(req->ki_key, &user_iocb->aio_key);
Index: linux-2.6.21-rc5.fds/include/linux/aio.h
===================================================================
--- linux-2.6.21-rc5.fds.orig/include/linux/aio.h	2007-03-31 12:45:30.000000000 -0700
+++ linux-2.6.21-rc5.fds/include/linux/aio.h	2007-03-31 12:46:31.000000000 -0700
@@ -119,6 +119,12 @@
 
 	struct list_head	ki_list;	/* the aio core uses this
 						 * for cancellation */
+
+	/*
+	 * If the aio_resfd field of the userspace iocb is not zero,
+	 * this is the underlying file* to deliver event to.
+	 */
+	struct file		*ki_eventfd;
 };
 
 #define is_sync_kiocb(iocb)	((iocb)->ki_key == KIOCB_SYNC_KEY)
Index: linux-2.6.21-rc5.fds/include/linux/aio_abi.h
===================================================================
--- linux-2.6.21-rc5.fds.orig/include/linux/aio_abi.h	2007-03-31 12:45:30.000000000 -0700
+++ linux-2.6.21-rc5.fds/include/linux/aio_abi.h	2007-03-31 12:46:31.000000000 -0700
@@ -84,7 +84,11 @@
 
 	/* extra parameters */
 	__u64	aio_reserved2;	/* TODO: use this for a (struct sigevent *) */
-	__u64	aio_reserved3;
+	__u32	aio_reserved3;
+	/*
+	 * If different from 0, this is an eventfd to deliver AIO results to
+	 */
+	__u32	aio_resfd;
 }; /* 64 bytes */
 
 #undef IFBIG


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch 13/13] signal/timer/event fds v9 - KAIO eventfd support example ...
  2007-03-31 20:09 [patch 13/13] signal/timer/event fds v9 - KAIO eventfd support example Davide Libenzi
@ 2007-04-01 15:06 ` Avi Kivity
  2007-04-01 17:07   ` Davide Libenzi
  0 siblings, 1 reply; 9+ messages in thread
From: Avi Kivity @ 2007-04-01 15:06 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Linux Kernel Mailing List, Andrew Morton, Linus Torvalds,
	Ingo Molnar, Suparna Bhattacharya, Zach Brown, Benjamin LaHaise

Davide Libenzi wrote:
> This is an example about how to add eventfd support to the current KAIO code,
> in order to enable KAIO to post readiness events to a pollable fd
> (hence compatible with POSIX select/poll). The KAIO code simply signals
> the eventfd fd when events are ready, and this triggers a POLLIN in the fd.
> This patch uses a reserved for future use member of the struct iocb to pass
> an eventfd file descriptor, that KAIO will use to post events every time
> a request completes. At that point, an aio_getevents() will return the
> completed result to a struct io_event.
> I made a quick test program to verify the patch, and it runs fine here:
>
> Index: linux-2.6.21-rc5.fds/include/linux/aio_abi.h
> ===================================================================
> --- linux-2.6.21-rc5.fds.orig/include/linux/aio_abi.h	2007-03-31 12:45:30.000000000 -0700
> +++ linux-2.6.21-rc5.fds/include/linux/aio_abi.h	2007-03-31 12:46:31.000000000 -0700
> @@ -84,7 +84,11 @@
>  
>  	/* extra parameters */
>  	__u64	aio_reserved2;	/* TODO: use this for a (struct sigevent *) */
> -	__u64	aio_reserved3;
> +	__u32	aio_reserved3;
> +	/*
> +	 * If different from 0, this is an eventfd to deliver AIO results to
> +	 */
> +	__u32	aio_resfd;
>  }; /* 64 bytes */
>  

What is the motivation for adding aio_resfd to an individual iocb 
instead of the entire io context?  It seems redundant, as you can 
already create multiple io contexts to wait on.

[also, minor nit: sys_eventfd() can legitimately return 0, but you're 
banning its use in aio.  userspace could easily dup() and close(), but...]

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch 13/13] signal/timer/event fds v9 - KAIO eventfd support example ...
  2007-04-01 15:06 ` Avi Kivity
@ 2007-04-01 17:07   ` Davide Libenzi
  2007-04-01 17:23     ` Linus Torvalds
  2007-04-01 18:42     ` Avi Kivity
  0 siblings, 2 replies; 9+ messages in thread
From: Davide Libenzi @ 2007-04-01 17:07 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Linux Kernel Mailing List, Andrew Morton, Linus Torvalds,
	Ingo Molnar, Suparna Bhattacharya, Zach Brown, Benjamin LaHaise

On Sun, 1 Apr 2007, Avi Kivity wrote:

> What is the motivation for adding aio_resfd to an individual iocb instead of
> the entire io context?  It seems redundant, as you can already create multiple
> io contexts to wait on.

To add it to the context, you need to either change the context create API 
(I think no-go here), or add a new syscall just to handle that.
Doing it in the iocb gives finer grained setup, but can be more work for 
the user that wants to use it for all the iocbs.


> [also, minor nit: sys_eventfd() can legitimately return 0, but you're banning
> its use in aio.  userspace could easily dup() and close(), but...]

Noone said that binary compatibility comes free ;)
I'm truly open to other ways of integration, but I thought that binary 
compatibility is the key here.



- Davide



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch 13/13] signal/timer/event fds v9 - KAIO eventfd support example ...
  2007-04-01 17:07   ` Davide Libenzi
@ 2007-04-01 17:23     ` Linus Torvalds
  2007-04-01 17:31       ` Davide Libenzi
  2007-04-01 18:42     ` Avi Kivity
  1 sibling, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2007-04-01 17:23 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Avi Kivity, Linux Kernel Mailing List, Andrew Morton,
	Ingo Molnar, Suparna Bhattacharya, Zach Brown, Benjamin LaHaise



On Sun, 1 Apr 2007, Davide Libenzi wrote:
> 
> > [also, minor nit: sys_eventfd() can legitimately return 0, but you're banning
> > its use in aio.  userspace could easily dup() and close(), but...]
> 
> Noone said that binary compatibility comes free ;)
> I'm truly open to other ways of integration, but I thought that binary 
> compatibility is the key here.

I would suggest taking "aio_reserved3" and turning it into

	__u32 aio_flags;
	__u32 aio_resfd;

if so. And then some bit in the "flags" field has to be set (say, 
"IOCB_FLAG_RESFB") to enable this codepath, instead of expecting that 
aio_resfd == 0 is special.

It's generally always a good thing to have a "flags" field anyway, exactly 
because it allows for future expansion. 

		Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch 13/13] signal/timer/event fds v9 - KAIO eventfd support example ...
  2007-04-01 17:23     ` Linus Torvalds
@ 2007-04-01 17:31       ` Davide Libenzi
  0 siblings, 0 replies; 9+ messages in thread
From: Davide Libenzi @ 2007-04-01 17:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Avi Kivity, Linux Kernel Mailing List, Andrew Morton,
	Ingo Molnar, Suparna Bhattacharya, Zach Brown, Benjamin LaHaise

On Sun, 1 Apr 2007, Linus Torvalds wrote:

> On Sun, 1 Apr 2007, Davide Libenzi wrote:
> > 
> > > [also, minor nit: sys_eventfd() can legitimately return 0, but you're banning
> > > its use in aio.  userspace could easily dup() and close(), but...]
> > 
> > Noone said that binary compatibility comes free ;)
> > I'm truly open to other ways of integration, but I thought that binary 
> > compatibility is the key here.
> 
> I would suggest taking "aio_reserved3" and turning it into
> 
> 	__u32 aio_flags;
> 	__u32 aio_resfd;
> 
> if so. And then some bit in the "flags" field has to be set (say, 
> "IOCB_FLAG_RESFB") to enable this codepath, instead of expecting that 
> aio_resfd == 0 is special.
> 
> It's generally always a good thing to have a "flags" field anyway, exactly 
> because it allows for future expansion. 

Makes sense. I was actually thinking of using the higher bit to signal 
that, but a flags field makes more sense.



- Davide



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch 13/13] signal/timer/event fds v9 - KAIO eventfd support example ...
  2007-04-01 17:07   ` Davide Libenzi
  2007-04-01 17:23     ` Linus Torvalds
@ 2007-04-01 18:42     ` Avi Kivity
  2007-04-01 18:49       ` Davide Libenzi
  1 sibling, 1 reply; 9+ messages in thread
From: Avi Kivity @ 2007-04-01 18:42 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Linux Kernel Mailing List, Andrew Morton, Linus Torvalds,
	Ingo Molnar, Suparna Bhattacharya, Zach Brown, Benjamin LaHaise

Davide Libenzi wrote:
> On Sun, 1 Apr 2007, Avi Kivity wrote:
>
>   
>> What is the motivation for adding aio_resfd to an individual iocb instead of
>> the entire io context?  It seems redundant, as you can already create multiple
>> io contexts to wait on.
>>     
>
> To add it to the context, you need to either change the context create API 
> (I think no-go here), or add a new syscall just to handle that.
> Doing it in the iocb gives finer grained setup, but can be more work for 
> the user that wants to use it for all the iocbs.
>   

I think it's a bit too fine grained, and a new system call 
(io_bindfd()?) would be easier to use.  In addition, you would move the 
eventfd_fget() out of the submission path.

For the users that want fine grained control, they can already call 
io_setup() multiple times and submit iocbs to different completion rings.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch 13/13] signal/timer/event fds v9 - KAIO eventfd support example ...
  2007-04-01 18:42     ` Avi Kivity
@ 2007-04-01 18:49       ` Davide Libenzi
  2007-04-01 20:11         ` Avi Kivity
  0 siblings, 1 reply; 9+ messages in thread
From: Davide Libenzi @ 2007-04-01 18:49 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Linux Kernel Mailing List, Andrew Morton, Linus Torvalds,
	Ingo Molnar, Suparna Bhattacharya, Zach Brown, Benjamin LaHaise

On Sun, 1 Apr 2007, Avi Kivity wrote:

> Davide Libenzi wrote:
> > On Sun, 1 Apr 2007, Avi Kivity wrote:
> > 
> >   
> > > What is the motivation for adding aio_resfd to an individual iocb instead
> > > of
> > > the entire io context?  It seems redundant, as you can already create
> > > multiple
> > > io contexts to wait on.
> > >     
> > 
> > To add it to the context, you need to either change the context create API
> > (I think no-go here), or add a new syscall just to handle that.
> > Doing it in the iocb gives finer grained setup, but can be more work for the
> > user that wants to use it for all the iocbs.
> >   
> 
> I think it's a bit too fine grained, and a new system call (io_bindfd()?)
> would be easier to use.  In addition, you would move the eventfd_fget() out of
> the submission path.

IMO the cost of the eventfd_fget() (have you seen it?) is not worth adding 
a new syscall.
Actually, the flags field that Linus suggested may be given an extra meaning of 
"bind to ctx", that'd solve the problem w/out new syscalls.



- Davide



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch 13/13] signal/timer/event fds v9 - KAIO eventfd support example ...
  2007-04-01 18:49       ` Davide Libenzi
@ 2007-04-01 20:11         ` Avi Kivity
  2007-04-01 20:20           ` Davide Libenzi
  0 siblings, 1 reply; 9+ messages in thread
From: Avi Kivity @ 2007-04-01 20:11 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Linux Kernel Mailing List, Andrew Morton, Linus Torvalds,
	Ingo Molnar, Suparna Bhattacharya, Zach Brown, Benjamin LaHaise

Davide Libenzi wrote:

  

>> I think it's a bit too fine grained, and a new system call (io_bindfd()?)
>> would be easier to use.  In addition, you would move the eventfd_fget() out of
>> the submission path.
>>     
>
> IMO the cost of the eventfd_fget() (have you seen it?) is not worth adding 
> a new syscall.
>   

There's an atomic op there (and another on the way out).  Probably on a 
busy cacheline.  Still it's probably lost in the noise.

Regardless of that, I think that specifying the fd per submission is 
wrong.  It feels like a setup thing that needs to be done once.  We 
shouldn't skimp on syscalls, especially on something as important as 
unifying the async event model.


> Actually, the flags field that Linus suggested may be given an extra meaning of 
> "bind to ctx", that'd solve the problem w/out new syscalls.
>
>   

I don't see how.  It's still per submission.  You could do it on the 
first iocb, but that's just adding warts to the API.

You could add an IO_CMD_BIND_EVENTFD, but that feels wrong too, as it 
isn't really an I/O command.


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch 13/13] signal/timer/event fds v9 - KAIO eventfd support example ...
  2007-04-01 20:11         ` Avi Kivity
@ 2007-04-01 20:20           ` Davide Libenzi
  0 siblings, 0 replies; 9+ messages in thread
From: Davide Libenzi @ 2007-04-01 20:20 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Linux Kernel Mailing List, Andrew Morton, Linus Torvalds,
	Ingo Molnar, Suparna Bhattacharya, Zach Brown, Benjamin LaHaise

On Sun, 1 Apr 2007, Avi Kivity wrote:

> Davide Libenzi wrote:
> 
>  
> > > I think it's a bit too fine grained, and a new system call (io_bindfd()?)
> > > would be easier to use.  In addition, you would move the eventfd_fget()
> > > out of
> > > the submission path.
> > >     
> > 
> > IMO the cost of the eventfd_fget() (have you seen it?) is not worth adding a
> > new syscall.
> >   
> 
> There's an atomic op there (and another on the way out).  Probably on a busy
> cacheline.  Still it's probably lost in the noise.
> 
> Regardless of that, I think that specifying the fd per submission is wrong.
> It feels like a setup thing that needs to be done once.  We shouldn't skimp on
> syscalls, especially on something as important as unifying the async event
> model.

For userspace that wants to do that (that is, having iocbs signaled to an 
eventfd), do not really matter much. They have their own iocb setup function
and they'd simply set a flag more and set the resfd.
The global setting may be good for some users, with others that may 
complain in having to create another ctx just to set an fd.
Again, IMO is not worth a new API. What do other folks think?




- Davide



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2007-04-01 20:20 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-03-31 20:09 [patch 13/13] signal/timer/event fds v9 - KAIO eventfd support example Davide Libenzi
2007-04-01 15:06 ` Avi Kivity
2007-04-01 17:07   ` Davide Libenzi
2007-04-01 17:23     ` Linus Torvalds
2007-04-01 17:31       ` Davide Libenzi
2007-04-01 18:42     ` Avi Kivity
2007-04-01 18:49       ` Davide Libenzi
2007-04-01 20:11         ` Avi Kivity
2007-04-01 20:20           ` Davide Libenzi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).