Linux-Fsdevel Archive on lore.kernel.org
From: Pavel Begunkov <asml.silence@gmail.com>
To: Matthew Wilcox <willy@infradead.org>,
	Andy Lutomirski <luto@amacapital.net>
Cc: Stefano Garzarella <sgarzare@redhat.com>,
	Miklos Szeredi <miklos@szeredi.hu>,
	Kees Cook <keescook@chromium.org>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	strace-devel@lists.strace.io, io-uring@vger.kernel.org,
	Linux API <linux-api@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: strace of io_uring events?
Date: Wed, 15 Jul 2020 22:42:04 +0300
Message-ID: <7c09f6af-653f-db3f-2378-02dca2bc07f7@gmail.com>
In-Reply-To: <20200715171130.GG12769@casper.infradead.org>

On 15/07/2020 20:11, Matthew Wilcox wrote:
> On Wed, Jul 15, 2020 at 07:35:50AM -0700, Andy Lutomirski wrote:
>>> On Jul 15, 2020, at 4:12 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>>
>>> <feff>Hi,
> 
> feff?  Are we doing WTF-16 in email now?  ;-)
> 
>>>
>>> This thread is to discuss the possibility of stracing requests
>>> submitted through io_uring.   I'm not directly involved in io_uring
>>> development, so I'm posting this out of  interest in using strace on
>>> processes utilizing io_uring.
>>>
>>> io_uring gives the developer a way to bypass the syscall interface,
>>> which results in loss of information when tracing.  This is a strace
>>> fragment on  "io_uring-cp" from liburing:
>>>
>>> io_uring_enter(5, 40, 0, 0, NULL, 8)    = 40
>>> io_uring_enter(5, 1, 0, 0, NULL, 8)     = 1
>>> io_uring_enter(5, 1, 0, 0, NULL, 8)     = 1
>>> ...
>>>
>>> What really happens are read + write requests.  Without that
>>> information the strace output is mostly useless.
>>>
>>> This loss of information is not new, e.g. calls through the vdso or
>>> futex fast paths are also invisible to strace.  But losing filesystem
>>> I/O calls are a major blow, imo.

To clarify the details for those who are not familiar with io_uring:

io_uring has a pair of queues, a submission queue (SQ) and a completion
queue (CQ), both shared between kernel and userspace. Userspace submits
requests by filling a chunk of memory in the SQ. The kernel picks up SQ
entries either within a syscall (io_uring_enter) or asynchronously by
polling the SQ.

CQ entries are filled by the kernel completely asynchronously and
in parallel. Some users just poll the CQ to get them, but there is also
a way to wait for them.
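
To make the shared-queue idea concrete, here is a toy model of a
submission ring. It is only a sketch and not the real io_uring ABI:
ToyRing, submit() and consume() are hypothetical names, and the real
rings also need memory barriers and fixed-width indices, which Python
hides. What it shows is that queueing a request is nothing more than
a memory write plus a tail update, so there is no syscall for strace
to see at that point.

```python
# Toy model of an io_uring-style submission ring. This is NOT the real
# kernel ABI: ToyRing, submit() and consume() are hypothetical names.
from dataclasses import dataclass, field

RING_ENTRIES = 8  # power of two, so "index & mask" wraps around the ring


@dataclass
class ToyRing:
    head: int = 0  # advanced only by the consumer (the "kernel")
    tail: int = 0  # advanced only by the producer (the "application")
    entries: list = field(default_factory=lambda: [None] * RING_ENTRIES)

    def submit(self, request) -> bool:
        """Producer side: write the entry, then publish it by bumping tail."""
        if self.tail - self.head == RING_ENTRIES:
            return False  # ring is full
        self.entries[self.tail & (RING_ENTRIES - 1)] = request
        self.tail += 1
        return True

    def consume(self):
        """Consumer side: pick up one published entry, or None if empty."""
        if self.head == self.tail:
            return None
        request = self.entries[self.head & (RING_ENTRIES - 1)]
        self.head += 1
        return request


ring = ToyRing()
ring.submit(("read", 3))   # queued by memory writes only
ring.submit(("write", 4))
picked_up = [ring.consume(), ring.consume(), ring.consume()]
print(picked_up)  # [('read', 3), ('write', 4), None]
```

The consumer here is called by hand; in io_uring that role is played
by the kernel, either from io_uring_enter or from a polling thread.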

>>>
>>> What do people think?
>>>
>>> From what I can tell, listing the submitted requests on
>>> io_uring_enter() would not be hard.  Request completion is
>>> asynchronous, however, and may not require  io_uring_enter() syscall.
>>> Am I correct?

Both the submission and the completion side may not require a syscall.

>>>
>>> Is there some existing tracing infrastructure that strace could use to
>>> get async completion events?  Should we be introducing one?

There are static trace points covering all needs.

And if not used, the whole thing has to be zero-overhead. Otherwise,
there is perf, which is zero-overhead, and this IMHO won't fly.
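
For anyone who wants to see which static trace points a given kernel
offers, a minimal sketch follows. It assumes tracefs is mounted at one
of the two usual locations and is readable (normally root only);
list_tracepoints() is a hypothetical helper, not part of any tool.
The event names differ between kernel versions, so none are hardcoded.

```python
# Sketch: list the io_uring static trace points exposed through tracefs.
# Assumption: tracefs is mounted at one of the usual locations below.
from pathlib import Path

TRACEFS_CANDIDATES = (
    Path("/sys/kernel/tracing"),
    Path("/sys/kernel/debug/tracing"),
)


def list_tracepoints(subsystem, roots=TRACEFS_CANDIDATES):
    """Return sorted event names for `subsystem`, or [] if none are visible."""
    for root in roots:
        events_dir = Path(root) / "events" / subsystem
        try:
            # Each event is a directory; plain files such as "enable"
            # and "filter" are control knobs, not events.
            return sorted(p.name for p in events_dir.iterdir() if p.is_dir())
        except OSError:
            continue  # not mounted here, or not readable: try the next root
    return []


print(list_tracepoints("io_uring"))
```

An empty list means the directory was missing or unreadable, not
necessarily that the kernel has no io_uring trace points.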

>>
>> Let’s add some seccomp folks. We probably also want to be able to run
>> seccomp-like filters on io_uring requests. So maybe io_uring should
>> call into seccomp-and-tracing code for each action.
> 
> Adding Stefano since he had a complementary proposal for iouring
> restrictions that weren't exactly seccomp.
> 

-- 
Pavel Begunkov
