LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: "Theodore Ts'o" <tytso@mit.edu>
To: Paolo Valente <paolo.valente@linaro.org>
Cc: "Srivatsa S. Bhat" <srivatsa@csail.mit.edu>,
	linux-fsdevel@vger.kernel.org,
	linux-block <linux-block@vger.kernel.org>,
	linux-ext4@vger.kernel.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, axboe@kernel.dk, jack@suse.cz,
	jmoyer@redhat.com, amakhalov@vmware.com, anishs@vmware.com,
	srivatsab@vmware.com
Subject: Re: CFQ idling kills I/O performance on ext4 with blkio cgroup controller
Date: Sat, 18 May 2019 15:28:47 -0400	[thread overview]
Message-ID: <20190518192847.GB14277@mit.edu> (raw)
In-Reply-To: <1812E450-14EF-4D5A-8F31-668499E13652@linaro.org>

On Sat, May 18, 2019 at 08:39:54PM +0200, Paolo Valente wrote:
> I've addressed these issues in my last batch of improvements for
> BFQ, which landed in the upcoming 5.2. If you give it a try, and
> still see the problem, then I'll be glad to reproduce it, and
> hopefully fix it for you.

Hi Paolo, I'm curious if you could give a quick summary about what you
changed in BFQ?

I was considering adding support so that if userspace calls fsync(2)
or fdatasync(2), to attach the process's CSS to the transaction, and
then charge all of the journal metadata writes the process's CSS.  If
there are multiple fsync's batched into the transaction, the first
process which forced the early transaction commit would get charged
the entire journal write.  OTOH, journal writes are sequential I/O, so
the amount of disk time for writing the journal is going to be
relatively small, and especially, the fact that work from other
cgroups is going to be minimal, especially if hadn't issued an
fsync().

In the case where you have three cgroups all issuing fsync(2) and they
all landed in the same jbd2 transaction thanks to commit batching, in
the ideal world we would split up the disk time usage equally across
those three cgroups.  But it's probably not worth doing that...

That being said, we probably do need some BFQ support, since in the
case where we have multiple processes doing buffered writes w/o fsync,
we do charnge the data=ordered writeback to each block cgroup.  Worse,
the commit can't complete until the all of the data integrity
writebacks have completed.  And if there are N cgroups with dirty
inodes, and slice_idle set to 8ms, there is going to be 8*N ms worth
of idle time tacked onto the commit time.

If we charge the journal I/O to the cgroup, and there's only one
process doing the 

   dd if=/dev/zero of=/root/test.img bs=512 count=10000 oflags=dsync

then we don't need to worry about this failure mode, since both the
journal I/O and the data writeback will be hitting the same cgroup.
But that's arguably an artificial use case, and much more commonly
there will be multiple cgroups all trying to at least some file system
I/O.

						- Ted

  reply	other threads:[~2019-05-18 19:29 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-17 22:16 Srivatsa S. Bhat
2019-05-18 18:39 ` Paolo Valente
2019-05-18 19:28   ` Theodore Ts'o [this message]
2019-05-20  9:15     ` Jan Kara
2019-05-20 10:45       ` Paolo Valente
2019-05-21 16:48       ` Theodore Ts'o
2019-05-21 18:19         ` Josef Bacik
2019-05-21 19:10           ` Theodore Ts'o
2019-05-20 10:38     ` Paolo Valente
2019-05-21  7:38       ` Andrea Righi
2019-05-18 20:50   ` Srivatsa S. Bhat
2019-05-20 10:19     ` Paolo Valente
2019-05-20 22:45       ` Srivatsa S. Bhat
2019-05-21  6:23         ` Paolo Valente
2019-05-21  7:19           ` Srivatsa S. Bhat
2019-05-21  9:10           ` Jan Kara
2019-05-21 16:31             ` Theodore Ts'o
2019-05-21 11:25       ` Paolo Valente
2019-05-21 13:20         ` Paolo Valente
2019-05-21 16:21           ` Paolo Valente
2019-05-21 17:38             ` Paolo Valente
2019-05-21 22:51               ` Srivatsa S. Bhat
2019-05-22  8:05                 ` Paolo Valente
2019-05-22  9:02                   ` Srivatsa S. Bhat
2019-05-22  9:12                     ` Paolo Valente
2019-05-22 10:02                       ` Srivatsa S. Bhat
2019-05-22  9:09                   ` Paolo Valente
2019-05-22 10:01                     ` Srivatsa S. Bhat
2019-05-22 10:54                       ` Paolo Valente
2019-05-23  2:30                         ` Srivatsa S. Bhat
2019-05-23  9:19                           ` Paolo Valente
2019-05-23 17:22                             ` Paolo Valente
2019-05-23 23:43                               ` Srivatsa S. Bhat
2019-05-24  6:51                                 ` Paolo Valente
2019-05-24  7:56                                   ` Paolo Valente
2019-05-29  1:09                                   ` Srivatsa S. Bhat
2019-05-29  7:41                                     ` Paolo Valente
2019-05-30  8:29                                       ` Srivatsa S. Bhat
2019-05-30 10:45                                         ` Paolo Valente
2019-06-02  7:04                                           ` Srivatsa S. Bhat
2019-06-11 22:34                                             ` Srivatsa S. Bhat
2019-06-12 13:04                                               ` Jan Kara
2019-06-12 19:36                                                 ` Srivatsa S. Bhat
2019-06-13  6:02                                                   ` Greg Kroah-Hartman
2019-06-13 19:03                                                     ` Srivatsa S. Bhat
2019-06-13  8:20                                                   ` Jan Kara
2019-06-13 19:05                                                     ` Srivatsa S. Bhat
2019-06-13  8:37                                                   ` Jens Axboe
2019-06-13  5:46                                               ` Paolo Valente
2019-06-13 19:13                                                 ` Srivatsa S. Bhat
2019-05-23 23:32                           ` Srivatsa S. Bhat
2019-05-30  8:38                             ` Srivatsa S. Bhat

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190518192847.GB14277@mit.edu \
    --to=tytso@mit.edu \
    --cc=amakhalov@vmware.com \
    --cc=anishs@vmware.com \
    --cc=axboe@kernel.dk \
    --cc=cgroups@vger.kernel.org \
    --cc=jack@suse.cz \
    --cc=jmoyer@redhat.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paolo.valente@linaro.org \
    --cc=srivatsa@csail.mit.edu \
    --cc=srivatsab@vmware.com \
    --subject='Re: CFQ idling kills I/O performance on ext4 with blkio cgroup controller' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).