LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Vivek Goyal <vgoyal@redhat.com>
To: Divyesh Shah <dpshah@google.com>
Cc: Ryo Tsuruta <ryov@valinux.co.jp>,
linux-kernel@vger.kernel.org,
containers@lists.linux-foundation.org,
virtualization@lists.linux-foundation.org, jens.axboe@oracle.com,
taka@valinux.co.jp, righi.andrea@gmail.com,
s-uchida@ap.jp.nec.com, fernando@oss.ntt.co.jp,
balbir@linux.vnet.ibm.com, akpm@linux-foundation.org,
menage@google.com, ngupta@google.com, riel@redhat.com,
jmoyer@redhat.com, peterz@infradead.org,
Fabio Checconi <fchecconi@gmail.com>,
paolo.valente@unimore.it, Nauman Rafique <nauman@google.com>
Subject: Re: [patch 0/4] [RFC] Another proportional weight IO controller
Date: Fri, 14 Nov 2008 11:05:25 -0500 [thread overview]
Message-ID: <20081114160525.GE24624@redhat.com> (raw)
In-Reply-To: <af41c7c40811131457w472e4a86tb5344cc1d3d366fb@mail.gmail.com>
On Thu, Nov 13, 2008 at 02:57:29PM -0800, Divyesh Shah wrote:
[..]
> > > > Ryo, do you still want to stick to two level scheduling? Given the problem
> > > > of it breaking down underlying scheduler's assumptions, probably it makes
> > > > more sense to the IO control at each individual IO scheduler.
> > >
> > > Vivek,
> > > I agree with you that 2 layer scheduler *might* invalidate some
> > > IO scheduler assumptions (though some testing might help here to
> > > confirm that). However, one big concern I have with proportional
> > > division at the IO scheduler level is that there is no means of doing
> > > admission control at the request queue for the device. What we need is
> > > request queue partitioning per cgroup.
> > > Consider that I want to divide my disk's bandwidth among 3
> > > cgroups(A, B and C) equally. But say some tasks in the cgroup A flood
> > > the disk with IO requests and completely use up all of the requests in
> > > the rq resulting in the following IOs to be blocked on a slot getting
> > > empty in the rq thus affecting their overall latency. One might argue
> > > that over the long term though we'll get equal bandwidth division
> > > between these cgroups. But now consider that cgroup A has tasks that
> > > always storm the disk with large number of IOs which can be a problem
> > > for other cgroups.
> > > This actually becomes an even larger problem when we want to
> > > support high priority requests as they may get blocked behind other
> > > lower priority requests which have used up all the available requests
> > > in the rq. With request queue division we can achieve this easily by
> > > having tasks requiring high priority IO belong to a different cgroup.
> > > dm-ioband and any other 2-level scheduler can do this easily.
> > >
> >
> > Hi Divyesh,
> >
> > I understand that request descriptors can be a bottleneck here. But that
> > should be an issue even today with CFQ where a low priority process
> > consume lots of request descriptors and prevent higher priority process
> > from submitting the request.
>
> Yes that is true and that is one of the main reasons why I would lean
> towards 2-level scheduler coz you get request queue division as well.
>
> I think you already said it and I just
> > reiterated it.
> >
> > I think in that case we need to do something about request descriptor
> > allocation instead of relying on 2nd level of IO scheduler.
> > At this point I am not sure what to do. May be we can take feedback from the
> > respective queue (like cfqq) of submitting application and if it is already
> > backlogged beyond a certain limit, then we can put that application to sleep
> > and stop it from consuming excessive amount of request descriptors
> > (despite the fact that we have free request descriptors).
>
> This should be done per-cgroup rather than per-process.
>
Yep, per cgroup limit will make more sense. get_request() already calls
elv_may_queue() to get a feedback from IO scheduler. May be here IO
scheduler can make a decision how many request descriptors are already
allocated to this cgroup. And if the queue is congested, then IO scheduler
can deny the fresh request allocation.
Thanks
Vivek
next prev parent reply other threads:[~2008-11-14 16:07 UTC|newest]
Thread overview: 103+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-11-06 15:30 vgoyal
2008-11-06 15:30 ` [patch 1/4] io controller: documentation vgoyal
2008-11-07 2:32 ` KAMEZAWA Hiroyuki
2008-11-07 14:27 ` Vivek Goyal
2008-11-10 2:48 ` Li Zefan
2008-11-10 13:44 ` Vivek Goyal
2008-11-06 15:30 ` [patch 2/4] io controller: biocgroup implementation vgoyal
2008-11-07 2:50 ` KAMEZAWA Hiroyuki
2008-11-07 4:19 ` Hirokazu Takahashi
2008-11-07 14:44 ` Vivek Goyal
2008-11-06 15:30 ` [patch 3/4] io controller: Core IO controller implementation logic vgoyal
2008-11-07 3:21 ` KAMEZAWA Hiroyuki
2008-11-07 14:50 ` Vivek Goyal
2008-11-08 2:35 ` [patch 3/4] io controller: Core IO controller implementationlogic KAMEZAWA Hiroyuki
2008-11-11 8:50 ` [patch 3/4] io controller: Core IO controller implementation logic Gui Jianfeng
2008-11-06 15:30 ` [patch 4/4] io controller: Put IO controller to use in device mapper and standard make_request() function vgoyal
2008-11-06 15:49 ` [patch 0/4] [RFC] Another proportional weight IO controller Peter Zijlstra
2008-11-06 16:01 ` Vivek Goyal
2008-11-06 16:16 ` Peter Zijlstra
2008-11-06 16:39 ` Vivek Goyal
2008-11-06 16:52 ` Peter Zijlstra
2008-11-06 16:57 ` Rik van Riel
2008-11-06 17:11 ` Peter Zijlstra
2008-11-07 0:41 ` Dave Chinner
2008-11-07 10:31 ` Peter Zijlstra
2008-11-09 9:40 ` Dave Chinner
2008-11-06 17:08 ` Vivek Goyal
2008-11-06 23:07 ` Nauman Rafique
2008-11-07 14:19 ` Vivek Goyal
2008-11-07 21:36 ` Nauman Rafique
2008-11-10 14:11 ` Vivek Goyal
2008-11-11 19:55 ` Nauman Rafique
2008-11-11 22:30 ` Vivek Goyal
2008-11-12 21:20 ` Nauman Rafique
2008-11-13 13:49 ` Fabio Checconi
2008-11-13 18:08 ` Vivek Goyal
2008-11-13 19:15 ` Fabio Checconi
2008-11-13 22:27 ` Nauman Rafique
2008-11-13 23:10 ` Fabio Checconi
2008-11-14 4:58 ` Satoshi UCHIDA
2008-11-14 8:02 ` Peter Zijlstra
2008-11-14 10:06 ` Satoshi UCHIDA
2008-11-06 16:47 ` Rik van Riel
2008-11-07 2:36 ` Gui Jianfeng
2008-11-07 13:38 ` Vivek Goyal
2008-11-13 9:05 ` Ryo Tsuruta
2008-11-13 15:58 ` Vivek Goyal
2008-11-13 18:41 ` Divyesh Shah
2008-11-13 21:46 ` Vivek Goyal
2008-11-13 22:57 ` Divyesh Shah
2008-11-14 16:05 ` Vivek Goyal [this message]
2008-11-14 22:44 ` Nauman Rafique
2008-11-17 14:23 ` Vivek Goyal
2008-11-18 2:02 ` Li Zefan
2008-11-18 5:01 ` Nauman Rafique
2008-11-18 7:42 ` Li Zefan
2008-11-18 22:23 ` Nauman Rafique
2008-11-18 12:05 ` Fabio Checconi
2008-11-18 14:07 ` Vivek Goyal
2008-11-18 14:41 ` Fabio Checconi
2008-11-18 19:12 ` Jens Axboe
2008-11-18 19:47 ` Vivek Goyal
2008-11-18 21:14 ` Fabio Checconi
2008-11-19 1:52 ` Aaron Carroll
2008-11-19 10:17 ` Fabio Checconi
2008-11-19 11:06 ` Fabio Checconi
2008-11-20 4:45 ` Aaron Carroll
2008-11-20 6:56 ` Fabio Checconi
2008-11-19 14:30 ` Jens Axboe
2008-11-19 15:52 ` Fabio Checconi
2008-11-18 23:07 ` Nauman Rafique
2008-11-19 14:24 ` Jens Axboe
2008-11-20 0:12 ` Divyesh Shah
2008-11-20 8:16 ` Jens Axboe
2008-11-20 13:40 ` Vivek Goyal
2008-11-20 19:54 ` Nauman Rafique
2008-11-20 21:15 ` Vivek Goyal
2008-11-20 22:42 ` Nauman Rafique
2008-11-21 15:22 ` Vivek Goyal
2008-11-26 6:40 ` Fernando Luis Vázquez Cao
2008-11-26 15:18 ` Vivek Goyal
2008-11-20 21:31 ` Vivek Goyal
2008-11-21 3:05 ` Fabio Checconi
2008-11-21 14:58 ` Vivek Goyal
2008-11-21 15:21 ` Fabio Checconi
2008-11-18 22:33 ` Nauman Rafique
2008-11-18 23:44 ` Fabio Checconi
2008-11-19 7:09 ` Paolo Valente
2008-11-13 22:13 ` Vivek Goyal
2008-11-20 9:20 ` Ryo Tsuruta
2008-11-20 13:47 ` Vivek Goyal
2008-11-25 2:33 ` Ryo Tsuruta
2008-11-25 16:27 ` Vivek Goyal
2008-11-25 22:38 ` Nauman Rafique
2008-11-26 14:06 ` Paolo Valente
2008-11-26 19:41 ` Nauman Rafique
2008-11-26 22:21 ` Fabio Checconi
2008-11-26 11:55 ` Fernando Luis Vázquez Cao
2008-11-26 12:47 ` Ryo Tsuruta
2008-11-26 16:08 ` Vivek Goyal
2008-11-27 8:43 ` Fernando Luis Vázquez Cao
2008-11-28 3:09 ` Ryo Tsuruta
2008-11-28 13:33 ` Ryo Tsuruta
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20081114160525.GE24624@redhat.com \
--to=vgoyal@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=balbir@linux.vnet.ibm.com \
--cc=containers@lists.linux-foundation.org \
--cc=dpshah@google.com \
--cc=fchecconi@gmail.com \
--cc=fernando@oss.ntt.co.jp \
--cc=jens.axboe@oracle.com \
--cc=jmoyer@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=menage@google.com \
--cc=nauman@google.com \
--cc=ngupta@google.com \
--cc=paolo.valente@unimore.it \
--cc=peterz@infradead.org \
--cc=riel@redhat.com \
--cc=righi.andrea@gmail.com \
--cc=ryov@valinux.co.jp \
--cc=s-uchida@ap.jp.nec.com \
--cc=taka@valinux.co.jp \
--cc=virtualization@lists.linux-foundation.org \
--subject='Re: [patch 0/4] [RFC] Another proportional weight IO controller' \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).