From: Vivek Goyal <vgoyal@redhat.com>
To: Nauman Rafique <nauman@google.com>
Cc: Jens Axboe <jens.axboe@oracle.com>,
Divyesh Shah <dpshah@google.com>,
Fabio Checconi <fchecconi@gmail.com>,
Li Zefan <lizf@cn.fujitsu.com>, Ryo Tsuruta <ryov@valinux.co.jp>,
linux-kernel@vger.kernel.org,
containers@lists.linux-foundation.org,
virtualization@lists.linux-foundation.org, taka@valinux.co.jp,
righi.andrea@gmail.com, s-uchida@ap.jp.nec.com,
fernando@oss.ntt.co.jp, balbir@linux.vnet.ibm.com,
akpm@linux-foundation.org, menage@google.com, ngupta@google.com,
riel@redhat.com, jmoyer@redhat.com, peterz@infradead.org,
paolo.valente@unimore.it
Subject: Re: [patch 0/4] [RFC] Another proportional weight IO controller
Date: Fri, 21 Nov 2008 10:22:23 -0500
Message-ID: <20081121152223.GE3111@redhat.com>
In-Reply-To: <e98e18940811201442s787a346em4ada30bcb1badfe6@mail.gmail.com>
On Thu, Nov 20, 2008 at 02:42:38PM -0800, Nauman Rafique wrote:
[..]
> >> It seems that we have a solution if we can figure out a way to share
> >> cgroup code between different schedulers. I am thinking how other
> >> schedulers (AS, Deadline, No-op) would use cgroups. Will they have
> >> proportional division between requests from different cgroups? And use
> >> their own policy (e.g. deadline scheduling) within a cgroup? How about
> >> if we have both threads and cgroups at a particular level? I think
> >> putting all threads in a default cgroup seems like a reasonable choice
> >> in this case.
> >>
> >> Here is a high level design that comes to mind.
> >>
> >> Put proportional division code and state in common code. Each level of
> >> the hierarchy which has more than one cgroup would have some state
> >> maintained in common code. At the leaf level of the hierarchy, we can have a
> >> cgroup specific scheduler (created when a cgroup is created). We can
> >> choose a different scheduler for each cgroup (we can have a no-op for
> >> one cgroup while cfq for another).
> >
> > I am not sure that I understand the different scheduler for each cgroup
> > aspect of it. What's the need? It makes things even more complicated I
> > think.
>
> With the design I had in my mind, it seemed like that would come for
> free. But if it does not, I completely agree with you that it's not as
> important.
>
> >
> > But moving the proportional division code out of a particular scheduler
> > and making it common makes sense.
> >
> > Looking at BFQ, I was thinking that we can just keep a large part of the
> > code. This common code can think of everything as a scheduling entity.
> > This scheduling entity (SE) will be defined by the underlying scheduler,
> > depending on how that scheduler manages its queues. So for CFQ, at each
> > level, an SE can be either a task or a group. For the schedulers which
> > don't maintain separate queues for tasks, it will simply be a group at
> > all levels.
>
> So the structure of the hierarchy would depend on the underlying scheduler?
>
Kind of. In fact it will depend on both the cgroup hierarchy and the
underlying scheduler.
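
To make the SE idea a bit more concrete, here is a very rough sketch of
what a common-layer scheduling entity could look like (all names are
hypothetical, loosely modeled on bfq_entity from BFQ). For CFQ an entity
would wrap either a per-task queue or a group; for noop/deadline/AS it
would always wrap a group:

/*
 * Hypothetical common-layer scheduling entity. Not real kernel code,
 * just an illustration of the abstraction being discussed.
 */
struct io_entity {
	/* B-WF2Q+ bookkeeping */
	u64 vstart;			/* virtual start time */
	u64 vfinish;			/* virtual finish time */
	unsigned int weight;		/* cgroup weight or task ioprio */

	struct rb_node rb_node;		/* position in parent's service tree */
	struct io_sched_data *parent;	/* service tree we are queued on */
	struct io_sched_data *my_sched_data; /* non-NULL iff this is a group */
};

/* Per-level service tree maintained by the common layer. */
struct io_sched_data {
	struct rb_root active;		/* entities, sorted by vfinish */
	struct io_entity *in_service;
	u64 vtime;			/* virtual time at this level */
};

The shape of the tree then falls out naturally: the cgroup hierarchy
gives the group entities, and whether the leaf entities are per-task
queues or just the group itself depends on the underlying scheduler.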
> >
> > We probably can employ B-WF2Q+ to provide hierarchical fairness between
> > the scheduling entities of this tree. The common layer will do the
> > scheduling of entities (without knowing what is contained inside) and the
> > underlying scheduler will take care of dispatching the requests from the
> > scheduled entity. (It could be a task queue for CFQ or a group queue for
> > other schedulers.)
> >
> > The tricky part would be how to abstract it in a clean way. It should
> > lead to reduced code in CFQ/BFQ because the B-WF2Q+ logic will largely be
> > moved into a common layer.
>
> How about this plan:
> 1 Start with CFQ patched with some BFQ like patches (This is what we
> will have if Jens takes some of Fabio's patches). This will have no
> cgroup related logic (correct me if I am wrong).
> 2 Repeat proportional scheduling logic for cgroups in the common
> layer, without touching the code produced in step 1. That means that
> we will have WF2Q+ used for scheduling cgroup time slices proportional
> to weight in the common code. If CFQ (step 1 output) is used as
> scheduler, WF2Q+ would be used there too, but to schedule time slices
> (in proportion to priorities?) between different threads. Common code
> logic will be completely oblivious of the actual scheduler used
> (patched CFQ, Deadline, AS etc).
I think once you start using WF2Q+ in the common layer, CFQ will have to
get rid of that code. (Remember that in the case of CFQ, we will have a
tree which has both tasks and groups as scheduling entities.) So the
common layer code can select the next entity to be dispatched based on
WF2Q+, and then CFQ will decide which request to dispatch within that
scheduling entity.
So maybe we can start with BFQ and try to break the code into two pieces:
common code and scheduler-specific code. Then try to make use of the
common code in deadline or anticipatory to see if things work fine. If
that works, then we can move on to CFQ and make it use the common code.
By that time CFQ should have Fabio's changes. I think that will include
the WF2Q+ algorithm as well (at least to provide fairness among tasks,
not the hierarchical version). Once the common layer WF2Q+ works well, we
can get rid of WF2Q+ in CFQ and complete the picture.
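
Just to illustrate the split I have in mind, a minimal sketch (function
and hook names are all made up; a real B-WF2Q+ implementation keeps
augmented rbtrees so the eligible entity with the minimum finish time can
be found in O(log n) rather than by a linear scan):

/* Common layer: WF2Q+ selection among the active entities of one level. */
struct io_entity *common_select_entity(struct io_sched_data *sd)
{
	struct rb_node *node;
	struct io_entity *entity, *best = NULL;

	for (node = rb_first(&sd->active); node; node = rb_next(node)) {
		entity = rb_entry(node, struct io_entity, rb_node);
		/* WF2Q+ eligibility: start time not ahead of virtual time */
		if (entity->vstart > sd->vtime)
			continue;
		if (!best || entity->vfinish < best->vfinish)
			best = entity;
	}
	return best;
}

/* Walk the hierarchy down to a leaf, then let the scheduler dispatch. */
static int common_dispatch(struct request_queue *q, int force)
{
	struct io_sched_data *sd = q_root_sched_data(q);	/* hypothetical */
	struct io_entity *entity = common_select_entity(sd);

	while (entity && entity->my_sched_data)		/* group: descend */
		entity = common_select_entity(entity->my_sched_data);

	if (!entity)
		return 0;

	/* Scheduler-specific step: e.g. CFQ picks a request from the task
	 * queue (or deadline/AS from the group queue) behind this entity. */
	return es_dispatch_from_entity(q, entity);	/* hypothetical hook */
}

Once something like this is in place, the WF2Q+ code inside CFQ/BFQ would
be redundant, which is exactly the duplication we want to remove.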
> cgroup tracking has to be implemented as part of step 2. The good
> thing is that step 2 can proceed independent of step 1, as the output
> of step 1 will have the same interface as the existing CFQ scheduler.
>
Agreed. Any kind of tracking based on the bio and not the task context
will have to be done later, once we have come up with the common layer
code.
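
For the record, the tracking itself could be as simple as stamping the
bio with an identifier for the owning cgroup at submission time, while we
are still in the right task context, so the common layer can classify it
later without relying on current. A hand-wavy sketch (the field and
helpers below are made up; the bio-cgroup patches do something along
these lines):

/* Hypothetical: remember the submitting cgroup in the bio itself. */
void io_bio_associate_current(struct bio *bio)
{
	/* called on the submission path, where 'current' is meaningful */
	bio->bi_cgroup_id = task_io_cgroup_id(current);	/* made-up field/helper */
}

/* Later, possibly from a kernel thread or writeback, classify the bio. */
struct io_entity *common_classify_bio(struct request_queue *q, struct bio *bio)
{
	return find_entity_by_cgroup_id(q, bio->bi_cgroup_id);	/* made-up */
}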
These are very vague, high-level ideas. The devil lies in the details. :-)
I will get started to see how feasible the common layer code idea is.
Thanks
Vivek