LKML Archive on lore.kernel.org
From: "Satoshi UCHIDA" <s-uchida@ap.jp.nec.com>
To: "'Paul Menage'" <menage@google.com>,
	<linux-kernel@vger.kernel.org>,
	<containers@lists.linux-foundation.org>
Cc: <axboe@kernel.dk>, <tom-sugawara@ap.jp.nec.com>,
	<m-takahashi@ex.jp.nec.com>
Subject: [RFC][v2][patch 0/12][CFQ-cgroup]Yet another I/O bandwidth controlling subsystem for CGroups based on CFQ
Date: Thu, 3 Apr 2008 16:09:12 +0900	[thread overview]
Message-ID: <005d01c89559$9e538200$dafa8600$@jp.nec.com> (raw)
In-Reply-To: <6599ad830804021541s3c1e3197y77d87f63bf47e4b3@mail.gmail.com>

This patchset renames the subsystem (from "cfq_cgroup" to "cfq")
and fixes a check in the create function.


This patchset introduces "yet another" I/O bandwidth controlling
subsystem for cgroups based on CFQ (called 2-layer CFQ).

The idea of 2-layer CFQ is to build per-group fairness control on top of
the existing CFQ control.  We add a new data structure, called CFQ
meta-data, on top of cfqd in order to control I/O bandwidth for cgroups.
The CFQ meta-data manages the cfq_datas on a service tree (rb-tree) and
applies the CFQ algorithm to synchronous I/O.  The active cfqd then
schedules its own queues with its own service tree, as before.
In other words, the CFQ meta-data controls the traditional CFQ data, and
each CFQ data runs conventionally.

           cfqmd     cfqmd     (cfqmd = cfq meta-data)
            |          |
  cfqc  -- cfqd ----- cfqd     (cfqd = cfq data,
            |          |        cfqc = cfq cgroup data)
  cfqc  --[cfqd]----- cfqd
            ↑
     conventional control.


This patchset is against 2.6.25-rc2-mm1.


Last week, we found a patchset from Vasily Tarasov (OpenVZ) posted
to LKML:
   [RFC][PATCH 0/9] cgroups: block: cfq: I/O bandwidth controlling subsystem for CGroups based on CFQ
  http://lwn.net/Articles/274652/

Our subsystem and Vasily's are similar in that both modify the CFQ
scheduler, but they differ in where the new layer is implemented.
Vasily's subsystem adds a new cgroup layer between cfqd and cfqq,
whereas our subsystem adds a new cgroup layer on top of cfqd.

The implementation differences from OpenVZ's patchset are:
   * the top-layer algorithm is also based on a service tree, and
   * the top-layer code is kept in a separate file (block/cfq-cgroup.c).

We hope to discuss here not which implementation is better, but what
the best way is to implement I/O bandwidth control based on CFQ.

Please give us your comments, questions and suggestions.



Finally, we describe how to use our implementation.

* Preparation for using 2-layer CFQ

 1. Apply this patchset to kernel 2.6.25-rc2-mm1.

 2. Build the kernel with the CFQ-CGROUP option enabled.

 3. Boot the new kernel.

 4. Mount the cgroup filesystem with the cfq subsystem.
    ex.
      mkdir /dev/cgroup
      mount -t cgroup -o cfq cfq /dev/cgroup
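
    To confirm that the mount succeeded (an illustrative check, not part
    of the original steps; the exact file list depends on your kernel
    configuration):
      grep cgroup /proc/mounts
      ls /dev/cgroup     # should include "tasks" and "cfq.ioprio"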


* Usage of grouping control.
 - Create a new group
      Make a new directory under /dev/cgroup.
      For example, the following command creates a 'test1' group.
          mkdir /dev/cgroup/test1

 - Insert a task into a group
      Write a process id (pid) to the "tasks" entry of the corresponding group.
      For example, the following command moves the task with pid 1100 into the test1 group.
         echo 1100 > /dev/cgroup/test1/tasks
      Child tasks of this task are also placed into the test1 group.

 - Change the I/O priority of a group
     Write a priority to the "cfq.ioprio" entry of the corresponding group.
     For example, the following command sets priority 2 for the 'test1' group.
         echo 2 > /dev/cgroup/test1/cfq.ioprio
     The I/O priority of a cgroup takes a value from 0 to 7, the same range
     as the existing per-task CFQ priority.

     
 - Change the I/O priority of a task
     Use the existing "ionice" command (see the example below).
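
     For example (an illustrative sketch using standard ionice options;
     these exact invocations are not from the measurements below):
         # start a task in the best-effort class with priority 4
         ionice -c 2 -n 4 dd if=/dev/zero of=/dev/null bs=1M count=100
         # or change the priority of an already-running task by pid
         ionice -c 2 -n 4 -p 1100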


* Example
 Two I/O loads (dd commands) were run under several conditions.
  
 - When they are in the same group with the same priority,

   program
     #!/bin/sh
     echo $$ > /dev/cgroup/tasks
     echo $$ > /dev/cgroup/test/tasks
     ionice -c 2 -n 3 dd if=/internal/data1 of=/dev/null bs=1M count=1K &
     ionice -c 2 -n 3 dd if=/internal/data2 of=/dev/null bs=1M count=1K &
     echo $$ > /dev/cgroup/test2/tasks
     echo $$ > /dev/cgroup/tasks
    
   result
     1024+0 records in
     1024+0 records out
     1073741824 bytes (1.1 GB) copied, 27.7676 s, 38.7 MB/s
     1024+0 records in
     1024+0 records out
     1073741824 bytes (1.1 GB) copied, 28.8482 s, 37.2 MB/s

     These tasks were treated fairly, so they finished at about the same time.


 - When they are in the same group with different priorities (0 and 7),

    program
      #!/bin/sh
      echo $$ > /dev/cgroup/tasks
      echo $$ > /dev/cgroup/test/tasks
      ionice -c 2 -n 0 dd if=/internal/data1 of=/dev/null bs=1M count=1K &
      ionice -c 2 -n 7 dd if=/internal/data2 of=/dev/null bs=1M count=1K &
      echo $$ > /dev/cgroup/test2/tasks
      echo $$ > /dev/cgroup/tasks

    result
      1024+0 records in
      1024+0 records out
      1073741824 bytes (1.1 GB) copied, 18.8373 s, 57.0 MB/s
      1024+0 records in
      1024+0 records out
      1073741824 bytes (1.1 GB) copied, 28.108 s, 38.2 MB/s


     The first task (copying data1) had the higher priority, so it finished sooner.
 
 - When they are in different groups with different task priorities (0 and 7),

    program
      #!/bin/sh
      echo $$ > /dev/cgroup/tasks
      echo $$ > /dev/cgroup/test/tasks
      ionice -c 2 -n 0 dd if=/internal/data1 of=/dev/null bs=1M count=1K &
      echo $$ > /dev/cgroup/test2/tasks
      ionice -c 2 -n 7 dd if=/internal/data2 of=/dev/null bs=1M count=1K &
      echo $$ > /dev/cgroup/tasks

    result
      1024+0 records in
      1024+0 records out
      1073741824 bytes (1.1 GB) copied, 28.1661 s, 38.1 MB/s
      1024+0 records in
      1024+0 records out
      1073741824 bytes (1.1 GB) copied, 28.8486 s, 37.2 MB/s

     The first task (copying data1) had the higher task priority, but both
     finished at about the same time, because their groups had the same priority.

 - When they are in different groups with different group priorities (7 and 0)
   and the same task priorities as above (0 and 7),

    program
      #!/bin/sh
      echo $$ > /dev/cgroup/tasks
      echo 7 > /dev/cgroup/test/cfq.ioprio
      echo $$ > /dev/cgroup/test/tasks
      ionice -c 2 -n 0 dd if=/internal/data1 of=/dev/null bs=1M count=1K >& test1.log &
      echo 0 > /dev/cgroup/test2/cfq.ioprio
      echo $$ > /dev/cgroup/test2/tasks
      ionice -c 2 -n 7 dd if=/internal/data2 of=/dev/null bs=1M count=1K >& test2.log &
      echo $$ > /dev/cgroup/tasks

    result
      === test1.log ===
        1024+0 records in
        1024+0 records out
        1073741824 bytes (1.1 GB) copied, 27.3971 s, 39.2 MB/s
      === test2.log ===
        1024+0 records in
        1024+0 records out
        1073741824 bytes (1.1 GB) copied, 17.3837 s, 61.8 MB/s

     The first task (copying data1) had the higher task priority, but it
     finished later, because its group had the lower priority.


=====
 Satoshi UCHIDA
   NEC Corporation.



Thread overview: 49+ messages
2008-04-01  9:22 [RFC][patch 0/11][CFQ-cgroup]Yet " Satoshi UCHIDA
2008-04-01  9:27 ` [RFC][patch 1/11][CFQ-cgroup] Add Configuration Satoshi UCHIDA
2008-04-01  9:30 ` [RFC][patch 2/11][CFQ-cgroup] Move header file Satoshi UCHIDA
2008-04-01  9:32 ` [RFC][patch 3/11][CFQ-cgroup] Introduce cgroup subsystem Satoshi UCHIDA
2008-04-02 22:41   ` Paul Menage
2008-04-03  2:31     ` Satoshi UCHIDA
2008-04-03  2:39       ` Li Zefan
2008-04-03 15:31       ` Paul Menage
2008-04-03  7:09     ` Satoshi UCHIDA [this message]
2008-04-03  7:11       ` [PATCH] [RFC][patch 1/12][CFQ-cgroup] Add Configuration Satoshi UCHIDA
2008-04-03  7:12       ` [RFC][patch 2/11][CFQ-cgroup] Move header file Satoshi UCHIDA
2008-04-03  7:12       ` [RFC][patch 3/12][CFQ-cgroup] Introduce cgroup subsystem Satoshi UCHIDA
2008-04-03  7:13       ` [PATCH] [RFC][patch 4/12][CFQ-cgroup] Add ioprio entry Satoshi UCHIDA
2008-04-03  7:14       ` [RFC][patch 5/12][CFQ-cgroup] Create cfq driver unique data Satoshi UCHIDA
2008-04-03  7:14       ` [RFC][patch 6/12][CFQ-cgroup] Add cfq optional operation framework Satoshi UCHIDA
2008-04-03  7:15       ` [RFC][patch 7/12][CFQ-cgroup] Add new control layer over traditional control layer Satoshi UCHIDA
2008-04-03  7:15       ` [RFC][patch 8/12][CFQ-cgroup] Control cfq_data per driver Satoshi UCHIDA
2008-04-03  7:16       ` [RFC][patch 9/12][CFQ-cgroup] Control cfq_data per cgroup Satoshi UCHIDA
2008-04-03  7:16       ` [PATCH] [RFC][patch 10/12][CFQ-cgroup] Search cfq_data when not connected Satoshi UCHIDA
2008-04-03  7:17       ` [RFC][patch 11/12][CFQ-cgroup] Control service tree: Main functions Satoshi UCHIDA
2008-04-03  7:18       ` [RFC][patch 12/12][CFQ-cgroup] entry/remove active cfq_data Satoshi UCHIDA
2008-04-25  9:54       ` [RFC][v2][patch 0/12][CFQ-cgroup]Yet another I/O bandwidth controlling subsystem for CGroups based on CFQ Ryo Tsuruta
2008-04-25 21:37         ` [Devel] " Florian Westphal
2008-04-29  0:44           ` Ryo Tsuruta
2008-05-09 10:17         ` Satoshi UCHIDA
2008-05-12  3:10           ` Ryo Tsuruta
2008-05-12 15:33             ` Ryo Tsuruta
2008-05-22 13:04               ` Ryo Tsuruta
2008-05-23  2:53                 ` Satoshi UCHIDA
2008-05-26  2:46                   ` Ryo Tsuruta
2008-05-27 11:32                     ` Satoshi UCHIDA
2008-05-30 10:37                       ` Andrea Righi
2008-06-18  9:48                         ` Satoshi UCHIDA
2008-06-18 22:33                           ` Andrea Righi
2008-06-22 17:04                           ` Andrea Righi
2008-06-03  8:15                       ` Ryo Tsuruta
2008-06-26  4:49                         ` Satoshi UCHIDA
2008-04-01  9:33 ` [RFC][patch 4/11][CFQ-cgroup] Create cfq driver unique data Satoshi UCHIDA
2008-04-01  9:35 ` [RFC][patch 5/11][CFQ-cgroup] Add cfq optional operation framework Satoshi UCHIDA
2008-04-01  9:36 ` [RFC][patch 6/11][CFQ-cgroup] Add new control layer over traditional control layer Satoshi UCHIDA
2008-04-01  9:37 ` [RFC][patch 7/11][CFQ-cgroup] Control cfq_data per driver Satoshi UCHIDA
2008-04-01  9:38 ` [RFC][patch 8/11][CFQ-cgroup] Control cfq_data per cgroup Satoshi UCHIDA
2008-04-03 15:35   ` Paul Menage
2008-04-04  6:20     ` Satoshi UCHIDA
2008-04-04  9:00       ` Paul Menage
2008-04-04  9:46         ` Satoshi UCHIDA
2008-04-01  9:40 ` [RFC][patch 9/11][CFQ-cgroup] Search cfq_data when not connected Satoshi UCHIDA
2008-04-01  9:41 ` [RFC][patch 10/11][CFQ-cgroup] Control service tree: Main functions Satoshi UCHIDA
2008-04-01  9:42 ` [RFC][patch 11/11][CFQ-cgroup] entry/remove active cfq_data Satoshi UCHIDA
