LKML Archive on lore.kernel.org
* [PATCH 2] Avoid preferential treatment of groups that aren't backlogged
@ 2011-02-11 19:39 Chad Talbott
  2011-02-14 16:41 ` Vivek Goyal
  0 siblings, 1 reply; 8+ messages in thread
From: Chad Talbott @ 2011-02-11 19:39 UTC (permalink / raw)
  To: jaxboe, vgoyal; +Cc: guijianfeng, mrubin, teravest, jmoyer, linux-kernel

Problem: If a group isn't backlogged, we remove it from the service
tree. When it becomes active again, it gets either the minimum vtime
of the tree or gets put at the "back of the line." That is true even
when the group was idle for a very short time and consumed some IO
time right before it became idle. If a group has a very small weight,
it can end up using more disk time than its fair share. Conversely, if
it has a very large weight, being put at the back of the vtime line is
a large penalty which prevents it from consuming its share.

Solution: We solve the problem by assigning the group its old vtime if
it has not been idle long enough. Otherwise we assign it the service
tree's min vtime.

Complications: When an entire service tree becomes completely idle, we
lose the vtime state, and all of the old vtime values become
irrelevant. For example, consider the case where the service tree is
idle and a brand new group sends IO. That group would have an old
vtime value of zero, and the service tree's vtime would restart close
to zero. In such a case, it would be unfair for the older groups to
resume with the much higher old vtimes stored in them.

We solve that issue by keeping a generation number that counts the
number of instances when the service tree becomes completely
empty. The generation number is stored in each group too. If a group
becomes backlogged after a service tree has been empty, we compare its
stored generation number with the current service tree generation
number, and discard the old vtime if the group generation number is
stale.

The preemption specific code is taken care of automatically because we
allow preemption checks after inserting a group back into the service
tree and assigning it an appropriate vtime.

Signed-off-by: Chad Talbott <ctalbott@google.com>
---
 block/cfq-iosched.c |   24 +++++++++++++-----------
 1 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 501ffdf..f09f3fe 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -178,6 +178,7 @@ struct cfq_group {
 
 	/* group service_tree key */
 	u64 vdisktime;
+	u64 generation_num;
 	unsigned int weight;
 
 	/* number of cfqq currently on this group */
@@ -300,6 +301,9 @@ struct cfq_data {
 	/* List of cfq groups being managed on this device*/
 	struct hlist_head cfqg_list;
 	struct rcu_head rcu;
+
+	/* Generation number, counts service tree empty events */
+	u64 active_generation;
 };
 
 static struct cfq_group *cfq_get_next_cfqg(struct cfq_data *cfqd);
@@ -873,18 +877,13 @@ cfq_group_service_tree_add(struct cfq_data *cfqd, struct cfq_group *cfqg)
 	if (!RB_EMPTY_NODE(&cfqg->rb_node))
 		return;
 
-	/*
-	 * Currently put the group at the end. Later implement something
-	 * so that groups get lesser vtime based on their weights, so that
-	 * if group does not loose all if it was not continously backlogged.
-	 */
-	n = rb_last(&st->rb);
-	if (n) {
-		__cfqg = rb_entry_cfqg(n);
-		cfqg->vdisktime = __cfqg->vdisktime + CFQ_IDLE_DELAY;
-	} else
+	if (cfqd->active_generation > cfqg->generation_num)
 		cfqg->vdisktime = st->min_vdisktime;
-
+	else
+		/* We assume that vdisktime was not modified when the task
+		   was off the service tree.
+		 */
+		cfqg->vdisktime = max(st->min_vdisktime, cfqg->vdisktime);
 	__cfq_group_service_tree_add(st, cfqg);
 	st->total_weight += cfqg->weight;
 }
@@ -906,6 +905,9 @@ cfq_group_service_tree_del(struct cfq_data *cfqd, struct cfq_group *cfqg)
 	if (!RB_EMPTY_NODE(&cfqg->rb_node))
 		cfq_rb_erase(&cfqg->rb_node, st);
 	cfqg->saved_workload_slice = 0;
+	cfqg->generation_num = cfqd->active_generation;
+	if (RB_EMPTY_ROOT(&st->rb))
+		cfqd->active_generation++;
 	cfq_blkiocg_update_dequeue_stats(&cfqg->blkg, 1);
 }
 



* Re: [PATCH 2] Avoid preferential treatment of groups that aren't backlogged
  2011-02-11 19:39 [PATCH 2] Avoid preferential treatment of groups that aren't backlogged Chad Talbott
@ 2011-02-14 16:41 ` Vivek Goyal
  2011-02-14 18:01   ` [PATCH v3] " Chad Talbott
  0 siblings, 1 reply; 8+ messages in thread
From: Vivek Goyal @ 2011-02-14 16:41 UTC (permalink / raw)
  To: Chad Talbott; +Cc: jaxboe, guijianfeng, mrubin, teravest, jmoyer, linux-kernel

On Fri, Feb 11, 2011 at 11:39:30AM -0800, Chad Talbott wrote:

[..]
>  static struct cfq_group *cfq_get_next_cfqg(struct cfq_data *cfqd);
> @@ -873,18 +877,13 @@ cfq_group_service_tree_add(struct cfq_data *cfqd, struct cfq_group *cfqg)
>  	if (!RB_EMPTY_NODE(&cfqg->rb_node))
>  		return;
>  
> -	/*
> -	 * Currently put the group at the end. Later implement something
> -	 * so that groups get lesser vtime based on their weights, so that
> -	 * if group does not loose all if it was not continously backlogged.
> -	 */
> -	n = rb_last(&st->rb);
> -	if (n) {
> -		__cfqg = rb_entry_cfqg(n);
> -		cfqg->vdisktime = __cfqg->vdisktime + CFQ_IDLE_DELAY;
> -	} else
> +	if (cfqd->active_generation > cfqg->generation_num)
>  		cfqg->vdisktime = st->min_vdisktime;
> -
> +	else
> +		/* We assume that vdisktime was not modified when the task
> +		   was off the service tree.
> +		 */
> +		cfqg->vdisktime = max(st->min_vdisktime, cfqg->vdisktime);

I think the above usage of max() will also give wrong results upon
wrapping. So how about using max_vdisktime() instead? I know the
results are not catastrophic when it happens, but let's still use the
right thing.

Thanks
Vivek


* [PATCH v3] Avoid preferential treatment of groups that aren't backlogged
  2011-02-14 16:41 ` Vivek Goyal
@ 2011-02-14 18:01   ` Chad Talbott
  2011-02-14 18:07     ` Vivek Goyal
  2011-02-18 21:48     ` Vivek Goyal
  0 siblings, 2 replies; 8+ messages in thread
From: Chad Talbott @ 2011-02-14 18:01 UTC (permalink / raw)
  To: jaxboe, vgoyal; +Cc: guijianfeng, mrubin, teravest, jmoyer, linux-kernel

Problem: If a group isn't backlogged, we remove it from the service
tree. When it becomes active again, it gets either the minimum vtime
of the tree or gets put at the "back of the line." That is true even
when the group was idle for a very short time and consumed some IO
time right before it became idle. If a group has a very small weight,
it can end up using more disk time than its fair share. Conversely, if
it has a very large weight, being put at the back of the vtime line is
a large penalty which prevents it from consuming its share.

Solution: We solve the problem by assigning the group its old vtime if
it has not been idle long enough. Otherwise we assign it the service
tree's min vtime.

Complications: When an entire service tree becomes completely idle, we
lose the vtime state, and all of the old vtime values become
irrelevant. For example, consider the case where the service tree is
idle and a brand new group sends IO. That group would have an old
vtime value of zero, and the service tree's vtime would restart close
to zero. In such a case, it would be unfair for the older groups to
resume with the much higher old vtimes stored in them.

We solve that issue by keeping a generation number that counts the
number of instances when the service tree becomes completely
empty. The generation number is stored in each group too. If a group
becomes backlogged after a service tree has been empty, we compare its
stored generation number with the current service tree generation
number, and discard the old vtime if the group generation number is
stale.

The preemption specific code is taken care of automatically because we
allow preemption checks after inserting a group back into the service
tree and assigning it an appropriate vtime.

Signed-off-by: Chad Talbott <ctalbott@google.com>
---
 block/cfq-iosched.c |   25 ++++++++++++++-----------
 1 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 501ffdf..216d87b 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -178,6 +178,7 @@ struct cfq_group {
 
 	/* group service_tree key */
 	u64 vdisktime;
+	u64 generation_num;
 	unsigned int weight;
 
 	/* number of cfqq currently on this group */
@@ -300,6 +301,9 @@ struct cfq_data {
 	/* List of cfq groups being managed on this device*/
 	struct hlist_head cfqg_list;
 	struct rcu_head rcu;
+
+	/* Generation number, counts service tree empty events */
+	u64 active_generation;
 };
 
 static struct cfq_group *cfq_get_next_cfqg(struct cfq_data *cfqd);
@@ -873,18 +877,14 @@ cfq_group_service_tree_add(struct cfq_data *cfqd, struct cfq_group *cfqg)
 	if (!RB_EMPTY_NODE(&cfqg->rb_node))
 		return;
 
-	/*
-	 * Currently put the group at the end. Later implement something
-	 * so that groups get lesser vtime based on their weights, so that
-	 * if group does not loose all if it was not continously backlogged.
-	 */
-	n = rb_last(&st->rb);
-	if (n) {
-		__cfqg = rb_entry_cfqg(n);
-		cfqg->vdisktime = __cfqg->vdisktime + CFQ_IDLE_DELAY;
-	} else
+	if (cfqd->active_generation > cfqg->generation_num)
 		cfqg->vdisktime = st->min_vdisktime;
-
+	else
+		/* We assume that vdisktime was not modified when the task
+		   was off the service tree.
+		 */
+		cfqg->vdisktime = max_vdisktime(st->min_vdisktime,
+						cfqg->vdisktime);
 	__cfq_group_service_tree_add(st, cfqg);
 	st->total_weight += cfqg->weight;
 }
@@ -906,6 +906,9 @@ cfq_group_service_tree_del(struct cfq_data *cfqd, struct cfq_group *cfqg)
 	if (!RB_EMPTY_NODE(&cfqg->rb_node))
 		cfq_rb_erase(&cfqg->rb_node, st);
 	cfqg->saved_workload_slice = 0;
+	cfqg->generation_num = cfqd->active_generation;
+	if (RB_EMPTY_ROOT(&st->rb))
+		cfqd->active_generation++;
 	cfq_blkiocg_update_dequeue_stats(&cfqg->blkg, 1);
 }
 



* Re: [PATCH v3] Avoid preferential treatment of groups that aren't backlogged
  2011-02-14 18:01   ` [PATCH v3] " Chad Talbott
@ 2011-02-14 18:07     ` Vivek Goyal
  2011-02-18 21:48     ` Vivek Goyal
  1 sibling, 0 replies; 8+ messages in thread
From: Vivek Goyal @ 2011-02-14 18:07 UTC (permalink / raw)
  To: Chad Talbott; +Cc: jaxboe, guijianfeng, mrubin, teravest, jmoyer, linux-kernel

On Mon, Feb 14, 2011 at 10:01:11AM -0800, Chad Talbott wrote:
> Problem: If a group isn't backlogged, we remove it from the service
> tree. When it becomes active again, it gets either the minimum vtime
> of the tree or gets put at the "back of the line." That is true even
> when the group was idle for a very short time and consumed some IO
> time right before it became idle. If a group has a very small weight,
> it can end up using more disk time than its fair share. Conversely, if
> it has a very large weight, being put at the back of the vtime line is
> a large penalty which prevents it from consuming its share.
> 
> Solution: We solve the problem by assigning the group its old vtime if
> it has not been idle long enough. Otherwise we assign it the service
> tree's min vtime.
> 
> Complications: When an entire service tree becomes completely idle, we
> lose the vtime state, and all of the old vtime values become
> irrelevant. For example, consider the case where the service tree is
> idle and a brand new group sends IO. That group would have an old
> vtime value of zero, and the service tree's vtime would restart close
> to zero. In such a case, it would be unfair for the older groups to
> resume with the much higher old vtimes stored in them.
> 
> We solve that issue by keeping a generation number that counts the
> number of instances when the service tree becomes completely
> empty. The generation number is stored in each group too. If a group
> becomes backlogged after a service tree has been empty, we compare its
> stored generation number with the current service tree generation
> number, and discard the old vtime if the group generation number is
> stale.
> 
> The preemption specific code is taken care of automatically because we
> allow preemption checks after inserting a group back into the service
> tree and assigning it an appropriate vtime.
> 
> Signed-off-by: Chad Talbott <ctalbott@google.com>
> ---

Thanks Chad. This patch looks good to me.

Acked-by: Vivek Goyal <vgoyal@redhat.com>

Vivek

>  block/cfq-iosched.c |   25 ++++++++++++++-----------
>  1 files changed, 14 insertions(+), 11 deletions(-)
> 
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index 501ffdf..216d87b 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -178,6 +178,7 @@ struct cfq_group {
>  
>  	/* group service_tree key */
>  	u64 vdisktime;
> +	u64 generation_num;
>  	unsigned int weight;
>  
>  	/* number of cfqq currently on this group */
> @@ -300,6 +301,9 @@ struct cfq_data {
>  	/* List of cfq groups being managed on this device*/
>  	struct hlist_head cfqg_list;
>  	struct rcu_head rcu;
> +
> +	/* Generation number, counts service tree empty events */
> +	u64 active_generation;
>  };
>  
>  static struct cfq_group *cfq_get_next_cfqg(struct cfq_data *cfqd);
> @@ -873,18 +877,14 @@ cfq_group_service_tree_add(struct cfq_data *cfqd, struct cfq_group *cfqg)
>  	if (!RB_EMPTY_NODE(&cfqg->rb_node))
>  		return;
>  
> -	/*
> -	 * Currently put the group at the end. Later implement something
> -	 * so that groups get lesser vtime based on their weights, so that
> -	 * if group does not loose all if it was not continously backlogged.
> -	 */
> -	n = rb_last(&st->rb);
> -	if (n) {
> -		__cfqg = rb_entry_cfqg(n);
> -		cfqg->vdisktime = __cfqg->vdisktime + CFQ_IDLE_DELAY;
> -	} else
> +	if (cfqd->active_generation > cfqg->generation_num)
>  		cfqg->vdisktime = st->min_vdisktime;
> -
> +	else
> +		/* We assume that vdisktime was not modified when the task
> +		   was off the service tree.
> +		 */
> +		cfqg->vdisktime = max_vdisktime(st->min_vdisktime,
> +						cfqg->vdisktime);
>  	__cfq_group_service_tree_add(st, cfqg);
>  	st->total_weight += cfqg->weight;
>  }
> @@ -906,6 +906,9 @@ cfq_group_service_tree_del(struct cfq_data *cfqd, struct cfq_group *cfqg)
>  	if (!RB_EMPTY_NODE(&cfqg->rb_node))
>  		cfq_rb_erase(&cfqg->rb_node, st);
>  	cfqg->saved_workload_slice = 0;
> +	cfqg->generation_num = cfqd->active_generation;
> +	if (RB_EMPTY_ROOT(&st->rb))
> +		cfqd->active_generation++;
>  	cfq_blkiocg_update_dequeue_stats(&cfqg->blkg, 1);
>  }
>  


* Re: [PATCH v3] Avoid preferential treatment of groups that aren't backlogged
  2011-02-14 18:01   ` [PATCH v3] " Chad Talbott
  2011-02-14 18:07     ` Vivek Goyal
@ 2011-02-18 21:48     ` Vivek Goyal
  2011-02-18 22:16       ` Chad Talbott
  1 sibling, 1 reply; 8+ messages in thread
From: Vivek Goyal @ 2011-02-18 21:48 UTC (permalink / raw)
  To: Chad Talbott; +Cc: jaxboe, guijianfeng, mrubin, teravest, jmoyer, linux-kernel

On Mon, Feb 14, 2011 at 10:01:11AM -0800, Chad Talbott wrote:
> Problem: If a group isn't backlogged, we remove it from the service
> tree. When it becomes active again, it gets either the minimum vtime
> of the tree or gets put at the "back of the line." That is true even
> when the group was idle for a very short time and consumed some IO
> time right before it became idle. If a group has a very small weight,
> it can end up using more disk time than its fair share. Conversely, if
> it has a very large weight, being put at the back of the vtime line is
> a large penalty which prevents it from consuming its share.
> 
> Solution: We solve the problem by assigning the group its old vtime if
> it has not been idle long enough. Otherwise we assign it the service
> tree's min vtime.
> 
> Complications: When an entire service tree becomes completely idle, we
> lose the vtime state, and all of the old vtime values become
> irrelevant. For example, consider the case where the service tree is
> idle and a brand new group sends IO. That group would have an old
> vtime value of zero, and the service tree's vtime would restart close
> to zero. In such a case, it would be unfair for the older groups to
> resume with the much higher old vtimes stored in them.
> 
> We solve that issue by keeping a generation number that counts the
> number of instances when the service tree becomes completely
> empty. The generation number is stored in each group too. If a group
> becomes backlogged after a service tree has been empty, we compare its
> stored generation number with the current service tree generation
> number, and discard the old vtime if the group generation number is
> stale.
> 
> The preemption specific code is taken care of automatically because we
> allow preemption checks after inserting a group back into the service
> tree and assigning it an appropriate vtime.
> 
> Signed-off-by: Chad Talbott <ctalbott@google.com>
> ---

Chad, 

Got following compile warning with this patch.

  CC      block/cfq-iosched.o
block/cfq-iosched.c: In function ‘cfq_group_service_tree_add’:
block/cfq-iosched.c:874: warning: unused variable ‘n’
block/cfq-iosched.c:873: warning: unused variable ‘__cfqg’

Let's get rid of the unused variables. Maybe generate a v4 of the
patch.

Thanks
Vivek

>  block/cfq-iosched.c |   25 ++++++++++++++-----------
>  1 files changed, 14 insertions(+), 11 deletions(-)
> 
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index 501ffdf..216d87b 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -178,6 +178,7 @@ struct cfq_group {
>  
>  	/* group service_tree key */
>  	u64 vdisktime;
> +	u64 generation_num;
>  	unsigned int weight;
>  
>  	/* number of cfqq currently on this group */
> @@ -300,6 +301,9 @@ struct cfq_data {
>  	/* List of cfq groups being managed on this device*/
>  	struct hlist_head cfqg_list;
>  	struct rcu_head rcu;
> +
> +	/* Generation number, counts service tree empty events */
> +	u64 active_generation;
>  };
>  
>  static struct cfq_group *cfq_get_next_cfqg(struct cfq_data *cfqd);
> @@ -873,18 +877,14 @@ cfq_group_service_tree_add(struct cfq_data *cfqd, struct cfq_group *cfqg)
>  	if (!RB_EMPTY_NODE(&cfqg->rb_node))
>  		return;
>  
> -	/*
> -	 * Currently put the group at the end. Later implement something
> -	 * so that groups get lesser vtime based on their weights, so that
> -	 * if group does not loose all if it was not continously backlogged.
> -	 */
> -	n = rb_last(&st->rb);
> -	if (n) {
> -		__cfqg = rb_entry_cfqg(n);
> -		cfqg->vdisktime = __cfqg->vdisktime + CFQ_IDLE_DELAY;
> -	} else
> +	if (cfqd->active_generation > cfqg->generation_num)
>  		cfqg->vdisktime = st->min_vdisktime;
> -
> +	else
> +		/* We assume that vdisktime was not modified when the task
> +		   was off the service tree.
> +		 */
> +		cfqg->vdisktime = max_vdisktime(st->min_vdisktime,
> +						cfqg->vdisktime);
>  	__cfq_group_service_tree_add(st, cfqg);
>  	st->total_weight += cfqg->weight;
>  }
> @@ -906,6 +906,9 @@ cfq_group_service_tree_del(struct cfq_data *cfqd, struct cfq_group *cfqg)
>  	if (!RB_EMPTY_NODE(&cfqg->rb_node))
>  		cfq_rb_erase(&cfqg->rb_node, st);
>  	cfqg->saved_workload_slice = 0;
> +	cfqg->generation_num = cfqd->active_generation;
> +	if (RB_EMPTY_ROOT(&st->rb))
> +		cfqd->active_generation++;
>  	cfq_blkiocg_update_dequeue_stats(&cfqg->blkg, 1);
>  }
>  


* Re: [PATCH v3] Avoid preferential treatment of groups that aren't backlogged
  2011-02-18 21:48     ` Vivek Goyal
@ 2011-02-18 22:16       ` Chad Talbott
  2011-02-18 22:19         ` [PATCH v4] " Chad Talbott
  0 siblings, 1 reply; 8+ messages in thread
From: Chad Talbott @ 2011-02-18 22:16 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: jaxboe, guijianfeng, mrubin, teravest, jmoyer, linux-kernel

On Fri, Feb 18, 2011 at 1:48 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> Chad,
>
> Got following compile warning with this patch.
>
>  CC      block/cfq-iosched.o
> block/cfq-iosched.c: In function ‘cfq_group_service_tree_add’:
> block/cfq-iosched.c:874: warning: unused variable ‘n’
> block/cfq-iosched.c:873: warning: unused variable ‘__cfqg’
>
> Let's get rid of the unused variables. Maybe generate a v4 of the
> patch.

Good catch, thanks Vivek.  I'll send a new patch.

Chad


* [PATCH v4] Avoid preferential treatment of groups that aren't backlogged
  2011-02-18 22:16       ` Chad Talbott
@ 2011-02-18 22:19         ` Chad Talbott
  2011-02-21 15:32           ` Vivek Goyal
  0 siblings, 1 reply; 8+ messages in thread
From: Chad Talbott @ 2011-02-18 22:19 UTC (permalink / raw)
  To: jaxboe, vgoyal; +Cc: guijianfeng, mrubin, teravest, jmoyer, linux-kernel

Problem: If a group isn't backlogged, we remove it from the service
tree. When it becomes active again, it gets either the minimum vtime
of the tree or gets put at the "back of the line." That is true even
when the group was idle for a very short time and consumed some IO
time right before it became idle. If a group has a very small weight,
it can end up using more disk time than its fair share. Conversely, if
it has a very large weight, being put at the back of the vtime line is
a large penalty which prevents it from consuming its share.

Solution: We solve the problem by assigning the group its old vtime if
it has not been idle long enough. Otherwise we assign it the service
tree's min vtime.

Complications: When an entire service tree becomes completely idle, we
lose the vtime state, and all of the old vtime values become
irrelevant. For example, consider the case where the service tree is
idle and a brand new group sends IO. That group would have an old
vtime value of zero, and the service tree's vtime would restart close
to zero. In such a case, it would be unfair for the older groups to
resume with the much higher old vtimes stored in them.

We solve that issue by keeping a generation number that counts the
number of instances when the service tree becomes completely
empty. The generation number is stored in each group too. If a group
becomes backlogged after a service tree has been empty, we compare its
stored generation number with the current service tree generation
number, and discard the old vtime if the group generation number is
stale.

The preemption specific code is taken care of automatically because we
allow preemption checks after inserting a group back into the service
tree and assigning it an appropriate vtime.

Signed-off-by: Chad Talbott <ctalbott@google.com>
---
 block/cfq-iosched.c |   27 ++++++++++++++-------------
 1 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 501ffdf..8a1b82f 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -178,6 +178,7 @@ struct cfq_group {
 
 	/* group service_tree key */
 	u64 vdisktime;
+	u64 generation_num;
 	unsigned int weight;
 
 	/* number of cfqq currently on this group */
@@ -300,6 +301,9 @@ struct cfq_data {
 	/* List of cfq groups being managed on this device*/
 	struct hlist_head cfqg_list;
 	struct rcu_head rcu;
+
+	/* Generation number, counts service tree empty events */
+	u64 active_generation;
 };
 
 static struct cfq_group *cfq_get_next_cfqg(struct cfq_data *cfqd);
@@ -866,25 +870,19 @@ static void
 cfq_group_service_tree_add(struct cfq_data *cfqd, struct cfq_group *cfqg)
 {
 	struct cfq_rb_root *st = &cfqd->grp_service_tree;
-	struct cfq_group *__cfqg;
-	struct rb_node *n;
 
 	cfqg->nr_cfqq++;
 	if (!RB_EMPTY_NODE(&cfqg->rb_node))
 		return;
 
-	/*
-	 * Currently put the group at the end. Later implement something
-	 * so that groups get lesser vtime based on their weights, so that
-	 * if group does not loose all if it was not continously backlogged.
-	 */
-	n = rb_last(&st->rb);
-	if (n) {
-		__cfqg = rb_entry_cfqg(n);
-		cfqg->vdisktime = __cfqg->vdisktime + CFQ_IDLE_DELAY;
-	} else
+	if (cfqd->active_generation > cfqg->generation_num)
 		cfqg->vdisktime = st->min_vdisktime;
-
+	else
+		/* We assume that vdisktime was not modified when the task
+		   was off the service tree.
+		 */
+		cfqg->vdisktime = max_vdisktime(st->min_vdisktime,
+						cfqg->vdisktime);
 	__cfq_group_service_tree_add(st, cfqg);
 	st->total_weight += cfqg->weight;
 }
@@ -906,6 +904,9 @@ cfq_group_service_tree_del(struct cfq_data *cfqd, struct cfq_group *cfqg)
 	if (!RB_EMPTY_NODE(&cfqg->rb_node))
 		cfq_rb_erase(&cfqg->rb_node, st);
 	cfqg->saved_workload_slice = 0;
+	cfqg->generation_num = cfqd->active_generation;
+	if (RB_EMPTY_ROOT(&st->rb))
+		cfqd->active_generation++;
 	cfq_blkiocg_update_dequeue_stats(&cfqg->blkg, 1);
 }
 



* Re: [PATCH v4] Avoid preferential treatment of groups that aren't backlogged
  2011-02-18 22:19         ` [PATCH v4] " Chad Talbott
@ 2011-02-21 15:32           ` Vivek Goyal
  0 siblings, 0 replies; 8+ messages in thread
From: Vivek Goyal @ 2011-02-21 15:32 UTC (permalink / raw)
  To: Chad Talbott; +Cc: jaxboe, guijianfeng, mrubin, teravest, jmoyer, linux-kernel

On Fri, Feb 18, 2011 at 02:19:16PM -0800, Chad Talbott wrote:
> Problem: If a group isn't backlogged, we remove it from the service
> tree. When it becomes active again, it gets either the minimum vtime
> of the tree or gets put at the "back of the line." That is true even
> when the group was idle for a very short time and consumed some IO
> time right before it became idle. If a group has a very small weight,
> it can end up using more disk time than its fair share. Conversely, if
> it has a very large weight, being put at the back of the vtime line is
> a large penalty which prevents it from consuming its share.
> 
> Solution: We solve the problem by assigning the group its old vtime if
> it has not been idle long enough. Otherwise we assign it the service
> tree's min vtime.
> 
> Complications: When an entire service tree becomes completely idle, we
> lose the vtime state, and all of the old vtime values become
> irrelevant. For example, consider the case where the service tree is
> idle and a brand new group sends IO. That group would have an old
> vtime value of zero, and the service tree's vtime would restart close
> to zero. In such a case, it would be unfair for the older groups to
> resume with the much higher old vtimes stored in them.
> 
> We solve that issue by keeping a generation number that counts the
> number of instances when the service tree becomes completely
> empty. The generation number is stored in each group too. If a group
> becomes backlogged after a service tree has been empty, we compare its
> stored generation number with the current service tree generation
> number, and discard the old vtime if the group generation number is
> stale.
> 
> The preemption specific code is taken care of automatically because we
> allow preemption checks after inserting a group back into the service
> tree and assigning it an appropriate vtime.
> 
> Signed-off-by: Chad Talbott <ctalbott@google.com>

Looks good to me.

Acked-by: Vivek Goyal <vgoyal@redhat.com>

Vivek

> ---
>  block/cfq-iosched.c |   27 ++++++++++++++-------------
>  1 files changed, 14 insertions(+), 13 deletions(-)
> 
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index 501ffdf..8a1b82f 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -178,6 +178,7 @@ struct cfq_group {
>  
>  	/* group service_tree key */
>  	u64 vdisktime;
> +	u64 generation_num;
>  	unsigned int weight;
>  
>  	/* number of cfqq currently on this group */
> @@ -300,6 +301,9 @@ struct cfq_data {
>  	/* List of cfq groups being managed on this device*/
>  	struct hlist_head cfqg_list;
>  	struct rcu_head rcu;
> +
> +	/* Generation number, counts service tree empty events */
> +	u64 active_generation;
>  };
>  
>  static struct cfq_group *cfq_get_next_cfqg(struct cfq_data *cfqd);
> @@ -866,25 +870,19 @@ static void
>  cfq_group_service_tree_add(struct cfq_data *cfqd, struct cfq_group *cfqg)
>  {
>  	struct cfq_rb_root *st = &cfqd->grp_service_tree;
> -	struct cfq_group *__cfqg;
> -	struct rb_node *n;
>  
>  	cfqg->nr_cfqq++;
>  	if (!RB_EMPTY_NODE(&cfqg->rb_node))
>  		return;
>  
> -	/*
> -	 * Currently put the group at the end. Later implement something
> -	 * so that groups get lesser vtime based on their weights, so that
> -	 * if group does not loose all if it was not continously backlogged.
> -	 */
> -	n = rb_last(&st->rb);
> -	if (n) {
> -		__cfqg = rb_entry_cfqg(n);
> -		cfqg->vdisktime = __cfqg->vdisktime + CFQ_IDLE_DELAY;
> -	} else
> +	if (cfqd->active_generation > cfqg->generation_num)
>  		cfqg->vdisktime = st->min_vdisktime;
> -
> +	else
> +		/* We assume that vdisktime was not modified when the task
> +		   was off the service tree.
> +		 */
> +		cfqg->vdisktime = max_vdisktime(st->min_vdisktime,
> +						cfqg->vdisktime);
>  	__cfq_group_service_tree_add(st, cfqg);
>  	st->total_weight += cfqg->weight;
>  }
> @@ -906,6 +904,9 @@ cfq_group_service_tree_del(struct cfq_data *cfqd, struct cfq_group *cfqg)
>  	if (!RB_EMPTY_NODE(&cfqg->rb_node))
>  		cfq_rb_erase(&cfqg->rb_node, st);
>  	cfqg->saved_workload_slice = 0;
> +	cfqg->generation_num = cfqd->active_generation;
> +	if (RB_EMPTY_ROOT(&st->rb))
> +		cfqd->active_generation++;
>  	cfq_blkiocg_update_dequeue_stats(&cfqg->blkg, 1);
>  }
>  

