LKML Archive on lore.kernel.org
* [Patch 1/2] cciss: fix for 2TB support
@ 2007-02-21 21:10 Mike Miller (OS Dev)
2007-02-22 3:14 ` Andrew Morton
0 siblings, 1 reply; 15+ messages in thread
From: Mike Miller (OS Dev) @ 2007-02-21 21:10 UTC (permalink / raw)
To: jens.axboe, akpm; +Cc: linux-kernel, linux-scsi, gregkh
Patch 1/2
This patch changes the way we determine whether a logical volume is larger than 2TB. The
original test looked for a total_size of 0: we used to add 1 to total_size, so our
read_capacity would return a size of 0 for >2TB logical volumes. We assumed we could
never have a logical volume of size 0, which seemed OK until we ran in a clustered
system. There, the backup node would see a size of 0 due to the reservation on the
drive. That caused the driver to switch to 16-byte CDBs, which are not supported on
older controllers. After that everything was broken.
It may seem petty, but I don't see the value in trying to determine whether the LBA is
beyond the 2TB boundary. That's why, once we switch, we use 16-byte CDBs for all
read/write operations.
Please consider this for inclusion.
Signed-off-by: Mike Miller <mike.miller@hp.com>
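The shape of the fix can be sketched in plain C (a minimal model of the logic only, not driver code; `volume_needs_16byte_cdbs` and `nr_blocks` are illustrative names, not cciss functions):

```c
#include <stdint.h>

/* READ CAPACITY(10) reports the LAST LBA and saturates at 0xFFFFFFFF
 * for volumes larger than 2TB. */
static int volume_needs_16byte_cdbs(uint32_t reported_last_lba)
{
    /* Old test: add 1 first, then check for 0.  But a cluster backup node
     * legitimately sees 0 behind a drive reservation, so 0 is ambiguous.
     * New test: check the saturated last LBA directly, before the +1. */
    return reported_last_lba == 0xFFFFFFFFu;
}

/* The +1 (last LBA -> block count) now happens after the test, in a
 * 64-bit type so the >2TB case cannot wrap back to 0. */
static uint64_t nr_blocks(uint64_t reported_last_lba)
{
    return reported_last_lba + 1;
}
```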
------------------------------------------------------------------------------------------
diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
index 05dfe35..916aab0 100644
--- a/drivers/block/cciss.c
+++ b/drivers/block/cciss.c
@@ -1291,13 +1291,19 @@ static void cciss_update_drive_info(int
if (inq_buff == NULL)
goto mem_msg;
+ /* testing to see if 16-byte CDBs are already being used */
+ if (h->cciss_read == CCISS_READ_16) {
+ cciss_read_capacity_16(h->ctlr, drv_index, 1,
+ &total_size, &block_size);
+ goto geo_inq;
+ }
+
cciss_read_capacity(ctlr, drv_index, 1,
&total_size, &block_size);
- /* total size = last LBA + 1 */
- /* FFFFFFFF + 1 = 0, cannot have a logical volume of size 0 */
- /* so we assume this volume this must be >2TB in size */
- if (total_size == (__u32) 0) {
+ /* if read_capacity returns all F's this volume is >2TB in size */
+ /* so we switch to 16-byte CDB's for all read/write ops */
+ if (total_size == 0xFFFFFFFF) {
cciss_read_capacity_16(ctlr, drv_index, 1,
&total_size, &block_size);
h->cciss_read = CCISS_READ_16;
@@ -1306,6 +1312,7 @@ static void cciss_update_drive_info(int
h->cciss_read = CCISS_READ_10;
h->cciss_write = CCISS_WRITE_10;
}
+geo_inq:
cciss_geometry_inquiry(ctlr, drv_index, 1, total_size, block_size,
inq_buff, &h->drv[drv_index]);
@@ -1917,13 +1924,14 @@ static void cciss_geometry_inquiry(int c
drv->raid_level = inq_buff->data_byte[8];
}
drv->block_size = block_size;
- drv->nr_blocks = total_size;
+ drv->nr_blocks = total_size + 1;
t = drv->heads * drv->sectors;
if (t > 1) {
- unsigned rem = sector_div(total_size, t);
+ sector_t real_size = total_size + 1;
+ unsigned long rem = sector_div(real_size, t);
if (rem)
- total_size++;
- drv->cylinders = total_size;
+ real_size++;
+ drv->cylinders = real_size;
}
} else { /* Get geometry failed */
printk(KERN_WARNING "cciss: reading geometry failed\n");
@@ -1953,16 +1961,16 @@ cciss_read_capacity(int ctlr, int logvol
ctlr, buf, sizeof(ReadCapdata_struct),
1, logvol, 0, NULL, TYPE_CMD);
if (return_code == IO_OK) {
- *total_size = be32_to_cpu(*(__u32 *) buf->total_size)+1;
+ *total_size = be32_to_cpu(*(__u32 *) buf->total_size);
*block_size = be32_to_cpu(*(__u32 *) buf->block_size);
} else { /* read capacity command failed */
printk(KERN_WARNING "cciss: read capacity failed\n");
*total_size = 0;
*block_size = BLOCK_SIZE;
}
- if (*total_size != (__u32) 0)
+ if (*total_size != 0)
printk(KERN_INFO " blocks= %llu block_size= %d\n",
- (unsigned long long)*total_size, *block_size);
+ (unsigned long long)*total_size+1, *block_size);
kfree(buf);
return;
}
@@ -1989,7 +1997,7 @@ cciss_read_capacity_16(int ctlr, int log
1, logvol, 0, NULL, TYPE_CMD);
}
if (return_code == IO_OK) {
- *total_size = be64_to_cpu(*(__u64 *) buf->total_size)+1;
+ *total_size = be64_to_cpu(*(__u64 *) buf->total_size);
*block_size = be32_to_cpu(*(__u32 *) buf->block_size);
} else { /* read capacity command failed */
printk(KERN_WARNING "cciss: read capacity failed\n");
@@ -1997,7 +2005,7 @@ cciss_read_capacity_16(int ctlr, int log
*block_size = BLOCK_SIZE;
}
printk(KERN_INFO " blocks= %llu block_size= %d\n",
- (unsigned long long)*total_size, *block_size);
+ (unsigned long long)*total_size+1, *block_size);
kfree(buf);
return;
}
@@ -3119,8 +3127,9 @@ #endif /* CCISS_DEBUG */
}
cciss_read_capacity(cntl_num, i, 0, &total_size, &block_size);
- /* total_size = last LBA + 1 */
- if(total_size == (__u32) 0) {
+ /* If read_capacity returns all F's the logical is >2TB */
+ /* so we switch to 16-byte CDBs for all read/write ops */
+ if(total_size == 0xFFFFFFFF) {
cciss_read_capacity_16(cntl_num, i, 0,
&total_size, &block_size);
hba[cntl_num]->cciss_read = CCISS_READ_16;
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Patch 1/2] cciss: fix for 2TB support
2007-02-21 21:10 [Patch 1/2] cciss: fix for 2TB support Mike Miller (OS Dev)
@ 2007-02-22 3:14 ` Andrew Morton
2007-02-22 7:31 ` [PATCH] Speedup divides by cpu_power in scheduler Eric Dumazet
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Andrew Morton @ 2007-02-22 3:14 UTC (permalink / raw)
To: mike.miller
Cc: Mike Miller (OS Dev), jens.axboe, linux-kernel, linux-scsi, gregkh
On Wed, 21 Feb 2007 15:10:39 -0600 "Mike Miller (OS Dev)" <mikem@beardog.cca.cpqcorp.net> wrote:
> Patch 1/2
>
> This patch changes the way we determine if a logical volume is larger than 2TB. The
> original test looked for a total_size of 0. Originally we added 1 to the total_size.
> That would make our read_capacity return size 0 for >2TB lv's. We assumed that we
> could not have a lv size of 0 so it seemed OK until we were in a clustered system. The
> backup node would see a size of 0 due to the reservation on the drive. That caused
> the driver to switch to 16-byte CDB's which are not supported on older controllers.
> After that everything was broken.
> It may seem petty but I don't see the value in trying to determine if the LBA is
> beyond the 2TB boundary. That's why when we switch we use 16-byte CDB's for all
> read/write operations.
> Please consider this for inclusion.
>
> ...
>
> + if (total_size == 0xFFFFFFFF) {
I seem to remember having already questioned this. total_size is sector_t, which
can be either 32-bit or 64-bit. Are you sure that comparison works as
intended in both cases?
> + if(total_size == 0xFFFFFFFF) {
> cciss_read_capacity_16(cntl_num, i, 0,
> &total_size, &block_size);
> hba[cntl_num]->cciss_read = CCISS_READ_16;
Here too.
* [PATCH] Speedup divides by cpu_power in scheduler
2007-02-22 3:14 ` Andrew Morton
@ 2007-02-22 7:31 ` Eric Dumazet
2007-02-22 7:56 ` Ingo Molnar
2007-02-22 8:19 ` [PATCH, take 2] " Eric Dumazet
2007-02-22 16:51 ` [Patch 1/2] cciss: fix for 2TB support Mike Miller (OS Dev)
2007-02-22 20:18 ` Mike Miller (OS Dev)
2 siblings, 2 replies; 15+ messages in thread
From: Eric Dumazet @ 2007-02-22 7:31 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1348 bytes --]
I noticed expensive divides done in try_to_wakeup() and find_busiest_group()
on a dual dual-core Opteron machine (4 cores total), moderately loaded (15,000
context switches per second).
oprofile numbers:
CPU: AMD64 processors, speed 2600.05 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
mask of 0x00 (No unit mask) count 50000
samples % symbol name
...
613914 1.0498 try_to_wake_up
834 0.0013 :ffffffff80227ae1: div %rcx
77513 0.1191 :ffffffff80227ae4: mov %rax,%r11
608893 1.0413 find_busiest_group
1841 0.0031 :ffffffff802260bf: div %rdi
140109 0.2394 :ffffffff802260c2: test %sil,%sil
Some of these divides can use the reciprocal divides we introduced some time
ago (currently used in slab, AFAIK).
We can assume a load will fit in a 32-bit number, because with
SCHED_LOAD_SCALE=128 the theoretical limit is still 33554432.
When/if we reach this limit one day, CPUs will probably have a fast hardware
divide and we can drop the reciprocal divide trick.
I did not convert the divide in cpu_avg_load_per_task(), because tracking
nr_running changes may not be worth it. We could use a static table of 32
reciprocal values, but it would add a conditional branch and a table lookup.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
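The reciprocal trick replaces a hardware divide with a multiply and a shift; a user-space sketch of the scheme (mirroring the shape of include/linux/reciprocal_div.h at the time, in standard C rather than kernel types) looks like:

```c
#include <stdint.h>

/* Precompute R = ceil(2^32 / k) once, while k is effectively constant. */
static uint32_t reciprocal_value(uint32_t k)
{
    uint64_t val = (1ULL << 32) + (k - 1);
    return (uint32_t)(val / k);
}

/* a / k becomes a 32x32->64 multiply and a shift; sufficient for the
 * 32-bit loads the scheduler feeds it (see the 33554432 bound above). */
static uint32_t reciprocal_divide(uint32_t a, uint32_t R)
{
    return (uint32_t)(((uint64_t)a * R) >> 32);
}
```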
[-- Attachment #2: sched_use_reciprocal_divide.patch --]
[-- Type: text/plain, Size: 4359 bytes --]
--- linux-2.6.21-rc1/include/linux/sched.h 2007-02-21 21:08:32.000000000 +0100
+++ linux-2.6.21-rc1-ed/include/linux/sched.h 2007-02-22 08:53:26.000000000 +0100
@@ -669,7 +669,12 @@ struct sched_group {
* CPU power of this group, SCHED_LOAD_SCALE being max power for a
* single CPU. This is read only (except for setup, hotplug CPU).
*/
- unsigned long cpu_power;
+ unsigned int cpu_power;
+ /*
+ * reciprocal value of cpu_power to avoid expensive divides
+ * (see include/linux/reciprocal_div.h)
+ */
+ u32 reciprocal_cpu_power;
};
struct sched_domain {
--- linux-2.6.21-rc1/kernel/sched.c.orig 2007-02-21 21:10:54.000000000 +0100
+++ linux-2.6.21-rc1-ed/kernel/sched.c 2007-02-22 08:46:56.000000000 +0100
@@ -52,6 +52,7 @@
#include <linux/tsacct_kern.h>
#include <linux/kprobes.h>
#include <linux/delayacct.h>
+#include <linux/reciprocal_div.h>
#include <asm/tlb.h>
#include <asm/unistd.h>
@@ -182,6 +183,26 @@ static unsigned int static_prio_timeslic
}
/*
+ * Divide a load by a sched group cpu_power : (load / sg->cpu_power)
+ * Since cpu_power is a 'constant', we can use a reciprocal divide.
+ */
+static inline u32 sg_div_cpu_power(const struct sched_group *sg, u32 load)
+{
+ return reciprocal_divide(load, sg->reciprocal_cpu_power);
+}
+/*
+ * Each time a sched group cpu_power is changed,
+ * we must compute its reciprocal value
+ */
+static inline void sg_inc_cpu_power(struct sched_group *sg, u32 val)
+{
+ sg->cpu_power += val;
+ BUG_ON(sg->cpu_power == 0);
+ sg->reciprocal_cpu_power = reciprocal_value(sg->cpu_power);
+}
+
+
+/*
* task_timeslice() scales user-nice values [ -20 ... 0 ... 19 ]
* to time slice values: [800ms ... 100ms ... 5ms]
*
@@ -1241,7 +1262,8 @@ find_idlest_group(struct sched_domain *s
}
/* Adjust by relative CPU power of the group */
- avg_load = (avg_load * SCHED_LOAD_SCALE) / group->cpu_power;
+ avg_load = sg_div_cpu_power(group,
+ avg_load * SCHED_LOAD_SCALE);
if (local_group) {
this_load = avg_load;
@@ -2355,7 +2377,8 @@ find_busiest_group(struct sched_domain *
total_pwr += group->cpu_power;
/* Adjust by relative CPU power of the group */
- avg_load = (avg_load * SCHED_LOAD_SCALE) / group->cpu_power;
+ avg_load = sg_div_cpu_power(group,
+ avg_load * SCHED_LOAD_SCALE);
group_capacity = group->cpu_power / SCHED_LOAD_SCALE;
@@ -2510,8 +2533,8 @@ small_imbalance:
pwr_now /= SCHED_LOAD_SCALE;
/* Amount of load we'd subtract */
- tmp = busiest_load_per_task * SCHED_LOAD_SCALE /
- busiest->cpu_power;
+ tmp = sg_div_cpu_power(busiest,
+ busiest_load_per_task * SCHED_LOAD_SCALE);
if (max_load > tmp)
pwr_move += busiest->cpu_power *
min(busiest_load_per_task, max_load - tmp);
@@ -2519,10 +2542,11 @@ small_imbalance:
/* Amount of load we'd add */
if (max_load * busiest->cpu_power <
busiest_load_per_task * SCHED_LOAD_SCALE)
- tmp = max_load * busiest->cpu_power / this->cpu_power;
+ tmp = sg_div_cpu_power(this,
+ max_load * busiest->cpu_power);
else
- tmp = busiest_load_per_task * SCHED_LOAD_SCALE /
- this->cpu_power;
+ tmp = sg_div_cpu_power(this,
+ busiest_load_per_task * SCHED_LOAD_SCALE);
pwr_move += this->cpu_power *
min(this_load_per_task, this_load + tmp);
pwr_move /= SCHED_LOAD_SCALE;
@@ -6352,7 +6376,7 @@ next_sg:
continue;
}
- sg->cpu_power += sd->groups->cpu_power;
+ sg_inc_cpu_power(sg, sd->groups->cpu_power);
}
sg = sg->next;
if (sg != group_head)
@@ -6427,6 +6451,8 @@ static void init_sched_groups_power(int
child = sd->child;
+ sd->groups->cpu_power = 0;
+
/*
* For perf policy, if the groups in child domain share resources
* (for example cores sharing some portions of the cache hierarchy
@@ -6437,18 +6463,16 @@ static void init_sched_groups_power(int
if (!child || (!(sd->flags & SD_POWERSAVINGS_BALANCE) &&
(child->flags &
(SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES)))) {
- sd->groups->cpu_power = SCHED_LOAD_SCALE;
+ sg_inc_cpu_power(sd->groups, SCHED_LOAD_SCALE);
return;
}
- sd->groups->cpu_power = 0;
-
/*
* add cpu_power of each child group to this groups cpu_power
*/
group = child->groups;
do {
- sd->groups->cpu_power += group->cpu_power;
+ sg_inc_cpu_power(sd->groups, group->cpu_power);
group = group->next;
} while (group != child->groups);
}
* Re: [PATCH] Speedup divides by cpu_power in scheduler
2007-02-22 7:31 ` [PATCH] Speedup divides by cpu_power in scheduler Eric Dumazet
@ 2007-02-22 7:56 ` Ingo Molnar
2007-02-22 8:19 ` [PATCH, take 2] " Eric Dumazet
1 sibling, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2007-02-22 7:56 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Andrew Morton, linux-kernel
* Eric Dumazet <dada1@cosmosbay.com> wrote:
> I noticed expensive divides done in try_to_wakeup() and
> find_busiest_group() on a bi dual core Opteron machine (total of 4
> cores), moderatly loaded (15.000 context switch per second)
>
> oprofile numbers :
nice patch! Ack for -mm testing:
Acked-by: Ingo Molnar <mingo@elte.hu>
one general suggestion: could you rename ->cpu_power to ->__cpu_power?
That makes it perfectly clear that this field's semantics have changed
and that it should never be manipulated directly without also changing
->reciprocal_cpu_power, and will also flag any out of tree code
trivially.
> + * Divide a load by a sched group cpu_power : (load / sg->cpu_power)
> + * Since cpu_power is a 'constant', we can use a reciprocal divide.
> + */
> +static inline u32 sg_div_cpu_power(const struct sched_group *sg, u32 load)
> +{
> + return reciprocal_divide(load, sg->reciprocal_cpu_power);
> +}
> +/*
> + * Each time a sched group cpu_power is changed,
> + * we must compute its reciprocal value
> + */
> +static inline void sg_inc_cpu_power(struct sched_group *sg, u32 val)
> +{
> + sg->cpu_power += val;
> + BUG_ON(sg->cpu_power == 0);
> + sg->reciprocal_cpu_power = reciprocal_value(sg->cpu_power);
> +}
Could you remove the BUG_ON()? It will most likely cause these functions not
to be inlined if CONFIG_CC_OPTIMIZE_FOR_SIZE=y and CONFIG_FORCED_INLINING is
disabled (a popular combination in distro kernels; it reduces the kernel's
size by over 30%). And it's not like we'd be able to overlook a divide-by-zero
crash in reciprocal_value() anyway, if cpu_power were to be zero ;-)
Ingo
* Re: [PATCH, take 2] Speedup divides by cpu_power in scheduler
2007-02-22 8:19 ` [PATCH, take 2] " Eric Dumazet
@ 2007-02-22 8:19 ` Ingo Molnar
0 siblings, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2007-02-22 8:19 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Andrew Morton, linux-kernel
* Eric Dumazet <dada1@cosmosbay.com> wrote:
> Ingo suggested to rename cpu_power to __cpu_power to make clear it
> should not be modified without changing its reciprocal value too.
thanks,
Acked-by: Ingo Molnar <mingo@elte.hu>
> I did not convert the divide in cpu_avg_load_per_task(), because
> tracking nr_running changes may be not worth it ? We could use a
> static table of 32 reciprocal values but it would add a conditional
> branch and table lookup.
Not worth it, I think. Let's wait for it to show up in an oprofile (if
ever).
Ingo
* [PATCH, take 2] Speedup divides by cpu_power in scheduler
2007-02-22 7:31 ` [PATCH] Speedup divides by cpu_power in scheduler Eric Dumazet
2007-02-22 7:56 ` Ingo Molnar
@ 2007-02-22 8:19 ` Eric Dumazet
2007-02-22 8:19 ` Ingo Molnar
1 sibling, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2007-02-22 8:19 UTC (permalink / raw)
To: Ingo Molnar, Andrew Morton; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1484 bytes --]
I noticed expensive divides done in try_to_wakeup() and find_busiest_group()
on a dual dual-core Opteron machine (4 cores total), moderately loaded (15,000
context switches per second).
oprofile numbers:
CPU: AMD64 processors, speed 2600.05 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
mask of 0x00 (No unit mask) count 50000
samples % symbol name
...
613914 1.0498 try_to_wake_up
834 0.0013 :ffffffff80227ae1: div %rcx
77513 0.1191 :ffffffff80227ae4: mov %rax,%r11
608893 1.0413 find_busiest_group
1841 0.0031 :ffffffff802260bf: div %rdi
140109 0.2394 :ffffffff802260c2: test %sil,%sil
Some of these divides can use the reciprocal divides we introduced some time
ago (currently used in slab, AFAIK).
We can assume a load will fit in a 32-bit number, because with
SCHED_LOAD_SCALE=128 the theoretical limit is still 33554432.
When/if we reach this limit one day, CPUs will probably have a fast hardware
divide and we can drop the reciprocal divide trick.
Ingo suggested renaming cpu_power to __cpu_power to make clear it should not
be modified without changing its reciprocal value too.
I did not convert the divide in cpu_avg_load_per_task(), because tracking
nr_running changes may not be worth it. We could use a static table of 32
reciprocal values, but it would add a conditional branch and a table lookup.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
[-- Attachment #2: sched_use_reciprocal_divide.patch --]
[-- Type: text/plain, Size: 6320 bytes --]
--- linux-2.6.21-rc1/include/linux/sched.h 2007-02-21 21:08:32.000000000 +0100
+++ linux-2.6.21-rc1-ed/include/linux/sched.h 2007-02-22 10:12:00.000000000 +0100
@@ -668,8 +668,14 @@ struct sched_group {
/*
* CPU power of this group, SCHED_LOAD_SCALE being max power for a
* single CPU. This is read only (except for setup, hotplug CPU).
+ * Note : Never change cpu_power without recompute its reciprocal
*/
- unsigned long cpu_power;
+ unsigned int __cpu_power;
+ /*
+ * reciprocal value of cpu_power to avoid expensive divides
+ * (see include/linux/reciprocal_div.h)
+ */
+ u32 reciprocal_cpu_power;
};
struct sched_domain {
--- linux-2.6.21-rc1/kernel/sched.c 2007-02-21 21:10:54.000000000 +0100
+++ linux-2.6.21-rc1-ed/kernel/sched.c 2007-02-22 10:12:00.000000000 +0100
@@ -52,6 +52,7 @@
#include <linux/tsacct_kern.h>
#include <linux/kprobes.h>
#include <linux/delayacct.h>
+#include <linux/reciprocal_div.h>
#include <asm/tlb.h>
#include <asm/unistd.h>
@@ -182,6 +183,25 @@ static unsigned int static_prio_timeslic
}
/*
+ * Divide a load by a sched group cpu_power : (load / sg->__cpu_power)
+ * Since cpu_power is a 'constant', we can use a reciprocal divide.
+ */
+static inline u32 sg_div_cpu_power(const struct sched_group *sg, u32 load)
+{
+ return reciprocal_divide(load, sg->reciprocal_cpu_power);
+}
+/*
+ * Each time a sched group cpu_power is changed,
+ * we must compute its reciprocal value
+ */
+static inline void sg_inc_cpu_power(struct sched_group *sg, u32 val)
+{
+ sg->__cpu_power += val;
+ sg->reciprocal_cpu_power = reciprocal_value(sg->__cpu_power);
+}
+
+
+/*
* task_timeslice() scales user-nice values [ -20 ... 0 ... 19 ]
* to time slice values: [800ms ... 100ms ... 5ms]
*
@@ -1241,7 +1261,8 @@ find_idlest_group(struct sched_domain *s
}
/* Adjust by relative CPU power of the group */
- avg_load = (avg_load * SCHED_LOAD_SCALE) / group->cpu_power;
+ avg_load = sg_div_cpu_power(group,
+ avg_load * SCHED_LOAD_SCALE);
if (local_group) {
this_load = avg_load;
@@ -2352,12 +2373,13 @@ find_busiest_group(struct sched_domain *
}
total_load += avg_load;
- total_pwr += group->cpu_power;
+ total_pwr += group->__cpu_power;
/* Adjust by relative CPU power of the group */
- avg_load = (avg_load * SCHED_LOAD_SCALE) / group->cpu_power;
+ avg_load = sg_div_cpu_power(group,
+ avg_load * SCHED_LOAD_SCALE);
- group_capacity = group->cpu_power / SCHED_LOAD_SCALE;
+ group_capacity = group->__cpu_power / SCHED_LOAD_SCALE;
if (local_group) {
this_load = avg_load;
@@ -2468,8 +2490,8 @@ group_next:
max_pull = min(max_load - avg_load, max_load - busiest_load_per_task);
/* How much load to actually move to equalise the imbalance */
- *imbalance = min(max_pull * busiest->cpu_power,
- (avg_load - this_load) * this->cpu_power)
+ *imbalance = min(max_pull * busiest->__cpu_power,
+ (avg_load - this_load) * this->__cpu_power)
/ SCHED_LOAD_SCALE;
/*
@@ -2503,27 +2525,28 @@ small_imbalance:
* moving them.
*/
- pwr_now += busiest->cpu_power *
+ pwr_now += busiest->__cpu_power *
min(busiest_load_per_task, max_load);
- pwr_now += this->cpu_power *
+ pwr_now += this->__cpu_power *
min(this_load_per_task, this_load);
pwr_now /= SCHED_LOAD_SCALE;
/* Amount of load we'd subtract */
- tmp = busiest_load_per_task * SCHED_LOAD_SCALE /
- busiest->cpu_power;
+ tmp = sg_div_cpu_power(busiest,
+ busiest_load_per_task * SCHED_LOAD_SCALE);
if (max_load > tmp)
- pwr_move += busiest->cpu_power *
+ pwr_move += busiest->__cpu_power *
min(busiest_load_per_task, max_load - tmp);
/* Amount of load we'd add */
- if (max_load * busiest->cpu_power <
+ if (max_load * busiest->__cpu_power <
busiest_load_per_task * SCHED_LOAD_SCALE)
- tmp = max_load * busiest->cpu_power / this->cpu_power;
+ tmp = sg_div_cpu_power(this,
+ max_load * busiest->__cpu_power);
else
- tmp = busiest_load_per_task * SCHED_LOAD_SCALE /
- this->cpu_power;
- pwr_move += this->cpu_power *
+ tmp = sg_div_cpu_power(this,
+ busiest_load_per_task * SCHED_LOAD_SCALE);
+ pwr_move += this->__cpu_power *
min(this_load_per_task, this_load + tmp);
pwr_move /= SCHED_LOAD_SCALE;
@@ -5486,7 +5509,7 @@ static void sched_domain_debug(struct sc
break;
}
- if (!group->cpu_power) {
+ if (!group->__cpu_power) {
printk("\n");
printk(KERN_ERR "ERROR: domain->cpu_power not "
"set\n");
@@ -5663,7 +5686,7 @@ init_sched_build_groups(cpumask_t span,
continue;
sg->cpumask = CPU_MASK_NONE;
- sg->cpu_power = 0;
+ sg->__cpu_power = 0;
for_each_cpu_mask(j, span) {
if (group_fn(j, cpu_map, NULL) != group)
@@ -6352,7 +6375,7 @@ next_sg:
continue;
}
- sg->cpu_power += sd->groups->cpu_power;
+ sg_inc_cpu_power(sg, sd->groups->__cpu_power);
}
sg = sg->next;
if (sg != group_head)
@@ -6427,6 +6450,8 @@ static void init_sched_groups_power(int
child = sd->child;
+ sd->groups->__cpu_power = 0;
+
/*
* For perf policy, if the groups in child domain share resources
* (for example cores sharing some portions of the cache hierarchy
@@ -6437,18 +6462,16 @@ static void init_sched_groups_power(int
if (!child || (!(sd->flags & SD_POWERSAVINGS_BALANCE) &&
(child->flags &
(SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES)))) {
- sd->groups->cpu_power = SCHED_LOAD_SCALE;
+ sg_inc_cpu_power(sd->groups, SCHED_LOAD_SCALE);
return;
}
- sd->groups->cpu_power = 0;
-
/*
* add cpu_power of each child group to this groups cpu_power
*/
group = child->groups;
do {
- sd->groups->cpu_power += group->cpu_power;
+ sg_inc_cpu_power(sd->groups, group->__cpu_power);
group = group->next;
} while (group != child->groups);
}
@@ -6608,7 +6631,7 @@ static int build_sched_domains(const cpu
sd = &per_cpu(node_domains, j);
sd->groups = sg;
}
- sg->cpu_power = 0;
+ sg->__cpu_power = 0;
sg->cpumask = nodemask;
sg->next = sg;
cpus_or(covered, covered, nodemask);
@@ -6636,7 +6659,7 @@ static int build_sched_domains(const cpu
"Can not alloc domain group for node %d\n", j);
goto error;
}
- sg->cpu_power = 0;
+ sg->__cpu_power = 0;
sg->cpumask = tmp;
sg->next = prev->next;
cpus_or(covered, covered, tmp);
* Re: [Patch 1/2] cciss: fix for 2TB support
2007-02-22 3:14 ` Andrew Morton
2007-02-22 7:31 ` [PATCH] Speedup divides by cpu_power in scheduler Eric Dumazet
@ 2007-02-22 16:51 ` Mike Miller (OS Dev)
2007-02-22 21:24 ` Andrew Morton
2007-02-22 20:18 ` Mike Miller (OS Dev)
2 siblings, 1 reply; 15+ messages in thread
From: Mike Miller (OS Dev) @ 2007-02-22 16:51 UTC (permalink / raw)
To: Andrew Morton; +Cc: mike.miller, jens.axboe, linux-kernel, linux-scsi, gregkh
On Wed, Feb 21, 2007 at 07:14:27PM -0800, Andrew Morton wrote:
> On Wed, 21 Feb 2007 15:10:39 -0600 "Mike Miller (OS Dev)" <mikem@beardog.cca.cpqcorp.net> wrote:
>
> > Patch 1/2
> >
> > This patch changes the way we determine if a logical volume is larger than 2TB. The
> > original test looked for a total_size of 0. Originally we added 1 to the total_size.
> > That would make our read_capacity return size 0 for >2TB lv's. We assumed that we
> > could not have a lv size of 0 so it seemed OK until we were in a clustered system. The
> > backup node would see a size of 0 due to the reservation on the drive. That caused
> > the driver to switch to 16-byte CDB's which are not supported on older controllers.
> > After that everything was broken.
> > It may seem petty but I don't see the value in trying to determine if the LBA is
> > beyond the 2TB boundary. That's why when we switch we use 16-byte CDB's for all
> > read/write operations.
> > Please consider this for inclusion.
> >
> > ...
> >
> > + if (total_size == 0xFFFFFFFF) {
>
> I seem to remember having already questioned this. total_size is sector_t, which
> can be either 32-bit or 64-bit. Are you sure that comparison works as
> intended in both cases?
>
>
> > + if(total_size == 0xFFFFFFFF) {
> > cciss_read_capacity_16(cntl_num, i, 0,
> > &total_size, &block_size);
> > hba[cntl_num]->cciss_read = CCISS_READ_16;
>
> Here too.
It has worked in all of the configs I've tested. Should I change it from sector_t to a
__u64? I have not tested all possible configs.
-- mikem
* Re: [Patch 1/2] cciss: fix for 2TB support
2007-02-22 3:14 ` Andrew Morton
2007-02-22 7:31 ` [PATCH] Speedup divides by cpu_power in scheduler Eric Dumazet
2007-02-22 16:51 ` [Patch 1/2] cciss: fix for 2TB support Mike Miller (OS Dev)
@ 2007-02-22 20:18 ` Mike Miller (OS Dev)
2007-02-22 21:22 ` Miller, Mike (OS Dev)
2 siblings, 1 reply; 15+ messages in thread
From: Mike Miller (OS Dev) @ 2007-02-22 20:18 UTC (permalink / raw)
To: Andrew Morton; +Cc: mike.miller, jens.axboe, linux-kernel, linux-scsi, gregkh
On Wed, Feb 21, 2007 at 07:14:27PM -0800, Andrew Morton wrote:
> >
> > + if (total_size == 0xFFFFFFFF) {
>
> I seem to remember having already questioned this. total_size is sector_t, which
> can be either 32-bit or 64-bit. Are you sure that comparison works as
> intended in both cases?
>
>
> > + if(total_size == 0xFFFFFFFF) {
> > cciss_read_capacity_16(cntl_num, i, 0,
> > &total_size, &block_size);
> > hba[cntl_num]->cciss_read = CCISS_READ_16;
>
> Here too.
Andrew,
Using this test program and changing the type of x to int, long, and long long,
both signed and unsigned, the comparison always worked on x86, x86_64, and ia64.
It looks to me like the comparison will always do what we expect, unless you see
some other problem.
#include <stdio.h>
int main(int argc, char *argv[])
{
unsigned long long x;
x = 0x00000000ffffffff;
/* %llu matches unsigned long long, %zu matches size_t */
printf("x = %llu, sizeof(x) = %zu\n", x, sizeof(x));
if (x == 0xffffffff)
printf("equal\n");
else
printf("not equal\n");
return 0;
}
-- mikem
* RE: [Patch 1/2] cciss: fix for 2TB support
2007-02-22 20:18 ` Mike Miller (OS Dev)
@ 2007-02-22 21:22 ` Miller, Mike (OS Dev)
0 siblings, 0 replies; 15+ messages in thread
From: Miller, Mike (OS Dev) @ 2007-02-22 21:22 UTC (permalink / raw)
To: Mike Miller (OS Dev), Andrew Morton
Cc: jens.axboe, linux-kernel, linux-scsi, gregkh
> -----Original Message-----
> From: Mike Miller (OS Dev) [mailto:mikem@beardog.cca.cpqcorp.net]
>
> Andrew,
> Using this test program and changing the type of x to int,
> long, long long signed and unsigned the comparison always
> worked on x86, x86_64, and ia64. It looks to me like the
> comparsion will always do what we expect. Unless you see some
> other problem.
>
>
> #include <stdio.h>
>
> int main(int argc, char *argv[])
> {
> unsigned long long x;
>
> x = 0x00000000ffffffff;
>
> printf(sizeof(x) == 8 ?
> "x = %lld, sizeof(x) = %d\n" :
> "x = %ld, sizeof(x) = %d\n", x, sizeof(x));
> if (x == 0xffffffff)
> printf("equal\n");
> else
> printf("not equal\n");
>
> }
>
> -- mikem
>
BTW: I also tested with x set to 8 f's, 16 f's, and 8 zeros followed by 8 f's, as shown.
* Re: [Patch 1/2] cciss: fix for 2TB support
2007-02-22 16:51 ` [Patch 1/2] cciss: fix for 2TB support Mike Miller (OS Dev)
@ 2007-02-22 21:24 ` Andrew Morton
2007-02-22 21:41 ` James Bottomley
0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2007-02-22 21:24 UTC (permalink / raw)
To: Mike Miller (OS Dev)
Cc: mike.miller, jens.axboe, linux-kernel, linux-scsi, gregkh
> On Thu, 22 Feb 2007 10:51:23 -0600 "Mike Miller (OS Dev)" <mikem@beardog.cca.cpqcorp.net> wrote:
> On Wed, Feb 21, 2007 at 07:14:27PM -0800, Andrew Morton wrote:
> > On Wed, 21 Feb 2007 15:10:39 -0600 "Mike Miller (OS Dev)" <mikem@beardog.cca.cpqcorp.net> wrote:
> >
> > > Patch 1/2
> > > + if (total_size == 0xFFFFFFFF) {
> >
> > I seem to remember having already questioned this. total_size is sector_t, which
> > can be either 32-bit or 64-bit. Are you sure that comparison works as
> > intended in both cases?
> >
> >
> > > + if(total_size == 0xFFFFFFFF) {
> > > cciss_read_capacity_16(cntl_num, i, 0,
> > > &total_size, &block_size);
> > > hba[cntl_num]->cciss_read = CCISS_READ_16;
> >
> > Here too.
> It has worked in all of the configs I've tested. Should I change it from sector_t to a
> __64? I have not tested all possible configs.
>
I'd suggest using -1: that just works.
* Re: [Patch 1/2] cciss: fix for 2TB support
2007-02-22 21:24 ` Andrew Morton
@ 2007-02-22 21:41 ` James Bottomley
2007-02-22 22:02 ` Mike Miller (OS Dev)
0 siblings, 1 reply; 15+ messages in thread
From: James Bottomley @ 2007-02-22 21:41 UTC (permalink / raw)
To: Andrew Morton
Cc: Mike Miller (OS Dev),
mike.miller, jens.axboe, linux-kernel, linux-scsi, gregkh
On Thu, 2007-02-22 at 13:24 -0800, Andrew Morton wrote:
> > On Thu, 22 Feb 2007 10:51:23 -0600 "Mike Miller (OS Dev)" <mikem@beardog.cca.cpqcorp.net> wrote:
> > On Wed, Feb 21, 2007 at 07:14:27PM -0800, Andrew Morton wrote:
> > > On Wed, 21 Feb 2007 15:10:39 -0600 "Mike Miller (OS Dev)" <mikem@beardog.cca.cpqcorp.net> wrote:
> > >
> > > > Patch 1/2
> > > > + if (total_size == 0xFFFFFFFF) {
> > >
> > > I seem to remember having already questioned this. total_size is sector_t, which
> > > can be either 32-bit or 64-bit. Are you sure that comparison works as
> > > intended in both cases?
> > >
> > >
> > > > + if(total_size == 0xFFFFFFFF) {
> > > > cciss_read_capacity_16(cntl_num, i, 0,
> > > > &total_size, &block_size);
> > > > hba[cntl_num]->cciss_read = CCISS_READ_16;
> > >
> > > Here too.
> > It has worked in all of the configs I've tested. Should I change it from sector_t to a
> > __64? I have not tested all possible configs.
> >
>
> I'd suggest using -1: that just works.
Actually, no, that won't work.
This is a SCSI heuristic for determining when to use the 16-byte version
of the READ CAPACITY command. The 10-byte command can only return 32
bits of information (in sectors, so it covers up to 2TB).
The heuristic requirement is that if the size is exactly 0xffffffff you
should try the 16-byte command (which can return 64 bits of
information). If that fails, you assume the 0xffffffff is a real
size; otherwise, you assume it was truncated and take the real result
from the 16-byte command.
You can see a far more elaborate version of this in operation in
sd.c:sd_read_capacity().
The only thing I'd suggest is to use 0xFFFFFFFFULL as the constant to
prevent sign-extension issues.
James
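The fallback James describes, using his suggested unsigned-suffixed constant, can be sketched as follows (a hedged model only; `choose_read_capacity` and its parameters are illustrative, not the cciss or sd.c interfaces):

```c
#include <stdint.h>

#define RC10_SATURATED 0xFFFFFFFFULL  /* ULL suffix: no sign-extension surprises */

/* lba10 is the last LBA from the 10-byte command; have_16/lba16 model
 * whether the 16-byte command succeeds and what it reports.  Returns the
 * CDB size chosen and writes the trusted last LBA to *out. */
static int choose_read_capacity(uint64_t lba10, int have_16, uint64_t lba16,
                                uint64_t *out)
{
    if (lba10 == RC10_SATURATED && have_16) {
        *out = lba16;        /* 10-byte result was truncated: trust 16-byte */
        return 16;
    }
    *out = lba10;            /* real size, or 16-byte command unsupported */
    return 10;
}
```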
* Re: [Patch 1/2] cciss: fix for 2TB support
2007-02-22 21:41 ` James Bottomley
@ 2007-02-22 22:02 ` Mike Miller (OS Dev)
2007-02-22 22:06 ` James Bottomley
0 siblings, 1 reply; 15+ messages in thread
From: Mike Miller (OS Dev) @ 2007-02-22 22:02 UTC (permalink / raw)
To: James Bottomley
Cc: Andrew Morton, mike.miller, jens.axboe, linux-kernel, linux-scsi, gregkh
On Thu, Feb 22, 2007 at 03:41:24PM -0600, James Bottomley wrote:
> On Thu, 2007-02-22 at 13:24 -0800, Andrew Morton wrote:
> > > On Thu, 22 Feb 2007 10:51:23 -0600 "Mike Miller (OS Dev)" <mikem@beardog.cca.cpqcorp.net> wrote:
> > > On Wed, Feb 21, 2007 at 07:14:27PM -0800, Andrew Morton wrote:
> > > > On Wed, 21 Feb 2007 15:10:39 -0600 "Mike Miller (OS Dev)" <mikem@beardog.cca.cpqcorp.net> wrote:
> > > >
> > > > > Patch 1/2
> > > > > + if (total_size == 0xFFFFFFFF) {
> > > >
> > > > I seem to remember having already questioned this. total_size is sector_t, which
> > > > can be either 32-bit or 64-bit. Are you sure that comparison works as
> > > > intended in both cases?
> > > >
> > > >
> > > > > + if(total_size == 0xFFFFFFFF) {
> > > > > cciss_read_capacity_16(cntl_num, i, 0,
> > > > > &total_size, &block_size);
> > > > > hba[cntl_num]->cciss_read = CCISS_READ_16;
> > > >
> > > > Here too.
> > > It has worked in all of the configs I've tested. Should I change it from sector_t to a
> > > __u64? I have not tested all possible configs.
> > >
> >
> > I'd suggest using -1: that just works.
>
> Actually, no, that won't work.
>
> This is a SCSI heuristic for determining when to use the 16 byte version
> of the read capacity command. The 10 byte command can only return 32
> bits of information (this is in sectors, so it returns up to 2TB of
> bytes).
>
> The heuristic requirement is that if the size is exactly 0xffffffff then
> you should try the 16 byte command (which can return 64 bits of
> information). If that fails then you assume the 0xffffffff is a real
> size; otherwise, you assume it was truncated and take the real result
> from the 16 byte command.
>
> You can see a far more elaborate version of this in operation in
> sd.c:sd_read_capacity().
>
> The only thing I'd suggest is to use 0xFFFFFFFFULL as the constant to
> prevent sign extension issues.
>
> James
>
>
Will this patch for my patch work for now?
diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
index 1abf1f5..a1f1d9f 100644
--- a/drivers/block/cciss.c
+++ b/drivers/block/cciss.c
@@ -1303,7 +1303,7 @@ static void cciss_update_drive_info(int
/* if read_capacity returns all F's this volume is >2TB in size */
/* so we switch to 16-byte CDB's for all read/write ops */
- if (total_size == 0xFFFFFFFF) {
+ if (total_size == 0xFFFFFFFFULL) {
cciss_read_capacity_16(ctlr, drv_index, 1,
&total_size, &block_size);
h->cciss_read = CCISS_READ_16;
@@ -3129,7 +3129,7 @@ #endif /* CCISS_DEBUG */
/* If read_capacity returns all F's the logical is >2TB */
/* so we switch to 16-byte CDBs for all read/write ops */
- if(total_size == 0xFFFFFFFF) {
+ if(total_size == 0xFFFFFFFFULL) {
cciss_read_capacity_16(cntl_num, i, 0,
&total_size, &block_size);
hba[cntl_num]->cciss_read = CCISS_READ_16;
* Re: [Patch 1/2] cciss: fix for 2TB support
2007-02-22 22:02 ` Mike Miller (OS Dev)
@ 2007-02-22 22:06 ` James Bottomley
2007-02-23 20:52 ` Mike Miller (OS Dev)
0 siblings, 1 reply; 15+ messages in thread
From: James Bottomley @ 2007-02-22 22:06 UTC (permalink / raw)
To: Mike Miller (OS Dev)
Cc: Andrew Morton, mike.miller, jens.axboe, linux-kernel, linux-scsi, gregkh
On Thu, 2007-02-22 at 16:02 -0600, Mike Miller (OS Dev) wrote:
> Will this patch for my patch work for now?
Yes, I think that should be fine ... it's only a theoretical worry; at
the moment sector_t is unsigned ... but just in case.
James
* Re: [Patch 1/2] cciss: fix for 2TB support
2007-02-22 22:06 ` James Bottomley
@ 2007-02-23 20:52 ` Mike Miller (OS Dev)
2007-02-24 6:35 ` Andrew Morton
0 siblings, 1 reply; 15+ messages in thread
From: Mike Miller (OS Dev) @ 2007-02-23 20:52 UTC (permalink / raw)
To: James Bottomley
Cc: Andrew Morton, mike.miller, jens.axboe, linux-kernel, linux-scsi, gregkh
On Thu, Feb 22, 2007 at 04:06:41PM -0600, James Bottomley wrote:
> On Thu, 2007-02-22 at 16:02 -0600, Mike Miller (OS Dev) wrote:
> > Will this patch for my patch work for now?
>
> Yes, I think that should be fine ... it's only a theoretical worry; at
> the moment sector_t is unsigned ... but just in case.
>
> James
>
>
Andrew,
Are you waiting for a new patch from me? Or is my patch's patch sufficient?
-- mikem
* Re: [Patch 1/2] cciss: fix for 2TB support
2007-02-23 20:52 ` Mike Miller (OS Dev)
@ 2007-02-24 6:35 ` Andrew Morton
0 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2007-02-24 6:35 UTC (permalink / raw)
To: Mike Miller (OS Dev)
Cc: James.Bottomley, mike.miller, jens.axboe, linux-kernel,
linux-scsi, gregkh
> On Fri, 23 Feb 2007 14:52:29 -0600 "Mike Miller (OS Dev)" <mikem@beardog.cca.cpqcorp.net> wrote:
> On Thu, Feb 22, 2007 at 04:06:41PM -0600, James Bottomley wrote:
> > On Thu, 2007-02-22 at 16:02 -0600, Mike Miller (OS Dev) wrote:
> > > Will this patch for my patch work for now?
> >
> > Yes, I think that should be fine ... it's only a theoretical worry; at
> > the moment sector_t is unsigned ... but just in case.
> >
> > James
> >
> >
> Andrew,
> Are you waiting for a new patch from me? Or is my patch's patch sufficient?
>
It looked OK. But I'm travelling at present, will get back into things
late next week.
end of thread, other threads:[~2007-02-24 6:39 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-21 21:10 [Patch 1/2] cciss: fix for 2TB support Mike Miller (OS Dev)
2007-02-22 3:14 ` Andrew Morton
2007-02-22 7:31 ` [PATCH] Speedup divides by cpu_power in scheduler Eric Dumazet
2007-02-22 7:56 ` Ingo Molnar
2007-02-22 8:19 ` [PATCH, take 2] " Eric Dumazet
2007-02-22 8:19 ` Ingo Molnar
2007-02-22 16:51 ` [Patch 1/2] cciss: fix for 2TB support Mike Miller (OS Dev)
2007-02-22 21:24 ` Andrew Morton
2007-02-22 21:41 ` James Bottomley
2007-02-22 22:02 ` Mike Miller (OS Dev)
2007-02-22 22:06 ` James Bottomley
2007-02-23 20:52 ` Mike Miller (OS Dev)
2007-02-24 6:35 ` Andrew Morton
2007-02-22 20:18 ` Mike Miller (OS Dev)
2007-02-22 21:22 ` Miller, Mike (OS Dev)