LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 0/2] incorrect cpumask behavior with CPUMASK_OFFSTACK
@ 2015-02-26  6:19 green
  2015-02-26  6:19 ` [PATCH 1/2] cpumask: Properly calculate cpumask values green
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: green @ 2015-02-26  6:19 UTC (permalink / raw)
  To: Rusty Russell, Andrew Morton, David S. Miller; +Cc: linux-kernel, Oleg Drokin

From: Oleg Drokin <green@linuxhacker.ru>

I just got a report today from Tyson Whitehead <twhitehead@gmail.com>
that Lustre crashes when CPUMASK_OFFSTACK is enabled.

A little investigation revealed that this code:
        cpumask_t                       mask;
...
        cpumask_copy(&mask, topology_thread_cpumask(0));
        weight = cpus_weight(mask);

that was supposed to calculate number of cpu siblings/partitions returns
a crazy high number over 3000 which is impossible as I only have
8 cpu cores on my system.

So after a bit of digging I found out that:
cpumask_copy only copies up to nr_cpumask_bits (actual number of cpus I
have in the system),
where as cpumask_weight actully tries to count bits up to NR_CPUS.

Not only calculating up to NR_CPUS is wasteful in this case, and
since we know how many cpus we have in the system - it only makes sense
to calculate only this much anyway, it's wrong because the copy only copied
8 bits to our variable and the rest of it is some random stack garbage.

So I propose two patches here, the first one I am more certain about -
operations that operate on current cpuset like cpus_weight, but also
cpus_empty, cpus_$LOGICALop cpus_$BINARYop are converted from NR_CPUS to
nr_cpumask_bits (this is ok when CONFIG_CPUMASK_OFFSTACK is not set as it's
then defined to NR_CPUS anyway).
I am leaving __cpus_setall __cpus_clear out of it as these two look like
they deal with entire set and it would be useful for them to operate on
all NR_CPUS bits for the case if more cpus are added later and such.

The second patch that I am not sure if we wnat, but it seems to be useful
until struct cpumask is fully dynamic is to convert what looks like
whole-set operations e.g. copies, namely:
cpumask_setall, cpumask_clear, cpumask_copy to always operate on NR_CPUS
bits to ensure there's no stale garbage left in the mask should the
cpu count increases later.

I checked the code and allocating cpumasks on stack is not all that 
uncommon in the code, so this should be a worthwhile fix.

Please consider.

Oleg Drokin (2):
  cpumask: Properly calculate cpumask values
  cpumask: make whole cpumask operations like copy to work with NR_CPUS
    bits

 include/linux/cpumask.h | 37 +++++++++++++++++++++----------------
 1 file changed, 21 insertions(+), 16 deletions(-)

-- 
2.1.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] cpumask: Properly calculate cpumask values
  2015-02-26  6:19 [PATCH 0/2] incorrect cpumask behavior with CPUMASK_OFFSTACK green
@ 2015-02-26  6:19 ` green
  2015-02-26  6:19 ` [PATCH 2/2] cpumask: make whole cpumask operations like copy to work with NR_CPUS bits green
  2015-02-27 11:46 ` [PATCH 0/2] incorrect cpumask behavior with CPUMASK_OFFSTACK Rusty Russell
  2 siblings, 0 replies; 6+ messages in thread
From: green @ 2015-02-26  6:19 UTC (permalink / raw)
  To: Rusty Russell, Andrew Morton, David S. Miller; +Cc: linux-kernel, Oleg Drokin

From: Oleg Drokin <green@linuxhacker.ru>

With CONFIG_CPUMASK_OFFSTACK enabled there seems to be some disparity
between theoretical maximum of CPUs in the system (NR_CPUS that is huge)
and the actual value that is calculated at runtime (nr_cpu_ids).

Functions like cpus_weight should only check up to nr_cpu_ids bits
in the cpu mask, as there's no point to go all the way to 8192 bits
of theoritically possibly CPUs.

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 include/linux/cpumask.h | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 086549a..f0599e1 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -902,21 +902,24 @@ static inline int __cpu_test_and_set(int cpu, cpumask_t *addr)
 	return test_and_set_bit(cpu, addr->bits);
 }
 
-#define cpus_and(dst, src1, src2) __cpus_and(&(dst), &(src1), &(src2), NR_CPUS)
+#define cpus_and(dst, src1, src2) __cpus_and(&(dst), &(src1), &(src2), \
+					     nr_cpumask_bits)
 static inline int __cpus_and(cpumask_t *dstp, const cpumask_t *src1p,
 					const cpumask_t *src2p, unsigned int nbits)
 {
 	return bitmap_and(dstp->bits, src1p->bits, src2p->bits, nbits);
 }
 
-#define cpus_or(dst, src1, src2) __cpus_or(&(dst), &(src1), &(src2), NR_CPUS)
+#define cpus_or(dst, src1, src2) __cpus_or(&(dst), &(src1), &(src2), \
+					   nr_cpumask_bits)
 static inline void __cpus_or(cpumask_t *dstp, const cpumask_t *src1p,
 					const cpumask_t *src2p, unsigned int nbits)
 {
 	bitmap_or(dstp->bits, src1p->bits, src2p->bits, nbits);
 }
 
-#define cpus_xor(dst, src1, src2) __cpus_xor(&(dst), &(src1), &(src2), NR_CPUS)
+#define cpus_xor(dst, src1, src2) __cpus_xor(&(dst), &(src1), &(src2), \
+					     nr_cpumask_bits)
 static inline void __cpus_xor(cpumask_t *dstp, const cpumask_t *src1p,
 					const cpumask_t *src2p, unsigned int nbits)
 {
@@ -924,48 +927,50 @@ static inline void __cpus_xor(cpumask_t *dstp, const cpumask_t *src1p,
 }
 
 #define cpus_andnot(dst, src1, src2) \
-				__cpus_andnot(&(dst), &(src1), &(src2), NR_CPUS)
+				__cpus_andnot(&(dst), &(src1), &(src2), \
+					      nr_cpumask_bits)
 static inline int __cpus_andnot(cpumask_t *dstp, const cpumask_t *src1p,
 					const cpumask_t *src2p, unsigned int nbits)
 {
 	return bitmap_andnot(dstp->bits, src1p->bits, src2p->bits, nbits);
 }
 
-#define cpus_equal(src1, src2) __cpus_equal(&(src1), &(src2), NR_CPUS)
+#define cpus_equal(src1, src2) __cpus_equal(&(src1), &(src2), nr_cpumask_bits)
 static inline int __cpus_equal(const cpumask_t *src1p,
 					const cpumask_t *src2p, unsigned int nbits)
 {
 	return bitmap_equal(src1p->bits, src2p->bits, nbits);
 }
 
-#define cpus_intersects(src1, src2) __cpus_intersects(&(src1), &(src2), NR_CPUS)
+#define cpus_intersects(src1, src2) __cpus_intersects(&(src1), &(src2), \
+						      nr_cpumask_bits)
 static inline int __cpus_intersects(const cpumask_t *src1p,
 					const cpumask_t *src2p, unsigned int nbits)
 {
 	return bitmap_intersects(src1p->bits, src2p->bits, nbits);
 }
 
-#define cpus_subset(src1, src2) __cpus_subset(&(src1), &(src2), NR_CPUS)
+#define cpus_subset(src1, src2) __cpus_subset(&(src1), &(src2), nr_cpumask_bits)
 static inline int __cpus_subset(const cpumask_t *src1p,
 					const cpumask_t *src2p, unsigned int nbits)
 {
 	return bitmap_subset(src1p->bits, src2p->bits, nbits);
 }
 
-#define cpus_empty(src) __cpus_empty(&(src), NR_CPUS)
+#define cpus_empty(src) __cpus_empty(&(src), nr_cpumask_bits)
 static inline int __cpus_empty(const cpumask_t *srcp, unsigned int nbits)
 {
 	return bitmap_empty(srcp->bits, nbits);
 }
 
-#define cpus_weight(cpumask) __cpus_weight(&(cpumask), NR_CPUS)
+#define cpus_weight(cpumask) __cpus_weight(&(cpumask), nr_cpumask_bits)
 static inline int __cpus_weight(const cpumask_t *srcp, unsigned int nbits)
 {
 	return bitmap_weight(srcp->bits, nbits);
 }
 
 #define cpus_shift_left(dst, src, n) \
-			__cpus_shift_left(&(dst), &(src), (n), NR_CPUS)
+			__cpus_shift_left(&(dst), &(src), (n), nr_cpumask_bits)
 static inline void __cpus_shift_left(cpumask_t *dstp,
 					const cpumask_t *srcp, int n, int nbits)
 {
-- 
2.1.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 2/2] cpumask: make whole cpumask operations like copy to work with NR_CPUS bits
  2015-02-26  6:19 [PATCH 0/2] incorrect cpumask behavior with CPUMASK_OFFSTACK green
  2015-02-26  6:19 ` [PATCH 1/2] cpumask: Properly calculate cpumask values green
@ 2015-02-26  6:19 ` green
  2015-02-27 11:46 ` [PATCH 0/2] incorrect cpumask behavior with CPUMASK_OFFSTACK Rusty Russell
  2 siblings, 0 replies; 6+ messages in thread
From: green @ 2015-02-26  6:19 UTC (permalink / raw)
  To: Rusty Russell, Andrew Morton, David S. Miller; +Cc: linux-kernel, Oleg Drokin

From: Oleg Drokin <green@linuxhacker.ru>

When we are doing things like cpumask_copy, and CONFIG_CPUMASK_OFFSTACK
is set, we only copy actual number of bits equal to number of CPUs
we have. But underlying allocations got NR_CPUS = 8192, so
if the cpumask is allocated on the stack or has other prefilled values
there's a lot of garbage left that might become exposed as more CPUs are
added into the system.
The patch converts such whole-mask functions:
cpumask_setall, cpumask_clear, cpumask_copy
to operate on the whole NR_CPUS value.

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 include/linux/cpumask.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index f0599e1..28a8bb3 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -324,21 +324,21 @@ static inline int cpumask_test_and_clear_cpu(int cpu, struct cpumask *cpumask)
 }
 
 /**
- * cpumask_setall - set all cpus (< nr_cpu_ids) in a cpumask
+ * cpumask_setall - set all cpus (< NR_CPUS) in a cpumask
  * @dstp: the cpumask pointer
  */
 static inline void cpumask_setall(struct cpumask *dstp)
 {
-	bitmap_fill(cpumask_bits(dstp), nr_cpumask_bits);
+	bitmap_fill(cpumask_bits(dstp), NR_CPUS);
 }
 
 /**
- * cpumask_clear - clear all cpus (< nr_cpu_ids) in a cpumask
+ * cpumask_clear - clear all cpus (< NR_CPUS) in a cpumask
  * @dstp: the cpumask pointer
  */
 static inline void cpumask_clear(struct cpumask *dstp)
 {
-	bitmap_zero(cpumask_bits(dstp), nr_cpumask_bits);
+	bitmap_zero(cpumask_bits(dstp), NR_CPUS);
 }
 
 /**
@@ -470,7 +470,7 @@ static inline bool cpumask_full(const struct cpumask *srcp)
 
 /**
  * cpumask_weight - Count of bits in *srcp
- * @srcp: the cpumask to count bits (< nr_cpu_ids) in.
+ * @srcp: the cpumask to count bits (< NR_CPUS) in.
  */
 static inline unsigned int cpumask_weight(const struct cpumask *srcp)
 {
@@ -511,7 +511,7 @@ static inline void cpumask_shift_left(struct cpumask *dstp,
 static inline void cpumask_copy(struct cpumask *dstp,
 				const struct cpumask *srcp)
 {
-	bitmap_copy(cpumask_bits(dstp), cpumask_bits(srcp), nr_cpumask_bits);
+	bitmap_copy(cpumask_bits(dstp), cpumask_bits(srcp), NR_CPUS);
 }
 
 /**
-- 
2.1.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/2] incorrect cpumask behavior with CPUMASK_OFFSTACK
  2015-02-26  6:19 [PATCH 0/2] incorrect cpumask behavior with CPUMASK_OFFSTACK green
  2015-02-26  6:19 ` [PATCH 1/2] cpumask: Properly calculate cpumask values green
  2015-02-26  6:19 ` [PATCH 2/2] cpumask: make whole cpumask operations like copy to work with NR_CPUS bits green
@ 2015-02-27 11:46 ` Rusty Russell
  2015-02-27 17:51   ` Oleg Drokin
  2 siblings, 1 reply; 6+ messages in thread
From: Rusty Russell @ 2015-02-27 11:46 UTC (permalink / raw)
  To: green, Andrew Morton, David S. Miller; +Cc: linux-kernel, Oleg Drokin

green@linuxhacker.ru writes:
> From: Oleg Drokin <green@linuxhacker.ru>
>
> I just got a report today from Tyson Whitehead <twhitehead@gmail.com>
> that Lustre crashes when CPUMASK_OFFSTACK is enabled.
>
> A little investigation revealed that this code:
>         cpumask_t                       mask;
> ...
>         cpumask_copy(&mask, topology_thread_cpumask(0));
>         weight = cpus_weight(mask);

Yes.  cpumask_weight should have been used here.  The old cpus_* are
deprecated.

> The second patch that I am not sure if we wnat, but it seems to be useful
> until struct cpumask is fully dynamic is to convert what looks like
> whole-set operations e.g. copies, namely:
> cpumask_setall, cpumask_clear, cpumask_copy to always operate on NR_CPUS
> bits to ensure there's no stale garbage left in the mask should the
> cpu count increases later.

You can't do this, because dynamically allocated cpumasks don't have
NR_CPUS bits.

Let's just kill all the cpus_ functions.  This wasn't done originally
because archs which didn't care about offline cpumasks didn't want the
churn.  In particular, they must not copy struct cpumask by assignment,
and fixing those is a fair bit of churn.

The following is the minimal fix:

Cheers,
Rusty.

CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS: set if CPUMASK_OFFSTACK.

Using these functions with offstack cpus is unsafe.  They use all NR_CPUS
bits, unstead of nr_cpumask_bits.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff --git a/lib/Kconfig b/lib/Kconfig
index 87da53bb1fef..51b4210f3da9 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -398,8 +398,7 @@ config CPUMASK_OFFSTACK
 	  stack overflow.
 
 config DISABLE_OBSOLETE_CPUMASK_FUNCTIONS
-       bool "Disable obsolete cpumask functions" if DEBUG_PER_CPU_MAPS
-       depends on BROKEN
+       bool "Disable obsolete cpumask functions" if CPUMASK_OFFSTACK
 
 config CPU_RMAP
 	bool

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/2] incorrect cpumask behavior with CPUMASK_OFFSTACK
  2015-02-27 11:46 ` [PATCH 0/2] incorrect cpumask behavior with CPUMASK_OFFSTACK Rusty Russell
@ 2015-02-27 17:51   ` Oleg Drokin
  2015-03-02 11:28     ` Rusty Russell
  0 siblings, 1 reply; 6+ messages in thread
From: Oleg Drokin @ 2015-02-27 17:51 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Andrew Morton, David S. Miller,
	<linux-kernel@vger.kernel.org> Mailing List

Hello!

On Feb 27, 2015, at 6:46 AM, Rusty Russell wrote:

> green@linuxhacker.ru writes:
>> From: Oleg Drokin <green@linuxhacker.ru>
>> 
>> I just got a report today from Tyson Whitehead <twhitehead@gmail.com>
>> that Lustre crashes when CPUMASK_OFFSTACK is enabled.
>> 
>> A little investigation revealed that this code:
>>        cpumask_t                       mask;
>> ...
>>        cpumask_copy(&mask, topology_thread_cpumask(0));
>>        weight = cpus_weight(mask);
> Yes.  cpumask_weight should have been used here.  The old cpus_* are
> deprecated.

Oh! I guess we missed the announcement.
I'll convert it over.

Should I also do a patch converting all other users and removing the deprecated
functions while I am at it?

>> The second patch that I am not sure if we wnat, but it seems to be useful
>> until struct cpumask is fully dynamic is to convert what looks like
>> whole-set operations e.g. copies, namely:
>> cpumask_setall, cpumask_clear, cpumask_copy to always operate on NR_CPUS
>> bits to ensure there's no stale garbage left in the mask should the
>> cpu count increases later.
> You can't do this, because dynamically allocated cpumasks don't have
> NR_CPUS bits.

Well, right now they certainly have. As in, it's a static define and we allocate
a bitmap to fit the (in my case) up to 8192 bits into such off-stack masks.

The concern is since number of cpus is not really a fixed variable, when you
do cpumask initialization, and then number of cpus grows, the end of the mask
could be garbage? Am I overthinking this and it's not really a problem?

> Let's just kill all the cpus_ functions.  This wasn't done originally
> because archs which didn't care about offline cpumasks didn't want the
> churn.  In particular, they must not copy struct cpumask by assignment,
> and fixing those is a fair bit of churn.

Ah, copy masks by assignment, I see.
Well, I guess we can eliminate the users outside of the affected arch trees
(I assume in x86 there would be no objections?) and perhaps add a warning to
checkpatch.pl?

Thanks!

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/2] incorrect cpumask behavior with CPUMASK_OFFSTACK
  2015-02-27 17:51   ` Oleg Drokin
@ 2015-03-02 11:28     ` Rusty Russell
  0 siblings, 0 replies; 6+ messages in thread
From: Rusty Russell @ 2015-03-02 11:28 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: Andrew Morton, David S. Miller,
	<linux-kernel@vger.kernel.org> Mailing List

Oleg Drokin <green@linuxhacker.ru> writes:
>>> The second patch that I am not sure if we wnat, but it seems to be useful
>>> until struct cpumask is fully dynamic is to convert what looks like
>>> whole-set operations e.g. copies, namely:
>>> cpumask_setall, cpumask_clear, cpumask_copy to always operate on NR_CPUS
>>> bits to ensure there's no stale garbage left in the mask should the
>>> cpu count increases later.
>> You can't do this, because dynamically allocated cpumasks don't have
>> NR_CPUS bits.
>
> Well, right now they certainly have. As in, it's a static define and we allocate
> a bitmap to fit the (in my case) up to 8192 bits into such off-stack masks.

You're right, we should fix that properly.  Right now, cpumask_size() has:

	/* FIXME: Once all cpumask assignments are eliminated, this
	 * can be nr_cpumask_bits */
	return BITS_TO_LONGS(NR_CPUS) * sizeof(long);

>> Let's just kill all the cpus_ functions.  This wasn't done originally
>> because archs which didn't care about offline cpumasks didn't want the
>> churn.  In particular, they must not copy struct cpumask by assignment,
>> and fixing those is a fair bit of churn.
>
> Ah, copy masks by assignment, I see.
> Well, I guess we can eliminate the users outside of the affected arch trees
> (I assume in x86 there would be no objections?) and perhaps add a warning to
> checkpatch.pl?

OK... I have done a sweep.  It's not *that* bad with spatch.

I'm going to remove all the old functions.  Expect a series RSN.

Cheers,
Rusty.


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-03-02 11:41 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-26  6:19 [PATCH 0/2] incorrect cpumask behavior with CPUMASK_OFFSTACK green
2015-02-26  6:19 ` [PATCH 1/2] cpumask: Properly calculate cpumask values green
2015-02-26  6:19 ` [PATCH 2/2] cpumask: make whole cpumask operations like copy to work with NR_CPUS bits green
2015-02-27 11:46 ` [PATCH 0/2] incorrect cpumask behavior with CPUMASK_OFFSTACK Rusty Russell
2015-02-27 17:51   ` Oleg Drokin
2015-03-02 11:28     ` Rusty Russell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).