LKML Archive on lore.kernel.org
* [PATCH] random: always use batched entropy for get_random_u{32,64}

From: Jason A. Donenfeld @ 2020-02-16 16:18 UTC
To: tytso, linux-kernel
Cc: Jason A. Donenfeld, Greg Kroah-Hartman

It turns out that RDRAND is pretty slow. Comparing these two
constructions:

  for (i = 0; i < CHACHA_BLOCK_SIZE; i += sizeof(ret))
          arch_get_random_long(&ret);

and

  long buf[CHACHA_BLOCK_SIZE / sizeof(long)];
  extract_crng((u8 *)buf);

it amortizes out to 352 cycles per long for the top one and 107 cycles
per long for the bottom one, on Coffee Lake Refresh, Intel Core i9-9880H.

And importantly, the top one has the drawback of not benefiting from the
real rng, whereas the bottom one has all the nice benefits of using our
own chacha rng. As get_random_u{32,64} gets used in more places (perhaps
beyond what it was originally intended for when it was introduced as
get_random_{int,long} back in the md5 monstrosity era), it seems like it
might be a good thing to strengthen its posture a tiny bit. Doing this
should only be stronger and not any weaker because that pool is already
initialized with a bunch of rdrand data (when available). This way, we
get the benefits of the hardware rng as well as our own rng.

Another benefit of this is that we no longer hit pitfalls of the recent
stream of AMD bugs in RDRAND. One often used code pattern for various
things is:

  do {
          val = get_random_u32();
  } while (hash_table_contains_key(val));

That recent AMD bug rendered that pattern useless, whereas we're really
very certain that chacha20 output will give pretty distributed numbers,
no matter what.

So, this simplification seems better both from a security perspective
and from a performance perspective.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/char/random.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index c7f9584de2c8..037fdb182b4d 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -2166,15 +2166,6 @@ u64 get_random_u64(void)
 	struct batched_entropy *batch;
 	static void *previous;
 
-#if BITS_PER_LONG == 64
-	if (arch_get_random_long((unsigned long *)&ret))
-		return ret;
-#else
-	if (arch_get_random_long((unsigned long *)&ret) &&
-	    arch_get_random_long((unsigned long *)&ret + 1))
-		return ret;
-#endif
-
 	warn_unseeded_randomness(&previous);
 
 	batch = raw_cpu_ptr(&batched_entropy_u64);
@@ -2199,9 +2190,6 @@ u32 get_random_u32(void)
 	struct batched_entropy *batch;
 	static void *previous;
 
-	if (arch_get_random_int(&ret))
-		return ret;
-
 	warn_unseeded_randomness(&previous);
 
 	batch = raw_cpu_ptr(&batched_entropy_u32);
-- 
2.25.0
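[Editor's note: a minimal sketch of how such a cycle comparison could be taken, assuming it is compiled inside drivers/char/random.c so that extract_crng() is visible. measure_batch_cost() is a hypothetical helper, not part of the patch, and the numbers quoted above come from the author's own measurements, not this snippet.]

	#include <linux/timex.h>	/* get_cycles() */
	#include <crypto/chacha.h>	/* CHACHA_BLOCK_SIZE */

	static void measure_batch_cost(void)
	{
		long buf[CHACHA_BLOCK_SIZE / sizeof(long)];
		unsigned long ret, t0, t1, t2;
		int i;

		t0 = get_cycles();
		for (i = 0; i < CHACHA_BLOCK_SIZE; i += sizeof(ret))
			arch_get_random_long(&ret);	/* one RDRAND per long */
		t1 = get_cycles();
		extract_crng((u8 *)buf);		/* one chacha20 block covers them all */
		t2 = get_cycles();

		pr_info("rdrand: %lu cycles/long, chacha: %lu cycles/long\n",
			(t1 - t0) / (CHACHA_BLOCK_SIZE / sizeof(long)),
			(t2 - t1) / (CHACHA_BLOCK_SIZE / sizeof(long)));
	}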
* Re: [PATCH] random: always use batched entropy for get_random_u{32,64}

From: Greg Kroah-Hartman @ 2020-02-16 18:23 UTC
To: Jason A. Donenfeld
Cc: tytso, linux-kernel

On Sun, Feb 16, 2020 at 05:18:36PM +0100, Jason A. Donenfeld wrote:
> It turns out that RDRAND is pretty slow. Comparing these two
> constructions:
>
>   for (i = 0; i < CHACHA_BLOCK_SIZE; i += sizeof(ret))
>           arch_get_random_long(&ret);
>
> and
>
>   long buf[CHACHA_BLOCK_SIZE / sizeof(long)];
>   extract_crng((u8 *)buf);
>
> it amortizes out to 352 cycles per long for the top one and 107 cycles
> per long for the bottom one, on Coffee Lake Refresh, Intel Core i9-9880H.
>
> And importantly, the top one has the drawback of not benefiting from the
> real rng, whereas the bottom one has all the nice benefits of using our
> own chacha rng. As get_random_u{32,64} gets used in more places (perhaps
> beyond what it was originally intended for when it was introduced as
> get_random_{int,long} back in the md5 monstrosity era), it seems like it
> might be a good thing to strengthen its posture a tiny bit. Doing this
> should only be stronger and not any weaker because that pool is already
> initialized with a bunch of rdrand data (when available). This way, we
> get the benefits of the hardware rng as well as our own rng.
>
> Another benefit of this is that we no longer hit pitfalls of the recent
> stream of AMD bugs in RDRAND. One often used code pattern for various
> things is:
>
>   do {
>           val = get_random_u32();
>   } while (hash_table_contains_key(val));
>
> That recent AMD bug rendered that pattern useless, whereas we're really
> very certain that chacha20 output will give pretty distributed numbers,
> no matter what.
>
> So, this simplification seems better both from a security perspective
> and from a performance perspective.
>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
>  drivers/char/random.c | 12 ------------
>  1 file changed, 12 deletions(-)

Looks good to me, thanks for doing this:

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Re: [PATCH] random: always use batched entropy for get_random_u{32,64}

From: Tony Luck @ 2020-02-20 22:20 UTC
To: Greg Kroah-Hartman
Cc: Jason A. Donenfeld, Ted Ts'o, Linux Kernel Mailing List

On Sun, Feb 16, 2020 at 10:24 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> >  drivers/char/random.c | 12 ------------
> >  1 file changed, 12 deletions(-)
>
> Looks good to me, thanks for doing this:
>
> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Perhaps this also needs to update the comment above these functions,
specifically the bit that says "The quality of the random number is
either as good as RDRAND" ... since you are no longer pulling from
RDRAND, it isn't true anymore.

-Tony
* Re: [PATCH] random: always use batched entropy for get_random_u{32,64}

From: Tony Luck @ 2020-02-20 22:29 UTC
To: Greg Kroah-Hartman
Cc: Jason A. Donenfeld, Ted Ts'o, Linux Kernel Mailing List

Also ... what's the deal with a spin_lock on a per-cpu structure?

	batch = raw_cpu_ptr(&batched_entropy_u64);
	spin_lock_irqsave(&batch->batch_lock, flags);
	if (batch->position % ARRAY_SIZE(batch->entropy_u64) == 0) {
		extract_crng((u8 *)batch->entropy_u64);
		batch->position = 0;
	}
	ret = batch->entropy_u64[batch->position++];
	spin_unlock_irqrestore(&batch->batch_lock, flags);

Could we just disable interrupts and pre-emption around the entropy
extraction?

-Tony
* Re: [PATCH] random: always use batched entropy for get_random_u{32,64}

From: Jason A. Donenfeld @ 2020-02-21 20:08 UTC
To: Tony Luck
Cc: Greg Kroah-Hartman, Ted Ts'o, Linux Kernel Mailing List

On Thu, Feb 20, 2020 at 11:29 PM Tony Luck <tony.luck@gmail.com> wrote:
>
> Also ... what's the deal with a spin_lock on a per-cpu structure?
>
>	batch = raw_cpu_ptr(&batched_entropy_u64);
>	spin_lock_irqsave(&batch->batch_lock, flags);
>	if (batch->position % ARRAY_SIZE(batch->entropy_u64) == 0) {
>		extract_crng((u8 *)batch->entropy_u64);
>		batch->position = 0;
>	}
>	ret = batch->entropy_u64[batch->position++];
>	spin_unlock_irqrestore(&batch->batch_lock, flags);
>
> Could we just disable interrupts and pre-emption around the entropy
> extraction?

Probably, yes... We can address this in a separate patch.
* Re: [PATCH] random: always use batched entropy for get_random_u{32,64}

From: Theodore Y. Ts'o @ 2020-02-22  0:41 UTC
To: Jason A. Donenfeld
Cc: Tony Luck, Greg Kroah-Hartman, Linux Kernel Mailing List

On Fri, Feb 21, 2020 at 09:08:19PM +0100, Jason A. Donenfeld wrote:
> On Thu, Feb 20, 2020 at 11:29 PM Tony Luck <tony.luck@gmail.com> wrote:
> >
> > Also ... what's the deal with a spin_lock on a per-cpu structure?
> >
> >	batch = raw_cpu_ptr(&batched_entropy_u64);
> >	spin_lock_irqsave(&batch->batch_lock, flags);
> >	if (batch->position % ARRAY_SIZE(batch->entropy_u64) == 0) {
> >		extract_crng((u8 *)batch->entropy_u64);
> >		batch->position = 0;
> >	}
> >	ret = batch->entropy_u64[batch->position++];
> >	spin_unlock_irqrestore(&batch->batch_lock, flags);
> >
> > Could we just disable interrupts and pre-emption around the entropy
> > extraction?
>
> Probably, yes... We can address this in a separate patch.

No, we can't; take a look at invalidate_batched_entropy(), where we
need to invalidate all of the per-cpu batched entropy from a single CPU
after we have initialized the CRNG.

Since most of the time after CRNG initialization, the spinlock for
each CPU will be on that CPU's cacheline, the time to take and release
the spinlock is not going to be material.

						- Ted
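[Editor's note: for reference, the cross-CPU invalidation Ted is pointing at looks roughly like this in the random.c of that era. It is reconstructed from the removed lines of the diff later in this thread, so treat it as a paraphrase rather than an authoritative copy. Each CPU's batch_lock is taken from whichever CPU happens to initialize the CRNG, which is why disabling irqs/preemption on the local CPU alone would not be enough.]

	static void invalidate_batched_entropy(void)
	{
		int cpu;
		unsigned long flags;

		for_each_possible_cpu(cpu) {
			struct batched_entropy *batched_entropy;

			/* Taking another CPU's lock: that CPU's local
			 * irq/preempt state offers no protection here. */
			batched_entropy = per_cpu_ptr(&batched_entropy_u32, cpu);
			spin_lock_irqsave(&batched_entropy->batch_lock, flags);
			batched_entropy->position = 0;
			spin_unlock(&batched_entropy->batch_lock);

			batched_entropy = per_cpu_ptr(&batched_entropy_u64, cpu);
			spin_lock(&batched_entropy->batch_lock);
			batched_entropy->position = 0;
			spin_unlock_irqrestore(&batched_entropy->batch_lock, flags);
		}
	}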
* Re: [PATCH] random: always use batched entropy for get_random_u{32,64}

From: Jason A. Donenfeld @ 2020-02-22  9:59 UTC
To: Theodore Y. Ts'o
Cc: Tony Luck, Greg Kroah-Hartman, Linux Kernel Mailing List

Hi Ted,

On 2/22/20, Theodore Y. Ts'o <tytso@mit.edu> wrote:
> No, we can't; take a look at invalidate_batched_entropy(), where we
> need to invalidate all of the per-cpu batched entropy from a single CPU
> after we have initialized the CRNG.
>
> Since most of the time after CRNG initialization, the spinlock for
> each CPU will be on that CPU's cacheline, the time to take and release
> the spinlock is not going to be material.

Ah, you're right. I hadn't looked in detail before, hence mentioning
"probably" and deferring to a new patch. Thanks for looking into it.

Regarding this patch here, can you apply the v2 that I posted to your
random.git tree?

Thanks,
Jason
* Re: [PATCH] random: always use batched entropy for get_random_u{32,64}

From: Luck, Tony @ 2020-02-24 20:41 UTC
To: Theodore Y. Ts'o
Cc: Jason A. Donenfeld, Tony Luck, Greg Kroah-Hartman, Linux Kernel Mailing List

On Fri, Feb 21, 2020 at 07:41:33PM -0500, Theodore Y. Ts'o wrote:
> On Fri, Feb 21, 2020 at 09:08:19PM +0100, Jason A. Donenfeld wrote:
> > On Thu, Feb 20, 2020 at 11:29 PM Tony Luck <tony.luck@gmail.com> wrote:
> > >
> > > Could we just disable interrupts and pre-emption around the entropy
> > > extraction?
> >
> > Probably, yes... We can address this in a separate patch.
>
> No, we can't; take a look at invalidate_batched_entropy(), where we
> need to invalidate all of the per-cpu batched entropy from a single CPU
> after we have initialized the CRNG.
>
> Since most of the time after CRNG initialization, the spinlock for
> each CPU will be on that CPU's cacheline, the time to take and release
> the spinlock is not going to be material.

So we could get rid of the spin lock by replacing it with a "bool" that
is written when we want to do an invalidate on the next call (where it
is read and cleared).

For me it makes a 15 cycle difference (56 vs. 71) for the fast case when
we are just picking a value from the array. The slow path when we do
extract_crng() is barely changed (731 vs. 736 cycles).

But I took the "do lazily" comment above invalidate_batched_entropy()
very literally and didn't add any fences to make sure that readers of
need_invalidate see the store ASAP. So a close race where the invalidate
request would have won control of the spin lock might not get processed
until a subsequent call.

If you think a fence is needed, then the advantage will be lost and the
below patch is worthless.

-Tony

diff --git a/drivers/char/random.c b/drivers/char/random.c
index a6b77a850ddd..6fb222996ea4 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -2144,7 +2144,7 @@ struct batched_entropy {
 		u32 entropy_u32[CHACHA_BLOCK_SIZE / sizeof(u32)];
 	};
 	unsigned int position;
-	spinlock_t batch_lock;
+	bool need_invalidate;
 };
 
 /*
@@ -2155,9 +2155,7 @@ struct batched_entropy {
  * wait_for_random_bytes() should be called and return 0 at least once at any
  * point prior.
  */
-static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u64) = {
-	.batch_lock = __SPIN_LOCK_UNLOCKED(batched_entropy_u64.lock),
-};
+static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u64);
 
 u64 get_random_u64(void)
 {
@@ -2168,21 +2166,23 @@ u64 get_random_u64(void)
 
 	warn_unseeded_randomness(&previous);
 
+	local_irq_save(flags);
+	preempt_disable();
 	batch = raw_cpu_ptr(&batched_entropy_u64);
-	spin_lock_irqsave(&batch->batch_lock, flags);
-	if (batch->position % ARRAY_SIZE(batch->entropy_u64) == 0) {
+	if (batch->need_invalidate ||
+	    batch->position % ARRAY_SIZE(batch->entropy_u64) == 0) {
 		extract_crng((u8 *)batch->entropy_u64);
 		batch->position = 0;
+		batch->need_invalidate = false;
 	}
 	ret = batch->entropy_u64[batch->position++];
-	spin_unlock_irqrestore(&batch->batch_lock, flags);
+	preempt_enable();
+	local_irq_restore(flags);
 	return ret;
 }
 EXPORT_SYMBOL(get_random_u64);
 
-static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u32) = {
-	.batch_lock = __SPIN_LOCK_UNLOCKED(batched_entropy_u32.lock),
-};
+static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u32);
 u32 get_random_u32(void)
 {
 	u32 ret;
@@ -2192,14 +2192,18 @@ u32 get_random_u32(void)
 
 	warn_unseeded_randomness(&previous);
 
+	local_irq_save(flags);
+	preempt_disable();
 	batch = raw_cpu_ptr(&batched_entropy_u32);
-	spin_lock_irqsave(&batch->batch_lock, flags);
-	if (batch->position % ARRAY_SIZE(batch->entropy_u32) == 0) {
+	if (batch->need_invalidate ||
+	    batch->position % ARRAY_SIZE(batch->entropy_u32) == 0) {
 		extract_crng((u8 *)batch->entropy_u32);
 		batch->position = 0;
+		batch->need_invalidate = false;
 	}
 	ret = batch->entropy_u32[batch->position++];
-	spin_unlock_irqrestore(&batch->batch_lock, flags);
+	preempt_enable();
+	local_irq_restore(flags);
 	return ret;
 }
 EXPORT_SYMBOL(get_random_u32);
@@ -2217,14 +2221,10 @@ static void invalidate_batched_entropy(void)
 		struct batched_entropy *batched_entropy;
 
 		batched_entropy = per_cpu_ptr(&batched_entropy_u32, cpu);
-		spin_lock_irqsave(&batched_entropy->batch_lock, flags);
-		batched_entropy->position = 0;
-		spin_unlock(&batched_entropy->batch_lock);
+		batched_entropy->need_invalidate = true;
 
 		batched_entropy = per_cpu_ptr(&batched_entropy_u64, cpu);
-		spin_lock(&batched_entropy->batch_lock);
-		batched_entropy->position = 0;
-		spin_unlock_irqrestore(&batched_entropy->batch_lock, flags);
+		batched_entropy->need_invalidate = true;
 	}
 }
* Re: [PATCH] random: always use batched entropy for get_random_u{32,64}

From: Jason A. Donenfeld @ 2020-02-21 20:07 UTC
To: Tony Luck
Cc: Greg Kroah-Hartman, Ted Ts'o, Linux Kernel Mailing List

Hi Tony,

On Thu, Feb 20, 2020 at 11:20 PM Tony Luck <tony.luck@gmail.com> wrote:
>
> On Sun, Feb 16, 2020 at 10:24 AM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > >  drivers/char/random.c | 12 ------------
> > >  1 file changed, 12 deletions(-)
> >
> > Looks good to me, thanks for doing this:
> >
> > Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>
> Perhaps this also needs to update the comment above these functions,
> specifically the bit that says "The quality of the random number is
> either as good as RDRAND" ... since you are no longer pulling from
> RDRAND, it isn't true anymore.

Good call. I'll fix up this comment and submit v2.

Jason
* [PATCH v2] random: always use batched entropy for get_random_u{32,64}

From: Jason A. Donenfeld @ 2020-02-21 20:10 UTC
To: tytso, linux-kernel, gregkh
Cc: Jason A. Donenfeld

It turns out that RDRAND is pretty slow. Comparing these two
constructions:

  for (i = 0; i < CHACHA_BLOCK_SIZE; i += sizeof(ret))
          arch_get_random_long(&ret);

and

  long buf[CHACHA_BLOCK_SIZE / sizeof(long)];
  extract_crng((u8 *)buf);

it amortizes out to 352 cycles per long for the top one and 107 cycles
per long for the bottom one, on Coffee Lake Refresh, Intel Core i9-9880H.

And importantly, the top one has the drawback of not benefiting from the
real rng, whereas the bottom one has all the nice benefits of using our
own chacha rng. As get_random_u{32,64} gets used in more places (perhaps
beyond what it was originally intended for when it was introduced as
get_random_{int,long} back in the md5 monstrosity era), it seems like it
might be a good thing to strengthen its posture a tiny bit. Doing this
should only be stronger and not any weaker because that pool is already
initialized with a bunch of rdrand data (when available). This way, we
get the benefits of the hardware rng as well as our own rng.

Another benefit of this is that we no longer hit pitfalls of the recent
stream of AMD bugs in RDRAND. One often used code pattern for various
things is:

  do {
          val = get_random_u32();
  } while (hash_table_contains_key(val));

That recent AMD bug rendered that pattern useless, whereas we're really
very certain that chacha20 output will give pretty distributed numbers,
no matter what.

So, this simplification seems better both from a security perspective
and from a performance perspective.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
Changes v1->v2:
 - Tony Luck suggested I also update the comment that referenced the
   no-longer relevant RDRAND.

 drivers/char/random.c | 20 ++++----------------
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index c7f9584de2c8..a6b77a850ddd 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -2149,11 +2149,11 @@ struct batched_entropy {
 
 /*
  * Get a random word for internal kernel use only. The quality of the random
- * number is either as good as RDRAND or as good as /dev/urandom, with the
- * goal of being quite fast and not depleting entropy. In order to ensure
+ * number is good as /dev/urandom, but there is no backtrack protection, with
+ * the goal of being quite fast and not depleting entropy. In order to ensure
  * that the randomness provided by this function is okay, the function
- * wait_for_random_bytes() should be called and return 0 at least once
- * at any point prior.
+ * wait_for_random_bytes() should be called and return 0 at least once at any
+ * point prior.
  */
 static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u64) = {
 	.batch_lock = __SPIN_LOCK_UNLOCKED(batched_entropy_u64.lock),
@@ -2166,15 +2166,6 @@ u64 get_random_u64(void)
 	struct batched_entropy *batch;
 	static void *previous;
 
-#if BITS_PER_LONG == 64
-	if (arch_get_random_long((unsigned long *)&ret))
-		return ret;
-#else
-	if (arch_get_random_long((unsigned long *)&ret) &&
-	    arch_get_random_long((unsigned long *)&ret + 1))
-		return ret;
-#endif
-
 	warn_unseeded_randomness(&previous);
 
 	batch = raw_cpu_ptr(&batched_entropy_u64);
@@ -2199,9 +2190,6 @@ u32 get_random_u32(void)
 	struct batched_entropy *batch;
 	static void *previous;
 
-	if (arch_get_random_int(&ret))
-		return ret;
-
 	warn_unseeded_randomness(&previous);
 
 	batch = raw_cpu_ptr(&batched_entropy_u32);
-- 
2.25.0
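[Editor's note: a hedged usage sketch of the rule spelled out in the comment above -- callers that want the output for anything security-sensitive should have seen wait_for_random_bytes() return 0 first. example_keygen() is a hypothetical caller, not kernel code.]

	static int example_keygen(u64 *key)
	{
		int ret;

		ret = wait_for_random_bytes();	/* may sleep; returns 0 once the CRNG is seeded */
		if (ret)
			return ret;		/* interrupted before seeding completed */

		*key = get_random_u64();	/* now drawn from a seeded batched pool */
		return 0;
	}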
* Re: [PATCH v2] random: always use batched entropy for get_random_u{32,64}

From: Theodore Y. Ts'o @ 2020-02-28  4:09 UTC
To: Jason A. Donenfeld
Cc: linux-kernel, gregkh

On Fri, Feb 21, 2020 at 09:10:37PM +0100, Jason A. Donenfeld wrote:
> It turns out that RDRAND is pretty slow. Comparing these two
> constructions:
>
>   for (i = 0; i < CHACHA_BLOCK_SIZE; i += sizeof(ret))
>           arch_get_random_long(&ret);
>
> and
>
>   long buf[CHACHA_BLOCK_SIZE / sizeof(long)];
>   extract_crng((u8 *)buf);
>
> it amortizes out to 352 cycles per long for the top one and 107 cycles
> per long for the bottom one, on Coffee Lake Refresh, Intel Core i9-9880H.
>
> And importantly, the top one has the drawback of not benefiting from the
> real rng, whereas the bottom one has all the nice benefits of using our
> own chacha rng. As get_random_u{32,64} gets used in more places (perhaps
> beyond what it was originally intended for when it was introduced as
> get_random_{int,long} back in the md5 monstrosity era), it seems like it
> might be a good thing to strengthen its posture a tiny bit. Doing this
> should only be stronger and not any weaker because that pool is already
> initialized with a bunch of rdrand data (when available). This way, we
> get the benefits of the hardware rng as well as our own rng.
>
> Another benefit of this is that we no longer hit pitfalls of the recent
> stream of AMD bugs in RDRAND. One often used code pattern for various
> things is:
>
>   do {
>           val = get_random_u32();
>   } while (hash_table_contains_key(val));
>
> That recent AMD bug rendered that pattern useless, whereas we're really
> very certain that chacha20 output will give pretty distributed numbers,
> no matter what.
>
> So, this simplification seems better both from a security perspective
> and from a performance perspective.
>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Thanks, applied.

						- Ted
* Re: [PATCH v2] random: always use batched entropy for get_random_u{32,64}

From: Nicolai Stange @ 2020-04-01 13:08 UTC
To: Jason A. Donenfeld
Cc: tytso, linux-kernel, gregkh

Hi,

"Jason A. Donenfeld" <Jason@zx2c4.com> writes:

> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index c7f9584de2c8..a6b77a850ddd 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -2149,11 +2149,11 @@ struct batched_entropy {
>
>  /*
>   * Get a random word for internal kernel use only. The quality of the random
> - * number is either as good as RDRAND or as good as /dev/urandom, with the
> - * goal of being quite fast and not depleting entropy. In order to ensure
> + * number is good as /dev/urandom, but there is no backtrack protection, with
> + * the goal of being quite fast and not depleting entropy. In order to ensure
>   * that the randomness provided by this function is okay, the function
> - * wait_for_random_bytes() should be called and return 0 at least once
> - * at any point prior.
> + * wait_for_random_bytes() should be called and return 0 at least once at any
> + * point prior.
>   */
>  static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u64) = {
>  	.batch_lock = __SPIN_LOCK_UNLOCKED(batched_entropy_u64.lock),
> @@ -2166,15 +2166,6 @@ u64 get_random_u64(void)
>  	struct batched_entropy *batch;
>  	static void *previous;
>
> -#if BITS_PER_LONG == 64
> -	if (arch_get_random_long((unsigned long *)&ret))
> -		return ret;
> -#else
> -	if (arch_get_random_long((unsigned long *)&ret) &&
> -	    arch_get_random_long((unsigned long *)&ret + 1))
> -		return ret;
> -#endif
> -
>  	warn_unseeded_randomness(&previous);

I don't know if this has been reported elsewhere already, but this
warning now triggers on x86_64 with CONFIG_SLAB_FREELIST_HARDENED and
CONFIG_SLAB_FREELIST_RANDOM during early boot:

  [    0.079775] random: random: get_random_u64 called from __kmem_cache_create+0x40/0x550 with crng_init=0

Strictly speaking, this isn't really a problem with this patch -- other
archs without arch_get_random_long() support (like e.g. ppc64le) have
shown this behaviour before.

FWIW, I added a dump_stack() to warn_unseeded_randomness() in order to
capture the respective call paths of the offending get_random_u64/u32()
callers, i.e. those invoked before rand_initialize(). Those are

- __kmem_cache_create() and
  init_cache_random_seq()->cache_random_seq_create(), called indirectly
  quite a number of times from
  - kmem_cache_init()
  - kmem_cache_init()->create_kmalloc_caches()
  - vmalloc_init()->kmem_cache_create()
  - sched_init()->kmem_cache_create()
  - radix_tree_init()->kmem_cache_create()
  - workqueue_init_early()->kmem_cache_create()
  - trace_event_init()->kmem_cache_create()

- __kmalloc()/kmem_cache_alloc()/kmem_cache_alloc_node()/kmem_cache_alloc_trace()
  ->__slab_alloc->___slab_alloc()->new_slab()
  called indirectly through
  - vmalloc_init()
  - workqueue_init_early()
  - trace_event_init()
  - early_irq_init()->alloc_desc()

Two possible ways to work around this came to my mind:

1.) Either reintroduce arch_get_random_long() to get_random_u64/u32(),
    but only for the case that crng_init <= 1.

2.) Or introduce something like arch_seed_primary_crng_early() to be
    called early from start_kernel(). For x86_64 this could be
    implemented by filling the crng state with RDRAND data, whereas
    other archs would fall back to get_cycles(), something
    jitterentropish-like, or whatever. [Ed.: a rough sketch follows
    after this message.]

What do you think?

Thanks,

Nicolai

>
>  	batch = raw_cpu_ptr(&batched_entropy_u64);
> @@ -2199,9 +2190,6 @@ u32 get_random_u32(void)
>  	struct batched_entropy *batch;
>  	static void *previous;
>
> -	if (arch_get_random_int(&ret))
> -		return ret;
> -
>  	warn_unseeded_randomness(&previous);
>
>  	batch = raw_cpu_ptr(&batched_entropy_u32);

-- 
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg), GF: Felix Imendörffer
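[Editor's note: a rough, hypothetical sketch of what option 2.) above might look like, assuming it lives in drivers/char/random.c so that primary_crng is visible. The name arch_seed_primary_crng_early() is Nicolai's suggestion, not an existing kernel interface, and the fallback mixing shown here is illustrative only; it is not a substitute for real entropy.]

	/* Hypothetical early seeding hook, called once from start_kernel()
	 * before the first get_random_u32/u64 users. Not actual kernel code. */
	static void __init arch_seed_primary_crng_early(void)
	{
		unsigned long v;
		int i;

		/* Words 4..15 of the ChaCha state hold key and nonce material;
		 * words 0..3 are the fixed constants, so leave them alone. */
		for (i = 4; i < 16; i++) {
			if (!arch_get_random_long(&v))		/* RDRAND on x86, when present */
				v = random_get_entropy() ^ jiffies;	/* weak cycle-counter fallback */
			primary_crng.state[i] ^= v;
		}
	}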
Thread overview: 12+ messages

2020-02-16 16:18 [PATCH] random: always use batched entropy for get_random_u{32,64} Jason A. Donenfeld
2020-02-16 18:23 ` Greg Kroah-Hartman
2020-02-20 22:20   ` Tony Luck
2020-02-20 22:29     ` Tony Luck
2020-02-21 20:08       ` Jason A. Donenfeld
2020-02-22  0:41         ` Theodore Y. Ts'o
2020-02-22  9:59           ` Jason A. Donenfeld
2020-02-24 20:41           ` Luck, Tony
2020-02-21 20:07     ` Jason A. Donenfeld
2020-02-21 20:10       ` [PATCH v2] " Jason A. Donenfeld
2020-02-28  4:09         ` Theodore Y. Ts'o
2020-04-01 13:08         ` Nicolai Stange