LKML Archive on lore.kernel.org
* [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs
@ 2019-07-11 22:12 Daniel Jordan
  2019-07-12 10:06 ` Herbert Xu
  0 siblings, 1 reply; 42+ messages in thread
From: Daniel Jordan @ 2019-07-11 22:12 UTC (permalink / raw)
  To: steffen.klassert
  Cc: Daniel Jordan, Andrea Parri, Boqun Feng, Paul E . McKenney,
	Peter Zijlstra, linux-arch, linux-crypto, linux-kernel

Testing padata with the tcrypt module on a 5.2 kernel...

    # modprobe tcrypt alg="pcrypt(rfc4106(gcm(aes)))" type=3
    # modprobe tcrypt mode=211 sec=1

...produces this splat:

    INFO: task modprobe:10075 blocked for more than 120 seconds.
          Not tainted 5.2.0-base+ #16
    modprobe        D    0 10075  10064 0x80004080
    Call Trace:
     ? __schedule+0x4dd/0x610
     ? ring_buffer_unlock_commit+0x23/0x100
     schedule+0x6c/0x90
     schedule_timeout+0x3b/0x320
     ? trace_buffer_unlock_commit_regs+0x4f/0x1f0
     wait_for_common+0x160/0x1a0
     ? wake_up_q+0x80/0x80
     { crypto_wait_req }             # entries in braces added by hand
     { do_one_aead_op }
     { test_aead_jiffies }
     test_aead_speed.constprop.17+0x681/0xf30 [tcrypt]
     do_test+0x4053/0x6a2b [tcrypt]
     ? 0xffffffffa00f4000
     tcrypt_mod_init+0x50/0x1000 [tcrypt]
     ...

The second modprobe command never finishes because in padata_reorder,
CPU0's load of reorder_objects is executed before the unlocking store in
spin_unlock_bh(pd->lock), causing CPU0 to miss CPU1's increment:

CPU0                                 CPU1

padata_reorder                       padata_do_serial
  LOAD reorder_objects  // 0
                                       INC reorder_objects  // 1
                                       padata_reorder
                                         TRYLOCK pd->lock   // failed
  UNLOCK pd->lock

CPU0 deletes the timer before returning from padata_reorder and since no
other job is submitted to padata, modprobe waits indefinitely.

Add a full barrier to prevent this scenario.  The hang was happening
about once every three runs, but now the test has finished successfully
fifty times in a row.

Fixes: 16295bec6398 ("padata: Generic parallelization/serialization interface")
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---

memory-barriers.txt says that a full barrier pairs with a release barrier, but
I'd appreciate a look from someone more familiar with barriers.  Thanks.

 kernel/padata.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/padata.c b/kernel/padata.c
index 2d2fddbb7a4c..9cffd4c303cb 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -267,7 +267,12 @@ static void padata_reorder(struct parallel_data *pd)
 	 * The next object that needs serialization might have arrived to
 	 * the reorder queues in the meantime, we will be called again
 	 * from the timer function if no one else cares for it.
+	 *
+	 * Ensure reorder_objects is read after pd->lock is dropped so we see
+	 * an increment from another task in padata_do_serial.  Pairs with
+	 * spin_unlock(&pqueue->reorder.lock) in padata_do_serial.
 	 */
+	smp_mb();
 	if (atomic_read(&pd->reorder_objects)
 			&& !(pinst->flags & PADATA_RESET))
 		mod_timer(&pd->timer, jiffies + HZ);

base-commit: 0ecfebd2b52404ae0c54a878c872bb93363ada36
-- 
2.22.0



* Re: [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs
  2019-07-11 22:12 [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs Daniel Jordan
@ 2019-07-12 10:06 ` Herbert Xu
  2019-07-12 10:10   ` Steffen Klassert
  0 siblings, 1 reply; 42+ messages in thread
From: Herbert Xu @ 2019-07-12 10:06 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: steffen.klassert, daniel.m.jordan, andrea.parri, boqun.feng,
	paulmck, peterz, linux-arch, linux-crypto, linux-kernel

Daniel Jordan <daniel.m.jordan@oracle.com> wrote:
>
> CPU0                                 CPU1
> 
> padata_reorder                       padata_do_serial
>  LOAD reorder_objects  // 0
>                                       INC reorder_objects  // 1
>                                       padata_reorder
>                                         TRYLOCK pd->lock   // failed
>  UNLOCK pd->lock

I think this can't happen because CPU1 won't call padata_reorder
at all as it's the wrong CPU so it gets pushed back onto a work
queue which will go back to CPU0.

Steffen, could you please take a look at this as there clearly
is a problem here?

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs
  2019-07-12 10:06 ` Herbert Xu
@ 2019-07-12 10:10   ` Steffen Klassert
  2019-07-12 16:07     ` Daniel Jordan
  0 siblings, 1 reply; 42+ messages in thread
From: Steffen Klassert @ 2019-07-12 10:10 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Daniel Jordan, andrea.parri, boqun.feng, paulmck, peterz,
	linux-arch, linux-crypto, linux-kernel

On Fri, Jul 12, 2019 at 06:06:36PM +0800, Herbert Xu wrote:
> Daniel Jordan <daniel.m.jordan@oracle.com> wrote:
> >
> > CPU0                                 CPU1
> > 
> > padata_reorder                       padata_do_serial
> >  LOAD reorder_objects  // 0
> >                                       INC reorder_objects  // 1
> >                                       padata_reorder
> >                                         TRYLOCK pd->lock   // failed
> >  UNLOCK pd->lock
> 
> I think this can't happen because CPU1 won't call padata_reorder
> at all as it's the wrong CPU so it gets pushed back onto a work
> queue which will go back to CPU0.
> 
> Steffen, could you please take a look at this as there clearly
> is a problem here?

I'm currently travelling, will have a look at it when I'm back.


* Re: [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs
  2019-07-12 10:10   ` Steffen Klassert
@ 2019-07-12 16:07     ` Daniel Jordan
  2019-07-13  5:03       ` Herbert Xu
  2019-07-16 12:53       ` [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs Andrea Parri
  0 siblings, 2 replies; 42+ messages in thread
From: Daniel Jordan @ 2019-07-12 16:07 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Herbert Xu, Daniel Jordan, andrea.parri, boqun.feng, paulmck,
	peterz, linux-arch, linux-crypto, linux-kernel

On Fri, Jul 12, 2019 at 12:10:12PM +0200, Steffen Klassert wrote:
> On Fri, Jul 12, 2019 at 06:06:36PM +0800, Herbert Xu wrote:
> > Daniel Jordan <daniel.m.jordan@oracle.com> wrote:
> > >
> > > CPU0                                 CPU1
> > >
> > > padata_reorder                       padata_do_serial
> > >  LOAD reorder_objects  // 0
> > >                                       INC reorder_objects  // 1
> > >                                       padata_reorder
> > >                                         TRYLOCK pd->lock   // failed
> > >  UNLOCK pd->lock
> >
> > I think this can't happen because CPU1 won't call padata_reorder
> > at all as it's the wrong CPU so it gets pushed back onto a work
> > queue which will go back to CPU0.

Thanks for looking at this.

When you say the wrong CPU, I think you're referring to the reorder_via_wq
logic in padata_do_serial.  If so, I think my explanation was unclear, because
there were two padata jobs in flight and my tracepoints showed neither of them
punted padata_reorder to a workqueue.  Let me expand on this, hopefully it
helps.

pcrypt used CPU 5 for its callback CPU.  The first job in question, with
sequence number 16581, hashed to CPU 21 on my system.  This is a more complete
version of what led to the hanging modprobe command.

modprobe (CPU2)               kworker/21:1-293 (CPU21)                              kworker/5:2-276 (CPU5)
--------------------------    ------------------------                              ----------------------
<submit job, seq_nr=16581>
...
  padata_do_parallel
    queue_work_on(21, ...)
<sleeps>
                              padata_parallel_worker
                                pcrypt_aead_dec
                                  padata_do_serial
                                    padata_reorder
                                    | padata_get_next  // returns job, seq_nr=16581
                                    | // serialize seq_nr=16581
                                    | queue_work_on(5, ...)
                                    | padata_get_next  // returns -EINPROGRESS
                                    // padata_reorder doesn't return yet!
                                    | |                                             padata_serial_worker
                                    | |                                               pcrypt_aead_serial
                                    | |                                                 <wake up modprobe>
                                    | |                                             <worker finishes>
<submit job, seq_nr=16582>          | |
...                                 | |
  padata_do_parallel                | |
    queue_work_on(22, ...)          | | (LOAD reorder_objects as 0 somewhere
<sleeps>                            | |    in here, before the INC)
                                    | |                                             kworker/22:1-291 (CPU22)
                                    | |                                             ------------------------
                                    | |                                             padata_parallel_worker
                                    | |                                               pcrypt_aead_dec
                                    | |                                                 padata_do_serial
                                    | |                                                   // INC reorder_objects to 1
                                    | |                                                   padata_reorder
                                    | |                                                     // trylock fail, CPU21 _should_
                                    | |                                                     //   serialize 16582 but doesn't
                                    | |                                             <worker finishes>
                                    | // deletes pd->timer
                                    // padata_reorder returns
                              <worker finishes>
<keeps on sleeping lazily>

My tracepoints showed CPU22 increased reorder_objects to 1 but CPU21 read it as
0.

I think adding the full barrier guarantees the following ordering, and the
memory model people can correct me if I'm wrong:

CPU21                      CPU22
------------------------   --------------------------
UNLOCK pd->lock
smp_mb()
LOAD reorder_objects
                           INC reorder_objects
                           spin_unlock(&pqueue->reorder.lock) // release barrier
                           TRYLOCK pd->lock

So if CPU22 has incremented reorder_objects but CPU21 reads 0 for it, CPU21
should also have unlocked pd->lock so CPU22 can get it and serialize any
remaining jobs.


* Re: [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs
  2019-07-12 16:07     ` Daniel Jordan
@ 2019-07-13  5:03       ` Herbert Xu
  2019-07-15 16:10         ` Daniel Jordan
  2019-07-16 12:53       ` [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs Andrea Parri
  1 sibling, 1 reply; 42+ messages in thread
From: Herbert Xu @ 2019-07-13  5:03 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Steffen Klassert, andrea.parri, boqun.feng, paulmck, peterz,
	linux-arch, linux-crypto, linux-kernel

On Fri, Jul 12, 2019 at 12:07:37PM -0400, Daniel Jordan wrote:
>
> modprobe (CPU2)               kworker/21:1-293 (CPU21)                              kworker/5:2-276 (CPU5)
> --------------------------    ------------------------                              ----------------------
> <submit job, seq_nr=16581>
> ...
>   padata_do_parallel
>     queue_work_on(21, ...)
> <sleeps>
>                               padata_parallel_worker
>                                 pcrypt_aead_dec
>                                   padata_do_serial
>                                     padata_reorder

This can't happen because if the job started on CPU2 then it must
go back to CPU2 for completion.  IOW padata_do_serial should be
punting this to a work queue for CPU2 rather than calling
padata_reorder on CPU21.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs
  2019-07-13  5:03       ` Herbert Xu
@ 2019-07-15 16:10         ` Daniel Jordan
  2019-07-16 10:04           ` Herbert Xu
  0 siblings, 1 reply; 42+ messages in thread
From: Daniel Jordan @ 2019-07-15 16:10 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Daniel Jordan, Steffen Klassert, andrea.parri, boqun.feng,
	paulmck, peterz, linux-arch, linux-crypto, linux-kernel

On Sat, Jul 13, 2019 at 01:03:21PM +0800, Herbert Xu wrote:
> On Fri, Jul 12, 2019 at 12:07:37PM -0400, Daniel Jordan wrote:
> >
> > modprobe (CPU2)               kworker/21:1-293 (CPU21)                              kworker/5:2-276 (CPU5)
> > --------------------------    ------------------------                              ----------------------
> > <submit job, seq_nr=16581>
> > ...
> >   padata_do_parallel
> >     queue_work_on(21, ...)
> > <sleeps>
> >                               padata_parallel_worker
> >                                 pcrypt_aead_dec
> >                                   padata_do_serial
> >                                     padata_reorder
> 
> This can't happen because if the job started on CPU2 then it must
> go back to CPU2 for completion.  IOW padata_do_serial should be
> punting this to a work queue for CPU2 rather than calling
> padata_reorder on CPU21.

I've been wrong before plenty of times, and there's nothing preventing this
from being one of those times :) , but in this case I believe what I'm showing
is correct.

The padata_do_serial call for a given job ensures padata_reorder runs on the
CPU that the job hashed to in padata_do_parallel, which is not necessarily the
same CPU as the one that padata_do_parallel itself ran on.

In this case, the padata job in question started via padata_do_parallel, where
it hashed to CPU 21:

  padata_do_parallel                    // ran on CPU 2
    ...
    target_cpu = padata_cpu_hash(pd);   // target_cpu == 21
    padata->cpu = target_cpu;
    ...
    queue_work_on(21, ...)

The corresponding kworker then started:

  padata_parallel_worker                // bound to CPU 21
    pcrypt_aead_dec
      padata_do_serial
        padata_reorder

Daniel


* Re: [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs
  2019-07-15 16:10         ` Daniel Jordan
@ 2019-07-16 10:04           ` Herbert Xu
  2019-07-16 11:14             ` Steffen Klassert
  0 siblings, 1 reply; 42+ messages in thread
From: Herbert Xu @ 2019-07-16 10:04 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Steffen Klassert, andrea.parri, boqun.feng, paulmck, peterz,
	linux-arch, linux-crypto, linux-kernel

On Mon, Jul 15, 2019 at 12:10:46PM -0400, Daniel Jordan wrote:
>
> I've been wrong before plenty of times, and there's nothing preventing this
> from being one of those times :) , but in this case I believe what I'm showing
> is correct.
> 
> The padata_do_serial call for a given job ensures padata_reorder runs on the
> CPU that the job hashed to in padata_do_parallel, which is not necessarily the
> same CPU as the one that padata_do_parallel itself ran on.

You're right.  I was taking the comment in the code at face value,
never trust comments :)

While looking at the code in question, I think it is seriously
broken.  For instance, padata_replace does not deal with async
crypto at all.  It would fail miserably if the underlying async
crypto held onto references to the old pd.

So we may have to restrict pcrypt to sync crypto only, which
would obviously mean that it can no longer use aesni.  Or we
will have to spend a lot of time to fix this up properly.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs
  2019-07-16 10:04           ` Herbert Xu
@ 2019-07-16 11:14             ` Steffen Klassert
  2019-07-16 12:57               ` [PATCH] padata: Use RCU when fetching pd from do_serial Herbert Xu
  0 siblings, 1 reply; 42+ messages in thread
From: Steffen Klassert @ 2019-07-16 11:14 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Daniel Jordan, andrea.parri, boqun.feng, paulmck, peterz,
	linux-arch, linux-crypto, linux-kernel

On Tue, Jul 16, 2019 at 06:04:47PM +0800, Herbert Xu wrote:
> On Mon, Jul 15, 2019 at 12:10:46PM -0400, Daniel Jordan wrote:
> >
> > I've been wrong before plenty of times, and there's nothing preventing this
> > from being one of those times :) , but in this case I believe what I'm showing
> > is correct.
> > 
> > The padata_do_serial call for a given job ensures padata_reorder runs on the
> > CPU that the job hashed to in padata_do_parallel, which is not necessarily the
> > same CPU as the one that padata_do_parallel itself ran on.
> 
> You're right.  I was taking the comment in the code at face value,
> never trust comments :)
> 
> While looking at the code in question, I think it is seriously
> broken.  For instance, padata_replace does not deal with async
> crypto at all.  It would fail miserably if the underlying async
> crypto held onto references to the old pd.

Hm, yes, it looks like that.

padata_replace should not call padata_free_pd() as long as the
refcount is not zero. Currently padata_flush_queues() will
BUG if there are references left.

Maybe we can fix it if we call padata_free_pd() from
padata_serial_worker() when it sent out the last object.



* Re: [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs
  2019-07-12 16:07     ` Daniel Jordan
  2019-07-13  5:03       ` Herbert Xu
@ 2019-07-16 12:53       ` Andrea Parri
  2019-07-16 13:13         ` Peter Zijlstra
  2019-07-16 15:01         ` Herbert Xu
  1 sibling, 2 replies; 42+ messages in thread
From: Andrea Parri @ 2019-07-16 12:53 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Steffen Klassert, Herbert Xu, boqun.feng, paulmck, peterz,
	linux-arch, linux-crypto, linux-kernel

Hi Daniel,

My two cents (summarizing some findings we discussed privately):


> I think adding the full barrier guarantees the following ordering, and the
> memory model people can correct me if I'm wrong:
> 
> CPU21                      CPU22
> ------------------------   --------------------------
> UNLOCK pd->lock
> smp_mb()
> LOAD reorder_objects
>                            INC reorder_objects
>                            spin_unlock(&pqueue->reorder.lock) // release barrier
>                            TRYLOCK pd->lock
> 
> So if CPU22 has incremented reorder_objects but CPU21 reads 0 for it, CPU21
> should also have unlocked pd->lock so CPU22 can get it and serialize any
> remaining jobs.

This information inspired me to write down the following litmus test:
(AFAICT, you independently wrote a very similar test, which is indeed
quite reassuring! ;D)

C daniel-padata

{ }

P0(atomic_t *reorder_objects, spinlock_t *pd_lock)
{
	int r0;

	spin_lock(pd_lock);
	spin_unlock(pd_lock);
	smp_mb();
	r0 = atomic_read(reorder_objects);
}

P1(atomic_t *reorder_objects, spinlock_t *pd_lock, spinlock_t *reorder_lock)
{
	int r1;

	spin_lock(reorder_lock);
	atomic_inc(reorder_objects);
	spin_unlock(reorder_lock);
	//smp_mb();
	r1 = spin_trylock(pd_lock);
}

exists (0:r0=0 /\ 1:r1=0)

It seems worth noticing that this test's "exists" clause is satisfiable
according to the (current) memory consistency model.  (Informally, this
can be explained by noticing that the RELEASE from the spin_unlock() in
P1 does not provide any order between the atomic increment and the read
part of the spin_trylock() operation.)  FWIW, uncommenting the smp_mb()
in P1 would suffice to prevent this clause from being satisfiable; I am
not sure, however, whether this approach is feasible or ideal... (sorry,
I'm definitely not too familiar with this code... ;/)

Thanks,
  Andrea


* [PATCH] padata: Use RCU when fetching pd from do_serial
  2019-07-16 11:14             ` Steffen Klassert
@ 2019-07-16 12:57               ` Herbert Xu
  2019-07-16 13:09                 ` Herbert Xu
  2019-07-16 13:15                 ` Peter Zijlstra
  0 siblings, 2 replies; 42+ messages in thread
From: Herbert Xu @ 2019-07-16 12:57 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Daniel Jordan, andrea.parri, boqun.feng, paulmck, peterz,
	linux-arch, linux-crypto, linux-kernel

On Tue, Jul 16, 2019 at 01:14:10PM +0200, Steffen Klassert wrote:
>
> Maybe we can fix it if we call padata_free_pd() from
> padata_serial_worker() when it sent out the last object.

How about using RCU?

We still need to fix up the refcnt if it's supposed to limit the
overall number of outstanding requests.

---8<---
The function padata_do_serial uses parallel_data without obeying
the RCU rules around its life-cycle.  This means that a concurrent
padata_replace call can result in a crash.

This patch fixes it by using RCU just as we do in padata_do_parallel.

Fixes: 16295bec6398 ("padata: Generic parallelization/...")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/include/linux/padata.h b/include/linux/padata.h
index 5d13d25da2c8..952f6514dd72 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -35,7 +35,7 @@
  * struct padata_priv -  Embedded to the users data structure.
  *
  * @list: List entry, to attach to the padata lists.
- * @pd: Pointer to the internal control structure.
+ * @inst: Pointer to the overall control structure.
  * @cb_cpu: Callback cpu for serializatioon.
  * @cpu: Cpu for parallelization.
  * @seq_nr: Sequence number of the parallelized data object.
@@ -45,7 +45,7 @@
  */
 struct padata_priv {
 	struct list_head	list;
-	struct parallel_data	*pd;
+	struct padata_instance	*inst;
 	int			cb_cpu;
 	int			cpu;
 	int			info;
diff --git a/kernel/padata.c b/kernel/padata.c
index 2d2fddbb7a4c..fb5dd1210d2b 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -128,7 +128,7 @@ int padata_do_parallel(struct padata_instance *pinst,
 
 	err = 0;
 	atomic_inc(&pd->refcnt);
-	padata->pd = pd;
+	padata->inst = pinst;
 	padata->cb_cpu = cb_cpu;
 
 	target_cpu = padata_cpu_hash(pd);
@@ -367,7 +368,7 @@ void padata_do_serial(struct padata_priv *padata)
 	struct parallel_data *pd;
 	int reorder_via_wq = 0;
 
-	pd = padata->pd;
+	pd = rcu_dereference_bh(padata->inst->pd);
 
 	cpu = get_cpu();
 
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH] padata: Use RCU when fetching pd from do_serial
  2019-07-16 12:57               ` [PATCH] padata: Use RCU when fetching pd from do_serial Herbert Xu
@ 2019-07-16 13:09                 ` Herbert Xu
  2019-07-16 13:23                   ` [v2 PATCH] " Herbert Xu
  2019-07-17  8:28                   ` [PATCH] " Steffen Klassert
  2019-07-16 13:15                 ` Peter Zijlstra
  1 sibling, 2 replies; 42+ messages in thread
From: Herbert Xu @ 2019-07-16 13:09 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Daniel Jordan, andrea.parri, boqun.feng, paulmck, peterz,
	linux-arch, linux-crypto, linux-kernel

On Tue, Jul 16, 2019 at 08:57:04PM +0800, Herbert Xu wrote:
>
> How about using RCU?
> 
> We still need to fix up the refcnt if it's supposed to limit the
> overall number of outstanding requests.

Hmm, it doesn't work because the refcnt is attached to the old
pd.  That shouldn't be a problem though as we could simply ignore
the refcnt in padata_flush_queue.

However, I think this leads to another bug in that pcrypt doesn't
support dm-crypt properly.  It never does the backlog stuff and
therefore can't guarantee reliable processing which dm-crypt requires.

Is it intentional to only allow pcrypt for IPsec?

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs
  2019-07-16 12:53       ` [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs Andrea Parri
@ 2019-07-16 13:13         ` Peter Zijlstra
  2019-07-16 15:01         ` Herbert Xu
  1 sibling, 0 replies; 42+ messages in thread
From: Peter Zijlstra @ 2019-07-16 13:13 UTC (permalink / raw)
  To: Andrea Parri
  Cc: Daniel Jordan, Steffen Klassert, Herbert Xu, boqun.feng, paulmck,
	linux-arch, linux-crypto, linux-kernel, will

On Tue, Jul 16, 2019 at 02:53:09PM +0200, Andrea Parri wrote:

> C daniel-padata
> 
> { }
> 
> P0(atomic_t *reorder_objects, spinlock_t *pd_lock)
> {
> 	int r0;
> 
> 	spin_lock(pd_lock);
> 	spin_unlock(pd_lock);
> 	smp_mb();
> 	r0 = atomic_read(reorder_objects);
> }
> 
> P1(atomic_t *reorder_objects, spinlock_t *pd_lock, spinlock_t *reorder_lock)
> {
> 	int r1;
> 
> 	spin_lock(reorder_lock);
> 	atomic_inc(reorder_objects);
> 	spin_unlock(reorder_lock);
> 	//smp_mb();
> 	r1 = spin_trylock(pd_lock);
> }
> 
> exists (0:r0=0 /\ 1:r1=0)
> 
> It seems worth noticing that this test's "exists" clause is satisfiable
> according to the (current) memory consistency model.  (Informally, this
> can be explained by noticing that the RELEASE from the spin_unlock() in
> P1 does not provide any order between the atomic increment and the read
> part of the spin_trylock() operation.)  FWIW, uncommenting the smp_mb()
> in P1 would suffice to prevent this clause from being satisfiable; I am
> not sure, however, whether this approach is feasible or ideal... (sorry,
> I'm definitely not too familiar with this code... ;/)

Urgh, that one again.

Yes, you need the smp_mb(); although a whole bunch of architectures can
live without it. IIRC it is part of the eternal RCsc/RCpc debate.

Paul/RCU have their smp_mb__after_unlock_lock() that is about something
similar, although we've so far confined that to the RCU code, because
of how confusing that all is.


* Re: [PATCH] padata: Use RCU when fetching pd from do_serial
  2019-07-16 12:57               ` [PATCH] padata: Use RCU when fetching pd from do_serial Herbert Xu
  2019-07-16 13:09                 ` Herbert Xu
@ 2019-07-16 13:15                 ` Peter Zijlstra
  2019-07-16 13:18                   ` Herbert Xu
  1 sibling, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2019-07-16 13:15 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Steffen Klassert, Daniel Jordan, andrea.parri, boqun.feng,
	paulmck, linux-arch, linux-crypto, linux-kernel

On Tue, Jul 16, 2019 at 08:57:04PM +0800, Herbert Xu wrote:
> On Tue, Jul 16, 2019 at 01:14:10PM +0200, Steffen Klassert wrote:
> >
> > Maybe we can fix it if we call padata_free_pd() from
> > padata_serial_worker() when it sent out the last object.
> 
> How about using RCU?
> 
> We still need to fix up the refcnt if it's supposed to limit the
> overall number of outstanding requests.
> 
> ---8<---
> The function padata_do_serial uses parallel_data without obeying
> the RCU rules around its life-cycle.  This means that a concurrent
> padata_replace call can result in a crash.
> 
> This patch fixes it by using RCU just as we do in padata_do_parallel.
> 
> Fixes: 16295bec6398 ("padata: Generic parallelization/...")
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

> diff --git a/kernel/padata.c b/kernel/padata.c
> index 2d2fddbb7a4c..fb5dd1210d2b 100644
> --- a/kernel/padata.c
> +++ b/kernel/padata.c
> @@ -128,7 +128,7 @@ int padata_do_parallel(struct padata_instance *pinst,
>  
>  	err = 0;
>  	atomic_inc(&pd->refcnt);
> -	padata->pd = pd;
> +	padata->inst = pinst;
>  	padata->cb_cpu = cb_cpu;
>  
>  	target_cpu = padata_cpu_hash(pd);
> @@ -367,7 +368,7 @@ void padata_do_serial(struct padata_priv *padata)
>  	struct parallel_data *pd;
>  	int reorder_via_wq = 0;
>  
> -	pd = padata->pd;
> +	pd = rcu_dereference_bh(padata->inst->pd);
>  
>  	cpu = get_cpu();
>  

That's weird for not having a matching assign and lacking comments to
explain that.


* Re: [PATCH] padata: Use RCU when fetching pd from do_serial
  2019-07-16 13:15                 ` Peter Zijlstra
@ 2019-07-16 13:18                   ` Herbert Xu
  2019-07-16 13:50                     ` Peter Zijlstra
  0 siblings, 1 reply; 42+ messages in thread
From: Herbert Xu @ 2019-07-16 13:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steffen Klassert, Daniel Jordan, andrea.parri, boqun.feng,
	paulmck, linux-arch, linux-crypto, linux-kernel

On Tue, Jul 16, 2019 at 03:15:20PM +0200, Peter Zijlstra wrote:
>
> > @@ -367,7 +368,7 @@ void padata_do_serial(struct padata_priv *padata)
> >  	struct parallel_data *pd;
> >  	int reorder_via_wq = 0;
> >  
> > -	pd = padata->pd;
> > +	pd = rcu_dereference_bh(padata->inst->pd);
> >  
> >  	cpu = get_cpu();
> >  
> 
> That's weird for not having a matching assign and lacking comments to
> explain that.

There is a matching rcu_assign_pointer.  But we should add some
RCU markers.

Or perhaps you're misreading the level of indirections :)

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* [v2 PATCH] padata: Use RCU when fetching pd from do_serial
  2019-07-16 13:09                 ` Herbert Xu
@ 2019-07-16 13:23                   ` Herbert Xu
  2019-07-17  8:36                     ` Steffen Klassert
  2019-07-17  8:28                   ` [PATCH] " Steffen Klassert
  1 sibling, 1 reply; 42+ messages in thread
From: Herbert Xu @ 2019-07-16 13:23 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Daniel Jordan, andrea.parri, boqun.feng, paulmck, peterz,
	linux-arch, linux-crypto, linux-kernel

On Tue, Jul 16, 2019 at 09:09:28PM +0800, Herbert Xu wrote:
>
> Hmm, it doesn't work because the refcnt is attached to the old
> pd.  That shouldn't be a problem though as we could simply ignore
> the refcnt in padata_flush_queue.

This should fix it:

---8<---
The function padata_do_serial uses parallel_data without obeying
the RCU rules around its life-cycle.  This means that a concurrent
padata_replace call can result in a crash.

This patch fixes it by using RCU just as we do in padata_do_parallel.

As the refcnt may now span two parallel_data structures, this patch
moves it to padata_instance instead.  FWIW the refcnt is used to
limit the number of outstanding requests (albeit a soft limit as
we don't do a proper atomic inc and test).

Fixes: 16295bec6398 ("padata: Generic parallelization/...")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/include/linux/padata.h b/include/linux/padata.h
index 5d13d25da2c8..ce51555cb86c 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -35,7 +35,7 @@
  * struct padata_priv -  Embedded to the users data structure.
  *
  * @list: List entry, to attach to the padata lists.
- * @pd: Pointer to the internal control structure.
+ * @inst: Pointer to the overall control structure.
  * @cb_cpu: Callback cpu for serializatioon.
  * @cpu: Cpu for parallelization.
  * @seq_nr: Sequence number of the parallelized data object.
@@ -45,7 +45,7 @@
  */
 struct padata_priv {
 	struct list_head	list;
-	struct parallel_data	*pd;
+	struct padata_instance	*inst;
 	int			cb_cpu;
 	int			cpu;
 	int			info;
@@ -120,7 +120,6 @@ struct padata_cpumask {
  * @pqueue: percpu padata queues used for parallelization.
  * @squeue: percpu padata queues used for serialuzation.
  * @reorder_objects: Number of objects waiting in the reorder queues.
- * @refcnt: Number of objects holding a reference on this parallel_data.
  * @max_seq_nr:  Maximal used sequence number.
  * @cpumask: The cpumasks in use for parallel and serial workers.
  * @lock: Reorder lock.
@@ -132,7 +131,6 @@ struct parallel_data {
 	struct padata_parallel_queue	__percpu *pqueue;
 	struct padata_serial_queue	__percpu *squeue;
 	atomic_t			reorder_objects;
-	atomic_t			refcnt;
 	atomic_t			seq_nr;
 	struct padata_cpumask		cpumask;
 	spinlock_t                      lock ____cacheline_aligned;
@@ -152,6 +150,7 @@ struct parallel_data {
  *            or both cpumasks change.
  * @kobj: padata instance kernel object.
  * @lock: padata instance lock.
+ * @refcnt: Number of objects holding a reference on this padata_instance.
  * @flags: padata flags.
  */
 struct padata_instance {
@@ -162,6 +161,7 @@ struct padata_instance {
 	struct blocking_notifier_head	 cpumask_change_notifier;
 	struct kobject                   kobj;
 	struct mutex			 lock;
+	atomic_t			refcnt;
 	u8				 flags;
 #define	PADATA_INIT	1
 #define	PADATA_RESET	2
diff --git a/kernel/padata.c b/kernel/padata.c
index 2d2fddbb7a4c..c86333fa3cec 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -123,12 +123,12 @@ int padata_do_parallel(struct padata_instance *pinst,
 	if ((pinst->flags & PADATA_RESET))
 		goto out;
 
-	if (atomic_read(&pd->refcnt) >= MAX_OBJ_NUM)
+	if (atomic_read(&pinst->refcnt) >= MAX_OBJ_NUM)
 		goto out;
 
 	err = 0;
-	atomic_inc(&pd->refcnt);
-	padata->pd = pd;
+	atomic_inc(&pinst->refcnt);
+	padata->inst = pinst;
 	padata->cb_cpu = cb_cpu;
 
 	target_cpu = padata_cpu_hash(pd);
@@ -347,7 +347,7 @@ static void padata_serial_worker(struct work_struct *serial_work)
 		list_del_init(&padata->list);
 
 		padata->serial(padata);
-		atomic_dec(&pd->refcnt);
+		atomic_dec(&pd->pinst->refcnt);
 	}
 	local_bh_enable();
 }
@@ -367,7 +367,7 @@ void padata_do_serial(struct padata_priv *padata)
 	struct parallel_data *pd;
 	int reorder_via_wq = 0;
 
-	pd = padata->pd;
+	pd = rcu_dereference_bh(padata->inst->pd);
 
 	cpu = get_cpu();
 
@@ -489,7 +489,6 @@ static struct parallel_data *padata_alloc_pd(struct padata_instance *pinst,
 	timer_setup(&pd->timer, padata_reorder_timer, 0);
 	atomic_set(&pd->seq_nr, -1);
 	atomic_set(&pd->reorder_objects, 0);
-	atomic_set(&pd->refcnt, 0);
 	pd->pinst = pinst;
 	spin_lock_init(&pd->lock);
 
@@ -535,8 +534,6 @@ static void padata_flush_queues(struct parallel_data *pd)
 		squeue = per_cpu_ptr(pd->squeue, cpu);
 		flush_work(&squeue->work);
 	}
-
-	BUG_ON(atomic_read(&pd->refcnt) != 0);
 }
 
 static void __padata_start(struct padata_instance *pinst)
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] padata: Use RCU when fetching pd from do_serial
  2019-07-16 13:18                   ` Herbert Xu
@ 2019-07-16 13:50                     ` Peter Zijlstra
  2019-07-16 13:52                       ` Herbert Xu
  0 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2019-07-16 13:50 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Steffen Klassert, Daniel Jordan, andrea.parri, boqun.feng,
	paulmck, linux-arch, linux-crypto, linux-kernel

On Tue, Jul 16, 2019 at 09:18:07PM +0800, Herbert Xu wrote:
> On Tue, Jul 16, 2019 at 03:15:20PM +0200, Peter Zijlstra wrote:
> >
> > > @@ -367,7 +368,7 @@ void padata_do_serial(struct padata_priv *padata)
> > >  	struct parallel_data *pd;
> > >  	int reorder_via_wq = 0;
> > >  
> > > -	pd = padata->pd;
> > > +	pd = rcu_dereference_bh(padata->inst->pd);
> > >  
> > >  	cpu = get_cpu();
> > >  
> > 
> > That's weird for not having a matching assign and lacking comments to
> > explain that.
> 
> There is a matching rcu_assign_pointer.  But we should add some
> RCU markers.
> 
> Or perhaps you're misreading the level of indirections :)

Could well be, I'm not much familiar with this code; I'll look more
careful.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] padata: Use RCU when fetching pd from do_serial
  2019-07-16 13:50                     ` Peter Zijlstra
@ 2019-07-16 13:52                       ` Herbert Xu
  0 siblings, 0 replies; 42+ messages in thread
From: Herbert Xu @ 2019-07-16 13:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steffen Klassert, Daniel Jordan, andrea.parri, boqun.feng,
	paulmck, linux-arch, linux-crypto, linux-kernel

On Tue, Jul 16, 2019 at 03:50:10PM +0200, Peter Zijlstra wrote:
>
> Could well be, I'm not much familiar with this code; I'll look more
> careful.

My patch is broken anyway.  We can't just switch pd structures
as the code relies on going to exactly the same CPU on the starting
pd to ensure ordering.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs
  2019-07-16 12:53       ` [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs Andrea Parri
  2019-07-16 13:13         ` Peter Zijlstra
@ 2019-07-16 15:01         ` Herbert Xu
  2019-07-16 15:44           ` Daniel Jordan
  1 sibling, 1 reply; 42+ messages in thread
From: Herbert Xu @ 2019-07-16 15:01 UTC (permalink / raw)
  To: Andrea Parri
  Cc: Daniel Jordan, Steffen Klassert, boqun.feng, paulmck, peterz,
	linux-arch, linux-crypto, linux-kernel

On Tue, Jul 16, 2019 at 02:53:09PM +0200, Andrea Parri wrote:
>
> P1(atomic_t *reorder_objects, spinlock_t *pd_lock, spinlock_t *reorder_lock)
> {
> 	int r1;
> 
> 	spin_lock(reorder_lock);
> 	atomic_inc(reorder_objects);
> 	spin_unlock(reorder_lock);
> 	//smp_mb();
> 	r1 = spin_trylock(pd_lock);
> }

Yes we need a matching mb on the other side.  However, we can
get away with using smp_mb__after_atomic thanks to the atomic_inc
above.
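
Completing Andrea's snippet, both sides of the pairing plus the forbidden outcome can be written as an LKMM litmus test. This is a hedged reconstruction for illustration, not the exact file Andrea and Daniel worked out in the thread; run it against tools/memory-model with herd7 to check the "exists" clause is forbidden:

```
C padata-orphaned-job

{}

P0(atomic_t *reorder_objects, spinlock_t *pd_lock)
{
	int r0;

	spin_lock(pd_lock);
	spin_unlock(pd_lock);
	smp_mb();
	r0 = atomic_read(reorder_objects);
}

P1(atomic_t *reorder_objects, spinlock_t *pd_lock, spinlock_t *reorder_lock)
{
	int r1;

	spin_lock(reorder_lock);
	atomic_inc(reorder_objects);
	spin_unlock(reorder_lock);
	smp_mb__after_atomic();
	r1 = spin_trylock(pd_lock);
}

exists (0:r0=0 /\ 1:r1=0)
```

The bad outcome is 0:r0=0 (CPU0 misses the increment) together with 1:r1=0 (CPU1's trylock fails): then nobody reorders the job. The smp_mb/smp_mb__after_atomic pair should rule that interleaving out.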

Daniel, can you please respin the patch with the matching smp_mb?

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs
  2019-07-16 15:01         ` Herbert Xu
@ 2019-07-16 15:44           ` Daniel Jordan
  2019-07-16 16:32             ` [PATCH v2] " Daniel Jordan
  0 siblings, 1 reply; 42+ messages in thread
From: Daniel Jordan @ 2019-07-16 15:44 UTC (permalink / raw)
  To: Herbert Xu, Andrea Parri
  Cc: Steffen Klassert, boqun.feng, paulmck, peterz, linux-arch,
	linux-crypto, linux-kernel

On 7/16/19 11:01 AM, Herbert Xu wrote:
> On Tue, Jul 16, 2019 at 02:53:09PM +0200, Andrea Parri wrote:
>>
>> P1(atomic_t *reorder_objects, spinlock_t *pd_lock, spinlock_t *reorder_lock)
>> {
>> 	int r1;
>>
>> 	spin_lock(reorder_lock);
>> 	atomic_inc(reorder_objects);
>> 	spin_unlock(reorder_lock);
>> 	//smp_mb();
>> 	r1 = spin_trylock(pd_lock);
>> }
> 
> Yes we need a matching mb on the other side.  However, we can
> get away with using smp_mb__after_atomic thanks to the atomic_inc
> above.
> 
> Daniel, can you please respin the patch with the matching smp_mb?

Sure, Herbert, will do.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs
  2019-07-16 15:44           ` Daniel Jordan
@ 2019-07-16 16:32             ` Daniel Jordan
  2019-07-17 11:11               ` [PATCH] padata: Replace delayed timer with immediate workqueue in padata_reorder Herbert Xu
  2019-07-18  5:42               ` [PATCH v2] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs Herbert Xu
  0 siblings, 2 replies; 42+ messages in thread
From: Daniel Jordan @ 2019-07-16 16:32 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Daniel Jordan, Andrea Parri, Boqun Feng, Herbert Xu,
	Paul E . McKenney, Peter Zijlstra, linux-arch, linux-crypto,
	linux-kernel

Testing padata with the tcrypt module on a 5.2 kernel...

    # modprobe tcrypt alg="pcrypt(rfc4106(gcm(aes)))" type=3
    # modprobe tcrypt mode=211 sec=1

...produces this splat:

    INFO: task modprobe:10075 blocked for more than 120 seconds.
          Not tainted 5.2.0-base+ #16
    modprobe        D    0 10075  10064 0x80004080
    Call Trace:
     ? __schedule+0x4dd/0x610
     ? ring_buffer_unlock_commit+0x23/0x100
     schedule+0x6c/0x90
     schedule_timeout+0x3b/0x320
     ? trace_buffer_unlock_commit_regs+0x4f/0x1f0
     wait_for_common+0x160/0x1a0
     ? wake_up_q+0x80/0x80
     { crypto_wait_req }             # entries in braces added by hand
     { do_one_aead_op }
     { test_aead_jiffies }
     test_aead_speed.constprop.17+0x681/0xf30 [tcrypt]
     do_test+0x4053/0x6a2b [tcrypt]
     ? 0xffffffffa00f4000
     tcrypt_mod_init+0x50/0x1000 [tcrypt]
     ...

The second modprobe command never finishes because in padata_reorder,
CPU0's load of reorder_objects is executed before the unlocking store in
spin_unlock_bh(pd->lock), causing CPU0 to miss CPU1's increment:

CPU0                                 CPU1

padata_reorder                       padata_do_serial
  LOAD reorder_objects  // 0
                                       INC reorder_objects  // 1
                                       padata_reorder
                                         TRYLOCK pd->lock   // failed
  UNLOCK pd->lock

CPU0 deletes the timer before returning from padata_reorder and since no
other job is submitted to padata, modprobe waits indefinitely.

Add a pair of full barriers to guarantee proper ordering:

CPU0                                 CPU1

padata_reorder                       padata_do_serial
  UNLOCK pd->lock
  smp_mb()
  LOAD reorder_objects
                                       INC reorder_objects
                                       smp_mb__after_atomic()
                                       padata_reorder
                                         TRYLOCK pd->lock

smp_mb__after_atomic is needed so the read part of the trylock operation
comes after the INC, as Andrea points out.  Thanks also to Andrea for
help with writing a litmus test.

Fixes: 16295bec6398 ("padata: Generic parallelization/serialization interface")
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 kernel/padata.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/padata.c b/kernel/padata.c
index 2d2fddbb7a4c..15a8ad63f4ff 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -267,7 +267,12 @@ static void padata_reorder(struct parallel_data *pd)
 	 * The next object that needs serialization might have arrived to
 	 * the reorder queues in the meantime, we will be called again
 	 * from the timer function if no one else cares for it.
+	 *
+	 * Ensure reorder_objects is read after pd->lock is dropped so we see
+	 * an increment from another task in padata_do_serial.  Pairs with
+	 * smp_mb__after_atomic in padata_do_serial.
 	 */
+	smp_mb();
 	if (atomic_read(&pd->reorder_objects)
 			&& !(pinst->flags & PADATA_RESET))
 		mod_timer(&pd->timer, jiffies + HZ);
@@ -387,6 +392,13 @@ void padata_do_serial(struct padata_priv *padata)
 	list_add_tail(&padata->list, &pqueue->reorder.list);
 	spin_unlock(&pqueue->reorder.lock);
 
+	/*
+	 * Ensure the atomic_inc of reorder_objects above is ordered correctly
+	 * with the trylock of pd->lock in padata_reorder.  Pairs with smp_mb
+	 * in padata_reorder.
+	 */
+	smp_mb__after_atomic();
+
 	put_cpu();
 
 	/* If we're running on the wrong CPU, call padata_reorder() via a

base-commit: 0ecfebd2b52404ae0c54a878c872bb93363ada36
-- 
2.22.0


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] padata: Use RCU when fetching pd from do_serial
  2019-07-16 13:09                 ` Herbert Xu
  2019-07-16 13:23                   ` [v2 PATCH] " Herbert Xu
@ 2019-07-17  8:28                   ` Steffen Klassert
  2019-07-17  8:47                     ` Herbert Xu
  1 sibling, 1 reply; 42+ messages in thread
From: Steffen Klassert @ 2019-07-17  8:28 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Daniel Jordan, andrea.parri, boqun.feng, paulmck, peterz,
	linux-arch, linux-crypto, linux-kernel

On Tue, Jul 16, 2019 at 09:09:28PM +0800, Herbert Xu wrote:
> 
> However, I think this leads to another bug in that pcrypt doesn't
> support dm-crypt properly.  It never does the backlog stuff and
> therefore can't guarantee reliable processing which dm-crypt requires.
> 
> Is it intentional to only allow pcrypt for IPsec?

I had a patch to support crypto backlog some years ago,
but testing with dm-crypt did not show any performance
improvement. So I decided to just drop that patch because
it added code for no benefit.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [v2 PATCH] padata: Use RCU when fetching pd from do_serial
  2019-07-16 13:23                   ` [v2 PATCH] " Herbert Xu
@ 2019-07-17  8:36                     ` Steffen Klassert
  2019-07-17  8:50                       ` Herbert Xu
  0 siblings, 1 reply; 42+ messages in thread
From: Steffen Klassert @ 2019-07-17  8:36 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Daniel Jordan, andrea.parri, boqun.feng, paulmck, peterz,
	linux-arch, linux-crypto, linux-kernel

On Tue, Jul 16, 2019 at 09:23:45PM +0800, Herbert Xu wrote:
> On Tue, Jul 16, 2019 at 09:09:28PM +0800, Herbert Xu wrote:
> >
> > Hmm, it doesn't work because the refcnt is attached to the old
> > pd.  That shouldn't be a problem though as we could simply ignore
> > the refcnt in padata_flush_queue.
> 
> This should fix it:
> 
> ---8<---
> The function padata_do_serial uses parallel_data without obeying
> the RCU rules around its life-cycle.  This means that a concurrent
> padata_replace call can result in a crash.
> 
> This patch fixes it by using RCU just as we do in padata_do_parallel.

RCU alone won't help because once an object is queued for async
crypto, we have left the RCU-protected area. I think padata_do_serial
needs to use RCU and should free 'parallel_data' if the
PADATA_RESET flag is set and the refcount drops to zero.
padata_replace should then do the same.
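
The lifetime rule suggested here (last reference dropper frees the old pd, but only once a padata_replace is in flight) can be sketched in userspace. This is a hypothetical illustration with C11 atomics, not the padata implementation; struct and function names are made up to mirror the kernel's:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define PADATA_RESET 2

struct fake_pd {
	atomic_int refcnt;
	atomic_int flags;
	bool freed;		/* observable stand-in for kfree(pd) */
};

static void pd_get(struct fake_pd *pd)
{
	atomic_fetch_add(&pd->refcnt, 1);
}

/* Drop a reference.  Free the pd only when this was the last reference
 * AND the instance is mid-replace (PADATA_RESET set); otherwise the pd
 * is still current and must stay around. */
static bool pd_put(struct fake_pd *pd)
{
	if (atomic_fetch_sub(&pd->refcnt, 1) == 1 &&
	    (atomic_load(&pd->flags) & PADATA_RESET)) {
		pd->freed = true;	/* stands in for kfree(pd) */
		return true;
	}
	return false;
}
```

With this rule, padata_do_serial can keep using the old pd for in-flight objects: the structure survives until the last outstanding object drops its reference after the replace has started.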

> 
> As the refcnt may now span two parallel_data structures, this patch
> moves it to padata_instance instead.  FWIW the refcnt is used to
> limit the number of outstanding requests (albeit a soft limit as
> we don't do a proper atomic inc and test).
> 
> Fixes: 16295bec6398 ("padata: Generic parallelization/...")
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> diff --git a/include/linux/padata.h b/include/linux/padata.h

...

> index 5d13d25da2c8..ce51555cb86c 100644
> @@ -367,7 +367,7 @@ void padata_do_serial(struct padata_priv *padata)
>  	struct parallel_data *pd;
>  	int reorder_via_wq = 0;
>  
> -	pd = padata->pd;
> +	pd = rcu_dereference_bh(padata->inst->pd);

Why not just

pd = rcu_dereference_bh(padata->pd);


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] padata: Use RCU when fetching pd from do_serial
  2019-07-17  8:28                   ` [PATCH] " Steffen Klassert
@ 2019-07-17  8:47                     ` Herbert Xu
  2019-07-17  8:53                       ` Steffen Klassert
  0 siblings, 1 reply; 42+ messages in thread
From: Herbert Xu @ 2019-07-17  8:47 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Daniel Jordan, andrea.parri, boqun.feng, paulmck, peterz,
	linux-arch, linux-crypto, linux-kernel

On Wed, Jul 17, 2019 at 10:28:15AM +0200, Steffen Klassert wrote:
>
> I had a patch to support crypto backlog some years ago,
> but testing with dm-crypt did not show any performance
> improvement. So I decided to just skip that patch because
> it added code for no need.

Well pcrypt is part of the API so it needs to obey the rules.
So even if it wouldn't make sense to use dm-crypt with pcrypt
it still needs to do the right thing.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [v2 PATCH] padata: Use RCU when fetching pd from do_serial
  2019-07-17  8:36                     ` Steffen Klassert
@ 2019-07-17  8:50                       ` Herbert Xu
  0 siblings, 0 replies; 42+ messages in thread
From: Herbert Xu @ 2019-07-17  8:50 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Daniel Jordan, andrea.parri, boqun.feng, paulmck, peterz,
	linux-arch, linux-crypto, linux-kernel

On Wed, Jul 17, 2019 at 10:36:41AM +0200, Steffen Klassert wrote:
>
> > This patch fixes it by using RCU just as we do in padata_do_parallel.
> 
> RCU alone won't help because if some object is queued for async
> crypto, we left the RCU protected aera. I think padata_do_serial
> needs to do RCU and should free 'parallel_data' if the flag
> PADATA_RESET is set and the refcount goes to zero. padata_replace
> should do the same then.

Yes this patch doesn't work because you can't just switch over to
the new pd as the old pd will then get stuck due to the missing
entry.

> > index 5d13d25da2c8..ce51555cb86c 100644
> > @@ -367,7 +367,7 @@ void padata_do_serial(struct padata_priv *padata)
> >  	struct parallel_data *pd;
> >  	int reorder_via_wq = 0;
> >  
> > -	pd = padata->pd;
> > +	pd = rcu_dereference_bh(padata->inst->pd);
> 
> Why not just
> 
> pd = rcu_dereference_bh(padata->pd);

I was trying to get the new pd which could only come from the inst.
In any case the whole RCU idea doesn't work so we'll probably do the
refcount idea that you suggested.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] padata: Use RCU when fetching pd from do_serial
  2019-07-17  8:47                     ` Herbert Xu
@ 2019-07-17  8:53                       ` Steffen Klassert
  0 siblings, 0 replies; 42+ messages in thread
From: Steffen Klassert @ 2019-07-17  8:53 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Daniel Jordan, andrea.parri, boqun.feng, paulmck, peterz,
	linux-arch, linux-crypto, linux-kernel

On Wed, Jul 17, 2019 at 04:47:40PM +0800, Herbert Xu wrote:
> On Wed, Jul 17, 2019 at 10:28:15AM +0200, Steffen Klassert wrote:
> >
> > I had a patch to support crypto backlog some years ago,
> > but testing with dm-crypt did not show any performance
> > improvement. So I decided to just skip that patch because
> > it added code for no need.
> 
> Well pcrypt is part of the API so it needs to obey the rules.
> So even if it wouldn't make sense to use dm-crypt with pcrypt
> it still needs to do the right thing.

Ok, makes sense. The old patch still exists:

https://git.kernel.org/pub/scm/linux/kernel/git/klassert/linux-stk.git/commit/?h=net-next-pcrypt&id=5909a88ccef6f4c78fe9969160155a8f0ce8fee7

Let me see if I can respin it...

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH] padata: Replace delayed timer with immediate workqueue in padata_reorder
  2019-07-16 16:32             ` [PATCH v2] " Daniel Jordan
@ 2019-07-17 11:11               ` Herbert Xu
  2019-07-17 18:32                 ` Daniel Jordan
  2019-07-17 23:21                 ` Daniel Jordan
  2019-07-18  5:42               ` [PATCH v2] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs Herbert Xu
  1 sibling, 2 replies; 42+ messages in thread
From: Herbert Xu @ 2019-07-17 11:11 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Steffen Klassert, Andrea Parri, Boqun Feng, Paul E . McKenney,
	Peter Zijlstra, linux-arch, linux-crypto, linux-kernel,
	Mathias Krause

On Tue, Jul 16, 2019 at 12:32:53PM -0400, Daniel Jordan wrote:
> Testing padata with the tcrypt module on a 5.2 kernel...

Thanks for the patch!

And here is an incremental patch to get rid of the timer, which
appears to be an attempt at fixing a related problem.

---8<---
The function padata_reorder will use a timer when it cannot progress
while completed jobs are outstanding (pd->reorder_objects > 0).  This
is suboptimal as if we do end up using the timer then it would have
introduced a gratuitous delay of one second.

In fact we can easily distinguish between whether completed jobs
are outstanding and whether we can make progress.  All we have to
do is look at the next pqueue list.

This patch does that by replacing pd->processed with pd->cpu so
that the next pqueue is more accessible.

A work queue is used instead of the original try_again to avoid
hogging the CPU.

Note that we don't bother removing the work queue in
padata_flush_queues because the whole premise is broken.  You
cannot flush async crypto requests so it makes no sense to even
try.  A subsequent patch will fix it by replacing it with a ref
counting scheme.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/include/linux/padata.h b/include/linux/padata.h
index 5d13d25da2c8..d803397a28f7 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -24,7 +24,6 @@
 #include <linux/workqueue.h>
 #include <linux/spinlock.h>
 #include <linux/list.h>
-#include <linux/timer.h>
 #include <linux/notifier.h>
 #include <linux/kobject.h>
 
@@ -85,18 +84,14 @@ struct padata_serial_queue {
  * @serial: List to wait for serialization after reordering.
  * @pwork: work struct for parallelization.
  * @swork: work struct for serialization.
- * @pd: Backpointer to the internal control structure.
  * @work: work struct for parallelization.
- * @reorder_work: work struct for reordering.
  * @num_obj: Number of objects that are processed by this cpu.
  * @cpu_index: Index of the cpu.
  */
 struct padata_parallel_queue {
        struct padata_list    parallel;
        struct padata_list    reorder;
-       struct parallel_data *pd;
        struct work_struct    work;
-       struct work_struct    reorder_work;
        atomic_t              num_obj;
        int                   cpu_index;
 };
@@ -122,10 +117,10 @@ struct padata_cpumask {
  * @reorder_objects: Number of objects waiting in the reorder queues.
  * @refcnt: Number of objects holding a reference on this parallel_data.
  * @max_seq_nr:  Maximal used sequence number.
+ * @cpu: Next CPU to be processed.
  * @cpumask: The cpumasks in use for parallel and serial workers.
+ * @reorder_work: work struct for reordering.
  * @lock: Reorder lock.
- * @processed: Number of already processed objects.
- * @timer: Reorder timer.
  */
 struct parallel_data {
 	struct padata_instance		*pinst;
@@ -134,10 +129,10 @@ struct parallel_data {
 	atomic_t			reorder_objects;
 	atomic_t			refcnt;
 	atomic_t			seq_nr;
+	int				cpu;
 	struct padata_cpumask		cpumask;
+	struct work_struct		reorder_work;
 	spinlock_t                      lock ____cacheline_aligned;
-	unsigned int			processed;
-	struct timer_list		timer;
 };
 
 /**
diff --git a/kernel/padata.c b/kernel/padata.c
index 15a8ad63f4ff..b5dfc21e976f 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -165,23 +165,12 @@ EXPORT_SYMBOL(padata_do_parallel);
  */
 static struct padata_priv *padata_get_next(struct parallel_data *pd)
 {
-	int cpu, num_cpus;
-	unsigned int next_nr, next_index;
 	struct padata_parallel_queue *next_queue;
 	struct padata_priv *padata;
 	struct padata_list *reorder;
+	int cpu = pd->cpu;
 
-	num_cpus = cpumask_weight(pd->cpumask.pcpu);
-
-	/*
-	 * Calculate the percpu reorder queue and the sequence
-	 * number of the next object.
-	 */
-	next_nr = pd->processed;
-	next_index = next_nr % num_cpus;
-	cpu = padata_index_to_cpu(pd, next_index);
 	next_queue = per_cpu_ptr(pd->pqueue, cpu);
-
 	reorder = &next_queue->reorder;
 
 	spin_lock(&reorder->lock);
@@ -192,7 +181,8 @@ static struct padata_priv *padata_get_next(struct parallel_data *pd)
 		list_del_init(&padata->list);
 		atomic_dec(&pd->reorder_objects);
 
-		pd->processed++;
+		pd->cpu = cpumask_next_wrap(cpu, pd->cpumask.pcpu, 0,
+					    false);
 
 		spin_unlock(&reorder->lock);
 		goto out;
@@ -215,6 +205,7 @@ static void padata_reorder(struct parallel_data *pd)
 	struct padata_priv *padata;
 	struct padata_serial_queue *squeue;
 	struct padata_instance *pinst = pd->pinst;
+	struct padata_parallel_queue *next_queue;
 
 	/*
 	 * We need to ensure that only one cpu can work on dequeueing of
@@ -246,7 +237,6 @@ static void padata_reorder(struct parallel_data *pd)
 		 * so exit immediately.
 		 */
 		if (PTR_ERR(padata) == -ENODATA) {
-			del_timer(&pd->timer);
 			spin_unlock_bh(&pd->lock);
 			return;
 		}
@@ -265,70 +255,29 @@ static void padata_reorder(struct parallel_data *pd)
 
 	/*
 	 * The next object that needs serialization might have arrived to
-	 * the reorder queues in the meantime, we will be called again
-	 * from the timer function if no one else cares for it.
+	 * the reorder queues in the meantime.
 	 *
-	 * Ensure reorder_objects is read after pd->lock is dropped so we see
-	 * an increment from another task in padata_do_serial.  Pairs with
+	 * Ensure reorder queue is read after pd->lock is dropped so we see
+	 * new objects from another task in padata_do_serial.  Pairs with
 	 * smp_mb__after_atomic in padata_do_serial.
 	 */
 	smp_mb();
-	if (atomic_read(&pd->reorder_objects)
-			&& !(pinst->flags & PADATA_RESET))
-		mod_timer(&pd->timer, jiffies + HZ);
-	else
-		del_timer(&pd->timer);
 
-	return;
+	next_queue = per_cpu_ptr(pd->pqueue, pd->cpu);
+	if (!list_empty(&next_queue->reorder.list))
+		queue_work(pinst->wq, &pd->reorder_work);
 }
 
 static void invoke_padata_reorder(struct work_struct *work)
 {
-	struct padata_parallel_queue *pqueue;
 	struct parallel_data *pd;
 
 	local_bh_disable();
-	pqueue = container_of(work, struct padata_parallel_queue, reorder_work);
-	pd = pqueue->pd;
+	pd = container_of(work, struct parallel_data, reorder_work);
 	padata_reorder(pd);
 	local_bh_enable();
 }
 
-static void padata_reorder_timer(struct timer_list *t)
-{
-	struct parallel_data *pd = from_timer(pd, t, timer);
-	unsigned int weight;
-	int target_cpu, cpu;
-
-	cpu = get_cpu();
-
-	/* We don't lock pd here to not interfere with parallel processing
-	 * padata_reorder() calls on other CPUs. We just need any CPU out of
-	 * the cpumask.pcpu set. It would be nice if it's the right one but
-	 * it doesn't matter if we're off to the next one by using an outdated
-	 * pd->processed value.
-	 */
-	weight = cpumask_weight(pd->cpumask.pcpu);
-	target_cpu = padata_index_to_cpu(pd, pd->processed % weight);
-
-	/* ensure to call the reorder callback on the correct CPU */
-	if (cpu != target_cpu) {
-		struct padata_parallel_queue *pqueue;
-		struct padata_instance *pinst;
-
-		/* The timer function is serialized wrt itself -- no locking
-		 * needed.
-		 */
-		pinst = pd->pinst;
-		pqueue = per_cpu_ptr(pd->pqueue, target_cpu);
-		queue_work_on(target_cpu, pinst->wq, &pqueue->reorder_work);
-	} else {
-		padata_reorder(pd);
-	}
-
-	put_cpu();
-}
-
 static void padata_serial_worker(struct work_struct *serial_work)
 {
 	struct padata_serial_queue *squeue;
@@ -376,9 +325,8 @@ void padata_do_serial(struct padata_priv *padata)
 
 	cpu = get_cpu();
 
-	/* We need to run on the same CPU padata_do_parallel(.., padata, ..)
-	 * was called on -- or, at least, enqueue the padata object into the
-	 * correct per-cpu queue.
+	/* We need to enqueue the padata object into the correct
+	 * per-cpu queue.
 	 */
 	if (cpu != padata->cpu) {
 		reorder_via_wq = 1;
@@ -388,12 +336,12 @@ void padata_do_serial(struct padata_priv *padata)
 	pqueue = per_cpu_ptr(pd->pqueue, cpu);
 
 	spin_lock(&pqueue->reorder.lock);
-	atomic_inc(&pd->reorder_objects);
 	list_add_tail(&padata->list, &pqueue->reorder.list);
+	atomic_inc(&pd->reorder_objects);
 	spin_unlock(&pqueue->reorder.lock);
 
 	/*
-	 * Ensure the atomic_inc of reorder_objects above is ordered correctly
+	 * Ensure the addition to the reorder list is ordered correctly
 	 * with the trylock of pd->lock in padata_reorder.  Pairs with smp_mb
 	 * in padata_reorder.
 	 */
@@ -401,13 +349,7 @@ void padata_do_serial(struct padata_priv *padata)
 
 	put_cpu();
 
-	/* If we're running on the wrong CPU, call padata_reorder() via a
-	 * kernel worker.
-	 */
-	if (reorder_via_wq)
-		queue_work_on(cpu, pd->pinst->wq, &pqueue->reorder_work);
-	else
-		padata_reorder(pd);
+	padata_reorder(pd);
 }
 EXPORT_SYMBOL(padata_do_serial);
 
@@ -463,14 +405,12 @@ static void padata_init_pqueues(struct parallel_data *pd)
 			continue;
 		}
 
-		pqueue->pd = pd;
 		pqueue->cpu_index = cpu_index;
 		cpu_index++;
 
 		__padata_list_init(&pqueue->reorder);
 		__padata_list_init(&pqueue->parallel);
 		INIT_WORK(&pqueue->work, padata_parallel_worker);
-		INIT_WORK(&pqueue->reorder_work, invoke_padata_reorder);
 		atomic_set(&pqueue->num_obj, 0);
 	}
 }
@@ -498,12 +438,13 @@ static struct parallel_data *padata_alloc_pd(struct padata_instance *pinst,
 
 	padata_init_pqueues(pd);
 	padata_init_squeues(pd);
-	timer_setup(&pd->timer, padata_reorder_timer, 0);
 	atomic_set(&pd->seq_nr, -1);
 	atomic_set(&pd->reorder_objects, 0);
 	atomic_set(&pd->refcnt, 0);
 	pd->pinst = pinst;
 	spin_lock_init(&pd->lock);
+	pd->cpu = cpumask_first(pcpumask);
+	INIT_WORK(&pd->reorder_work, invoke_padata_reorder);
 
 	return pd;
 
@@ -538,8 +479,6 @@ static void padata_flush_queues(struct parallel_data *pd)
 		flush_work(&pqueue->work);
 	}
 
-	del_timer_sync(&pd->timer);
-
 	if (atomic_read(&pd->reorder_objects))
 		padata_reorder(pd);
 
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] padata: Replace delayed timer with immediate workqueue in padata_reorder
  2019-07-17 11:11               ` [PATCH] padata: Replace delayed timer with immediate workqueue in padata_reorder Herbert Xu
@ 2019-07-17 18:32                 ` Daniel Jordan
  2019-07-18  3:31                   ` Herbert Xu
  2019-07-17 23:21                 ` Daniel Jordan
  1 sibling, 1 reply; 42+ messages in thread
From: Daniel Jordan @ 2019-07-17 18:32 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Daniel Jordan, Steffen Klassert, Andrea Parri, Boqun Feng,
	Paul E . McKenney, Peter Zijlstra, linux-arch, linux-crypto,
	linux-kernel, Mathias Krause

On Wed, Jul 17, 2019 at 07:11:47PM +0800, Herbert Xu wrote:
> On Tue, Jul 16, 2019 at 12:32:53PM -0400, Daniel Jordan wrote:
> > Testing padata with the tcrypt module on a 5.2 kernel...
> 
> Thanks for the patch!
> 
> And here is an incremental patch to get rid of the timer that
> appears to be an attempt at fixing a problem related to this.

Nice, +1 for getting rid of the timer.

> diff --git a/kernel/padata.c b/kernel/padata.c
> index 15a8ad63f4ff..b5dfc21e976f 100644
> --- a/kernel/padata.c
> +++ b/kernel/padata.c
> @@ -165,23 +165,12 @@ EXPORT_SYMBOL(padata_do_parallel);
>   */
>  static struct padata_priv *padata_get_next(struct parallel_data *pd)
>  {
> -	int cpu, num_cpus;
> -	unsigned int next_nr, next_index;
>  	struct padata_parallel_queue *next_queue;
>  	struct padata_priv *padata;
>  	struct padata_list *reorder;
> +	int cpu = pd->cpu;
>  
> -	num_cpus = cpumask_weight(pd->cpumask.pcpu);
> -
> -	/*
> -	 * Calculate the percpu reorder queue and the sequence
> -	 * number of the next object.
> -	 */
> -	next_nr = pd->processed;
> -	next_index = next_nr % num_cpus;
> -	cpu = padata_index_to_cpu(pd, next_index);
>  	next_queue = per_cpu_ptr(pd->pqueue, cpu);
> -
>  	reorder = &next_queue->reorder;
>  
>  	spin_lock(&reorder->lock);
> @@ -192,7 +181,8 @@ static struct padata_priv *padata_get_next(struct parallel_data *pd)
>  		list_del_init(&padata->list);
>  		atomic_dec(&pd->reorder_objects);
>  
> -		pd->processed++;
> +		pd->cpu = cpumask_next_wrap(cpu, pd->cpumask.pcpu, 0,
> +					    false);

We'll crash when cpumask_next_wrap returns nr_cpumask_bits and later try to get
the corresponding per-cpu queue.

This handles that as well as the case where there's only 1 CPU in the parallel
mask:

diff --git a/kernel/padata.c b/kernel/padata.c
index b5dfc21e976f..ab352839df04 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -181,8 +181,10 @@ static struct padata_priv *padata_get_next(struct parallel_data *pd)
 		list_del_init(&padata->list);
 		atomic_dec(&pd->reorder_objects);
 
-		pd->cpu = cpumask_next_wrap(cpu, pd->cpumask.pcpu, 0,
-					    false);
+		if (cpumask_weight(pd->cpumask.pcpu) > 1) {
+			pd->cpu = cpumask_next_wrap(cpu, pd->cpumask.pcpu, cpu,
+						    false);
+		}
 
 		spin_unlock(&reorder->lock);
 		goto out;



Haven't finished looking at the patch, but have to run somewhere for now, will
pick it up later today.


* Re: [PATCH] padata: Replace delayed timer with immediate workqueue in padata_reorder
  2019-07-17 11:11               ` [PATCH] padata: Replace delayed timer with immediate workqueue in padata_reorder Herbert Xu
  2019-07-17 18:32                 ` Daniel Jordan
@ 2019-07-17 23:21                 ` Daniel Jordan
  2019-07-18  3:30                   ` Herbert Xu
  1 sibling, 1 reply; 42+ messages in thread
From: Daniel Jordan @ 2019-07-17 23:21 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Daniel Jordan, Steffen Klassert, Andrea Parri, Boqun Feng,
	Paul E . McKenney, Peter Zijlstra, linux-arch, linux-crypto,
	linux-kernel, Mathias Krause

On Wed, Jul 17, 2019 at 07:11:47PM +0800, Herbert Xu wrote:
> Note that we don't bother removing the work queue in
> padata_flush_queues because the whole premise is broken.  You
> cannot flush async crypto requests so it makes no sense to even
> try.  A subsequent patch will fix it by replacing it with a ref
> counting scheme.

Interested to see what happens with the ref counting.

You mean you don't bother removing the serial workqueue flushing, right, not
the parallel?

> @@ -122,10 +117,10 @@ struct padata_cpumask {
>   * @reorder_objects: Number of objects waiting in the reorder queues.
>   * @refcnt: Number of objects holding a reference on this parallel_data.
>   * @max_seq_nr:  Maximal used sequence number.
> + * @cpu: Next CPU to be processed.

Maybe something more specific...

      @cpu: CPU of the next reorder queue to process.

>  static struct padata_priv *padata_get_next(struct parallel_data *pd)
>  {
> -	int cpu, num_cpus;
> -	unsigned int next_nr, next_index;
>  	struct padata_parallel_queue *next_queue;
>  	struct padata_priv *padata;
>  	struct padata_list *reorder;
> +	int cpu = pd->cpu;
>  
> -	num_cpus = cpumask_weight(pd->cpumask.pcpu);
> -
> -	/*
> -	 * Calculate the percpu reorder queue and the sequence
> -	 * number of the next object.
> -	 */
> -	next_nr = pd->processed;
> -	next_index = next_nr % num_cpus;
> -	cpu = padata_index_to_cpu(pd, next_index);

After this patch padata_index_to_cpu has only one caller, so it doesn't need to
be a function anymore.

> @@ -246,7 +237,6 @@ static void padata_reorder(struct parallel_data *pd)
>  		 * so exit immediately.
>  		 */
>  		if (PTR_ERR(padata) == -ENODATA) {
> -			del_timer(&pd->timer);
>  			spin_unlock_bh(&pd->lock);
>  			return;
>  		}
> @@ -265,70 +255,29 @@ static void padata_reorder(struct parallel_data *pd)
>  
>  	/*
>  	 * The next object that needs serialization might have arrived to
> -	 * the reorder queues in the meantime, we will be called again
> -	 * from the timer function if no one else cares for it.
> +	 * the reorder queues in the meantime.
>  	 *
> -	 * Ensure reorder_objects is read after pd->lock is dropped so we see
> -	 * an increment from another task in padata_do_serial.  Pairs with
> +	 * Ensure reorder queue is read after pd->lock is dropped so we see
> +	 * new objects from another task in padata_do_serial.  Pairs with
>  	 * smp_mb__after_atomic in padata_do_serial.
>  	 */
>  	smp_mb();
> -	if (atomic_read(&pd->reorder_objects)
> -			&& !(pinst->flags & PADATA_RESET))
> -		mod_timer(&pd->timer, jiffies + HZ);
> -	else
> -		del_timer(&pd->timer);
>  
> -	return;
> +	next_queue = per_cpu_ptr(pd->pqueue, pd->cpu);
> +	if (!list_empty(&next_queue->reorder.list))
> +		queue_work(pinst->wq, &pd->reorder_work);

It's possible that the work gets queued when it doesn't need to be, if another
task adds a job to the reorder queue but hasn't grabbed pd->lock yet.  I
can't think of a way around it...and it does no harm anyway.

> @@ -376,9 +325,8 @@ void padata_do_serial(struct padata_priv *padata)
>  
>  	cpu = get_cpu();
>  
> -	/* We need to run on the same CPU padata_do_parallel(.., padata, ..)
> -	 * was called on -- or, at least, enqueue the padata object into the
> -	 * correct per-cpu queue.
> +	/* We need to enqueue the padata object into the correct
> +	 * per-cpu queue.
>  	 */
>  	if (cpu != padata->cpu) {
>  		reorder_via_wq = 1;

reorder_via_wq and I think get_cpu/put_cpu can go away now that we're always
using padata->cpu to get the parallel queue and then running padata_reorder in
the current task.

Maybe Steffen can check my reasoning on the get_cpu thing.  It looks like that
was added in the original padata commit to keep 'cpu' stable for getting the
parallel queue but is no longer needed because we just use padata->cpu.

> @@ -388,12 +336,12 @@ void padata_do_serial(struct padata_priv *padata)
>  	pqueue = per_cpu_ptr(pd->pqueue, cpu);
>  
>  	spin_lock(&pqueue->reorder.lock);
> -	atomic_inc(&pd->reorder_objects);
>  	list_add_tail(&padata->list, &pqueue->reorder.list);
> +	atomic_inc(&pd->reorder_objects);

Why switch the lines?  Seems ok to not do this.

> @@ -538,8 +479,6 @@ static void padata_flush_queues(struct parallel_data *pd)
>  		flush_work(&pqueue->work);
>  	}
>  
> -	del_timer_sync(&pd->timer);
> -

>  	if (atomic_read(&pd->reorder_objects))
>  		padata_reorder(pd);

I think we can do away with reorder_objects entirely by checking pd->cpu's
reorder queue here.

It's racy to read pd->cpu without pd->lock, but it doesn't matter.  If there
are objects left to process and no other task is in padata_reorder, this path
will notice that, and if there's another task in padata_reorder changing
pd->cpu from under us, that task will finish the reordering so this path
doesn't have to.


* Re: [PATCH] padata: Replace delayed timer with immediate workqueue in padata_reorder
  2019-07-17 23:21                 ` Daniel Jordan
@ 2019-07-18  3:30                   ` Herbert Xu
  2019-07-18 14:25                     ` Daniel Jordan
  0 siblings, 1 reply; 42+ messages in thread
From: Herbert Xu @ 2019-07-18  3:30 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Steffen Klassert, Andrea Parri, Boqun Feng, Paul E . McKenney,
	Peter Zijlstra, linux-arch, linux-crypto, linux-kernel,
	Mathias Krause

On Wed, Jul 17, 2019 at 07:21:36PM -0400, Daniel Jordan wrote:
>
> > @@ -388,12 +336,12 @@ void padata_do_serial(struct padata_priv *padata)
> >  	pqueue = per_cpu_ptr(pd->pqueue, cpu);
> >  
> >  	spin_lock(&pqueue->reorder.lock);
> > -	atomic_inc(&pd->reorder_objects);
> >  	list_add_tail(&padata->list, &pqueue->reorder.list);
> > +	atomic_inc(&pd->reorder_objects);
> 
> Why switch the lines?  Seems ok to not do this.

This is crucial because otherwise the memory barrier won't apply
to the list insertion.  With this patch, we are now using the list
insertion as the indicator, rather than reorder_objects.

> > @@ -538,8 +479,6 @@ static void padata_flush_queues(struct parallel_data *pd)
> >  		flush_work(&pqueue->work);
> >  	}
> >  
> > -	del_timer_sync(&pd->timer);
> > -
> 
> >  	if (atomic_read(&pd->reorder_objects))
> >  		padata_reorder(pd);
> 
> I think we can do away with reorder_objects entirely by checking pd->cpu's
> reorder queue here.

As I said this will probably disappear altogether since we can't
guarantee that padata_reorder will actually do anything if the
jobs are stuck in async crypto processing.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH] padata: Replace delayed timer with immediate workqueue in padata_reorder
  2019-07-17 18:32                 ` Daniel Jordan
@ 2019-07-18  3:31                   ` Herbert Xu
  2019-07-18 14:27                     ` Daniel Jordan
  0 siblings, 1 reply; 42+ messages in thread
From: Herbert Xu @ 2019-07-18  3:31 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Steffen Klassert, Andrea Parri, Boqun Feng, Paul E . McKenney,
	Peter Zijlstra, linux-arch, linux-crypto, linux-kernel,
	Mathias Krause

On Wed, Jul 17, 2019 at 02:32:27PM -0400, Daniel Jordan wrote:
>
> We'll crash when cpumask_next_wrap returns nr_cpumask_bits and later try to get
> the corresponding per-cpu queue.

The whole point of cpumask_next_wrap is to wrap around to the
beginning when it hits nr_cpumask_bits.  So it cannot return
nr_cpumask_bits.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH v2] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs
  2019-07-16 16:32             ` [PATCH v2] " Daniel Jordan
  2019-07-17 11:11               ` [PATCH] padata: Replace delayed timer with immediate workqueue in padata_reorder Herbert Xu
@ 2019-07-18  5:42               ` Herbert Xu
  1 sibling, 0 replies; 42+ messages in thread
From: Herbert Xu @ 2019-07-18  5:42 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Steffen Klassert, Andrea Parri, Boqun Feng, Paul E . McKenney,
	Peter Zijlstra, linux-arch, linux-crypto, linux-kernel

On Tue, Jul 16, 2019 at 12:32:53PM -0400, Daniel Jordan wrote:
> Testing padata with the tcrypt module on a 5.2 kernel...
> 
>     # modprobe tcrypt alg="pcrypt(rfc4106(gcm(aes)))" type=3
>     # modprobe tcrypt mode=211 sec=1
> 
> ...produces this splat:
> 
>     INFO: task modprobe:10075 blocked for more than 120 seconds.
>           Not tainted 5.2.0-base+ #16
>     modprobe        D    0 10075  10064 0x80004080
>     Call Trace:
>      ? __schedule+0x4dd/0x610
>      ? ring_buffer_unlock_commit+0x23/0x100
>      schedule+0x6c/0x90
>      schedule_timeout+0x3b/0x320
>      ? trace_buffer_unlock_commit_regs+0x4f/0x1f0
>      wait_for_common+0x160/0x1a0
>      ? wake_up_q+0x80/0x80
>      { crypto_wait_req }             # entries in braces added by hand
>      { do_one_aead_op }
>      { test_aead_jiffies }
>      test_aead_speed.constprop.17+0x681/0xf30 [tcrypt]
>      do_test+0x4053/0x6a2b [tcrypt]
>      ? 0xffffffffa00f4000
>      tcrypt_mod_init+0x50/0x1000 [tcrypt]
>      ...
> 
> The second modprobe command never finishes because in padata_reorder,
> CPU0's load of reorder_objects is executed before the unlocking store in
> spin_unlock_bh(pd->lock), causing CPU0 to miss CPU1's increment:
> 
> CPU0                                 CPU1
> 
> padata_reorder                       padata_do_serial
>   LOAD reorder_objects  // 0
>                                        INC reorder_objects  // 1
>                                        padata_reorder
>                                          TRYLOCK pd->lock   // failed
>   UNLOCK pd->lock
> 
> CPU0 deletes the timer before returning from padata_reorder and since no
> other job is submitted to padata, modprobe waits indefinitely.
> 
> Add a pair of full barriers to guarantee proper ordering:
> 
> CPU0                                 CPU1
> 
> padata_reorder                       padata_do_serial
>   UNLOCK pd->lock
>   smp_mb()
>   LOAD reorder_objects
>                                        INC reorder_objects
>                                        smp_mb__after_atomic()
>                                        padata_reorder
>                                          TRYLOCK pd->lock
> 
> smp_mb__after_atomic is needed so the read part of the trylock operation
> comes after the INC, as Andrea points out.   Thanks also to Andrea for
> help with writing a litmus test.
> 
> Fixes: 16295bec6398 ("padata: Generic parallelization/serialization interface")
> Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
> Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: Paul E. McKenney <paulmck@linux.ibm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Steffen Klassert <steffen.klassert@secunet.com>
> Cc: linux-arch@vger.kernel.org
> Cc: linux-crypto@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  kernel/padata.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH] padata: Replace delayed timer with immediate workqueue in padata_reorder
  2019-07-18  3:30                   ` Herbert Xu
@ 2019-07-18 14:25                     ` Daniel Jordan
  2019-07-18 14:49                       ` Herbert Xu
  0 siblings, 1 reply; 42+ messages in thread
From: Daniel Jordan @ 2019-07-18 14:25 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Daniel Jordan, Steffen Klassert, Andrea Parri, Boqun Feng,
	Paul E . McKenney, Peter Zijlstra, linux-arch, linux-crypto,
	linux-kernel, Mathias Krause

On Thu, Jul 18, 2019 at 11:30:08AM +0800, Herbert Xu wrote:
> On Wed, Jul 17, 2019 at 07:21:36PM -0400, Daniel Jordan wrote:
> >
> > > @@ -388,12 +336,12 @@ void padata_do_serial(struct padata_priv *padata)
> > >  	pqueue = per_cpu_ptr(pd->pqueue, cpu);
> > >  
> > >  	spin_lock(&pqueue->reorder.lock);
> > > -	atomic_inc(&pd->reorder_objects);
> > >  	list_add_tail(&padata->list, &pqueue->reorder.list);
> > > +	atomic_inc(&pd->reorder_objects);
> > 
> > Why switch the lines?  Seems ok to not do this.
> 
> This is crucial because otherwise the memory barrier won't apply
> to the list insertion.  With this patch, we are now using the list
> insertion as the indicator, rather than reorder_objects.

Which memory barrier do you mean?  I think you're referring to the one that
atomic_inc might provide?  If so, the memory model maintainers can correct me
here, but my understanding is that RMW atomic ops that don't return values are
unordered, so switching the lines has no effect.

Besides, the smp_mb__after_atomic is what orders the list insertion with the
trylock of pd->lock.

> > > @@ -538,8 +479,6 @@ static void padata_flush_queues(struct parallel_data *pd)
> > >  		flush_work(&pqueue->work);
> > >  	}
> > >  
> > > -	del_timer_sync(&pd->timer);
> > > -
> > 
> > >  	if (atomic_read(&pd->reorder_objects))
> > >  		padata_reorder(pd);
> > 
> > I think we can do away with reorder_objects entirely by checking pd->cpu's
> > reorder queue here.
> 
> As I said this will probably disappear altogether since we can't
> guarantee that padata_reorder will actually do anything if the
> jobs are stuck in async crypto processing.

Ok, makes sense.


* Re: [PATCH] padata: Replace delayed timer with immediate workqueue in padata_reorder
  2019-07-18  3:31                   ` Herbert Xu
@ 2019-07-18 14:27                     ` Daniel Jordan
  2019-07-18 14:56                       ` Herbert Xu
  0 siblings, 1 reply; 42+ messages in thread
From: Daniel Jordan @ 2019-07-18 14:27 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Daniel Jordan, Steffen Klassert, Andrea Parri, Boqun Feng,
	Paul E . McKenney, Peter Zijlstra, linux-arch, linux-crypto,
	linux-kernel, Mathias Krause

On Thu, Jul 18, 2019 at 11:31:31AM +0800, Herbert Xu wrote:
> On Wed, Jul 17, 2019 at 02:32:27PM -0400, Daniel Jordan wrote:
> >
> > We'll crash when cpumask_next_wrap returns nr_cpumask_bits and later try to get
> > the corresponding per-cpu queue.
> 
> The whole point of cpumask_next_wrap is to wrap around to the
> beginning when it hits nr_cpumask_bits.  So it cannot return
> nr_cpumask_bits.

That's what I expected when I first saw it too, but nr_cpumask_bits is returned
to signal the end of the iteration.  The patch always passes 0 for the 'start'
argument, so when cpumask_next_wrap is called with the last cpu in the mask,
the end-of-iteration case is triggered.  To reassure you and myself :) I ran it
and got the expected crash.

Passing pd->cpu for the start argument instead avoids that problem, but the
one-cpu-in-mask case still needs handling because cpumask_next_wrap always
signals end of iteration for that, hence the cpumask_weight check.


* Re: [PATCH] padata: Replace delayed timer with immediate workqueue in padata_reorder
  2019-07-18 14:25                     ` Daniel Jordan
@ 2019-07-18 14:49                       ` Herbert Xu
  2019-07-19 14:21                         ` Daniel Jordan
  0 siblings, 1 reply; 42+ messages in thread
From: Herbert Xu @ 2019-07-18 14:49 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Steffen Klassert, Andrea Parri, Boqun Feng, Paul E . McKenney,
	Peter Zijlstra, linux-arch, linux-crypto, linux-kernel,
	Mathias Krause

On Thu, Jul 18, 2019 at 10:25:15AM -0400, Daniel Jordan wrote:
>
> Which memory barrier do you mean?  I think you're referring to the one that
> atomic_inc might provide?  If so, the memory model maintainers can correct me
> here, but my understanding is that RMW atomic ops that don't return values are
> unordered, so switching the lines has no effect.
> 
> Besides, the smp_mb__after_atomic is what orders the list insertion with the
> trylock of pd->lock.

The primitive smp_mb__after_atomic only provides a barrier when
used in conjunction with atomic_inc (and similar atomic ops).

The actual barrier may either be in smp_mb__after_atomic or the
atomic op itself (which is the case on x86).  Since we need the
barrier to occur after the list insertion we must move both of
these after the list_add_tail.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH] padata: Replace delayed timer with immediate workqueue in padata_reorder
  2019-07-18 14:27                     ` Daniel Jordan
@ 2019-07-18 14:56                       ` Herbert Xu
  2019-07-18 15:01                         ` [v2 PATCH] " Herbert Xu
  2019-07-19 14:27                         ` [PATCH] padata: Replace delayed timer with immediate workqueue in padata_reorder Daniel Jordan
  0 siblings, 2 replies; 42+ messages in thread
From: Herbert Xu @ 2019-07-18 14:56 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Steffen Klassert, Andrea Parri, Boqun Feng, Paul E . McKenney,
	Peter Zijlstra, linux-arch, linux-crypto, linux-kernel,
	Mathias Krause

On Thu, Jul 18, 2019 at 10:27:30AM -0400, Daniel Jordan wrote:
>
> That's what I expected when I first saw it too, but nr_cpumask_bits is returned
> to signal the end of the iteration.  The patch always passes 0 for the 'start'
> argument, so when cpumask_next_wrap is called with the last cpu in the mask,
> the end-of-iteration case is triggered.  To reassure you and myself :) I ran it
> and got the expected crash.
> 
> Passing pd->cpu for the start argument instead avoids that problem, but the
> one-cpu-in-mask case still needs handling because cpumask_next_wrap always
> signals end of iteration for that, hence the cpumask_weight check.

My bad.  I should have set start to -1 to make it do the right thing.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* [v2 PATCH] padata: Replace delayed timer with immediate workqueue in padata_reorder
  2019-07-18 14:56                       ` Herbert Xu
@ 2019-07-18 15:01                         ` Herbert Xu
  2019-07-19 14:37                           ` Daniel Jordan
  2019-07-19 14:27                         ` [PATCH] padata: Replace delayed timer with immediate workqueue in padata_reorder Daniel Jordan
  1 sibling, 1 reply; 42+ messages in thread
From: Herbert Xu @ 2019-07-18 15:01 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Steffen Klassert, Andrea Parri, Boqun Feng, Paul E . McKenney,
	Peter Zijlstra, linux-arch, linux-crypto, linux-kernel,
	Mathias Krause

The function padata_reorder will use a timer when it cannot progress
while completed jobs are outstanding (pd->reorder_objects > 0).  This
is suboptimal as if we do end up using the timer then it would have
introduced a gratuitous delay of one second.

In fact we can easily distinguish between whether completed jobs
are outstanding and whether we can make progress.  All we have to
do is look at the next pqueue list.

This patch does that by replacing pd->processed with pd->cpu so
that the next pqueue is more accessible.

A work queue is used instead of the original try_again to avoid
hogging the CPU.

Note that we don't bother removing the work queue in
padata_flush_queues because the whole premise is broken.  You
cannot flush async crypto requests so it makes no sense to even
try.  A subsequent patch will fix it by replacing it with a ref
counting scheme.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/include/linux/padata.h b/include/linux/padata.h
index 5d13d25da2c8..d803397a28f7 100644
--- a/include/linux/padata.h
+++ b/include/linux/padata.h
@@ -24,7 +24,6 @@
 #include <linux/workqueue.h>
 #include <linux/spinlock.h>
 #include <linux/list.h>
-#include <linux/timer.h>
 #include <linux/notifier.h>
 #include <linux/kobject.h>
 
@@ -85,18 +84,14 @@ struct padata_serial_queue {
  * @serial: List to wait for serialization after reordering.
  * @pwork: work struct for parallelization.
  * @swork: work struct for serialization.
- * @pd: Backpointer to the internal control structure.
  * @work: work struct for parallelization.
- * @reorder_work: work struct for reordering.
  * @num_obj: Number of objects that are processed by this cpu.
  * @cpu_index: Index of the cpu.
  */
 struct padata_parallel_queue {
        struct padata_list    parallel;
        struct padata_list    reorder;
-       struct parallel_data *pd;
        struct work_struct    work;
-       struct work_struct    reorder_work;
        atomic_t              num_obj;
        int                   cpu_index;
 };
@@ -122,10 +117,10 @@ struct padata_cpumask {
  * @reorder_objects: Number of objects waiting in the reorder queues.
  * @refcnt: Number of objects holding a reference on this parallel_data.
  * @max_seq_nr:  Maximal used sequence number.
+ * @cpu: Next CPU to be processed.
  * @cpumask: The cpumasks in use for parallel and serial workers.
+ * @reorder_work: work struct for reordering.
  * @lock: Reorder lock.
- * @processed: Number of already processed objects.
- * @timer: Reorder timer.
  */
 struct parallel_data {
 	struct padata_instance		*pinst;
@@ -134,10 +129,10 @@ struct parallel_data {
 	atomic_t			reorder_objects;
 	atomic_t			refcnt;
 	atomic_t			seq_nr;
+	int				cpu;
 	struct padata_cpumask		cpumask;
+	struct work_struct		reorder_work;
 	spinlock_t                      lock ____cacheline_aligned;
-	unsigned int			processed;
-	struct timer_list		timer;
 };
 
 /**
diff --git a/kernel/padata.c b/kernel/padata.c
index 15a8ad63f4ff..fbafca18597f 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -165,23 +165,12 @@ EXPORT_SYMBOL(padata_do_parallel);
  */
 static struct padata_priv *padata_get_next(struct parallel_data *pd)
 {
-	int cpu, num_cpus;
-	unsigned int next_nr, next_index;
 	struct padata_parallel_queue *next_queue;
 	struct padata_priv *padata;
 	struct padata_list *reorder;
+	int cpu = pd->cpu;
 
-	num_cpus = cpumask_weight(pd->cpumask.pcpu);
-
-	/*
-	 * Calculate the percpu reorder queue and the sequence
-	 * number of the next object.
-	 */
-	next_nr = pd->processed;
-	next_index = next_nr % num_cpus;
-	cpu = padata_index_to_cpu(pd, next_index);
 	next_queue = per_cpu_ptr(pd->pqueue, cpu);
-
 	reorder = &next_queue->reorder;
 
 	spin_lock(&reorder->lock);
@@ -192,7 +181,8 @@ static struct padata_priv *padata_get_next(struct parallel_data *pd)
 		list_del_init(&padata->list);
 		atomic_dec(&pd->reorder_objects);
 
-		pd->processed++;
+		pd->cpu = cpumask_next_wrap(cpu, pd->cpumask.pcpu, -1,
+					    false);
 
 		spin_unlock(&reorder->lock);
 		goto out;
@@ -215,6 +205,7 @@ static void padata_reorder(struct parallel_data *pd)
 	struct padata_priv *padata;
 	struct padata_serial_queue *squeue;
 	struct padata_instance *pinst = pd->pinst;
+	struct padata_parallel_queue *next_queue;
 
 	/*
 	 * We need to ensure that only one cpu can work on dequeueing of
@@ -246,7 +237,6 @@ static void padata_reorder(struct parallel_data *pd)
 		 * so exit immediately.
 		 */
 		if (PTR_ERR(padata) == -ENODATA) {
-			del_timer(&pd->timer);
 			spin_unlock_bh(&pd->lock);
 			return;
 		}
@@ -265,70 +255,29 @@ static void padata_reorder(struct parallel_data *pd)
 
 	/*
 	 * The next object that needs serialization might have arrived to
-	 * the reorder queues in the meantime, we will be called again
-	 * from the timer function if no one else cares for it.
+	 * the reorder queues in the meantime.
 	 *
-	 * Ensure reorder_objects is read after pd->lock is dropped so we see
-	 * an increment from another task in padata_do_serial.  Pairs with
+	 * Ensure reorder queue is read after pd->lock is dropped so we see
+	 * new objects from another task in padata_do_serial.  Pairs with
 	 * smp_mb__after_atomic in padata_do_serial.
 	 */
 	smp_mb();
-	if (atomic_read(&pd->reorder_objects)
-			&& !(pinst->flags & PADATA_RESET))
-		mod_timer(&pd->timer, jiffies + HZ);
-	else
-		del_timer(&pd->timer);
 
-	return;
+	next_queue = per_cpu_ptr(pd->pqueue, pd->cpu);
+	if (!list_empty(&next_queue->reorder.list))
+		queue_work(pinst->wq, &pd->reorder_work);
 }
 
 static void invoke_padata_reorder(struct work_struct *work)
 {
-	struct padata_parallel_queue *pqueue;
 	struct parallel_data *pd;
 
 	local_bh_disable();
-	pqueue = container_of(work, struct padata_parallel_queue, reorder_work);
-	pd = pqueue->pd;
+	pd = container_of(work, struct parallel_data, reorder_work);
 	padata_reorder(pd);
 	local_bh_enable();
 }
 
-static void padata_reorder_timer(struct timer_list *t)
-{
-	struct parallel_data *pd = from_timer(pd, t, timer);
-	unsigned int weight;
-	int target_cpu, cpu;
-
-	cpu = get_cpu();
-
-	/* We don't lock pd here to not interfere with parallel processing
-	 * padata_reorder() calls on other CPUs. We just need any CPU out of
-	 * the cpumask.pcpu set. It would be nice if it's the right one but
-	 * it doesn't matter if we're off to the next one by using an outdated
-	 * pd->processed value.
-	 */
-	weight = cpumask_weight(pd->cpumask.pcpu);
-	target_cpu = padata_index_to_cpu(pd, pd->processed % weight);
-
-	/* ensure to call the reorder callback on the correct CPU */
-	if (cpu != target_cpu) {
-		struct padata_parallel_queue *pqueue;
-		struct padata_instance *pinst;
-
-		/* The timer function is serialized wrt itself -- no locking
-		 * needed.
-		 */
-		pinst = pd->pinst;
-		pqueue = per_cpu_ptr(pd->pqueue, target_cpu);
-		queue_work_on(target_cpu, pinst->wq, &pqueue->reorder_work);
-	} else {
-		padata_reorder(pd);
-	}
-
-	put_cpu();
-}
-
 static void padata_serial_worker(struct work_struct *serial_work)
 {
 	struct padata_serial_queue *squeue;
@@ -376,9 +325,8 @@ void padata_do_serial(struct padata_priv *padata)
 
 	cpu = get_cpu();
 
-	/* We need to run on the same CPU padata_do_parallel(.., padata, ..)
-	 * was called on -- or, at least, enqueue the padata object into the
-	 * correct per-cpu queue.
+	/* We need to enqueue the padata object into the correct
+	 * per-cpu queue.
 	 */
 	if (cpu != padata->cpu) {
 		reorder_via_wq = 1;
@@ -388,12 +336,12 @@ void padata_do_serial(struct padata_priv *padata)
 	pqueue = per_cpu_ptr(pd->pqueue, cpu);
 
 	spin_lock(&pqueue->reorder.lock);
-	atomic_inc(&pd->reorder_objects);
 	list_add_tail(&padata->list, &pqueue->reorder.list);
+	atomic_inc(&pd->reorder_objects);
 	spin_unlock(&pqueue->reorder.lock);
 
 	/*
-	 * Ensure the atomic_inc of reorder_objects above is ordered correctly
+	 * Ensure the addition to the reorder list is ordered correctly
 	 * with the trylock of pd->lock in padata_reorder.  Pairs with smp_mb
 	 * in padata_reorder.
 	 */
@@ -401,13 +349,7 @@ void padata_do_serial(struct padata_priv *padata)
 
 	put_cpu();
 
-	/* If we're running on the wrong CPU, call padata_reorder() via a
-	 * kernel worker.
-	 */
-	if (reorder_via_wq)
-		queue_work_on(cpu, pd->pinst->wq, &pqueue->reorder_work);
-	else
-		padata_reorder(pd);
+	padata_reorder(pd);
 }
 EXPORT_SYMBOL(padata_do_serial);
 
@@ -463,14 +405,12 @@ static void padata_init_pqueues(struct parallel_data *pd)
 			continue;
 		}
 
-		pqueue->pd = pd;
 		pqueue->cpu_index = cpu_index;
 		cpu_index++;
 
 		__padata_list_init(&pqueue->reorder);
 		__padata_list_init(&pqueue->parallel);
 		INIT_WORK(&pqueue->work, padata_parallel_worker);
-		INIT_WORK(&pqueue->reorder_work, invoke_padata_reorder);
 		atomic_set(&pqueue->num_obj, 0);
 	}
 }
@@ -498,12 +438,13 @@ static struct parallel_data *padata_alloc_pd(struct padata_instance *pinst,
 
 	padata_init_pqueues(pd);
 	padata_init_squeues(pd);
-	timer_setup(&pd->timer, padata_reorder_timer, 0);
 	atomic_set(&pd->seq_nr, -1);
 	atomic_set(&pd->reorder_objects, 0);
 	atomic_set(&pd->refcnt, 0);
 	pd->pinst = pinst;
 	spin_lock_init(&pd->lock);
+	pd->cpu = cpumask_first(pcpumask);
+	INIT_WORK(&pd->reorder_work, invoke_padata_reorder);
 
 	return pd;
 
@@ -538,8 +479,6 @@ static void padata_flush_queues(struct parallel_data *pd)
 		flush_work(&pqueue->work);
 	}
 
-	del_timer_sync(&pd->timer);
-
 	if (atomic_read(&pd->reorder_objects))
 		padata_reorder(pd);
 
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] padata: Replace delayed timer with immediate workqueue in padata_reorder
  2019-07-18 14:49                       ` Herbert Xu
@ 2019-07-19 14:21                         ` Daniel Jordan
  0 siblings, 0 replies; 42+ messages in thread
From: Daniel Jordan @ 2019-07-19 14:21 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Daniel Jordan, Steffen Klassert, Andrea Parri, Boqun Feng,
	Paul E . McKenney, Peter Zijlstra, linux-arch, linux-crypto,
	linux-kernel, Mathias Krause

On Thu, Jul 18, 2019 at 10:49:50PM +0800, Herbert Xu wrote:
> On Thu, Jul 18, 2019 at 10:25:15AM -0400, Daniel Jordan wrote:
> >
> > Which memory barrier do you mean?  I think you're referring to the one that
> > atomic_inc might provide?  If so, the memory model maintainers can correct me
> > here, but my understanding is that RMW atomic ops that don't return values are
> > unordered, so switching the lines has no effect.
> > 
> > Besides, the smp_mb__after_atomic is what orders the list insertion with the
> > trylock of pd->lock.
> 
> The primitive smp_mb__after_atomic only provides a barrier when
> used in conjunction with atomic_inc (and similar atomic ops).
> 
> The actual barrier may either be in smp_mb__after_atomic or the
> atomic op itself (which is the case on x86).  Since we need the
> barrier to occur after the list insertion we must move both of
> these after the list_add_tail.

Yes, my mistake!  Thanks for clarifying that.


* Re: [PATCH] padata: Replace delayed timer with immediate workqueue in padata_reorder
  2019-07-18 14:56                       ` Herbert Xu
  2019-07-18 15:01                         ` [v2 PATCH] " Herbert Xu
@ 2019-07-19 14:27                         ` Daniel Jordan
  1 sibling, 0 replies; 42+ messages in thread
From: Daniel Jordan @ 2019-07-19 14:27 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Daniel Jordan, Steffen Klassert, Andrea Parri, Boqun Feng,
	Paul E . McKenney, Peter Zijlstra, linux-arch, linux-crypto,
	linux-kernel, Mathias Krause

On Thu, Jul 18, 2019 at 10:56:34PM +0800, Herbert Xu wrote:
> On Thu, Jul 18, 2019 at 10:27:30AM -0400, Daniel Jordan wrote:
> >
> > That's what I expected when I first saw it too, but nr_cpumask_bits is returned
> > to signal the end of the iteration.  The patch always passes 0 for the 'start'
> > argument, so when cpumask_next_wrap is called with the last cpu in the mask,
> > the end-of-iteration case is triggered.  To reassure you and myself :) I ran it
> > and got the expected crash.
> > 
> > Passing pd->cpu for the start argument instead avoids that problem, but the
> > one-cpu-in-mask case still needs handling because cpumask_next_wrap always
> > signals end of iteration for that, hence the cpumask_weight check.
> 
> My bad.  I should have set start to -1 to make it do the right thing.

Oh, you're right, that's nicer.  I just noticed other callers do it that way
as well.


* Re: [v2 PATCH] padata: Replace delayed timer with immediate workqueue in padata_reorder
  2019-07-18 15:01                         ` [v2 PATCH] " Herbert Xu
@ 2019-07-19 14:37                           ` Daniel Jordan
  2019-07-19 14:55                             ` Herbert Xu
  0 siblings, 1 reply; 42+ messages in thread
From: Daniel Jordan @ 2019-07-19 14:37 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Daniel Jordan, Steffen Klassert, Andrea Parri, Boqun Feng,
	Paul E . McKenney, Peter Zijlstra, linux-arch, linux-crypto,
	linux-kernel, Mathias Krause

On Thu, Jul 18, 2019 at 11:01:46PM +0800, Herbert Xu wrote:
> @@ -376,9 +325,8 @@ void padata_do_serial(struct padata_priv *padata)
>  
>  	cpu = get_cpu();
>  
> -	/* We need to run on the same CPU padata_do_parallel(.., padata, ..)
> -	 * was called on -- or, at least, enqueue the padata object into the
> -	 * correct per-cpu queue.
> +	/* We need to enqueue the padata object into the correct
> +	 * per-cpu queue.
>  	 */
>  	if (cpu != padata->cpu) {
>  		reorder_via_wq = 1;
> @@ -388,12 +336,12 @@ void padata_do_serial(struct padata_priv *padata)
>  	pqueue = per_cpu_ptr(pd->pqueue, cpu);
>  
>  	spin_lock(&pqueue->reorder.lock);
> -	atomic_inc(&pd->reorder_objects);
>  	list_add_tail(&padata->list, &pqueue->reorder.list);
> +	atomic_inc(&pd->reorder_objects);
>  	spin_unlock(&pqueue->reorder.lock);
>  
>  	/*
> -	 * Ensure the atomic_inc of reorder_objects above is ordered correctly
> +	 * Ensure the addition to the reorder list is ordered correctly
>  	 * with the trylock of pd->lock in padata_reorder.  Pairs with smp_mb
>  	 * in padata_reorder.
>  	 */
> @@ -401,13 +349,7 @@ void padata_do_serial(struct padata_priv *padata)
>  
>  	put_cpu();
>  
> -	/* If we're running on the wrong CPU, call padata_reorder() via a
> -	 * kernel worker.
> -	 */
> -	if (reorder_via_wq)
> -		queue_work_on(cpu, pd->pinst->wq, &pqueue->reorder_work);
> -	else
> -		padata_reorder(pd);
> +	padata_reorder(pd);
>  }
>  EXPORT_SYMBOL(padata_do_serial);

If I'm not missing anything, it still looks like get_cpu() and reorder_via_wq
no longer have any effect with this patch and can be removed.


* Re: [v2 PATCH] padata: Replace delayed timer with immediate workqueue in padata_reorder
  2019-07-19 14:37                           ` Daniel Jordan
@ 2019-07-19 14:55                             ` Herbert Xu
  2019-07-19 19:04                               ` [PATCH] padata: purge get_cpu and reorder_via_wq from padata_do_serial Daniel Jordan
  0 siblings, 1 reply; 42+ messages in thread
From: Herbert Xu @ 2019-07-19 14:55 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Steffen Klassert, Andrea Parri, Boqun Feng, Paul E . McKenney,
	Peter Zijlstra, linux-arch, linux-crypto, linux-kernel,
	Mathias Krause

On Fri, Jul 19, 2019 at 10:37:21AM -0400, Daniel Jordan wrote:
>
> If I'm not missing anything, still looks like get_cpu() and reorder_via_wq no
> longer have an effect with this patch and can be removed.

You're right that they're not needed.  But we shouldn't mix clean-ups
with substantial changes: if we break something, mixing them would make
bisection less efficient.  So feel free to send a patch on top of
this one to remove them.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* [PATCH] padata: purge get_cpu and reorder_via_wq from padata_do_serial
  2019-07-19 14:55                             ` Herbert Xu
@ 2019-07-19 19:04                               ` Daniel Jordan
  2019-07-26 12:36                                 ` Herbert Xu
  0 siblings, 1 reply; 42+ messages in thread
From: Daniel Jordan @ 2019-07-19 19:04 UTC (permalink / raw)
  To: Herbert Xu, Steffen Klassert; +Cc: Daniel Jordan, linux-crypto, linux-kernel

With the removal of the padata timer, padata_do_serial no longer
needs special CPU handling, so remove it.

Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 kernel/padata.c | 23 +++--------------------
 1 file changed, 3 insertions(+), 20 deletions(-)

diff --git a/kernel/padata.c b/kernel/padata.c
index fbafca18597f..7372fb45eeeb 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -316,24 +316,9 @@ static void padata_serial_worker(struct work_struct *serial_work)
  */
 void padata_do_serial(struct padata_priv *padata)
 {
-	int cpu;
-	struct padata_parallel_queue *pqueue;
-	struct parallel_data *pd;
-	int reorder_via_wq = 0;
-
-	pd = padata->pd;
-
-	cpu = get_cpu();
-
-	/* We need to enqueue the padata object into the correct
-	 * per-cpu queue.
-	 */
-	if (cpu != padata->cpu) {
-		reorder_via_wq = 1;
-		cpu = padata->cpu;
-	}
-
-	pqueue = per_cpu_ptr(pd->pqueue, cpu);
+	struct parallel_data *pd = padata->pd;
+	struct padata_parallel_queue *pqueue = per_cpu_ptr(pd->pqueue,
+							   padata->cpu);
 
 	spin_lock(&pqueue->reorder.lock);
 	list_add_tail(&padata->list, &pqueue->reorder.list);
@@ -347,8 +332,6 @@ void padata_do_serial(struct padata_priv *padata)
 	 */
 	smp_mb__after_atomic();
 
-	put_cpu();
-
 	padata_reorder(pd);
 }
 EXPORT_SYMBOL(padata_do_serial);
-- 
2.22.0



* Re: [PATCH] padata: purge get_cpu and reorder_via_wq from padata_do_serial
  2019-07-19 19:04                               ` [PATCH] padata: purge get_cpu and reorder_via_wq from padata_do_serial Daniel Jordan
@ 2019-07-26 12:36                                 ` Herbert Xu
  0 siblings, 0 replies; 42+ messages in thread
From: Herbert Xu @ 2019-07-26 12:36 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: steffen.klassert, daniel.m.jordan, linux-crypto, linux-kernel

Daniel Jordan <daniel.m.jordan@oracle.com> wrote:
> With the removal of the padata timer, padata_do_serial no longer
> needs special CPU handling, so remove it.
> 
> Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: Steffen Klassert <steffen.klassert@secunet.com>
> Cc: linux-crypto@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
> kernel/padata.c | 23 +++--------------------
> 1 file changed, 3 insertions(+), 20 deletions(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


end of thread, other threads:[~2019-07-26 12:36 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-11 22:12 [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs Daniel Jordan
2019-07-12 10:06 ` Herbert Xu
2019-07-12 10:10   ` Steffen Klassert
2019-07-12 16:07     ` Daniel Jordan
2019-07-13  5:03       ` Herbert Xu
2019-07-15 16:10         ` Daniel Jordan
2019-07-16 10:04           ` Herbert Xu
2019-07-16 11:14             ` Steffen Klassert
2019-07-16 12:57               ` [PATCH] padata: Use RCU when fetching pd from do_serial Herbert Xu
2019-07-16 13:09                 ` Herbert Xu
2019-07-16 13:23                   ` [v2 PATCH] " Herbert Xu
2019-07-17  8:36                     ` Steffen Klassert
2019-07-17  8:50                       ` Herbert Xu
2019-07-17  8:28                   ` [PATCH] " Steffen Klassert
2019-07-17  8:47                     ` Herbert Xu
2019-07-17  8:53                       ` Steffen Klassert
2019-07-16 13:15                 ` Peter Zijlstra
2019-07-16 13:18                   ` Herbert Xu
2019-07-16 13:50                     ` Peter Zijlstra
2019-07-16 13:52                       ` Herbert Xu
2019-07-16 12:53       ` [PATCH] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs Andrea Parri
2019-07-16 13:13         ` Peter Zijlstra
2019-07-16 15:01         ` Herbert Xu
2019-07-16 15:44           ` Daniel Jordan
2019-07-16 16:32             ` [PATCH v2] " Daniel Jordan
2019-07-17 11:11               ` [PATCH] padata: Replace delayed timer with immediate workqueue in padata_reorder Herbert Xu
2019-07-17 18:32                 ` Daniel Jordan
2019-07-18  3:31                   ` Herbert Xu
2019-07-18 14:27                     ` Daniel Jordan
2019-07-18 14:56                       ` Herbert Xu
2019-07-18 15:01                         ` [v2 PATCH] " Herbert Xu
2019-07-19 14:37                           ` Daniel Jordan
2019-07-19 14:55                             ` Herbert Xu
2019-07-19 19:04                               ` [PATCH] padata: purge get_cpu and reorder_via_wq from padata_do_serial Daniel Jordan
2019-07-26 12:36                                 ` Herbert Xu
2019-07-19 14:27                         ` [PATCH] padata: Replace delayed timer with immediate workqueue in padata_reorder Daniel Jordan
2019-07-17 23:21                 ` Daniel Jordan
2019-07-18  3:30                   ` Herbert Xu
2019-07-18 14:25                     ` Daniel Jordan
2019-07-18 14:49                       ` Herbert Xu
2019-07-19 14:21                         ` Daniel Jordan
2019-07-18  5:42               ` [PATCH v2] padata: use smp_mb in padata_reorder to avoid orphaned padata jobs Herbert Xu
