LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH -next 0/6] rhashtable: guarantee first allocation
@ 2018-05-24 21:11 Davidlohr Bueso
  2018-05-24 21:11 ` [PATCH 1/6] lib/rhashtable: convert param sanitations to WARN_ON Davidlohr Bueso
                   ` (6 more replies)
  0 siblings, 7 replies; 33+ messages in thread
From: Davidlohr Bueso @ 2018-05-24 21:11 UTC (permalink / raw)
  To: akpm, torvalds
  Cc: tgraf, herbert, manfred, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso

Hi,

This series is the result of the discussion with Linus around ipc
subsystem initialization and how it behaves with error return when
calling rhashtable_init()[1]. Instead of caring about the error
or calling the infamous BUG_ON, Linus suggested we guarantee the
rhashtable allocation.

First two patches modify rhashtable_init() to just return 0, future
patches I guess can update more callers.

patch 3 is a nit I found while reading the code and just makes use
kvmalloc_array().

patch 4+5 remove some ipc hacks we no longer need.

patch 6 updates the rhashtable test module. Trivial.

Please consider for v4.18.

Thanks!

[0] https://lkml.org/lkml/2018/5/23/758

Davidlohr Bueso (6):
  lib/rhashtable: convert param sanitations to WARN_ON
  lib/rhashtable: guarantee initial hashtable allocation
  lib/bucket_locks: use kvmalloc_array()
  ipc: get rid of ids->tables_initialized hack
  ipc: simplify ipc initialization
  lib/test_rhashtable: rhashtable_init() can no longer fail

 include/linux/ipc_namespace.h |  1 -
 ipc/msg.c                     |  9 ++++----
 ipc/namespace.c               | 20 ++++--------------
 ipc/sem.c                     | 10 ++++-----
 ipc/shm.c                     |  9 ++++----
 ipc/util.c                    | 41 ++++++++++++------------------------
 ipc/util.h                    | 18 ++++++++--------
 lib/bucket_locks.c            |  2 +-
 lib/rhashtable.c              | 49 ++++++++++++++++++++++++++++++++++---------
 lib/test_rhashtable.c         |  6 +-----
 10 files changed, 79 insertions(+), 86 deletions(-)

-- 
2.13.6


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 1/6] lib/rhashtable: convert param sanitations to WARN_ON
  2018-05-24 21:11 [PATCH -next 0/6] rhashtable: guarantee first allocation Davidlohr Bueso
@ 2018-05-24 21:11 ` Davidlohr Bueso
  2018-05-28  9:40   ` Herbert Xu
  2018-05-24 21:11 ` [PATCH 2/6] lib/rhashtable: guarantee initial hashtable allocation Davidlohr Bueso
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 33+ messages in thread
From: Davidlohr Bueso @ 2018-05-24 21:11 UTC (permalink / raw)
  To: akpm, torvalds
  Cc: tgraf, herbert, manfred, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso, Davidlohr Bueso

For the purpose of making rhashtable_init() unable to fail,
we can replace the returning -EINVAL with WARN_ONs whenever
the caller passes bogus parameters during initialization.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 lib/rhashtable.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 9427b5766134..05a4b1b8b8ce 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -1024,12 +1024,11 @@ int rhashtable_init(struct rhashtable *ht,
 
 	size = HASH_DEFAULT_SIZE;
 
-	if ((!params->key_len && !params->obj_hashfn) ||
-	    (params->obj_hashfn && !params->obj_cmpfn))
-		return -EINVAL;
+	WARN_ON((!params->key_len && !params->obj_hashfn) ||
+		(params->obj_hashfn && !params->obj_cmpfn));
 
-	if (params->nulls_base && params->nulls_base < (1U << RHT_BASE_SHIFT))
-		return -EINVAL;
+	WARN_ON(params->nulls_base &&
+		params->nulls_base < (1U << RHT_BASE_SHIFT));
 
 	memset(ht, 0, sizeof(*ht));
 	mutex_init(&ht->mutex);
-- 
2.13.6


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 2/6] lib/rhashtable: guarantee initial hashtable allocation
  2018-05-24 21:11 [PATCH -next 0/6] rhashtable: guarantee first allocation Davidlohr Bueso
  2018-05-24 21:11 ` [PATCH 1/6] lib/rhashtable: convert param sanitations to WARN_ON Davidlohr Bueso
@ 2018-05-24 21:11 ` Davidlohr Bueso
  2018-05-25  3:26   ` Davidlohr Bueso
                     ` (2 more replies)
  2018-05-24 21:11 ` [PATCH 3/6] lib/bucket_locks: use kvmalloc_array() Davidlohr Bueso
                   ` (4 subsequent siblings)
  6 siblings, 3 replies; 33+ messages in thread
From: Davidlohr Bueso @ 2018-05-24 21:11 UTC (permalink / raw)
  To: akpm, torvalds
  Cc: tgraf, herbert, manfred, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso, Davidlohr Bueso

rhashtable_init() may fail due to -ENOMEM, thus making the
entire api unusable. This patch removes this scenario,
however unlikely. In order to guarantee memory allocation,
this patch refactors bucket_table_alloc() to add a 'retry'
parameter which always ends up doing GFP_KERNEL|__GFP_NOFAIL
for both the tbl as well as alloc_bucket_spinlocks().

So upon the first table allocation failure, we shrink the
size to the smallest value that makes sense and retry the alloc
with the same semantics. If we fail again, then we force the
call with __GFP_NOFAIL. With the defaults, this means that from
64 buckets, we retry with only 4. Any later issues regarding
performance due to collisions or larger table resizing (when
more memory becomes available) is the last of our problems.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 lib/rhashtable.c | 40 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 35 insertions(+), 5 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 05a4b1b8b8ce..28f28602e2f5 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -166,15 +166,20 @@ static struct bucket_table *nested_bucket_table_alloc(struct rhashtable *ht,
 	return tbl;
 }
 
-static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
-					       size_t nbuckets,
-					       gfp_t gfp)
+static struct bucket_table *__bucket_table_alloc(struct rhashtable *ht,
+						 size_t nbuckets,
+						 gfp_t gfp, bool retry)
 {
 	struct bucket_table *tbl = NULL;
 	size_t size, max_locks;
 	int i;
 
 	size = sizeof(*tbl) + nbuckets * sizeof(tbl->buckets[0]);
+	if (retry) {
+		gfp |= __GFP_NOFAIL;
+		tbl = kzalloc(size, gfp);
+	} /* fall-through */
+
 	if (gfp != GFP_KERNEL)
 		tbl = kzalloc(size, gfp | __GFP_NOWARN | __GFP_NORETRY);
 	else
@@ -211,6 +216,20 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
 	return tbl;
 }
 
+static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
+					       size_t nbuckets,
+					       gfp_t gfp)
+{
+	return __bucket_table_alloc(ht, nbuckets, gfp, false);
+}
+
+static struct bucket_table *bucket_table_alloc_retry(struct rhashtable *ht,
+						     size_t nbuckets,
+						     gfp_t gfp)
+{
+	return __bucket_table_alloc(ht, nbuckets, gfp, true);
+}
+
 static struct bucket_table *rhashtable_last_table(struct rhashtable *ht,
 						  struct bucket_table *tbl)
 {
@@ -1067,9 +1086,20 @@ int rhashtable_init(struct rhashtable *ht,
 		}
 	}
 
+	/*
+	 * This is api initialization and thus we need to guarantee the
+	 * initial rhashtable allocation. Upon failure, retry with a
+	 * smallest possible size, otherwise we exhaust our options with
+	 * __GFP_NOFAIL.
+	 */
 	tbl = bucket_table_alloc(ht, size, GFP_KERNEL);
-	if (tbl == NULL)
-		return -ENOMEM;
+	if (unlikely(tbl == NULL)) {
+		size = HASH_MIN_SIZE;
+
+		tbl = bucket_table_alloc(ht, size, GFP_KERNEL);
+		if (tbl == NULL)
+			tbl = bucket_table_alloc_retry(ht, size, GFP_KERNEL);
+	}
 
 	atomic_set(&ht->nelems, 0);
 
-- 
2.13.6


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 3/6] lib/bucket_locks: use kvmalloc_array()
  2018-05-24 21:11 [PATCH -next 0/6] rhashtable: guarantee first allocation Davidlohr Bueso
  2018-05-24 21:11 ` [PATCH 1/6] lib/rhashtable: convert param sanitations to WARN_ON Davidlohr Bueso
  2018-05-24 21:11 ` [PATCH 2/6] lib/rhashtable: guarantee initial hashtable allocation Davidlohr Bueso
@ 2018-05-24 21:11 ` Davidlohr Bueso
  2018-05-24 21:37   ` Linus Torvalds
  2018-05-24 21:11 ` [PATCH 4/6] ipc: get rid of ids->tables_initialized hack Davidlohr Bueso
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 33+ messages in thread
From: Davidlohr Bueso @ 2018-05-24 21:11 UTC (permalink / raw)
  To: akpm, torvalds
  Cc: tgraf, herbert, manfred, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso, Davidlohr Bueso

For some reason we don't use this call, but we rely just
fine on kmalloc_array(). Make both consistent.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 lib/bucket_locks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/bucket_locks.c b/lib/bucket_locks.c
index 266a97c5708b..75fe7386756f 100644
--- a/lib/bucket_locks.c
+++ b/lib/bucket_locks.c
@@ -31,7 +31,7 @@ int alloc_bucket_spinlocks(spinlock_t **locks, unsigned int *locks_mask,
 
 	if (sizeof(spinlock_t) != 0) {
 		if (gfpflags_allow_blocking(gfp))
-			tlocks = kvmalloc(size * sizeof(spinlock_t), gfp);
+			tlocks = kvmalloc_array(size, sizeof(spinlock_t), gfp);
 		else
 			tlocks = kmalloc_array(size, sizeof(spinlock_t), gfp);
 		if (!tlocks)
-- 
2.13.6


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 4/6] ipc: get rid of ids->tables_initialized hack
  2018-05-24 21:11 [PATCH -next 0/6] rhashtable: guarantee first allocation Davidlohr Bueso
                   ` (2 preceding siblings ...)
  2018-05-24 21:11 ` [PATCH 3/6] lib/bucket_locks: use kvmalloc_array() Davidlohr Bueso
@ 2018-05-24 21:11 ` Davidlohr Bueso
  2018-05-24 21:11 ` [PATCH 5/6] ipc: simplify ipc initialization Davidlohr Bueso
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: Davidlohr Bueso @ 2018-05-24 21:11 UTC (permalink / raw)
  To: akpm, torvalds
  Cc: tgraf, herbert, manfred, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso, Davidlohr Bueso

In sysvipc we have an ids->tables_initialized regarding the
rhashtable, introduced in:

    0cfb6aee70b (ipc: optimize semget/shmget/msgget for lots of keys).

It's there, specifically, to prevent nil pointer dereferences,
from using an uninitialized api. Considering how rhashtable_init()
can fail (probably due to ENOMEM, if anything), this made the
overall ipc initialization capable of failure as well. That alone
is ugly, but fine, however I've spotted a few issues regarding the
semantics of tables_initialized (however unlikely they may be):

- There is inconsistency in what we return to userspace: ipc_addid()
returns ENOSPC which is certainly _wrong_, while ipc_obtain_object_idr()
returns EINVAL.

- After we started using rhashtables, ipc_findkey() can return nil upon
!tables_initialized, but the caller expects nil for when the ipc structure
isn't found, and can therefore call into ipcget() callbacks.

Now that rhashtable initialization cannot fail, we can properly
get rid of the hack altogether.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 include/linux/ipc_namespace.h |  1 -
 ipc/util.c                    | 23 ++++++++---------------
 2 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index b5630c8eb2f3..37f3a4b7c637 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -16,7 +16,6 @@ struct user_namespace;
 struct ipc_ids {
 	int in_use;
 	unsigned short seq;
-	bool tables_initialized;
 	struct rw_semaphore rwsem;
 	struct idr ipcs_idr;
 	int max_id;
diff --git a/ipc/util.c b/ipc/util.c
index 4e81182fa0ac..823e09e72c58 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -125,7 +125,6 @@ int ipc_init_ids(struct ipc_ids *ids)
 	if (err)
 		return err;
 	idr_init(&ids->ipcs_idr);
-	ids->tables_initialized = true;
 	ids->max_id = -1;
 #ifdef CONFIG_CHECKPOINT_RESTORE
 	ids->next_id = -1;
@@ -178,19 +177,16 @@ void __init ipc_init_proc_interface(const char *path, const char *header,
  */
 static struct kern_ipc_perm *ipc_findkey(struct ipc_ids *ids, key_t key)
 {
-	struct kern_ipc_perm *ipcp = NULL;
+	struct kern_ipc_perm *ipcp;
 
-	if (likely(ids->tables_initialized))
-		ipcp = rhashtable_lookup_fast(&ids->key_ht, &key,
+	ipcp = rhashtable_lookup_fast(&ids->key_ht, &key,
 					      ipc_kht_params);
+	if (!ipcp)
+		return NULL;
 
-	if (ipcp) {
-		rcu_read_lock();
-		ipc_lock_object(ipcp);
-		return ipcp;
-	}
-
-	return NULL;
+	rcu_read_lock();
+	ipc_lock_object(ipcp);
+	return ipcp;
 }
 
 #ifdef CONFIG_CHECKPOINT_RESTORE
@@ -255,7 +251,7 @@ int ipc_addid(struct ipc_ids *ids, struct kern_ipc_perm *new, int limit)
 	if (limit > IPCMNI)
 		limit = IPCMNI;
 
-	if (!ids->tables_initialized || ids->in_use >= limit)
+	if (ids->in_use >= limit)
 		return -ENOSPC;
 
 	idr_preload(GFP_KERNEL);
@@ -566,9 +562,6 @@ struct kern_ipc_perm *ipc_obtain_object_idr(struct ipc_ids *ids, int id)
 	struct kern_ipc_perm *out;
 	int lid = ipcid_to_idx(id);
 
-	if (unlikely(!ids->tables_initialized))
-		return ERR_PTR(-EINVAL);
-
 	out = idr_find(&ids->ipcs_idr, lid);
 	if (!out)
 		return ERR_PTR(-EINVAL);
-- 
2.13.6


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5/6] ipc: simplify ipc initialization
  2018-05-24 21:11 [PATCH -next 0/6] rhashtable: guarantee first allocation Davidlohr Bueso
                   ` (3 preceding siblings ...)
  2018-05-24 21:11 ` [PATCH 4/6] ipc: get rid of ids->tables_initialized hack Davidlohr Bueso
@ 2018-05-24 21:11 ` Davidlohr Bueso
  2018-05-24 21:11 ` [PATCH 6/6] lib/test_rhashtable: rhashtable_init() can no longer fail Davidlohr Bueso
  2018-05-24 21:41 ` [PATCH -next 0/6] rhashtable: guarantee first allocation Linus Torvalds
  6 siblings, 0 replies; 33+ messages in thread
From: Davidlohr Bueso @ 2018-05-24 21:11 UTC (permalink / raw)
  To: akpm, torvalds
  Cc: tgraf, herbert, manfred, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso, Davidlohr Bueso

Now that we know that rhashtable_init() will not fail, we
can get rid of a lot of the unnecessary cleanup paths when
the call errored out.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 ipc/msg.c       |  9 ++++-----
 ipc/namespace.c | 20 ++++----------------
 ipc/sem.c       | 10 ++++------
 ipc/shm.c       |  9 ++++-----
 ipc/util.c      | 18 +++++-------------
 ipc/util.h      | 18 +++++++++---------
 6 files changed, 30 insertions(+), 54 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 3b6545302598..62545ce19173 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -1228,7 +1228,7 @@ COMPAT_SYSCALL_DEFINE5(msgrcv, int, msqid, compat_uptr_t, msgp,
 }
 #endif
 
-int msg_init_ns(struct ipc_namespace *ns)
+void msg_init_ns(struct ipc_namespace *ns)
 {
 	ns->msg_ctlmax = MSGMAX;
 	ns->msg_ctlmnb = MSGMNB;
@@ -1236,7 +1236,7 @@ int msg_init_ns(struct ipc_namespace *ns)
 
 	atomic_set(&ns->msg_bytes, 0);
 	atomic_set(&ns->msg_hdrs, 0);
-	return ipc_init_ids(&ns->ids[IPC_MSG_IDS]);
+	ipc_init_ids(&ns->ids[IPC_MSG_IDS]);
 }
 
 #ifdef CONFIG_IPC_NS
@@ -1277,12 +1277,11 @@ static int sysvipc_msg_proc_show(struct seq_file *s, void *it)
 }
 #endif
 
-int __init msg_init(void)
+void __init msg_init(void)
 {
-	const int err = msg_init_ns(&init_ipc_ns);
+	msg_init_ns(&init_ipc_ns);
 
 	ipc_init_proc_interface("sysvipc/msg",
 				"       key      msqid perms      cbytes       qnum lspid lrpid   uid   gid  cuid  cgid      stime      rtime      ctime\n",
 				IPC_MSG_IDS, sysvipc_msg_proc_show);
-	return err;
 }
diff --git a/ipc/namespace.c b/ipc/namespace.c
index f59a89966f92..21607791d62c 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -55,28 +55,16 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
 	ns->user_ns = get_user_ns(user_ns);
 	ns->ucounts = ucounts;
 
-	err = sem_init_ns(ns);
+	err = mq_init_ns(ns);
 	if (err)
 		goto fail_put;
-	err = msg_init_ns(ns);
-	if (err)
-		goto fail_destroy_sem;
-	err = shm_init_ns(ns);
-	if (err)
-		goto fail_destroy_msg;
 
-	err = mq_init_ns(ns);
-	if (err)
-		goto fail_destroy_shm;
+	sem_init_ns(ns);
+	msg_init_ns(ns);
+	shm_init_ns(ns);
 
 	return ns;
 
-fail_destroy_shm:
-	shm_exit_ns(ns);
-fail_destroy_msg:
-	msg_exit_ns(ns);
-fail_destroy_sem:
-	sem_exit_ns(ns);
 fail_put:
 	put_user_ns(ns->user_ns);
 	ns_free_inum(&ns->ns);
diff --git a/ipc/sem.c b/ipc/sem.c
index 20f649eed8c6..78ea913ee0d8 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -220,14 +220,14 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
 #define sc_semopm	sem_ctls[2]
 #define sc_semmni	sem_ctls[3]
 
-int sem_init_ns(struct ipc_namespace *ns)
+void sem_init_ns(struct ipc_namespace *ns)
 {
 	ns->sc_semmsl = SEMMSL;
 	ns->sc_semmns = SEMMNS;
 	ns->sc_semopm = SEMOPM;
 	ns->sc_semmni = SEMMNI;
 	ns->used_sems = 0;
-	return ipc_init_ids(&ns->ids[IPC_SEM_IDS]);
+	ipc_init_ids(&ns->ids[IPC_SEM_IDS]);
 }
 
 #ifdef CONFIG_IPC_NS
@@ -239,14 +239,12 @@ void sem_exit_ns(struct ipc_namespace *ns)
 }
 #endif
 
-int __init sem_init(void)
+void __init sem_init(void)
 {
-	const int err = sem_init_ns(&init_ipc_ns);
-
+	sem_init_ns(&init_ipc_ns);
 	ipc_init_proc_interface("sysvipc/sem",
 				"       key      semid perms      nsems   uid   gid  cuid  cgid      otime      ctime\n",
 				IPC_SEM_IDS, sysvipc_sem_proc_show);
-	return err;
 }
 
 /**
diff --git a/ipc/shm.c b/ipc/shm.c
index 29978ee76c2e..97468bd06b6e 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -95,14 +95,14 @@ static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp);
 static int sysvipc_shm_proc_show(struct seq_file *s, void *it);
 #endif
 
-int shm_init_ns(struct ipc_namespace *ns)
+void shm_init_ns(struct ipc_namespace *ns)
 {
 	ns->shm_ctlmax = SHMMAX;
 	ns->shm_ctlall = SHMALL;
 	ns->shm_ctlmni = SHMMNI;
 	ns->shm_rmid_forced = 0;
 	ns->shm_tot = 0;
-	return ipc_init_ids(&shm_ids(ns));
+	ipc_init_ids(&shm_ids(ns));
 }
 
 /*
@@ -135,9 +135,8 @@ void shm_exit_ns(struct ipc_namespace *ns)
 
 static int __init ipc_ns_init(void)
 {
-	const int err = shm_init_ns(&init_ipc_ns);
-	WARN(err, "ipc: sysv shm_init_ns failed: %d\n", err);
-	return err;
+	shm_init_ns(&init_ipc_ns);
+	return 0;
 }
 
 pure_initcall(ipc_ns_init);
diff --git a/ipc/util.c b/ipc/util.c
index 823e09e72c58..06d7c575847c 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -87,16 +87,12 @@ struct ipc_proc_iface {
  */
 static int __init ipc_init(void)
 {
-	int err_sem, err_msg;
-
 	proc_mkdir("sysvipc", NULL);
-	err_sem = sem_init();
-	WARN(err_sem, "ipc: sysv sem_init failed: %d\n", err_sem);
-	err_msg = msg_init();
-	WARN(err_msg, "ipc: sysv msg_init failed: %d\n", err_msg);
+	sem_init();
+	msg_init();
 	shm_init();
 
-	return err_msg ? err_msg : err_sem;
+	return 0;
 }
 device_initcall(ipc_init);
 
@@ -115,21 +111,17 @@ static const struct rhashtable_params ipc_kht_params = {
  * Set up the sequence range to use for the ipc identifier range (limited
  * below IPCMNI) then initialise the keys hashtable and ids idr.
  */
-int ipc_init_ids(struct ipc_ids *ids)
+void ipc_init_ids(struct ipc_ids *ids)
 {
-	int err;
 	ids->in_use = 0;
 	ids->seq = 0;
 	init_rwsem(&ids->rwsem);
-	err = rhashtable_init(&ids->key_ht, &ipc_kht_params);
-	if (err)
-		return err;
+	rhashtable_init(&ids->key_ht, &ipc_kht_params);
 	idr_init(&ids->ipcs_idr);
 	ids->max_id = -1;
 #ifdef CONFIG_CHECKPOINT_RESTORE
 	ids->next_id = -1;
 #endif
-	return 0;
 }
 
 #ifdef CONFIG_PROC_FS
diff --git a/ipc/util.h b/ipc/util.h
index 0aba3230d007..65fad8a94da8 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -18,8 +18,8 @@
 #define IPCMNI 32768  /* <= MAX_INT limit for ipc arrays (including sysctl changes) */
 #define SEQ_MULTIPLIER	(IPCMNI)
 
-int sem_init(void);
-int msg_init(void);
+void sem_init(void);
+void msg_init(void);
 void shm_init(void);
 
 struct ipc_namespace;
@@ -34,17 +34,17 @@ static inline void mq_put_mnt(struct ipc_namespace *ns) { }
 #endif
 
 #ifdef CONFIG_SYSVIPC
-int sem_init_ns(struct ipc_namespace *ns);
-int msg_init_ns(struct ipc_namespace *ns);
-int shm_init_ns(struct ipc_namespace *ns);
+void sem_init_ns(struct ipc_namespace *ns);
+void msg_init_ns(struct ipc_namespace *ns);
+void shm_init_ns(struct ipc_namespace *ns);
 
 void sem_exit_ns(struct ipc_namespace *ns);
 void msg_exit_ns(struct ipc_namespace *ns);
 void shm_exit_ns(struct ipc_namespace *ns);
 #else
-static inline int sem_init_ns(struct ipc_namespace *ns) { return 0; }
-static inline int msg_init_ns(struct ipc_namespace *ns) { return 0; }
-static inline int shm_init_ns(struct ipc_namespace *ns) { return 0; }
+static inline void sem_init_ns(struct ipc_namespace *ns) { }
+static inline void msg_init_ns(struct ipc_namespace *ns) { }
+static inline void shm_init_ns(struct ipc_namespace *ns) { }
 
 static inline void sem_exit_ns(struct ipc_namespace *ns) { }
 static inline void msg_exit_ns(struct ipc_namespace *ns) { }
@@ -83,7 +83,7 @@ struct ipc_ops {
 struct seq_file;
 struct ipc_ids;
 
-int ipc_init_ids(struct ipc_ids *);
+void ipc_init_ids(struct ipc_ids *);
 #ifdef CONFIG_PROC_FS
 void __init ipc_init_proc_interface(const char *path, const char *header,
 		int ids, int (*show)(struct seq_file *, void *));
-- 
2.13.6


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 6/6] lib/test_rhashtable: rhashtable_init() can no longer fail
  2018-05-24 21:11 [PATCH -next 0/6] rhashtable: guarantee first allocation Davidlohr Bueso
                   ` (4 preceding siblings ...)
  2018-05-24 21:11 ` [PATCH 5/6] ipc: simplify ipc initialization Davidlohr Bueso
@ 2018-05-24 21:11 ` Davidlohr Bueso
  2018-05-24 21:41 ` [PATCH -next 0/6] rhashtable: guarantee first allocation Linus Torvalds
  6 siblings, 0 replies; 33+ messages in thread
From: Davidlohr Bueso @ 2018-05-24 21:11 UTC (permalink / raw)
  To: akpm, torvalds
  Cc: tgraf, herbert, manfred, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso, Davidlohr Bueso

Update the test module as such.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 lib/test_rhashtable.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c
index f4000c137dbe..a894eb0407f0 100644
--- a/lib/test_rhashtable.c
+++ b/lib/test_rhashtable.c
@@ -182,11 +182,7 @@ static void test_bucket_stats(struct rhashtable *ht, unsigned int entries)
 	struct rhashtable_iter hti;
 	struct rhash_head *pos;
 
-	err = rhashtable_walk_init(ht, &hti, GFP_KERNEL);
-	if (err) {
-		pr_warn("Test failed: allocation error");
-		return;
-	}
+	rhashtable_walk_init(ht, &hti, GFP_KERNEL);
 
 	rhashtable_walk_start(&hti);
 
-- 
2.13.6


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/6] lib/bucket_locks: use kvmalloc_array()
  2018-05-24 21:11 ` [PATCH 3/6] lib/bucket_locks: use kvmalloc_array() Davidlohr Bueso
@ 2018-05-24 21:37   ` Linus Torvalds
  2018-05-29 14:43     ` Michal Hocko
  0 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2018-05-24 21:37 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Andrew Morton, Thomas Graf, Herbert Xu, Manfred Spraul,
	guillaume.knispel, Linux API, Linux Kernel Mailing List,
	Davidlohr Bueso

On Thu, May 24, 2018 at 2:28 PM Davidlohr Bueso <dave@stgolabs.net> wrote:

>                  if (gfpflags_allow_blocking(gfp))
> -                       tlocks = kvmalloc(size * sizeof(spinlock_t), gfp);
> +                       tlocks = kvmalloc_array(size, sizeof(spinlock_t),
gfp);
>                  else
>                          tlocks = kmalloc_array(size, sizeof(spinlock_t),
gfp);

Side note: how about we just move that "gfpflags_allow_blocking()" into
kvmalloc() instead, and make kvmalloc() generally usable?

Now we have that really odd situation where kvmalloc() takes gfp flags, but
to quote the comment:

  * Any use of gfp flags outside of GFP_KERNEL should be consulted with mm
people.

and the code:

         /*
          * vmalloc uses GFP_KERNEL for some internal allocations (e.g page
tables)
          * so the given set of flags has to be compatible.
          */
         WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);

which isn't really all that helpful. Do mm people really want to be
consulted about random uses?

Maybe we could just make the rule for kvmalloc() be to only fall back on
vmalloc for allocations that are

  - larger than page size

  - blocking and allow GFP_KERNEL (so basically that WARN_ON_ONCE() logic in
kvmalloc_node).

Hmm? Isn't that what everybody really *wants* kvmalloc() and friends to do?

              Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH -next 0/6] rhashtable: guarantee first allocation
  2018-05-24 21:11 [PATCH -next 0/6] rhashtable: guarantee first allocation Davidlohr Bueso
                   ` (5 preceding siblings ...)
  2018-05-24 21:11 ` [PATCH 6/6] lib/test_rhashtable: rhashtable_init() can no longer fail Davidlohr Bueso
@ 2018-05-24 21:41 ` Linus Torvalds
  2018-05-25  3:34   ` Davidlohr Bueso
  6 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2018-05-24 21:41 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Andrew Morton, Thomas Graf, Herbert Xu, Manfred Spraul,
	guillaume.knispel, Linux API, Linux Kernel Mailing List

On Thu, May 24, 2018 at 2:28 PM Davidlohr Bueso <dave@stgolabs.net> wrote:
>   10 files changed, 79 insertions(+), 86 deletions(-)

I certainly can't complain about this small code removal, but I think if we
did the kvmalloc_node() cleanup, we'd be able to get rid of even more.

For example, bucket_table_alloc() does that

         if (gfp != GFP_KERNEL)
                 tbl = kzalloc(size, gfp | __GFP_NOWARN | __GFP_NORETRY);
         else
                 tbl = kvzalloc(size, gfp);

purely due to the kvalloc_node() oddity. Wouldn't it be nice to just write
it as

         tbl = kvzalloc(size, gfp);

knowing that the whole point of all the kv*alloc*() functions is to "just
do the right thing given size, gpf mask, and ease of allocation".

             Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/6] lib/rhashtable: guarantee initial hashtable allocation
  2018-05-24 21:11 ` [PATCH 2/6] lib/rhashtable: guarantee initial hashtable allocation Davidlohr Bueso
@ 2018-05-25  3:26   ` Davidlohr Bueso
  2018-05-28  9:49   ` Herbert Xu
  2018-05-28 10:02   ` Herbert Xu
  2 siblings, 0 replies; 33+ messages in thread
From: Davidlohr Bueso @ 2018-05-25  3:26 UTC (permalink / raw)
  To: akpm, torvalds
  Cc: tgraf, herbert, manfred, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso

On Thu, 24 May 2018, Davidlohr Bueso wrote:
> 	size = sizeof(*tbl) + nbuckets * sizeof(tbl->buckets[0]);
>+	if (retry) {
>+		gfp |= __GFP_NOFAIL;
>+		tbl = kzalloc(size, gfp);
>+	} /* fall-through */
>+
> 	if (gfp != GFP_KERNEL)
> 		tbl = kzalloc(size, gfp | __GFP_NOWARN | __GFP_NORETRY);

This is wrong I'll fix in a v2, it should be _else_ if (gfp != GFP_KERNEL) ...

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH -next 0/6] rhashtable: guarantee first allocation
  2018-05-24 21:41 ` [PATCH -next 0/6] rhashtable: guarantee first allocation Linus Torvalds
@ 2018-05-25  3:34   ` Davidlohr Bueso
  0 siblings, 0 replies; 33+ messages in thread
From: Davidlohr Bueso @ 2018-05-25  3:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Thomas Graf, Herbert Xu, Manfred Spraul,
	guillaume.knispel, Linux API, Linux Kernel Mailing List

On Thu, 24 May 2018, Linus Torvalds wrote:

>On Thu, May 24, 2018 at 2:28 PM Davidlohr Bueso <dave@stgolabs.net> wrote:
>>   10 files changed, 79 insertions(+), 86 deletions(-)
>
>I certainly can't complain about this small code removal, but I think if we
>did the kvmalloc_node() cleanup, we'd be able to get rid of even more.
>
>For example, bucket_table_alloc() does that
>
>         if (gfp != GFP_KERNEL)
>                 tbl = kzalloc(size, gfp | __GFP_NOWARN | __GFP_NORETRY);
>         else
>                 tbl = kvzalloc(size, gfp);
>
>purely due to the kvalloc_node() oddity. Wouldn't it be nice to just write
>it as
>
>         tbl = kvzalloc(size, gfp);
>
>knowing that the whole point of all the kv*alloc*() functions is to "just
>do the right thing given size, gpf mask, and ease of allocation".

Yes this makes a lot of sense. I'll see about adding it on top.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/6] lib/rhashtable: convert param sanitations to WARN_ON
  2018-05-24 21:11 ` [PATCH 1/6] lib/rhashtable: convert param sanitations to WARN_ON Davidlohr Bueso
@ 2018-05-28  9:40   ` Herbert Xu
  2018-05-28 13:12     ` Davidlohr Bueso
  0 siblings, 1 reply; 33+ messages in thread
From: Herbert Xu @ 2018-05-28  9:40 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: akpm, torvalds, tgraf, manfred, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso

On Thu, May 24, 2018 at 02:11:30PM -0700, Davidlohr Bueso wrote:
> For the purpose of making rhashtable_init() unable to fail,
> we can replace the returning -EINVAL with WARN_ONs whenever
> the caller passes bogus parameters during initialization.
> 
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>

I don't really see why we need these WARN_ONs.  The problem should
be quite obvious when you're writing your code.

If you really want them perhaps add them to the ipc code instead?

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/6] lib/rhashtable: guarantee initial hashtable allocation
  2018-05-24 21:11 ` [PATCH 2/6] lib/rhashtable: guarantee initial hashtable allocation Davidlohr Bueso
  2018-05-25  3:26   ` Davidlohr Bueso
@ 2018-05-28  9:49   ` Herbert Xu
  2018-05-29 17:03     ` Davidlohr Bueso
  2018-05-28 10:02   ` Herbert Xu
  2 siblings, 1 reply; 33+ messages in thread
From: Herbert Xu @ 2018-05-28  9:49 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: akpm, torvalds, tgraf, manfred, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso

On Thu, May 24, 2018 at 02:11:31PM -0700, Davidlohr Bueso wrote:
>
> -static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
> -					       size_t nbuckets,
> -					       gfp_t gfp)
> +static struct bucket_table *__bucket_table_alloc(struct rhashtable *ht,
> +						 size_t nbuckets,
> +						 gfp_t gfp, bool retry)
>  {
>  	struct bucket_table *tbl = NULL;
>  	size_t size, max_locks;
>  	int i;
>  
>  	size = sizeof(*tbl) + nbuckets * sizeof(tbl->buckets[0]);
> +	if (retry) {
> +		gfp |= __GFP_NOFAIL;
> +		tbl = kzalloc(size, gfp);
> +	} /* fall-through */

I'd prefer this logic to be moved to the caller.  So just call the
function with GFP_KERNEL | __GFP_NOFAIL.

Of course you need to modify bucket_table_alloc so that it still
treats this as GFP_KERNEL (as opposed to GFP_ATOMIC).  That is,
instead of 

>  	if (gfp != GFP_KERNEL)

You will need
	if ((gfp & ~__GFP_NOFAIL) != GFP_KERNEL)

> @@ -1067,9 +1086,20 @@ int rhashtable_init(struct rhashtable *ht,
>  		}
>  	}
>  
> +	/*
> +	 * This is api initialization and thus we need to guarantee the
> +	 * initial rhashtable allocation. Upon failure, retry with a
> +	 * smallest possible size, otherwise we exhaust our options with
> +	 * __GFP_NOFAIL.
> +	 */
>  	tbl = bucket_table_alloc(ht, size, GFP_KERNEL);
> -	if (tbl == NULL)
> -		return -ENOMEM;
> +	if (unlikely(tbl == NULL)) {
> +		size = HASH_MIN_SIZE;
> +
> +		tbl = bucket_table_alloc(ht, size, GFP_KERNEL);
> +		if (tbl == NULL)
> +			tbl = bucket_table_alloc_retry(ht, size, GFP_KERNEL);
> +	}

Perhaps you should also explain here why we don't just try the
minimum size with __GFP_NOFAIL as the second step rather than the
third.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/6] lib/rhashtable: guarantee initial hashtable allocation
  2018-05-24 21:11 ` [PATCH 2/6] lib/rhashtable: guarantee initial hashtable allocation Davidlohr Bueso
  2018-05-25  3:26   ` Davidlohr Bueso
  2018-05-28  9:49   ` Herbert Xu
@ 2018-05-28 10:02   ` Herbert Xu
  2018-05-29 16:42     ` Davidlohr Bueso
  2 siblings, 1 reply; 33+ messages in thread
From: Herbert Xu @ 2018-05-28 10:02 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: akpm, torvalds, tgraf, manfred, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso

On Thu, May 24, 2018 at 02:11:31PM -0700, Davidlohr Bueso wrote:
>
> +	/*
> +	 * This is api initialization and thus we need to guarantee the
> +	 * initial rhashtable allocation. Upon failure, retry with a
> +	 * smallest possible size, otherwise we exhaust our options with
> +	 * __GFP_NOFAIL.
> +	 */
>  	tbl = bucket_table_alloc(ht, size, GFP_KERNEL);
> -	if (tbl == NULL)
> -		return -ENOMEM;
> +	if (unlikely(tbl == NULL)) {
> +		size = HASH_MIN_SIZE;

You should also take min_size into account.  Yes I know the current
code ignores it unless you also set nelem_hint.  But that's just a
bug.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/6] lib/rhashtable: convert param sanitations to WARN_ON
  2018-05-28  9:40   ` Herbert Xu
@ 2018-05-28 13:12     ` Davidlohr Bueso
  2018-05-28 15:54       ` Herbert Xu
  0 siblings, 1 reply; 33+ messages in thread
From: Davidlohr Bueso @ 2018-05-28 13:12 UTC (permalink / raw)
  To: Herbert Xu
  Cc: akpm, torvalds, tgraf, manfred, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso

On Mon, 28 May 2018, Herbert Xu wrote:
>I don't really see why we need these WARN_ONs.  The problem should
>be quite obvious when you're writing your code.
>
>If you really want them perhaps add them to the ipc code instead?

Well, I don't really _want_ them; nor does the ipc code which already
does a WARN_ON() (but that goes away in future patches). What I want
is to get rid of the return path. So I don't really care if we convert
them to WARN or remove them altogether. Would you be happy with the
later?

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/6] lib/rhashtable: convert param sanitations to WARN_ON
  2018-05-28 15:54       ` Herbert Xu
@ 2018-05-28 15:51         ` Davidlohr Bueso
  0 siblings, 0 replies; 33+ messages in thread
From: Davidlohr Bueso @ 2018-05-28 15:51 UTC (permalink / raw)
  To: Herbert Xu
  Cc: akpm, torvalds, tgraf, manfred, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso

On Mon, 28 May 2018, Herbert Xu wrote:

>On Mon, May 28, 2018 at 06:12:09AM -0700, Davidlohr Bueso wrote:
>>
>> Well, I don't really _want_ them; nor does the ipc code which already
>> does a WARN_ON() (but that goes away in future patches). What I want
>> is to get rid of the return path. So I don't really care if we convert
>> them to WARN or remove them altogether. Would you be happy with the
>> later?
>
>It has nothing to do with the error return path.  Assuming you
>remove the allocation failure path, then this error path can never
>trigger for *your* use-case. 

Why would this be triggered by any use case if the developer setup the
params correctly...? I don't see the point of not getting rid of the
EINVAL or wrapping around warning around it. Ultimately it would be
good to just have rhashtable_init() return void.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/6] lib/rhashtable: convert param sanitations to WARN_ON
  2018-05-28 13:12     ` Davidlohr Bueso
@ 2018-05-28 15:54       ` Herbert Xu
  2018-05-28 15:51         ` Davidlohr Bueso
  0 siblings, 1 reply; 33+ messages in thread
From: Herbert Xu @ 2018-05-28 15:54 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: akpm, torvalds, tgraf, manfred, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso

On Mon, May 28, 2018 at 06:12:09AM -0700, Davidlohr Bueso wrote:
>
> Well, I don't really _want_ them; nor does the ipc code which already
> does a WARN_ON() (but that goes away in future patches). What I want
> is to get rid of the return path. So I don't really care if we convert
> them to WARN or remove them altogether. Would you be happy with the
> later?

It has nothing to do with the error return path.  Assuming you
remove the allocation failure path, then this error path can never
trigger for *your* use-case.  IOW you can either just ignore the
return value for ipc or just do the WARN_ON in your code.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/6] lib/bucket_locks: use kvmalloc_array()
  2018-05-24 21:37   ` Linus Torvalds
@ 2018-05-29 14:43     ` Michal Hocko
  2018-05-29 14:51       ` Michal Hocko
  0 siblings, 1 reply; 33+ messages in thread
From: Michal Hocko @ 2018-05-29 14:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Davidlohr Bueso, Andrew Morton, Thomas Graf, Herbert Xu,
	Manfred Spraul, guillaume.knispel, Linux API,
	Linux Kernel Mailing List, Davidlohr Bueso

On Thu 24-05-18 14:37:36, Linus Torvalds wrote:
> On Thu, May 24, 2018 at 2:28 PM Davidlohr Bueso <dave@stgolabs.net> wrote:
> 
> >                  if (gfpflags_allow_blocking(gfp))
> > -                       tlocks = kvmalloc(size * sizeof(spinlock_t), gfp);
> > +                       tlocks = kvmalloc_array(size, sizeof(spinlock_t),
> gfp);
> >                  else
> >                          tlocks = kmalloc_array(size, sizeof(spinlock_t),
> gfp);
> 
> Side note: how about we just move that "gfpflags_allow_blocking()" into
> kvmalloc() instead, and make kvmalloc() generally usable?
> 
> Now we have that really odd situation where kvmalloc() takes gfp flags, but
> to quote the comment:
> 
>   * Any use of gfp flags outside of GFP_KERNEL should be consulted with mm
> people.
> 
> and the code:
> 
>          /*
>           * vmalloc uses GFP_KERNEL for some internal allocations (e.g page
> tables)
>           * so the given set of flags has to be compatible.
>           */
>          WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
> 
> which isn't really all that helpful. Do mm people really want to be
> consulted about random uses?

The purpose was to have a clean usage base after the conversion. If we
are growing a non-trivial use base which wants to use GFP_NOWAIT semantic
then sure we can make kvmalloc never fallback to vmallock. But see
below...

> Maybe we could just make the rule for kvmalloc() be to only fall back on
> vmalloc for allocations that are
> 
>   - larger than page size
> 
>   - blocking and allow GFP_KERNEL (so basically that WARN_ON_ONCE() logic in
> kvmalloc_node).
> 
> Hmm? Isn't that what everybody really *wants* kvmalloc() and friends to do?

... Well, there are users who would like to use kvmalloc for
GFP_NOFS/GFP_NOIO context. Do we want them to fail more likely for
larger order rather than have them fixed (to either drop the NOFS
because it just has been blindly copied from a different code without
too much thinking or use the scope NOFS/NOIO API)? A warn_on tends to be
rather harsh but effective way to push maintainers fix their broken
code...
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/6] lib/bucket_locks: use kvmalloc_array()
  2018-05-29 14:43     ` Michal Hocko
@ 2018-05-29 14:51       ` Michal Hocko
  2018-05-29 20:46         ` Linus Torvalds
  0 siblings, 1 reply; 33+ messages in thread
From: Michal Hocko @ 2018-05-29 14:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Davidlohr Bueso, Andrew Morton, Thomas Graf, Herbert Xu,
	Manfred Spraul, guillaume.knispel, Linux API,
	Linux Kernel Mailing List, Davidlohr Bueso

On Tue 29-05-18 16:43:17, Michal Hocko wrote:
> On Thu 24-05-18 14:37:36, Linus Torvalds wrote:
> > On Thu, May 24, 2018 at 2:28 PM Davidlohr Bueso <dave@stgolabs.net> wrote:
> > 
> > >                  if (gfpflags_allow_blocking(gfp))
> > > -                       tlocks = kvmalloc(size * sizeof(spinlock_t), gfp);
> > > +                       tlocks = kvmalloc_array(size, sizeof(spinlock_t),
> > gfp);
> > >                  else
> > >                          tlocks = kmalloc_array(size, sizeof(spinlock_t),
> > gfp);
> > 
> > Side note: how about we just move that "gfpflags_allow_blocking()" into
> > kvmalloc() instead, and make kvmalloc() generally usable?
> > 
> > Now we have that really odd situation where kvmalloc() takes gfp flags, but
> > to quote the comment:
> > 
> >   * Any use of gfp flags outside of GFP_KERNEL should be consulted with mm
> > people.
> > 
> > and the code:
> > 
> >          /*
> >           * vmalloc uses GFP_KERNEL for some internal allocations (e.g page
> > tables)
> >           * so the given set of flags has to be compatible.
> >           */
> >          WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
> > 
> > which isn't really all that helpful. Do mm people really want to be
> > consulted about random uses?
> 
> The purpose was to have a clean usage base after the conversion. If we
> are growing a non-trivial use base which wants to use GFP_NOWAIT semantic
> then sure we can make kvmalloc never fallback to vmallock. But see
> below...
> 
> > Maybe we could just make the rule for kvmalloc() be to only fall back on
> > vmalloc for allocations that are
> > 
> >   - larger than page size
> > 
> >   - blocking and allow GFP_KERNEL (so basically that WARN_ON_ONCE() logic in
> > kvmalloc_node).
> > 
> > Hmm? Isn't that what everybody really *wants* kvmalloc() and friends to do?
> 
> ... Well, there are users who would like to use kvmalloc for
> GFP_NOFS/GFP_NOIO context. Do we want them to fail more likely for
> larger order rather than have them fixed (to either drop the NOFS
> because it just has been blindly copied from a different code without
> too much thinking or use the scope NOFS/NOIO API)? A warn_on tends to be
> rather harsh but effective way to push maintainers fix their broken
> code...

In other words, what about the following?

diff --git a/mm/util.c b/mm/util.c
index 45fc3169e7b0..05706e18d201 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -391,6 +391,10 @@ EXPORT_SYMBOL(vm_mmap);
  * __GFP_RETRY_MAYFAIL is supported, and it should be used only if kmalloc is
  * preferable to the vmalloc fallback, due to visible performance drawbacks.
  *
+ * GFP_NOWAIT request never fallback to vmalloc but it is accepted for convenience
+ * to not force people open conding kmalloc fallback on !gfpflags_allow_blocking
+ * requests.
+ *
  * Any use of gfp flags outside of GFP_KERNEL should be consulted with mm people.
  */
 void *kvmalloc_node(size_t size, gfp_t flags, int node)
@@ -402,7 +406,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
 	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
 	 * so the given set of flags has to be compatible.
 	 */
-	WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
+	WARN_ON_ONCE((flags & (__GFP_FS|__GFP_IO)) != (__GFP_FS|__GFP_IO));
 
 	/*
 	 * We want to attempt a large physically contiguous block first because
@@ -427,6 +431,9 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
 	if (ret || size <= PAGE_SIZE)
 		return ret;
 
+	if (!gfpflags_allow_blocking(flags))
+		return NULL;
+
 	return __vmalloc_node_flags_caller(size, node, flags,
 			__builtin_return_address(0));
 }
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/6] lib/rhashtable: guarantee initial hashtable allocation
  2018-05-28 10:02   ` Herbert Xu
@ 2018-05-29 16:42     ` Davidlohr Bueso
  2018-05-29 18:03       ` Herbert Xu
  0 siblings, 1 reply; 33+ messages in thread
From: Davidlohr Bueso @ 2018-05-29 16:42 UTC (permalink / raw)
  To: Herbert Xu
  Cc: akpm, torvalds, tgraf, manfred, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso

On Mon, 28 May 2018, Herbert Xu wrote:

>On Thu, May 24, 2018 at 02:11:31PM -0700, Davidlohr Bueso wrote:
>>
>> +	/*
>> +	 * This is api initialization and thus we need to guarantee the
>> +	 * initial rhashtable allocation. Upon failure, retry with a
>> +	 * smallest possible size, otherwise we exhaust our options with
>> +	 * __GFP_NOFAIL.
>> +	 */
>>  	tbl = bucket_table_alloc(ht, size, GFP_KERNEL);
>> -	if (tbl == NULL)
>> -		return -ENOMEM;
>> +	if (unlikely(tbl == NULL)) {
>> +		size = HASH_MIN_SIZE;
>
>You should also take min_size into account.  Yes I know the current
>code ignores it unless you also set nelem_hint.  But that's just a
>bug.

For the sake of simplicity, Linus suggested directly using HASH_MIN_SIZE
such that we have a single fallback.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/6] lib/rhashtable: guarantee initial hashtable allocation
  2018-05-28  9:49   ` Herbert Xu
@ 2018-05-29 17:03     ` Davidlohr Bueso
  2018-05-29 18:04       ` Herbert Xu
  0 siblings, 1 reply; 33+ messages in thread
From: Davidlohr Bueso @ 2018-05-29 17:03 UTC (permalink / raw)
  To: Herbert Xu
  Cc: akpm, torvalds, tgraf, manfred, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso

On Mon, 28 May 2018, Herbert Xu wrote:
>> +	/*
>> +	 * This is api initialization and thus we need to guarantee the
>> +	 * initial rhashtable allocation. Upon failure, retry with a
>> +	 * smallest possible size, otherwise we exhaust our options with
>> +	 * __GFP_NOFAIL.
>> +	 */
>>  	tbl = bucket_table_alloc(ht, size, GFP_KERNEL);
>> -	if (tbl == NULL)
>> -		return -ENOMEM;
>> +	if (unlikely(tbl == NULL)) {
>> +		size = HASH_MIN_SIZE;
>> +
>> +		tbl = bucket_table_alloc(ht, size, GFP_KERNEL);
>> +		if (tbl == NULL)
>> +			tbl = bucket_table_alloc_retry(ht, size, GFP_KERNEL);
>> +	}
>
>Perhaps you should also explain here why we don't just try the
>minimum size with __GFP_NOFAIL as the second step rather than the
>third.

Please see the comment above, I try to explain the rationale.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/6] lib/rhashtable: guarantee initial hashtable allocation
  2018-05-29 18:03       ` Herbert Xu
@ 2018-05-29 17:55         ` Davidlohr Bueso
  2018-05-29 18:15           ` Herbert Xu
  0 siblings, 1 reply; 33+ messages in thread
From: Davidlohr Bueso @ 2018-05-29 17:55 UTC (permalink / raw)
  To: Herbert Xu
  Cc: akpm, torvalds, tgraf, manfred, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso

On Wed, 30 May 2018, Herbert Xu wrote:

>On Tue, May 29, 2018 at 09:42:31AM -0700, Davidlohr Bueso wrote:
>>
>> For the sake of simplicity, Linus suggested directly using HASH_MIN_SIZE
>> such that we have a single fallback.
>
>Where did he suggest that?

https://lkml.org/lkml/2018/5/24/1265

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/6] lib/rhashtable: guarantee initial hashtable allocation
  2018-05-29 18:04       ` Herbert Xu
@ 2018-05-29 17:59         ` Davidlohr Bueso
  2018-05-29 18:27           ` Herbert Xu
  0 siblings, 1 reply; 33+ messages in thread
From: Davidlohr Bueso @ 2018-05-29 17:59 UTC (permalink / raw)
  To: Herbert Xu
  Cc: akpm, torvalds, tgraf, manfred, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso

On Wed, 30 May 2018, Herbert Xu wrote:

>It doesn't explain it at all.  In fact I don't see why we neeed
>three attempts, just do the GFP_NOFAIL as the second and final step.

Second attempt is reduced size only as we don't want to GFP_NOFAIL
if we can avoid it helping the allocator. We go from an arbitrary
allocation to the smallest possible allocation, if all that fails
ok lets use GFP_NOFAIL. I don't know how this is not clear...

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/6] lib/rhashtable: guarantee initial hashtable allocation
  2018-05-29 16:42     ` Davidlohr Bueso
@ 2018-05-29 18:03       ` Herbert Xu
  2018-05-29 17:55         ` Davidlohr Bueso
  0 siblings, 1 reply; 33+ messages in thread
From: Herbert Xu @ 2018-05-29 18:03 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: akpm, torvalds, tgraf, manfred, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso

On Tue, May 29, 2018 at 09:42:31AM -0700, Davidlohr Bueso wrote:
>
> For the sake of simplicity, Linus suggested directly using HASH_MIN_SIZE
> such that we have a single fallback.

Where did he suggest that?

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/6] lib/rhashtable: guarantee initial hashtable allocation
  2018-05-29 17:03     ` Davidlohr Bueso
@ 2018-05-29 18:04       ` Herbert Xu
  2018-05-29 17:59         ` Davidlohr Bueso
  0 siblings, 1 reply; 33+ messages in thread
From: Herbert Xu @ 2018-05-29 18:04 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: akpm, torvalds, tgraf, manfred, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso

On Tue, May 29, 2018 at 10:03:38AM -0700, Davidlohr Bueso wrote:
> On Mon, 28 May 2018, Herbert Xu wrote:
> > > +	/*
> > > +	 * This is api initialization and thus we need to guarantee the
> > > +	 * initial rhashtable allocation. Upon failure, retry with a
> > > +	 * smallest possible size, otherwise we exhaust our options with
> > > +	 * __GFP_NOFAIL.
> > > +	 */
> > >  	tbl = bucket_table_alloc(ht, size, GFP_KERNEL);
> > > -	if (tbl == NULL)
> > > -		return -ENOMEM;
> > > +	if (unlikely(tbl == NULL)) {
> > > +		size = HASH_MIN_SIZE;
> > > +
> > > +		tbl = bucket_table_alloc(ht, size, GFP_KERNEL);
> > > +		if (tbl == NULL)
> > > +			tbl = bucket_table_alloc_retry(ht, size, GFP_KERNEL);
> > > +	}
> > 
> > Perhaps you should also explain here why we don't just try the
> > minimum size with __GFP_NOFAIL as the second step rather than the
> > third.
> 
> Please see the comment above, I try to explain the rationale.

It doesn't explain it at all.  In fact I don't see why we neeed
three attempts, just do the GFP_NOFAIL as the second and final step.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/6] lib/rhashtable: guarantee initial hashtable allocation
  2018-05-29 18:15           ` Herbert Xu
@ 2018-05-29 18:05             ` Davidlohr Bueso
  0 siblings, 0 replies; 33+ messages in thread
From: Davidlohr Bueso @ 2018-05-29 18:05 UTC (permalink / raw)
  To: Herbert Xu
  Cc: akpm, torvalds, tgraf, manfred, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso

On Wed, 30 May 2018, Herbert Xu wrote:

>Well I think we should respect min_size.  rhashtable users may
>fail at insertion time if the table is too small.

I'm fine either way.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/6] lib/rhashtable: guarantee initial hashtable allocation
  2018-05-29 17:55         ` Davidlohr Bueso
@ 2018-05-29 18:15           ` Herbert Xu
  2018-05-29 18:05             ` Davidlohr Bueso
  0 siblings, 1 reply; 33+ messages in thread
From: Herbert Xu @ 2018-05-29 18:15 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: akpm, torvalds, tgraf, manfred, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso

On Tue, May 29, 2018 at 10:55:13AM -0700, Davidlohr Bueso wrote:
> On Wed, 30 May 2018, Herbert Xu wrote:
> 
> > On Tue, May 29, 2018 at 09:42:31AM -0700, Davidlohr Bueso wrote:
> > > 
> > > For the sake of simplicity, Linus suggested directly using HASH_MIN_SIZE
> > > such that we have a single fallback.
> > 
> > Where did he suggest that?
> 
> https://lkml.org/lkml/2018/5/24/1265

Well I think we should respect min_size.  rhashtable users may
fail at insertion time if the table is too small.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/6] lib/rhashtable: guarantee initial hashtable allocation
  2018-05-29 17:59         ` Davidlohr Bueso
@ 2018-05-29 18:27           ` Herbert Xu
  2018-05-30 14:29             ` Davidlohr Bueso
  0 siblings, 1 reply; 33+ messages in thread
From: Herbert Xu @ 2018-05-29 18:27 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: akpm, torvalds, tgraf, manfred, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso

On Tue, May 29, 2018 at 10:59:27AM -0700, Davidlohr Bueso wrote:
> On Wed, 30 May 2018, Herbert Xu wrote:
> 
> > It doesn't explain it at all.  In fact I don't see why we neeed
> > three attempts, just do the GFP_NOFAIL as the second and final step.
> 
> Second attempt is reduced size only as we don't want to GFP_NOFAIL
> if we can avoid it helping the allocator. We go from an arbitrary
> allocation to the smallest possible allocation, if all that fails
> ok lets use GFP_NOFAIL. I don't know how this is not clear...

That's exactly what you need to explain in the patch or the commit
message.  In fact you still haven't explained it fully.  Why do we
need a second attempt without the GFP_NOFAIL? How does it help the
allocator?

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/6] lib/bucket_locks: use kvmalloc_array()
  2018-05-29 14:51       ` Michal Hocko
@ 2018-05-29 20:46         ` Linus Torvalds
  2018-05-30  7:42           ` Michal Hocko
  0 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2018-05-29 20:46 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Davidlohr Bueso, Andrew Morton, Thomas Graf, Herbert Xu,
	Manfred Spraul, guillaume.knispel, Linux API,
	Linux Kernel Mailing List, Davidlohr Bueso

On Tue, May 29, 2018 at 9:51 AM Michal Hocko <mhocko@kernel.org> wrote:

> In other words, what about the following?

> +       WARN_ON_ONCE((flags & (__GFP_FS|__GFP_IO)) !=
(__GFP_FS|__GFP_IO));

I  still don't understand the point of this warning.

It's only stupid. It basically says "this function is garbage, so let me
warn about the fact that I'm a moron".

If we all needed to warn about ourselves being morons, there would be a
hell of a lot of big blinking signs everywhere.

And the *ONLY* thing that warning has ever caused is  just stupid code that
does

     if (flags == GFP_KERNEL)
         .. do kvmalloc ..
     else
         .. do kmalloc() ..

which is a *STUPID* pattern.

In other words, the WARN_ON() is bogus garbage. It's bogus exactly because
NOBODY CARES and all everybody will ever do is to just avoid it by writing
worse code.

The whole and ONLY point of "kvmalloc()" and friends is to make it easy to
write code and _not_ have those idiotic "let's do kmalloc or kvmalloc
depending on the phase of the moon" garbage. So the warning has literally
destroyed the only value that function has!

> +       if (!gfpflags_allow_blocking(flags))
> +               return NULL;
> +

And no. Now all the code *above* this check is just wrong. All the code
that modifies the gfp_flags with the intention of falling back on vmalloc()
is just wrong, since we're not falling back on vmalloc any more.

So I really think the semantics should be simple:

  - if we don't allow GFP_KERNEL, then the code becomes just "kmalloc()",
since there is no valid fallback to vmalloc. vmalloc does GFP_KERNEL.

  - otherwise, we do what we used to do (except now the warning is gone,
because we already handled the case it warned about).

Nothing else. No stupid other cases. The *only* thing that function should
ask itself is "can I fall back on vmalloc or not", and if it can't then it
should just be a kmalloc.

Because otherwise we'll continue to have people that just check in the
caller instead.

         Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/6] lib/bucket_locks: use kvmalloc_array()
  2018-05-29 20:46         ` Linus Torvalds
@ 2018-05-30  7:42           ` Michal Hocko
  2018-05-31 15:01             ` Linus Torvalds
  0 siblings, 1 reply; 33+ messages in thread
From: Michal Hocko @ 2018-05-30  7:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Davidlohr Bueso, Andrew Morton, Thomas Graf, Herbert Xu,
	Manfred Spraul, guillaume.knispel, Linux API,
	Linux Kernel Mailing List, Davidlohr Bueso

On Tue 29-05-18 15:46:25, Linus Torvalds wrote:
[...]
> The whole and ONLY point of "kvmalloc()" and friends is to make it easy to
> write code and _not_ have those idiotic "let's do kmalloc or kvmalloc
> depending on the phase of the moon" garbage. So the warning has literally
> destroyed the only value that function has!

Well, I do agree but I've also seen terrible things while doing the
conversion when introducing kvmalloc.

So I admit that the defensive mode here is mostly inspired by existing
users of vmalloc(GFP_NOFS). They are simply wrong and not really
eager to be fixed from my experience. Now with kvmalloc fixing them
up silently it would get even less likely to get fixed because there
won't be any deadlock possible (compared to open coded kvmalloc like
ext4_kvmalloc for example).

My experience also tells me that most of those vmalloc NOFS users
simply do not need NOFS at all because there is no risk of the reclaim
recursion deadlocks. They are just used because of cargo cult which is
sad and it causes some subtle problems for the direct reclaim. I would
really like to eliminate those (e.g. see [1]).  It is sad reality that
people tend to be more sensitive to WARN splats than "look this is wrong
albeit not critical in most cases).

[1] http://lkml.kernel.org/r/20180424162712.GL17484@dhcp22.suse.cz

That being sad, if you believe that silently fixing up a code like that
is a good idea we can do the following of course:

>From c1a098e809a109800f9cfa63cb27fe9a78f3f316 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Wed, 30 May 2018 09:34:39 +0200
Subject: [PATCH] mm: kvmalloc does not fallback to vmalloc for incompatible
 gfp flags

kvmalloc warned about incompatible gfp_mask to catch abusers (mostly
GFP_NOFS) with an intention that this will motivate authors of the code
to fix those. Linus argues that this just motivates people to do even
more hacks like
	if (gfp == GFP_KERNEL)
		kvmalloc
	else
		kmalloc

I haven't seen this happening but it is true that we can grow those in
future. Therefore Linus suggested to simply not fallback to vmalloc for
incompatible gfp flags and rather stick with the kmalloc path.

Requested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/util.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/util.c b/mm/util.c
index 45fc3169e7b0..c6586c146995 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -391,7 +391,8 @@ EXPORT_SYMBOL(vm_mmap);
  * __GFP_RETRY_MAYFAIL is supported, and it should be used only if kmalloc is
  * preferable to the vmalloc fallback, due to visible performance drawbacks.
  *
- * Any use of gfp flags outside of GFP_KERNEL should be consulted with mm people.
+ * Please note that any use of gfp flags outside of GFP_KERNEL is careful to not
+ * fall back to vmalloc.
  */
 void *kvmalloc_node(size_t size, gfp_t flags, int node)
 {
@@ -402,7 +403,8 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
 	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
 	 * so the given set of flags has to be compatible.
 	 */
-	WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
+	if ((flags & GFP_KERNEL) != GFP_KERNEL)
+		return kmalloc_node(size, flags, node);
 
 	/*
 	 * We want to attempt a large physically contiguous block first because
-- 
2.17.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/6] lib/rhashtable: guarantee initial hashtable allocation
  2018-05-29 18:27           ` Herbert Xu
@ 2018-05-30 14:29             ` Davidlohr Bueso
  0 siblings, 0 replies; 33+ messages in thread
From: Davidlohr Bueso @ 2018-05-30 14:29 UTC (permalink / raw)
  To: Herbert Xu
  Cc: akpm, torvalds, tgraf, manfred, guillaume.knispel, linux-api,
	linux-kernel, Davidlohr Bueso

On Wed, 30 May 2018, Herbert Xu wrote:

>On Tue, May 29, 2018 at 10:59:27AM -0700, Davidlohr Bueso wrote:
>That's exactly what you need to explain in the patch or the commit
>message.  In fact you still haven't explained it fully.  Why do we
>need a second attempt without the GFP_NOFAIL? How does it help the
>allocator?

It helps in that we have two fastpath attempts before going in to
__alloc_pages_slowpath() and looping in __GFP_NOFAIL. But yeah, I
see your point. We can just apply KISS and avoid the extra alloc.
That actually makes more sense to me now than ignoring min_size
based on simplicity.

Thanks for the review.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/6] lib/bucket_locks: use kvmalloc_array()
  2018-05-30  7:42           ` Michal Hocko
@ 2018-05-31 15:01             ` Linus Torvalds
  2018-05-31 15:29               ` Michal Hocko
  0 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2018-05-31 15:01 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Davidlohr Bueso, Andrew Morton, Thomas Graf, Herbert Xu,
	Manfred Spraul, guillaume.knispel, Linux API,
	Linux Kernel Mailing List, Davidlohr Bueso

On Wed, May 30, 2018 at 2:42 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> That being sad, if you believe that silently fixing up a code like that
> is a good idea we can do the following of course:

Ack.

Except for:

> Linus argues that this just motivates people to do even
> more hacks like
>         if (gfp == GFP_KERNEL)
>                 kvmalloc
>         else
>                 kmalloc
>
> I haven't seen this happening but it is true that we can grow those in
> future.

This whole discussion came from the fact that YES, THIS IS ACTUALLY HAPPENING.

See lib/bucket_locks.c - it just uses gfpflags_allow_blocking()
instead of explicitly checking for GFP_KERNEL (probably because the
only two cases it actually deals with is GFP_ATOMIC and GFP_KERNEL).

              Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/6] lib/bucket_locks: use kvmalloc_array()
  2018-05-31 15:01             ` Linus Torvalds
@ 2018-05-31 15:29               ` Michal Hocko
  0 siblings, 0 replies; 33+ messages in thread
From: Michal Hocko @ 2018-05-31 15:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Davidlohr Bueso, Andrew Morton, Thomas Graf, Herbert Xu,
	Manfred Spraul, guillaume.knispel, Linux API,
	Linux Kernel Mailing List, Davidlohr Bueso

On Thu 31-05-18 10:01:51, Linus Torvalds wrote:
> On Wed, May 30, 2018 at 2:42 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > That being sad, if you believe that silently fixing up a code like that
> > is a good idea we can do the following of course:
> 
> Ack.
> 
> Except for:
> 
> > Linus argues that this just motivates people to do even
> > more hacks like
> >         if (gfp == GFP_KERNEL)
> >                 kvmalloc
> >         else
> >                 kmalloc
> >
> > I haven't seen this happening but it is true that we can grow those in
> > future.
> 
> This whole discussion came from the fact that YES, THIS IS ACTUALLY HAPPENING.
> 
> See lib/bucket_locks.c - it just uses gfpflags_allow_blocking()
> instead of explicitly checking for GFP_KERNEL (probably because the
> only two cases it actually deals with is GFP_ATOMIC and GFP_KERNEL).

OK, I haven't noticed this one and will fix it up. So what about the
following?

>From abc6ac9a690060d5ceda79e747c78d24cc7f2951 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Wed, 30 May 2018 09:34:39 +0200
Subject: [PATCH] mm: kvmalloc does not fallback to vmalloc for incompatible
 gfp flags

kvmalloc warned about incompatible gfp_mask to catch abusers (mostly
GFP_NOFS) with an intention that this will motivate authors of the code
to fix those. Linus argues that this just motivates people to do even
more hacks like
	if (gfp == GFP_KERNEL)
		kvmalloc
	else
		kmalloc

I haven't seen this happening much (bucket_lock special cases an atomic
allocation) but it is true that we can grow those in future. Therefore
Linus suggested to simply not fallback to vmalloc for incompatible gfp
flags and rather stick with the kmalloc path.

Requested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 lib/bucket_locks.c | 5 +----
 mm/util.c          | 6 ++++--
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/lib/bucket_locks.c b/lib/bucket_locks.c
index 266a97c5708b..ade3ce6c4af6 100644
--- a/lib/bucket_locks.c
+++ b/lib/bucket_locks.c
@@ -30,10 +30,7 @@ int alloc_bucket_spinlocks(spinlock_t **locks, unsigned int *locks_mask,
 	}
 
 	if (sizeof(spinlock_t) != 0) {
-		if (gfpflags_allow_blocking(gfp))
-			tlocks = kvmalloc(size * sizeof(spinlock_t), gfp);
-		else
-			tlocks = kmalloc_array(size, sizeof(spinlock_t), gfp);
+		tlocks = kvmalloc_array(size, sizeof(spinlock_t), gfp);
 		if (!tlocks)
 			return -ENOMEM;
 		for (i = 0; i < size; i++)
diff --git a/mm/util.c b/mm/util.c
index 45fc3169e7b0..c6586c146995 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -391,7 +391,8 @@ EXPORT_SYMBOL(vm_mmap);
  * __GFP_RETRY_MAYFAIL is supported, and it should be used only if kmalloc is
  * preferable to the vmalloc fallback, due to visible performance drawbacks.
  *
- * Any use of gfp flags outside of GFP_KERNEL should be consulted with mm people.
+ * Please note that any use of gfp flags outside of GFP_KERNEL is careful to not
+ * fall back to vmalloc.
  */
 void *kvmalloc_node(size_t size, gfp_t flags, int node)
 {
@@ -402,7 +403,8 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
 	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
 	 * so the given set of flags has to be compatible.
 	 */
-	WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
+	if ((flags & GFP_KERNEL) != GFP_KERNEL)
+		return kmalloc_node(size, flags, node);
 
 	/*
 	 * We want to attempt a large physically contiguous block first because
-- 
2.17.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2018-05-31 15:29 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-24 21:11 [PATCH -next 0/6] rhashtable: guarantee first allocation Davidlohr Bueso
2018-05-24 21:11 ` [PATCH 1/6] lib/rhashtable: convert param sanitations to WARN_ON Davidlohr Bueso
2018-05-28  9:40   ` Herbert Xu
2018-05-28 13:12     ` Davidlohr Bueso
2018-05-28 15:54       ` Herbert Xu
2018-05-28 15:51         ` Davidlohr Bueso
2018-05-24 21:11 ` [PATCH 2/6] lib/rhashtable: guarantee initial hashtable allocation Davidlohr Bueso
2018-05-25  3:26   ` Davidlohr Bueso
2018-05-28  9:49   ` Herbert Xu
2018-05-29 17:03     ` Davidlohr Bueso
2018-05-29 18:04       ` Herbert Xu
2018-05-29 17:59         ` Davidlohr Bueso
2018-05-29 18:27           ` Herbert Xu
2018-05-30 14:29             ` Davidlohr Bueso
2018-05-28 10:02   ` Herbert Xu
2018-05-29 16:42     ` Davidlohr Bueso
2018-05-29 18:03       ` Herbert Xu
2018-05-29 17:55         ` Davidlohr Bueso
2018-05-29 18:15           ` Herbert Xu
2018-05-29 18:05             ` Davidlohr Bueso
2018-05-24 21:11 ` [PATCH 3/6] lib/bucket_locks: use kvmalloc_array() Davidlohr Bueso
2018-05-24 21:37   ` Linus Torvalds
2018-05-29 14:43     ` Michal Hocko
2018-05-29 14:51       ` Michal Hocko
2018-05-29 20:46         ` Linus Torvalds
2018-05-30  7:42           ` Michal Hocko
2018-05-31 15:01             ` Linus Torvalds
2018-05-31 15:29               ` Michal Hocko
2018-05-24 21:11 ` [PATCH 4/6] ipc: get rid of ids->tables_initialized hack Davidlohr Bueso
2018-05-24 21:11 ` [PATCH 5/6] ipc: simplify ipc initialization Davidlohr Bueso
2018-05-24 21:11 ` [PATCH 6/6] lib/test_rhashtable: rhashtable_init() can no longer fail Davidlohr Bueso
2018-05-24 21:41 ` [PATCH -next 0/6] rhashtable: guarantee first allocation Linus Torvalds
2018-05-25  3:34   ` Davidlohr Bueso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).