LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH v7 0/4] ipc: IPCMNI limit check for *mni & increase that limit
@ 2018-05-07 20:59 Waiman Long
  2018-05-07 20:59 ` [PATCH v7 1/4] ipc: IPCMNI limit check for msgmni and shmmni Waiman Long
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Waiman Long @ 2018-05-07 20:59 UTC (permalink / raw)
  To: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Waiman Long

v6->v7:
 - Drop the range clamping code and just return error instead for now
   until there is user request for clamping support.
 - Fix compilation error when CONFIG_SYSVIPC_SYSCTL isn't defined.

v5->v6:
 - Consolidate the 3 ctl_table flags into 2.
 - Make similar changes to proc_doulongvec_minmax() and its associates
   to complete the clamping change.
 - Remove the sysctl registration failure test patch for now for later
   consideration.
 - Add extra braces to patch 1 to reduce code diff in a later patch.

v4->v5:
 - Revert the flags back to 16-bit so that there will be no change to
   the size of ctl_table.
 - Enhance the sysctl_check_flags() as requested by Luis to perform more
   checks to spot incorrect ctl_table entries.
 - Change the sysctl selftest to use dummy sysctls instead of production
   ones & enhance it to do more checks.
 - Add one more sysctl selftest for registration failure.
 - Add 2 ipc patches to add an extended mode to increase IPCMNI from
   32k to 2M.
 - Miscellaneous change to incorporate feedback comments from
   reviewers.

v3->v4:
 - Remove v3 patches 1 & 2 as they have been merged into the mm tree.
 - Change flags from uint16_t to unsigned int.
 - Remove CTL_FLAGS_OOR_WARNED and use pr_warn_ratelimited() instead.
 - Simplify the warning message code.
 - Add a new patch to fail the ctl_table registration with invalid flag.
 - Add a test case for range clamping in sysctl selftest.

v2->v3:
 - Fix kdoc comment errors.
 - Incorporate comments and suggestions from Luis R. Rodriguez.
 - Add a patch to fix a typo error in fs/proc/proc_sysctl.c.

v1->v2:
 - Add kdoc comments to the do_proc_do{u}intvec_minmax_conv_param
   structures.
 - Add a new flags field to the ctl_table structure for specifying
   whether range clamping should be activated instead of adding new
   sysctl parameter handlers.
 - Clamp the semmni value embedded in the multi-values sem parameter.

v4 patch: https://lkml.org/lkml/2018/3/12/867
v5 patch: https://lkml.org/lkml/2018/3/16/1106
v6 patch: https://lkml.org/lkml/2018/4/27/1094

The sysctl parameters msgmni, shmmni and semmni have an inherent limit
of IPC_MNI (32k). However, users may not be aware of that because they
can write a value much higher than that without getting any error or
notification. Reading the parameters back will show the newly written
values which are not real.

The real IPCMNI limit is now enforced to make sure that users won't
put in an unrealistic value. The first 2 patches enforce the limits.

There are also users out there requesting increase in the IPCMNI value.
The last 2 patches attempt to do that by using a boot kernel parameter
"ipcmni_extend" to increase the IPCMNI limit from 32k to 2M if the users
really want the extended value.

Waiman Long (4):
  ipc: IPCMNI limit check for msgmni and shmmni
  ipc: IPCMNI limit check for semmni
  ipc: Allow boot time extension of IPCMNI from 32k to 2M
  ipc: Conserve sequence numbers in extended IPCMNI mode

 Documentation/admin-guide/kernel-parameters.txt |  3 ++
 include/linux/ipc_namespace.h                   |  1 +
 ipc/ipc_sysctl.c                                | 42 +++++++++++++++++++--
 ipc/util.c                                      | 41 ++++++++++++++-------
 ipc/util.h                                      | 49 +++++++++++++++++++++----
 5 files changed, 112 insertions(+), 24 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v7 1/4] ipc: IPCMNI limit check for msgmni and shmmni
  2018-05-07 20:59 [PATCH v7 0/4] ipc: IPCMNI limit check for *mni & increase that limit Waiman Long
@ 2018-05-07 20:59 ` Waiman Long
  2018-05-07 22:39   ` Luis R. Rodriguez
  2018-05-07 20:59 ` [PATCH v7 2/4] ipc: IPCMNI limit check for semmni Waiman Long
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Waiman Long @ 2018-05-07 20:59 UTC (permalink / raw)
  To: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Waiman Long

A user can write arbitrary integer values to msgmni and shmmni sysctl
parameters without getting error, but the actual limit is really
IPCMNI (32k). This can mislead users as they think they can get a
value that is not real.

The right limits are now set for msgmni and shmmni so that the users
will become aware if they set a value outside of the acceptable range.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 ipc/ipc_sysctl.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index 8ad93c2..f87cb29 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -99,6 +99,7 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write,
 static int zero;
 static int one = 1;
 static int int_max = INT_MAX;
+static int ipc_mni = IPCMNI;
 
 static struct ctl_table ipc_kern_table[] = {
 	{
@@ -120,7 +121,9 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write,
 		.data		= &init_ipc_ns.shm_ctlmni,
 		.maxlen		= sizeof(init_ipc_ns.shm_ctlmni),
 		.mode		= 0644,
-		.proc_handler	= proc_ipc_dointvec,
+		.proc_handler	= proc_ipc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &ipc_mni,
 	},
 	{
 		.procname	= "shm_rmid_forced",
@@ -147,7 +150,7 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write,
 		.mode		= 0644,
 		.proc_handler	= proc_ipc_dointvec_minmax,
 		.extra1		= &zero,
-		.extra2		= &int_max,
+		.extra2		= &ipc_mni,
 	},
 	{
 		.procname	= "auto_msgmni",
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v7 2/4] ipc: IPCMNI limit check for semmni
  2018-05-07 20:59 [PATCH v7 0/4] ipc: IPCMNI limit check for *mni & increase that limit Waiman Long
  2018-05-07 20:59 ` [PATCH v7 1/4] ipc: IPCMNI limit check for msgmni and shmmni Waiman Long
@ 2018-05-07 20:59 ` Waiman Long
  2018-05-07 20:59 ` [PATCH v7 3/4] ipc: Allow boot time extension of IPCMNI from 32k to 2M Waiman Long
  2018-05-07 20:59 ` [PATCH v7 4/4] ipc: Conserve sequence numbers in extended IPCMNI mode Waiman Long
  3 siblings, 0 replies; 11+ messages in thread
From: Waiman Long @ 2018-05-07 20:59 UTC (permalink / raw)
  To: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Waiman Long

For SysV semaphores, the semmni value is the last part of the 4-element
sem number array. To make semmni behave in a similar way to msgmni and
shmmni, we can't directly use the _minmax handler. Instead, a special
sem specific handler is added to check the last argument to make sure
that it is limited to the [0, IPCMNI] range. An error will be returned
if this is not the case.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 ipc/ipc_sysctl.c | 23 ++++++++++++++++++++++-
 ipc/util.h       |  9 +++++++++
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index f87cb29..49f9bf4 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -88,12 +88,33 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write,
 	return proc_dointvec_minmax(&ipc_table, write, buffer, lenp, ppos);
 }
 
+static int proc_ipc_sem_dointvec(struct ctl_table *table, int write,
+	void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	int ret, semmni;
+	struct ipc_namespace *ns = current->nsproxy->ipc_ns;
+
+	semmni = ns->sem_ctls[3];
+	ret = proc_ipc_dointvec(table, write, buffer, lenp, ppos);
+
+	if (!ret)
+		ret = sem_check_semmni(current->nsproxy->ipc_ns);
+
+	/*
+	 * Reset the semmni value if an error happens.
+	 */
+	if (ret)
+		ns->sem_ctls[3] = semmni;
+	return ret;
+}
+
 #else
 #define proc_ipc_doulongvec_minmax NULL
 #define proc_ipc_dointvec	   NULL
 #define proc_ipc_dointvec_minmax   NULL
 #define proc_ipc_dointvec_minmax_orphans   NULL
 #define proc_ipc_auto_msgmni	   NULL
+#define proc_ipc_sem_dointvec	   NULL
 #endif
 
 static int zero;
@@ -175,7 +196,7 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write,
 		.data		= &init_ipc_ns.sem_ctls,
 		.maxlen		= 4*sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_ipc_dointvec,
+		.proc_handler	= proc_ipc_sem_dointvec,
 	},
 #ifdef CONFIG_CHECKPOINT_RESTORE
 	{
diff --git a/ipc/util.h b/ipc/util.h
index acc5159..8b413f1 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -218,6 +218,15 @@ int ipcget(struct ipc_namespace *ns, struct ipc_ids *ids,
 void free_ipcs(struct ipc_namespace *ns, struct ipc_ids *ids,
 		void (*free)(struct ipc_namespace *, struct kern_ipc_perm *));
 
+static inline int sem_check_semmni(struct ipc_namespace *ns) {
+	/*
+	 * Check semmni range [0, IPCMNI]
+	 * semmni is the last element of sem_ctls[4] array
+	 */
+	return ((ns->sem_ctls[3] < 0) || (ns->sem_ctls[3] > IPCMNI))
+		? -ERANGE : 0;
+}
+
 #ifdef CONFIG_COMPAT
 #include <linux/compat.h>
 struct compat_ipc_perm {
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v7 3/4] ipc: Allow boot time extension of IPCMNI from 32k to 2M
  2018-05-07 20:59 [PATCH v7 0/4] ipc: IPCMNI limit check for *mni & increase that limit Waiman Long
  2018-05-07 20:59 ` [PATCH v7 1/4] ipc: IPCMNI limit check for msgmni and shmmni Waiman Long
  2018-05-07 20:59 ` [PATCH v7 2/4] ipc: IPCMNI limit check for semmni Waiman Long
@ 2018-05-07 20:59 ` Waiman Long
  2018-05-07 23:17   ` Luis R. Rodriguez
  2018-05-07 20:59 ` [PATCH v7 4/4] ipc: Conserve sequence numbers in extended IPCMNI mode Waiman Long
  3 siblings, 1 reply; 11+ messages in thread
From: Waiman Long @ 2018-05-07 20:59 UTC (permalink / raw)
  To: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Waiman Long

The maximum number of unique System V IPC identifiers was limited to
32k.  That limit should be big enough for most use cases.

However, there are some users out there requesting for more. To satisfy
the need of those users, a new boot time kernel option "ipcmni_extend"
is added to extend the IPCMNI value to 2M. This is a 64X increase which
hopefully is big enough for them.

This new option does have the side effect of reducing the maximum
number of unique sequence numbers from 64k down to 1k. So it is
a trade-off.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  3 ++
 ipc/ipc_sysctl.c                                | 12 ++++++-
 ipc/util.c                                      | 12 +++----
 ipc/util.h                                      | 42 +++++++++++++++++++------
 4 files changed, 52 insertions(+), 17 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 11fc28e..00bc0cb 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1735,6 +1735,9 @@
 	ip=		[IP_PNP]
 			See Documentation/filesystems/nfs/nfsroot.txt.
 
+	ipcmni_extend	[KNL] Extend the maximum number of unique System V
+			IPC identifiers from 32768 to 2097152.
+
 	irqaffinity=	[SMP] Set the default irq affinity mask
 			The argument is a cpu list, as described above.
 
diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index 49f9bf4..d62335f 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -120,7 +120,8 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write,
 static int zero;
 static int one = 1;
 static int int_max = INT_MAX;
-static int ipc_mni = IPCMNI;
+int ipc_mni __read_mostly = IPCMNI;
+int ipc_mni_shift __read_mostly = IPCMNI_SHIFT;
 
 static struct ctl_table ipc_kern_table[] = {
 	{
@@ -246,3 +247,12 @@ static int __init ipc_sysctl_init(void)
 }
 
 device_initcall(ipc_sysctl_init);
+
+static int __init ipc_mni_extend(char *str)
+{
+	ipc_mni = IPCMNI_EXTEND;
+	ipc_mni_shift = IPCMNI_EXTEND_SHIFT;
+	pr_info("IPCMNI extended to %d.\n", ipc_mni);
+	return 0;
+}
+early_param("ipcmni_extend", ipc_mni_extend);
diff --git a/ipc/util.c b/ipc/util.c
index 4e81182..782a8d0 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -113,7 +113,7 @@ static int __init ipc_init(void)
  * @ids: ipc identifier set
  *
  * Set up the sequence range to use for the ipc identifier range (limited
- * below IPCMNI) then initialise the keys hashtable and ids idr.
+ * below ipc_mni) then initialise the keys hashtable and ids idr.
  */
 int ipc_init_ids(struct ipc_ids *ids)
 {
@@ -214,7 +214,7 @@ static inline int ipc_buildid(int id, struct ipc_ids *ids,
 		ids->next_id = -1;
 	}
 
-	return SEQ_MULTIPLIER * new->seq + id;
+	return (new->seq << SEQ_SHIFT) + id;
 }
 
 #else
@@ -228,7 +228,7 @@ static inline int ipc_buildid(int id, struct ipc_ids *ids,
 	if (ids->seq > IPCID_SEQ_MAX)
 		ids->seq = 0;
 
-	return SEQ_MULTIPLIER * new->seq + id;
+	return (new->seq << SEQ_SHIFT) + id;
 }
 
 #endif /* CONFIG_CHECKPOINT_RESTORE */
@@ -252,8 +252,8 @@ int ipc_addid(struct ipc_ids *ids, struct kern_ipc_perm *new, int limit)
 	kgid_t egid;
 	int id, err;
 
-	if (limit > IPCMNI)
-		limit = IPCMNI;
+	if (limit > ipc_mni)
+		limit = ipc_mni;
 
 	if (!ids->tables_initialized || ids->in_use >= limit)
 		return -ENOSPC;
@@ -777,7 +777,7 @@ static struct kern_ipc_perm *sysvipc_find_ipc(struct ipc_ids *ids, loff_t pos,
 	if (total >= ids->in_use)
 		return NULL;
 
-	for (; pos < IPCMNI; pos++) {
+	for (; pos < ipc_mni; pos++) {
 		ipc = idr_find(&ids->ipcs_idr, pos);
 		if (ipc != NULL) {
 			*new_pos = pos + 1;
diff --git a/ipc/util.h b/ipc/util.h
index 8b413f1..9df177f 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -15,8 +15,30 @@
 #include <linux/err.h>
 #include <linux/ipc_namespace.h>
 
-#define IPCMNI 32768  /* <= MAX_INT limit for ipc arrays (including sysctl changes) */
-#define SEQ_MULTIPLIER	(IPCMNI)
+/*
+ * By default, the ipc arrays can have up to 32k (15 bits) entries.
+ * When IPCMNI extension mode is turned on, the ipc arrays can have up
+ * to 2M (21 bits) entries. However, the space for sequence number will
+ * be shrunk from 16 bits to 10 bits.
+ */
+#define IPCMNI_SHIFT		15
+#define IPCMNI_EXTEND_SHIFT	21
+#define IPCMNI			(1 << IPCMNI_SHIFT)
+#define IPCMNI_EXTEND		(1 << IPCMNI_EXTEND_SHIFT)
+
+#ifdef CONFIG_SYSVIPC_SYSCTL
+extern int ipc_mni;
+extern int ipc_mni_shift;
+
+#define SEQ_SHIFT		ipc_mni_shift
+#define SEQ_MASK		((1 << ipc_mni_shift) - 1)
+
+#else /* CONFIG_SYSVIPC_SYSCTL */
+
+#define ipc_mni 		IPCMNI
+#define SEQ_SHIFT		IPCMNI_SHIFT
+#define SEQ_MASK		((1 << IPCMNI_SHIFT) - 1)
+#endif /* CONFIG_SYSVIPC_SYSCTL */
 
 int sem_init(void);
 int msg_init(void);
@@ -96,9 +118,9 @@ void __init ipc_init_proc_interface(const char *path, const char *header,
 #define IPC_MSG_IDS	1
 #define IPC_SHM_IDS	2
 
-#define ipcid_to_idx(id) ((id) % SEQ_MULTIPLIER)
-#define ipcid_to_seqx(id) ((id) / SEQ_MULTIPLIER)
-#define IPCID_SEQ_MAX min_t(int, INT_MAX/SEQ_MULTIPLIER, USHRT_MAX)
+#define ipcid_to_idx(id)  ((id) & SEQ_MASK)
+#define ipcid_to_seqx(id) ((id) >> SEQ_SHIFT)
+#define IPCID_SEQ_MAX	  (INT_MAX >> SEQ_SHIFT)
 
 /* must be called with ids->rwsem acquired for writing */
 int ipc_addid(struct ipc_ids *, struct kern_ipc_perm *, int);
@@ -123,8 +145,8 @@ static inline int ipc_get_maxid(struct ipc_ids *ids)
 	if (ids->in_use == 0)
 		return -1;
 
-	if (ids->in_use == IPCMNI)
-		return IPCMNI - 1;
+	if (ids->in_use == ipc_mni)
+		return ipc_mni - 1;
 
 	return ids->max_id;
 }
@@ -175,7 +197,7 @@ static inline void ipc_update_pid(struct pid **pos, struct pid *pid)
 
 static inline int ipc_checkid(struct kern_ipc_perm *ipcp, int uid)
 {
-	return uid / SEQ_MULTIPLIER != ipcp->seq;
+	return (uid >> SEQ_SHIFT) != ipcp->seq;
 }
 
 static inline void ipc_lock_object(struct kern_ipc_perm *perm)
@@ -220,10 +242,10 @@ void free_ipcs(struct ipc_namespace *ns, struct ipc_ids *ids,
 
 static inline int sem_check_semmni(struct ipc_namespace *ns) {
 	/*
-	 * Check semmni range [0, IPCMNI]
+	 * Check semmni range [0, ipc_mni]
 	 * semmni is the last element of sem_ctls[4] array
 	 */
-	return ((ns->sem_ctls[3] < 0) || (ns->sem_ctls[3] > IPCMNI))
+	return ((ns->sem_ctls[3] < 0) || (ns->sem_ctls[3] > ipc_mni))
 		? -ERANGE : 0;
 }
 
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v7 4/4] ipc: Conserve sequence numbers in extended IPCMNI mode
  2018-05-07 20:59 [PATCH v7 0/4] ipc: IPCMNI limit check for *mni & increase that limit Waiman Long
                   ` (2 preceding siblings ...)
  2018-05-07 20:59 ` [PATCH v7 3/4] ipc: Allow boot time extension of IPCMNI from 32k to 2M Waiman Long
@ 2018-05-07 20:59 ` Waiman Long
  3 siblings, 0 replies; 11+ messages in thread
From: Waiman Long @ 2018-05-07 20:59 UTC (permalink / raw)
  To: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet
  Cc: linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman, Waiman Long

The mixing in of a sequence number into the IPC IDs is probably to
avoid ID reuse in userspace as much as possible. With extended IPCMNI
mode, the number of usable sequence numbers is greatly reduced leading
to higher chance of ID reuse.

To address this issue, we need to conserve the sequence number space
as much as possible. Right now, the sequence number is incremented
for every new ID created. In reality, we only need to increment the
sequence number when one or more IDs have been removed previously to
make sure that those IDs will not be reused when a new one is built.
This is being done in the extended IPCMNI mode,

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/ipc_namespace.h |  1 +
 ipc/ipc_sysctl.c              |  2 ++
 ipc/util.c                    | 29 ++++++++++++++++++++++-------
 ipc/util.h                    |  2 ++
 4 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index b5630c8..9c86fd9 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -16,6 +16,7 @@
 struct ipc_ids {
 	int in_use;
 	unsigned short seq;
+	unsigned short deleted;
 	bool tables_initialized;
 	struct rw_semaphore rwsem;
 	struct idr ipcs_idr;
diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
index d62335f..1d32941 100644
--- a/ipc/ipc_sysctl.c
+++ b/ipc/ipc_sysctl.c
@@ -122,6 +122,7 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write,
 static int int_max = INT_MAX;
 int ipc_mni __read_mostly = IPCMNI;
 int ipc_mni_shift __read_mostly = IPCMNI_SHIFT;
+bool ipc_mni_extended __read_mostly;
 
 static struct ctl_table ipc_kern_table[] = {
 	{
@@ -252,6 +253,7 @@ static int __init ipc_mni_extend(char *str)
 {
 	ipc_mni = IPCMNI_EXTEND;
 	ipc_mni_shift = IPCMNI_EXTEND_SHIFT;
+	ipc_mni_extended = true;
 	pr_info("IPCMNI extended to %d.\n", ipc_mni);
 	return 0;
 }
diff --git a/ipc/util.c b/ipc/util.c
index 782a8d0..7c8e733 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -119,7 +119,8 @@ int ipc_init_ids(struct ipc_ids *ids)
 {
 	int err;
 	ids->in_use = 0;
-	ids->seq = 0;
+	ids->deleted = false;
+	ids->seq = ipc_mni_extended ? 0 : -1; /* seq # is pre-incremented */
 	init_rwsem(&ids->rwsem);
 	err = rhashtable_init(&ids->key_ht, &ipc_kht_params);
 	if (err)
@@ -193,6 +194,11 @@ static struct kern_ipc_perm *ipc_findkey(struct ipc_ids *ids, key_t key)
 	return NULL;
 }
 
+/*
+ * To conserve sequence number space with extended ipc_mni when new ID
+ * is built, the sequence number is incremented only when one or more
+ * IDs have been removed previously.
+ */
 #ifdef CONFIG_CHECKPOINT_RESTORE
 /*
  * Specify desired id for next allocated IPC object.
@@ -206,9 +212,13 @@ static inline int ipc_buildid(int id, struct ipc_ids *ids,
 			      struct kern_ipc_perm *new)
 {
 	if (ids->next_id < 0) { /* default, behave as !CHECKPOINT_RESTORE */
-		new->seq = ids->seq++;
-		if (ids->seq > IPCID_SEQ_MAX)
-			ids->seq = 0;
+		if (!ipc_mni_extended || ids->deleted) {
+			ids->seq++;
+			if (ids->seq > IPCID_SEQ_MAX)
+				ids->seq = 0;
+			ids->deleted = false;
+		}
+		new->seq = ids->seq;
 	} else {
 		new->seq = ipcid_to_seqx(ids->next_id);
 		ids->next_id = -1;
@@ -224,9 +234,13 @@ static inline int ipc_buildid(int id, struct ipc_ids *ids,
 static inline int ipc_buildid(int id, struct ipc_ids *ids,
 			      struct kern_ipc_perm *new)
 {
-	new->seq = ids->seq++;
-	if (ids->seq > IPCID_SEQ_MAX)
-		ids->seq = 0;
+	if (!ipc_mni_extended || ids->deleted) {
+		ids->seq++;
+		if (ids->seq > IPCID_SEQ_MAX)
+			ids->seq = 0;
+		ids->deleted = false;
+	}
+	new->seq = ids->seq;
 
 	return (new->seq << SEQ_SHIFT) + id;
 }
@@ -436,6 +450,7 @@ void ipc_rmid(struct ipc_ids *ids, struct kern_ipc_perm *ipcp)
 	idr_remove(&ids->ipcs_idr, lid);
 	ipc_kht_remove(ids, ipcp);
 	ids->in_use--;
+	ids->deleted = true;
 	ipcp->deleted = true;
 
 	if (unlikely(lid == ids->max_id)) {
diff --git a/ipc/util.h b/ipc/util.h
index 9df177f..0ef381c 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -29,6 +29,7 @@
 #ifdef CONFIG_SYSVIPC_SYSCTL
 extern int ipc_mni;
 extern int ipc_mni_shift;
+extern bool ipc_mni_extended;
 
 #define SEQ_SHIFT		ipc_mni_shift
 #define SEQ_MASK		((1 << ipc_mni_shift) - 1)
@@ -36,6 +37,7 @@
 #else /* CONFIG_SYSVIPC_SYSCTL */
 
 #define ipc_mni 		IPCMNI
+#define ipc_mni_extended	false
 #define SEQ_SHIFT		IPCMNI_SHIFT
 #define SEQ_MASK		((1 << IPCMNI_SHIFT) - 1)
 #endif /* CONFIG_SYSVIPC_SYSCTL */
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v7 1/4] ipc: IPCMNI limit check for msgmni and shmmni
  2018-05-07 20:59 ` [PATCH v7 1/4] ipc: IPCMNI limit check for msgmni and shmmni Waiman Long
@ 2018-05-07 22:39   ` Luis R. Rodriguez
  2018-05-07 23:57     ` Waiman Long
  0 siblings, 1 reply; 11+ messages in thread
From: Luis R. Rodriguez @ 2018-05-07 22:39 UTC (permalink / raw)
  To: Waiman Long
  Cc: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet,
	linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman

On Mon, May 07, 2018 at 04:59:09PM -0400, Waiman Long wrote:
> A user can write arbitrary integer values to msgmni and shmmni sysctl
> parameters without getting error, but the actual limit is really
> IPCMNI (32k). This can mislead users as they think they can get a
> value that is not real.
> 
> The right limits are now set for msgmni and shmmni so that the users
> will become aware if they set a value outside of the acceptable range.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  ipc/ipc_sysctl.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
> index 8ad93c2..f87cb29 100644
> --- a/ipc/ipc_sysctl.c
> +++ b/ipc/ipc_sysctl.c
> @@ -99,6 +99,7 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write,
>  static int zero;
>  static int one = 1;
>  static int int_max = INT_MAX;
> +static int ipc_mni = IPCMNI;
>  
>  static struct ctl_table ipc_kern_table[] = {
>  	{
> @@ -120,7 +121,9 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write,
>  		.data		= &init_ipc_ns.shm_ctlmni,
>  		.maxlen		= sizeof(init_ipc_ns.shm_ctlmni),
>  		.mode		= 0644,
> -		.proc_handler	= proc_ipc_dointvec,
> +		.proc_handler	= proc_ipc_dointvec_minmax,
> +		.extra1		= &zero,
> +		.extra2		= &ipc_mni,
>  	},
>  	{
>  		.procname	= "shm_rmid_forced",
> @@ -147,7 +150,7 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write,
>  		.mode		= 0644,
>  		.proc_handler	= proc_ipc_dointvec_minmax,
>  		.extra1		= &zero,
> -		.extra2		= &int_max,
> +		.extra2		= &ipc_mni,
>  	},
>  	{
>  		.procname	= "auto_msgmni",
> -- 
> 1.8.3.1

It seems negative values are not allowed, if true then having
a caller to use proc_douintvec_minmax() would help with ensuring
no invalid negative input values are used as well.

  Luis

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v7 3/4] ipc: Allow boot time extension of IPCMNI from 32k to 2M
  2018-05-07 20:59 ` [PATCH v7 3/4] ipc: Allow boot time extension of IPCMNI from 32k to 2M Waiman Long
@ 2018-05-07 23:17   ` Luis R. Rodriguez
  2018-05-08  0:04     ` Waiman Long
  0 siblings, 1 reply; 11+ messages in thread
From: Luis R. Rodriguez @ 2018-05-07 23:17 UTC (permalink / raw)
  To: Waiman Long
  Cc: Luis R. Rodriguez, Kees Cook, Christoph Lameter, Andrew Morton,
	Jonathan Corbet, linux-kernel, linux-fsdevel, linux-doc, Al Viro,
	tglx, arnd, Matthew Wilcox, Eric W. Biederman

On Mon, May 07, 2018 at 04:59:11PM -0400, Waiman Long wrote:
> diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
> index 49f9bf4..d62335f 100644
> --- a/ipc/ipc_sysctl.c
> +++ b/ipc/ipc_sysctl.c
> @@ -120,7 +120,8 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write,
>  static int zero;
>  static int one = 1;
>  static int int_max = INT_MAX;
> -static int ipc_mni = IPCMNI;
> +int ipc_mni __read_mostly = IPCMNI;
> +int ipc_mni_shift __read_mostly = IPCMNI_SHIFT;
>  
>  static struct ctl_table ipc_kern_table[] = {
>  	{

Is use of ipc_mni and ipc_mni_shift a hot path? As per Christoph Lameter,
its use should be reserved for data that is actually used frequently in hot
paths, and typically this was done after performance traces reveal contention
because a neighboring variable was frequently written to [0]. These would also
be tightly packed, to reduce the number of cachelines needed to execute a
critical path, so we should be selective about what variables use it.

Your commit log does not describe why you'd use __read_mostly here. It would
be useful if it did.

[0] https://lkml.kernel.org/r/alpine.DEB.2.11.1504301343190.28879@gentwo.org

  Luis

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v7 1/4] ipc: IPCMNI limit check for msgmni and shmmni
  2018-05-07 22:39   ` Luis R. Rodriguez
@ 2018-05-07 23:57     ` Waiman Long
  2018-05-09 19:32       ` Luis R. Rodriguez
  0 siblings, 1 reply; 11+ messages in thread
From: Waiman Long @ 2018-05-07 23:57 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Kees Cook, Andrew Morton, Jonathan Corbet, linux-kernel,
	linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman

On 05/07/2018 06:39 PM, Luis R. Rodriguez wrote:
> On Mon, May 07, 2018 at 04:59:09PM -0400, Waiman Long wrote:
>> A user can write arbitrary integer values to msgmni and shmmni sysctl
>> parameters without getting error, but the actual limit is really
>> IPCMNI (32k). This can mislead users as they think they can get a
>> value that is not real.
>>
>> The right limits are now set for msgmni and shmmni so that the users
>> will become aware if they set a value outside of the acceptable range.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>  ipc/ipc_sysctl.c | 7 +++++--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
>> index 8ad93c2..f87cb29 100644
>> --- a/ipc/ipc_sysctl.c
>> +++ b/ipc/ipc_sysctl.c
>> @@ -99,6 +99,7 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write,
>>  static int zero;
>>  static int one = 1;
>>  static int int_max = INT_MAX;
>> +static int ipc_mni = IPCMNI;
>>  
>>  static struct ctl_table ipc_kern_table[] = {
>>  	{
>> @@ -120,7 +121,9 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write,
>>  		.data		= &init_ipc_ns.shm_ctlmni,
>>  		.maxlen		= sizeof(init_ipc_ns.shm_ctlmni),
>>  		.mode		= 0644,
>> -		.proc_handler	= proc_ipc_dointvec,
>> +		.proc_handler	= proc_ipc_dointvec_minmax,
>> +		.extra1		= &zero,
>> +		.extra2		= &ipc_mni,
>>  	},
>>  	{
>>  		.procname	= "shm_rmid_forced",
>> @@ -147,7 +150,7 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write,
>>  		.mode		= 0644,
>>  		.proc_handler	= proc_ipc_dointvec_minmax,
>>  		.extra1		= &zero,
>> -		.extra2		= &int_max,
>> +		.extra2		= &ipc_mni,
>>  	},
>>  	{
>>  		.procname	= "auto_msgmni",
>> -- 
>> 1.8.3.1
> It seems negative values are not allowed, if true then having
> a caller to use proc_douintvec_minmax() would help with ensuring
> no invalid negative input values are used as well.
>
>   Luis

Negative value doesn't mean sense here. So it is true that we can use
proc_douintvec_minmax() instead. However, the data types themselves are
defined as "int". So I think it is better to keep using
proc_dointvec_minmax() to be consistent with the data type.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v7 3/4] ipc: Allow boot time extension of IPCMNI from 32k to 2M
  2018-05-07 23:17   ` Luis R. Rodriguez
@ 2018-05-08  0:04     ` Waiman Long
  0 siblings, 0 replies; 11+ messages in thread
From: Waiman Long @ 2018-05-08  0:04 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Kees Cook, Christoph Lameter, Andrew Morton, Jonathan Corbet,
	linux-kernel, linux-fsdevel, linux-doc, Al Viro, tglx, arnd,
	Matthew Wilcox, Eric W. Biederman

On 05/07/2018 07:17 PM, Luis R. Rodriguez wrote:
> On Mon, May 07, 2018 at 04:59:11PM -0400, Waiman Long wrote:
>> diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
>> index 49f9bf4..d62335f 100644
>> --- a/ipc/ipc_sysctl.c
>> +++ b/ipc/ipc_sysctl.c
>> @@ -120,7 +120,8 @@ static int proc_ipc_sem_dointvec(struct ctl_table *table, int write,
>>  static int zero;
>>  static int one = 1;
>>  static int int_max = INT_MAX;
>> -static int ipc_mni = IPCMNI;
>> +int ipc_mni __read_mostly = IPCMNI;
>> +int ipc_mni_shift __read_mostly = IPCMNI_SHIFT;
>>  
>>  static struct ctl_table ipc_kern_table[] = {
>>  	{
> Is use of ipc_mni and ipc_mni_shift a hot path? As per Christoph Lameter,
> its use should be reserved for data that is actually used frequently in hot
> paths, and typically this was done after performance traces reveal contention
> because a neighboring variable was frequently written to [0]. These would also
> be tightly packed, to reduce the number of cachelines needed to execute a
> critical path, so we should be selective about what variables use it.
>
> Your commit log does not describe why you'd use __read_mostly here. It would
> be useful if it did.
>
> [0] https://lkml.kernel.org/r/alpine.DEB.2.11.1504301343190.28879@gentwo.org
I used __read_mostly to reduce the performance impact of transitioning
from a constant to a variable. But you are right, their use are probably
not in a hot path. So even the use of regular variables shouldn't show
any noticeable performance difference. I can take that out in the my
next version after I gather enough feedback.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v7 1/4] ipc: IPCMNI limit check for msgmni and shmmni
  2018-05-07 23:57     ` Waiman Long
@ 2018-05-09 19:32       ` Luis R. Rodriguez
  2018-06-18  9:52         ` Waiman Long
  0 siblings, 1 reply; 11+ messages in thread
From: Luis R. Rodriguez @ 2018-05-09 19:32 UTC (permalink / raw)
  To: Waiman Long
  Cc: Luis R. Rodriguez, Kees Cook, Andrew Morton, Jonathan Corbet,
	linux-kernel, linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman

On Mon, May 07, 2018 at 07:57:12PM -0400, Waiman Long wrote:
> On 05/07/2018 06:39 PM, Luis R. Rodriguez wrote:
> > On Mon, May 07, 2018 at 04:59:09PM -0400, Waiman Long wrote:
> >> A user can write arbitrary integer values to msgmni and shmmni sysctl
> >> parameters without getting error, but the actual limit is really
> >> IPCMNI (32k). This can mislead users as they think they can get a
> >> value that is not real.
> >>
> >> The right limits are now set for msgmni and shmmni so that the users
> >> will become aware if they set a value outside of the acceptable range.
> >>
> >> Signed-off-by: Waiman Long <longman@redhat.com>
> >> ---
> >>  ipc/ipc_sysctl.c | 7 +++++--
> >>  1 file changed, 5 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
> >> index 8ad93c2..f87cb29 100644
> >> --- a/ipc/ipc_sysctl.c
> >> +++ b/ipc/ipc_sysctl.c
> >> @@ -99,6 +99,7 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write,
> >>  static int zero;
> >>  static int one = 1;
> >>  static int int_max = INT_MAX;
> >> +static int ipc_mni = IPCMNI;
> >>  
> >>  static struct ctl_table ipc_kern_table[] = {
> >>  	{
> >> @@ -120,7 +121,9 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write,
> >>  		.data		= &init_ipc_ns.shm_ctlmni,
> >>  		.maxlen		= sizeof(init_ipc_ns.shm_ctlmni),
> >>  		.mode		= 0644,
> >> -		.proc_handler	= proc_ipc_dointvec,
> >> +		.proc_handler	= proc_ipc_dointvec_minmax,
> >> +		.extra1		= &zero,
> >> +		.extra2		= &ipc_mni,
> >>  	},
> >>  	{
> >>  		.procname	= "shm_rmid_forced",
> >> @@ -147,7 +150,7 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write,
> >>  		.mode		= 0644,
> >>  		.proc_handler	= proc_ipc_dointvec_minmax,
> >>  		.extra1		= &zero,
> >> -		.extra2		= &int_max,
> >> +		.extra2		= &ipc_mni,
> >>  	},
> >>  	{
> >>  		.procname	= "auto_msgmni",
> >> -- 
> >> 1.8.3.1
> > It seems negative values are not allowed, if true then having
> > a caller to use proc_douintvec_minmax() would help with ensuring
> > no invalid negative input values are used as well.
> >
> >   Luis
> 
> Negative value doesn't mean sense here. So it is true that we can use
> proc_douintvec_minmax() instead. However, the data types themselves are
> defined as "int". So I think it is better to keep using
> proc_dointvec_minmax() to be consistent with the data type.

Huh, no... If you *know* the valid values *are* only positive, the right
thing to do is to then *change* the data type. Tons of odd bugs can creep
up because of these stupid things.

  Luis

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v7 1/4] ipc: IPCMNI limit check for msgmni and shmmni
  2018-05-09 19:32       ` Luis R. Rodriguez
@ 2018-06-18  9:52         ` Waiman Long
  0 siblings, 0 replies; 11+ messages in thread
From: Waiman Long @ 2018-06-18  9:52 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Kees Cook, Andrew Morton, Jonathan Corbet, linux-kernel,
	linux-fsdevel, linux-doc, Al Viro, Matthew Wilcox,
	Eric W. Biederman

On 05/10/2018 03:32 AM, Luis R. Rodriguez wrote:
> On Mon, May 07, 2018 at 07:57:12PM -0400, Waiman Long wrote:
>> On 05/07/2018 06:39 PM, Luis R. Rodriguez wrote:
>>> On Mon, May 07, 2018 at 04:59:09PM -0400, Waiman Long wrote:
>>>> A user can write arbitrary integer values to msgmni and shmmni sysctl
>>>> parameters without getting error, but the actual limit is really
>>>> IPCMNI (32k). This can mislead users as they think they can get a
>>>> value that is not real.
>>>>
>>>> The right limits are now set for msgmni and shmmni so that the users
>>>> will become aware if they set a value outside of the acceptable range.
>>>>
>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>> ---
>>>>  ipc/ipc_sysctl.c | 7 +++++--
>>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
>>>> index 8ad93c2..f87cb29 100644
>>>> --- a/ipc/ipc_sysctl.c
>>>> +++ b/ipc/ipc_sysctl.c
>>>> @@ -99,6 +99,7 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write,
>>>>  static int zero;
>>>>  static int one = 1;
>>>>  static int int_max = INT_MAX;
>>>> +static int ipc_mni = IPCMNI;
>>>>  
>>>>  static struct ctl_table ipc_kern_table[] = {
>>>>  	{
>>>> @@ -120,7 +121,9 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write,
>>>>  		.data		= &init_ipc_ns.shm_ctlmni,
>>>>  		.maxlen		= sizeof(init_ipc_ns.shm_ctlmni),
>>>>  		.mode		= 0644,
>>>> -		.proc_handler	= proc_ipc_dointvec,
>>>> +		.proc_handler	= proc_ipc_dointvec_minmax,
>>>> +		.extra1		= &zero,
>>>> +		.extra2		= &ipc_mni,
>>>>  	},
>>>>  	{
>>>>  		.procname	= "shm_rmid_forced",
>>>> @@ -147,7 +150,7 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, int write,
>>>>  		.mode		= 0644,
>>>>  		.proc_handler	= proc_ipc_dointvec_minmax,
>>>>  		.extra1		= &zero,
>>>> -		.extra2		= &int_max,
>>>> +		.extra2		= &ipc_mni,
>>>>  	},
>>>>  	{
>>>>  		.procname	= "auto_msgmni",
>>>> -- 
>>>> 1.8.3.1
>>> It seems negative values are not allowed, if true then having
>>> a caller to use proc_douintvec_Fminmax() would help with ensuring
>>> no invalid negative input values are used as well.
>>>
>>>   Luis
>> Negative value doesn't mean sense here. So it is true that we can use
>> proc_douintvec_minmax() instead. However, the data types themselves are
>> defined as "int". So I think it is better to keep using
>> proc_dointvec_minmax() to be consistent with the data type.
> Huh, no... If you *know* the valid values *are* only positive, the right
> thing to do is to then *change* the data type. Tons of odd bugs can creep
> up because of these stupid things.
>
>   Luis

Sorry for the late reply.

First of all, negative value will not be accepted because of the zero
lower limit check. The type of msgmni, shmmni and semmni are defined as
int in the uapi/linux/msg.h and uapi/linux/shm.h and uapi/linux/sem.h.
They are exposed to the userspace and changing them to "unsigned int"
may cause some undesirable consequence. Again this is a case of
introducing risk without any noticeable benefit.

I understand your desire of cleaning thing up. However, I am hesitant to
take this risk without seeing any real benefit in this case.

Cheers,
Longman



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-06-18  9:52 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-07 20:59 [PATCH v7 0/4] ipc: IPCMNI limit check for *mni & increase that limit Waiman Long
2018-05-07 20:59 ` [PATCH v7 1/4] ipc: IPCMNI limit check for msgmni and shmmni Waiman Long
2018-05-07 22:39   ` Luis R. Rodriguez
2018-05-07 23:57     ` Waiman Long
2018-05-09 19:32       ` Luis R. Rodriguez
2018-06-18  9:52         ` Waiman Long
2018-05-07 20:59 ` [PATCH v7 2/4] ipc: IPCMNI limit check for semmni Waiman Long
2018-05-07 20:59 ` [PATCH v7 3/4] ipc: Allow boot time extension of IPCMNI from 32k to 2M Waiman Long
2018-05-07 23:17   ` Luis R. Rodriguez
2018-05-08  0:04     ` Waiman Long
2018-05-07 20:59 ` [PATCH v7 4/4] ipc: Conserve sequence numbers in extended IPCMNI mode Waiman Long

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).