LKML Archive on lore.kernel.org
* [RFC][PATCH 0/6] Automatice kernel tunables (AKT)
@ 2007-01-16  6:15 Nadia.Derbey
  2007-01-16  6:15 ` [RFC][PATCH 1/6] Tunable structure and registration routines Nadia.Derbey
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Nadia.Derbey @ 2007-01-16  6:15 UTC (permalink / raw)
  To: linux-kernel

This patch series introduces a feature that lets the kernel automatically
adjust tunable values as it sees resources running out.

The AKT framework is made of 2 parts:

1) Kernel part:
Interfaces are provided to the kernel subsystems, to (un)register the
tunables that might be automatically tuned in the future.

Registering a tunable consists of the following steps:
- a structure is declared and filled by the kernel subsystem for the
registered tunable
- that tunable structure is registered into sysfs

Registration should be done during the kernel subsystem initialization step.
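
As an illustration only, the two registration steps can be modelled in user
space as below. This is a simplified sketch, not the code from the patch:
struct auto_tune_model and register_tunable_model() are hypothetical stand-ins
for struct auto_tune and register_tunable(), the sysfs registration is reduced
to setting a flag, and rejecting a double registration is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define TUNABLE_REGISTERED 0x02	/* same flag value as in akt.h */

/* Hypothetical, reduced model of struct auto_tune (uint tunables only). */
struct auto_tune_model {
	const char *name;
	unsigned char flags;
	unsigned int threshold;	/* percentage */
	unsigned int min;	/* lowest value the tunable may reach */
	unsigned int max;	/* highest value the tunable may reach */
	unsigned int *tunable;	/* address of the tunable to adjust */
	unsigned int *checked;	/* address of the subsystem's object counter */
};

/*
 * Hypothetical stand-in for register_tunable().
 * Returns 0 on success, < 0 on failure.
 */
int register_tunable_model(struct auto_tune_model *t)
{
	if (t == NULL || t->tunable == NULL || t->checked == NULL)
		return -1;
	if (t->min > t->max)	/* same coherence check as __check_parms() */
		return -1;
	if (t->flags & TUNABLE_REGISTERED)
		return -1;	/* assumption: double registration is refused */

	/* The real routine would register a kobject in sysfs here. */
	t->flags |= TUNABLE_REGISTERED;
	return 0;
}
```

A subsystem would fill one such structure per tunable (for example with the
addresses of max_threads and nr_threads) and register it once at init time.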


Another interface is provided to the kernel subsystems, to activate the
automatic tuning for a registered tunable. It can be called during resource
allocation to tune up, and during resource freeing to tune down the registered
tunable. The automatic tuning routine is called only if automatic tuning has
been enabled for the tunable in sysfs.
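
The arithmetic of the default tuning routine (see include/linux/akt_ops.h in
patch 1/6) can be tried out in user space with the sketch below. It is a model
for unsigned int tunables only; default_auto_tuning_model() and
activate_auto_tuning_model() are hypothetical names, and the autotune_enabled
argument stands in for the per-tunable flag set through sysfs.

```c
#include <assert.h>
#include <stddef.h>

#define AKT_UP   0	/* same values as in akt.h */
#define AKT_DOWN 1

/*
 * Model of the default adjustment for an unsigned int tunable.
 * Returns 1 if *tunable was changed, 0 otherwise.
 * Note: tun * threshold may overflow for very large tunables.
 */
int default_auto_tuning_model(int direction, unsigned int *tunable,
			      unsigned int checked, unsigned int threshold,
			      unsigned int min, unsigned int max)
{
	unsigned int tun, next;

	assert(tunable != NULL && threshold > 0 && threshold <= 100);
	tun = *tunable;

	if (direction == AKT_UP) {
		if (checked >= tun * threshold / 100 && tun < max) {
			next = tun * (200 - threshold) / 100;
			*tunable = next < max ? next : max;
			return 1;
		}
		return 0;
	}

	/* AKT_DOWN: go back to the value the tunable had before tuning up */
	if (checked < tun * threshold / (200 - threshold) && tun > min) {
		next = tun * 100 / (200 - threshold);
		*tunable = next > min ? next : min;
		return 1;
	}
	return 0;
}

/* Hypothetical gate: tune only if autotune was enabled by the user. */
int activate_auto_tuning_model(int autotune_enabled, int direction,
			       unsigned int *tunable, unsigned int checked,
			       unsigned int threshold,
			       unsigned int min, unsigned int max)
{
	if (!autotune_enabled)
		return 0;
	return default_auto_tuning_model(direction, tunable, checked,
					 threshold, min, max);
}
```

With a threshold of 80 and a tunable of 1000, tuning up triggers once the
counter reaches 800 and raises the tunable to 1200; tuning down brings it back
to 1000 once the counter falls under 800.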

2) User part:

AKT uses sysfs to let user space manage the tunables (mainly switching them
between automatic and manual).

AKT uses sysfs in the following way:
- a tunables subsystem (tunables_subsys) is declared and registered during AKT
initialization.
- registering a tunable is equivalent to registering the corresponding kobject
within that subsystem.
- each tunable kobject has 3 associated attributes, all with a RW mode (i.e.
the show() and store() methods are provided for them):
        . autotune: activates or deactivates automatic tuning for the tunable
        . max: sets a new maximum value for the tunable
        . min: sets a new minimum value for the tunable

The only way to activate automatic tuning is from user space:
- the directory /sys/tunables is created during the init phase.
- each time a tunable is registered by a kernel subsystem, a directory is
created for it under /sys/tunables.
- this directory contains 1 file for each tunable kobject attribute.
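
From a program, setting one of these attributes amounts to writing a short
string to the corresponding file. The helper below is a minimal sketch under
that assumption; akt_write_attr() is a hypothetical name, and the tunable
directory name passed to it depends on how the kernel subsystem named its
tunable structure.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Writes value to <root>/<tunable>/<attr>.
 * Returns 0 on success, -1 on failure (path too long, open or write error).
 */
int akt_write_attr(const char *root, const char *tunable,
		   const char *attr, const char *value)
{
	char path[256];
	FILE *f;
	int n;

	assert(root != NULL && tunable != NULL);
	assert(attr != NULL && value != NULL);

	n = snprintf(path, sizeof(path), "%s/%s/%s", root, tunable, attr);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (f == NULL)
		return -1;

	if (fputs(value, f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```

For example, akt_write_attr("/sys/tunables", "<tunable_name>", "autotune", "1")
would have the same effect as echo 1 > /sys/tunables/<tunable_name>/autotune.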



These patches should be applied to 2.6.20-rc4, in the following order:

[PATCH 1/6]: tunables_registration.patch
[PATCH 2/6]: auto_tuning_activation.patch
[PATCH 3/6]: auto_tuning_kobjects.patch
[PATCH 4/6]: tunable_min_max_kobjects.patch
[PATCH 5/6]: per_namespace_tunables.patch
[PATCH 6/6]: auto_tune_applied.patch

--


* [RFC][PATCH 1/6] Tunable structure and registration routines
  2007-01-16  6:15 [RFC][PATCH 0/6] Automatice kernel tunables (AKT) Nadia.Derbey
@ 2007-01-16  6:15 ` Nadia.Derbey
  2007-01-25  0:32   ` Randy Dunlap
  2007-01-16  6:15 ` [RFC][PATCH 2/6] auto_tuning activation Nadia.Derbey
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Nadia.Derbey @ 2007-01-16  6:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: Nadia Derbey

[-- Attachment #1: tunables_registration.patch --]
[-- Type: text/plain, Size: 35037 bytes --]

[PATCH 01/06]

Defines the auto_tune structure: this is the structure that contains the
information needed by the adjustment routine for a given tunable.
Also defines the registration routines.

The fork kernel component defines a tunable structure for the threads-max
tunable and registers it.


Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>


---
 Documentation/00-INDEX      |    2 
 Documentation/auto_tune.txt |  333 ++++++++++++++++++++++++++++++++++++++++++++
 fs/Kconfig                  |    2 
 include/linux/akt.h         |  186 ++++++++++++++++++++++++
 include/linux/akt_ops.h     |  186 ++++++++++++++++++++++++
 init/main.c                 |    2 
 kernel/Makefile             |    1 
 kernel/autotune/Kconfig     |   30 +++
 kernel/autotune/Makefile    |    7 
 kernel/autotune/akt.c       |  123 ++++++++++++++++
 kernel/fork.c               |   18 ++
 11 files changed, 890 insertions(+)

Index: linux-2.6.20-rc4/Documentation/00-INDEX
===================================================================
--- linux-2.6.20-rc4.orig/Documentation/00-INDEX	2007-01-15 13:08:13.000000000 +0100
+++ linux-2.6.20-rc4/Documentation/00-INDEX	2007-01-15 14:17:22.000000000 +0100
@@ -52,6 +52,8 @@ applying-patches.txt
 	- description of various trees and how to apply their patches.
 arm/
 	- directory with info about Linux on the ARM architecture.
+auto_tune.txt
+	- info on the Automatic Kernel Tunables (AKT) feature.
 basic_profiling.txt
 	- basic instructions for those who wants to profile Linux kernel.
 binfmt_misc.txt
Index: linux-2.6.20-rc4/Documentation/auto_tune.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc4/Documentation/auto_tune.txt	2007-01-15 14:19:18.000000000 +0100
@@ -0,0 +1,333 @@
+			Automatic Kernel Tunables
+                        =========================
+
+		   Nadia Derbey (Nadia.Derbey@bull.net)
+
+
+
+This feature aims at making the kernel automatically change tunable
+values as it sees resources running out.
+
+The AKT framework is made of 2 parts:
+
+1) Kernel part:
+Interfaces are provided to the kernel subsystems, to (un)register the
+tunables that might be automatically tuned in the future.
+
+Registering a tunable consists of the following steps:
+- a structure is declared and filled by the kernel subsystem for the
+registered tunable
+- that tunable structure is registered into sysfs
+
+Registration should be done during the kernel subsystem initialization step.
+
+Unregistering a tunable is the reverse operation. It should not be necessary
+for the kernel subsystems: it is only useful when unloading modules that would
+have registered a tunable during their loading step.
+
+The routine interfaces are the following:
+
+1.1) Declaring a tunable:
+
+A tunable structure should be declared and defined by the kernel subsystems as
+follows:
+
+DEFINE_TUNABLE(structure_name, threshold, min, max,
+		tunable_variable_ptr, checked_variable_ptr,
+		tunable_variable_type);
+
+Parameters:
+- structure_name: this is the name of the tunable structure
+
+- threshold: percentage to apply to the tunable value to detect if adjustment
+is needed
+
+- min: minimum value the tunable can ever reach (needed when adjusting down
+the tunable)
+
+- max: maximum value the tunable can ever reach (needed when adjusting up the
+tunable)
+
+- tunable_variable_ptr: address of the tunable that will be adjusted if
+needed.
+(ex: in kernel/fork.c it is max_threads's address)
+
+- checked_variable_ptr: address of the variable that is controlled by the
+tunable. This is the calling subsystem's object counter.
+(ex: in kernel/fork.c it is nr_threads's address: nr_threads should
+always remain < max_threads)
+
+- tunable_variable_type: this type is important since it helps choose the
+appropriate automatic tuning routine.
+It can be one of short / ushort / int / uint / size_t / long / ulong
+
+The automatic tuning routine (i.e. the routine that should be called when
+automatic tuning is activated) is set to the default one:
+default_auto_tuning_<type>().
+<type> is chosen according to the tunable_variable_type parameter.
+All the previously listed parameters are useful to this routine.
+Refer to the description of the automatic adjustment routine to see how
+these parameters are actually used.
+
+Refer to "Updating the auto-tuning function pointer" to know how to set
+this routine to another one.
+
+
+1.2) Updating a tunable's characteristics
+
+1.2.1) Updating min / max values:
+
+Sometimes, when calling DEFINE_TUNABLE(), the min and max values are not
+yet known exactly. In that case, the following routine should be called
+once these values are known:
+
+set_tunable_min_max(structure_name, new_min, new_max, tunable_variable_type)
+
+Parameters:
+- structure_name: this is the name of the tunable structure
+- new_min: minimum value the tunable can ever reach
+- new_max: maximum value the tunable can ever reach
+- tunable_variable_type: type of the tunable, as passed to
+DEFINE_TUNABLE()
+
+1.2.2) Updating the auto-tuning function pointer:
+
+If the default auto-tuning routine doesn't fit your needs, you can define
+another one and associate it to the tunable using the following routine:
+
+set_autotuning_routine(structure_name, auto_tune)
+
+Parameters:
+- structure_name: this is the name of the tunable structure
+
+- auto_tune: routine that should be called when automatic tuning is activated.
+If this parameter is not NULL, it should be set to a function pointer defined
+by the kernel subsystem caller. See 1.5) for the routine prototype. See also
+maxfiles_auto_tuning() in fs/file_table.c for an example.
+
+
+1.3) Registering a tunable:
+
+Once declared and its min / max / auto_tuning routine updated, the tunable
+structure should be registered using the following routine:
+
+int register_tunable(struct auto_tune *tunable_addr);
+
+Parameters:
+- tunable_addr: address of the tunable structure previously declared.
+
+Return value:
+- 0 : successful
+- < 0 : failure
+
+
+Registering a tunable makes it potentially automatically adjustable:
+the tunable is viewed as a kobject with 3 attributes (i.e. 3 files at sysfs
+level):
+- autotune (rw): activates or deactivates auto tuning for that tunable
+- min (rw): sets the min tunable value
+- max (rw): sets the max tunable value
+
+The only way to make a registered tunable automatically adjustable is through
+sysfs (see the sysfs part for more details).
+
+
+
+1.4) Unregistering a tunable:
+
+int unregister_tunable(struct auto_tune *reg_tun_addr);
+
+Parameters:
+- reg_tun_addr: address of the tunable structure to unregister
+
+
+This routine is only useful for modules: when unloading, they should
+unregister any previously registered tunable.
+
+
+
+1.5) Automatic tuning routine:
+
+The 2nd main service provided by the kernel part is a function pointer
+(auto_tune_func): it points to the routine that actually automatically
+adjusts the tunable passed in as a parameter.
+
+This is accomplished by one of the following:
+- if an automatic tuning routine has been provided during the tunable
+declaration, that routine will actually be called.
+- if no automatic tuning routine has been provided, the default one is called.
+NOTE: it can process one of the following types, depending on the type used
+	when declaring the tunable (see DEFINE_TUNABLE above): short, ushort,
+	int, uint, size_t, long, ulong.
+
+
+If the automatic tuning routine is provided by the kernel subsystem caller,
+it should be declared as follows:
+
+int <routine_name>(int cmd, struct auto_tune *params);
+
+Parameters:
+- cmd: tuning direction
+	. AKT_UP: the tunable will be adjusted upwards (i.e. its value is
+		increased if needed)
+	. AKT_DOWN: the tunable is adjusted downwards (i.e. its value is
+		decreased if needed)
+- params: pointer to the previously registered tunable structure
+
+
+Any kernel subsystem that has registered a tunable should call
+auto_tune_func() as follows:
+
++-------------------------+--------------------------------------------+
+| Step                    | Routine to call                            |
++-------------------------+--------------------------------------------+
+| Declaration phase       | DEFINE_TUNABLE(name, values...);           |
++-------------------------+--------------------------------------------+
+| Initialization routine  | set_tunable_min_max(name, min, max, type); |
+|                         | set_autotuning_routine(name, routine);     |
+|                         | register_tunable(&name);                   |
+| Note: the 1st 2 calls   |                                            |
+|       are optional      |                                            |
++-------------------------+--------------------------------------------+
+| Alloc                   | activate_auto_tuning(AKT_UP, &name);       |
++-------------------------+--------------------------------------------+
+| Free                    | activate_auto_tuning(AKT_DOWN, &name);     |
++-------------------------+--------------------------------------------+
+| module_exit() routine   | unregister_tunable(&name);                 |
++-------------------------+--------------------------------------------+
+
+activate_auto_tuning is a static inline defined in akt.h, that does the
+following:
+. if <tunable is registered> and <auto tuning is allowed for tunable>
+.   call the routine stored in tunable->auto_tune
+
+
+The effect of the default automatic tuning routine is the following:
+
+           +----------------------------------------------------------------+
+           |                 Tunable automatically adjustable               |
+           +---------------+------------------------------------------------+
+           |      NO       |                      YES                       |
++----------+---------------+------------------------------------------------+
+| AKT_UP   | No effect     | If the checked variable reaches the threshold  |
+|          |               | percentage of the tunable, the tunable is      |
+|          |               | increased, up to a maximum value.              |
+|          |               | The maximum value is specified during the      |
+|          |               | tunable declaration and can be changed at any  |
+|          |               | time through sysfs                             |
++----------+---------------+------------------------------------------------+
+| AKT_DOWN | No effect     | If the checked variable falls under the        |
+|          |               | threshold derived from the tunable, the tunable|
+|          |               | is decreased, down to a minimum value.         |
+|          |               | The minimum value is specified during the      |
+|          |               | tunable declaration and can be changed at any  |
+|          |               | time through sysfs                             |
++----------+---------------+------------------------------------------------+
+
+
+1.5.1) Default automatic adjustment routine:
+
+The last service provided by AKT at the kernel level is the default automatic
+adjustment routine. As seen above, this routine supports various tunable
+types. It works as follows (only the AKT_UP direction is described here;
+AKT_DOWN does the reverse operation):
+
+The 2nd parameter passed in to this routine is a pointer to a previously
+registered tunable structure. That structure contains the following fields (see
+1.1 for the detailed description):
+- threshold
+- key
+- min
+- max
+- tunable
+- checked
+
+When this routine is entered, it does the following:
+1. <*checked> is compared to <*tunable> * threshold / 100
+2. if <*checked> is greater or equal and <*tunable> < max, <*tunable> is set to:
+	min(max, <*tunable> + (<*tunable> * (100 - threshold) / 100))
+
+
+
+1.6) AKT and sysfs:
+
+AKT uses sysfs to let user space manage the tunables (mainly switching them
+between automatic and manual).
+
+AKT uses sysfs in the following way:
+- a tunables subsystem (tunables_subsys) is declared and registered during AKT
+initialization.
+- registering a tunable is equivalent to registering the corresponding kobject
+within that subsystem.
+- each tunable kobject has 3 associated attributes, all with a RW mode (i.e.
+the show() and store() methods are provided for them):
+	. autotune: activates or deactivates automatic tuning for the tunable
+	. max: sets a new maximum value for the tunable
+	. min: sets a new minimum value for the tunable
+
+
+1.7) Tunables that are namespace dependent:
+
+In this paragraph, the particular case of tunables that are namespace
+dependent is presented.
+
+1.7.1) Declaring a tunable:
+
+The tunable structure for such tunables should be declared in the namespace
+structure that contains the associated tunable (ex: the tunable structure for
+msg_ctlmni should be declared in the ipc_namespace structure).
+
+The tunable structure should be declared as follows:
+
+DECLARE_TUNABLE(structure_name);
+
+Parameters:
+- structure_name: this is the name of the tunable structure
+
+1.7.2) Initializing the tunable structure
+
+Then the tunable structure should be initialized by calling the following
+routine:
+
+init_tunable_ipcns(namespace_ptr, structure_name, threshold, min, max,
+		tunable_variable_ptr, checked_variable_ptr,
+		tunable_variable_type);
+
+Parameters:
+- namespace_ptr: pointer to the namespace the tunable belongs to.
+
+See DEFINE_TUNABLE for the other parameters
+
+1.7.3) Registering the tunable structure
+
+register_tunable should be called, giving it the tunable structure address
+that belongs to the init namespace.
+
+This applies to activate_auto_tuning too.
+
+All the routines that show/store attributes or that do the auto tuning are
+namespace dependent.
+
+
+2) User part:
+
+As seen above, the only way to activate automatic tuning is from user space:
+- the directory /sys/tunables is created during the init phase.
+- each time a tunable is registered by a kernel subsystem, a directory is
+created for it under /sys/tunables.
+- This directory contains 1 file for each tunable kobject attribute:
++-----------+---------------+-------------------+----------------------------+
+| attribute | default value | how to set it     | effect                     |
++-----------+---------------+-------------------+----------------------------+
+| autotune  | 0             | echo 1 > autotune | makes the tunable automatic|
+|           |               | echo 0 > autotune | makes the tunable manual   |
++-----------+---------------+-------------------+----------------------------+
+| max       | max value set | echo <M> > max    | sets the tunable max value |
+|           | during tunable|                   | to <M>                     |
+|           | definition    |                   |                            |
++-----------+---------------+-------------------+----------------------------+
+| min       | min value set | echo <m> > min    | sets the tunable min value |
+|           | during tunable|                   | to <m>                     |
+|           | definition    |                   |                            |
++-----------+---------------+-------------------+----------------------------+
+
Index: linux-2.6.20-rc4/fs/Kconfig
===================================================================
--- linux-2.6.20-rc4.orig/fs/Kconfig	2007-01-15 13:08:14.000000000 +0100
+++ linux-2.6.20-rc4/fs/Kconfig	2007-01-15 14:20:20.000000000 +0100
@@ -925,6 +925,8 @@ config PROC_KCORE
 	bool "/proc/kcore support" if !ARM
 	depends on PROC_FS && MMU
 
+source "kernel/autotune/Kconfig"
+
 config PROC_VMCORE
         bool "/proc/vmcore support (EXPERIMENTAL)"
         depends on PROC_FS && EXPERIMENTAL && CRASH_DUMP
Index: linux-2.6.20-rc4/include/linux/akt.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc4/include/linux/akt.h	2007-01-15 14:26:24.000000000 +0100
@@ -0,0 +1,186 @@
+/*
+ * linux/include/linux/akt.h
+ *
+ * Automatic Kernel Tunables support for Linux.
+ * This file contains structures definitions and prototypes needed for AKT
+ * support.
+ *
+ * Copyright (C) 2006 Bull S.A.S
+ *
+ * Author: Nadia Derbey <Nadia.Derbey@bull.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#ifndef AKT_H
+#define AKT_H
+
+#include <linux/types.h>
+#include <linux/kobject.h>
+
+
+
+/*
+ * First parameter passed to the adjustment routine
+ */
+#define AKT_UP   0   /* adjustment "up" */
+#define AKT_DOWN 1   /* adjustment "down" */
+
+
+struct auto_tune;
+/*
+ * Automatic adjustment routine.
+ * Returns 0 if the tunable value has not been changed, 1 otherwise
+ */
+typedef int (*auto_tune_fn)(int, struct auto_tune *);
+
+
+/*
+ * Structure used to describe the min / max values for a tunable inside the
+ * auto_tune structure.
+ * These values are type dependent and are used as high / low boundaries when
+ * tuning up or down.
+ * The type is known when the tunable is defined (see DEFINE_TUNABLE macro).
+ */
+struct typed_value {
+	union {
+		short  val_short;
+		ushort val_ushort;
+		int    val_int;
+		uint   val_uint;
+		size_t val_size_t;
+		long   val_long;
+		ulong  val_ulong;
+	} value;
+};
+
+
+
+/*
+ * This is the structure that describes a tunable. One of these structures is
+ * allocated for each registered tunable, and the associated kobject exported
+ * via sysfs.
+ *
+ * The structure lock (tunable_lck) protects
+ * against concurrent accesses to tunable and checked pointers
+ *
+ * A pointer to this structure is passed in to the automatic adjustment
+ * routine.
+ * The automatic adjustment principle is the following:
+ *    AKT_UP:
+ *       1. *checked is compared to *tunable * threshold
+ *       2. if *checked is greater, the tunable is adjusted up
+ *    AKT_DOWN: reverse operation
+ */
+struct auto_tune {
+	spinlock_t tunable_lck; /* serializes access to the structure fields */
+	auto_tune_fn auto_tune; /* auto tuning routine registered by the */
+				/* calling kernel subsystem. If NULL, the */
+				/* auto tuning routine that will be called */
+				/* is the default one that processes uints */
+	int (*check_parms)(struct auto_tune *);	/* min / max checking */
+						/* routine ptr: points to */
+						/* the appropriate routine */
+						/* depending on the */
+						/* tunable type */
+	const char *name;
+	char flags;	/* Only 2 bits are meaningful: */
+			/* bit 0: set to 1 if the associated tunable can */
+			/*        be automatically adjusted */
+			/* bit 1: set to 1 if the tunable has been */
+			/*        registered */
+			/* bits 2-7: unused */
+	char threshold;	/* threshold to enable the adjustment expressed as */
+			/* a %age */
+	struct typed_value min;	/* min value the tunable can ever reach */
+				/* (and associated show / store routines) */
+	struct typed_value max;	/* max value the tunable can ever reach */
+				/* (and associated show / store routines) */
+	void *tunable;	/* address of the tunable to adjust */
+	void *checked;	/* address of the variable that is controlled by */
+			/* the tunable. This is the calling subsystem's */
+			/* object counter */
+};
+
+
+/*
+ * Flags for a registered tunable
+ */
+#define TUNABLE_REGISTERED  0x02
+
+
+/*
+ * When calling this routine the tunable lock should be held
+ */
+static inline int is_tunable_registered(struct auto_tune *tunable)
+{
+	return (tunable->flags & TUNABLE_REGISTERED) == TUNABLE_REGISTERED;
+}
+
+
+#ifdef CONFIG_AKT
+
+
+
+#define TUNABLE_INIT(_name, _thresh, _min, _max, _tun, _chk, type)	\
+	{								\
+		.tunable_lck	= SPIN_LOCK_UNLOCKED,			\
+		.auto_tune	= default_auto_tuning_##type,		\
+		.check_parms	= check_parms_##type,			\
+		.name		= (_name),				\
+		.flags		= 0,					\
+		.threshold	= (_thresh),				\
+		.min	= {						\
+			.value		= { .val_##type = (_min), },	\
+		},							\
+		.max	= {						\
+			.value		= { .val_##type = (_max), },	\
+		},							\
+		.tunable	= (_tun),				\
+		.checked	= (_chk),				\
+	}
+
+
+#define DEFINE_TUNABLE(s, thr, min, max, tun, chk, type)		\
+	struct auto_tune s = TUNABLE_INIT(#s, thr, min, max, tun, chk, type)
+
+#define set_tunable_min_max(s, _min, _max, type)	\
+	do {						\
+		(s).min.value.val_##type = (_min);	\
+		(s).max.value.val_##type = (_max);	\
+	} while (0)
+
+
+
+extern int register_tunable(struct auto_tune *);
+extern int unregister_tunable(struct auto_tune *);
+
+
+#else	/* CONFIG_AKT */
+
+
+#define DEFINE_TUNABLE(s, thresh, min, max, tun, chk, type)
+#define set_tunable_min_max(s, min, max, type)   do { } while (0)
+
+
+#define register_tunable(a)                 0
+#define unregister_tunable(a)               0
+
+
+#endif	/* CONFIG_AKT */
+
+extern void fork_late_init(void);
+
+#endif /* AKT_H */
Index: linux-2.6.20-rc4/include/linux/akt_ops.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc4/include/linux/akt_ops.h	2007-01-15 14:28:16.000000000 +0100
@@ -0,0 +1,186 @@
+/*
+ * linux/include/linux/akt_ops.h
+ *
+ * Automatic Kernel Tunables support for Linux.
+ * This file contains the definitions for the type dependent routines
+ * needed for AKT support.
+ *
+ * Copyright (C) 2006 Bull S.A.S
+ *
+ * Author: Nadia Derbey <Nadia.Derbey@bull.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#ifndef AKT_OPS_H
+#define AKT_OPS_H
+
+#include <linux/errno.h>
+
+
+/*
+ * Checks that min and max values are coherent
+ * Called by register_tunable()
+ * Type independent - can be one of short / ushort / int / uint / long /
+ * ulong / size_t
+ */
+#define __check_parms(p, type)						\
+( {									\
+	int __rc;							\
+	type _min = p->min.value.val_##type;				\
+	type _max = p->max.value.val_##type;				\
+									\
+	if (_min > _max)						\
+		__rc = 1;						\
+	else								\
+		__rc = 0;						\
+	__rc;								\
+} )
+
+static inline int check_parms_short(struct auto_tune *p)
+{
+	return __check_parms(p, short);
+}
+
+static inline int check_parms_ushort(struct auto_tune *p)
+{
+	return __check_parms(p, ushort);
+}
+
+static inline int check_parms_int(struct auto_tune *p)
+{
+	return __check_parms(p, int);
+}
+
+static inline int check_parms_uint(struct auto_tune *p)
+{
+	return __check_parms(p, uint);
+}
+
+static inline int check_parms_size_t(struct auto_tune *p)
+{
+	return __check_parms(p, size_t);
+}
+
+static inline int check_parms_long(struct auto_tune *p)
+{
+	return __check_parms(p, long);
+}
+
+static inline int check_parms_ulong(struct auto_tune *p)
+{
+	return __check_parms(p, ulong);
+}
+
+
+/*
+ * FUNCTION:    This is the routine called to accomplish auto tuning if none
+ *              has been specified for a tunable.
+ *              It can be called by any kernel subsystem that is allocating or
+ *              freeing an object whose maximum value is controlled by a
+ *              tunable.
+ *              ex: max # of semaphore ids is controlled by sc_semmni
+ *              ==> this routine might be called by sys_semget() to "adjust up"
+ *                  and by semctl_down() to "adjust down"
+ *
+ *              Upwards adjustment:
+ *                  Adjustment is needed if the checked variable has reached
+ *                  (threshold / 100 * tunable)
+ *                  In that case, tunable is set to
+ *                  (tunable + tunable * (100 - threshold) / 100)
+ *
+ *              Downwards adjustment:
+ *                   Adjustment is needed if the checked variable has fallen
+ *                   under (threshold / 100 * tunable previous value)
+ *                   In that case tunable is set back to its previous value,
+ *                   i.e. to (tunable * 100 / (200 - threshold))
+ *
+ * PARAMETERS:  direction: controls the adjustment direction (up / down)
+ *              p: pointer to the registered tunable structure
+ *
+ * EXECUTION ENVIRONMENT: This routine should be called with the
+ *                        p->tunable_lck lock held
+ *
+ * Type independent - can be one of short / ushort / int / uint / long /
+ * ulong / size_t
+ *
+ * RETURN VALUE: 1 if tunable has been adjusted
+ *               0 otherwise
+ */
+#define __default_auto_tuning(direction, p, type)			\
+( {									\
+	int __rc;							\
+	type _chk = *((type *) p->checked);				\
+	type _tun = *((type *) p->tunable);				\
+	type _thr = (type) p->threshold;				\
+	type _min = (type) p->min.value.val_##type;			\
+	type _max = (type) p->max.value.val_##type;			\
+									\
+	if (direction == AKT_UP) {					\
+		if ((_chk >= (_tun * _thr) / 100) && (_tun < _max)) {	\
+			type ___x = (_tun * (200 - _thr)) / 100;	\
+			*((type *) p->tunable) = min(_max, ___x);	\
+			__rc = 1;					\
+		} else							\
+			__rc = 0;					\
+	} else {							\
+		if ((_chk < (_tun * _thr) / (200 - _thr)) && (_tun>_min)) { \
+			type ___x = (_tun * 100) / (200 - _thr);	\
+			*((type *) p->tunable) = max(_min, ___x);	\
+			__rc = 1;					\
+		} else							\
+			__rc = 0;					\
+	}								\
+	__rc;								\
+} )
+
+static inline int default_auto_tuning_short(int dir, struct auto_tune *p)
+{
+	return __default_auto_tuning(dir, p, short);
+}
+
+static inline int default_auto_tuning_ushort(int dir, struct auto_tune *p)
+{
+	return __default_auto_tuning(dir, p, ushort);
+}
+
+static inline int default_auto_tuning_int(int dir, struct auto_tune *p)
+{
+	return __default_auto_tuning(dir, p, int);
+}
+
+static inline int default_auto_tuning_uint(int dir, struct auto_tune *p)
+{
+	return __default_auto_tuning(dir, p, uint);
+}
+
+static inline int default_auto_tuning_size_t(int dir, struct auto_tune *p)
+{
+	return __default_auto_tuning(dir, p, size_t);
+}
+
+static inline int default_auto_tuning_long(int dir, struct auto_tune *p)
+{
+	return __default_auto_tuning(dir, p, long);
+}
+
+static inline int default_auto_tuning_ulong(int dir, struct auto_tune *p)
+{
+	return __default_auto_tuning(dir, p, ulong);
+}
+
+
+
+#endif /* AKT_OPS_H */
Index: linux-2.6.20-rc4/init/main.c
===================================================================
--- linux-2.6.20-rc4.orig/init/main.c	2007-01-15 13:08:15.000000000 +0100
+++ linux-2.6.20-rc4/init/main.c	2007-01-15 14:29:17.000000000 +0100
@@ -54,6 +54,7 @@
 #include <linux/pid_namespace.h>
 #include <linux/compile.h>
 #include <linux/device.h>
+#include <linux/akt.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -613,6 +614,7 @@ asmlinkage void __init start_kernel(void
 	signals_init();
 	/* rootfs populating might need page-writeback */
 	page_writeback_init();
+	fork_late_init();
 #ifdef CONFIG_PROC_FS
 	proc_root_init();
 #endif
Index: linux-2.6.20-rc4/kernel/Makefile
===================================================================
--- linux-2.6.20-rc4.orig/kernel/Makefile	2007-01-15 13:08:15.000000000 +0100
+++ linux-2.6.20-rc4/kernel/Makefile	2007-01-15 14:30:43.000000000 +0100
@@ -50,6 +50,7 @@ obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_UTS_NS) += utsname.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
 obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
+obj-$(CONFIG_AKT) += autotune/
 
 ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
Index: linux-2.6.20-rc4/kernel/autotune/Kconfig
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc4/kernel/autotune/Kconfig	2007-01-15 14:31:25.000000000 +0100
@@ -0,0 +1,30 @@
+#
+# Automatic Kernel Tunables
+#
+
+menu "Automatic Kernel Tunables"
+
+config AKT
+	bool "Automatic kernel tunable (kernel support)"
+	depends on PROC_FS && SYSFS
+	help
+	  This is a functionality that enables automatic adjustment of kernel
+	  tunables: when this feature is enabled the kernel can automatically
+	  change the tunables values as it sees resources running out.
+
+	  The list of kernel tunables that can potentially be automatically
+	  adjusted can be found under /sys/tunables.
+
+	  In order to make a tunable actually automatic, issue the following
+	  command:
+	  echo 1 > /sys/tunables/<tunable_name>/autotune
+
+	  In order to make it manual, issue the following command:
+	  echo 0 > /sys/tunables/<tunable_name>/autotune
+
+	  See Documentation/auto_tune.txt for more details.
+
+	  If unsure, say N.
+
+endmenu
+
Index: linux-2.6.20-rc4/kernel/autotune/Makefile
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc4/kernel/autotune/Makefile	2007-01-15 14:31:57.000000000 +0100
@@ -0,0 +1,7 @@
+#
+# Makefile for akt
+#
+
+obj-y := akt.o
+
+
Index: linux-2.6.20-rc4/kernel/autotune/akt.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc4/kernel/autotune/akt.c	2007-01-15 14:51:54.000000000 +0100
@@ -0,0 +1,123 @@
+/*
+ * linux/kernel/autotune/akt.c
+ *
+ * Automatic Kernel Tunables for Linux - Kernel support
+ *
+ * Copyright (C) 2006 Bull S.A.S
+ *
+ * Author: Nadia Derbey <Nadia.Derbey@bull.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+/*
+ *   FUNCTIONS:
+ *              register_tunable           (exported)
+ *              unregister_tunable         (exported)
+ */
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/akt.h>
+
+
+
+
+
+
+
+/*
+ * FUNCTION:    Inserts a tunable structure into sysfs
+ *              This routine serves also as a checker for the tunable
+ *              structure fields.
+ *              This routine is called by any kernel subsystem that wants to
+ *              use akt services (automatic tunables adjustment) in the future
+ *
+ * NOTE: when calling this routine, the tunable structure should have already
+ *       been filled by defining it with DEFINE_TUNABLE()
+ *
+ * RETURN VALUE: 0: successful
+ *               <0 if failure
+ */
+int register_tunable(struct auto_tune *tun)
+{
+	if (tun == NULL) {
+		printk(KERN_ERR "\tBad tunable structure pointer (NULL)\n");
+		return -EINVAL;
+	}
+
+	if (tun->threshold <= 0 || tun->threshold >= 100) {
+		printk(KERN_ERR "\tBad threshold (%d) value "
+			"- should be in the [1-99] interval\n",
+			tun->threshold);
+		return -EINVAL;
+	}
+
+	if (tun->tunable == NULL) {
+		printk(KERN_ERR "\tBad tunable pointer (NULL)\n");
+		return -EINVAL;
+	}
+
+	if (tun->checked == NULL) {
+		printk(KERN_ERR "\tBad checked value pointer (NULL)\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * Check the min / max value
+	 */
+	if (tun->check_parms(tun)) {
+		printk(KERN_ERR "\tBad min / max values\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+
+/*
+ * FUNCTION:    Removes a tunable structure from sysfs.
+ *              This routine is called by any kernel subsystem that doesn't
+ *              need the akt services anymore
+ *
+ * NOTE:  reg_tun should point to a previously registered tunable
+ *
+ * RETURN VALUE: 0: successful
+ *               <0 if failure
+ */
+int unregister_tunable(struct auto_tune *reg_tun)
+{
+	if (reg_tun == NULL) {
+		printk(KERN_ERR "\tBad tunable address (NULL)\n");
+		return -EINVAL;
+	}
+
+	spin_lock(&reg_tun->tunable_lck);
+
+	BUG_ON(!is_tunable_registered(reg_tun));
+
+	reg_tun->flags = 0;
+
+	spin_unlock(&reg_tun->tunable_lck);
+
+	return 0;
+}
+
+
+
+
+EXPORT_SYMBOL_GPL(register_tunable);
+EXPORT_SYMBOL_GPL(unregister_tunable);
Index: linux-2.6.20-rc4/kernel/fork.c
===================================================================
--- linux-2.6.20-rc4.orig/kernel/fork.c	2007-01-15 13:08:15.000000000 +0100
+++ linux-2.6.20-rc4/kernel/fork.c	2007-01-15 14:36:48.000000000 +0100
@@ -49,6 +49,8 @@
 #include <linux/delayacct.h>
 #include <linux/taskstats_kern.h>
 #include <linux/random.h>
+#include <linux/akt.h>
+#include <linux/akt_ops.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -65,6 +67,13 @@ int nr_threads; 		/* The idle threads do
 
 int max_threads;		/* tunable limit on nr_threads */
 
+#define THREADTHRESH 80
+/*
+ * The actual values for min and max will be known during fork_init
+ */
+DEFINE_TUNABLE(max_threads_akt, THREADTHRESH, 0, 0, &max_threads,
+		&nr_threads, int);
+
 DEFINE_PER_CPU(unsigned long, process_counts) = 0;
 
 __cacheline_aligned DEFINE_RWLOCK(tasklist_lock);  /* outer */
@@ -152,12 +161,21 @@ void __init fork_init(unsigned long memp
 	if(max_threads < 20)
 		max_threads = 20;
 
+	set_tunable_min_max(max_threads_akt, max_threads, mempages / 2, int);
+
 	init_task.signal->rlim[RLIMIT_NPROC].rlim_cur = max_threads/2;
 	init_task.signal->rlim[RLIMIT_NPROC].rlim_max = max_threads/2;
 	init_task.signal->rlim[RLIMIT_SIGPENDING] =
 		init_task.signal->rlim[RLIMIT_NPROC];
 }
 
+void __init fork_late_init(void)
+{
+	if (register_tunable(&max_threads_akt))
+		printk(KERN_WARNING
+			"Failed registering tunable max_threads\n");
+}
+
 static struct task_struct *dup_task_struct(struct task_struct *orig)
 {
 	struct task_struct *tsk;

--

* [RFC][PATCH 2/6] auto_tuning activation
  2007-01-16  6:15 [RFC][PATCH 0/6] Automatice kernel tunables (AKT) Nadia.Derbey
  2007-01-16  6:15 ` [RFC][PATCH 1/6] Tunable structure and registration routines Nadia.Derbey
@ 2007-01-16  6:15 ` Nadia.Derbey
  2007-01-16  6:15 ` [RFC][PATCH 3/6] tunables associated kobjects Nadia.Derbey
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Nadia.Derbey @ 2007-01-16  6:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: Nadia Derbey

[-- Attachment #1: auto_tuning_activation.patch --]
[-- Type: text/plain, Size: 4225 bytes --]

[PATCH 02/06]

Introduces the auto-tuning activation routine

The auto-tuning routine is called from the fork and exit paths
(kernel/fork.c and kernel/exit.c).
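
For readers skimming the thread, here is a rough user-space sketch of the gating that the activation routine in this patch performs. It is not the kernel code: the spinlock is left out, BUG_ON() is modelled with assert(), the AKT_UP / AKT_DOWN values are assumed (they are defined elsewhere in the series), and struct auto_tune_model and grow_by_one() are hypothetical names used only for illustration. The point is that the callback runs only when the tunable is both registered and has autotune enabled.

```c
#include <assert.h>
#include <stddef.h>

#define AKT_UP   0	/* assumed values, for illustration only */
#define AKT_DOWN 1

#define AUTO_TUNE_ENABLE   0x01
#define TUNABLE_REGISTERED 0x02

struct auto_tune_model;
typedef int (*auto_tune_fn)(int, struct auto_tune_model *);

/* Hypothetical, trimmed-down stand-in for struct auto_tune. */
struct auto_tune_model {
	int flags;
	int *tunable;		/* value being adjusted, e.g. max_threads */
	auto_tune_fn auto_tune;
};

/* Hypothetical callback: bump the tunable by one when tuning up. */
int grow_by_one(int dir, struct auto_tune_model *t)
{
	if (dir == AKT_UP) {
		(*t->tunable)++;
		return 1;
	}
	return 0;
}

/* Returns the callback's result, or 0 when tuning is skipped. */
int activate_auto_tuning_model(int dir, struct auto_tune_model *t)
{
	const int wanted = AUTO_TUNE_ENABLE | TUNABLE_REGISTERED;

	assert(dir == AKT_UP || dir == AKT_DOWN);

	if (t == NULL || (t->flags & wanted) != wanted)
		return 0;

	return t->auto_tune(dir, t);
}
```

With only TUNABLE_REGISTERED set, the call is a no-op; once AUTO_TUNE_ENABLE is also set (the equivalent of writing 1 to the autotune file), the callback is invoked.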


Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>


---
 include/linux/akt.h |   50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/exit.c       |   11 +++++++++++
 kernel/fork.c       |    2 ++
 3 files changed, 63 insertions(+)

Index: linux-2.6.20-rc4/include/linux/akt.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/akt.h	2007-01-15 14:26:24.000000000 +0100
+++ linux-2.6.20-rc4/include/linux/akt.h	2007-01-15 15:00:31.000000000 +0100
@@ -118,12 +118,22 @@ struct auto_tune {
 /*
  * Flags for a registered tunable
  */
+#define AUTO_TUNE_ENABLE  0x01
 #define TUNABLE_REGISTERED  0x02
 
 
 /*
  * When calling this routine the tunable lock should be held
  */
+static inline int is_auto_tune_enabled(struct auto_tune *tunable)
+{
+	return (tunable->flags & AUTO_TUNE_ENABLE) == AUTO_TUNE_ENABLE;
+}
+
+
+/*
+ * When calling this routine the tunable lock should be held
+ */
 static inline int is_tunable_registered(struct auto_tune *tunable)
 {
 	return (tunable->flags & TUNABLE_REGISTERED) == TUNABLE_REGISTERED;
@@ -163,6 +173,44 @@ static inline int is_tunable_registered(
 	} while (0)
 
 
+static inline void set_autotuning_routine(struct auto_tune *tunable,
+					auto_tune_fn fn)
+{
+	if (fn != NULL)
+		tunable->auto_tune = fn;
+}
+
+
+/*
+ * direction may be one of:
+ *    AKT_UP: adjust up (i.e. increase tunable value when needed)
+ *    AKT_DOWN: adjust down (i.e. decrease tunable value when needed)
+ */
+static inline int activate_auto_tuning(int direction,
+					struct auto_tune *tunable)
+{
+	int ret = 0;
+
+	BUG_ON(direction != AKT_UP && direction != AKT_DOWN);
+
+	if (tunable == NULL)
+		return 0;
+
+	spin_lock(&tunable->tunable_lck);
+
+	if (!is_auto_tune_enabled(tunable) ||
+					!is_tunable_registered(tunable)) {
+		spin_unlock(&tunable->tunable_lck);
+		return 0;
+	}
+
+	ret = tunable->auto_tune(direction, tunable);
+
+	spin_unlock(&tunable->tunable_lck);
+	return ret;
+}
+
+
 
 extern int register_tunable(struct auto_tune *);
 extern int unregister_tunable(struct auto_tune *);
@@ -173,7 +221,9 @@ extern int unregister_tunable(struct aut
 
 #define DEFINE_TUNABLE(s, thresh, min, max, tun, chk, type)
 #define set_tunable_min_max(s, min, max, type)   do { } while (0)
+#define set_autotuning_routine(s, fn)            do { } while (0)
 
+#define activate_auto_tuning(direction, tunable) ( { 0; } )
 
 #define register_tunable(a)                 0
 #define unregister_tunable(a)               0
Index: linux-2.6.20-rc4/kernel/fork.c
===================================================================
--- linux-2.6.20-rc4.orig/kernel/fork.c	2007-01-15 14:36:48.000000000 +0100
+++ linux-2.6.20-rc4/kernel/fork.c	2007-01-15 14:57:28.000000000 +0100
@@ -995,6 +995,8 @@ static struct task_struct *copy_process(
 	if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
 		return ERR_PTR(-EINVAL);
 
+	activate_auto_tuning(AKT_UP, &max_threads_akt);
+
 	retval = security_task_create(clone_flags);
 	if (retval)
 		goto fork_out;
Index: linux-2.6.20-rc4/kernel/exit.c
===================================================================
--- linux-2.6.20-rc4.orig/kernel/exit.c	2007-01-15 13:08:15.000000000 +0100
+++ linux-2.6.20-rc4/kernel/exit.c	2007-01-15 14:58:23.000000000 +0100
@@ -42,12 +42,15 @@
 #include <linux/audit.h> /* for audit_free() */
 #include <linux/resource.h>
 #include <linux/blkdev.h>
+#include <linux/akt.h>
 
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
 #include <asm/pgtable.h>
 #include <asm/mmu_context.h>
 
+extern struct auto_tune max_threads_akt;
+
 extern void sem_exit (void);
 
 static void exit_mm(struct task_struct * tsk);
@@ -172,6 +175,14 @@ repeat:
 
 	sched_exit(p);
 	write_unlock_irq(&tasklist_lock);
+
+	/*
+	 * nr_threads has been decremented in __unhash_process: adjust
+	 * max_threads down if needed
+	 * We do it here to avoid calling activate_auto_tuning under lock
+	 */
+	activate_auto_tuning(AKT_DOWN, &max_threads_akt);
+
 	proc_flush_task(p);
 	release_thread(p);
 	call_rcu(&p->rcu, delayed_put_task_struct);

--

* [RFC][PATCH 3/6] tunables associated kobjects
  2007-01-16  6:15 [RFC][PATCH 0/6] Automatice kernel tunables (AKT) Nadia.Derbey
  2007-01-16  6:15 ` [RFC][PATCH 1/6] Tunable structure and registration routines Nadia.Derbey
  2007-01-16  6:15 ` [RFC][PATCH 2/6] auto_tuning activation Nadia.Derbey
@ 2007-01-16  6:15 ` Nadia.Derbey
  2007-01-16  6:15 ` [RFC][PATCH 4/6] min and max kobjects Nadia.Derbey
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Nadia.Derbey @ 2007-01-16  6:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: Nadia Derbey

[-- Attachment #1: auto_tuning_kobjects.patch --]
[-- Type: text/plain, Size: 13186 bytes --]

[PATCH 03/06]


Introduces the kobjects associated with each tunable and the sysfs registration
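
The sysfs callbacks in this patch receive only a kobject pointer and recover the enclosing tunable through two container_of() steps. The following user-space sketch illustrates that pointer walk. The *_model types and kobj_to_tunable() are hypothetical stand-ins, and container_of() is re-implemented here with offsetof() purely so the example builds outside the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for struct kobject. */
struct kobject_model {
	const char *name;
};

struct auto_tune_model;

/* Mirrors struct tunable_kobject: an embedded kobject plus a back pointer. */
struct tunable_kobject_model {
	struct kobject_model kobj;
	struct auto_tune_model *tun;
};

/* Trimmed-down stand-in for struct auto_tune. */
struct auto_tune_model {
	int threshold;
	struct tunable_kobject_model tun_kobj;
};

/* Walk kobject -> tunable_kobject -> auto_tune. */
struct auto_tune_model *kobj_to_tunable(struct kobject_model *kobj)
{
	struct tunable_kobject_model *tk;

	assert(kobj != NULL);

	tk = container_of(kobj, struct tunable_kobject_model, kobj);
	return container_of(tk, struct auto_tune_model, tun_kobj);
}
```

Because the kobject is embedded rather than pointed to, no lookup table is needed: the address arithmetic alone leads back to the owning structure.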


Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>


---
 include/linux/akt.h         |   25 ++++-
 init/main.c                 |    1 
 kernel/autotune/Makefile    |    2 
 kernel/autotune/akt.c       |   86 +++++++++++++++++
 kernel/autotune/akt_sysfs.c |  214 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 324 insertions(+), 4 deletions(-)

Index: linux-2.6.20-rc4/include/linux/akt.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/akt.h	2007-01-15 15:00:31.000000000 +0100
+++ linux-2.6.20-rc4/include/linux/akt.h	2007-01-15 15:08:41.000000000 +0100
@@ -48,6 +48,16 @@ typedef int (*auto_tune_fn)(int, struct 
 
 
 /*
+ * for sysfs support
+ */
+struct tunable_kobject {
+	struct kobject kobj;
+	struct auto_tune *tun;
+};
+
+
+
+/*
  * Structure used to describe the min / max values for a tunable inside the
  * auto_tune structure.
  * These values are type dependent and are used as high / low boundaries when
@@ -73,7 +83,12 @@ struct typed_value {
  * allocated for each registered tunable, and the associated kobject exported
  * via sysfs.
  *
- * The structure lock (tunable_lck) protects
+ * This structure may be accessed in 2 ways:
+ *   . directly from inside the kernel subsystem that uses it (during tunable
+ *     automatic adjustment)
+ *   . from sysfs, while updating the kobject attributes
+ *
+ * In both cases, the structure lock (tunable_lck) is taken: it protects
  * against concurrent accesses to tunable and checked pointers
  *
  * A pointer to this structure is passed in to  the automatic adjustment
@@ -108,6 +123,7 @@ struct auto_tune {
 				/* and associated show / store routines) */
 	struct typed_value max;	/* max value the tunable can ever reach */
 				/* and associated show / store routines) */
+	struct tunable_kobject    tun_kobj;	/* used for sysfs support */
 	void *tunable;	/* address of the tunable to adjust */
 	void *checked;	/* address of the variable that is controlled by */
 			/* the tunable. This is the calling subsystem's */
@@ -158,6 +174,7 @@ static inline int is_tunable_registered(
 		.max	= {						\
 			.value		= { .val_##type = (_max), },	\
 		},							\
+		.tun_kobj	= { .tun = NULL, },			\
 		.tunable	= (_tun),				\
 		.checked	= (_chk),				\
 	}
@@ -211,9 +228,12 @@ static inline int activate_auto_tuning(i
 }
 
 
-
+extern void init_auto_tuning(void);
 extern int register_tunable(struct auto_tune *);
 extern int unregister_tunable(struct auto_tune *);
+extern int tunable_sysfs_setup(struct auto_tune *);
+extern ssize_t show_tuning_mode(struct auto_tune *, char *);
+extern ssize_t store_tuning_mode(struct auto_tune *, const char *, size_t);
 
 
 #else	/* CONFIG_AKT */
@@ -228,6 +248,7 @@ extern int unregister_tunable(struct aut
 #define register_tunable(a)                 0
 #define unregister_tunable(a)               0
 
+static inline void init_auto_tuning(void)   { }
 
 #endif	/* CONFIG_AKT */
 
Index: linux-2.6.20-rc4/init/main.c
===================================================================
--- linux-2.6.20-rc4.orig/init/main.c	2007-01-15 14:29:17.000000000 +0100
+++ linux-2.6.20-rc4/init/main.c	2007-01-15 15:09:27.000000000 +0100
@@ -614,6 +614,7 @@ asmlinkage void __init start_kernel(void
 	signals_init();
 	/* rootfs populating might need page-writeback */
 	page_writeback_init();
+	init_auto_tuning();
 	fork_late_init();
 #ifdef CONFIG_PROC_FS
 	proc_root_init();
Index: linux-2.6.20-rc4/kernel/autotune/Makefile
===================================================================
--- linux-2.6.20-rc4.orig/kernel/autotune/Makefile	2007-01-15 14:31:57.000000000 +0100
+++ linux-2.6.20-rc4/kernel/autotune/Makefile	2007-01-15 15:09:57.000000000 +0100
@@ -2,6 +2,6 @@
 # Makefile for akt
 #
 
-obj-y := akt.o
+obj-y := akt.o akt_sysfs.o
 
 
Index: linux-2.6.20-rc4/kernel/autotune/akt.c
===================================================================
--- linux-2.6.20-rc4.orig/kernel/autotune/akt.c	2007-01-15 14:51:54.000000000 +0100
+++ linux-2.6.20-rc4/kernel/autotune/akt.c	2007-01-15 15:13:31.000000000 +0100
@@ -26,6 +26,8 @@
  *   FUNCTIONS:
  *              register_tunable           (exported)
  *              unregister_tunable         (exported)
+ *              show_tuning_mode           (exported)
+ *              store_tuning_mode          (exported)
  */
 
 #include <linux/init.h>
@@ -36,6 +38,8 @@
 
 
 
+#define AKT_AUTO   1
+#define AKT_MANUAL 0
 
 
 
@@ -54,6 +58,8 @@
  */
 int register_tunable(struct auto_tune *tun)
 {
+	int rc = 0;
+
 	if (tun == NULL) {
 		printk(KERN_ERR "\tBad tunable structure pointer (NULL)\n");
 		return -EINVAL;
@@ -84,7 +90,10 @@ int register_tunable(struct auto_tune *t
 		return -EINVAL;
 	}
 
-	return 0;
+	if (!(rc = tunable_sysfs_setup(tun)))
+		tun->flags |= TUNABLE_REGISTERED;
+
+	return rc;
 }
 
 
@@ -117,6 +126,81 @@ int unregister_tunable(struct auto_tune 
 }
 
 
+/*
+ * FUNCTION:    Get operation called by tunable_attr_show (i.e. when the file
+ *              /sys/tunables/<tunable>/autotune is displayed).
+ *              Outputs "1" if the corresponding tunable is automatically
+ *              adjustable, "0" otherwise
+ *
+ * RETURN VALUE: >0 : output string length (excluding the '\0')
+ *               <0 : failure
+ */
+ssize_t show_tuning_mode(struct auto_tune *tun_addr, char *buf)
+{
+	int valid;
+
+	if (tun_addr == NULL) {
+		printk(KERN_ERR
+			" show_tuning_mode(): tunable address is invalid\n");
+		return -EINVAL;
+	}
+
+	spin_lock(&tun_addr->tunable_lck);
+
+	valid = is_auto_tune_enabled(tun_addr);
+
+	spin_unlock(&tun_addr->tunable_lck);
+
+	return snprintf(buf, PAGE_SIZE, "%d\n", valid);
+}
+
+
+/*
+ * NAME:        store_tuning_mode
+ *
+ * FUNCTION:    Set operation called by tunable_attr_store (i.e. when a
+ *              string is stored into /sys/tunables/<tunable>/autotune).
+ *              "1" makes the corresponding tunable automatically adjustable
+ *              "0" makes the corresponding tunable manually adjustable
+ *
+ * PARAMETERS: count: input buffer size (including the '\0')
+ *
+ * RETURN VALUE: >0: number of characters used from the input buffer
+ *               <= 0: failure
+ */
+ssize_t store_tuning_mode(struct auto_tune *tun_addr, const char *buffer,
+			size_t count)
+{
+	int new_value;
+	int rc;
+
+	if ((rc = sscanf(buffer, "%d", &new_value)) != 1)
+		return -EINVAL;
+
+	if (new_value != AKT_AUTO && new_value != AKT_MANUAL)
+		return -EINVAL;
+
+	if (tun_addr == NULL) {
+		printk(KERN_ERR
+			" store_tuning_mode(): NULL pointer  passed in\n");
+		return -EINVAL;
+	}
+
+	spin_lock(&tun_addr->tunable_lck);
+
+	switch (new_value) {
+	case AKT_AUTO:
+		tun_addr->flags |= AUTO_TUNE_ENABLE;
+		break;
+	case AKT_MANUAL:
+		tun_addr->flags &= ~AUTO_TUNE_ENABLE;
+		break;
+	}
+
+	spin_unlock(&tun_addr->tunable_lck);
+
+	return strnlen(buffer, PAGE_SIZE);
+}
 
 
 EXPORT_SYMBOL_GPL(register_tunable);
Index: linux-2.6.20-rc4/kernel/autotune/akt_sysfs.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.20-rc4/kernel/autotune/akt_sysfs.c	2007-01-15 15:14:55.000000000 +0100
@@ -0,0 +1,214 @@
+/*
+ * linux/kernel/autotune/akt_sysfs.c
+ *
+ * Automatic Kernel Tunables for Linux
+ * sysfs bindings for AKT
+ *
+ * Copyright (C) 2006 Bull S.A.S
+ *
+ * Author: Nadia Derbey <Nadia.Derbey@bull.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+/*
+ * FUNCTIONS:
+ *            tunable_attr_show      (static)
+ *            tunable_attr_store     (static)
+ *            tunable_sysfs_setup
+ *            add_tunable_attrs      (static)
+ *            init_auto_tuning
+ */
+
+
+#include <linux/init.h>
+#include <linux/stat.h>
+#include <linux/module.h>
+#include <linux/akt.h>
+
+
+
+
+struct tunable_attribute {
+	struct attribute attr;
+	ssize_t (*show)(struct auto_tune *, char *);
+	ssize_t (*store)(struct auto_tune *, const char *, size_t);
+};
+
+#define TUNABLE_ATTR(_name, _mode, _show, _store)	\
+struct tunable_attribute tun_attr_##_name = __ATTR(_name, _mode, _show, _store)
+
+
+static TUNABLE_ATTR(autotune, S_IWUSR | S_IRUGO, show_tuning_mode,
+		store_tuning_mode);
+
+static struct tunable_attribute *tunable_sysfs_attrs[] = {
+	&tun_attr_autotune,	/* to (de)activate auto tuning */
+	NULL,
+};
+
+
+
+#define to_tunable_kobj(obj)  container_of(obj, struct tunable_kobject, kobj)
+#define to_tunable(obj)       container_of(obj, struct auto_tune, tun_kobj)
+#define to_tunable_attr(_attr)	\
+	container_of(_attr, struct tunable_attribute, attr)
+
+
+static int add_tunable_attrs(struct auto_tune *);
+
+
+/*
+ * FUNCTION:    show method for the tunables subsystem
+ *              Forwards any read call to the show method of the attribute
+ *              owner
+ *
+ * PARAMETERS:  attr: can be one of
+ *                 . tun_attr_autotune
+ *                 . tun_attr_min
+ *                 . tun_attr_max
+ *
+ * RETURN VALUE: number of bytes printed into the buffer
+ *               <0 if failure
+ */
+static ssize_t tunable_attr_show(struct kobject *kobj,
+				struct attribute *attr,
+				char *buf)
+{
+	struct tunable_attribute *tun_attr = to_tunable_attr(attr);
+	struct tunable_kobject *tkobj = to_tunable_kobj(kobj);
+	struct auto_tune *tunable = to_tunable(tkobj);
+	ssize_t count = -EIO;
+
+	if (tun_attr->show)
+		count = tun_attr->show(tunable, buf);
+	return count;
+}
+
+
+/*
+ * FUNCTION:    store method for the tunables subsystem
+ *              Forwards any write call to the store method of the attribute
+ *              owner
+ *
+ * PARAMETERS: attr: can be one of
+ *                 . tun_attr_autotune
+ *                 . tun_attr_min
+ *                 . tun_attr_max
+ *
+ * RETURN VALUE: number of bytes used from the buffer
+ *               <0 if failure
+ */
+static ssize_t tunable_attr_store(struct kobject *kobj,
+				struct attribute *attr,
+				const char *buf,
+				size_t count)
+{
+	struct tunable_attribute *tun_attr = to_tunable_attr(attr);
+	struct tunable_kobject *tkobj = to_tunable_kobj(kobj);
+	struct auto_tune *tunable = to_tunable(tkobj);
+	ssize_t ret = -EIO;
+
+	if (tun_attr->store)
+		ret = tun_attr->store(tunable, buf, count);
+	return ret;
+}
+
+
+static struct sysfs_ops tunables_sysfs_ops = {
+	.show	= tunable_attr_show,
+	.store	= tunable_attr_store,
+};
+
+
+static struct kobj_type tunables_ktype = {
+	.sysfs_ops	= &tunables_sysfs_ops,
+};
+
+
+decl_subsys(tunables, &tunables_ktype, NULL);
+
+
+/*
+ * FUNCTION:    Registers one tunable into sysfs
+ *              (called by register_tunable())
+ *              The tunable is a kobject with 3 attributes:
+ *                 min (rw): reads or sets the min tunable value
+ *                 max (rw): reads or sets the max tunable value
+ *                 autotune (rw): (de)activates the auto tuning for
+ *                                that tunable
+ *
+ * RETURN VALUE: 0 if tunable has been successfully registered
+ *               <0 otherwise
+ */
+
+#define tunable_kobj(t) t->tun_kobj.kobj
+
+int tunable_sysfs_setup(struct auto_tune *tunable)
+{
+	int err = 0;
+
+	memset(&(tunable_kobj(tunable)), 0, sizeof(tunable_kobj(tunable)));
+	if ((err = kobject_set_name(&(tunable_kobj(tunable)), "%s",
+							tunable->name)))
+		return err;
+
+	kobj_set_kset_s(&(tunable->tun_kobj), tunables_subsys);
+	tunable->tun_kobj.tun = tunable;
+
+	if ((err = kobject_register(&(tunable_kobj(tunable)))))
+		return err;
+
+	if ((err = add_tunable_attrs(tunable)))
+		kobject_unregister(&(tunable_kobj(tunable)));
+
+	return err;
+}
+
+
+/*
+ * FUNCTION:    adds the set of predefined attributes for a tunable being
+ *              registered (called by tunable_sysfs_setup())
+ *
+ * RETURN VALUE: 0 if all attributes have been successfully created
+ *               <0 otherwise
+ */
+static int add_tunable_attrs(struct auto_tune *tunable)
+{
+	struct tunable_attribute *attr;
+	int error = 0;
+	int i;
+
+	for (i = 0; (attr = tunable_sysfs_attrs[i]) && !error; i++) {
+		error = sysfs_create_file(&(tunable_kobj(tunable)),
+			&(attr->attr));
+	}
+
+	return error;
+}
+
+
+/*
+ * FUNCTION:    akt subsystem initialization
+ *
+ * RETURN VALUE: none (a registration failure is only logged)
+ */
+void __init init_auto_tuning(void)
+{
+	int error = subsystem_register(&tunables_subsys);
+
+	if (error)
+		printk(KERN_ERR "Failed registering tunables subsystem\n");
+}

--

* [RFC][PATCH 4/6] min and max kobjects
  2007-01-16  6:15 [RFC][PATCH 0/6] Automatice kernel tunables (AKT) Nadia.Derbey
                   ` (2 preceding siblings ...)
  2007-01-16  6:15 ` [RFC][PATCH 3/6] tunables associated kobjects Nadia.Derbey
@ 2007-01-16  6:15 ` Nadia.Derbey
  2007-01-24 22:41   ` Randy Dunlap
  2007-01-16  6:15 ` [RFC][PATCH 5/6] per namespace tunables Nadia.Derbey
  2007-01-16  6:15 ` [RFC][PATCH 6/6] automatic tuning applied to some kernel components Nadia.Derbey
  5 siblings, 1 reply; 25+ messages in thread
From: Nadia.Derbey @ 2007-01-16  6:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: Nadia Derbey

[-- Attachment #1: tunable_min_max_kobjects.patch --]
[-- Type: text/plain, Size: 15777 bytes --]

[PATCH 04/06]


Introduces the kobjects associated with each tunable's min and max values
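
The store routines added by this patch refuse a new minimum that falls below the very first minimum ever set or that is not strictly below the current maximum. Below is a user-space sketch of that check, under a few assumptions: struct min_max_model and store_min_model() are hypothetical names, only the long case is modelled, and strtol() stands in for the kernel's simple_strtol().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, single-type stand-in for the min / max typed_value pair. */
struct min_max_model {
	long min;	/* current minimum */
	long abs_min;	/* first minimum ever set: floor for later updates */
	long max;	/* current maximum */
};

/*
 * Parse buf and update the minimum.
 * Returns the number of characters consumed, or -EINVAL if nothing could
 * be parsed or the value is outside [abs_min, max).
 */
long store_min_model(struct min_max_model *m, const char *buf)
{
	char *end;
	long v;

	assert(m != NULL && buf != NULL);

	v = strtol(buf, &end, 0);
	if (end == buf)
		return -EINVAL;

	if (v < m->abs_min || v >= m->max)
		return -EINVAL;

	m->min = v;
	return end - buf;
}
```

Starting from min = abs_min = 20 and max = 100, writing "50" is accepted, while "10" (below the absolute minimum) and "100" (not below the maximum) are both rejected and leave the stored minimum untouched.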


Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>


---
 include/linux/akt.h         |   30 ++++
 include/linux/akt_ops.h     |  311 ++++++++++++++++++++++++++++++++++++++++++++
 kernel/autotune/akt.c       |  120 ++++++++++++++++
 kernel/autotune/akt_sysfs.c |    8 +
 4 files changed, 469 insertions(+)

Index: linux-2.6.20-rc4/include/linux/akt.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/akt.h	2007-01-15 15:08:41.000000000 +0100
+++ linux-2.6.20-rc4/include/linux/akt.h	2007-01-15 15:21:47.000000000 +0100
@@ -62,6 +62,13 @@ struct tunable_kobject {
  * auto_tune structure.
  * These values are type dependent and are used as high / low boundaries when
  * tuning up or down.
+ * The show and store routines (which are type dependent too) are here for
+ * sysfs support (since the min and max can be updated through sysfs).
+ * The abs_value field is used to check that we are not:
+ *   . falling under the very 1st min value when updating the min value
+ *     through sysfs
+ *   . going over the very 1st max value when updating the max value
+ *     through sysfs
  * The type is known when the tunable is defined (see DEFINE_TUNABLE macro).
  */
 struct typed_value {
@@ -74,6 +81,17 @@ struct typed_value {
 		long   val_long;
 		ulong  val_ulong;
 	} value;
+	union {
+		short  val_short;
+		ushort val_ushort;
+		int    val_int;
+		uint   val_uint;
+		size_t val_size_t;
+		long   val_long;
+		ulong  val_ulong;
+	} abs_value;
+	ssize_t (*show)(struct auto_tune *, char *);
+	ssize_t (*store)(struct auto_tune *, const char *, size_t);
 };
 
 
@@ -170,9 +188,15 @@ static inline int is_tunable_registered(
 		.threshold	= (_thresh),				\
 		.min	= {						\
 			.value		= { .val_##type = (_min), },	\
+			.abs_value	= { .val_##type = (_min), },	\
+			.show		= show_tunable_min_##type,	\
+			.store		= store_tunable_min_##type,	\
 		},							\
 		.max	= {						\
 			.value		= { .val_##type = (_max), },	\
+			.abs_value	= { .val_##type = (_max), },	\
+			.show		= show_tunable_max_##type,	\
+			.store		= store_tunable_max_##type,	\
 		},							\
 		.tun_kobj	= { .tun = NULL, },			\
 		.tunable	= (_tun),				\
@@ -186,7 +210,9 @@ static inline int is_tunable_registered(
 #define set_tunable_min_max(s, _min, _max, type)	\
 	do {						\
 		(s).min.value.val_##type = _min;	\
+		(s).min.abs_value.val_##type = _min;	\
 		(s).max.value.val_##type = _max;	\
+		(s).max.abs_value.val_##type = _max;	\
 	} while (0)
 
 
@@ -234,6 +260,10 @@ extern int unregister_tunable(struct aut
 extern int tunable_sysfs_setup(struct auto_tune *);
 extern ssize_t show_tuning_mode(struct auto_tune *, char *);
 extern ssize_t store_tuning_mode(struct auto_tune *, const char *, size_t);
+extern ssize_t show_tunable_min(struct auto_tune *, char *);
+extern ssize_t store_tunable_min(struct auto_tune *, const char *, size_t);
+extern ssize_t show_tunable_max(struct auto_tune *, char *);
+extern ssize_t store_tunable_max(struct auto_tune *, const char *, size_t);
 
 
 #else	/* CONFIG_AKT */
Index: linux-2.6.20-rc4/include/linux/akt_ops.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/akt_ops.h	2007-01-15 14:28:16.000000000 +0100
+++ linux-2.6.20-rc4/include/linux/akt_ops.h	2007-01-15 15:22:53.000000000 +0100
@@ -182,5 +182,316 @@ static inline int default_auto_tuning_ul
 }
 
 
+/*
+ * member can be one of min / max
+ */
+#define __show_tunable_member(member, p, type, buf, format, y)	\
+do {								\
+	type _xx = (type) p->member.value.val_##type;		\
+								\
+	y = snprintf(buf, PAGE_SIZE, format "\n", _xx);		\
+} while (0)
+
+/*
+ * Show routines for the min and max tunables values
+ */
+static inline ssize_t show_tunable_min_short(struct auto_tune *p, char *buf)
+{
+	ssize_t _count;
+	__show_tunable_member(min, p, short, buf, "%d", _count);
+	return _count;
+}
+
+static inline ssize_t show_tunable_min_ushort(struct auto_tune *p, char *buf)
+{
+	ssize_t _count;
+	__show_tunable_member(min, p, ushort, buf, "%u", _count);
+	return _count;
+}
+
+static inline ssize_t show_tunable_min_int(struct auto_tune *p, char *buf)
+{
+	ssize_t _count;
+	__show_tunable_member(min, p, int, buf, "%d", _count);
+	return _count;
+}
+
+static inline ssize_t show_tunable_min_uint(struct auto_tune *p, char *buf)
+{
+	ssize_t _count;
+	__show_tunable_member(min, p, uint, buf, "%u", _count);
+	return _count;
+}
+
+static inline ssize_t show_tunable_min_size_t(struct auto_tune *p, char *buf)
+{
+	ssize_t _count;
+	__show_tunable_member(min, p, ulong, buf, "%lu", _count);
+	return _count;
+}
+
+static inline ssize_t show_tunable_min_long(struct auto_tune *p, char *buf)
+{
+	ssize_t _count;
+	__show_tunable_member(min, p, long, buf, "%ld", _count);
+	return _count;
+}
+
+static inline ssize_t show_tunable_min_ulong(struct auto_tune *p, char *buf)
+{
+	ssize_t _count;
+	__show_tunable_member(min, p, ulong, buf, "%lu", _count);
+	return _count;
+}
+
+static inline ssize_t show_tunable_max_short(struct auto_tune *p, char *buf)
+{
+	ssize_t _count;
+	__show_tunable_member(max, p, short, buf, "%d", _count);
+	return _count;
+}
+
+static inline ssize_t show_tunable_max_ushort(struct auto_tune *p, char *buf)
+{
+	ssize_t _count;
+	__show_tunable_member(max, p, ushort, buf, "%u", _count);
+	return _count;
+}
+
+static inline ssize_t show_tunable_max_int(struct auto_tune *p, char *buf)
+{
+	ssize_t _count;
+	__show_tunable_member(max, p, int, buf, "%d", _count);
+	return _count;
+}
+
+static inline ssize_t show_tunable_max_uint(struct auto_tune *p, char *buf)
+{
+	ssize_t _count;
+	__show_tunable_member(max, p, uint, buf, "%u", _count);
+	return _count;
+}
+
+static inline ssize_t show_tunable_max_size_t(struct auto_tune *p, char *buf)
+{
+	ssize_t _count;
+	__show_tunable_member(max, p, ulong, buf, "%lu", _count);
+	return _count;
+}
+
+static inline ssize_t show_tunable_max_long(struct auto_tune *p, char *buf)
+{
+	ssize_t _count;
+	__show_tunable_member(max, p, long, buf, "%ld", _count);
+	return _count;
+}
+
+static inline ssize_t show_tunable_max_ulong(struct auto_tune *p, char *buf)
+{
+	ssize_t _count;
+	__show_tunable_member(max, p, ulong, buf, "%lu", _count);
+	return _count;
+}
+
+
+/*
+ * When setting the min: reject values below the absolute min (the very
+ * first min that was set) or not strictly below the current max.
+ */
+#define __store_tunable_min(p, type, buf, y)				\
+do {									\
+	long _vv;							\
+	char *_rr;							\
+									\
+	_vv = simple_strtol(buf, &_rr, 0);				\
+	if (_rr == buf)							\
+		y = -EINVAL;						\
+	else {								\
+		if (_vv >= p->min.abs_value.val_##type &&		\
+				_vv < p->max.value.val_##type) {	\
+			p->min.value.val_##type = _vv;			\
+			y = _rr - buf;					\
+		} else							\
+			y = -EINVAL;					\
+	}								\
+} while (0)
+
+#define __store_tunable_umin(p, type, buf, y)				\
+do {									\
+	ulong _vv;							\
+	char *_rr;							\
+									\
+	_vv = simple_strtoul(buf, &_rr, 0);				\
+	if (_rr == buf)							\
+		y = -EINVAL;						\
+	else {								\
+		if (_vv >= p->min.abs_value.val_##type &&		\
+				_vv < p->max.value.val_##type) {	\
+			p->min.value.val_##type = _vv;			\
+			y = _rr - buf;					\
+		} else							\
+			y = -EINVAL;					\
+	}								\
+} while (0)
+
+/*
+ * Store routines for the min tunables values
+ */
+static inline ssize_t store_tunable_min_short(struct auto_tune *p,
+					const char *buf, size_t count)
+{
+	ssize_t _count;
+	__store_tunable_min(p, short, buf, _count);
+	return _count;
+}
+
+static inline ssize_t store_tunable_min_ushort(struct auto_tune *p,
+					const char *buf, size_t count)
+{
+	ssize_t _count;
+	__store_tunable_umin(p, ushort, buf, _count);
+	return _count;
+}
+
+static inline ssize_t store_tunable_min_int(struct auto_tune *p,
+					const char *buf, size_t count)
+{
+	ssize_t _count;
+	__store_tunable_min(p, int, buf, _count);
+	return _count;
+}
+
+static inline ssize_t store_tunable_min_uint(struct auto_tune *p,
+					const char *buf, size_t count)
+{
+	ssize_t _count;
+	__store_tunable_umin(p, uint, buf, _count);
+	return _count;
+}
+
+static inline ssize_t store_tunable_min_size_t(struct auto_tune *p,
+					const char *buf, size_t count)
+{
+	ssize_t _count;
+	__store_tunable_umin(p, size_t, buf, _count);
+	return _count;
+}
+
+static inline ssize_t store_tunable_min_long(struct auto_tune *p,
+					const char *buf, size_t count)
+{
+	ssize_t _count;
+	__store_tunable_min(p, long, buf, _count);
+	return _count;
+}
+
+static inline ssize_t store_tunable_min_ulong(struct auto_tune *p,
+					const char *buf, size_t count)
+{
+	ssize_t _count;
+	__store_tunable_umin(p, ulong, buf, _count);
+	return _count;
+}
+
+
+/*
+ * When setting the max: reject values above the absolute max (the very
+ * first max that was set) or not strictly above the current min.
+ *
+ */
+#define __store_tunable_max(p, type, buf, y)				\
+do {									\
+	long _vv;							\
+	char *_rr;							\
+									\
+	_vv = simple_strtol(buf, &_rr, 0);				\
+	if (_rr == buf)							\
+		y = -EINVAL;						\
+	else {								\
+		if (_vv <= p->max.abs_value.val_##type &&		\
+				_vv > p->min.value.val_##type) {	\
+			p->max.value.val_##type = _vv;			\
+			y = _rr - buf;					\
+		} else							\
+			y = -EINVAL;					\
+	}								\
+} while (0)
+
+#define __store_tunable_umax(p, type, buf, y)				\
+do {									\
+	ulong _vv;							\
+	char *_rr;							\
+									\
+	_vv = simple_strtoul(buf, &_rr, 0);				\
+	if (_rr == buf)							\
+		y = -EINVAL;						\
+	else {								\
+		if (_vv <= p->max.abs_value.val_##type &&		\
+				_vv > p->min.value.val_##type) {	\
+			p->max.value.val_##type = _vv;			\
+			y = _rr - buf;					\
+		} else							\
+			y = -EINVAL;					\
+	}								\
+} while (0)
+
+/*
+ * Store routines for the max tunables values
+ */
+static inline ssize_t store_tunable_max_short(struct auto_tune *p,
+					const char *buf, size_t count)
+{
+	ssize_t _count;
+	__store_tunable_max(p, short, buf, _count);
+	return _count;
+}
+
+static inline ssize_t store_tunable_max_ushort(struct auto_tune *p,
+					const char *buf, size_t count)
+{
+	ssize_t _count;
+	__store_tunable_umax(p, ushort, buf, _count);
+	return _count;
+}
+
+static inline ssize_t store_tunable_max_int(struct auto_tune *p,
+					const char *buf, size_t count)
+{
+	ssize_t _count;
+	__store_tunable_max(p, int, buf, _count);
+	return _count;
+}
+
+static inline ssize_t store_tunable_max_uint(struct auto_tune *p,
+					const char *buf, size_t count)
+{
+	ssize_t _count;
+	__store_tunable_umax(p, uint, buf, _count);
+	return _count;
+}
+
+static inline ssize_t store_tunable_max_size_t(struct auto_tune *p,
+					const char *buf, size_t count)
+{
+	ssize_t _count;
+	__store_tunable_umax(p, size_t, buf, _count);
+	return _count;
+}
+
+static inline ssize_t store_tunable_max_long(struct auto_tune *p,
+					const char *buf, size_t count)
+{
+	ssize_t _count;
+	__store_tunable_max(p, long, buf, _count);
+	return _count;
+}
+
+static inline ssize_t store_tunable_max_ulong(struct auto_tune *p,
+					const char *buf, size_t count)
+{
+	ssize_t _count;
+	__store_tunable_umax(p, ulong, buf, _count);
+	return _count;
+}
 
 #endif /* AKT_OPS_H */
Index: linux-2.6.20-rc4/kernel/autotune/akt.c
===================================================================
--- linux-2.6.20-rc4.orig/kernel/autotune/akt.c	2007-01-15 15:13:31.000000000 +0100
+++ linux-2.6.20-rc4/kernel/autotune/akt.c	2007-01-15 15:25:35.000000000 +0100
@@ -28,6 +28,10 @@
  *              unregister_tunable         (exported)
  *              show_tuning_mode           (exported)
  *              store_tuning_mode          (exported)
+ *              show_tunable_min           (exported)
+ *              store_tunable_min          (exported)
+ *              show_tunable_max           (exported)
+ *              store_tunable_max          (exported)
  */
 
 #include <linux/init.h>
@@ -203,5 +207,121 @@ ssize_t store_tuning_mode(struct auto_tu
 }
 
 
+/*
+ * FUNCTION:    Get operation called by tunable_attr_show (i.e. when the file
+ *              /sys/tunables/<tunable>/min is displayed).
+ *              Outputs the current tunable minimum value
+ *
+ * RETURN VALUE: >0 : output string length (including the '\0')
+ *               <0 : failure
+ */
+ssize_t show_tunable_min(struct auto_tune *tun_addr, char *buf)
+{
+	ssize_t rc;
+
+	if (tun_addr == NULL) {
+		printk(KERN_ERR
+			" show_tunable_min(): tunable address is invalid\n");
+		return -EINVAL;
+	}
+
+	spin_lock(&tun_addr->tunable_lck);
+
+	rc = tun_addr->min.show(tun_addr, buf);
+
+	spin_unlock(&tun_addr->tunable_lck);
+
+	return rc;
+}
+
+
+/*
+ * FUNCTION:    Set operation called by tunable_attr_store (i.e. when a
+ *              string is stored into /sys/tunables/<tunable>/min).
+ *
+ * PARAMETERS:  count: input buffer size (including the '\0')
+ *
+ * RETURN VALUE: >0: number of characters used from the input buffer
+ *               <= 0: failure
+ */
+ssize_t store_tunable_min(struct auto_tune *tun_addr, const char *buf,
+			size_t count)
+{
+	ssize_t rc;
+
+	if (tun_addr == NULL) {
+		printk(KERN_ERR
+			" store_tunable_min(): tunable address is invalid\n");
+		return -EINVAL;
+	}
+
+	spin_lock(&tun_addr->tunable_lck);
+
+	rc = tun_addr->min.store(tun_addr, buf, count);
+
+	spin_unlock(&tun_addr->tunable_lck);
+
+	return rc;
+}
+
+
+/*
+ * FUNCTION:    Get operation called by tunable_attr_show (i.e. when the file
+ *              /sys/tunables/<tunable>/max is displayed).
+ *              Outputs the current tunable maximum value
+ *
+ * RETURN VALUE: >0 : output string length (including the '\0')
+ *               <0 : failure
+ */
+ssize_t show_tunable_max(struct auto_tune *tun_addr, char *buf)
+{
+	ssize_t rc;
+
+	if (tun_addr == NULL) {
+		printk(KERN_ERR
+			" show_tunable_max(): tunable address is invalid\n");
+		return -EINVAL;
+	}
+
+	spin_lock(&tun_addr->tunable_lck);
+
+	rc = tun_addr->max.show(tun_addr, buf);
+
+	spin_unlock(&tun_addr->tunable_lck);
+
+	return rc;
+}
+
+
+/*
+ * FUNCTION:    Set operation called by tunable_attr_store (i.e. when a
+ *              string is stored into /sys/tunables/<tunable>/max).
+ *
+ * PARAMETERS:  count: input buffer size (including the '\0')
+ *
+ * RETURN VALUE: >0: number of characters used from the input buffer
+ *               <= 0: failure
+ */
+ssize_t store_tunable_max(struct auto_tune *tun_addr, const char *buf,
+			size_t count)
+{
+	ssize_t rc;
+
+	if (tun_addr == NULL) {
+		printk(KERN_ERR
+			" store_tunable_max(): tunable address is invalid\n");
+		return -EINVAL;
+	}
+
+	spin_lock(&tun_addr->tunable_lck);
+
+	rc = tun_addr->max.store(tun_addr, buf, count);
+
+	spin_unlock(&tun_addr->tunable_lck);
+
+	return rc;
+}
+
+
 EXPORT_SYMBOL_GPL(register_tunable);
 EXPORT_SYMBOL_GPL(unregister_tunable);
Index: linux-2.6.20-rc4/kernel/autotune/akt_sysfs.c
===================================================================
--- linux-2.6.20-rc4.orig/kernel/autotune/akt_sysfs.c	2007-01-15 15:14:55.000000000 +0100
+++ linux-2.6.20-rc4/kernel/autotune/akt_sysfs.c	2007-01-15 15:26:31.000000000 +0100
@@ -54,8 +54,16 @@ struct tunable_attribute tun_attr_##_nam
 static TUNABLE_ATTR(autotune, S_IWUSR | S_IRUGO, show_tuning_mode,
 		store_tuning_mode);
 
+static TUNABLE_ATTR(min, S_IWUSR | S_IRUGO, show_tunable_min,
+		store_tunable_min);
+
+static TUNABLE_ATTR(max, S_IWUSR | S_IRUGO, show_tunable_max,
+		store_tunable_max);
+
 static struct tunable_attribute *tunable_sysfs_attrs[] = {
 	&tun_attr_autotune,	/* to (de)activate auto tuning */
+	&tun_attr_min,		/* to read or set the tunable min value */
+	&tun_attr_max,		/* to read or set the tunable max value */
 	NULL,
 };
 

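The bounds check performed by the __store_tunable_min() macro above can be
sketched in plain userspace C. This is only an illustration under stated
assumptions: struct bounds and store_min() are hypothetical names standing in
for one typed slot of struct auto_tune, and strtol() replaces the kernel's
simple_strtol().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical stand-in for one "long" slot of struct auto_tune. */
struct bounds {
	long abs_min;	/* first minimum ever set, never lowered */
	long min;	/* current minimum */
	long max;	/* current maximum */
};

/*
 * Same acceptance rule as __store_tunable_min() for the "long" case:
 * returns the number of characters consumed, or -EINVAL.
 */
static ssize_t store_min(struct bounds *b, const char *buf)
{
	char *end;
	long v;

	assert(b != NULL && buf != NULL);

	v = strtol(buf, &end, 0);
	if (end == buf)
		return -EINVAL;		/* no digits at all */
	if (v < b->abs_min || v >= b->max)
		return -EINVAL;		/* outside [abs_min, max) */

	b->min = v;
	return end - buf;
}
```

With abs_min = 10 and max = 100, the string "20\n" is accepted and reports 2
characters consumed, while "5", "100" and "abc" are all rejected with -EINVAL.
Since the return value is the number of characters parsed rather than the full
write size, a trailing newline is left unconsumed; how sysfs writers cope with
that short count may be worth checking in review.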
--

* [RFC][PATCH 5/6] per namespace tunables
  2007-01-16  6:15 [RFC][PATCH 0/6] Automatice kernel tunables (AKT) Nadia.Derbey
                   ` (3 preceding siblings ...)
  2007-01-16  6:15 ` [RFC][PATCH 4/6] min and max kobjects Nadia.Derbey
@ 2007-01-16  6:15 ` Nadia.Derbey
  2007-01-24 22:41   ` Randy Dunlap
  2007-01-16  6:15 ` [RFC][PATCH 6/6] automatic tuning applied to some kernel components Nadia.Derbey
  5 siblings, 1 reply; 25+ messages in thread
From: Nadia.Derbey @ 2007-01-16  6:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: Nadia Derbey

[-- Attachment #1: per_namespace_tunables.patch --]
[-- Type: text/plain, Size: 6805 bytes --]

[PATCH 05/06]


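This patch adds what is needed to handle per-namespace tunables: the
TUNABLE_IPC_NS flag, the DECLARE_TUNABLE() and init_tunable_ipcns() macros,
and get_ns_tunable(), which redirects the sysfs show/store routines to the
tunable belonging to the caller's current IPC namespace.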


Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>


---
 include/linux/akt.h   |   12 +++++++
 kernel/autotune/akt.c |   80 ++++++++++++++++++++++++++++++++++++++------------
 2 files changed, 73 insertions(+), 19 deletions(-)

Index: linux-2.6.20-rc4/include/linux/akt.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/akt.h	2007-01-15 15:21:47.000000000 +0100
+++ linux-2.6.20-rc4/include/linux/akt.h	2007-01-15 15:31:44.000000000 +0100
@@ -154,6 +154,7 @@ struct auto_tune {
  */
 #define AUTO_TUNE_ENABLE  0x01
 #define TUNABLE_REGISTERED  0x02
+#define TUNABLE_IPC_NS      0x04
 
 
 /*
@@ -204,6 +205,8 @@ static inline int is_tunable_registered(
 	}
 
 
+#define DECLARE_TUNABLE(s)	struct auto_tune s;
+
 #define DEFINE_TUNABLE(s, thr, min, max, tun, chk, type)		\
 	struct auto_tune s = TUNABLE_INIT(#s, thr, min, max, tun, chk, type)
 
@@ -215,6 +218,13 @@ static inline int is_tunable_registered(
 		(s).max.abs_value.val_##type = _max;	\
 	} while (0)
 
+#define init_tunable_ipcns(ns, s, thr, min, max, tun, chk, type)	\
+	do {								\
+		DEFINE_TUNABLE(s, thr, min, max, tun, chk, type);	\
+		s.flags |= TUNABLE_IPC_NS;				\
+		ns->s = s;						\
+	} while (0)
+
 
 static inline void set_autotuning_routine(struct auto_tune *tunable,
 					auto_tune_fn fn)
@@ -269,7 +279,9 @@ extern ssize_t store_tunable_max(struct 
 #else	/* CONFIG_AKT */
 
 
+#define DECLARE_TUNABLE(s)
 #define DEFINE_TUNABLE(s, thresh, min, max, tun, chk, type)
+#define init_tunable_ipcns(ns, s, th, m, M, tun, chk, type)  do { } while (0)
 #define set_tunable_min_max(s, min, max, type)   do { } while (0)
 #define set_autotuning_routine(s, fn)            do { } while (0)
 
Index: linux-2.6.20-rc4/kernel/autotune/akt.c
===================================================================
--- linux-2.6.20-rc4.orig/kernel/autotune/akt.c	2007-01-15 15:25:35.000000000 +0100
+++ linux-2.6.20-rc4/kernel/autotune/akt.c	2007-01-15 15:37:16.000000000 +0100
@@ -32,6 +32,7 @@
  *              store_tunable_min          (exported)
  *              show_tunable_max           (exported)
  *              store_tunable_max          (exported)
+ *              get_ns_tunable             (static)
  */
 
 #include <linux/init.h>
@@ -45,6 +46,8 @@
 #define AKT_AUTO   1
 #define AKT_MANUAL 0
 
+static struct auto_tune *get_ns_tunable(struct auto_tune *);
+
 
 
 /*
@@ -142,6 +145,7 @@ int unregister_tunable(struct auto_tune 
 ssize_t show_tuning_mode(struct auto_tune *tun_addr, char *buf)
 {
 	int valid;
+	struct auto_tune *which;
 
 	if (tun_addr == NULL) {
 		printk(KERN_ERR
@@ -149,11 +153,13 @@ ssize_t show_tuning_mode(struct auto_tun
 		return -EINVAL;
 	}
 
-	spin_lock(&tun_addr->tunable_lck);
+	which = get_ns_tunable(tun_addr);
+
+	spin_lock(&which->tunable_lck);
 
-	valid = is_auto_tune_enabled(tun_addr);
+	valid = is_auto_tune_enabled(which);
 
-	spin_unlock(&tun_addr->tunable_lck);
+	spin_unlock(&which->tunable_lck);
 
 	return snprintf(buf, PAGE_SIZE, "%d\n", valid);
 }
@@ -176,6 +182,7 @@ ssize_t store_tuning_mode(struct auto_tu
 			size_t count)
 {
 	int new_value;
+	struct auto_tune *which;
 	int rc;
 
 	if ((rc = sscanf(buffer, "%d", &new_value)) != 1)
@@ -190,18 +197,20 @@ ssize_t store_tuning_mode(struct auto_tu
 		return -EINVAL;
 	}
 
-	spin_lock(&tun_addr->tunable_lck);
+	which = get_ns_tunable(tun_addr);
+
+	spin_lock(&which->tunable_lck);
 
 	switch (new_value) {
 	case AKT_AUTO:
-		tun_addr->flags |= AUTO_TUNE_ENABLE;
+		which->flags |= AUTO_TUNE_ENABLE;
 		break;
 	case AKT_MANUAL:
-		tun_addr->flags &= ~AUTO_TUNE_ENABLE;
+		which->flags &= ~AUTO_TUNE_ENABLE;
 		break;
 	}
 
-	spin_unlock(&tun_addr->tunable_lck);
+	spin_unlock(&which->tunable_lck);
 
 	return strnlen(buffer, PAGE_SIZE);
 }
@@ -218,6 +227,7 @@ ssize_t store_tuning_mode(struct auto_tu
 ssize_t show_tunable_min(struct auto_tune *tun_addr, char *buf)
 {
 	ssize_t rc;
+	struct auto_tune *which;
 
 	if (tun_addr == NULL) {
 		printk(KERN_ERR
@@ -225,11 +235,13 @@ ssize_t show_tunable_min(struct auto_tun
 		return -EINVAL;
 	}
 
-	spin_lock(&tun_addr->tunable_lck);
+	which = get_ns_tunable(tun_addr);
 
-	rc = tun_addr->min.show(tun_addr, buf);
+	spin_lock(&which->tunable_lck);
 
-	spin_unlock(&tun_addr->tunable_lck);
+	rc = which->min.show(which, buf);
+
+	spin_unlock(&which->tunable_lck);
 
 	return rc;
 }
@@ -248,6 +260,7 @@ ssize_t store_tunable_min(struct auto_tu
 			size_t count)
 {
 	ssize_t rc;
+	struct auto_tune *which;
 
 	if (tun_addr == NULL) {
 		printk(KERN_ERR
@@ -255,11 +268,13 @@ ssize_t store_tunable_min(struct auto_tu
 		return -EINVAL;
 	}
 
-	spin_lock(&tun_addr->tunable_lck);
+	which = get_ns_tunable(tun_addr);
+
+	spin_lock(&which->tunable_lck);
 
-	rc = tun_addr->min.store(tun_addr, buf, count);
+	rc = which->min.store(which, buf, count);
 
-	spin_unlock(&tun_addr->tunable_lck);
+	spin_unlock(&which->tunable_lck);
 
 	return rc;
 }
@@ -276,6 +291,7 @@ ssize_t store_tunable_min(struct auto_tu
 ssize_t show_tunable_max(struct auto_tune *tun_addr, char *buf)
 {
 	ssize_t rc;
+	struct auto_tune *which;
 
 	if (tun_addr == NULL) {
 		printk(KERN_ERR
@@ -283,11 +299,13 @@ ssize_t show_tunable_max(struct auto_tun
 		return -EINVAL;
 	}
 
-	spin_lock(&tun_addr->tunable_lck);
+	which = get_ns_tunable(tun_addr);
 
-	rc = tun_addr->max.show(tun_addr, buf);
+	spin_lock(&which->tunable_lck);
 
-	spin_unlock(&tun_addr->tunable_lck);
+	rc = which->max.show(which, buf);
+
+	spin_unlock(&which->tunable_lck);
 
 	return rc;
 }
@@ -306,6 +324,7 @@ ssize_t store_tunable_max(struct auto_tu
 			size_t count)
 {
 	ssize_t rc;
+	struct auto_tune *which;
 
 	if (tun_addr == NULL) {
 		printk(KERN_ERR
@@ -313,15 +332,38 @@ ssize_t store_tunable_max(struct auto_tu
 		return -EINVAL;
 	}
 
-	spin_lock(&tun_addr->tunable_lck);
+	which = get_ns_tunable(tun_addr);
+
+	spin_lock(&which->tunable_lck);
 
-	rc = tun_addr->max.store(tun_addr, buf, count);
+	rc = which->max.store(which, buf, count);
 
-	spin_unlock(&tun_addr->tunable_lck);
+	spin_unlock(&which->tunable_lck);
 
 	return rc;
 }
 
 
+/*
+ * FUNCTION:    This routine gets the actual auto_tune structure for the
+ *              tunables that are per namespace (presently only ipc ones).
+ *
+ * RETURN VALUE: pointer to the tunable structure for the current namespace
+ */
+static struct auto_tune *get_ns_tunable(struct auto_tune *p)
+{
+	if (p->flags & TUNABLE_IPC_NS) {
+		char *shift = (char *) p;
+		struct ipc_namespace *ns = current->nsproxy->ipc_ns;
+
+		shift = (shift - (char *) &init_ipc_ns) + (char *) ns;
+
+		return (struct auto_tune *) shift;
+	}
+
+	return p;
+}
+
+
 EXPORT_SYMBOL_GPL(register_tunable);
 EXPORT_SYMBOL_GPL(unregister_tunable);

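The pointer arithmetic in get_ns_tunable() above amounts to taking the offset
of the tunable inside init_ipc_ns and applying that offset to the current
namespace. A minimal userspace sketch, assuming hypothetical struct ns and
struct tunable types in place of struct ipc_namespace and struct auto_tune:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct auto_tune. */
struct tunable {
	int flags;
	int value;
};

/* Hypothetical stand-in for struct ipc_namespace. */
struct ns {
	int other;
	struct tunable msgmni;
	struct tunable shmmni;
};

/* Plays the role of init_ipc_ns: the instance known to sysfs. */
static struct ns init_ns;

/*
 * Translate a pointer to a tunable embedded in init_ns into a pointer
 * to the same member of another namespace "cur".
 */
static struct tunable *ns_tunable(struct tunable *p, struct ns *cur)
{
	ptrdiff_t off = (char *)p - (char *)&init_ns;

	/* p must really point inside init_ns for the offset to be valid */
	assert(off >= 0 && (size_t)off + sizeof(*p) <= sizeof(init_ns));

	return (struct tunable *)((char *)cur + off);
}
```

The assert() in the sketch makes explicit an assumption the kernel routine
relies on silently: the pointer passed in must be a member of the initial
namespace structure, otherwise the computed address is meaningless.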
--

* [RFC][PATCH 6/6] automatic tuning applied to some kernel components
  2007-01-16  6:15 [RFC][PATCH 0/6] Automatice kernel tunables (AKT) Nadia.Derbey
                   ` (4 preceding siblings ...)
  2007-01-16  6:15 ` [RFC][PATCH 5/6] per namespace tunables Nadia.Derbey
@ 2007-01-16  6:15 ` Nadia.Derbey
  2007-01-22 19:56   ` Andrew Morton
  5 siblings, 1 reply; 25+ messages in thread
From: Nadia.Derbey @ 2007-01-16  6:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: Nadia Derbey

[-- Attachment #1: auto_tune_applied.patch --]
[-- Type: text/plain, Size: 14597 bytes --]

[PATCH 06/06]


The following kernel components register a tunable structure and call the
auto-tuning routine:
  . file system
  . shared memory (per namespace)
  . semaphore (per namespace)
  . message queues (per namespace)


Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>


---
 fs/file_table.c     |   81 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/akt.h |    1 
 include/linux/ipc.h |    6 +++
 init/main.c         |    1 
 ipc/msg.c           |   19 ++++++++++++
 ipc/sem.c           |   41 ++++++++++++++++++++++++++
 ipc/shm.c           |   74 ++++++++++++++++++++++++++++++++++++++++++++---
 7 files changed, 218 insertions(+), 5 deletions(-)

Index: linux-2.6.20-rc4/fs/file_table.c
===================================================================
--- linux-2.6.20-rc4.orig/fs/file_table.c	2007-01-15 13:08:14.000000000 +0100
+++ linux-2.6.20-rc4/fs/file_table.c	2007-01-15 15:44:39.000000000 +0100
@@ -21,6 +21,8 @@
 #include <linux/fsnotify.h>
 #include <linux/sysctl.h>
 #include <linux/percpu_counter.h>
+#include <linux/akt.h>
+#include <linux/akt_ops.h>
 
 #include <asm/atomic.h>
 
@@ -34,6 +36,71 @@ __cacheline_aligned_in_smp DEFINE_SPINLO
 
 static struct percpu_counter nr_files __cacheline_aligned_in_smp;
 
+#ifdef CONFIG_AKT
+
+static int get_nr_files(void);
+
+/********** automatic tuning **********/
+#define FILPTHRESH 80		/* threshold = 80% */
+
+/*
+ * FUNCTION:    This is the routine called to accomplish auto tuning for the
+ *              max_files tunable.
+ *
+ *              Upwards adjustment:
+ *                  Adjustment is needed if nr_files has reached
+ *                  (threshold / 100 * max_files)
+ *                  In that case, max_files is set to
+ *                  (tunable + max_files * (100 - threshold) / 100)
+ *
+ *              Downwards adjustment:
+ *                   Adjustment is needed if nr_files has fallen under
+ *                   (threshold / 100 * max_files previous value)
+ *                   In that case max_files is set back to its previous value,
+ *                   i.e. to (max_files * 100 / (200 - threshold))
+ *
+ * PARAMETERS:  cmd: controls the adjustment direction (up / down)
+ *              params: pointer to the registered tunable structure
+ *
+ * EXECUTION ENVIRONMENT: This routine should be called with the
+ *                        params->tunable_lck lock held
+ *
+ * RETURN VALUE: 1 if tunable has been adjusted
+ *               0 else
+ */
+static inline int maxfiles_auto_tuning(int cmd, struct auto_tune *params)
+{
+	int thr = params->threshold;
+	int min = params->min.value.val_int;
+	int max = params->max.value.val_int;
+	int tun = files_stat.max_files;
+
+	if (cmd == AKT_UP) {
+		if (get_nr_files() >= tun * thr / 100 && tun < max) {
+			int new = tun * (200 - thr) / 100;
+
+			files_stat.max_files = min(max, new);
+			return 1;
+		} else
+			return 0;
+	}
+
+	if (get_nr_files() < tun * thr / (200 - thr) && tun > min) {
+		int new = tun * 100 / (200 - thr);
+
+		files_stat.max_files = max(min, new);
+		return 1;
+	} else
+		return 0;
+}
+
+#endif /* CONFIG_AKT */
+
+/* The maximum value will be known later on */
+DEFINE_TUNABLE(maxfiles_akt, FILPTHRESH, 0, 0, &files_stat.max_files,
+		&nr_files, int);
+
+
 static inline void file_free_rcu(struct rcu_head *head)
 {
 	struct file *f =  container_of(head, struct file, f_u.fu_rcuhead);
@@ -44,6 +111,8 @@ static inline void file_free(struct file
 {
 	percpu_counter_dec(&nr_files);
 	call_rcu(&f->f_u.fu_rcuhead, file_free_rcu);
+
+	activate_auto_tuning(AKT_DOWN, &maxfiles_akt);
 }
 
 /*
@@ -91,6 +160,8 @@ struct file *get_empty_filp(void)
 	static int old_max;
 	struct file * f;
 
+	activate_auto_tuning(AKT_UP, &maxfiles_akt);
+
 	/*
 	 * Privileged users can go above max_files
 	 */
@@ -299,6 +370,16 @@ void __init files_init(unsigned long mem
 	files_stat.max_files = n; 
 	if (files_stat.max_files < NR_FILE)
 		files_stat.max_files = NR_FILE;
+
+	set_tunable_min_max(maxfiles_akt, n, n * 2, int);
+	set_autotuning_routine(&maxfiles_akt, maxfiles_auto_tuning);
+
 	files_defer_init();
 	percpu_counter_init(&nr_files, 0);
 } 
+
+void __init files_late_init(void)
+{
+	if (register_tunable(&maxfiles_akt))
+		printk(KERN_WARNING "Failed registering tunable file-max\n");
+}
Index: linux-2.6.20-rc4/include/linux/akt.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/akt.h	2007-01-15 15:31:44.000000000 +0100
+++ linux-2.6.20-rc4/include/linux/akt.h	2007-01-15 15:45:29.000000000 +0100
@@ -295,5 +295,6 @@ static inline void init_auto_tuning(void
 #endif	/* CONFIG_AKT */
 
 extern void fork_late_init(void);
+extern void files_late_init(void);
 
 #endif /* AKT_H */
Index: linux-2.6.20-rc4/init/main.c
===================================================================
--- linux-2.6.20-rc4.orig/init/main.c	2007-01-15 15:09:27.000000000 +0100
+++ linux-2.6.20-rc4/init/main.c	2007-01-15 15:46:09.000000000 +0100
@@ -616,6 +616,7 @@ asmlinkage void __init start_kernel(void
 	page_writeback_init();
 	init_auto_tuning();
 	fork_late_init();
+	files_late_init();
 #ifdef CONFIG_PROC_FS
 	proc_root_init();
 #endif
Index: linux-2.6.20-rc4/ipc/msg.c
===================================================================
--- linux-2.6.20-rc4.orig/ipc/msg.c	2007-01-15 13:08:15.000000000 +0100
+++ linux-2.6.20-rc4/ipc/msg.c	2007-01-15 15:48:16.000000000 +0100
@@ -36,6 +36,8 @@
 #include <linux/seq_file.h>
 #include <linux/mutex.h>
 #include <linux/nsproxy.h>
+#include <linux/akt.h>
+#include <linux/akt_ops.h>
 
 #include <asm/current.h>
 #include <asm/uaccess.h>
@@ -94,6 +96,11 @@ static void __ipc_init __msg_init_ns(str
 	ns->msg_ctlmnb = MSGMNB;
 	ns->msg_ctlmni = MSGMNI;
 	ipc_init_ids(ids, ns->msg_ctlmni);
+
+#define MSGTHRESH 80
+
+	init_tunable_ipcns(ns, msgmni_akt, MSGTHRESH, MSGMNI, IPCMNI,
+		&ns->msg_ctlmni, &ids->in_use, int);
 }
 
 #ifdef CONFIG_IPC_NS
@@ -133,6 +140,10 @@ void msg_exit_ns(struct ipc_namespace *n
 void __init msg_init(void)
 {
 	__msg_init_ns(&init_ipc_ns, &init_msg_ids);
+
+	if (register_tunable(&init_ipc_ns.msgmni_akt))
+		printk(KERN_WARNING " Failed registering tunable msgmni\n");
+
 	ipc_init_proc_interface("sysvipc/msg",
 				"       key      msqid perms      cbytes       qnum lspid lrpid   uid   gid  cuid  cgid      stime      rtime      ctime\n",
 				IPC_MSG_IDS, sysvipc_msg_proc_show);
@@ -262,6 +273,8 @@ asmlinkage long sys_msgget(key_t key, in
 
 	ns = current->nsproxy->ipc_ns;
 	
+	activate_auto_tuning(AKT_UP, &ns->msgmni_akt);
+
 	mutex_lock(&msg_ids(ns).mutex);
 	if (key == IPC_PRIVATE) 
 		ret = newque(ns, key, msgflg);
@@ -391,6 +404,7 @@ asmlinkage long sys_msgctl(int msqid, in
 	struct msg_queue *msq;
 	int err, version;
 	struct ipc_namespace *ns;
+	int destroyed = 0;
 
 	if (msqid < 0 || cmd < 0)
 		return -EINVAL;
@@ -555,11 +569,16 @@ asmlinkage long sys_msgctl(int msqid, in
 	}
 	case IPC_RMID:
 		freeque(ns, msq, msqid);
+		destroyed = 1;
 		break;
 	}
 	err = 0;
 out_up:
 	mutex_unlock(&msg_ids(ns).mutex);
+
+	if (destroyed)
+		activate_auto_tuning(AKT_DOWN, &ns->msgmni_akt);
+
 	return err;
 out_unlock_up:
 	msg_unlock(msq);
Index: linux-2.6.20-rc4/ipc/shm.c
===================================================================
--- linux-2.6.20-rc4.orig/ipc/shm.c	2007-01-15 13:08:15.000000000 +0100
+++ linux-2.6.20-rc4/ipc/shm.c	2007-01-15 15:49:00.000000000 +0100
@@ -37,6 +37,8 @@
 #include <linux/seq_file.h>
 #include <linux/mutex.h>
 #include <linux/nsproxy.h>
+#include <linux/akt.h>
+#include <linux/akt_ops.h>
 
 #include <asm/uaccess.h>
 
@@ -75,17 +77,27 @@ static void __ipc_init __shm_init_ns(str
 	ns->shm_ctlmni = SHMMNI;
 	ns->shm_tot = 0;
 	ipc_init_ids(ids, 1);
+
+#define SHMTHRESH 80
+	init_tunable_ipcns(ns, shmmni_akt, SHMTHRESH, SHMMNI, IPCMNI,
+		&ns->shm_ctlmni, &ids->in_use, int);
+	init_tunable_ipcns(ns, shmall_akt, SHMTHRESH, SHMALL,
+		SHMMAX / PAGE_SIZE * (IPCMNI / 16), &ns->shm_ctlall,
+		&ns->shm_tot, size_t);
 }
 
-static void do_shm_rmid(struct ipc_namespace *ns, struct shmid_kernel *shp)
+static int do_shm_rmid(struct ipc_namespace *ns, struct shmid_kernel *shp)
 {
 	if (shp->shm_nattch){
 		shp->shm_perm.mode |= SHM_DEST;
 		/* Do not find it any more */
 		shp->shm_perm.key = IPC_PRIVATE;
 		shm_unlock(shp);
-	} else
+		return 0;
+	} else {
 		shm_destroy(ns, shp);
+		return 1;
+	}
 }
 
 #ifdef CONFIG_IPC_NS
@@ -125,6 +137,13 @@ void shm_exit_ns(struct ipc_namespace *n
 void __init shm_init (void)
 {
 	__shm_init_ns(&init_ipc_ns, &init_shm_ids);
+
+	if (register_tunable(&init_ipc_ns.shmmni_akt))
+		printk(KERN_WARNING "Failed registering tunable shmmni\n");
+
+	if (register_tunable(&init_ipc_ns.shmall_akt))
+		printk(KERN_WARNING "Failed registering tunable shmall\n");
+
 	ipc_init_proc_interface("sysvipc/shm",
 				"       key      shmid perms       size  cpid  lpid nattch   uid   gid  cuid  cgid      atime      dtime      ctime\n",
 				IPC_SHM_IDS, sysvipc_shm_proc_show);
@@ -206,6 +225,7 @@ static void shm_close (struct vm_area_st
 	int id = file->f_path.dentry->d_inode->i_ino;
 	struct shmid_kernel *shp;
 	struct ipc_namespace *ns;
+	int destroyed = 0;
 
 	ns = shm_file_ns(file);
 
@@ -217,11 +237,27 @@ static void shm_close (struct vm_area_st
 	shp->shm_dtim = get_seconds();
 	shp->shm_nattch--;
 	if(shp->shm_nattch == 0 &&
-	   shp->shm_perm.mode & SHM_DEST)
+	   shp->shm_perm.mode & SHM_DEST) {
 		shm_destroy(ns, shp);
-	else
+		destroyed = 1;
+	} else
 		shm_unlock(shp);
 	mutex_unlock(&shm_ids(ns).mutex);
+
+	if (destroyed) {
+		int rc;
+
+		rc = activate_auto_tuning(AKT_DOWN, &ns->shmmni_akt);
+		if (rc)
+			/*
+			 * shm_ctlmni has been adjusted ==> change
+			 * shm_ctlall value
+			 */
+			ns->shm_ctlall = ns->shm_ctlmax / PAGE_SIZE *
+				(ns->shm_ctlmni / 16);
+
+		activate_auto_tuning(AKT_DOWN, &ns->shmall_akt);
+	}
 }
 
 static int shm_mmap(struct file * file, struct vm_area_struct * vma)
@@ -355,9 +391,20 @@ asmlinkage long sys_shmget (key_t key, s
 	struct shmid_kernel *shp;
 	int err, id = 0;
 	struct ipc_namespace *ns;
+	int rc;
 
 	ns = current->nsproxy->ipc_ns;
 
+	rc = activate_auto_tuning(AKT_UP, &ns->shmmni_akt);
+	if (rc)
+		/*
+		 * shm_ctlmni has been adjusted ==> change shm_ctlall value
+		 */
+		ns->shm_ctlall = ns->shm_ctlmax / PAGE_SIZE
+				* (ns->shm_ctlmni / 16);
+
+	activate_auto_tuning(AKT_UP, &ns->shmall_akt);
+
 	mutex_lock(&shm_ids(ns).mutex);
 	if (key == IPC_PRIVATE) {
 		err = newseg(ns, key, shmflg, size);
@@ -516,6 +563,7 @@ asmlinkage long sys_shmctl (int shmid, i
 	struct shmid_kernel *shp;
 	int err, version;
 	struct ipc_namespace *ns;
+	int destroyed;
 
 	if (cmd < 0 || shmid < 0) {
 		err = -EINVAL;
@@ -701,8 +749,24 @@ asmlinkage long sys_shmctl (int shmid, i
 		if (err)
 			goto out_unlock_up;
 
-		do_shm_rmid(ns, shp);
+		destroyed = do_shm_rmid(ns, shp);
 		mutex_unlock(&shm_ids(ns).mutex);
+
+		if (destroyed) {
+			int rc;
+
+			rc = activate_auto_tuning(AKT_DOWN, &ns->shmmni_akt);
+			if (rc)
+				/*
+				 * shm_ctlmni has been adjusted ==> change
+				 * shm_ctlall value
+				 */
+				ns->shm_ctlall = ns->shm_ctlmax / PAGE_SIZE *
+					(ns->shm_ctlmni / 16);
+
+			activate_auto_tuning(AKT_DOWN, &ns->shmall_akt);
+		}
+
 		goto out;
 	}
 
Index: linux-2.6.20-rc4/ipc/sem.c
===================================================================
--- linux-2.6.20-rc4.orig/ipc/sem.c	2007-01-15 13:08:15.000000000 +0100
+++ linux-2.6.20-rc4/ipc/sem.c	2007-01-15 15:49:41.000000000 +0100
@@ -83,6 +83,8 @@
 #include <linux/seq_file.h>
 #include <linux/mutex.h>
 #include <linux/nsproxy.h>
+#include <linux/akt.h>
+#include <linux/akt_ops.h>
 
 #include <asm/uaccess.h>
 #include "util.h"
@@ -131,6 +133,12 @@ static void __ipc_init __sem_init_ns(str
 	ns->sc_semmni = SEMMNI;
 	ns->used_sems = 0;
 	ipc_init_ids(ids, ns->sc_semmni);
+
+#define SEMTHRESH 80
+	init_tunable_ipcns(ns, semmni_akt, SEMTHRESH, SEMMNI, IPCMNI,
+		&(ns->sc_semmni), &ids->in_use, int);
+	init_tunable_ipcns(ns, semmns_akt, SEMTHRESH, SEMMNS,
+		IPCMNI * SEMMSL, &(ns->sc_semmns), &ns->used_sems, int);
 }
 
 #ifdef CONFIG_IPC_NS
@@ -170,6 +178,13 @@ void sem_exit_ns(struct ipc_namespace *n
 void __init sem_init (void)
 {
 	__sem_init_ns(&init_ipc_ns, &init_sem_ids);
+
+	if (register_tunable(&init_ipc_ns.semmni_akt))
+		printk(KERN_WARNING "Failed registering tunable semmni\n");
+
+	if (register_tunable(&init_ipc_ns.semmns_akt))
+		printk(KERN_WARNING "Failed registering tunable semmns\n");
+
 	ipc_init_proc_interface("sysvipc/sem",
 				"       key      semid perms      nsems   uid   gid  cuid  cgid      otime      ctime\n",
 				IPC_SEM_IDS, sysvipc_sem_proc_show);
@@ -263,11 +278,22 @@ asmlinkage long sys_semget (key_t key, i
 	int id, err = -EINVAL;
 	struct sem_array *sma;
 	struct ipc_namespace *ns;
+	int rc;
 
 	ns = current->nsproxy->ipc_ns;
 
 	if (nsems < 0 || nsems > ns->sc_semmsl)
 		return -EINVAL;
+
+	rc = activate_auto_tuning(AKT_UP, &ns->semmni_akt);
+	if (rc)
+		/*
+		 * sc_semmni has been adjusted ==> change sc_semmns value
+		 */
+		ns->sc_semmns = ns->sc_semmni * ns->sc_semmsl;
+
+	activate_auto_tuning(AKT_UP, &ns->semmns_akt);
+
 	mutex_lock(&sem_ids(ns).mutex);
 	
 	if (key == IPC_PRIVATE) {
@@ -899,6 +925,21 @@ static int semctl_down(struct ipc_namesp
 	case IPC_RMID:
 		freeary(ns, sma, semid);
 		err = 0;
+
+		{
+			int rc;
+
+			rc = activate_auto_tuning(AKT_DOWN, &ns->semmni_akt);
+			if (rc)
+				/*
+				 * sc_semmni has been adjusted ==>
+				 * change sc_semmns value
+				 */
+				ns->sc_semmns = ns->sc_semmni * ns->sc_semmsl;
+
+			activate_auto_tuning(AKT_DOWN, &ns->semmns_akt);
+		}
+
 		break;
 	case IPC_SET:
 		ipcp->uid = setbuf.uid;
Index: linux-2.6.20-rc4/include/linux/ipc.h
===================================================================
--- linux-2.6.20-rc4.orig/include/linux/ipc.h	2007-01-15 13:08:15.000000000 +0100
+++ linux-2.6.20-rc4/include/linux/ipc.h	2007-01-15 15:52:19.000000000 +0100
@@ -52,6 +52,7 @@ struct ipc_perm
 #ifdef __KERNEL__
 
 #include <linux/kref.h>
+#include <linux/akt.h>
 
 #define IPCMNI 32768  /* <= MAX_INT limit for ipc arrays (including sysctl changes) */
 
@@ -77,15 +78,20 @@ struct ipc_namespace {
 
 	int		sem_ctls[4];
 	int		used_sems;
+	DECLARE_TUNABLE(semmni_akt);
+	DECLARE_TUNABLE(semmns_akt);
 
 	int		msg_ctlmax;
 	int		msg_ctlmnb;
 	int		msg_ctlmni;
+	DECLARE_TUNABLE(msgmni_akt);
 
 	size_t		shm_ctlmax;
 	size_t		shm_ctlall;
 	int		shm_ctlmni;
 	int		shm_tot;
+	DECLARE_TUNABLE(shmmni_akt);
+	DECLARE_TUNABLE(shmall_akt);
 };
 
 extern struct ipc_namespace init_ipc_ns;

--


* Re: [RFC][PATCH 6/6] automatic tuning applied to some kernel components
  2007-01-16  6:15 ` [RFC][PATCH 6/6] automatic tuning applied to some kernel components Nadia.Derbey
@ 2007-01-22 19:56   ` Andrew Morton
  2007-01-23 14:40     ` Nadia Derbey
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2007-01-22 19:56 UTC (permalink / raw)
  To: Nadia.Derbey; +Cc: linux-kernel, Nadia.Derbey

> On Tue, 16 Jan 2007 07:15:22 +0100 Nadia.Derbey@bull.net wrote:
> The following kernel components register a tunable structure and call the
> auto-tuning routine:
>   . file system
>   . shared memory (per namespace)
>   . semaphore (per namespace)
>   . message queues (per namespace)

This is the part of the patch series which really matters, and I just don't
understand it :(

Why do we want to autotune these things?  What problem is this patch series
solving?  Please describe this part of the work much, much more completely,
so we can understand the need to add such a large amount of code to the
kernel.

It seems strange that the whole feature is Kconfigurable.  Please also
explain the thinking behind that.

I suspect the patches would be much simpler if you simply required that all
these new tunables be of type `long'.  About seven eighths of the code
would go away.  As would most of those eye-popping macros.


* Re: [RFC][PATCH 6/6] automatic tuning applied to some kernel components
  2007-01-22 19:56   ` Andrew Morton
@ 2007-01-23 14:40     ` Nadia Derbey
  2007-02-07 21:18       ` Eric W. Biederman
  0 siblings, 1 reply; 25+ messages in thread
From: Nadia Derbey @ 2007-01-23 14:40 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Andrew Morton wrote:
>>On Tue, 16 Jan 2007 07:15:22 +0100 Nadia.Derbey@bull.net wrote:
>>The following kernel components register a tunable structure and call the
>>auto-tuning routine:
>>  . file system
>>  . shared memory (per namespace)
>>  . semaphore (per namespace)
>>  . message queues (per namespace)
> 
> 
> This is the part of the patch series which really matters, and I just don't
> understand it :(
> 
> Why do we want to autotune these things?  What problem is this patch series
> solving?  Please describe this part of the work much, much more completely,
> so we can understand the need to add such a large amount of code to the
> kernel.

1) why these tunables?
The ipc tunables have been selected as "guinea-pig" tunables for the AKT
framework because they are likely to be used often by databases. This
applies to file-max too.
Now, if the framework itself is accepted, the set of covered tunables
can easily be extended.

2) why autotuning:
There are at least 3 cases where it can be useful:
. for workloads that are known to need a large amount of a given resource
type (say shared memory segments), but for which we don't know what the
maximum amount needed will be
. to handle multiple applications running on a single system that need
the same tunable to be adjusted to fit their needs
. to make a system react correctly to possible peak loads for a given
resource usage, i.e. make it tune up *and down* as needed.

In all these cases, the akt framework will enable the kernel to adapt to
increasing / decreasing resource consumption:
1) it avoids allocating "a priori" a big amount of memory that will be used
only in extreme cases, which is the effect of doing an
"echo <huge_value> > /proc/sys/kernel/shmmni"
2) the system will come back to the default values as soon as the peak
load is over.
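
A minimal userspace sketch of this tune-up / tune-down behaviour, assuming
the default rule given in Documentation/auto_tune.txt of patch 1/6 (the
value grows by (100 - threshold)% of itself once usage exceeds threshold%
of it). The names tunable_model and akt_adjust() are hypothetical, and the
tune-down branch is only an assumed mirror of the tune-up one, since the
documentation does not spell it out:

```c
#include <assert.h>

enum { AKT_UP = 0, AKT_DOWN = 1 };

/* Hypothetical stand-in for the patch's auto_tune structure (longs only). */
struct tunable_model {
	long value;     /* current limit, e.g. shmmni */
	long min;       /* never tuned below this */
	long max;       /* never tuned above this */
	int threshold;  /* percentage, expected in [1-99] */
};

/*
 * Adjust t->value for the given direction, where in_use is the number of
 * objects currently allocated. Returns 1 if the value changed, else 0.
 */
static int akt_adjust(int direction, struct tunable_model *t, long in_use)
{
	long step, next = t->value;

	assert(t->threshold > 0 && t->threshold < 100);
	step = t->value * (100 - t->threshold) / 100;

	if (direction == AKT_UP) {
		/* usage above threshold% of the limit: raise the limit */
		if (in_use * 100 > t->value * t->threshold)
			next = t->value + step;
	} else {
		/* assumed mirror rule: usage under threshold%: lower the limit */
		if (in_use * 100 < t->value * t->threshold)
			next = t->value - step;
	}

	/* keep the result inside [min, max] */
	if (next > t->max)
		next = t->max;
	if (next < t->min)
		next = t->min;

	if (next == t->value)
		return 0;
	t->value = next;
	return 1;
}
```

With value = 100, min = 100 and threshold = 80, the 81st allocation raises
the limit to 120; once usage has dropped, a tune-down call brings it back
to min, which is the "come back to the default values" behaviour of
point 2.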

> 
> It seems strange that the whole feature is Kconfigurable.  Please also
> explain the thinking behind that.

We wanted to make it configurable because it adds some overhead in terms of
1) generated kernel size
2) instructions added to the resource creation / removal code paths, even
if auto-tuning is not activated for the corresponding tunable, hence a
performance impact.
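
To illustrate point 2, here is a userspace sketch of the kind of inline
check that ends up in the creation / removal paths, and of how a build-time
switch could compile it out. It follows the description of
activate_auto_tuning() in Documentation/auto_tune.txt (call the routine only
if the tunable is registered and automatic tuning is allowed). All names
(AKT_SKETCH_ENABLED, activate_sketch(), the flag macros) are hypothetical
and are not the ones used in the patch:

```c
#include <assert.h>
#include <stddef.h>

#define AKT_SKETCH_ENABLED 1	/* hypothetical stand-in for the Kconfig option */

#define TUNABLE_AUTO		0x01	/* bit 0: automatic adjustment allowed */
#define TUNABLE_REGISTERED	0x02	/* bit 1: tunable has been registered */

struct auto_tune_sketch;
typedef int (*auto_tune_fn_sketch)(int direction, struct auto_tune_sketch *t);

struct auto_tune_sketch {
	unsigned char flags;
	auto_tune_fn_sketch auto_tune;	/* adjustment routine */
	int calls;			/* only here to observe the sketch */
};

#if AKT_SKETCH_ENABLED
/* Feature built in: a flag test on every call, the routine only if enabled. */
static inline int activate_sketch(int direction, struct auto_tune_sketch *t)
{
	const unsigned char need = TUNABLE_AUTO | TUNABLE_REGISTERED;

	if ((t->flags & need) != need)
		return 0;
	assert(t->auto_tune != NULL);
	return t->auto_tune(direction, t);
}
#else
/* Feature configured out: the call sites compile down to nothing. */
static inline int activate_sketch(int direction, struct auto_tune_sketch *t)
{
	(void)direction;
	(void)t;
	return 0;
}
#endif

/* Trivial adjustment routine that just counts how often it is invoked. */
static int counting_routine(int direction, struct auto_tune_sketch *t)
{
	(void)direction;
	t->calls++;
	return 1;
}
```

Even with automatic tuning switched off for a tunable, the built-in variant
still pays for the flag test on each allocation and free; only the
configured-out variant removes it entirely.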

> 
> I suspect the patches would be much simpler if you simply required that all
> these new tunables be of type `long'.  About seven eighths of the code
> would go away.  As would most of those eye-popping macros.
> 

Yes, I agree with you: the idea here was to make the framework more
generic. But I can change that.

Regards,
Nadia






* Re: [RFC][PATCH 4/6] min and max kobjects
  2007-01-16  6:15 ` [RFC][PATCH 4/6] min and max kobjects Nadia.Derbey
@ 2007-01-24 22:41   ` Randy Dunlap
  2007-01-25 16:34     ` Nadia Derbey
  0 siblings, 1 reply; 25+ messages in thread
From: Randy Dunlap @ 2007-01-24 22:41 UTC (permalink / raw)
  To: Nadia.Derbey; +Cc: linux-kernel

On Tue, 16 Jan 2007 07:15:20 +0100 Nadia.Derbey@bull.net wrote:

> [PATCH 04/06]
> 
> Introduces the kobjects associated to each tunable min and max value
> 
> Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
> ---
>  include/linux/akt.h         |   30 ++++
>  include/linux/akt_ops.h     |  311 ++++++++++++++++++++++++++++++++++++++++++++
>  kernel/autotune/akt.c       |  120 ++++++++++++++++
>  kernel/autotune/akt_sysfs.c |    8 +
>  4 files changed, 469 insertions(+)
> 
> Index: linux-2.6.20-rc4/include/linux/akt.h
> ===================================================================
> --- linux-2.6.20-rc4.orig/include/linux/akt.h	2007-01-15 15:08:41.000000000 +0100
> +++ linux-2.6.20-rc4/include/linux/akt.h	2007-01-15 15:21:47.000000000 +0100
> @@ -62,6 +62,13 @@ struct tunable_kobject {
>   * auto_tune structure.
>   * These values are type dependent and are used as high / low boundaries when
>   * tuning up or down.
> + * The show and store routines (thare are type dependent too) are here for

                                   they (or these ?)

> + * sysfs support (since the min and max can be updated through sysfs).
> + * The abs_value field is used to check that we are not:
> + *   . falling under the very 1st min value when updating the min value
> + *     through sysfs
> + *   . going over the very 1st max value when updating the max value
> + *     through sysfs
>   * The type is known when the tunable is defined (see DEFINE_TUNABLE macro).
>   */
>  struct typed_value {

> Index: linux-2.6.20-rc4/kernel/autotune/akt.c
> ===================================================================
> --- linux-2.6.20-rc4.orig/kernel/autotune/akt.c	2007-01-15 15:13:31.000000000 +0100
> +++ linux-2.6.20-rc4/kernel/autotune/akt.c	2007-01-15 15:25:35.000000000 +0100
> @@ -203,5 +207,121 @@ ssize_t store_tuning_mode(struct auto_tu
>  }
>  
>  
> +/*
> + * FUNCTION:    Get operation called by tunable_attr_show (i.e. when the file
> + *              /sys/tunables/<tunable>/min is displayed).
> + *              Outputs the current tunable minimum value
> + *
> + * RETURN VALUE: >0 : output string length (including the '\0')
> + *               <0 : failure
> + */

Since you are providing function comment header blocks, please use
the accepted kernel-doc format for (all of) these.

> +ssize_t show_tunable_min(struct auto_tune *tun_addr, char *buf)
> +{
> +	ssize_t rc;
> +
> +	if (tun_addr == NULL) {
> +		printk(KERN_ERR
> +			" show_tunable_min(): tunable address is invalid\n");
> +		return -EINVAL;
> +	}
> +
> +	spin_lock(&tun_addr->tunable_lck);
> +
> +	rc = tun_addr->min.show(tun_addr, buf);
> +
> +	spin_unlock(&tun_addr->tunable_lck);
> +
> +	return rc;
> +}
> +
> +
> +/*
> + * FUNCTION:    Set operation called by tunable_attr_store (i.e. when a
> + *              string is stored into /sys/tunables/<tunable>/min).
> + *
> + * PARAMETERS:  count: input buffer size (including the '\0')
> + *
> + * RETURN VALUE: >0: number of characters used from the input buffer
> + *               <= 0: failure

I would expect a return value of 0 not to indicate failure;
only <0 should do that.  So is this a typo or a real case where
a return of 0 indicates failure?

> + */
> +ssize_t store_tunable_min(struct auto_tune *tun_addr, const char *buf,
> +			size_t count)
> +{
> +	ssize_t rc;
> +
> +	if (tun_addr == NULL) {
> +		printk(KERN_ERR
> +			" store_tunable_min(): tunable address is invalid\n");
> +		return -EINVAL;
> +	}
> +
> +	spin_lock(&tun_addr->tunable_lck);
> +
> +	rc = tun_addr->min.store(tun_addr, buf, count);
> +
> +	spin_unlock(&tun_addr->tunable_lck);
> +
> +	return rc;
> +}
> +
> +
> +
> +
> +/*
> + * FUNCTION:    Set operation called by tunable_attr_store (i.e. when a
> + *              string is stored into /sys/tunables/<tunable>/max).
> + *
> + * PARAMETERS:  count: input buffer size (including the '\0')
> + *
> + * RETURN VALUE: >0: number of characters used from the input buffer
> + *               <= 0: failure

same question.

> + */
> +ssize_t store_tunable_max(struct auto_tune *tun_addr, const char *buf,
> +			size_t count)
> +{
> +	ssize_t rc;
> +
> +	if (tun_addr == NULL) {
> +		printk(KERN_ERR
> +			" store_tunable_max(): tunable address is invalid\n");
> +		return -EINVAL;
> +	}
> +
> +	spin_lock(&tun_addr->tunable_lck);
> +
> +	rc = tun_addr->max.store(tun_addr, buf, count);
> +
> +	spin_unlock(&tun_addr->tunable_lck);
> +
> +	return rc;
> +}
> +
> +
>  EXPORT_SYMBOL_GPL(register_tunable);
>  EXPORT_SYMBOL_GPL(unregister_tunable);

Thanks.
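
On the return value question for the store routines: the usual sysfs
convention is that a store method returns the number of bytes consumed on
success and a negative errno on failure. A userspace sketch of that
convention, assuming a long-typed minimum and the abs_value check described
in the akt.h comment (min_model and store_min_model() are hypothetical
names, and the buffer is assumed to be NUL-terminated):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>	/* ssize_t */

/* Hypothetical model of a tunable's minimum and its bounds. */
struct min_model {
	long min;	/* current minimum */
	long abs_min;	/* very 1st minimum: min may never fall under it */
	long max;	/* current maximum: min may never exceed it */
};

/* Returns count on success, a negative errno value on failure (never 0). */
static ssize_t store_min_model(struct min_model *m, const char *buf,
			size_t count)
{
	char *end;
	long v;

	if (m == NULL || buf == NULL || count == 0)
		return -EINVAL;
	assert(m->abs_min <= m->max);

	errno = 0;
	v = strtol(buf, &end, 10);
	if (errno != 0 || end == buf)
		return -EINVAL;		/* overflow or no digits at all */
	if (*end != '\0' && *end != '\n')
		return -EINVAL;		/* trailing garbage */
	if (v < m->abs_min || v > m->max)
		return -EINVAL;		/* outside the allowed interval */

	m->min = v;
	return (ssize_t)count;
}
```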

---
~Randy


* Re: [RFC][PATCH 5/6] per namespace tunables
  2007-01-16  6:15 ` [RFC][PATCH 5/6] per namespace tunables Nadia.Derbey
@ 2007-01-24 22:41   ` Randy Dunlap
  0 siblings, 0 replies; 25+ messages in thread
From: Randy Dunlap @ 2007-01-24 22:41 UTC (permalink / raw)
  To: Nadia.Derbey; +Cc: linux-kernel

On Tue, 16 Jan 2007 07:15:21 +0100 Nadia.Derbey@bull.net wrote:

> [PATCH 05/06]
> 
> This patch introduces all that is needed to process per namespace tunables.
> 
> ---
>  include/linux/akt.h   |   12 +++++++
>  kernel/autotune/akt.c |   80 ++++++++++++++++++++++++++++++++++++++------------
>  2 files changed, 73 insertions(+), 19 deletions(-)
> 
> +/*
> + * FUNCTION:    This routine gets the actual auto_tune structure for the
> + *              tunables that are per namespace (presently only ipc ones).
> + *
> + * RETURN VALUE: pointer to the tunable structure for the current namespace
> + */

Please use kernel-doc format for function comment blocks.
(see Documentation/kernel-doc-nano-HOWTO.txt)

> +static struct auto_tune *get_ns_tunable(struct auto_tune *p)
> +{
> +	if (p->flags & TUNABLE_IPC_NS) {
> +		char *shift = (char *) p;
> +		struct ipc_namespace *ns = current->nsproxy->ipc_ns;
> +
> +		shift = (shift - (char *) &init_ipc_ns) + (char *) ns;
> +
> +		return (struct auto_tune *) shift;
> +	}
> +
> +	return p;
> +}
> +
> +
>  EXPORT_SYMBOL_GPL(register_tunable);
>  EXPORT_SYMBOL_GPL(unregister_tunable);

and put EXPORT_SYMBOL/_GPL() immediately after each function
that is being exported.
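
For readers following get_ns_tunable() above: it takes the address of a
tunable embedded in init_ipc_ns, computes its byte offset inside that
structure, and applies the same offset to the current namespace. A
userspace sketch of that pointer arithmetic, with hypothetical structure
and function names and the target namespace passed explicitly instead of
being read from current->nsproxy:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct auto_tune and struct ipc_namespace. */
struct tun_model {
	int threshold;
};

struct ns_model {
	int ctlmni;
	struct tun_model mni_akt;	/* tunable embedded in the namespace */
};

static struct ns_model init_ns_model;	/* plays the role of init_ipc_ns */

/*
 * Translate a pointer to a tunable inside init_ns_model into a pointer to
 * the same member inside ns.
 */
static struct tun_model *ns_tunable(struct tun_model *p, struct ns_model *ns)
{
	ptrdiff_t off = (char *)p - (char *)&init_ns_model;

	/* p must really point inside init_ns_model */
	assert(off >= 0 &&
	       (size_t)off + sizeof(*p) <= sizeof(init_ns_model));

	return (struct tun_model *)((char *)ns + off);
}
```

The scheme relies on every namespace having the same layout as the initial
one and on registration always being done with the init_ipc_ns copy of the
tunable.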

---
~Randy


* Re: [RFC][PATCH 1/6] Tunable structure and registration routines
  2007-01-16  6:15 ` [RFC][PATCH 1/6] Tunable structure and registration routines Nadia.Derbey
@ 2007-01-25  0:32   ` Randy Dunlap
  2007-01-25 16:26     ` Nadia Derbey
  0 siblings, 1 reply; 25+ messages in thread
From: Randy Dunlap @ 2007-01-25  0:32 UTC (permalink / raw)
  To: Nadia.Derbey; +Cc: linux-kernel

On Tue, 16 Jan 2007 07:15:17 +0100 Nadia.Derbey@bull.net wrote:

> [PATCH 01/06]
> 
> Defines the auto_tune structure: this is the structure that contains the
> information needed by the adjustment routine for a given tunable.
> Also defines the registration routines.
> 
> The fork kernel component defines a tunable structure for the threads-max
> tunable and registers it.
> 
> Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
> ---
>  Documentation/00-INDEX      |    2 
>  Documentation/auto_tune.txt |  333 ++++++++++++++++++++++++++++++++++++++++++++
>  fs/Kconfig                  |    2 
>  include/linux/akt.h         |  186 ++++++++++++++++++++++++
>  include/linux/akt_ops.h     |  186 ++++++++++++++++++++++++
>  init/main.c                 |    2 
>  kernel/Makefile             |    1 
>  kernel/autotune/Kconfig     |   30 +++
>  kernel/autotune/Makefile    |    7 
>  kernel/autotune/akt.c       |  123 ++++++++++++++++
>  kernel/fork.c               |   18 ++
>  11 files changed, 890 insertions(+)
> 
> Index: linux-2.6.20-rc4/Documentation/auto_tune.txt
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6.20-rc4/Documentation/auto_tune.txt	2007-01-15 14:19:18.000000000 +0100
> @@ -0,0 +1,333 @@
> +			Automatic Kernel Tunables
> +                        =========================
> +
> +		   Nadia Derbey (Nadia.Derbey@bull.net)
> +
> +
> +
> +This feature aims at making the kernel automatically change the tunables
> +values as it sees resources running out.
> +
> +The AKT framework is made of 2 parts:
> +
> +1) Kernel part:
> +Interfaces are provided to the kernel subsystems, to (un)register the
> +tunables that might be automatically tuned in the future.
> +
> +Registering a tunable consists in the following steps:

                                 s/in/of/

> +- a structure is declared and filled by the kernel subsystem for the
> +registered tunable
> +- that tunable structure is registered into sysfs
> +
> +Registration should be done during the kernel subsystem initialization step.

...

> +Any kernel subsystem that has registered a tunable should call
> +auto_tune_func() as follows:
> +
> ++-------------------------+--------------------------------------------+
> +| Step                    | Routine to call                            |
> ++-------------------------+--------------------------------------------+
> +| Declaration phase       | DEFINE_TUNABLE(name, values...);           |
> ++-------------------------+--------------------------------------------+
> +| Initialization routine  | set_tunable_min_max(name, min, max);       |
> +|                         | set_autotuning_routine(name, routine);     |
> +|                         | register_tunable(&name);                   |
> +| Note: the 1st 2 calls   |                                            |
> +|       are optional      |                                            |
> ++-------------------------+--------------------------------------------+
> +| Alloc                   | activate_auto_tuning(AKT_UP, &name);       |
> ++-------------------------+--------------------------------------------+
> +| Free                    | activate_auto_tuning(AKT_DOWN, &name);     |

So does Free always use AKT_DOWN?  why does it matter?
Seems unneeded and inconsistent.
How does one activate a tunable for downward adjustment?

> ++-------------------------+--------------------------------------------+
> +| module_exit() routine   | unregister_tunable(&name);                 |
> ++-------------------------+--------------------------------------------+
> +
> +activate_auto_tuning is a static inline defined in akt.h, that does the
> +following:
> +. if <tunable is registered> and <auto tuning is allowd for tunable>

                                                    allowed

> +.   call the routine stored in tunable->auto_tune
> +
> +
> +The effect of the default automatic tuning routine is the following:
> +
> +           +----------------------------------------------------------------+
> +           |                 Tunable automatically adjustable               |
> +           +---------------+------------------------------------------------+
> +           |      NO       |                      YES                       |
> ++----------+---------------+------------------------------------------------+
> +| AKT_UP   | No effect     | If the tunable value exceeds the specified     |
> +|          |               | threshold, that value is increased up to a     |
> +|          |               | maximum value.                                 |
> +|          |               | The maximum value is specified during the      |
> +|          |               | tunable declaration and can be changed at any  |
> +|          |               | time through sysfs                             |
> ++----------+---------------+------------------------------------------------+
> +| AKT_DOWN | No effect     | If the tunable value falls under the specified |
> +|          |               | threshold, that value is decreased down to a   |
> +|          |               | minimum value.                                 |
> +|          |               | The minimum value is specified during the      |
> +|          |               | tunable declaration and can be changed at any  |
> +|          |               | time through sysfs                             |
> ++----------+---------------+------------------------------------------------+
> +
> +
> +1.6. Default automatic adjustment routine
> +
> +The last service provided by AKT at the kernel level is the default automatic
> +adjustment routine. As seen, above, this routine supports various tunables
> +types. It works as follows (only the AKT_UP direction is described here -
> +AKT_DOWN does the reverse operation):
> +
> +The 2nd parameter passed in to this routine is a pointer to a previously
> +registerd tunable structure. That structure contains the following fields (see
   registered

> +1.1 for the detailed description):
> +- threshold
> +- key
> +- min
> +- max
> +- tunable
> +- checked
> +
> +When this routine is entered, it does the following:
> +1. <*checked> is compared to <*tunable> * threshold
> +2. if <*checked> is greater, <*tunable> is set to:
> +	<*tunable> + (<*tunable> * (100 - threshold) / 100)
> +
> +
> +
> +1.6) akt and sysfs:
> +
...

> +
> +1.7) tunables that are namespace dependent
> +
...

> +
> +1.7.2) Initializing the tunable structure
> +
> +Then the tunable structure should be initialized by calling the following
> +routine:
> +
> +init_tunable_ipcns(namespace_ptr, structure_name, threshold, min, max,
> +		tunable_variable_ptr, checked_variable_ptr,
> +		tunable_variable_type);
> +
> +Parameters:
> +- namespace_ptr: pointer to the namespace the tunable belongs to.
> +
> +See DEFINE_TUNABLE for the other parameters

end with a period/full-stop '.'.

> +
> +1.7.3) Registering the tunable structure
> +
...

> +
> +2) User part:
> +
> +As seen above, the only way to activate automatic tuning is from user side:
> +- the directory /sys/tunables is created during the init phase.
> +- each time a tunable is registered by a kernel subsystem, a directory is
> +created for it under /sys/tunables.
> +- This directory contains 1 file for each tunable kobject attribute:

Please try to limit text documentation to 80 columns or less.

> ++-----------+---------------+-------------------+----------------------------+
> +| attribute | default value | how to set it     | effect                     |
> ++-----------+---------------+-------------------+----------------------------+
> +| autotune  | 0             | echo 1 > autotune | makes the tunable automatic|
> +|           |               | echo 0 > autotune | makes the tunable manual   |
> ++-----------+---------------+-------------------+----------------------------+
> +| max       | max value set | echo <M> > max    | sets the tunable max value |
> +|           | during tunable|                   | to <M>                     |
> +|           | definition    |                   |                            |
> ++-----------+---------------+-------------------+----------------------------+
> +| min       | min value set | echo <m> > min    | sets the tunable min value |
> +|           | during tunable|                   | to <m>                     |
> +|           | definition    |                   |                            |
> ++-----------+---------------+-------------------+----------------------------+
> +
> Index: linux-2.6.20-rc4/fs/Kconfig
> ===================================================================
> --- linux-2.6.20-rc4.orig/fs/Kconfig	2007-01-15 13:08:14.000000000 +0100
> +++ linux-2.6.20-rc4/fs/Kconfig	2007-01-15 14:20:20.000000000 +0100
> @@ -925,6 +925,8 @@ config PROC_KCORE
>  	bool "/proc/kcore support" if !ARM
>  	depends on PROC_FS && MMU
>  
> +source "kernel/autotune/Kconfig"

Why is that in the File systems menu?  Seems odd to me
for it to be there.  If it's just because it depends on
PROC_FS and SYSFS, then it should just go completely after
the File systems menu.

>  config PROC_VMCORE
>          bool "/proc/vmcore support (EXPERIMENTAL)"
>          depends on PROC_FS && EXPERIMENTAL && CRASH_DUMP
> Index: linux-2.6.20-rc4/include/linux/akt.h
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6.20-rc4/include/linux/akt.h	2007-01-15 14:26:24.000000000 +0100
> @@ -0,0 +1,186 @@
> +
> +#ifndef AKT_H
> +#define AKT_H
> +
> +#include <linux/types.h>
> +#include <linux/kobject.h>
> +
> +/*
> + * First parameter passed to the adjustment routine
> + */
> +#define AKT_UP   0   /* adjustment "up" */
> +#define AKT_DOWN 1   /* adjustment "down" */
> +
> +
> +struct auto_tune {
> +	spinlock_t tunable_lck; /* serializes access to the stucture fields */
> +	auto_tune_fn auto_tune; /* auto tuning routine registered by the */
> +				/* calling kernel susbsystem. If NULL, the */
> +				/* auto tuning routine that will be called */
> +				/* is the default one that processes uints */
> +	int (*check_parms)(struct auto_tune *);	/* min / max checking */
> +						/* routine ptr: points to */
> +						/* the appropriate routine */
> +						/* depending on the */
> +						/* tunable type */
> +	const char *name;
> +	char flags;	/* Only 2 bits are meaningful: */

Make flags unsigned char so that no sign bit is needed.

> +			/* bit 0: set to 1 if the associated tunable can */
> +			/*        be automatically adjusted */
> +			/* bits 1: set to 1 if the tunable has been */
> +			/*         registered */
> +			/* bits 2-7: useless */

                                     unused ??

> +	char threshold;	/* threshold to enable the adjustment expressed as */
> +			/* a %age */
> +	struct typed_value min;	/* min value the tunable can ever reach */
> +				/* and associated show / store routines) */
> +	struct typed_value max;	/* max value the tunable can ever reach */
> +				/* and associated show / store routines) */
> +	void *tunable;	/* address of the tunable to adjust */
> +	void *checked;	/* address of the variable that is controlled by */
> +			/* the tunable. This is the calling subsystem's */
> +			/* object counter */
> +};
> +

...

> +
> +extern void fork_late_init(void);

Looks like the wrong header file for that extern.

> +#endif /* AKT_H */

> Index: linux-2.6.20-rc4/kernel/autotune/akt.c
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-2.6.20-rc4/kernel/autotune/akt.c	2007-01-15 14:51:54.000000000 +0100
> @@ -0,0 +1,123 @@
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/akt.h>
> +
> +
> +
> +	Too Much Whitespace.  :)
> +
> +
> +
> +/*
> + * FUNCTION:    Inserts a tunable structure into sysfs
> + *              This routine serves also as a checker for the tunable
> + *              structure fields.
> + *              This routine is called by any kernel subsystem that wants to
> + *              use akt services (automatic tunables adjustment) in the future
> + *
> + * NOTE: when calling this routine, the tunable structure should have already
> + *       been filled by defining it with DEFINE_TUNABLE()
> + *
> + * RETURN VALUE: 0: successful
> + *               <0 if failure
> + */

Please use kernel-doc format for function comment blocks.

> +int register_tunable(struct auto_tune *tun)
> +{
> +	if (tun == NULL) {
> +		printk(KERN_ERR "\tBad tunable structure pointer (NULL)\n");

	Each printk() needs something that tells which module or part
	of the kernel it's coming from (sometimes called a prefix).
	And drop the \t (tab).  IOW, replace the tab with a prefix, e.g.:

		printk(KERN_ERR "autotune: Bad tunable structure NULL pointer\n");

> +		return -EINVAL;
> +	}
> +
> +	if (tun->threshold <= 0 || tun->threshold >= 100) {
> +		printk(KERN_ERR "\tBad threshold (%d) value "
> +			"- should be in the [1-99] interval\n",
> +			tun->threshold);

Replace \t with a prefix (and more below).

> +		return -EINVAL;
> +	}
> +
> +	if (tun->tunable == NULL) {
> +		printk(KERN_ERR "\tBad tunable pointer (NULL)\n");
> +		return -EINVAL;
> +	}
> +
> +	if (tun->checked == NULL) {
> +		printk(KERN_ERR "\tBad checked value pointer (NULL)\n");
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * Check the min / max value
> +	 */
> +	if (tun->check_parms(tun)) {
> +		printk(KERN_ERR "\tBad min / max values\n");
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +
> +/*
> + * FUNCTION:    Removes a tunable structure from sysfs.
> + *              This routine is called by any kernel subsystem that doesn't
> + *              need the akt services anymore
> + *
> + * NOTE:  reg_tun should point to a previously registered tunable
> + *
> + * RETURN VALUE: 0: successful
> + *               <0 if failure
> + */
> +int unregister_tunable(struct auto_tune *reg_tun)
> +{
> +	if (reg_tun == NULL) {
> +		printk(KERN_ERR "\tBad tunable address (NULL)\n");
> +		return -EINVAL;
> +	}
> +
> +	spin_lock(&reg_tun->tunable_lck);
> +
> +	BUG_ON(!is_tunable_registered(reg_tun));
> +
> +	reg_tun->flags = 0;
> +
> +	spin_unlock(&reg_tun->tunable_lck);
> +
> +	return 0;
> +}
> +
> +	Too Much Whitespace....
> +
> +
> +EXPORT_SYMBOL_GPL(register_tunable);
> +EXPORT_SYMBOL_GPL(unregister_tunable);
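
Putting the comments above together (kernel-doc header, message prefix
instead of a tab, unsigned flags, export right after the function), the two
routines could look roughly like the sketch below. It is a userspace model:
printk, KERN_ERR, BUG_ON and EXPORT_SYMBOL_GPL are replaced by stand-in
macros so that it builds outside a kernel tree, the _sketch names and
AKT_PREFIX are hypothetical, and setting the registered bit in the register
routine is an assumption, since the quoted code does not show it:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-ins so the sketch builds outside a kernel tree. */
#define KERN_ERR ""
#define printk(...) fprintf(stderr, __VA_ARGS__)
#define BUG_ON(cond) assert(!(cond))
#define EXPORT_SYMBOL_GPL(sym) struct akt_export_stub_##sym

#define AKT_PREFIX "autotune: "		/* hypothetical message prefix */
#define TUNABLE_REGISTERED 0x02		/* bit 1, as in the flags comment */

struct auto_tune_sketch {
	unsigned char flags;	/* unsigned: plain bit mask, no sign bit */
	int threshold;		/* percentage */
	void *tunable;		/* address of the tunable to adjust */
	void *checked;		/* address of the object counter */
};

/**
 * register_tunable_sketch - validate a tunable and mark it registered
 * @tun: tunable structure already filled in by the caller
 *
 * Return: 0 on success, -EINVAL if @tun or one of its fields is invalid.
 */
int register_tunable_sketch(struct auto_tune_sketch *tun)
{
	if (tun == NULL) {
		printk(KERN_ERR AKT_PREFIX "NULL tunable structure pointer\n");
		return -EINVAL;
	}

	if (tun->threshold <= 0 || tun->threshold >= 100) {
		printk(KERN_ERR AKT_PREFIX "bad threshold (%d), "
			"should be in the [1-99] interval\n", tun->threshold);
		return -EINVAL;
	}

	if (tun->tunable == NULL || tun->checked == NULL) {
		printk(KERN_ERR AKT_PREFIX "NULL tunable or checked pointer\n");
		return -EINVAL;
	}

	tun->flags |= TUNABLE_REGISTERED;  /* assumption, see lead-in */
	return 0;
}
EXPORT_SYMBOL_GPL(register_tunable_sketch);

/**
 * unregister_tunable_sketch - mark a registered tunable as unregistered
 * @tun: previously registered tunable structure
 *
 * Return: 0 on success, -EINVAL if @tun is NULL.
 */
int unregister_tunable_sketch(struct auto_tune_sketch *tun)
{
	if (tun == NULL) {
		printk(KERN_ERR AKT_PREFIX "NULL tunable address\n");
		return -EINVAL;
	}

	BUG_ON(!(tun->flags & TUNABLE_REGISTERED));
	tun->flags = 0;
	return 0;
}
EXPORT_SYMBOL_GPL(unregister_tunable_sketch);
```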

---
~Randy


* Re: [RFC][PATCH 1/6] Tunable structure and registration routines
  2007-01-25  0:32   ` Randy Dunlap
@ 2007-01-25 16:26     ` Nadia Derbey
  2007-01-25 16:34       ` Randy Dunlap
  0 siblings, 1 reply; 25+ messages in thread
From: Nadia Derbey @ 2007-01-25 16:26 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: linux-kernel

Randy,

Thanks for reviewing the code!
My comments embedded.
I'll re-send the patches as soon as possible.

Regards,
Nadia

Randy Dunlap wrote:
> On Tue, 16 Jan 2007 07:15:17 +0100 Nadia.Derbey@bull.net wrote:
> 
> 
>>[PATCH 01/06]
>>
<snip>
> 
> 
>>+Any kernel subsystem that has registered a tunable should call
>>+auto_tune_func() as follows:
>>+
>>++-------------------------+--------------------------------------------+
>>+| Step                    | Routine to call                            |
>>++-------------------------+--------------------------------------------+
>>+| Declaration phase       | DEFINE_TUNABLE(name, values...);           |
>>++-------------------------+--------------------------------------------+
>>+| Initialization routine  | set_tunable_min_max(name, min, max);       |
>>+|                         | set_autotuning_routine(name, routine);     |
>>+|                         | register_tunable(&name);                   |
>>+| Note: the 1st 2 calls   |                                            |
>>+|       are optional      |                                            |
>>++-------------------------+--------------------------------------------+
>>+| Alloc                   | activate_auto_tuning(AKT_UP, &name);       |
>>++-------------------------+--------------------------------------------+
>>+| Free                    | activate_auto_tuning(AKT_DOWN, &name);     |
> 
> 
> So does Free always use AKT_DOWN?  why does it matter?
> Seems unneeded and inconsistent.

Tuning down is recommended in order to come back to the default tunable
value.
I agree with you: today it has almost no effect, except on the tunable
value. If we take the ipc example, grow_ary() just returns if the new
tunable value happens to be lower than the previous one.
But we can imagine that, in the future, grow_ary() could deallocate the
unused memory.
In addition, in that particular case, lowering the tunable value makes the
1st loop in ipc_addid() shorter.

> How does one activate a tunable for downward adjustment?

Actually a tunable is activated to be dynamically adjusted (whatever the
direction).
But you are giving me an idea for a future enhancement: we can imagine a
tunable that could be allowed to increase only (or decrease only). In
that case, should we split the autotune sysfs attribute into an 'up' and
a 'down' attribute?
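
A rough sketch of what such a split could mean at the flag level, purely as
an illustration of the idea: one enable bit per direction instead of a
single autotune bit. Nothing like this exists in the posted patches, and
the AKT_ALLOW_UP / AKT_ALLOW_DOWN names are hypothetical:

```c
#include <assert.h>

enum { AKT_UP = 0, AKT_DOWN = 1 };

#define AKT_ALLOW_UP	0x1	/* would be set through an 'up' attribute */
#define AKT_ALLOW_DOWN	0x2	/* would be set through a 'down' attribute */

/* Returns 1 if automatic tuning is enabled for this direction, else 0. */
static int akt_direction_allowed(unsigned char flags, int direction)
{
	unsigned char bit;

	assert(direction == AKT_UP || direction == AKT_DOWN);
	bit = (direction == AKT_UP) ? AKT_ALLOW_UP : AKT_ALLOW_DOWN;

	return (flags & bit) != 0;
}
```

A tunable with only AKT_ALLOW_UP set would then grow under load but never
shrink back on its own.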

<snip>

>>+
>>+2) User part:
>>+
>>+As seen above, the only way to activate automatic tuning is from user side:
>>+- the directory /sys/tunables is created during the init phase.
>>+- each time a tunable is registered by a kernel subsystem, a directory is
>>+created for it under /sys/tunables.
>>+- This directory contains 1 file for each tunable kobject attribute:
> 
> 
> Please try to limit text documentation to 80 columns or less.

That's exactly what I did?



<snip>

>>Index: linux-2.6.20-rc4/fs/Kconfig
>>===================================================================
>>--- linux-2.6.20-rc4.orig/fs/Kconfig	2007-01-15 13:08:14.000000000 +0100
>>+++ linux-2.6.20-rc4/fs/Kconfig	2007-01-15 14:20:20.000000000 +0100
>>@@ -925,6 +925,8 @@ config PROC_KCORE
>> 	bool "/proc/kcore support" if !ARM
>> 	depends on PROC_FS && MMU
>> 
>>+source "kernel/autotune/Kconfig"
> 
> 
> Why is that in the File systems menu?  Seems odd to me
> for it to be there.  If it's just because it depends on
> PROC_FS and SYSFS, then it should just go completely after
> the File systems menu.
> 

Because of the tunables that are handled in AKT, I wanted the feature to
be close to CONFIG_PROC_FS.
Now, I do not agree with your proposal: putting it after the FS menu
means that it would appear in the main menu, right? I'll try to find a
better place for it.



>>Index: linux-2.6.20-rc4/include/linux/akt.h
>>===================================================================
>>--- /dev/null	1970-01-01 00:00:00.000000000 +0000
>>+++ linux-2.6.20-rc4/include/linux/akt.h	2007-01-15 14:26:24.000000000 +0100
>>@@ -0,0 +1,186 @@
>>+


<snip>

>>+	char flags;	/* Only 2 bits are meaningful: */
> 
> 
> Make flags unsigned char so that no sign bit is needed.
> 
> 
>>+			/* bit 0: set to 1 if the associated tunable can */
>>+			/*        be automatically adjusted */
>>+			/* bits 1: set to 1 if the tunable has been */
>>+			/*         registered */
>>+			/* bits 2-7: useless */
> 
> 
>                                      unused ??

yep

<snip>

> 
> 
>>+
>>+extern void fork_late_init(void);
> 
> 
> Looks like the wrong header file for that extern.
> 
> 

Actually, I wanted the changes to the existing kernel files to be as 
small as possible. That's why everything is concentrated, whenever 
possible, in the added files.

Regards,
Nadia






* Re: [RFC][PATCH 1/6] Tunable structure and registration routines
  2007-01-25 16:26     ` Nadia Derbey
@ 2007-01-25 16:34       ` Randy Dunlap
  2007-01-25 17:01         ` Nadia Derbey
  0 siblings, 1 reply; 25+ messages in thread
From: Randy Dunlap @ 2007-01-25 16:34 UTC (permalink / raw)
  To: Nadia Derbey; +Cc: linux-kernel

On Thu, 25 Jan 2007 17:26:31 +0100 Nadia Derbey wrote:

> Randy,
> 
> Thanks for reviewing the code!
> My comments embedded.
> I'll re-send the patches as soon as possible.

OK, thanks.


> Randy Dunlap wrote:
> > On Tue, 16 Jan 2007 07:15:17 +0100 Nadia.Derbey@bull.net wrote:
> > 
> > 
> >>[PATCH 01/06]
> >>
> <snip>
> > 
> > 
> >>+Any kernel subsystem that has registered a tunable should call
> >>+auto_tune_func() as follows:
> >>+
> >>++-------------------------+--------------------------------------------+
> >>+| Step                    | Routine to call                            |
> >>++-------------------------+--------------------------------------------+
> >>+| Declaration phase       | DEFINE_TUNABLE(name, values...);           |
> >>++-------------------------+--------------------------------------------+
> >>+| Initialization routine  | set_tunable_min_max(name, min, max);       |
> >>+|                         | set_autotuning_routine(name, routine);     |
> >>+|                         | register_tunable(&name);                   |
> >>+| Note: the 1st 2 calls   |                                            |
> >>+|       are optional      |                                            |
> >>++-------------------------+--------------------------------------------+
> >>+| Alloc                   | activate_auto_tuning(AKT_UP, &name);       |
> >>++-------------------------+--------------------------------------------+
> >>+| Free                    | activate_auto_tuning(AKT_DOWN, &name);     |
> > 
> > 
> > So does Free always use AKT_DOWN?  why does it matter?
> > Seems unneeded and inconsistent.
> 
> Tuning down is recommended in order to come back to the default tunable 
> value.

Let me try to be clearer.  What is Alloc?  and why is AKT_UP
associated with Alloc and AKT_DOWN associated with Free (whatever
that means)?


> I agree with you: today it has quite no effect, except on the tunable 
> value. If we take the ipc's example, grow_ary() just returns if the new 
> tunable value happens to be lower than the previous one.
> But we can imagine, in the future, that grow_ary could deallocate the 
> unused memory.
> + in that particular case, lowering the tunable value makes the 1st loop 
> in ipc_addid() shorter.
> 
> > How does one activate a tunable for downward adjustment?
> 
> Actually a tunable is activated to be dynamically adjusted (whatever the 
> direction).
> But you are giving me an idea for a future enhancement: we can imagine a 
> tunable that could be allowed to increase only (or decrease only). In 
> that case, we should move the autotune sysfs attribute into an 'up' and 
> a 'down' attribute?

Couldn't the tunable owner just adjust the min value to a new
(larger) min value, e.g.?


> >>+extern void fork_late_init(void);
> > 
> > 
> > Looks like the wrong header file for that extern.
> > 
> > 
> 
> Actually, I wanted the changes to the existing kernel files to be as 
> small as possible. That's why everything is concentrated, whenever 
> possible, in the added files.

I suppose that's OK for review, but it shouldn't be merged that way.

---
~Randy


* Re: [RFC][PATCH 4/6] min and max kobjects
  2007-01-24 22:41   ` Randy Dunlap
@ 2007-01-25 16:34     ` Nadia Derbey
  0 siblings, 0 replies; 25+ messages in thread
From: Nadia Derbey @ 2007-01-25 16:34 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: linux-kernel

Randy Dunlap wrote:
> On Tue, 16 Jan 2007 07:15:20 +0100 Nadia.Derbey@bull.net wrote:
> 
> 
>>[PATCH 04/06]
>>
>>Introduces the kobjects associated to each tunable min and max value
>>

<snip>

>>Index: linux-2.6.20-rc4/kernel/autotune/akt.c
>>===================================================================
>>--- linux-2.6.20-rc4.orig/kernel/autotune/akt.c	2007-01-15 15:13:31.000000000 +0100
>>+++ linux-2.6.20-rc4/kernel/autotune/akt.c	2007-01-15 15:25:35.000000000 +0100
>>@@ -203,5 +207,121 @@ ssize_t store_tuning_mode(struct auto_tu
>> }
>> 
>> 

<snip>

>>+
>>+/*
>>+ * FUNCTION:    Set operation called by tunable_attr_store (i.e. when a
>>+ *              string is stored into /sys/tunables/<tunable>/min).
>>+ *
>>+ * PARAMETERS:  count: input buffer size (including the '\0')
>>+ *
>>+ * RETURN VALUE: >0: number of characters used from the input buffer
>>+ *               <= 0: failure
> 
> 
> I would expect a return value of 0 not to indicate failure;
> only <0 should do that.  So is this a typo or a real case where
> a return of 0 indicates failure?

This is a typo

> 
> 
>>+ */
>>+ssize_t store_tunable_min(struct auto_tune *tun_addr, const char *buf,
>>+			size_t count)
>>+{


<snip>

>>+/*
>>+ * FUNCTION:    Set operation called by tunable_attr_store (i.e. when a
>>+ *              string is stored into /sys/tunables/<tunable>/max).
>>+ *
>>+ * PARAMETERS:  count: input buffer size (including the '\0')
>>+ *
>>+ * RETURN VALUE: >0: number of characters used from the input buffer
>>+ *               <= 0: failure
> 
> 
> same question.

Same answer ;-)
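
A minimal userspace sketch of the return convention agreed on above: a positive
return is the number of characters consumed, and only a negative return signals
failure. demo_store_min() is a hypothetical stand-in, not the patch's
store_tunable_min(); the decimal parsing and the choice of -EINVAL are
assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/types.h>	/* ssize_t (POSIX) */

/*
 * Hypothetical stand-in for a sysfs store() routine.
 * buf must be NUL-terminated; count is the input size including the '\0'.
 *
 * Returns: >0  number of characters used from the input buffer
 *          <0  negative errno value on failure (*min is left unchanged)
 */
static inline ssize_t demo_store_min(int *min, const char *buf, size_t count)
{
	char *end;
	long val;

	assert(min != NULL && buf != NULL);

	if (count == 0)
		return -EINVAL;

	errno = 0;
	val = strtol(buf, &end, 10);
	if (end == buf || errno == ERANGE || val < INT_MIN || val > INT_MAX)
		return -EINVAL;

	*min = (int)val;
	return (ssize_t)count;
}
```

A caller then only needs to test for a negative result, so 0 never has to be
treated as an error.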

> 
> 


Regards,
Nadia



* Re: [RFC][PATCH 1/6] Tunable structure and registration routines
  2007-01-25 16:34       ` Randy Dunlap
@ 2007-01-25 17:01         ` Nadia Derbey
  0 siblings, 0 replies; 25+ messages in thread
From: Nadia Derbey @ 2007-01-25 17:01 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: linux-kernel

Randy Dunlap wrote:
> On Thu, 25 Jan 2007 17:26:31 +0100 Nadia Derbey wrote:
>>>>+Any kernel subsystem that has registered a tunable should call
>>>>+auto_tune_func() as follows:
>>>>+
>>>>++-------------------------+--------------------------------------------+
>>>>+| Step                    | Routine to call                            |
>>>>++-------------------------+--------------------------------------------+
>>>>+| Declaration phase       | DEFINE_TUNABLE(name, values...);           |
>>>>++-------------------------+--------------------------------------------+
>>>>+| Initialization routine  | set_tunable_min_max(name, min, max);       |
>>>>+|                         | set_autotuning_routine(name, routine);     |
>>>>+|                         | register_tunable(&name);                   |
>>>>+| Note: the 1st 2 calls   |                                            |
>>>>+|       are optional      |                                            |
>>>>++-------------------------+--------------------------------------------+
>>>>+| Alloc                   | activate_auto_tuning(AKT_UP, &name);       |
>>>>++-------------------------+--------------------------------------------+
>>>>+| Free                    | activate_auto_tuning(AKT_DOWN, &name);     |
>>>
>>>
>>>So does Free always use AKT_DOWN?  why does it matter?
>>>Seems unneeded and inconsistent.
>>
>>Tuning down is recommended in order to come back to the default tunable 
>>value.
> 
> 
> Let me try to be clearer.  What is Alloc?  and why is AKT_UP
> associated with Alloc and AKT_DOWN associated with Free (whatever
> that means)?

Alloc stands for resource allocation: in a subsystem where resource 
allocation depends on a tunable value, we should tune up that value 
prior to the allocation itself. Let's come back to the ipc subsystem 
example: ipc_addid() is the routine that adds an entry to an ipc array. 
The 1st thing it does (via grow_ary()) is to allocate some more space 
for the ipc array if needed, i.e. if the ipc tunable value has 
increased. That's why the tunable should be tuned up before calling 
ipc_addid().

AKT_DOWN is the reverse operation: we are freeing resources, so the 
tunable has no reason to remain at a large value.
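
The ordering described above (tune up before allocating, tune down after
freeing) can be sketched in userspace as follows. Everything here is
hypothetical: the demo_* names, the fixed step, and the policy of raising the
limit once it is reached and lowering it once a full step is unused are
assumptions made for illustration, not the heuristics of the AKT patches.

```c
#include <assert.h>
#include <stddef.h>

enum demo_dir { DEMO_UP, DEMO_DOWN };

struct demo_tunable {
	int value;	/* current limit */
	int min;	/* floor: the value is never tuned below this */
	int max;	/* ceiling: the value is never tuned above this */
	int step;	/* assumed adjustment granularity */
	int autotune;	/* non-zero if automatic tuning is enabled */
};

struct demo_pool {
	struct demo_tunable limit;
	int in_use;
};

/* Assumed policy: grow when the limit is reached, shrink when a step is unused. */
static inline void demo_auto_tune(enum demo_dir dir, struct demo_tunable *t,
				  int in_use)
{
	if (!t->autotune)
		return;

	if (dir == DEMO_UP && in_use >= t->value) {
		t->value += t->step;
		if (t->value > t->max)
			t->value = t->max;
	} else if (dir == DEMO_DOWN && in_use + t->step <= t->value) {
		t->value -= t->step;
		if (t->value < t->min)
			t->value = t->min;
	}
}

/* Alloc path: tune up first, then check the (possibly raised) limit. */
static inline int demo_alloc(struct demo_pool *p)
{
	demo_auto_tune(DEMO_UP, &p->limit, p->in_use);
	if (p->in_use >= p->limit.value)
		return -1;
	p->in_use++;
	return 0;
}

/* Free path: release the resource, then let the limit come back down. */
static inline void demo_free(struct demo_pool *p)
{
	assert(p != NULL && p->in_use > 0);
	p->in_use--;
	demo_auto_tune(DEMO_DOWN, &p->limit, p->in_use);
}
```

In this sketch the limit returns to its floor once the extra resources have been
freed, which is the "come back to the default value" behaviour argued for in
the thread.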

> 
> 
> 
>>I agree with you: today it has quite no effect, except on the tunable 
>>value. If we take the ipc's example, grow_ary() just returns if the new 
>>tunable value happens to be lower than the previous one.
>>But we can imagine, in the future, that grow_ary could deallocate the 
>>unused memory.
>>+ in that particular case, lowering the tunable value makes the 1st loop 
>>in ipc_addid() shorter.
>>
>>
>>>How does one activate a tunable for downward adjustment?
>>
>>Actually a tunable is activated to be dynamically adjusted (whatever the 
>>direction).
>>But you are giving me an idea for a future enhancement: we can imagine a 
>>tunable that could be allowed to increase only (or decrease only). In 
>>that case, we should move the autotune sysfs attribute into an 'up' and 
>>a 'down' attribute?
> 
> 
> Couldn't the tunable owner just adjust the min value to a new
> (larger) min value, e.g.?

You're completely right: setting the min value to the default one should 
be enough!

> 
> 
> 
>>>>+extern void fork_late_init(void);
>>>
>>>
>>>Looks like the wrong header file for that extern.
>>>
>>>
>>
>>Actually, I wanted the changes to the existing kernel files to be as 
>>small as possible. That's why everything is concentrated, whenever 
>>possible, in the added files.
> 
> 
> I suppose that's OK for review, but it shouldn't be merged that way.
> 
> ---
> ~Randy
> 


Regards,
Nadia



* Re: [RFC][PATCH 6/6] automatic tuning applied to some kernel components
  2007-01-23 14:40     ` Nadia Derbey
@ 2007-02-07 21:18       ` Eric W. Biederman
  2007-02-09 12:27         ` Nadia Derbey
  0 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2007-02-07 21:18 UTC (permalink / raw)
  To: Nadia Derbey; +Cc: Andrew Morton, linux-kernel

Nadia Derbey <Nadia.Derbey@bull.net> writes:

>
> 2) why autotuning:
> There are at least 3 cases where it can be useful
> . for workloads that are known to need a big amount of a given resource type
> (say shared memories), but we don't know what the maximum amount needed will be
> . to solve the case of multiple applications running on a single system, and
> that need the same tunable to be adjusted to fit their needs
> . to make a system correctly react to eventual peak loads for a given resource
> usage, i.e. make it tune up *and down* as needed.

>
> In all these cases, the akt framework will enable the kernel to adapt to
> increasing / decreasing resource consumption:
> 1) avoid allocating "a priori" a big amount of memory that will be used only in
> extreme cases. This is the effect of doing an
> "echo <huge_value> > /proc/sys/kernel/shmmni"
>
> 2) the system will come back to the default values as soon as the peak load is
> over.

At least the ipc ones are supposed to be DOS limits not behavior
modifiers.  I do admit from looking at the code that there are some
consequences of increasing things like shmmni.  However I think we
would be better off with  better data structures and implementations
that remove these consequences than this autotuning of
denial-of-service limits.

i.e. I think you are treating the symptom not the problem.

Does this make sense?

Eric



* Re: [RFC][PATCH 6/6] automatic tuning applied to some kernel components
  2007-02-07 21:18       ` Eric W. Biederman
@ 2007-02-09 12:27         ` Nadia Derbey
  2007-02-09 18:35           ` Eric W. Biederman
  0 siblings, 1 reply; 25+ messages in thread
From: Nadia Derbey @ 2007-02-09 12:27 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Andrew Morton, linux-kernel

Eric W. Biederman wrote:
> Nadia Derbey <Nadia.Derbey@bull.net> writes:
> 
> 
>>2) why autotuning:
>>There are at least 3 cases where it can be useful
>>. for workloads that are known to need a big amount of a given resource type
>>(say shared memories), but we don't know what the maximum amount needed will be
>>. to solve the case of multiple applications running on a single system, and
>>that need the same tunable to be adjusted to fit their needs
>>. to make a system correctly react to eventual peak loads for a given resource
>>usage, i.e. make it tune up *and down* as needed.
> 
> 
>>In all these cases, the akt framework will enable the kernel to adapt to
>>increasing / decreasing resource consumption:
>>1) avoid allocating "a priori" a big amount of memory that will be used only in
>>extreme cases. This is the effect of doing an
>>"echo <huge_value> > /proc/sys/kernel/shmmni"
>>
>>2) the system will come back to the default values as soon as the peak load is
>>over.
> 
> 
> At least the ipc ones are supposed to be DOS limits not behavior
> modifiers.  I do admit from looking at the code that there are some
> consequences of increasing things like shmmni.  However I think we
> would be better off with  better data structures and implementations
> that remove these consequences than this autotuning of
> denial-of-service limits.
> 

I do not fully agree with you:
It is true that some ipc tunables play the role of DoS limits.
But IMHO the *mni ones (semmni, msgmni, shmmni) are used by the ipc 
subsystem to adapt the sizes of its data structures to what is being 
asked for through the tunable value. I think this is how they manage to 
take a new tunable value into account without rebooting the system: 
they reallocate more memory on demand.

Now, what the akt framework does is take advantage of this concept of 
"on demand memory allocation" to replace a user (or a daemon) that would 
periodically check its ipc consumption and manually adjust the ipc 
tunables. Doing this from user space would imply a latency that makes it 
difficult to react fast enough to resources running out.

Now, talking about DoS limits, akt implements them in a sense: each 
tunable managed by akt has 3 attributes exported to sysfs:
. autotune: enable / disable auto-tuning
. min: min value the tunable can ever reach
. max: max value the tunable can ever reach

Letting a sysadmin adjust these min and max values makes it possible to 
refine the dynamic adjustment and to keep the tunable from reaching 
really huge values.
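
A minimal sketch of how such min and max bounds could fence in an automatically
tuned value. The demo_* names are hypothetical, and re-clamping the current
value whenever the bounds change is an assumption; the thread does not say
what AKT does in that case.

```c
#include <assert.h>
#include <stddef.h>

struct demo_bounds {
	int min;	/* lowest value the tunable may ever reach */
	int max;	/* highest value the tunable may ever reach */
	int value;	/* current, automatically tuned value */
};

/* Force v into the closed interval [min, max]. */
static inline int demo_clamp(int v, int min, int max)
{
	return v < min ? min : (v > max ? max : v);
}

/*
 * Install a new (min, max) pair chosen by the sysadmin.
 * Returns 0 on success, -1 if min > max (the old bounds are then kept).
 */
static inline int demo_set_bounds(struct demo_bounds *b, int min, int max)
{
	assert(b != NULL);

	if (min > max)
		return -1;

	b->min = min;
	b->max = max;
	b->value = demo_clamp(b->value, min, max);
	return 0;
}
```

Raising min to the current value makes the tunable effectively increase-only,
which is the alternative to separate 'up' and 'down' attributes suggested
earlier in the thread.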

Regards,
Nadia



* Re: [RFC][PATCH 6/6] automatic tuning applied to some kernel components
  2007-02-09 12:27         ` Nadia Derbey
@ 2007-02-09 18:35           ` Eric W. Biederman
  2007-02-13  9:06             ` Nadia Derbey
  0 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2007-02-09 18:35 UTC (permalink / raw)
  To: Nadia Derbey; +Cc: Andrew Morton, linux-kernel

Nadia Derbey <Nadia.Derbey@bull.net> writes:

> I do not fully agree with you:
> It is true that some ipc tunables play the role of DoS limits.
> But IMHO the *mni ones (semmni, msgmni, shmmni) are used by the ipc subsystem to
> adapt its data structures sizes to what is being asked for through the tunable
> value. I think this is how they manage to take into account a new tunable value
> without a need for rebooting the system: reallocate some more memory on demand.

Yes, they do.  However, if you are constantly having to play with shmmni or
the others, that is the problem, and the array should be replaced with
a hash table or some form of radix tree, so it changes its size to fit
the need.  Once that is done, shmmni does become a simple DOS limit.

So what I'm asking is: please fix the problem at the source, don't plaster over
it.
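
To illustrate the point, here is a small userspace sketch of an id table whose
memory use follows the number of entries, so that the limit is only ever
compared and never sizes anything. It uses a fixed-bucket chained hash table
as a stand-in; all demo_* names are hypothetical and this is not the kernel's
ipc code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_BUCKETS 64

struct demo_entry {
	int id;
	struct demo_entry *next;
};

struct demo_ids {
	struct demo_entry *bucket[DEMO_BUCKETS];
	int in_use;
	int limit;	/* plays the role of shmmni: a pure DoS limit */
	int next_id;	/* ids are handed out sequentially (overflow ignored) */
};

static inline void demo_ids_init(struct demo_ids *ids, int limit)
{
	memset(ids, 0, sizeof(*ids));
	ids->limit = limit;
}

/* Returns the new id, or -1 if the limit is reached or memory is exhausted. */
static inline int demo_ids_add(struct demo_ids *ids)
{
	struct demo_entry *e;

	if (ids->in_use >= ids->limit)	/* the only place the limit is used */
		return -1;

	e = malloc(sizeof(*e));
	if (!e)
		return -1;

	e->id = ids->next_id++;
	e->next = ids->bucket[e->id % DEMO_BUCKETS];
	ids->bucket[e->id % DEMO_BUCKETS] = e;
	ids->in_use++;
	return e->id;
}

/* Returns 0 if the id was found and freed, -1 otherwise. */
static inline int demo_ids_remove(struct demo_ids *ids, int id)
{
	struct demo_entry **pp;

	if (id < 0)
		return -1;

	for (pp = &ids->bucket[id % DEMO_BUCKETS]; *pp; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			struct demo_entry *victim = *pp;

			*pp = victim->next;
			free(victim);
			ids->in_use--;
			return 0;
		}
	}
	return -1;
}
```

Raising the limit here costs nothing: no array is reallocated, and only the
comparison in demo_ids_add() changes its outcome.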

> Now, what the akt framework does, is that it takes advantage of this concept of
> "on demand memory allocation" to replace a user (or a daemon) that would
> periodically check its ipcs consumptions and manually adjust the ipcs tunables:
> Doing this from the user space would imply a latency that makes it difficult to
> react fast enough to resources running out.

There may be some sense in this but you haven't found something that inherently
needs tuning.  You have found something that has a poor data structure,
and can more easily be fixed by simply fixing the data structure.

I'm guessing that we have a disconnect somewhere with kernel developers thinking
shm is an old legacy api and doing minimal maintenance, expecting serious users
to use tmpfs or hugetlbfs and users not used to the old stuff using the SYSV apis.

If we have serious users it makes sense to fix these things properly, in a backwards
compatible way, so existing users and applications don't need to be changed.


> Now, talking about DoS limits, akt implements them in a sense: each tunable
> managed by akt has 3 attributes exported to sysfs:
> . autotune: enable / disable auto-tuning
> . min: min value the tunable can ever reach
> . max: max value the tunable can ever reach
>
> Enabling a sysadmin to play with these min and max values makes it possible to
> refine the dynamic adjustment, and avoid that the tunable reaches really huge
> values.

This just shifts the location where you have your DOS limit and could
be done transparently under the covers with shmmni being the maximum.
If we can't get users to switch to something that doesn't need tuning
that has been available for years, I doubt even more user tunables
that tune the tunables will make the situation any better.  I suspect
your changes would just confuse the landscape even more and give us
more weird legacy cases to support that we can never get rid of.

Eric


* Re: [RFC][PATCH 6/6] automatic tuning applied to some kernel components
  2007-02-09 18:35           ` Eric W. Biederman
@ 2007-02-13  9:06             ` Nadia Derbey
  2007-02-13 10:10               ` Eric W. Biederman
  0 siblings, 1 reply; 25+ messages in thread
From: Nadia Derbey @ 2007-02-13  9:06 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Andrew Morton, linux-kernel

Eric W. Biederman wrote:
> Nadia Derbey <Nadia.Derbey@bull.net> writes:
> 
> 
>>I do not fully agree with you:
>>It is true that some ipc tunables play the role of DoS limits.
>>But IMHO the *mni ones (semmni, msgmni, shmmni) are used by the ipc subsystem to
>>adapt its data structures sizes to what is being asked for through the tunable
>>value. I think this is how they manage to take into account a new tunable value
>>without a need for rebooting the system: reallocate some more memory on demand.
> 
> 
> Yes, they do.  However if you are constantly having to play with shmmni or
> the others that is the problem and the array should be replaced with
> a hash table or some form of radix tree, so it changes its size to fit
> the need.  Once that is done, shmmni does become a simple DOS limit.
> 
> So what I'm asking is please fix the problem at the source don't plaster over
> it.
> 
> 
>>Now, what the akt framework does, is that it takes advantage of this concept of
>>"on demand memory allocation" to replace a user (or a daemon) that would
>>periodically check its ipcs consumptions and manually adjust the ipcs tunables:
>>Doing this from the user space would imply a latency that makes it difficult to
>>react fast enough to resources running out.
> 
> 
> There may be some sense in this but you haven't found something that inherently
> needs tuning.  You have found something that has a poor data structure,
> and can more easily be fixed by simply fixing the data structure.

So, should I understand from this that automatic tuning and the AKT 
framework itself would make sense, provided that I find the right tunables 
to apply it to?
Actually, I don't know if you had the opportunity to read all the patches, 
but there are 2 other tunables AKT is proposed to be applied to:
. max_threads, the tunable limit on nr_threads
. max_files, the tunable limit on nr_files

Regards,
Nadia



* Re: [RFC][PATCH 6/6] automatic tuning applied to some kernel components
  2007-02-13  9:06             ` Nadia Derbey
@ 2007-02-13 10:10               ` Eric W. Biederman
  2007-02-15  7:07                 ` Nadia Derbey
  0 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2007-02-13 10:10 UTC (permalink / raw)
  To: Nadia Derbey; +Cc: Andrew Morton, linux-kernel

Nadia Derbey <Nadia.Derbey@bull.net> writes:

> So, should I understand from this that automatic tuning and the AKT framework
> itself would make sense, given that I find the right tunables it should be
> applied to?

Sort of.  The concept of things tuning themselves automatically makes
a lot of sense.

I'm not at all certain about tunables being exported just to be hidden
again.  Ideally you don't even want the fact that these things are
varying visible to the user.

So I think that if you can find a good example that cannot be solved
better another way, you can build a case for your framework.
Currently I doubt you can find such a case.

> Actually, I don't know if you had the opportunity to read all the patches, but
> there are 2 other tunables AKT is proposed to be applied to:
> . max_threads, the tunable limit on nr_threads
> . max_files, the tunable limit on nr_files

At a quick glance max_threads and max_files appear even more to be
DOS limits and not tunables and even less applicable to needing any
tuning at all.  My gut feel is at worst these values may need a little
better boot time defaults but otherwise they should be good.

Eric


* Re: [RFC][PATCH 6/6] automatic tuning applied to some kernel components
  2007-02-13 10:10               ` Eric W. Biederman
@ 2007-02-15  7:07                 ` Nadia Derbey
  2007-02-15  7:49                   ` Eric W. Biederman
  0 siblings, 1 reply; 25+ messages in thread
From: Nadia Derbey @ 2007-02-15  7:07 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Andrew Morton, linux-kernel

Eric W. Biederman wrote:
> Nadia Derbey <Nadia.Derbey@bull.net> writes:
> 
> 
>>So, should I understand from this that automatic tuning and the AKT framework
>>itself would make sense, given that I find the right tunables it should be
>>applied to?
> 
> 
> Sort of.  The concept of things tuning themselves automatically makes
> a lot of sense.
> 
> I'm not at all certain about tunables being exported just to be hidden
> again.  Ideally you don't even want the fact that these things are
> varying visible to the user.
> 
> So I think that if you can find a good example that cannot be solved
> better another way, you can build a case for your framework.
> Currently I doubt you can find such a case.
> 
> 
>>Actually, I don't know if you had the opportunity to read all the patches, but
>>there are 2 other tunables AKT is proposed to be applied to:
>>. max_threads, the tunable limit on nr_threads
>>. max_files, the tunable limit on nr_files
> 
> 
> At a quick glance max_threads and max_files appear even more to be
> DOS limits and not tunables and even less applicable to needing any
> tuning at all.  My gut feel is at worst these values may need a little
> better boot time defaults but otherwise they should be good.
> 
But what do you do with Oracle, which asks for maxfiles to be set to 
0x10000, while the default value might be enough for a system that's not 
running Oracle?
I'm afraid that by giving boot time values to the max_* tunables we will 
lose all the benefits of /proc (or /sys): it is impossible to 
anticipate what an OS will be used for. So allowing such things to be 
changed without having to reboot the machine is, in my mind, quite a 
powerful feature we should keep taking advantage of.

Regards,
Nadia


* Re: [RFC][PATCH 6/6] automatic tuning applied to some kernel components
  2007-02-15  7:07                 ` Nadia Derbey
@ 2007-02-15  7:49                   ` Eric W. Biederman
  2007-02-15  8:25                     ` Nadia Derbey
  0 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2007-02-15  7:49 UTC (permalink / raw)
  To: Nadia Derbey; +Cc: Andrew Morton, linux-kernel

Nadia Derbey <Nadia.Derbey@bull.net> writes:

> But, what do you do with Oracle that's asking maxfiles to be set to 0x10000,
> while the default value might be enough for a system that's not running Oracle.
> I'm afraid that giving boot time values to the max_* tunables we will lose all
> the benefits from /proc (or /sys): it is impossible to anticipate what an OS
> will be used for. So allowing such things to be changed without having to reboot
> the machine is in my mind quite a powerful feature we should keep taking
> advantage of.

I'm not saying remove user space's ability to set the
denial-of-service limits.  I'm saying if they need to be frequently
changed we need to update the default so they are higher by default.

There really is no cost in moving those values up and down: it is just
an arbitrary integer used in comparisons.  But if we can make a good
guess that still catches runaway programs before they kill the machine
but also allows more programs to work out of the box we are in better
shape.

Eric


* Re: [RFC][PATCH 6/6] automatic tuning applied to some kernel components
  2007-02-15  7:49                   ` Eric W. Biederman
@ 2007-02-15  8:25                     ` Nadia Derbey
  0 siblings, 0 replies; 25+ messages in thread
From: Nadia Derbey @ 2007-02-15  8:25 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Andrew Morton, linux-kernel

Eric W. Biederman wrote:
> Nadia Derbey <Nadia.Derbey@bull.net> writes:
> 
> 
>>But, what do you do with Oracle that's asking maxfiles to be set to 0x10000,
>>while the default value might be enough for a system that's not running Oracle.
>>I'm afraid that giving boot time values to the max_* tunables we will lose all
>>the benefits from /proc (or /sys): it is impossible to anticipate what an OS
>>will be used for. So allowing such things to be changed without having to reboot
>>the machine is in my mind quite a powerful feature we should keep taking
>>advantage of.
> 
> 
> I'm not saying remove user space's ability to set the
> denial-of-service limits.  I'm saying if they need to be frequently
> changed we need to update the default so they are higher by default.
> 
> There really is no cost in moving those values up and down: it is just
> an arbitrary integer used in comparisons.  But if we can make a good
> guess that still catches runaway programs before they kill the machine
> but also allows more programs to work out of the box we are in better
> shape.
> 
OK, happy to see we are on the same wavelength (and sorry for 
misunderstanding what you were saying ;-) )

Regards,
Nadia


* Re: [RFC][PATCH 6/6] automatic tuning applied to some kernel components 
@ 2007-02-14 13:56 Al Boldi
  0 siblings, 0 replies; 25+ messages in thread
From: Al Boldi @ 2007-02-14 13:56 UTC (permalink / raw)
  To: linux-kernel

ebiederm wrote:
> At a quick glance max_threads and max_files appear even more to be
> DOS limits and not tunables and even less applicable to needing any
> tuning at all.  My gut feel is at worst these values may need a little
> better boot time defaults but otherwise they should be good.

Autotuning max_threads and max_files by using some sort of rate-limiter could 
possibly be more useful than any kind of fixed default.


Thanks!

--
Al



Thread overview: 25+ messages
2007-01-16  6:15 [RFC][PATCH 0/6] Automatice kernel tunables (AKT) Nadia.Derbey
2007-01-16  6:15 ` [RFC][PATCH 1/6] Tunable structure and registration routines Nadia.Derbey
2007-01-25  0:32   ` Randy Dunlap
2007-01-25 16:26     ` Nadia Derbey
2007-01-25 16:34       ` Randy Dunlap
2007-01-25 17:01         ` Nadia Derbey
2007-01-16  6:15 ` [RFC][PATCH 2/6] auto_tuning activation Nadia.Derbey
2007-01-16  6:15 ` [RFC][PATCH 3/6] tunables associated kobjects Nadia.Derbey
2007-01-16  6:15 ` [RFC][PATCH 4/6] min and max kobjects Nadia.Derbey
2007-01-24 22:41   ` Randy Dunlap
2007-01-25 16:34     ` Nadia Derbey
2007-01-16  6:15 ` [RFC][PATCH 5/6] per namespace tunables Nadia.Derbey
2007-01-24 22:41   ` Randy Dunlap
2007-01-16  6:15 ` [RFC][PATCH 6/6] automatic tuning applied to some kernel components Nadia.Derbey
2007-01-22 19:56   ` Andrew Morton
2007-01-23 14:40     ` Nadia Derbey
2007-02-07 21:18       ` Eric W. Biederman
2007-02-09 12:27         ` Nadia Derbey
2007-02-09 18:35           ` Eric W. Biederman
2007-02-13  9:06             ` Nadia Derbey
2007-02-13 10:10               ` Eric W. Biederman
2007-02-15  7:07                 ` Nadia Derbey
2007-02-15  7:49                   ` Eric W. Biederman
2007-02-15  8:25                     ` Nadia Derbey
2007-02-14 13:56 Al Boldi
