LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
@ 2004-04-26 22:04 Erik Jacobson
  2004-04-26 23:39 ` Chris Wright
                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Erik Jacobson @ 2004-04-26 22:04 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1517 bytes --]

Here, I am proposing Process Aggregates support for the 2.6 kernel.

What is Process Aggregates (PAGG)?
----------------------------------
PAGG provides for the implementation of arbitrary process groups in Linux.
It is a building block for kernel modules that can group processes
together into a single set for specific purposes beyond the traditional
process groups.


What types of things could make use of PAGG?
--------------------------------------------
Some example uses for PAGG include:

 - System accounting - the CSA module makes use of it (see below)

 - Clustering - Keeping track of processes for clustering

 - NUMA placement - Making sure groups of processes run on specific CPUs or
     chunks of memory

 - Batch Queuing systems - Ensure specific jobs run in their designated
     places.


More information about PAGG
---------------------------
There is a web page for the PAGG project at SGI:

http://oss.sgi.com/projects/pagg/

Also, the patch includes the file Documentation/pagg.txt.  You may look there
for detailed information.

One project that makes extensive use of PAGG is Comprehensive System
Accounting (CSA).  Details on that can be found here:
http://oss.sgi.com/projects/csa/


Info on the attached patch
--------------------------
This patch is made to apply to the 2.6.5 kernel.  Patches for some other
versions of the kernel (including 2.4 and 2.6 versions) may be found on
the PAGG web site above.

--
Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota

[-- Attachment #2: Type: TEXT/PLAIN, Size: 34460 bytes --]

diff -Naru 2.6-patch/Documentation/pagg.txt 2.6pagg-patch/Documentation/pagg.txt
--- 2.6-patch/Documentation/pagg.txt	1969-12-31 18:00:00.000000000 -0600
+++ 2.6pagg-patch/Documentation/pagg.txt	2004-04-26 14:36:05.000000000 -0500
@@ -0,0 +1,162 @@
+Linux Process Aggregates (PAGG)
+-------------------------------
+
+1. Description
+
+The process aggregates infrastructure, or PAGG, provides a generalized
+mechanism for providing arbitrary process groups in Linux.  PAGG consists
+of a series of functions for registering and unregistering support
+for new types of process aggregation containers with the kernel.
+This is similar to the support currently provided within Linux that
+allows for dynamic support of filesystems, block and character devices,
+symbol tables, network devices, serial devices, and execution domains.
+This implementation of PAGG provides developers the basic hooks necessary
+to implement kernel modules for specific process containers, such as
+the job container.
+
+The do_fork function in the kernel was altered to support PAGG.  If a
+process is attached to any PAGG containers and subsequently forks a
+child process, the child process will also be attached to the same PAGG
+containers.  The PAGG containers involved during the fork are notified
+that a new process has been attached.  The notification is accomplished
+via a callback function provided by the PAGG module.
+
+The do_exit function in the kernel has also been altered.  If a process
+is attached to any PAGG containers and that process is exiting, the PAGG
+containers are notified that a process has detached from the container.
+The notification is accomplished via a callback function provided by
+the PAGG module.
+
+The sys_execve function has been modified to support an optional callout
+that can be run when a process in a pagg list does an exec.  It can be 
+used, for example, by other kernel modules that wish to do advanced CPU
+placement on multi-processor systems (just one example).
+
+Additional details concerning this implementation of the process aggregates
+infrastructure are described in the sections that follow.
+
+
+2.  Kernel Changes
+
+This section describe the files and data strcutrues that are involved in this
+implementation of PAGG.  Both modified as well as new files and data
+structures are discussed.
+
+3.1. Modified Files
+
+The following files were modified to implement PAGG:
+
+-  include/linux/init_task.h
+-  include/linux/sched.h
+-  init/Config.help
+-  init/Config.in
+-  kernel/Makefile
+-  kernel/exit.c
+-  kernel/fork.c
+-  fs/exec.c
+-  Documentation/Configure.help
+-  init/Kconfig
+
+This implementation of PAGG supports the i386 and ia64 architecture.
+
+2.2. New Files
+
+The following files were added to implement PAGG:
+
+-  Documentation/pagg.txt
+-  include/linux/pagg.h
+-  kernel/pagg.c
+
+
+2.3. Modified Data Structures
+
+The following existing data structures were altered to implement PAGG.
+
+-  struct task_struct:          (include/linux/sched.h)
+     struct pagg_list  pagg_list;     /* List of pagg containers */
+
+This new member in task_struct, pagg_list, points to the list of pagg
+containers to which the process is currently attached.
+
+2.4. New Data Structures
+
+The following new data structures were introduced to implement PAGG.
+
+-  struct pagg:          (include/linux/pagg.h)
+     struct pagg_hook *hook		     /* Ptr to pagg module entry */
+     void 		*data;               /* Task specific data */
+     struct list_head   entry;	   	     /* List connection */	
+     
+-  struct pagg_hook:        (include/linux/pagg.h)
+     struct module *module;                  /* Ptr to PAGG module */
+     char *name;                             /* PAGG hook name - restricted
+					      * to 32 characters.  */
+     int  (*attach)(struct task_struct *, /* Function to attach */
+               struct pagg *,
+               void *);
+     int  (*detach)(struct task_struct *, /* Function to detach */
+               struct pagg *);
+     int  (*init)(struct task_struct *,   /* Load task init func. */
+		     struct pagg *);
+     void  *data;                            /* Module specific data */
+     struct list_head entry;		     /* List connection */
+     void    (*exec)(struct task_struct *, struct pagg *); /* exec func ptr */
+
+The pagg structure provides the process' reference to the PAGG
+containers provided by the PAGG modules.  The attach function pointer
+is the function used to notify the referenced PAGG container that the
+process is being attached.  The detach function pointer is used to notify
+the referenced PAGG container that the process is exiting or otherwise
+detaching from the container.  The exec function pointer is used when a
+process in the pagg container exec's a new process.  This is optional and
+may be set to NULL if it is not needed by the pagg module.
+
+The pagg_hook structure provides the reference to the module that
+implements a type of PAGG container.  In addition to the function pointers
+described concerning pagg, this structure provides an addition
+function pointer.  The init function pointer is currently not used
+but will be available in the future.  Future use of the init function
+will be optional and will used to attach currently running processes to
+a default PAGG container when a PAGG module is loaded on a running system.
+
+
+2.5. Modified Functions
+
+The following functions were changed to implement PAGG:
+
+-  do_fork:     (kernel/fork.c)
+     /* execute the following pseudocode before add to run-queue  */  
+     If parent process pagg list is not empty
+          Call attach_pagg_list function with child task_struct as argument
+-  do_exit:     (kernel/exit.c)
+     /* execute the following pseudocode prior to schedule call */
+     If current process pagg list is not empty
+               Call detach_pagg_list function with current task_struct 
+-  sys_execve:  (fs/exec.c)
+     /* When a process in a pagg exec's, an optional callout can be run.  This
+        is implemented with an optional function pointer in the pagg_hook.  */
+
+2.6 New Functions
+
+The following new functions were added to implement PAGG:
+
+-  int  register_pagg_hook(struct pagg_hook *);  (kernel/pagg.c)
+     Add module entry into table of pagg modules
+-  int unregister_pagg_hook(struct pagg_hook *); (kernel/pagg.c)
+     Find module entry in list of pagg modules
+          Foreach task
+		If task is attached to this pagg module
+			return error
+	  If no tasks are referencing this module
+		remove module entry from list of pagg modules
+-  int attach_pagg_list(struct task_struct *);       (kernel/pagg.c)
+     /* Assumed task pagg list pts to paggs that it attaches to */
+     While another pagg container reference
+          Make copy of pagg container reference & insert into new list
+          Attach task to pagg container using new container reference
+          Get next pagg container reference
+     Make task pagg list use the new pagg list
+-  int detach_pagg_list(struct task_struct *);       (kernel/pagg.c)
+     While another pagg container reference
+          Detach task from pagg container using reference
+
diff -Naru 2.6-patch/fs/exec.c 2.6pagg-patch/fs/exec.c
--- 2.6-patch/fs/exec.c	2004-03-16 14:13:30.000000000 -0600
+++ 2.6pagg-patch/fs/exec.c	2004-04-26 12:23:02.000000000 -0500
@@ -46,6 +46,7 @@
 #include <linux/security.h>
 #include <linux/syscalls.h>
 #include <linux/rmap-locking.h>
+#include <linux/pagg.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgalloc.h>
@@ -1151,6 +1152,7 @@
 	retval = search_binary_handler(&bprm,regs);
 	if (retval >= 0) {
 		free_arg_pages(&bprm);
+		exec_pagg_list_chk(current);
 
 		/* execve success */
 		security_bprm_free(&bprm);
diff -Naru 2.6-patch/include/linux/init_task.h 2.6pagg-patch/include/linux/init_task.h
--- 2.6-patch/include/linux/init_task.h	2004-03-16 14:13:30.000000000 -0600
+++ 2.6pagg-patch/include/linux/init_task.h	2004-04-13 21:42:35.000000000 -0500
@@ -2,6 +2,7 @@
 #define _LINUX__INIT_TASK_H
 
 #include <linux/file.h>
+#include <linux/pagg.h>
 
 #define INIT_FILES \
 { 							\
@@ -112,6 +113,7 @@
 	.proc_lock	= SPIN_LOCK_UNLOCKED,				\
 	.switch_lock	= SPIN_LOCK_UNLOCKED,				\
 	.journal_info	= NULL,						\
+	INIT_TASK_PAGG(tsk)						\
 }
 
 
diff -Naru 2.6-patch/include/linux/pagg.h 2.6pagg-patch/include/linux/pagg.h
--- 2.6-patch/include/linux/pagg.h	1969-12-31 18:00:00.000000000 -0600
+++ 2.6pagg-patch/include/linux/pagg.h	2004-04-26 12:23:02.000000000 -0500
@@ -0,0 +1,249 @@
+/* 
+ * PAGG (Process Aggregates) interface
+ *
+ * 
+ * Copyright (c) 2000-2002, 2004 Silicon Graphics, Inc.  All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ *
+ * Contact information:  Silicon Graphics, Inc., 1500 Crittenden Lane,
+ * Mountain View, CA  94043, or:
+ * 
+ * http://www.sgi.com 
+ * 
+ * For further information regarding this notice, see: 
+ * 
+ * http://oss.sgi.com/projects/GenInfo/NoticeExplan
+ */
+
+/*
+ * Description:	This file, include/linux/pagg.h, contains the data
+ *              structure definitions and function prototypes used to
+ *              implement process aggrefates (paggs). Paggs provides a
+ *              generalized was to implement process groupings or
+ *              containers.  Modules use these functions to register
+ *              with the kernel as providers of process aggregation
+ *              containers. The pagg data structures define the
+ *              callback functions and data access pointers back into
+ *              the pagg modules.
+ */
+
+#ifndef _PAGG_H
+#define _PAGG_H
+
+#include <linux/config.h>
+
+/*
+ * Used by task_struct to manage a list of pagg attachments for the task.
+ * The list will be used to hold references to pagg structures.  
+ * These structures define the pagg attachments for the task.  
+ *
+ * STRUCT MEMBERS:
+ * 	list:		The list head pointer for the list of pagg
+ * 			structures.
+ * 	sem:		The semaphore used  to lock the list.
+ */
+struct pagg_list {
+	struct list_head	head;	
+	struct rw_semaphore	sem;
+};
+
+#ifdef CONFIG_PAGG
+
+#define PAGG_NAMELN	32		/* Max chars in PAGG module name */
+
+
+/* Macro used to initialize a pagg_list structure after declaration 
+ *
+ * Macro arguments:
+ * 	l:	the pagg list (struct pagg_list)
+ */
+#define INIT_PAGG_LIST(l)						\
+do {									\
+	INIT_LIST_HEAD(l.head);						\
+	init_rwsem(l.sem);						\
+} while(0)
+	
+
+/*
+ * Used by task_struct to manage list of pagg attachments for the process.  
+ * Each pagg provides the link between the process and the 
+ * correct pagg container.
+ *
+ * STRUCT MEMBERS:
+ *     hook:	Reference to pagg module structure.  That struct
+ *     		holds the name key and function pointers.
+ *     data:	Opaque data pointer - defined by pagg modules.
+ *     entry:	List pointers
+ */
+struct pagg {
+       struct pagg_hook	*hook;
+       void		*data;
+       struct list_head	entry;
+};
+
+/*
+ * Used by pagg modules to define the callback functions into the 
+ * module.
+ *
+ * STRUCT MEMBERS:
+ *     name:           The name of the pagg container type provided by
+ *                     the module. This will be set by the pagg module.
+ *     attach:         Function pointer to function used when attaching
+ *                     a process to the pagg container referenced by 
+ *                     this struct.
+ *     detach:         Function pointer to function used when detaching
+ *                     a process to the pagg container referenced by 
+ *                     this struct.
+ *     init:           Function pointer to initialization function.  This
+ *                     function is used when the module is loaded to attach
+ *                     existing processes to a default container as defined by
+ *                     the pagg module. This is optional and may be set to 
+ *                     NULL if it is not needed by the pagg module.
+ *     data:           Opaque data pointer - defined by pagg modules.
+ *     module:         Pointer to kernel module struct.  Used to increment & 
+ *                     decrement the use count for the module.
+ *     entry:	       List pointers
+ *     exec:           Function pointer to function used when a process
+ *                     in the pagg container exec's a new process. This
+ *                     is optional and may be set to NULL if it is not 
+ *                     needed by the pagg module.
+ */
+struct pagg_hook {
+       struct module	*module;
+       char		*name;	/* Name Key - restricted to 32 characters */
+       int		(*attach)(struct task_struct *, struct pagg *, void*);
+       int		(*detach)(struct task_struct *, struct pagg *);
+       int		(*init)(struct task_struct *, struct pagg *);
+       void		*data;	/* Opaque module specific data */
+       struct list_head	entry;	/* List pointers */
+       void		(*exec)(struct task_struct *, struct pagg *);
+};
+
+
+/* Kernel service functions for providing PAGG support */
+extern struct pagg *get_pagg(struct task_struct *task, char *key);
+extern struct pagg *alloc_pagg(struct task_struct *task, 
+				      struct pagg_hook *pt);
+extern void free_pagg(struct pagg *pagg);
+extern int register_pagg_hook(struct pagg_hook *pt_new);
+extern int unregister_pagg_hook(struct pagg_hook *pt_old);
+extern int attach_pagg_list(struct task_struct *to_task, 
+					struct task_struct *from_task);
+extern int detach_pagg_list(struct task_struct *task);
+extern int exec_pagg_list(struct task_struct *task);
+
+/* 
+ *  Macro used when a child process must inherit attachment to pagg 
+ *  containers from the parent.
+ *
+ *  Arguments:
+ *	ct:	child (struct task_struct *)
+ *	pt:	parent (struct task_struct *)
+ *	cf:	clone_flags
+ */
+#define attach_pagg_list_chk(ct, pt)					\
+do {									\
+	INIT_PAGG_LIST(&ct->pagg_list);					\
+	if (!list_empty(&pt->pagg_list.head)) {				\
+		if (attach_pagg_list(ct, pt) != 0)			\
+			goto bad_fork_cleanup;				\
+	}								\
+} while(0)
+
+/* 
+ * Macro used when a process must detach from pagg containers to which it
+ * is currenlty a member.
+ *
+ * Aguments:
+ * 	t:	task (struct task_struct *)
+ */
+#define detach_pagg_list_chk(t)					\
+do {									\
+	if (!list_empty(&t->pagg_list.head)) {				\
+		detach_pagg_list(t);					\
+	}								\
+} while(0)
+
+
+/* 
+ * Macro used when a process exec's.
+ *
+ * Aguments:
+ * 	t:	task (struct task_struct *)
+ */
+#define exec_pagg_list_chk(t)						\
+do {									\
+	if (!list_empty(&t->pagg_list.head)) {				\
+		exec_pagg_list(t);					\
+	}								\
+} while(0)
+
+
+/*
+ * Utility macros for pagg handling and locking pagg lists.
+ *
+ * Arguments:
+ * 	t:	task  (struct task_list *)
+ * 	p:	pagg  (struct pagg *)
+ * 	d:	data  (ptr to data maintained by the 
+ * 			pagg module - converts to void ptr)
+ */
+	/* Invoke module detach callback for the pagg & task */
+#define detach_pagg(t, p)		p->hook->detach(t, p)
+	/* Invoke module attach callback for the pagg & task */
+#define attach_pagg(t, p, d)  		p->hook->attach(t, p, (void *)d)
+	/* Allows optional callout at exec */
+#define exec_pagg(t, p)  		do {				\
+						if (p->hook->exec)	\
+						    p->hook->exec(t, p);\
+					} while(0)
+	/* Allows module to set data item for pagg */
+#define set_pagg(p, d)   		p->data = (void *)d
+	/* Down the read semaphore for the task's pagg_list */
+#define read_lock_pagg_list(t)		down_read(&t->pagg_list.sem)
+	/* Up the read semaphore for the task's pagg_list */
+#define read_unlock_pagg_list(t) 	up_read(&t->pagg_list.sem)
+	/* Down the write semaphore for the task's pagg_list */
+#define write_lock_pagg_list(t)		down_write(&t->pagg_list.sem)
+	/* Up the write semaphore for the task's pagg_list */
+#define write_unlock_pagg_list(t) 	up_write(&t->pagg_list.sem)
+
+/*
+ * Marco Used in INIT_TASK to set the head and sem of pagg_list.
+ * If CONFIG_PAGG is off, it is defined as an empty macro below.
+ */
+#define INIT_TASK_PAGG(tsk) \
+	.pagg_list  = {                  \
+	.head = LIST_HEAD_INIT(tsk.pagg_list.head),     \
+	.sem  = __RWSEM_INITIALIZER(tsk.pagg_list.sem)  \
+	}, \
+
+#else  /* CONFIG_PAGG */
+
+/* 
+ * Replacement macros used when PAGG (Process Aggregates) support is not
+ * compiled into the kernel.
+ */
+#define INIT_TASK_PAGG(tsk)
+#define INIT_PAGG_LIST(l) do { } while(0)
+#define attach_pagg_list_chk(ct, pt)  do { } while(0)
+#define detach_pagg_list_chk(t)  do {  } while(0)     
+#define exec_pagg_list_chk(t)  do {  } while(0)     
+
+#endif /* CONFIG_PAGG */
+
+#endif /* _PAGG_H */
diff -Naru 2.6-patch/include/linux/sched.h 2.6pagg-patch/include/linux/sched.h
--- 2.6-patch/include/linux/sched.h	2004-04-05 14:18:05.000000000 -0500
+++ 2.6pagg-patch/include/linux/sched.h	2004-04-14 07:52:16.000000000 -0500
@@ -29,6 +29,7 @@
 #include <linux/completion.h>
 #include <linux/pid.h>
 #include <linux/percpu.h>
+#include <linux/pagg.h>
 
 struct exec_domain;
 
@@ -488,11 +489,15 @@
 
 	struct dentry *proc_dentry;
 	struct backing_dev_info *backing_dev_info;
-
 	struct io_context *io_context;
 
 	unsigned long ptrace_message;
 	siginfo_t *last_siginfo; /* For ptrace use.  */
+
+#if defined(CONFIG_PAGG)
+/* List of pagg (process aggregate) attachments */
+	struct pagg_list pagg_list;
+#endif
 };
 
 static inline pid_t process_group(struct task_struct *tsk)
diff -Naru 2.6-patch/init/Kconfig 2.6pagg-patch/init/Kconfig
--- 2.6-patch/init/Kconfig	2004-03-16 14:13:30.000000000 -0600
+++ 2.6pagg-patch/init/Kconfig	2004-04-26 14:25:25.000000000 -0500
@@ -104,6 +104,14 @@
 	  up to the user level program to do useful things with this
 	  information.  This is generally a good idea, so say Y.
 
+config PAGG
+	bool "Support for process aggregates (PAGGs)"
+	help
+     Say Y here if you will be loading modules which provide support
+     for process aggregate containers.  Examples of such modules include the
+     Linux Jobs module and the Linux Array Sessions module.  If you will not 
+     be using such modules, say N.
+
 config SYSCTL
 	bool "Sysctl support"
 	---help---
diff -Naru 2.6-patch/kernel/exit.c 2.6pagg-patch/kernel/exit.c
--- 2.6-patch/kernel/exit.c	2004-04-05 14:18:05.000000000 -0500
+++ 2.6pagg-patch/kernel/exit.c	2004-04-14 07:52:16.000000000 -0500
@@ -22,7 +22,7 @@
 #include <linux/profile.h>
 #include <linux/mount.h>
 #include <linux/proc_fs.h>
-
+#include <linux/pagg.h>
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
 #include <asm/mmu_context.h>
@@ -788,6 +788,9 @@
 		module_put(tsk->binfmt->module);
 
 	tsk->exit_code = code;
+
+	detach_pagg_list_chk(tsk);
+
 	exit_notify(tsk);
 	schedule();
 	BUG();
diff -Naru 2.6-patch/kernel/fork.c 2.6pagg-patch/kernel/fork.c
--- 2.6-patch/kernel/fork.c	2004-03-16 14:13:30.000000000 -0600
+++ 2.6pagg-patch/kernel/fork.c	2004-04-13 21:42:35.000000000 -0500
@@ -31,7 +31,7 @@
 #include <linux/futex.h>
 #include <linux/ptrace.h>
 #include <linux/mount.h>
-
+#include <linux/pagg.h>
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
 #include <asm/uaccess.h>
@@ -232,6 +232,9 @@
 
 	init_task.rlim[RLIMIT_NPROC].rlim_cur = max_threads/2;
 	init_task.rlim[RLIMIT_NPROC].rlim_max = max_threads/2;
+
+	/* Initialize the pagg list in pid 0 before it can clone itself. */
+	INIT_PAGG_LIST(&current->pagg_list);
 }
 
 static struct task_struct *dup_task_struct(struct task_struct *orig)
@@ -985,6 +988,12 @@
 	   
 	p->parent_exec_id = p->self_exec_id;
 
+	/*
+	 * call pagg modules to properly attach new process to the same
+	 * process aggregate containers as the parent process.
+	 */
+	attach_pagg_list_chk(p, current);
+
 	/* ok, now we should be set up.. */
 	p->exit_signal = (clone_flags & CLONE_THREAD) ? -1 : (clone_flags & CSIGNAL);
 	p->pdeath_signal = 0;
diff -Naru 2.6-patch/kernel/Makefile 2.6pagg-patch/kernel/Makefile
--- 2.6-patch/kernel/Makefile	2004-03-16 14:13:30.000000000 -0600
+++ 2.6pagg-patch/kernel/Makefile	2004-04-13 21:42:35.000000000 -0500
@@ -7,7 +7,7 @@
 	    sysctl.o capability.o ptrace.o timer.o user.o \
 	    signal.o sys.o kmod.o workqueue.o pid.o \
 	    rcupdate.o intermodule.o extable.o params.o posix-timers.o \
-	    kthread.o
+	    kthread.o pagg.o
 
 obj-$(CONFIG_FUTEX) += futex.o
 obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
@@ -18,6 +18,7 @@
 obj-$(CONFIG_PM) += power/
 obj-$(CONFIG_BSD_PROCESS_ACCT) += acct.o
 obj-$(CONFIG_COMPAT) += compat.o
+obj-$(CONFIG_PAGG) += pagg.o
 obj-$(CONFIG_IKCONFIG) += configs.o
 obj-$(CONFIG_IKCONFIG_PROC) += configs.o
 obj-$(CONFIG_STOP_MACHINE) += stop_machine.o
diff -Naru 2.6-patch/kernel/pagg.c 2.6pagg-patch/kernel/pagg.c
--- 2.6-patch/kernel/pagg.c	1969-12-31 18:00:00.000000000 -0600
+++ 2.6pagg-patch/kernel/pagg.c	2004-04-26 12:23:02.000000000 -0500
@@ -0,0 +1,426 @@
+/* 
+ * PAGG (Process Aggregates) interface
+ *
+ * 
+ * Copyright (c) 2000-2004 Silicon Graphics, Inc.  All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ *
+ * Contact information:  Silicon Graphics, Inc., 1500 Crittenden Lane,
+ * Mountain View, CA  94043, or:
+ * 
+ * http://www.sgi.com 
+ * 
+ * For further information regarding this notice, see: 
+ * 
+ * http://oss.sgi.com/projects/GenInfo/NoticeExplan
+ */
+
+/*
+ * Description:  This file, kernel/pagg.c, contains the routines used
+ *               to implement process aggregates (paggs).  The pagg
+ *               extends the task_struct to allow for various process
+ *               aggregation continers.  Examples of such containers
+ *               include "jobs" and cluster applications IDs.  Process
+ *               sessions and groups could have been implemented using
+ *               paggs (although there would be little purpose in
+ *               making that change at this juncture).  The pagg
+ *               structure maintains pointers to callback functions and
+ *               data strucures maintained in modules that have
+ *               registered with the kernel as pagg container
+ *               providers.
+ */
+
+#include <linux/config.h>
+
+#ifdef CONFIG_PAGG
+
+#include <asm/uaccess.h>
+#include <linux/slab.h>
+#include <linux/sched.h>
+#include <asm/semaphore.h>
+#include <linux/smp_lock.h>
+#include <linux/proc_fs.h>
+#include <linux/module.h>
+#include <linux/pagg.h>
+
+/* list of pagg hook entries that reference the "module" implementations */
+static LIST_HEAD(pagg_hook_list);
+static DECLARE_RWSEM(pagg_hook_list_sem);
+
+
+/* 
+ * get_pagg
+ *
+ * Given a pagg_list list structure, this function will return
+ * a pointer to the pagg struct that matches the search
+ * key.  If the key is not found, the function will return NULL.
+ *
+ * The caller should hold at least a read lock on the pagg_list
+ * for task using read_lock_pagg_list(task).
+ */
+struct pagg *
+get_pagg(struct task_struct *task, char *key)
+{
+	struct list_head *entry;
+
+	list_for_each(entry, &task->pagg_list.head) {
+		struct pagg *pagg = list_entry(entry, struct pagg, entry);
+		if (!strcmp(pagg->hook->name,key)) {
+			return pagg;
+		}
+	}
+	return NULL;
+}
+
+
+/*
+ * alloc_pagg
+ *
+ * Given a task and a pagg hook, this function will allocate
+ * a new pagg structure, initialize the settings, and insert the pagg into
+ * the pagg_list for the task.
+ *
+ * The caller for this function should hold at least a read lock on the
+ * pagg_hook_list_sem - or ensure that the pagg hook entry cannot be 
+ * removed. If this function was called from the pagg module (usually the
+ * case), then the caller need not hold this lock. The caller should hold 
+ * a write lock on for the tasks pagg_list.sem.  This can be locked using the
+ * write_lock_pagg_list(task) macro.
+ */
+struct pagg *
+alloc_pagg(struct task_struct *task, struct pagg_hook *pagg_hook)
+{
+	struct pagg *pagg;
+
+	pagg = (struct pagg *)kmalloc(sizeof(struct pagg), GFP_KERNEL);
+	if (!pagg)
+		return NULL;
+
+	pagg->hook = pagg_hook;
+	pagg->data = NULL;
+	list_add_tail(&pagg->entry, &task->pagg_list.head);
+	return pagg;
+}
+
+
+/*
+ * free_pagg
+ *
+ * This function will ensure the pagg is deleted form 
+ * the list of pagg entries for the task. Finally, the memory for the 
+ * pagg is discarded.
+ *
+ * The caller of this function should hold a write lock on the pagg_list.sem
+ * for the task. This can be locke dusing the write_lock_pagg_list(task) 
+ * macro.
+ *
+ * Prior to calling free_pagg, the pagg should have been detached from the
+ * pagg container represented by this pagg.  That is usually done using the
+ * macro detach_pagg(pagg).
+ */
+void
+free_pagg(struct pagg *pagg) 
+{
+
+	list_del(&pagg->entry);
+	kfree(pagg);
+}
+
+
+/*
+ * get_pagg_hook
+ *
+ * Given a pagg hook name key, this function will return a pointer
+ * to the pagg_hook struct that contains that matches the name.
+ * 
+ * You should hold either the write or read lock for pagg_hook_list_sem
+ * before using this function.  This will ensure that the pagg_hook_list
+ * does not change while iterating through the list entries.
+ */
+static struct pagg_hook *
+get_pagg_hook(char *key)
+{
+	struct list_head *entry;
+	struct pagg_hook *pagg_hook;
+
+	list_for_each(entry, &pagg_hook_list) {
+		pagg_hook = list_entry(entry, struct pagg_hook, entry);
+		if (!strcmp(pagg_hook->name, key)) {
+			return pagg_hook;
+		}
+	}
+	return NULL;
+}
+
+
+/*
+ * register_pagg_hook
+ *
+ * Used to register a new pagg hook and enter it into the pagg_hook_list.
+ * The service name for a pagg hook is restricted to 32 characters.
+ *
+ * In the future an initialization function may also be defined so that all
+ * existing tasks can be assigned to a default pagg entry for the hook.
+ * However, this would require iterating through the tasklist.  To do that
+ * requires that the tasklist_lock be read locked.  Since the initialization
+ * function might be in a module, and therefore it might sleep (implementors
+ * decision), holding the tasklist_lock seems like a bad idea. It may be a
+ * requirement that the initialization function will be strictly forbidden
+ * from locking - by gentlemans agreement... 
+ *
+ * If a memory error is encountered, the pagg hook is unregistered and any
+ * tasks that have been attached to the initial pagg container are detached
+ * from that container.
+ */
+int
+register_pagg_hook(struct pagg_hook *pagg_hook_new)
+{
+	struct pagg_hook *pagg_hook = NULL;
+
+	/* ADD NEW PAGG MODULE TO ACCESS LIST */
+	if (!pagg_hook_new)
+		return -EINVAL;			/* error */
+	if (!list_empty(&pagg_hook_new->entry))
+		return -EINVAL;			/* error */
+	if (pagg_hook_new->name == NULL || strlen(pagg_hook_new->name) > PAGG_NAMELN) 
+		return -EINVAL;			/* error */
+
+	/* Try to insert new hook entry into the pagg hook list */
+	down_write(&pagg_hook_list_sem);
+
+	pagg_hook = get_pagg_hook(pagg_hook_new->name);
+
+	if (pagg_hook) {
+		up_write(&pagg_hook_list_sem);
+		printk(KERN_WARNING "Attempt to register duplicate"
+				" PAGG support (name=%s)\n", pagg_hook_new->name);
+		return -EBUSY;
+	}
+
+	/* Okay, we can insert into the pagg hook list */
+	list_add_tail(&pagg_hook_new->entry, &pagg_hook_list);
+	up_write(&pagg_hook_list_sem);
+
+	printk(KERN_INFO "Registering PAGG support for (name=%s)\n",
+			pagg_hook_new->name);
+
+	return 0;					/* success */
+
+}
+
+
+/*
+ * unregister_pagg_hook
+ *
+ * Used to unregister pagg hooks and remove them from the pagg_hook_list.
+ * Once the pagg hook entry in the pagg_hook_list is found, all of the
+ * tasks are scanned and detached from any pagg containers defined by the
+ * pagg implementation module.
+ */
+int
+unregister_pagg_hook(struct pagg_hook *pagg_hook_old)
+{
+	struct pagg_hook *pagg_hook;
+	struct task_struct *task;
+
+
+	/* Check the validity of the arguments */
+	if (!pagg_hook_old)
+		return -EINVAL;			/* error */
+	if (list_empty(&pagg_hook_old->entry))
+		return -EINVAL;			/* error */
+	if (pagg_hook_old->name == NULL)
+		return -EINVAL;			/* error */
+
+	down_write(&pagg_hook_list_sem);
+
+	pagg_hook = get_pagg_hook(pagg_hook_old->name);
+	if (pagg_hook && pagg_hook == pagg_hook_old) {
+		/* 
+		 * Scan through processes on system and check for  
+		 * references to pagg containers for this pagg hook.
+		 * 
+		 * The module cannot be unloaded if there are references.
+		 */
+		read_lock(&tasklist_lock);
+		for_each_process(task) {
+			struct pagg *pagg = NULL;
+
+			read_lock_pagg_list(task);
+			pagg = get_pagg(task, pagg_hook_old->name);
+			/* 
+			 * We won't be accessing pagg's memory, just need
+			 * to see if one existed - so we can release the task
+			 * lock now.
+			 */
+			read_unlock_pagg_list(task);
+			if (pagg) {
+				read_unlock(&tasklist_lock);
+				up_write(&pagg_hook_list_sem);
+				return -EBUSY;
+			}
+		}
+		list_del_init(&pagg_hook->entry);
+		read_unlock(&tasklist_lock);
+
+		up_write(&pagg_hook_list_sem);
+
+		printk(KERN_INFO "Unregistering PAGG support for"
+				" (name=%s)\n", pagg_hook_old->name);
+
+		return 0;			/* success */
+	}
+
+	up_write(&pagg_hook_list_sem);
+
+	printk(KERN_WARNING "Attempt to unregister PAGG support (name=%s)"
+			" failed - not found\n", pagg_hook_old->name);
+	
+	return -EINVAL;				/* error */
+}
+
+
+/*
+ * attach_pagg_list
+ *
+ * Used to attach a new task to the same pagg containers to which it's parent
+ * is attached.
+ *
+ * The "from" argument is the parent task.  The "to" argument is the child
+ * task. 
+ *
+ */
+int
+attach_pagg_list(struct task_struct *to_task, struct task_struct *from_task)
+{
+	struct list_head   *entry;
+	int  		   retcode = 0;
+
+
+
+	/* lock the parents pagg_list we are copying from */
+	read_lock_pagg_list(from_task);
+
+	list_for_each(entry, &from_task->pagg_list.head) {
+		struct pagg *to_pagg = NULL;
+		struct pagg *from_pagg = list_entry(entry, struct pagg, 
+							entry);
+		to_pagg = alloc_pagg(to_task, from_pagg->hook);
+		if (!to_pagg) {
+			retcode = -ENOMEM;
+			goto error_return;
+		}
+		retcode = attach_pagg(to_task, to_pagg, from_pagg->data);
+		if (retcode != 0) {
+			/* attach should issue error message */
+			goto error_return;
+		}
+	}
+
+	read_unlock_pagg_list(from_task);
+
+	return 0;					/* success */
+
+  error_return:
+	/* 
+	 * Clean up all the pagg attachments made on behalf of the new
+	 * task.  Set new task pagg ptr to NULL for return.
+	 */
+	read_unlock_pagg_list(from_task);
+	detach_pagg_list(to_task);
+	return retcode;				/* failure */
+}
+
+
+/*
+ * detach_pagg_list
+ *
+ * Used to detach a task from all pagg containers to which it is attached.
+ */
+int
+detach_pagg_list(struct task_struct *task)
+{
+	struct list_head   *entry;
+	int retcode = 0;
+	int rettmp = 0;
+
+	/* Remove ref. to paggs from task immediately */
+	write_lock_pagg_list(task);
+
+	if (list_empty(&task->pagg_list.head)) {
+		write_unlock_pagg_list(task);
+		return retcode;
+	} 
+
+	list_for_each(entry, &task->pagg_list.head) {
+		int rettemp = 0;
+		struct pagg *pagg = list_entry(entry, struct pagg, entry);
+
+		entry = &task->pagg_list.head;
+
+		rettemp = detach_pagg(task, pagg);
+		if (rettmp) {
+			/* an error message should be logged in free_pagg */
+			retcode = rettmp;
+		}
+		free_pagg(pagg);
+	}
+
+	write_unlock_pagg_list(task);
+
+	return retcode;	/* 0 = success, else return last code for failure */
+}
+
+
+/*
+ * exec_pagg_list
+ *
+ * Used to when a process that is in a pagg container does an exec.
+ *
+ * The "from" argument is the task.  The "name" argument is the name
+ * of the process being exec'ed.
+ *
+ */
+int exec_pagg_list(struct task_struct *task) {
+	struct list_head   *entry;
+
+
+
+	/* lock the parents pagg_list we are copying from */
+	read_lock_pagg_list(task);
+
+	list_for_each(entry, &task->pagg_list.head) {
+		struct pagg *pagg = list_entry(entry, struct pagg, 
+							entry);
+		exec_pagg(task, pagg);
+	}
+
+	read_unlock_pagg_list(task);
+	return 0;
+}
+
+
+EXPORT_SYMBOL(get_pagg);
+EXPORT_SYMBOL(alloc_pagg);
+EXPORT_SYMBOL(free_pagg);
+EXPORT_SYMBOL(attach_pagg_list);
+EXPORT_SYMBOL(detach_pagg_list);
+EXPORT_SYMBOL(exec_pagg_list);
+EXPORT_SYMBOL(register_pagg_hook);
+EXPORT_SYMBOL(unregister_pagg_hook);
+
+#endif /* CONFIG_PAGG */

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-26 22:04 [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel Erik Jacobson
@ 2004-04-26 23:39 ` Chris Wright
  2004-04-27  0:36   ` Jesse Barnes
  2004-04-27 20:51   ` Erik Jacobson
  2004-04-30  8:54 ` Guillaume Thouvenin
  2004-05-20 21:16 ` Erik Jacobson
  2 siblings, 2 replies; 31+ messages in thread
From: Chris Wright @ 2004-04-26 23:39 UTC (permalink / raw)
  To: Erik Jacobson; +Cc: linux-kernel

* Erik Jacobson (erikj@subway.americas.sgi.com) wrote:
> Here, I am proposing Process Aggregates support for the 2.6 kernel.

This looks like it's just the infrastructure, i.e. nothing is using it.
It seems like PAGG could be done on top of CKRM (albeit, with more
code).  But if the goal is to do some basic accounting, scheduling, etc.
on a resource group, wouldn't CKRM be more generic?

Couple quick comments below.

> +struct pagg_hook {
> +       struct module	*module;

doesn't seem used.

> +       char		*name;	/* Name Key - restricted to 32 characters */

why the restriction?  

> +#define attach_pagg_list_chk(ct, pt)					\
> +do {									\
> +	INIT_PAGG_LIST(&ct->pagg_list);					\
> +	if (!list_empty(&pt->pagg_list.head)) {				\
> +		if (attach_pagg_list(ct, pt) != 0)			\
> +			goto bad_fork_cleanup;				\
> +	}								\
> +} while(0)

Goto a label defined elsewhere, buried in a macro.  Please code this
openly.

> +#define detach_pagg_list_chk(t)					\
> +do {									\
> +	if (!list_empty(&t->pagg_list.head)) {				\
> +		detach_pagg_list(t);					\
> +	}								\
> +} while(0)

All these macros could be type safe inlined functions, and when config'd
off, use your alt. no-op macros.

> +#define read_lock_pagg_list(t)		down_read(&t->pagg_list.sem)
> +	/* Up the read semaphore for the task's pagg_list */
> +#define read_unlock_pagg_list(t) 	up_read(&t->pagg_list.sem)
> +	/* Down the write semaphore for the task's pagg_list */
> +#define write_lock_pagg_list(t)		down_write(&t->pagg_list.sem)
> +	/* Up the write semaphore for the task's pagg_list */
> +#define write_unlock_pagg_list(t) 	up_write(&t->pagg_list.sem)

Just open code these.  There's too much hidden in macros.

> @@ -488,11 +489,15 @@
>  
>  	struct dentry *proc_dentry;
>  	struct backing_dev_info *backing_dev_info;
> -
>  	struct io_context *io_context;
>  
>  	unsigned long ptrace_message;
>  	siginfo_t *last_siginfo; /* For ptrace use.  */
> +
> +#if defined(CONFIG_PAGG)
> +/* List of pagg (process aggregate) attachments */
> +	struct pagg_list pagg_list;
> +#endif

unused?

> +unregister_pagg_hook(struct pagg_hook *pagg_hook_old)
<snip>
> +	down_write(&pagg_hook_list_sem);
> +
> +	pagg_hook = get_pagg_hook(pagg_hook_old->name);
> +	if (pagg_hook && pagg_hook == pagg_hook_old) {
> +		/* 
> +		 * Scan through processes on system and check for  
> +		 * references to pagg containers for this pagg hook.
> +		 * 
> +		 * The module cannot be unloaded if there are references.
> +		 */
> +		read_lock(&tasklist_lock);
> +		for_each_process(task) {
> +			struct pagg *pagg = NULL;
> +
> +			read_lock_pagg_list(task);

Uh-oh, grabbing a semaphore while holding tasklist_lock.

There's too much hidden in macros (like read_lock_pagg_list).

> +attach_pagg_list(struct task_struct *to_task, struct task_struct *from_task)
<snip>
> +		to_pagg = alloc_pagg(to_task, from_pagg->hook);
> +		if (!to_pagg) {
> +			retcode = -ENOMEM;
> +			goto error_return;
> +		}
> +		retcode = attach_pagg(to_task, to_pagg, from_pagg->data);
> +		if (retcode != 0) {
> +			/* attach should issue error message */
> +			goto error_return;
> +		}

This looks like it leaks the just alloc'd to_pagg.

thanks,
-chris
-- 
Linux Security Modules     http://lsm.immunix.org     http://lsm.bkbits.net

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-26 23:39 ` Chris Wright
@ 2004-04-27  0:36   ` Jesse Barnes
  2004-04-27  0:41     ` Chris Wright
  2004-04-27 20:51   ` Erik Jacobson
  1 sibling, 1 reply; 31+ messages in thread
From: Jesse Barnes @ 2004-04-27  0:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: Chris Wright, Erik Jacobson

On Monday, April 26, 2004 4:39 pm, Chris Wright wrote:
> * Erik Jacobson (erikj@subway.americas.sgi.com) wrote:
> > Here, I am proposing Process Aggregates support for the 2.6 kernel.
>
> This looks like it's just the infrastructure, i.e. nothing is using it.
> It seems like PAGG could be done on top of CKRM (albeit, with more
> code).  But if the goal is to do some basic accounting, scheduling, etc.
> on a resource group, wouldn't CKRM be more generic?

Quite possibly.  Do you have a pointer to the latest bits/design docs?

Thanks,
Jesse

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-27  0:36   ` Jesse Barnes
@ 2004-04-27  0:41     ` Chris Wright
  2004-04-27 21:00       ` Erik Jacobson
  2004-04-29 21:10       ` Rik van Riel
  0 siblings, 2 replies; 31+ messages in thread
From: Chris Wright @ 2004-04-27  0:41 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: linux-kernel, Chris Wright, Erik Jacobson

* Jesse Barnes (jbarnes@sgi.com) wrote:
> On Monday, April 26, 2004 4:39 pm, Chris Wright wrote:
> > * Erik Jacobson (erikj@subway.americas.sgi.com) wrote:
> > > Here, I am proposing Process Aggregates support for the 2.6 kernel.
> >
> > This looks like it's just the infrastructure, i.e. nothing is using it.
> > It seems like PAGG could be done on top of CKRM (albeit, with more
> > code).  But if the goal is to do some basic accounting, scheduling, etc.
> > on a resource group, wouldn't CKRM be more generic?
> 
> Quite possibly.  Do you have a pointer to the latest bits/design docs?

Nothing aside from what's on ckrm.sf.net.  I know they've been retooling
it a bit, but I'm not up on the current status.

thanks,
-chris
-- 
Linux Security Modules     http://lsm.immunix.org     http://lsm.bkbits.net

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-26 23:39 ` Chris Wright
  2004-04-27  0:36   ` Jesse Barnes
@ 2004-04-27 20:51   ` Erik Jacobson
  2004-04-27 22:28     ` Chris Wright
  2004-04-28 14:55     ` Christoph Hellwig
  1 sibling, 2 replies; 31+ messages in thread
From: Erik Jacobson @ 2004-04-27 20:51 UTC (permalink / raw)
  To: Chris Wright; +Cc: linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3172 bytes --]

Thanks for the comments.

I made a change and responded to your other comments.  Let me know if I
missed something.

I didn't choose to change the macros at this time - however, I'm not against
changing them either - I just haven't done it yet.

I'll attach a new patch.


On Mon, 26 Apr 2004, Chris Wright wrote:
> This looks like it's just the infrastructure, i.e. nothing is using it.
> It seems like PAGG could be done on top of CKRM (albeit, with more
> code).  But if the goal is to do some basic accounting, scheduling, etc.
> on a resource group, wouldn't CKRM be more generic?

Right.  A couple examples of things we have that use it are CSA
(oss.sgi.com/csa) and job.  job provides inescapable job containers that
are also used by csa.

But what I presented here was just the infrastructure as you said.

Patches for inescapable job containers ('job') are available on the pagg web
site as well (oss.sgi.com/pagg).

> Couple quick comments below.
>
> > +struct pagg_hook {
> > +       struct module	*module;

When another kernel module makes use of PAGG, it sets this in the pagg
hook.

> > +       char		*name;	/* Name Key - restricted to 32 characters */
>
> why the restriction?

I'm open to suggestions.  Right now, this is usually set to something like
"job" or similar.  The max length is enforced by the module that makes use
of pagg.  For example, with the job package:

...
#define PAGG_NAMELN  32    /* Max chars in PAGG module name */
...
#define PAGG_JOB  "job" /* PAGG module identifier string */
...


> > +unregister_pagg_hook(struct pagg_hook *pagg_hook_old)
> <snip>
> > +	down_write(&pagg_hook_list_sem);
> > +
> > +	pagg_hook = get_pagg_hook(pagg_hook_old->name);
> > +	if (pagg_hook && pagg_hook == pagg_hook_old) {
> > +		/*
> > +		 * Scan through processes on system and check for
> > +		 * references to pagg containers for this pagg hook.
> > +		 *
> > +		 * The module cannot be unloaded if there are references.
> > +		 */
> > +		read_lock(&tasklist_lock);
> > +		for_each_process(task) {
> > +			struct pagg *pagg = NULL;
> > +
> > +			read_lock_pagg_list(task);
>
> Uh-oh, grabbing a semaphore while holding tasklist_lock.
>
> There's too much hidden in macros (like read_lock_pagg_list).

I fixed the tasklist issue you were concerned about.  Again, I didn't address
the macro issue at this moment.

> > +attach_pagg_list(struct task_struct *to_task, struct task_struct *from_task)
> <snip>
> > +		to_pagg = alloc_pagg(to_task, from_pagg->hook);
> > +		if (!to_pagg) {
> > +			retcode = -ENOMEM;
> > +			goto error_return;
> > +		}
> > +		retcode = attach_pagg(to_task, to_pagg, from_pagg->data);
> > +		if (retcode != 0) {
> > +			/* attach should issue error message */
> > +			goto error_return;
> > +		}
>
> This looks like it leaks the just alloc'd to_pagg.

I agree that it looks suspect but I think it's OK.

You're talking about the case where the pagg was allocated, but couldn't
attach I assume.

The alloc_pagg function adds that allocated pagg to the pagg list.  In
error_return, detach_pagg_list is called so this pagg should be freed then.

--
Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota

[-- Attachment #2: Type: TEXT/PLAIN, Size: 34657 bytes --]

diff -Naru 2.6-patch/Documentation/pagg.txt 2.6pagg-patch/Documentation/pagg.txt
--- 2.6-patch/Documentation/pagg.txt	1969-12-31 18:00:00.000000000 -0600
+++ 2.6pagg-patch/Documentation/pagg.txt	2004-04-26 14:36:05.000000000 -0500
@@ -0,0 +1,162 @@
+Linux Process Aggregates (PAGG)
+-------------------------------
+
+1. Description
+
+The process aggregates infrastructure, or PAGG, provides a generalized
+mechanism for providing arbitrary process groups in Linux.  PAGG consists
+of a series of functions for registering and unregistering support
+for new types of process aggregation containers with the kernel.
+This is similar to the support currently provided within Linux that
+allows for dynamic support of filesystems, block and character devices,
+symbol tables, network devices, serial devices, and execution domains.
+This implementation of PAGG provides developers the basic hooks necessary
+to implement kernel modules for specific process containers, such as
+the job container.
+
+The do_fork function in the kernel was altered to support PAGG.  If a
+process is attached to any PAGG containers and subsequently forks a
+child process, the child process will also be attached to the same PAGG
+containers.  The PAGG containers involved during the fork are notified
+that a new process has been attached.  The notification is accomplished
+via a callback function provided by the PAGG module.
+
+The do_exit function in the kernel has also been altered.  If a process
+is attached to any PAGG containers and that process is exiting, the PAGG
+containers are notified that a process has detached from the container.
+The notification is accomplished via a callback function provided by
+the PAGG module.
+
+The sys_execve function has been modified to support an optional callout
+that can be run when a process in a pagg list does an exec.  It can be 
+used, for example, by other kernel modules that wish to do advanced CPU
+placement on multi-processor systems (just one example).
+
+Additional details concerning this implementation of the process aggregates
+infrastructure are described in the sections that follow.
+
+
+2.  Kernel Changes
+
+This section describe the files and data strcutrues that are involved in this
+implementation of PAGG.  Both modified as well as new files and data
+structures are discussed.
+
+3.1. Modified Files
+
+The following files were modified to implement PAGG:
+
+-  include/linux/init_task.h
+-  include/linux/sched.h
+-  init/Config.help
+-  init/Config.in
+-  kernel/Makefile
+-  kernel/exit.c
+-  kernel/fork.c
+-  fs/exec.c
+-  Documentation/Configure.help
+-  init/Kconfig
+
+This implementation of PAGG supports the i386 and ia64 architecture.
+
+2.2. New Files
+
+The following files were added to implement PAGG:
+
+-  Documentation/pagg.txt
+-  include/linux/pagg.h
+-  kernel/pagg.c
+
+
+2.3. Modified Data Structures
+
+The following existing data structures were altered to implement PAGG.
+
+-  struct task_struct:          (include/linux/sched.h)
+     struct pagg_list  pagg_list;     /* List of pagg containers */
+
+This new member in task_struct, pagg_list, points to the list of pagg
+containers to which the process is currently attached.
+
+2.4. New Data Structures
+
+The following new data structures were introduced to implement PAGG.
+
+-  struct pagg:          (include/linux/pagg.h)
+     struct pagg_hook *hook		     /* Ptr to pagg module entry */
+     void 		*data;               /* Task specific data */
+     struct list_head   entry;	   	     /* List connection */	
+     
+-  struct pagg_hook:        (include/linux/pagg.h)
+     struct module *module;                  /* Ptr to PAGG module */
+     char *name;                             /* PAGG hook name - restricted
+					      * to 32 characters.  */
+     int  (*attach)(struct task_struct *, /* Function to attach */
+               struct pagg *,
+               void *);
+     int  (*detach)(struct task_struct *, /* Function to detach */
+               struct pagg *);
+     int  (*init)(struct task_struct *,   /* Load task init func. */
+		     struct pagg *);
+     void  *data;                            /* Module specific data */
+     struct list_head entry;		     /* List connection */
+     void    (*exec)(struct task_struct *, struct pagg *); /* exec func ptr */
+
+The pagg structure provides the process' reference to the PAGG
+containers provided by the PAGG modules.  The attach function pointer
+is the function used to notify the referenced PAGG container that the
+process is being attached.  The detach function pointer is used to notify
+the referenced PAGG container that the process is exiting or otherwise
+detaching from the container.  The exec function pointer is used when a
+process in the pagg container exec's a new process.  This is optional and
+may be set to NULL if it is not needed by the pagg module.
+
+The pagg_hook structure provides the reference to the module that
+implements a type of PAGG container.  In addition to the function pointers
+described concerning pagg, this structure provides an addition
+function pointer.  The init function pointer is currently not used
+but will be available in the future.  Future use of the init function
+will be optional and will used to attach currently running processes to
+a default PAGG container when a PAGG module is loaded on a running system.
+
+
+2.5. Modified Functions
+
+The following functions were changed to implement PAGG:
+
+-  do_fork:     (kernel/fork.c)
+     /* execute the following pseudocode before add to run-queue  */  
+     If parent process pagg list is not empty
+          Call attach_pagg_list function with child task_struct as argument
+-  do_exit:     (kernel/exit.c)
+     /* execute the following pseudocode prior to schedule call */
+     If current process pagg list is not empty
+               Call detach_pagg_list function with current task_struct 
+-  sys_execve:  (fs/exec.c)
+     /* When a process in a pagg exec's, an optional callout can be run.  This
+        is implemented with an optional function pointer in the pagg_hook.  */
+
+2.6 New Functions
+
+The following new functions were added to implement PAGG:
+
+-  int  register_pagg_hook(struct pagg_hook *);  (kernel/pagg.c)
+     Add module entry into table of pagg modules
+-  int unregister_pagg_hook(struct pagg_hook *); (kernel/pagg.c)
+     Find module entry in list of pagg modules
+          Foreach task
+		If task is attached to this pagg module
+			return error
+	  If no tasks are referencing this module
+		remove module entry from list of pagg modules
+-  int attach_pagg_list(struct task_struct *);       (kernel/pagg.c)
+     /* Assumed task pagg list pts to paggs that it attaches to */
+     While another pagg container reference
+          Make copy of pagg container reference & insert into new list
+          Attach task to pagg container using new container reference
+          Get next pagg container reference
+     Make task pagg list use the new pagg list
+-  int detach_pagg_list(struct task_struct *);       (kernel/pagg.c)
+     While another pagg container reference
+          Detach task from pagg container using reference
+
diff -Naru 2.6-patch/fs/exec.c 2.6pagg-patch/fs/exec.c
--- 2.6-patch/fs/exec.c	2004-03-16 14:13:30.000000000 -0600
+++ 2.6pagg-patch/fs/exec.c	2004-04-26 12:23:02.000000000 -0500
@@ -46,6 +46,7 @@
 #include <linux/security.h>
 #include <linux/syscalls.h>
 #include <linux/rmap-locking.h>
+#include <linux/pagg.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgalloc.h>
@@ -1151,6 +1152,7 @@
 	retval = search_binary_handler(&bprm,regs);
 	if (retval >= 0) {
 		free_arg_pages(&bprm);
+		exec_pagg_list_chk(current);
 
 		/* execve success */
 		security_bprm_free(&bprm);
diff -Naru 2.6-patch/include/linux/init_task.h 2.6pagg-patch/include/linux/init_task.h
--- 2.6-patch/include/linux/init_task.h	2004-03-16 14:13:30.000000000 -0600
+++ 2.6pagg-patch/include/linux/init_task.h	2004-04-13 21:42:35.000000000 -0500
@@ -2,6 +2,7 @@
 #define _LINUX__INIT_TASK_H
 
 #include <linux/file.h>
+#include <linux/pagg.h>
 
 #define INIT_FILES \
 { 							\
@@ -112,6 +113,7 @@
 	.proc_lock	= SPIN_LOCK_UNLOCKED,				\
 	.switch_lock	= SPIN_LOCK_UNLOCKED,				\
 	.journal_info	= NULL,						\
+	INIT_TASK_PAGG(tsk)						\
 }
 
 
diff -Naru 2.6-patch/include/linux/pagg.h 2.6pagg-patch/include/linux/pagg.h
--- 2.6-patch/include/linux/pagg.h	1969-12-31 18:00:00.000000000 -0600
+++ 2.6pagg-patch/include/linux/pagg.h	2004-04-26 12:23:02.000000000 -0500
@@ -0,0 +1,249 @@
+/* 
+ * PAGG (Process Aggregates) interface
+ *
+ * 
+ * Copyright (c) 2000-2002, 2004 Silicon Graphics, Inc.  All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ *
+ * Contact information:  Silicon Graphics, Inc., 1500 Crittenden Lane,
+ * Mountain View, CA  94043, or:
+ * 
+ * http://www.sgi.com 
+ * 
+ * For further information regarding this notice, see: 
+ * 
+ * http://oss.sgi.com/projects/GenInfo/NoticeExplan
+ */
+
+/*
+ * Description:	This file, include/linux/pagg.h, contains the data
+ *              structure definitions and function prototypes used to
+ *              implement process aggrefates (paggs). Paggs provides a
+ *              generalized was to implement process groupings or
+ *              containers.  Modules use these functions to register
+ *              with the kernel as providers of process aggregation
+ *              containers. The pagg data structures define the
+ *              callback functions and data access pointers back into
+ *              the pagg modules.
+ */
+
+#ifndef _PAGG_H
+#define _PAGG_H
+
+#include <linux/config.h>
+
+/*
+ * Used by task_struct to manage a list of pagg attachments for the task.
+ * The list will be used to hold references to pagg structures.  
+ * These structures define the pagg attachments for the task.  
+ *
+ * STRUCT MEMBERS:
+ * 	list:		The list head pointer for the list of pagg
+ * 			structures.
+ * 	sem:		The semaphore used  to lock the list.
+ */
+struct pagg_list {
+	struct list_head	head;	
+	struct rw_semaphore	sem;
+};
+
+#ifdef CONFIG_PAGG
+
+#define PAGG_NAMELN	32		/* Max chars in PAGG module name */
+
+
+/* Macro used to initialize a pagg_list structure after declaration 
+ *
+ * Macro arguments:
+ * 	l:	the pagg list (struct pagg_list)
+ */
+#define INIT_PAGG_LIST(l)						\
+do {									\
+	INIT_LIST_HEAD(l.head);						\
+	init_rwsem(l.sem);						\
+} while(0)
+	
+
+/*
+ * Used by task_struct to manage list of pagg attachments for the process.  
+ * Each pagg provides the link between the process and the 
+ * correct pagg container.
+ *
+ * STRUCT MEMBERS:
+ *     hook:	Reference to pagg module structure.  That struct
+ *     		holds the name key and function pointers.
+ *     data:	Opaque data pointer - defined by pagg modules.
+ *     entry:	List pointers
+ */
+struct pagg {
+       struct pagg_hook	*hook;
+       void		*data;
+       struct list_head	entry;
+};
+
+/*
+ * Used by pagg modules to define the callback functions into the 
+ * module.
+ *
+ * STRUCT MEMBERS:
+ *     name:           The name of the pagg container type provided by
+ *                     the module. This will be set by the pagg module.
+ *     attach:         Function pointer to function used when attaching
+ *                     a process to the pagg container referenced by 
+ *                     this struct.
+ *     detach:         Function pointer to function used when detaching
+ *                     a process to the pagg container referenced by 
+ *                     this struct.
+ *     init:           Function pointer to initialization function.  This
+ *                     function is used when the module is loaded to attach
+ *                     existing processes to a default container as defined by
+ *                     the pagg module. This is optional and may be set to 
+ *                     NULL if it is not needed by the pagg module.
+ *     data:           Opaque data pointer - defined by pagg modules.
+ *     module:         Pointer to kernel module struct.  Used to increment & 
+ *                     decrement the use count for the module.
+ *     entry:	       List pointers
+ *     exec:           Function pointer to function used when a process
+ *                     in the pagg container exec's a new process. This
+ *                     is optional and may be set to NULL if it is not 
+ *                     needed by the pagg module.
+ */
+struct pagg_hook {
+       struct module	*module;
+       char		*name;	/* Name Key - restricted to 32 characters */
+       int		(*attach)(struct task_struct *, struct pagg *, void*);
+       int		(*detach)(struct task_struct *, struct pagg *);
+       int		(*init)(struct task_struct *, struct pagg *);
+       void		*data;	/* Opaque module specific data */
+       struct list_head	entry;	/* List pointers */
+       void		(*exec)(struct task_struct *, struct pagg *);
+};
+
+
+/* Kernel service functions for providing PAGG support */
+extern struct pagg *get_pagg(struct task_struct *task, char *key);
+extern struct pagg *alloc_pagg(struct task_struct *task, 
+				      struct pagg_hook *pt);
+extern void free_pagg(struct pagg *pagg);
+extern int register_pagg_hook(struct pagg_hook *pt_new);
+extern int unregister_pagg_hook(struct pagg_hook *pt_old);
+extern int attach_pagg_list(struct task_struct *to_task, 
+					struct task_struct *from_task);
+extern int detach_pagg_list(struct task_struct *task);
+extern int exec_pagg_list(struct task_struct *task);
+
+/* 
+ *  Macro used when a child process must inherit attachment to pagg 
+ *  containers from the parent.
+ *
+ *  Arguments:
+ *	ct:	child (struct task_struct *)
+ *	pt:	parent (struct task_struct *)
+ *	cf:	clone_flags
+ */
+#define attach_pagg_list_chk(ct, pt)					\
+do {									\
+	INIT_PAGG_LIST(&ct->pagg_list);					\
+	if (!list_empty(&pt->pagg_list.head)) {				\
+		if (attach_pagg_list(ct, pt) != 0)			\
+			goto bad_fork_cleanup;				\
+	}								\
+} while(0)
+
+/* 
+ * Macro used when a process must detach from pagg containers to which it
+ * is currenlty a member.
+ *
+ * Aguments:
+ * 	t:	task (struct task_struct *)
+ */
+#define detach_pagg_list_chk(t)					\
+do {									\
+	if (!list_empty(&t->pagg_list.head)) {				\
+		detach_pagg_list(t);					\
+	}								\
+} while(0)
+
+
+/* 
+ * Macro used when a process exec's.
+ *
+ * Aguments:
+ * 	t:	task (struct task_struct *)
+ */
+#define exec_pagg_list_chk(t)						\
+do {									\
+	if (!list_empty(&t->pagg_list.head)) {				\
+		exec_pagg_list(t);					\
+	}								\
+} while(0)
+
+
+/*
+ * Utility macros for pagg handling and locking pagg lists.
+ *
+ * Arguments:
+ * 	t:	task  (struct task_list *)
+ * 	p:	pagg  (struct pagg *)
+ * 	d:	data  (ptr to data maintained by the 
+ * 			pagg module - converts to void ptr)
+ */
+	/* Invoke module detach callback for the pagg & task */
+#define detach_pagg(t, p)		p->hook->detach(t, p)
+	/* Invoke module attach callback for the pagg & task */
+#define attach_pagg(t, p, d)  		p->hook->attach(t, p, (void *)d)
+	/* Allows optional callout at exec */
+#define exec_pagg(t, p)  		do {				\
+						if (p->hook->exec)	\
+						    p->hook->exec(t, p);\
+					} while(0)
+	/* Allows module to set data item for pagg */
+#define set_pagg(p, d)   		p->data = (void *)d
+	/* Down the read semaphore for the task's pagg_list */
+#define read_lock_pagg_list(t)		down_read(&t->pagg_list.sem)
+	/* Up the read semaphore for the task's pagg_list */
+#define read_unlock_pagg_list(t) 	up_read(&t->pagg_list.sem)
+	/* Down the write semaphore for the task's pagg_list */
+#define write_lock_pagg_list(t)		down_write(&t->pagg_list.sem)
+	/* Up the write semaphore for the task's pagg_list */
+#define write_unlock_pagg_list(t) 	up_write(&t->pagg_list.sem)
+
+/*
+ * Marco Used in INIT_TASK to set the head and sem of pagg_list.
+ * If CONFIG_PAGG is off, it is defined as an empty macro below.
+ */
+#define INIT_TASK_PAGG(tsk) \
+	.pagg_list  = {                  \
+	.head = LIST_HEAD_INIT(tsk.pagg_list.head),     \
+	.sem  = __RWSEM_INITIALIZER(tsk.pagg_list.sem)  \
+	}, \
+
+#else  /* CONFIG_PAGG */
+
+/* 
+ * Replacement macros used when PAGG (Process Aggregates) support is not
+ * compiled into the kernel.
+ */
+#define INIT_TASK_PAGG(tsk)
+#define INIT_PAGG_LIST(l) do { } while(0)
+#define attach_pagg_list_chk(ct, pt)  do { } while(0)
+#define detach_pagg_list_chk(t)  do {  } while(0)     
+#define exec_pagg_list_chk(t)  do {  } while(0)     
+
+#endif /* CONFIG_PAGG */
+
+#endif /* _PAGG_H */
diff -Naru 2.6-patch/include/linux/sched.h 2.6pagg-patch/include/linux/sched.h
--- 2.6-patch/include/linux/sched.h	2004-04-05 14:18:05.000000000 -0500
+++ 2.6pagg-patch/include/linux/sched.h	2004-04-14 07:52:16.000000000 -0500
@@ -29,6 +29,7 @@
 #include <linux/completion.h>
 #include <linux/pid.h>
 #include <linux/percpu.h>
+#include <linux/pagg.h>
 
 struct exec_domain;
 
@@ -488,11 +489,15 @@
 
 	struct dentry *proc_dentry;
 	struct backing_dev_info *backing_dev_info;
-
 	struct io_context *io_context;
 
 	unsigned long ptrace_message;
 	siginfo_t *last_siginfo; /* For ptrace use.  */
+
+#if defined(CONFIG_PAGG)
+/* List of pagg (process aggregate) attachments */
+	struct pagg_list pagg_list;
+#endif
 };
 
 static inline pid_t process_group(struct task_struct *tsk)
diff -Naru 2.6-patch/init/Kconfig 2.6pagg-patch/init/Kconfig
--- 2.6-patch/init/Kconfig	2004-03-16 14:13:30.000000000 -0600
+++ 2.6pagg-patch/init/Kconfig	2004-04-26 14:25:25.000000000 -0500
@@ -104,6 +104,14 @@
 	  up to the user level program to do useful things with this
 	  information.  This is generally a good idea, so say Y.
 
+config PAGG
+	bool "Support for process aggregates (PAGGs)"
+	help
+     Say Y here if you will be loading modules which provide support
+     for process aggregate containers.  Examples of such modules include the
+     Linux Jobs module and the Linux Array Sessions module.  If you will not 
+     be using such modules, say N.
+
 config SYSCTL
 	bool "Sysctl support"
 	---help---
diff -Naru 2.6-patch/kernel/exit.c 2.6pagg-patch/kernel/exit.c
--- 2.6-patch/kernel/exit.c	2004-04-05 14:18:05.000000000 -0500
+++ 2.6pagg-patch/kernel/exit.c	2004-04-14 07:52:16.000000000 -0500
@@ -22,7 +22,7 @@
 #include <linux/profile.h>
 #include <linux/mount.h>
 #include <linux/proc_fs.h>
-
+#include <linux/pagg.h>
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
 #include <asm/mmu_context.h>
@@ -788,6 +788,9 @@
 		module_put(tsk->binfmt->module);
 
 	tsk->exit_code = code;
+
+	detach_pagg_list_chk(tsk);
+
 	exit_notify(tsk);
 	schedule();
 	BUG();
diff -Naru 2.6-patch/kernel/fork.c 2.6pagg-patch/kernel/fork.c
--- 2.6-patch/kernel/fork.c	2004-03-16 14:13:30.000000000 -0600
+++ 2.6pagg-patch/kernel/fork.c	2004-04-13 21:42:35.000000000 -0500
@@ -31,7 +31,7 @@
 #include <linux/futex.h>
 #include <linux/ptrace.h>
 #include <linux/mount.h>
-
+#include <linux/pagg.h>
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
 #include <asm/uaccess.h>
@@ -232,6 +232,9 @@
 
 	init_task.rlim[RLIMIT_NPROC].rlim_cur = max_threads/2;
 	init_task.rlim[RLIMIT_NPROC].rlim_max = max_threads/2;
+
+	/* Initialize the pagg list in pid 0 before it can clone itself. */
+	INIT_PAGG_LIST(&current->pagg_list);
 }
 
 static struct task_struct *dup_task_struct(struct task_struct *orig)
@@ -985,6 +988,12 @@
 	   
 	p->parent_exec_id = p->self_exec_id;
 
+	/*
+	 * call pagg modules to properly attach new process to the same
+	 * process aggregate containers as the parent process.
+	 */
+	attach_pagg_list_chk(p, current);
+
 	/* ok, now we should be set up.. */
 	p->exit_signal = (clone_flags & CLONE_THREAD) ? -1 : (clone_flags & CSIGNAL);
 	p->pdeath_signal = 0;
diff -Naru 2.6-patch/kernel/Makefile 2.6pagg-patch/kernel/Makefile
--- 2.6-patch/kernel/Makefile	2004-03-16 14:13:30.000000000 -0600
+++ 2.6pagg-patch/kernel/Makefile	2004-04-13 21:42:35.000000000 -0500
@@ -7,7 +7,7 @@
 	    sysctl.o capability.o ptrace.o timer.o user.o \
 	    signal.o sys.o kmod.o workqueue.o pid.o \
 	    rcupdate.o intermodule.o extable.o params.o posix-timers.o \
-	    kthread.o
+	    kthread.o pagg.o
 
 obj-$(CONFIG_FUTEX) += futex.o
 obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o
@@ -18,6 +18,7 @@
 obj-$(CONFIG_PM) += power/
 obj-$(CONFIG_BSD_PROCESS_ACCT) += acct.o
 obj-$(CONFIG_COMPAT) += compat.o
+obj-$(CONFIG_PAGG) += pagg.o
 obj-$(CONFIG_IKCONFIG) += configs.o
 obj-$(CONFIG_IKCONFIG_PROC) += configs.o
 obj-$(CONFIG_STOP_MACHINE) += stop_machine.o
diff -Naru 2.6-patch/kernel/pagg.c 2.6pagg-patch/kernel/pagg.c
--- 2.6-patch/kernel/pagg.c	1969-12-31 18:00:00.000000000 -0600
+++ 2.6pagg-patch/kernel/pagg.c	2004-04-27 14:41:29.000000000 -0500
@@ -0,0 +1,430 @@
+/* 
+ * PAGG (Process Aggregates) interface
+ *
+ * 
+ * Copyright (c) 2000-2004 Silicon Graphics, Inc.  All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ *
+ * Contact information:  Silicon Graphics, Inc., 1500 Crittenden Lane,
+ * Mountain View, CA  94043, or:
+ * 
+ * http://www.sgi.com 
+ * 
+ * For further information regarding this notice, see: 
+ * 
+ * http://oss.sgi.com/projects/GenInfo/NoticeExplan
+ */
+
+/*
+ * Description:  This file, kernel/pagg.c, contains the routines used
+ *               to implement process aggregates (paggs).  The pagg
+ *               extends the task_struct to allow for various process
+ *               aggregation continers.  Examples of such containers
+ *               include "jobs" and cluster applications IDs.  Process
+ *               sessions and groups could have been implemented using
+ *               paggs (although there would be little purpose in
+ *               making that change at this juncture).  The pagg
+ *               structure maintains pointers to callback functions and
+ *               data strucures maintained in modules that have
+ *               registered with the kernel as pagg container
+ *               providers.
+ */
+
+#include <linux/config.h>
+
+#ifdef CONFIG_PAGG
+
+#include <asm/uaccess.h>
+#include <linux/slab.h>
+#include <linux/sched.h>
+#include <asm/semaphore.h>
+#include <linux/smp_lock.h>
+#include <linux/proc_fs.h>
+#include <linux/module.h>
+#include <linux/pagg.h>
+
+/* list of pagg hook entries that reference the "module" implementations */
+static LIST_HEAD(pagg_hook_list);
+static DECLARE_RWSEM(pagg_hook_list_sem);
+
+
+/* 
+ * get_pagg
+ *
+ * Given a pagg_list list structure, this function will return
+ * a pointer to the pagg struct that matches the search
+ * key.  If the key is not found, the function will return NULL.
+ *
+ * The caller should hold at least a read lock on the pagg_list
+ * for task using read_lock_pagg_list(task).
+ */
+struct pagg *
+get_pagg(struct task_struct *task, char *key)
+{
+	struct list_head *entry;
+
+	list_for_each(entry, &task->pagg_list.head) {
+		struct pagg *pagg = list_entry(entry, struct pagg, entry);
+		if (!strcmp(pagg->hook->name,key)) {
+			return pagg;
+		}
+	}
+	return NULL;
+}
+
+
+/*
+ * alloc_pagg
+ *
+ * Given a task and a pagg hook, this function will allocate
+ * a new pagg structure, initialize the settings, and insert the pagg into
+ * the pagg_list for the task.
+ *
+ * The caller for this function should hold at least a read lock on the
+ * pagg_hook_list_sem - or ensure that the pagg hook entry cannot be 
+ * removed. If this function was called from the pagg module (usually the
+ * case), then the caller need not hold this lock. The caller should hold 
+ * a write lock on for the tasks pagg_list.sem.  This can be locked using the
+ * write_lock_pagg_list(task) macro.
+ */
+struct pagg *
+alloc_pagg(struct task_struct *task, struct pagg_hook *pagg_hook)
+{
+	struct pagg *pagg;
+
+	pagg = (struct pagg *)kmalloc(sizeof(struct pagg), GFP_KERNEL);
+	if (!pagg)
+		return NULL;
+
+	pagg->hook = pagg_hook;
+	pagg->data = NULL;
+	list_add_tail(&pagg->entry, &task->pagg_list.head);
+	return pagg;
+}
+
+
+/*
+ * free_pagg
+ *
+ * This function will ensure the pagg is deleted form 
+ * the list of pagg entries for the task. Finally, the memory for the 
+ * pagg is discarded.
+ *
+ * The caller of this function should hold a write lock on the pagg_list.sem
+ * for the task. This can be locke dusing the write_lock_pagg_list(task) 
+ * macro.
+ *
+ * Prior to calling free_pagg, the pagg should have been detached from the
+ * pagg container represented by this pagg.  That is usually done using the
+ * macro detach_pagg(pagg).
+ */
+void
+free_pagg(struct pagg *pagg) 
+{
+
+	list_del(&pagg->entry);
+	kfree(pagg);
+}
+
+
+/*
+ * get_pagg_hook
+ *
+ * Given a pagg hook name key, this function will return a pointer
+ * to the pagg_hook struct that contains that matches the name.
+ * 
+ * You should hold either the write or read lock for pagg_hook_list_sem
+ * before using this function.  This will ensure that the pagg_hook_list
+ * does not change while iterating through the list entries.
+ */
+static struct pagg_hook *
+get_pagg_hook(char *key)
+{
+	struct list_head *entry;
+	struct pagg_hook *pagg_hook;
+
+	list_for_each(entry, &pagg_hook_list) {
+		pagg_hook = list_entry(entry, struct pagg_hook, entry);
+		if (!strcmp(pagg_hook->name, key)) {
+			return pagg_hook;
+		}
+	}
+	return NULL;
+}
+
+
+/*
+ * register_pagg_hook
+ *
+ * Used to register a new pagg hook and enter it into the pagg_hook_list.
+ * The service name for a pagg hook is restricted to 32 characters.
+ *
+ * In the future an initialization function may also be defined so that all
+ * existing tasks can be assigned to a default pagg entry for the hook.
+ * However, this would require iterating through the tasklist.  To do that
+ * requires that the tasklist_lock be read locked.  Since the initialization
+ * function might be in a module, and therefore it might sleep (implementors
+ * decision), holding the tasklist_lock seems like a bad idea. It may be a
+ * requirement that the initialization function will be strictly forbidden
+ * from locking - by gentlemans agreement... 
+ *
+ * If a memory error is encountered, the pagg hook is unregistered and any
+ * tasks that have been attached to the initial pagg container are detached
+ * from that container.
+ */
+int
+register_pagg_hook(struct pagg_hook *pagg_hook_new)
+{
+	struct pagg_hook *pagg_hook = NULL;
+
+	/* ADD NEW PAGG MODULE TO ACCESS LIST */
+	if (!pagg_hook_new)
+		return -EINVAL;			/* error */
+	if (!list_empty(&pagg_hook_new->entry))
+		return -EINVAL;			/* error */
+	if (pagg_hook_new->name == NULL || strlen(pagg_hook_new->name) > PAGG_NAMELN) 
+		return -EINVAL;			/* error */
+
+	/* Try to insert new hook entry into the pagg hook list */
+	down_write(&pagg_hook_list_sem);
+
+	pagg_hook = get_pagg_hook(pagg_hook_new->name);
+
+	if (pagg_hook) {
+		up_write(&pagg_hook_list_sem);
+		printk(KERN_WARNING "Attempt to register duplicate"
+				" PAGG support (name=%s)\n", pagg_hook_new->name);
+		return -EBUSY;
+	}
+
+	/* Okay, we can insert into the pagg hook list */
+	list_add_tail(&pagg_hook_new->entry, &pagg_hook_list);
+	up_write(&pagg_hook_list_sem);
+
+	printk(KERN_INFO "Registering PAGG support for (name=%s)\n",
+			pagg_hook_new->name);
+
+	return 0;					/* success */
+
+}
+
+
+/*
+ * unregister_pagg_hook
+ *
+ * Used to unregister pagg hooks and remove them from the pagg_hook_list.
+ * Once the pagg hook entry in the pagg_hook_list is found, all of the
+ * tasks are scanned and detached from any pagg containers defined by the
+ * pagg implementation module.
+ */
+int
+unregister_pagg_hook(struct pagg_hook *pagg_hook_old)
+{
+	struct pagg_hook *pagg_hook;
+	struct task_struct *task;
+
+
+	/* Check the validity of the arguments */
+	if (!pagg_hook_old)
+		return -EINVAL;			/* error */
+	if (list_empty(&pagg_hook_old->entry))
+		return -EINVAL;			/* error */
+	if (pagg_hook_old->name == NULL)
+		return -EINVAL;			/* error */
+
+	down_write(&pagg_hook_list_sem);
+
+	pagg_hook = get_pagg_hook(pagg_hook_old->name);
+	if (pagg_hook && pagg_hook == pagg_hook_old) {
+		/* 
+		 * Scan through processes on system and check for  
+		 * references to pagg containers for this pagg hook.
+		 * 
+		 * The module cannot be unloaded if there are references.
+		 */
+		read_lock(&tasklist_lock);
+		for_each_process(task) {
+			struct pagg *pagg = NULL;
+
+			get_task_struct(task); /* So the task doesn't vanish on us */
+			read_unlock(&tasklist_lock);
+			read_lock_pagg_list(task);
+			pagg = get_pagg(task, pagg_hook_old->name);
+			put_task_struct(task);
+			/* 
+			 * We won't be accessing pagg's memory, just need
+			 * to see if one existed - so we can release the task
+			 * lock now.
+			 */
+			read_unlock_pagg_list(task);
+			if (pagg) {
+				up_write(&pagg_hook_list_sem);
+				return -EBUSY;
+			}
+
+			/* lock the task list again so we get a valid task in the loop */
+			read_lock(&tasklist_lock);
+		}
+		read_unlock(&tasklist_lock);
+		list_del_init(&pagg_hook->entry);
+		up_write(&pagg_hook_list_sem);
+
+		printk(KERN_INFO "Unregistering PAGG support for"
+				" (name=%s)\n", pagg_hook_old->name);
+
+		return 0;			/* success */
+	}
+
+	up_write(&pagg_hook_list_sem);
+
+	printk(KERN_WARNING "Attempt to unregister PAGG support (name=%s)"
+			" failed - not found\n", pagg_hook_old->name);
+	
+	return -EINVAL;				/* error */
+}
+
+
+/*
+ * attach_pagg_list
+ *
+ * Used to attach a new task to the same pagg containers to which it's parent
+ * is attached.
+ *
+ * The "from" argument is the parent task.  The "to" argument is the child
+ * task. 
+ *
+ */
+int
+attach_pagg_list(struct task_struct *to_task, struct task_struct *from_task)
+{
+	struct list_head   *entry;
+	int  		   retcode = 0;
+
+
+
+	/* lock the parents pagg_list we are copying from */
+	read_lock_pagg_list(from_task);
+
+	list_for_each(entry, &from_task->pagg_list.head) {
+		struct pagg *to_pagg = NULL;
+		struct pagg *from_pagg = list_entry(entry, struct pagg, 
+							entry);
+		to_pagg = alloc_pagg(to_task, from_pagg->hook);
+		if (!to_pagg) {
+			retcode = -ENOMEM;
+			goto error_return;
+		}
+		retcode = attach_pagg(to_task, to_pagg, from_pagg->data);
+		if (retcode != 0) {
+			/* attach should issue error message */
+			goto error_return;
+		}
+	}
+
+	read_unlock_pagg_list(from_task);
+
+	return 0;					/* success */
+
+  error_return:
+	/* 
+	 * Clean up all the pagg attachments made on behalf of the new
+	 * task.  Set new task pagg ptr to NULL for return.
+	 */
+	read_unlock_pagg_list(from_task);
+	detach_pagg_list(to_task);
+	return retcode;				/* failure */
+}
+
+
+/*
+ * detach_pagg_list
+ *
+ * Used to detach a task from all pagg containers to which it is attached.
+ */
+int
+detach_pagg_list(struct task_struct *task)
+{
+	struct list_head   *entry;
+	int retcode = 0;
+	int rettmp = 0;
+
+	/* Remove ref. to paggs from task immediately */
+	write_lock_pagg_list(task);
+
+	if (list_empty(&task->pagg_list.head)) {
+		write_unlock_pagg_list(task);
+		return retcode;
+	} 
+
+	list_for_each(entry, &task->pagg_list.head) {
+		int rettemp = 0;
+		struct pagg *pagg = list_entry(entry, struct pagg, entry);
+
+		entry = &task->pagg_list.head;
+
+		rettemp = detach_pagg(task, pagg);
+		if (rettmp) {
+			/* an error message should be logged in free_pagg */
+			retcode = rettmp;
+		}
+		free_pagg(pagg);
+	}
+
+	write_unlock_pagg_list(task);
+
+	return retcode;	/* 0 = success, else return last code for failure */
+}
+
+
+/*
+ * exec_pagg_list
+ *
+ * Used to when a process that is in a pagg container does an exec.
+ *
+ * The "from" argument is the task.  The "name" argument is the name
+ * of the process being exec'ed.
+ *
+ */
+int exec_pagg_list(struct task_struct *task) {
+	struct list_head   *entry;
+
+
+
+	/* lock the parents pagg_list we are copying from */
+	read_lock_pagg_list(task);
+
+	list_for_each(entry, &task->pagg_list.head) {
+		struct pagg *pagg = list_entry(entry, struct pagg, 
+							entry);
+		exec_pagg(task, pagg);
+	}
+
+	read_unlock_pagg_list(task);
+	return 0;
+}
+
+
+EXPORT_SYMBOL(get_pagg);
+EXPORT_SYMBOL(alloc_pagg);
+EXPORT_SYMBOL(free_pagg);
+EXPORT_SYMBOL(attach_pagg_list);
+EXPORT_SYMBOL(detach_pagg_list);
+EXPORT_SYMBOL(exec_pagg_list);
+EXPORT_SYMBOL(register_pagg_hook);
+EXPORT_SYMBOL(unregister_pagg_hook);
+
+#endif /* CONFIG_PAGG */

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-27  0:41     ` Chris Wright
@ 2004-04-27 21:00       ` Erik Jacobson
  2004-04-27 21:05         ` Chris Wright
  2004-04-29 21:10       ` Rik van Riel
  1 sibling, 1 reply; 31+ messages in thread
From: Erik Jacobson @ 2004-04-27 21:00 UTC (permalink / raw)
  To: Chris Wright; +Cc: Jesse Barnes, linux-kernel

> Nothing aside from what's on ckrm.sf.net.  I know they've been retooling
> it a bit, but I'm not up on the current status.

The web site contains API docs and some out-dated patches.  I joined the
mailing list and asked for the latest stuff.  They told me they'd be posting
a new patch to their mailing list soon.

I expect the "new" stuff uses the virtual filesystem interface and other
things suggested by the API docs they have.

I plan to look at this in more detail when I have the latest stuff to
think about.

My first impression is that pagg itself could be used to implement parts of
what ckrm is doing if they desired and not necessarily the other way around.

It's not clear to me if this is something that will be accepted in to the
main kernel source or not.

I'll post again when I learn more about it and can form a better opinion.

--
Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-27 21:00       ` Erik Jacobson
@ 2004-04-27 21:05         ` Chris Wright
  0 siblings, 0 replies; 31+ messages in thread
From: Chris Wright @ 2004-04-27 21:05 UTC (permalink / raw)
  To: Erik Jacobson; +Cc: Chris Wright, Jesse Barnes, linux-kernel

* Erik Jacobson (erikj@subway.americas.sgi.com) wrote:
> I expect the "new" stuff uses the virtual filesystem interface and other
> things suggested by the API docs they have.

*nod*

> My first impression is that pagg itself could be used to implement parts of
> what ckrm is doing if they desired and not necessarily the other way around.

Guess the key point is that many folks are interested in some sort of
aggregate resource container.  QoS on virtual servers, make rlimit type
of limits acutally useful, your needs, etc.  Be nice to come from
common infrastructure.

thanks,
-chris
-- 
Linux Security Modules     http://lsm.immunix.org     http://lsm.bkbits.net

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-27 20:51   ` Erik Jacobson
@ 2004-04-27 22:28     ` Chris Wright
  2004-04-28 14:55     ` Christoph Hellwig
  1 sibling, 0 replies; 31+ messages in thread
From: Chris Wright @ 2004-04-27 22:28 UTC (permalink / raw)
  To: Erik Jacobson; +Cc: Chris Wright, linux-kernel

* Erik Jacobson (erikj@subway.americas.sgi.com) wrote:
> I didn't choose to change the macros at this time - however, I'm not against
> changing them either - I just haven't done it yet.

OK.  I still think it's a good idea for readability and type safety.

> On Mon, 26 Apr 2004, Chris Wright wrote:
> > This looks like it's just the infrastructure, i.e. nothing is using it.
> > It seems like PAGG could be done on top of CKRM (albeit, with more
> > code).  But if the goal is to do some basic accounting, scheduling, etc.
> > on a resource group, wouldn't CKRM be more generic?
> 
> Right.  A couple examples of things we have that use it are CSA
> (oss.sgi.com/csa) and job.  job provides inescapable job containers that
> are also used by csa.
> 
> But what I presented here was just the infrastructure as you said.
> 
> Patches for inescapable job containers ('job') are available on the pagg web
> site as well (oss.sgi.com/pagg).

OK, thanks, I'll take a look.

> > > +       char		*name;	/* Name Key - restricted to 32 characters */
> >
> > why the restriction?
> 
> I'm open to suggestions.  Right now, this is usually set to something like
> "job" or similar.  The max length is enforced by the module that makes use
> of pagg.  For example, with the job package:

Right, if it's a pointer, and you guarantee it's NULL terminated, then I
don't see the point.  Otherwise, strncmp() or something?

> I fixed the tasklist issue you were concerned about.  Again, I didn't address
> the macro issue at this moment.

Alright, see below.

> > This looks like it leaks the just alloc'd to_pagg.
> 
> I agree that it looks suspect but I think it's OK.
> 
> You're talking about the case where the pagg was allocated, but couldn't
> attach I assume.
> 
> The alloc_pagg function adds that allocated pagg to the pagg list.  In
> error_return, detach_pagg_list is called so this pagg should be freed then.

Yes, I see it now, thanks.  BTW, I see a common idiom here of:

 if (!list_empty) {
	list_for_each() {
	}

 }

Seems like mostly an empty optimization, since list_for_each essentially
does that list_empty() check, no?

> +unregister_pagg_hook(struct pagg_hook *pagg_hook_old)
> +{
<snip>
> +	down_write(&pagg_hook_list_sem);
<snip>
> +		read_lock(&tasklist_lock);
> +		for_each_process(task) {
> +			struct pagg *pagg = NULL;
> +
> +			get_task_struct(task); /* So the task doesn't vanish on us */
> +			read_unlock(&tasklist_lock);

dropped tasklist_lock, task could exit, and unlink and potentially drop the 
only other ref.

> +			read_lock_pagg_list(task);
> +			pagg = get_pagg(task, pagg_hook_old->name);
> +			put_task_struct(task);

and this could have totally freed the memory to task.

still looks unsafe to me.

> +			/* 
> +			 * We won't be accessing pagg's memory, just need
> +			 * to see if one existed - so we can release the task
> +			 * lock now.
> +			 */
> +			read_unlock_pagg_list(task);
> +			if (pagg) {
> +				up_write(&pagg_hook_list_sem);
> +				return -EBUSY;
> +			}
> +
> +			/* lock the task list again so we get a valid task in the loop */
> +			read_lock(&tasklist_lock);
> +		}
> +		read_unlock(&tasklist_lock);
> +		list_del_init(&pagg_hook->entry);
> +		up_write(&pagg_hook_list_sem);

thanks,
-chris
-- 
Linux Security Modules     http://lsm.immunix.org     http://lsm.bkbits.net

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-27 20:51   ` Erik Jacobson
  2004-04-27 22:28     ` Chris Wright
@ 2004-04-28 14:55     ` Christoph Hellwig
  2004-04-29 19:20       ` Paul Jackson
  1 sibling, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2004-04-28 14:55 UTC (permalink / raw)
  To: Erik Jacobson; +Cc: Chris Wright, linux-kernel

Highlevel comments:

 - without any user merging doesn't make sense
 - you probably want to update all the function/data structure comments
   to the normal kernel-doc style.

> +-  init/Config.help
> +-  init/Config.in

Where did you find these files? :)

> +-  Documentation/Configure.help

Dito.

> +This implementation of PAGG supports the i386 and ia64 architecture.

Can't find anything architecture-specific here.


The whole chapter 2 of this document doesn't belong into the kernel
tree.

> @@ -1151,6 +1152,7 @@
>  	retval = search_binary_handler(&bprm,regs);
>  	if (retval >= 0) {
>  		free_arg_pages(&bprm);
> +		exec_pagg_list_chk(current);

This looks rather misnamed.  pagg_exec sounds like a better name,
with __pagg_exec for the implementation after the inline list_empty
check.

> +#ifndef _PAGG_H
> +#define _PAGG_H

should be _LINUX_PAGG_H

> +#define INIT_PAGG_LIST(l)						\
> +do {									\
> +	INIT_LIST_HEAD(l.head);						\
> +	init_rwsem(l.sem);						\
> +} while(0)

braces around l here to avoid too much trouble?

> +struct pagg_hook {
> +       struct module	*module;
> +       char		*name;	/* Name Key - restricted to 32 characters */
> +       int		(*attach)(struct task_struct *, struct pagg *, void*);
> +       int		(*detach)(struct task_struct *, struct pagg *);
> +       int		(*init)(struct task_struct *, struct pagg *);
> +       void		*data;	/* Opaque module specific data */
> +       struct list_head	entry;	/* List pointers */
> +       void		(*exec)(struct task_struct *, struct pagg *);
> +};

The ordering here looks strange, please keep data and methods ordered,
ala:

struct pagg_hook {
       struct module	*module;
       char		*name;	/* Name Key - restricted to 32 characters */
       void		*data;	/* Opaque module specific data */
       struct list_head	entry;	/* List pointers */
       int		(*init)(struct task_struct *, struct pagg *);
       int		(*attach)(struct task_struct *, struct pagg *, void*);
       int		(*detach)(struct task_struct *, struct pagg *);
       void		(*exec)(struct task_struct *, struct pagg *);
};

> +extern struct pagg *get_pagg(struct task_struct *task, char *key);
> +extern struct pagg *alloc_pagg(struct task_struct *task, 
> +				      struct pagg_hook *pt);
> +extern void free_pagg(struct pagg *pagg);
> +extern int register_pagg_hook(struct pagg_hook *pt_new);
> +extern int unregister_pagg_hook(struct pagg_hook *pt_old);
> +extern int attach_pagg_list(struct task_struct *to_task, 
> +					struct task_struct *from_task);
> +extern int detach_pagg_list(struct task_struct *task);
> +extern int exec_pagg_list(struct task_struct *task);

I'd call all these pagg_*.  Also please kill the _list postfixes,
they're extremly confusing.

> +/* 
> + *  Macro used when a child process must inherit attachment to pagg 
> + *  containers from the parent.
> + *
> + *  Arguments:
> + *	ct:	child (struct task_struct *)
> + *	pt:	parent (struct task_struct *)
> + *	cf:	clone_flags
> + */
> +#define attach_pagg_list_chk(ct, pt)					\
> +do {									\
> +	INIT_PAGG_LIST(&ct->pagg_list);					\
> +	if (!list_empty(&pt->pagg_list.head)) {				\
> +		if (attach_pagg_list(ct, pt) != 0)			\
> +			goto bad_fork_cleanup;				\
> +	}								\
> +} while(0)

Should probably be an inline, ala:

static inline int pagg_attach(struct task_struct *child,
			      struct task_struct *parent)
{
	INIT_PAGG_LIST(&child->pagg_list);
	if (!list_empty(&parent->pagg_list.head))
		return __pagg_attach(child, parent));
	return 0;
}

and then handle the error in the caller.


> +#define detach_pagg_list_chk(t)					\
> +do {									\
> +	if (!list_empty(&t->pagg_list.head)) {				\
> +		detach_pagg_list(t);					\
> +	}								\
> +} while(0)

static inline void pagg_detach(struct task_struct *task)
{
	if (!list_empty(&task->pagg_list.head))
		__pagg_detach(task);
}

> +#define exec_pagg_list_chk(t)						\
> +do {									\
> +	if (!list_empty(&t->pagg_list.head)) {				\
> +		exec_pagg_list(t);					\
> +	}								\
> +} while(0)

Dito.

> +	/* Invoke module detach callback for the pagg & task */
> +#define detach_pagg(t, p)		p->hook->detach(t, p)
> +	/* Invoke module attach callback for the pagg & task */
> +#define attach_pagg(t, p, d)  		p->hook->attach(t, p, (void *)d)
> +	/* Allows optional callout at exec */
> +#define exec_pagg(t, p)  		do {				\
> +						if (p->hook->exec)	\
> +						    p->hook->exec(t, p);\
> +					} while(0)

please kill all these wrappers.  in linux we call methods directly,
unlike the sysv style :)  Also why is the exec hook conditional and the
others not?   please make that coherent.

> 
> +	/* Allows module to set data item for pagg */
> +#define set_pagg(p, d)   		p->data = (void *)d
> +	/* Down the read semaphore for the task's pagg_list */
> +#define read_lock_pagg_list(t)		down_read(&t->pagg_list.sem)
> +	/* Up the read semaphore for the task's pagg_list */
> +#define read_unlock_pagg_list(t) 	up_read(&t->pagg_list.sem)
> +	/* Down the write semaphore for the task's pagg_list */
> +#define write_lock_pagg_list(t)		down_write(&t->pagg_list.sem)
> +	/* Up the write semaphore for the task's pagg_list */
> +#define write_unlock_pagg_list(t) 	up_write(&t->pagg_list.sem)

thos were already mentioned, please kill all those accesors..

> +#if defined(CONFIG_PAGG)

#ifdef CONFIG_PAGG is preferred style in linux.

> +++ 2.6pagg-patch/kernel/Makefile	2004-04-13 21:42:35.000000000 -0500
> @@ -7,7 +7,7 @@
>  	    sysctl.o capability.o ptrace.o timer.o user.o \
>  	    signal.o sys.o kmod.o workqueue.o pid.o \
>  	    rcupdate.o intermodule.o extable.o params.o posix-timers.o \
> -	    kthread.o
> +	    kthread.o pagg.o

do you really want to build in pagg.o all the time, even without
CONFIG_PAGG set?

>  obj-$(CONFIG_COMPAT) += compat.o
> +obj-$(CONFIG_PAGG) += pagg.o

.. then you wouldn't need this line at least :)

> + *               structure maintains pointers to callback functions and
> + *               data strucures maintained in modules that have
> + *               registered with the kernel as pagg container
> + *               providers.
> + */
> +
> +#include <linux/config.h>
> +
> +#ifdef CONFIG_PAGG

this one isn't needed if you properly compile pagg.o only if CONFIG_PAGG
is set..

> +#include <asm/uaccess.h>
> +#include <linux/slab.h>
> +#include <linux/sched.h>
> +#include <asm/semaphore.h>
> +#include <linux/smp_lock.h>
> +#include <linux/proc_fs.h>
> +#include <linux/module.h>
> +#include <linux/pagg.h>

Please include asm/ headers after linux/.  smp_lock.h, proc_fs.h amd
uaccess.h don't seem to be needed.

> +struct pagg *
> +get_pagg(struct task_struct *task, char *key)
> +{
> +	struct list_head *entry;
> +
> +	list_for_each(entry, &task->pagg_list.head) {
> +		struct pagg *pagg = list_entry(entry, struct pagg, entry);

list_for_each_entry()

> +		if (!strcmp(pagg->hook->name,key)) {
> +			return pagg;
> +		}

superflous braces here.

> +	pagg = (struct pagg *)kmalloc(sizeof(struct pagg), GFP_KERNEL);

no need to cast.

> +free_pagg(struct pagg *pagg) 
> +{
> +
> +	list_del(&pagg->entry);
> +	kfree(pagg);
> +}

that blank line over the list_del looks rather strange..

> +	list_for_each(entry, &pagg_hook_list) {
> +		pagg_hook = list_entry(entry, struct pagg_hook, entry);

list_for_each_entry again.

> +	/* Try to insert new hook entry into the pagg hook list */
> +	down_write(&pagg_hook_list_sem);

does this really need a semaphore?  a spinlock looks like it could do it
aswell - or am I missing a blocking function somewhere?

> +	printk(KERN_INFO "Registering PAGG support for (name=%s)\n",
> +			pagg_hook_new->name);

sounds rather verbose, no?

> +		for_each_process(task) {
> +			struct pagg *pagg = NULL;
> +
> +			get_task_struct(task); /* So the task doesn't vanish on us */
> +			read_unlock(&tasklist_lock);
> +			read_lock_pagg_list(task);
> +			pagg = get_pagg(task, pagg_hook_old->name);
> +			put_task_struct(task);
> +			/* 
> +			 * We won't be accessing pagg's memory, just need
> +			 * to see if one existed - so we can release the task
> +			 * lock now.
> +			 */
> +			read_unlock_pagg_list(task);
> +			if (pagg) {
> +				up_write(&pagg_hook_list_sem);
> +				return -EBUSY;
> +			}
> +

if the pagg list lock wasn't a sleeping lock this could be much simpler,
no?

> +		printk(KERN_INFO "Unregistering PAGG support for"
> +				" (name=%s)\n", pagg_hook_old->name);

also overly verbose.

> +	/* Remove ref. to paggs from task immediately */
> +	write_lock_pagg_list(task);
> +
> +	if (list_empty(&task->pagg_list.head)) {
> +		write_unlock_pagg_list(task);
> +		return retcode;
> +	} 
> +
> +	list_for_each(entry, &task->pagg_list.head) {
> +		int rettemp = 0;
> +		struct pagg *pagg = list_entry(entry, struct pagg, entry);

list_for_each* is a noop for an empty list.  also you want
list_for_each_entry again.

> +int exec_pagg_list(struct task_struct *task) {

brace wants to be on the next line.

> 
> +	struct list_head   *entry;
> +
> +
> +

huh?

> +EXPORT_SYMBOL(get_pagg);
> +EXPORT_SYMBOL(alloc_pagg);
> +EXPORT_SYMBOL(free_pagg);
> +EXPORT_SYMBOL(attach_pagg_list);
> +EXPORT_SYMBOL(detach_pagg_list);
> +EXPORT_SYMBOL(exec_pagg_list);
> +EXPORT_SYMBOL(register_pagg_hook);
> +EXPORT_SYMBOL(unregister_pagg_hook);

should probably be _GPL as this directly messed with highly kernel
internal process managment.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-28 14:55     ` Christoph Hellwig
@ 2004-04-29 19:20       ` Paul Jackson
  2004-04-29 19:27         ` Chris Wright
  2004-04-29 19:29         ` Christoph Hellwig
  0 siblings, 2 replies; 31+ messages in thread
From: Paul Jackson @ 2004-04-29 19:20 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: erikj, chrisw, linux-kernel

>  - without any user merging doesn't make sense

Could you try restating this particular comment, Christoph?
I am failing to make any sense of it.

Thanks.

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-29 19:20       ` Paul Jackson
@ 2004-04-29 19:27         ` Chris Wright
  2004-04-29 19:29         ` Christoph Hellwig
  1 sibling, 0 replies; 31+ messages in thread
From: Chris Wright @ 2004-04-29 19:27 UTC (permalink / raw)
  To: Paul Jackson; +Cc: Christoph Hellwig, erikj, chrisw, linux-kernel

* Paul Jackson (pj@sgi.com) wrote:
> >  - without any user merging doesn't make sense
> 
> Could you try restating this particular comment, Christoph?
> I am failing to make any sense of it.

Same thing I was trying to get across.  Merging an infrastructure with
no users is a tough sell.

thanks,
-chris
-- 
Linux Security Modules     http://lsm.immunix.org     http://lsm.bkbits.net

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-29 19:20       ` Paul Jackson
  2004-04-29 19:27         ` Chris Wright
@ 2004-04-29 19:29         ` Christoph Hellwig
  2004-04-29 19:34           ` Paul Jackson
  2004-04-29 19:53           ` Erik Jacobson
  1 sibling, 2 replies; 31+ messages in thread
From: Christoph Hellwig @ 2004-04-29 19:29 UTC (permalink / raw)
  To: Paul Jackson; +Cc: erikj, chrisw, linux-kernel

On Thu, Apr 29, 2004 at 12:20:26PM -0700, Paul Jackson wrote:
> >  - without any user merging doesn't make sense
> 
> Could you try restating this particular comment, Christoph?
> I am failing to make any sense of it.

without merging anything that actually uses pagg getting pagg itself
into the kernel doesn't make a lot of sense.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-29 19:29         ` Christoph Hellwig
@ 2004-04-29 19:34           ` Paul Jackson
  2004-04-29 19:53           ` Erik Jacobson
  1 sibling, 0 replies; 31+ messages in thread
From: Paul Jackson @ 2004-04-29 19:34 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: erikj, chrisw, linux-kernel

Thanks for your clarifications, Christoph and Chris.
Crystal clear now ;).

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj@sgi.com> 1.650.933.1373

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-29 19:29         ` Christoph Hellwig
  2004-04-29 19:34           ` Paul Jackson
@ 2004-04-29 19:53           ` Erik Jacobson
  2004-04-29 21:20             ` Rik van Riel
  1 sibling, 1 reply; 31+ messages in thread
From: Erik Jacobson @ 2004-04-29 19:53 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Paul Jackson, chrisw, linux-kernel

> without merging anything that actually uses pagg getting pagg itself
> into the kernel doesn't make a lot of sense.

We have job and CSA that make use of it but I didn't know if I should
push them all at the same time, or one at a time.  As it is, the suggestions
provided for PAGG are making for some significant revisions.  The changes
require adjustments to job as well.

I'm hoping to get something with nearly all the suggestions implemented for
PAGG posted, then I can post job if you think that will help.  I'm sure
there will be lots of comments on that as well -- and those comments will
probably require revisions to CSA :-)

If you're saying there really is zero chance even if I implement all the
suggestions and have things that use it, I guess I'll just have to live with
that (what's my other choice? :). At least the patches will be better off
even if they exist as patches forever.  We're really hoping we can get this in
though...  So I want to do everything I can to give it the best shot possible.

--
Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-27  0:41     ` Chris Wright
  2004-04-27 21:00       ` Erik Jacobson
@ 2004-04-29 21:10       ` Rik van Riel
  1 sibling, 0 replies; 31+ messages in thread
From: Rik van Riel @ 2004-04-29 21:10 UTC (permalink / raw)
  To: Chris Wright; +Cc: Jesse Barnes, linux-kernel, Erik Jacobson, Shailabh Nagar

On Mon, 26 Apr 2004, Chris Wright wrote:

> > Quite possibly.  Do you have a pointer to the latest bits/design docs?
> 
> Nothing aside from what's on ckrm.sf.net.  I know they've been retooling
> it a bit, but I'm not up on the current status.

Shailabh posted the latest CKRM code to lkml yesterday.

CKRM + rcfs seems to be slightly more capable than the PAGG code;
furthermore, CKRM already has a number of resource controller
modules while I'm only seeing very basic PAGG infrastructure.

Now would be a good time to join forces...

I admit that the CKRM project so far seems to have been mostly
an IBM internal project, but since there is community interest
it'd be time to get the community involved.

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-29 19:53           ` Erik Jacobson
@ 2004-04-29 21:20             ` Rik van Riel
  2004-04-30  6:17               ` Christoph Hellwig
  0 siblings, 1 reply; 31+ messages in thread
From: Rik van Riel @ 2004-04-29 21:20 UTC (permalink / raw)
  To: Erik Jacobson; +Cc: Christoph Hellwig, Paul Jackson, chrisw, linux-kernel

On Thu, 29 Apr 2004, Erik Jacobson wrote:

> If you're saying there really is zero chance even if I implement all the
> suggestions and have things that use it, I guess I'll just have to live
> with that (what's my other choice? :).

I suspect there's a rather good chance of merging a common
PAGG/CKRM infrastructure, since they pretty much do the same
thing at the core and they both have different functionality
implemented on top of the core process grouping.

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-29 21:20             ` Rik van Riel
@ 2004-04-30  6:17               ` Christoph Hellwig
  2004-04-30 11:08                 ` Guillaume Thouvenin
  2004-04-30 12:54                 ` Rik van Riel
  0 siblings, 2 replies; 31+ messages in thread
From: Christoph Hellwig @ 2004-04-30  6:17 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Erik Jacobson, Paul Jackson, chrisw, linux-kernel

> I suspect there's a rather good chance of merging a common
> PAGG/CKRM infrastructure, since they pretty much do the same
> thing at the core and they both have different functionality
> implemented on top of the core process grouping.

Still doesn't make a lot of sense.  CKRM is a huge cludgy beast poking
everywhere while PAGG is a really small layer to allow kernel modules
keeping per-process state.  If CKRM gets merged at all (and the current
looks far to horrible and the gains are rather unclear) it should layer
ontop of something like PAGG for the functionality covered by it.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-26 22:04 [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel Erik Jacobson
  2004-04-26 23:39 ` Chris Wright
@ 2004-04-30  8:54 ` Guillaume Thouvenin
  2004-05-20 21:16 ` Erik Jacobson
  2 siblings, 0 replies; 31+ messages in thread
From: Guillaume Thouvenin @ 2004-04-30  8:54 UTC (permalink / raw)
  To: Erik Jacobson; +Cc: linux-kernel

Selon Erik Jacobson <erikj@subway.americas.sgi.com>:

> What is Process Aggregates (PAGG)?
> ----------------------------------
> PAGG provides for the implementation of arbitrary process groups in Linux.
> It is a building block for kernel modules that can group processes
> together into a single set for specific purposes beyond the traditional
> process groups.

Hello,

    I'm working on a project (ELSA -Enhanced Linux System Accounting) that is
very similar to PAGG and CSA. My goal is to improve accounting on Linux by using
containers to group processes according to the system administrator policy.
Currently I provide a patch to 2.6.5 kernel ( http://elsa.sourceforge.net ) that
provides infrastructures to manipulate containers. The advantage of my patch is
that manipulating containers is very simple. I think that it could be very
interesting to share our work to see what could be done. Thus I will take a look
to PAGG implementation and also to the CKRM solution proposed by Chris Wright. 

Best,
Guillaume

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-30  6:17               ` Christoph Hellwig
@ 2004-04-30 11:08                 ` Guillaume Thouvenin
  2004-04-30 18:00                   ` Shailabh
  2004-04-30 18:28                   ` Rik van Riel
  2004-04-30 12:54                 ` Rik van Riel
  1 sibling, 2 replies; 31+ messages in thread
From: Guillaume Thouvenin @ 2004-04-30 11:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Rik van Riel, Erik Jacobson, Paul Jackson, chrisw, linux-kernel

Selon Christoph Hellwig <hch@infradead.org>:

> > I suspect there's a rather good chance of merging a common
> > PAGG/CKRM infrastructure, since they pretty much do the same
> > thing at the core and they both have different functionality
> > implemented on top of the core process grouping.
> 
> Still doesn't make a lot of sense.  CKRM is a huge cludgy beast poking
> everywhere while PAGG is a really small layer to allow kernel modules
> keeping per-process state.  If CKRM gets merged at all (and the current
> looks far to horrible and the gains are rather unclear) it should layer
> ontop of something like PAGG for the functionality covered by it.

And what about put the management of containers outside the kernel. We could for
exemple use a program that will listen file /proc/acct_event and execute a
programs to handle the event like ACPID does. Of course it will need some kernel
modifications but those modifications will be small as process aggregation will
be done outside the kernel. We could also use relayfs to exchange datas between
user program and the kernel.

Guillaume

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-30  6:17               ` Christoph Hellwig
  2004-04-30 11:08                 ` Guillaume Thouvenin
@ 2004-04-30 12:54                 ` Rik van Riel
  2004-04-30 13:06                   ` Christoph Hellwig
  2004-04-30 15:59                   ` Chris Wright
  1 sibling, 2 replies; 31+ messages in thread
From: Rik van Riel @ 2004-04-30 12:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Erik Jacobson, Paul Jackson, chrisw, linux-kernel

On Fri, 30 Apr 2004, Christoph Hellwig wrote:

> Still doesn't make a lot of sense.  CKRM is a huge cludgy beast poking
> everywhere while PAGG is a really small layer to allow kernel modules
> keeping per-process state.  If CKRM gets merged at all (and the current
> looks far to horrible and the gains are rather unclear) it should layer
> ontop of something like PAGG for the functionality covered by it.

What was the last time you looked at the CKRM source?

Sure it's a bit bigger than PAGG, but that's also because
it includes the functionality to change the group a process
belongs to and other things that don't seem to be included
in the PAGG patch.

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-30 12:54                 ` Rik van Riel
@ 2004-04-30 13:06                   ` Christoph Hellwig
  2004-04-30 13:28                     ` Chris Mason
                                       ` (2 more replies)
  2004-04-30 15:59                   ` Chris Wright
  1 sibling, 3 replies; 31+ messages in thread
From: Christoph Hellwig @ 2004-04-30 13:06 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Erik Jacobson, Paul Jackson, chrisw, linux-kernel

On Fri, Apr 30, 2004 at 08:54:08AM -0400, Rik van Riel wrote:
> What was the last time you looked at the CKRM source?

the day before yesterday (the patch in SuSE's tree because there
doesn't seem to be any official patch on their website)

> Sure it's a bit bigger than PAGG, but that's also because
> it includes the functionality to change the group a process
> belongs to and other things that don't seem to be included
> in the PAGG patch.

Again, pagg doesn't even play in that league.  It's really just a tiny
meachnism to allow a kernel module keep per-process data.  Policies
like process-groups can be implemented ontop of that.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-30 13:06                   ` Christoph Hellwig
@ 2004-04-30 13:28                     ` Chris Mason
  2004-04-30 16:50                       ` Shailabh
  2004-04-30 15:22                     ` Rik van Riel
  2004-04-30 17:53                     ` Shailabh
  2 siblings, 1 reply; 31+ messages in thread
From: Chris Mason @ 2004-04-30 13:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Rik van Riel, Erik Jacobson, Paul Jackson, chrisw, linux-kernel

On Fri, 2004-04-30 at 09:06, Christoph Hellwig wrote:
> On Fri, Apr 30, 2004 at 08:54:08AM -0400, Rik van Riel wrote:
> > What was the last time you looked at the CKRM source?
> 
> the day before yesterday (the patch in SuSE's tree because there
> doesn't seem to be any official patch on their website)
> 
Somewhat unrelated, but the day before yesterday suse was at ckrm-e5,
we're now at ckrm-e12.

-chris



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-30 13:06                   ` Christoph Hellwig
  2004-04-30 13:28                     ` Chris Mason
@ 2004-04-30 15:22                     ` Rik van Riel
  2004-04-30 16:45                       ` Christoph Hellwig
  2004-04-30 17:53                     ` Shailabh
  2 siblings, 1 reply; 31+ messages in thread
From: Rik van Riel @ 2004-04-30 15:22 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Erik Jacobson, Paul Jackson, chrisw, linux-kernel

On Fri, 30 Apr 2004, Christoph Hellwig wrote:

> Again, pagg doesn't even play in that league.  It's really just a tiny
> meachnism to allow a kernel module keep per-process data.  Policies like
> process-groups can be implemented on top of that.

So basically you're arguing that PAGG is better because it
doesn't do what's needed ? ;)

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-30 12:54                 ` Rik van Riel
  2004-04-30 13:06                   ` Christoph Hellwig
@ 2004-04-30 15:59                   ` Chris Wright
  1 sibling, 0 replies; 31+ messages in thread
From: Chris Wright @ 2004-04-30 15:59 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Christoph Hellwig, Erik Jacobson, Paul Jackson, chrisw, linux-kernel

* Rik van Riel (riel@redhat.com) wrote:
> On Fri, 30 Apr 2004, Christoph Hellwig wrote:
> 
> > Still doesn't make a lot of sense.  CKRM is a huge cludgy beast poking
> > everywhere while PAGG is a really small layer to allow kernel modules
> > keeping per-process state.  If CKRM gets merged at all (and the current
> > looks far to horrible and the gains are rather unclear) it should layer
> > ontop of something like PAGG for the functionality covered by it.
> 
> What was the last time you looked at the CKRM source?
> 
> Sure it's a bit bigger than PAGG, but that's also because
> it includes the functionality to change the group a process
> belongs to and other things that don't seem to be included
> in the PAGG patch.

I looked briefly at one of the PAGG modules called job.  It contains the
grouping functionalities.  I suspect that this is something that would
be a common need for all users of a resource grouping mechanism.

thanks,
-chris
-- 
Linux Security Modules     http://lsm.immunix.org     http://lsm.bkbits.net

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-30 15:22                     ` Rik van Riel
@ 2004-04-30 16:45                       ` Christoph Hellwig
  0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2004-04-30 16:45 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Erik Jacobson, Paul Jackson, chrisw, linux-kernel

On Fri, Apr 30, 2004 at 11:22:49AM -0400, Rik van Riel wrote:
> > Again, pagg doesn't even play in that league.  It's really just a tiny
> > meachnism to allow a kernel module keep per-process data.  Policies like
> > process-groups can be implemented on top of that.
> 
> So basically you're arguing that PAGG is better because it
> doesn't do what's needed ? ;)

I told you a bunch of times that's it's a different thing.  Simply keeping
per-process state might be a useful building block for some monster resource
whatever fuckup, but certainly not the other way around.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-30 13:28                     ` Chris Mason
@ 2004-04-30 16:50                       ` Shailabh
  0 siblings, 0 replies; 31+ messages in thread
From: Shailabh @ 2004-04-30 16:50 UTC (permalink / raw)
  To: Chris Mason
  Cc: Christoph Hellwig, Rik van Riel, Erik Jacobson, Paul Jackson,
	chrisw, linux-kernel

Chris Mason wrote:
> On Fri, 2004-04-30 at 09:06, Christoph Hellwig wrote:
> 
>>On Fri, Apr 30, 2004 at 08:54:08AM -0400, Rik van Riel wrote:
>>
>>>What was the last time you looked at the CKRM source?
>>
>>the day before yesterday (the patch in SuSE's tree because there
>>doesn't seem to be any official patch on their website)

That was rectified concommitant with the lkml posting of the patches
for ckrm-E12. Please see the Implementation section of 
http://ckrm.sf.net for all the current patches.
	

>>
> 
> Somewhat unrelated, but the day before yesterday suse was at ckrm-e5,
> we're now at ckrm-e12.

Good point. One of the major changes between ckrm-e5 and ckrm-e12 is a
serious attempt at modularizing and cleaning up the internal 
interfaces which should help allay concerns about it being a big piece 
of code which has to be taken in whole.



 From the view of kernel developers considering merging CKRM into the
kernel, only two components are essential:

core
rcfs

Of course, to do anything useful, you need to have either one of

task_class: groups tasks together
socket_class: groups sockets together

the two are completely independent.

Once a particular grouping is chosen, one can further selectively 
include one or more resource controllers associated with the grouping.
i.e. for task_classes, choose one or more of cpu, mem, io, 
numtasks....; for socket_class, choose one or more of listenaq,<future 
socket based controllers, potentially including outbound network 
control)....


The same kind of flexibility that is available to kernel developers 
for integrating parts of ckrm selectively, will also remain available 
to users, even if all of CKRM is included. So a user could enable just 
task_classes and the cpu controller if s/he doesn't care about memory, 
io or any other kind of control.



-- Shailabh





^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-30 13:06                   ` Christoph Hellwig
  2004-04-30 13:28                     ` Chris Mason
  2004-04-30 15:22                     ` Rik van Riel
@ 2004-04-30 17:53                     ` Shailabh
  2004-04-30 18:15                       ` Chris Wright
  2 siblings, 1 reply; 31+ messages in thread
From: Shailabh @ 2004-04-30 17:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Rik van Riel, Erik Jacobson, Paul Jackson, chrisw, linux-kernel

Christoph Hellwig wrote:
> On Fri, Apr 30, 2004 at 08:54:08AM -0400, Rik van Riel wrote:
> 
>>What was the last time you looked at the CKRM source?
> 
> 
> the day before yesterday (the patch in SuSE's tree because there
> doesn't seem to be any official patch on their website)
> 
> 
>>Sure it's a bit bigger than PAGG, but that's also because
>>it includes the functionality to change the group a process
>>belongs to and other things that don't seem to be included
>>in the PAGG patch.
> 
> 
> Again, pagg doesn't even play in that league.  It's really just a tiny
> meachnism to allow a kernel module keep per-process data.  

Speaking of per-process data, one of the classification engines of 
CKRM called crbce, implemented as a module, allows per-process data to 
be  sent to userland.  crbce in particular, exports data on the delays 
   seen by processes in a) waiting for cpu time after being runnable 
b) page fault service time c) io service time etc. (getting the data 
requires another kernel patch)....so per-process data needs can be met 
through CKRM, though that is not the intent or main objective of the 
project.


> Policies
> like process-groups can be implemented ontop of that.

This is true if one is only interested in data gathering or 
coarse-grain control. One could monitor per-process stats and fiddle 
with each process' rlimits (assuming all the ones needed are 
available) and achieve coarse-grain group control.

But if processes leave and join process groups  rapidly, you need the 
schedulers and the core kernel to be aware of the groupings and 
schedule resources accordingly.

In CKRM, the premise is that the privileged user defines the way 
processes get grouped and could do so in a way that leads to rapid 
changes in group membership. So having group control/monitoring 
policies implemented as an externally loaded module (not talking of 
scheduler modifications as modules, which is a no-no)  is not a 
palatable option.


-- Shailabh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-30 11:08                 ` Guillaume Thouvenin
@ 2004-04-30 18:00                   ` Shailabh
  2004-04-30 18:28                   ` Rik van Riel
  1 sibling, 0 replies; 31+ messages in thread
From: Shailabh @ 2004-04-30 18:00 UTC (permalink / raw)
  To: Guillaume Thouvenin
  Cc: Christoph Hellwig, Rik van Riel, Erik Jacobson, Paul Jackson,
	chrisw, linux-kernel

Guillaume Thouvenin wrote:
> Selon Christoph Hellwig <hch@infradead.org>:
> 
> 
>>>I suspect there's a rather good chance of merging a common
>>>PAGG/CKRM infrastructure, since they pretty much do the same
>>>thing at the core and they both have different functionality
>>>implemented on top of the core process grouping.
>>
>>Still doesn't make a lot of sense.  CKRM is a huge cludgy beast poking
>>everywhere while PAGG is a really small layer to allow kernel modules
>>keeping per-process state.  If CKRM gets merged at all (and the current
>>looks far to horrible and the gains are rather unclear) it should layer
>>ontop of something like PAGG for the functionality covered by it.
> 
> 
> And what about put the management of containers outside the kernel. We could for
> exemple use a program that will listen file /proc/acct_event and execute a
> programs to handle the event like ACPID does. Of course it will need some kernel
> modifications but those modifications will be small as process aggregation will
> be done outside the kernel. We could also use relayfs to exchange datas between
> user program and the kernel.
> 
> Guillaume


Guillaume,

As mentioned in my response to Christoph, keeping process aggregation 
outside the kernel (or as a module that sits atop process-centric 
patches) will work only for statistics gathering and coarse-grain 
control.

CKRM's crbce controller (will be put up on http://ckrm.sf.net within a 
day...) uses relayfs to send per-process data to a privileged user 
program (will also be included) that can use the data as it pleases, 
including doing aggregation.

We think a class-aware kernel is the right way to go and it can be 
done with sufficiently low impact that one doesn't have to 
unnecessarily limit the flexibility of users in defining process 
groups (=classes) or the time periods over which shares can be enforced.

-- Shailabh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-30 17:53                     ` Shailabh
@ 2004-04-30 18:15                       ` Chris Wright
  0 siblings, 0 replies; 31+ messages in thread
From: Chris Wright @ 2004-04-30 18:15 UTC (permalink / raw)
  To: Shailabh
  Cc: Christoph Hellwig, Rik van Riel, Erik Jacobson, Paul Jackson,
	chrisw, linux-kernel

* Shailabh (nagar@watson.ibm.com) wrote:
> In CKRM, the premise is that the privileged user defines the way 
> processes get grouped and could do so in a way that leads to rapid 
> changes in group membership. So having group control/monitoring 
> policies implemented as an externally loaded module (not talking of 
> scheduler modifications as modules, which is a no-no)  is not a 
> palatable option.

Yes, this is why I looked at the PAGG job module.  I was looking for how
it might have mucked (externally) with scheduler.  At any rate, I found
all the primitives for joining/leaving/defining groups here which I'd
have expected closer to core.

thanks,
-chris
-- 
Linux Security Modules     http://lsm.immunix.org     http://lsm.bkbits.net

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-30 11:08                 ` Guillaume Thouvenin
  2004-04-30 18:00                   ` Shailabh
@ 2004-04-30 18:28                   ` Rik van Riel
  1 sibling, 0 replies; 31+ messages in thread
From: Rik van Riel @ 2004-04-30 18:28 UTC (permalink / raw)
  To: Guillaume Thouvenin
  Cc: Christoph Hellwig, Erik Jacobson, Paul Jackson, chrisw, linux-kernel

On Fri, 30 Apr 2004, Guillaume Thouvenin wrote:

> And what about put the management of containers outside the kernel.

User Mode Linux would be an option indeed, if the overhead
was low enough for general use.  It's quite possible that
this can be done...

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel
  2004-04-26 22:04 [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel Erik Jacobson
  2004-04-26 23:39 ` Chris Wright
  2004-04-30  8:54 ` Guillaume Thouvenin
@ 2004-05-20 21:16 ` Erik Jacobson
  2 siblings, 0 replies; 31+ messages in thread
From: Erik Jacobson @ 2004-05-20 21:16 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 336 bytes --]

Attached is a new PAGG patch for the 2.6.6 kernel.  I'm just inserting it
here in the old discussion thread for now.

Note that this patch implements much of the feedback received.  However, there
are still some pieces of feedback we need to implement yet.

--
Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota

[-- Attachment #2: Type: TEXT/PLAIN, Size: 33933 bytes --]

diff -Naru 2.6-patch/Documentation/pagg.txt 2.6pagg-patch/Documentation/pagg.txt
--- 2.6-patch/Documentation/pagg.txt	1969-12-31 18:00:00.000000000 -0600
+++ 2.6pagg-patch/Documentation/pagg.txt	2004-04-30 16:57:38.000000000 -0500
@@ -0,0 +1,157 @@
+Linux Process Aggregates (PAGG)
+-------------------------------
+
+1. Description
+
+The process aggregates infrastructure, or PAGG, provides a generalized
+mechanism for providing arbitrary process groups in Linux.  PAGG consists
+of a series of functions for registering and unregistering support
+for new types of process aggregation containers with the kernel.
+This is similar to the support currently provided within Linux that
+allows for dynamic support of filesystems, block and character devices,
+symbol tables, network devices, serial devices, and execution domains.
+This implementation of PAGG provides developers the basic hooks necessary
+to implement kernel modules for specific process containers, such as
+the job container.
+
+The do_fork function in the kernel was altered to support PAGG.  If a
+process is attached to any PAGG containers and subsequently forks a
+child process, the child process will also be attached to the same PAGG
+containers.  The PAGG containers involved during the fork are notified
+that a new process has been attached.  The notification is accomplished
+via a callback function provided by the PAGG module.
+
+The do_exit function in the kernel has also been altered.  If a process
+is attached to any PAGG containers and that process is exiting, the PAGG
+containers are notified that a process has detached from the container.
+The notification is accomplished via a callback function provided by
+the PAGG module.
+
+The sys_execve function has been modified to support an optional callout
+that can be run when a process in a pagg list does an exec.  It can be 
+used, for example, by other kernel modules that wish to do advanced CPU
+placement on multi-processor systems (just one example).
+
+Additional details concerning this implementation of the process aggregates
+infrastructure are described in the sections that follow.
+
+
+2.  Kernel Changes
+
+This section describe the files and data strcutrues that are involved in this
+implementation of PAGG.  Both modified as well as new files and data
+structures are discussed.
+
+2.1. Modified Files
+
+The following files were modified to implement PAGG:
+
+-  include/linux/init_task.h
+-  include/linux/sched.h
+-  kernel/Makefile
+-  kernel/exit.c
+-  kernel/fork.c
+-  fs/exec.c
+-  init/Kconfig
+
+2.2. New Files
+
+The following files were added to implement PAGG:
+
+-  Documentation/pagg.txt
+-  include/linux/pagg.h
+-  kernel/pagg.c
+
+
+2.3. Modified Data Structures
+
+The following existing data structures were altered to implement PAGG.
+
+-  struct task_struct:          (include/linux/sched.h)
+     struct pagg_list  pagg_list;     /* List of pagg containers */
+
+This new member in task_struct, pagg_list, points to the list of pagg
+containers to which the process is currently attached.
+
+2.4. New Data Structures
+
+The following new data structures were introduced to implement PAGG.
+
+-  struct pagg:          (include/linux/pagg.h)
+     struct pagg_hook *hook		     /* Ptr to pagg module entry */
+     void 		*data;               /* Task specific data */
+     struct list_head   entry;	   	     /* List connection */	
+     
+-  struct pagg_hook:        (include/linux/pagg.h)
+     struct module *module;                  /* Ptr to PAGG module */
+     char *name;                             /* PAGG hook name - restricted
+					      * to 32 characters.  */
+     int  (*attach)(struct task_struct *, /* Function to attach */
+               struct pagg *,
+               void *);
+     int  (*detach)(struct task_struct *, /* Function to detach */
+               struct pagg *);
+     int  (*init)(struct task_struct *,   /* Load task init func. */
+		     struct pagg *);
+     void  *data;                            /* Module specific data */
+     struct list_head entry;		     /* List connection */
+     void    (*exec)(struct task_struct *, struct pagg *); /* exec func ptr */
+
+The pagg structure provides the process' reference to the PAGG
+containers provided by the PAGG modules.  The attach function pointer
+is the function used to notify the referenced PAGG container that the
+process is being attached.  The detach function pointer is used to notify
+the referenced PAGG container that the process is exiting or otherwise
+detaching from the container.  The exec function pointer is used when a
+process in the pagg container exec's a new process.  This is optional and
+may be set to NULL if it is not needed by the pagg module.
+
+The pagg_hook structure provides the reference to the module that
+implements a type of PAGG container.  In addition to the function pointers
+described concerning pagg, this structure provides an addition
+function pointer.  The init function pointer is currently not used
+but will be available in the future.  Future use of the init function
+will be optional and will used to attach currently running processes to
+a default PAGG container when a PAGG module is loaded on a running system.
+
+
+2.5. Modified Functions
+
+The following functions were changed to implement PAGG:
+
+-  do_fork:     (kernel/fork.c)
+     /* execute the following pseudocode before add to run-queue  */  
+     If parent process pagg list is not empty
+          Call attach_pagg_list function with child task_struct as argument
+-  do_exit:     (kernel/exit.c)
+     /* execute the following pseudocode prior to schedule call */
+     If current process pagg list is not empty
+               Call detach_pagg_list function with current task_struct 
+-  sys_execve:  (fs/exec.c)
+     /* When a process in a pagg exec's, an optional callout can be run.  This
+        is implemented with an optional function pointer in the pagg_hook.  */
+
+2.6 New Functions
+
+The following new functions were added to implement PAGG:
+
+-  int  register_pagg_hook(struct pagg_hook *);  (kernel/pagg.c)
+     Add module entry into table of pagg modules
+-  int unregister_pagg_hook(struct pagg_hook *); (kernel/pagg.c)
+     Find module entry in list of pagg modules
+          Foreach task
+		If task is attached to this pagg module
+			return error
+	  If no tasks are referencing this module
+		remove module entry from list of pagg modules
+-  int attach_pagg_list(struct task_struct *);       (kernel/pagg.c)
+     /* Assumed task pagg list pts to paggs that it attaches to */
+     While another pagg container reference
+          Make copy of pagg container reference & insert into new list
+          Attach task to pagg container using new container reference
+          Get next pagg container reference
+     Make task pagg list use the new pagg list
+-  int detach_pagg_list(struct task_struct *);       (kernel/pagg.c)
+     While another pagg container reference
+          Detach task from pagg container using reference
+
diff -Naru 2.6-patch/fs/exec.c 2.6pagg-patch/fs/exec.c
--- 2.6-patch/fs/exec.c	2004-05-10 12:06:11.000000000 -0500
+++ 2.6pagg-patch/fs/exec.c	2004-05-20 14:46:49.000000000 -0500
@@ -46,7 +46,7 @@
 #include <linux/security.h>
 #include <linux/syscalls.h>
 #include <linux/rmap.h>
-
+#include <linux/pagg.h>
 #include <asm/uaccess.h>
 #include <asm/pgalloc.h>
 #include <asm/mmu_context.h>
@@ -1142,6 +1142,7 @@
 	retval = search_binary_handler(&bprm,regs);
 	if (retval >= 0) {
 		free_arg_pages(&bprm);
+		pagg_exec(current);
 
 		/* execve success */
 		security_bprm_free(&bprm);
diff -Naru 2.6-patch/include/linux/init_task.h 2.6pagg-patch/include/linux/init_task.h
--- 2.6-patch/include/linux/init_task.h	2004-05-10 12:06:11.000000000 -0500
+++ 2.6pagg-patch/include/linux/init_task.h	2004-05-20 14:46:49.000000000 -0500
@@ -2,6 +2,7 @@
 #define _LINUX__INIT_TASK_H
 
 #include <linux/file.h>
+#include <linux/pagg.h>
 
 #define INIT_FILES \
 { 							\
@@ -112,6 +113,7 @@
 	.proc_lock	= SPIN_LOCK_UNLOCKED,				\
 	.switch_lock	= SPIN_LOCK_UNLOCKED,				\
 	.journal_info	= NULL,						\
+	INIT_TASK_PAGG(tsk)						\
 }
 
 
diff -Naru 2.6-patch/include/linux/pagg.h 2.6pagg-patch/include/linux/pagg.h
--- 2.6-patch/include/linux/pagg.h	1969-12-31 18:00:00.000000000 -0600
+++ 2.6pagg-patch/include/linux/pagg.h	2004-04-30 14:55:51.000000000 -0500
@@ -0,0 +1,246 @@
+/* 
+ * PAGG (Process Aggregates) interface
+ *
+ * 
+ * Copyright (c) 2000-2002, 2004 Silicon Graphics, Inc.  All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ *
+ * Contact information:  Silicon Graphics, Inc., 1500 Crittenden Lane,
+ * Mountain View, CA  94043, or:
+ * 
+ * http://www.sgi.com 
+ * 
+ * For further information regarding this notice, see: 
+ * 
+ * http://oss.sgi.com/projects/GenInfo/NoticeExplan
+ */
+
+/*
+ * Description:	This file, include/linux/pagg.h, contains the data
+ *              structure definitions and function prototypes used to
+ *              implement process aggregates (paggs). Paggs provides a
+ *              generalized way to implement process groupings or
+ *              containers.  Modules use these functions to register
+ *              with the kernel as providers of process aggregation
+ *              containers. The pagg data structures define the
+ *              callback functions and data access pointers back into
+ *              the pagg modules.
+ */
+
+#ifndef _LINUX_PAGG_H
+#define _LINUX_PAGG_H
+
+#include <linux/config.h>
+
+/*
+ * Used by task_struct to manage a list of pagg attachments for the task.
+ * The list will be used to hold references to pagg structures.  
+ * These structures define the pagg attachments for the task.  
+ *
+ * STRUCT MEMBERS:
+ * 	list:		The list head pointer for the list of pagg
+ * 			structures.
+ * 	sem:		The semaphore used  to lock the list.
+ */
+struct pagg_list {
+	struct list_head	head;	
+	struct rw_semaphore	sem;
+};
+
+#ifdef CONFIG_PAGG
+
+#define PAGG_NAMELN	32		/* Max chars in PAGG module name */
+
+
+/* Macro used to initialize a pagg_list structure after declaration 
+ *
+ * Macro arguments:
+ * 	l:	the pagg list (struct pagg_list)
+ */
+#define INIT_PAGG_LIST(_l)						\
+do {									\
+	INIT_LIST_HEAD(_l.head);					\
+	init_rwsem(_l.sem);						\
+} while(0)
+	
+
+/*
+ * Used by task_struct to manage list of pagg attachments for the process.  
+ * Each pagg provides the link between the process and the 
+ * correct pagg container.
+ *
+ * STRUCT MEMBERS:
+ *     hook:	Reference to pagg module structure.  That struct
+ *     		holds the name key and function pointers.
+ *     data:	Opaque data pointer - defined by pagg modules.
+ *     entry:	List pointers
+ */
+struct pagg {
+       struct pagg_hook	*hook;
+       void		*data;
+       struct list_head	entry;
+};
+
+/*
+ * Used by pagg modules to define the callback functions into the 
+ * module.
+ *
+ * STRUCT MEMBERS:
+ *     name:           The name of the pagg container type provided by
+ *                     the module. This will be set by the pagg module.
+ *     attach:         Function pointer to function used when attaching
+ *                     a process to the pagg container referenced by 
+ *                     this struct.
+ *     detach:         Function pointer to function used when detaching
+ *                     a process to the pagg container referenced by 
+ *                     this struct.
+ *     init:           Function pointer to initialization function.  This
+ *                     function is used when the module is loaded to attach
+ *                     existing processes to a default container as defined by
+ *                     the pagg module. This is optional and may be set to 
+ *                     NULL if it is not needed by the pagg module.
+ *     data:           Opaque data pointer - defined by pagg modules.
+ *     module:         Pointer to kernel module struct.  Used to increment & 
+ *                     decrement the use count for the module.
+ *     entry:	       List pointers
+ *     exec:           Function pointer to function used when a process
+ *                     in the pagg container exec's a new process. This
+ *                     is optional and may be set to NULL if it is not 
+ *                     needed by the pagg module.
+ */
+struct pagg_hook {
+       struct module	*module;
+       char		*name;	/* Name Key - restricted to 32 characters */
+       void		*data;	/* Opaque module specific data */
+       struct list_head	entry;	/* List pointers */
+       int		(*init)(struct task_struct *, struct pagg *);
+       int		(*attach)(struct task_struct *, struct pagg *, void*);
+       int		(*detach)(struct task_struct *, struct pagg *);
+       void		(*exec)(struct task_struct *, struct pagg *);
+};
+
+
+/* Kernel service functions for providing PAGG support */
+extern struct pagg *pagg_get(struct task_struct *task, char *key);
+extern struct pagg *pagg_alloc(struct task_struct *task, 
+			       struct pagg_hook *pt);
+extern void pagg_free(struct pagg *pagg);
+extern int pagg_hook_register(struct pagg_hook *pt_new);
+extern int pagg_hook_unregister(struct pagg_hook *pt_old);
+extern int __pagg_attach(struct task_struct *to_task, 
+			 struct task_struct *from_task);
+extern int __pagg_detach(struct task_struct *task);
+extern int __pagg_exec(struct task_struct *task);
+
+/* macros used when a child process must inherit attachment to pagg
+ * containers from the parent.
+ * 
+ * Using static inline would result in not fully defined type errors.
+ * bad_fork_cleanup is defined in fork.c.
+ */
+
+#define pagg_attach(child, parent)					\
+do {									\
+	INIT_PAGG_LIST(&child->pagg_list);				\
+	if (!list_empty(&parent->pagg_list.head))			\
+		if (__pagg_attach(child, parent) != 0)			\
+			goto bad_fork_cleanup;				\
+} while(0)
+
+/* 
+ * macro used when a process must detach from pagg containers to which it
+ * is currenlty a member.
+ *
+ */
+#define pagg_detach(task)						\
+do {									\
+	if (!list_empty(&task->pagg_list.head))				\
+		__pagg_detach(task);					\
+} while(0)
+
+/* 
+ * macro used when a process exec's.
+ *
+ */
+#define pagg_exec(task)							\
+do {									\
+	if (!list_empty(&task->pagg_list.head))				\
+		__pagg_exec(task);					\
+} while(0)
+
+/* The static inlines commented out for now with the ifdef below */
+#ifdef NOTDEFINED
+
+/* function used when a child process must inherit attachment to pagg
+ * containers from the parent.
+ */
+static inline int pagg_attach(struct task_struct *child, 
+			      struct task_struct *parent)
+{
+	INIT_PAGG_LIST(&child->pagg_list);
+	if (!list_empty(&parent->pagg_list.head))
+		return __pagg_attach(child, parent);
+	return 0;
+}
+
+/* 
+ * Function used when a process must detach from pagg containers to which it
+ * is currenlty a member.
+ *
+ */
+static inline void pagg_detach(struct task_struct *task)
+{
+	if (!list_empty(&task->pagg_list.head))
+		__pagg_detach(task);
+}
+
+/* 
+ * function used when a process exec's.
+ *
+ */
+static inline void pagg_exec(struct task_struct *task)
+{
+	if (!list_empty(&task->pagg_list.head))
+		__pagg_exec(task);
+}
+#endif /* NOT-DEFINED just comment out the block above for now */
+
+/*
+ * Marco Used in INIT_TASK to set the head and sem of pagg_list.
+ * If CONFIG_PAGG is off, it is defined as an empty macro below.
+ */
+#define INIT_TASK_PAGG(tsk) \
+	.pagg_list  = {                  \
+	.head = LIST_HEAD_INIT(tsk.pagg_list.head),     \
+	.sem  = __RWSEM_INITIALIZER(tsk.pagg_list.sem)  \
+	}, \
+
+#else  /* CONFIG_PAGG */
+
+/* 
+ * Replacement macros used when PAGG (Process Aggregates) support is not
+ * compiled into the kernel.
+ */
+#define INIT_TASK_PAGG(tsk)
+#define INIT_PAGG_LIST(l) do { } while(0)
+#define pagg_attach(ct, pt)  do { } while(0)
+#define pagg_detach(t)  do {  } while(0)     
+#define pagg_exec(t)  do {  } while(0)     
+
+#endif /* CONFIG_PAGG */
+
+#endif /* _LINUX_PAGG_H */
diff -Naru 2.6-patch/include/linux/sched.h 2.6pagg-patch/include/linux/sched.h
--- 2.6-patch/include/linux/sched.h	2004-05-10 12:06:11.000000000 -0500
+++ 2.6pagg-patch/include/linux/sched.h	2004-05-20 14:46:49.000000000 -0500
@@ -29,6 +29,7 @@
 #include <linux/completion.h>
 #include <linux/pid.h>
 #include <linux/percpu.h>
+#include <linux/pagg.h>
 
 struct exec_domain;
 
@@ -499,11 +500,15 @@
 
 	struct dentry *proc_dentry;
 	struct backing_dev_info *backing_dev_info;
-
 	struct io_context *io_context;
 
 	unsigned long ptrace_message;
 	siginfo_t *last_siginfo; /* For ptrace use.  */
+
+#ifdef CONFIG_PAGG
+/* List of pagg (process aggregate) attachments */
+	struct pagg_list pagg_list;
+#endif
 };
 
 static inline pid_t process_group(struct task_struct *tsk)
diff -Naru 2.6-patch/init/Kconfig 2.6pagg-patch/init/Kconfig
--- 2.6-patch/init/Kconfig	2004-05-10 12:06:11.000000000 -0500
+++ 2.6pagg-patch/init/Kconfig	2004-05-20 14:46:49.000000000 -0500
@@ -121,6 +121,14 @@
 	  up to the user level program to do useful things with this
 	  information.  This is generally a good idea, so say Y.
 
+config PAGG
+	bool "Support for process aggregates (PAGGs)"
+	help
+     Say Y here if you will be loading modules which provide support
+     for process aggregate containers.  Examples of such modules include the
+     Linux Jobs module and the Linux Array Sessions module.  If you will not 
+     be using such modules, say N.
+
 config SYSCTL
 	bool "Sysctl support"
 	---help---
diff -Naru 2.6-patch/kernel/exit.c 2.6pagg-patch/kernel/exit.c
--- 2.6-patch/kernel/exit.c	2004-05-10 12:06:11.000000000 -0500
+++ 2.6pagg-patch/kernel/exit.c	2004-05-20 14:46:49.000000000 -0500
@@ -22,7 +22,7 @@
 #include <linux/profile.h>
 #include <linux/mount.h>
 #include <linux/proc_fs.h>
-
+#include <linux/pagg.h>
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
 #include <asm/mmu_context.h>
@@ -799,6 +799,9 @@
 		module_put(tsk->binfmt->module);
 
 	tsk->exit_code = code;
+
+	pagg_detach(tsk);
+
 	exit_notify(tsk);
 	schedule();
 	BUG();
diff -Naru 2.6-patch/kernel/fork.c 2.6pagg-patch/kernel/fork.c
--- 2.6-patch/kernel/fork.c	2004-05-10 12:06:11.000000000 -0500
+++ 2.6pagg-patch/kernel/fork.c	2004-05-20 14:46:49.000000000 -0500
@@ -33,7 +33,7 @@
 #include <linux/ptrace.h>
 #include <linux/mount.h>
 #include <linux/audit.h>
-
+#include <linux/pagg.h>
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
 #include <asm/uaccess.h>
@@ -236,6 +236,9 @@
 
 	init_task.rlim[RLIMIT_NPROC].rlim_cur = max_threads/2;
 	init_task.rlim[RLIMIT_NPROC].rlim_max = max_threads/2;
+
+	/* Initialize the pagg list in pid 0 before it can clone itself. */
+	INIT_PAGG_LIST(&current->pagg_list);
 }
 
 static struct task_struct *dup_task_struct(struct task_struct *orig)
@@ -995,6 +998,12 @@
 	   
 	p->parent_exec_id = p->self_exec_id;
 
+	/*
+	 * call pagg modules to properly attach new process to the same
+	 * process aggregate containers as the parent process.
+	 */
+	pagg_attach(p, current);
+
 	/* ok, now we should be set up.. */
 	p->exit_signal = (clone_flags & CLONE_THREAD) ? -1 : (clone_flags & CSIGNAL);
 	p->pdeath_signal = 0;
diff -Naru 2.6-patch/kernel/Makefile 2.6pagg-patch/kernel/Makefile
--- 2.6-patch/kernel/Makefile	2004-05-10 12:06:11.000000000 -0500
+++ 2.6pagg-patch/kernel/Makefile	2004-05-20 14:46:49.000000000 -0500
@@ -18,6 +18,7 @@
 obj-$(CONFIG_PM) += power/
 obj-$(CONFIG_BSD_PROCESS_ACCT) += acct.o
 obj-$(CONFIG_COMPAT) += compat.o
+obj-$(CONFIG_PAGG) += pagg.o
 obj-$(CONFIG_IKCONFIG) += configs.o
 obj-$(CONFIG_IKCONFIG_PROC) += configs.o
 obj-$(CONFIG_STOP_MACHINE) += stop_machine.o
diff -Naru 2.6-patch/kernel/pagg.c 2.6pagg-patch/kernel/pagg.c
--- 2.6-patch/kernel/pagg.c	1969-12-31 18:00:00.000000000 -0600
+++ 2.6pagg-patch/kernel/pagg.c	2004-05-20 13:36:37.000000000 -0500
@@ -0,0 +1,406 @@
+/* 
+ * PAGG (Process Aggregates) interface
+ *
+ * 
+ * Copyright (c) 2000-2004 Silicon Graphics, Inc.  All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ *
+ * Contact information:  Silicon Graphics, Inc., 1500 Crittenden Lane,
+ * Mountain View, CA  94043, or:
+ * 
+ * http://www.sgi.com 
+ * 
+ * For further information regarding this notice, see: 
+ * 
+ * http://oss.sgi.com/projects/GenInfo/NoticeExplan
+ */
+
+/*
+ * Description:  This file, kernel/pagg.c, contains the routines used
+ *               to implement process aggregates (paggs).  The pagg
+ *               extends the task_struct to allow for various process
+ *               aggregation continers.  Examples of such containers
+ *               include "jobs" and cluster applications IDs.  Process
+ *               sessions and groups could have been implemented using
+ *               paggs (although there would be little purpose in
+ *               making that change at this juncture).  The pagg
+ *               structure maintains pointers to callback functions and
+ *               data strucures maintained in modules that have
+ *               registered with the kernel as pagg container
+ *               providers.
+ */
+
+#include <linux/config.h>
+
+#include <linux/slab.h>
+#include <linux/sched.h>
+#include <linux/module.h>
+#include <linux/pagg.h>
+#include <asm/semaphore.h>
+
+/* list of pagg hook entries that reference the "module" implementations */
+static LIST_HEAD(pagg_hook_list);
+static DECLARE_RWSEM(pagg_hook_list_sem);
+
+
+/* 
+ * pagg_get
+ *
+ * Given a pagg_list list structure, this function will return
+ * a pointer to the pagg struct that matches the search
+ * key.  If the key is not found, the function will return NULL.
+ *
+ * The caller should hold at least a read lock on the pagg_list
+ * for task using down_read(&task->pagg_list.sem).
+ */
+struct pagg *
+pagg_get(struct task_struct *task, char *key)
+{
+	struct pagg *pagg;
+
+	list_for_each_entry(pagg, &task->pagg_list.head, entry) {
+		if (!strcmp(pagg->hook->name,key))
+			return pagg;
+	}
+	return NULL;
+}
+
+
+/*
+ * pagg_alloc
+ *
+ * Given a task and a pagg hook, this function will allocate
+ * a new pagg structure, initialize the settings, and insert the pagg into
+ * the pagg_list for the task.
+ *
+ * The caller for this function should hold at least a read lock on the
+ * pagg_hook_list_sem - or ensure that the pagg hook entry cannot be 
+ * removed. If this function was called from the pagg module (usually the
+ * case), then the caller need not hold this lock. The caller should hold 
+ * a write lock on for the tasks pagg_list.sem.  This can be locked using 
+ * down_write(&task->pagg_list.sem)
+ */
+struct pagg *
+pagg_alloc(struct task_struct *task, struct pagg_hook *pagg_hook)
+{
+	struct pagg *pagg;
+
+	pagg = kmalloc(sizeof(struct pagg), GFP_KERNEL);
+	if (!pagg)
+		return NULL;
+
+	pagg->hook = pagg_hook;
+	pagg->data = NULL;
+	list_add_tail(&pagg->entry, &task->pagg_list.head);
+	return pagg;
+}
+
+
+/*
+ * pagg_free
+ *
+ * This function will ensure the pagg is deleted form 
+ * the list of pagg entries for the task. Finally, the memory for the 
+ * pagg is discarded.
+ *
+ * The caller of this function should hold a write lock on the pagg_list.sem
+ * for the task. This can be locked using down_write(&task->pagg_list.sem).
+ *
+ * Prior to calling pagg_free, the pagg should have been detached from the
+ * pagg container represented by this pagg.  That is usually done using
+ * p->hook->detach(task, pagg);
+ */
+void
+pagg_free(struct pagg *pagg) 
+{
+	list_del(&pagg->entry);
+	kfree(pagg);
+}
+
+
+/*
+ * get_pagg_hook
+ *
+ * Given a pagg hook name key, this function will return a pointer
+ * to the pagg_hook struct that contains that matches the name.
+ * 
+ * You should hold either the write or read lock for pagg_hook_list_sem
+ * before using this function.  This will ensure that the pagg_hook_list
+ * does not change while iterating through the list entries.
+ */
+static struct pagg_hook *
+get_pagg_hook(char *key)
+{
+	struct pagg_hook *pagg_hook;
+
+	list_for_each_entry(pagg_hook, &pagg_hook_list, entry) {
+		if (!strcmp(pagg_hook->name, key)) {
+			return pagg_hook;
+		}
+	}
+	return NULL;
+}
+
+
+/*
+ * pagg_hook_register
+ *
+ * Used to register a new pagg hook and enter it into the pagg_hook_list.
+ * The service name for a pagg hook is restricted to 32 characters.
+ *
+ * In the future an initialization function may also be defined so that all
+ * existing tasks can be assigned to a default pagg entry for the hook.
+ * However, this would require iterating through the tasklist.  To do that
+ * requires that the tasklist_lock be read locked.  Since the initialization
+ * function might be in a module, and therefore it might sleep (implementors
+ * decision), holding the tasklist_lock seems like a bad idea. It may be a
+ * requirement that the initialization function will be strictly forbidden
+ * from locking - by gentlemans agreement... 
+ *
+ * If a memory error is encountered, the pagg hook is unregistered and any
+ * tasks that have been attached to the initial pagg container are detached
+ * from that container.
+ */
+int
+pagg_hook_register(struct pagg_hook *pagg_hook_new)
+{
+	struct pagg_hook *pagg_hook = NULL;
+
+	/* ADD NEW PAGG MODULE TO ACCESS LIST */
+	if (!pagg_hook_new)
+		return -EINVAL;			/* error */
+	if (!list_empty(&pagg_hook_new->entry))
+		return -EINVAL;			/* error */
+	if (pagg_hook_new->name == NULL || strlen(pagg_hook_new->name) > PAGG_NAMELN) 
+		return -EINVAL;			/* error */
+
+	/* Try to insert new hook entry into the pagg hook list */
+	down_write(&pagg_hook_list_sem);
+
+	pagg_hook = get_pagg_hook(pagg_hook_new->name);
+
+	if (pagg_hook) {
+		up_write(&pagg_hook_list_sem);
+		printk(KERN_WARNING "Attempt to register duplicate"
+				" PAGG support (name=%s)\n", pagg_hook_new->name);
+		return -EBUSY;
+	}
+
+	/* Okay, we can insert into the pagg hook list */
+	list_add_tail(&pagg_hook_new->entry, &pagg_hook_list);
+	up_write(&pagg_hook_list_sem);
+
+	printk(KERN_INFO "Registering PAGG support for (name=%s)\n",
+			pagg_hook_new->name);
+
+	return 0;					/* success */
+
+}
+
+
+/*
+ * pagg_hook_unregister
+ *
+ * Used to unregister pagg hooks and remove them from the pagg_hook_list.
+ * Once the pagg hook entry in the pagg_hook_list is found, all of the
+ * tasks are scanned and detached from any pagg containers defined by the
+ * pagg implementation module.
+ */
+int
+pagg_hook_unregister(struct pagg_hook *pagg_hook_old)
+{
+	struct pagg_hook *pagg_hook;
+	struct task_struct *task;
+
+
+	/* Check the validity of the arguments */
+	if (!pagg_hook_old)
+		return -EINVAL;			/* error */
+	if (list_empty(&pagg_hook_old->entry))
+		return -EINVAL;			/* error */
+	if (pagg_hook_old->name == NULL)
+		return -EINVAL;			/* error */
+
+	down_write(&pagg_hook_list_sem);
+
+	pagg_hook = get_pagg_hook(pagg_hook_old->name);
+	if (pagg_hook && pagg_hook == pagg_hook_old) {
+		/* 
+		 * Scan through processes on system and check for  
+		 * references to pagg containers for this pagg hook.
+		 * 
+		 * The module cannot be unloaded if there are references.
+		 */
+		read_lock(&tasklist_lock);
+		for_each_process(task) {
+			struct pagg *pagg = NULL;
+
+			get_task_struct(task); /* So the task doesn't vanish on us */
+			read_unlock(&tasklist_lock);
+			down_read(&task->pagg_list.sem); /* lock the pagg list */
+			pagg = pagg_get(task, pagg_hook_old->name);
+			put_task_struct(task);
+			/* 
+			 * We won't be accessing pagg's memory, just need
+			 * to see if one existed - so we can release the task
+			 * lock now.
+			 */
+			up_read(&task->pagg_list.sem); /* unlock the pagg list */
+			if (pagg) {
+				up_write(&pagg_hook_list_sem);
+				return -EBUSY;
+			}
+
+			/* lock the task list again so we get a valid task in the loop */
+			read_lock(&tasklist_lock);
+		}
+		read_unlock(&tasklist_lock);
+		list_del_init(&pagg_hook->entry);
+		up_write(&pagg_hook_list_sem);
+
+		printk(KERN_INFO "Unregistering PAGG support for"
+				" (name=%s)\n", pagg_hook_old->name);
+
+		return 0;			/* success */
+	}
+
+	up_write(&pagg_hook_list_sem);
+
+	printk(KERN_WARNING "Attempt to unregister PAGG support (name=%s)"
+			" failed - not found\n", pagg_hook_old->name);
+	
+	return -EINVAL;				/* error */
+}
+
+
+/*
+ * __pagg_attach
+ *
+ * Used to attach a new task to the same pagg containers to which it's parent
+ * is attached.
+ *
+ * The "from" argument is the parent task.  The "to" argument is the child
+ * task. 
+ *
+ */
+int __pagg_attach(struct task_struct *to_task, struct task_struct *from_task)
+{
+	struct list_head   *entry;
+	int  		   retcode = 0;
+
+	/* lock the parents pagg_list we are copying from */
+	down_read(&from_task->pagg_list.sem); /* read lock the pagg list */
+
+	list_for_each(entry, &from_task->pagg_list.head) {
+		struct pagg *to_pagg = NULL;
+		struct pagg *from_pagg = list_entry(entry, struct pagg, 
+							entry);
+		to_pagg = pagg_alloc(to_task, from_pagg->hook);
+		if (!to_pagg) {
+			retcode = -ENOMEM;
+			goto error_return;
+		}
+		retcode = to_pagg->hook->attach(to_task, to_pagg, from_pagg->data);
+		if (retcode != 0) {
+			/* attach should issue error message */
+			goto error_return;
+		}
+	}
+
+	up_read(&from_task->pagg_list.sem); /* unlock the pagg list */
+
+	return 0;					/* success */
+
+  error_return:
+	/* 
+	 * Clean up all the pagg attachments made on behalf of the new
+	 * task.  Set new task pagg ptr to NULL for return.
+	 */
+	up_read(&from_task->pagg_list.sem); /* unlock the pagg list */
+	__pagg_detach(to_task);
+	return retcode;				/* failure */
+}
+
+/*
+ * __pagg_detach
+ *
+ * Used to detach a task from all pagg containers to which it is attached.
+ * 
+ * list_for_each used here because we need to reset the list after a 
+ * pagg is detached.
+ */
+int
+__pagg_detach(struct task_struct *task)
+{
+	struct list_head   *entry;
+	int retcode = 0;
+	int rettmp = 0;
+
+	/* Remove ref. to paggs from task immediately */
+	down_write(&task->pagg_list.sem); /* write lock pagg list */
+
+	list_for_each(entry, &task->pagg_list.head) {
+		int rettemp = 0;
+		struct pagg *pagg = list_entry(entry, struct pagg, entry);
+                                                                                
+		entry = &task->pagg_list.head;
+                                                                                
+		rettemp = pagg->hook->detach(task, pagg);
+		if (rettmp) {
+			/* an error message should be logged in free_pagg */
+			retcode = rettmp;
+		}
+		pagg_free(pagg);
+	}
+                                                                                
+	up_write(&task->pagg_list.sem); /* write unlock the pagg list */
+                                                                                
+	return retcode;   /* 0 = success, else return last code for failure */
+}
+
+
+/*
+ * __pagg_exec
+ *
+ * Used to when a process that is in a pagg container does an exec.
+ *
+ * The "from" argument is the task.  The "name" argument is the name
+ * of the process being exec'ed.
+ *
+ */
+int __pagg_exec(struct task_struct *task) 
+{
+	struct pagg	*pagg;
+
+	/* lock the parents pagg_list we are copying from */
+	down_read(&task->pagg_list.sem); /* lock the pagg list */
+
+	list_for_each_entry(pagg, &task->pagg_list.head, entry) {
+		if (pagg->hook->exec) /* conditional because it's optional */
+			pagg->hook->exec(task, pagg);
+	}
+
+	up_read(&task->pagg_list.sem); /* unlock the pagg list */
+	return 0;
+}
+
+
+EXPORT_SYMBOL(pagg_get);
+EXPORT_SYMBOL(pagg_alloc);
+EXPORT_SYMBOL(pagg_free);
+EXPORT_SYMBOL(pagg_hook_register);
+EXPORT_SYMBOL(pagg_hook_unregister);

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2004-05-20 21:17 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-04-26 22:04 [PATCH] Process Aggregates (PAGG) support for the 2.6 kernel Erik Jacobson
2004-04-26 23:39 ` Chris Wright
2004-04-27  0:36   ` Jesse Barnes
2004-04-27  0:41     ` Chris Wright
2004-04-27 21:00       ` Erik Jacobson
2004-04-27 21:05         ` Chris Wright
2004-04-29 21:10       ` Rik van Riel
2004-04-27 20:51   ` Erik Jacobson
2004-04-27 22:28     ` Chris Wright
2004-04-28 14:55     ` Christoph Hellwig
2004-04-29 19:20       ` Paul Jackson
2004-04-29 19:27         ` Chris Wright
2004-04-29 19:29         ` Christoph Hellwig
2004-04-29 19:34           ` Paul Jackson
2004-04-29 19:53           ` Erik Jacobson
2004-04-29 21:20             ` Rik van Riel
2004-04-30  6:17               ` Christoph Hellwig
2004-04-30 11:08                 ` Guillaume Thouvenin
2004-04-30 18:00                   ` Shailabh
2004-04-30 18:28                   ` Rik van Riel
2004-04-30 12:54                 ` Rik van Riel
2004-04-30 13:06                   ` Christoph Hellwig
2004-04-30 13:28                     ` Chris Mason
2004-04-30 16:50                       ` Shailabh
2004-04-30 15:22                     ` Rik van Riel
2004-04-30 16:45                       ` Christoph Hellwig
2004-04-30 17:53                     ` Shailabh
2004-04-30 18:15                       ` Chris Wright
2004-04-30 15:59                   ` Chris Wright
2004-04-30  8:54 ` Guillaume Thouvenin
2004-05-20 21:16 ` Erik Jacobson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).