LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 0/5] Container Freezer v6: Reuse Suspend Freezer
@ 2008-08-11 23:53 Matt Helsley
  2008-08-11 23:53 ` [PATCH 1/5] Container Freezer: Add TIF_FREEZE flag to all architectures Matt Helsley
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Matt Helsley @ 2008-08-11 23:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rafael J. Wysocki, Paul Menage, Li Zefan, Linux-Kernel,
	Linux Containers, linux-pm

This patch series introduces a cgroup subsystem that utilizes the swsusp
freezer to freeze a group of tasks. It's immediately useful for batch job
management scripts. It should also be useful in the future for implementing
container checkpoint/restart.

The freezer subsystem in the container filesystem defines a cgroup file named
freezer.state. Reading freezer.state will return the current state of the
cgroup.  Writing "FROZEN" to the state file will freeze all tasks in the
cgroup. Subsequently writing "RUNNING" will unfreeze the tasks in the cgroup. 

* Examples of usage :

   # mkdir /containers/freezer
   # mount -t cgroup -ofreezer freezer  /containers
   # mkdir /containers/0
   # echo $some_pid > /containers/0/tasks

to get status of the freezer subsystem :

   # cat /containers/0/freezer.state
   RUNNING

to freeze all tasks in the container :

   # echo FROZEN > /containers/0/freezer.state
   # cat /containers/0/freezer.state
   FREEZING
   # cat /containers/0/freezer.state
   FROZEN

to unfreeze all tasks in the container :

   # echo RUNNING > /containers/0/freezer.state
   # cat /containers/0/freezer.state
   RUNNING

Andrew, please consider these patches for -mm.

Cheers,
	-Matt Helsley

Changes since v5:
v6:
	Merged the patch using the cgroups write_string() method since Linus
		has merged the patch supporting it.
	Moved header file modifications to in patch 1 to
		arch/$ARCH/include/asm/thread_info.h where appropriate.
	Moved cgroup_freezer.h contents into cgroups_freezer.c and freezer.h
	Added CONFIG_FREEZER to help conditionally build the freezer code.
		This required some simplifications of the second patch.
	Fix a lock ordering problem with the freezer->lock reacquire code
		Required order: freezer->lock, css_set_lock, task->alloc_lock
		Reacquiring: css_set_lock, task->alloc_lock, freezer->lock
		Solution: change freezer_fork() to not require any ordering
			between task->alloc_lock and freezer->lock
v5:
	Split out write_string as a separate patch for easier merging
		with trees lacking certain cgroup patches at the time.
	Checked use of task alloc lock for races with swsusp freeze/thaw --
		looks safe because there are explicit barriers to handle
		freeze/thaw races for individual tasks, we explicitly
		handle partial group freezing, and partial group thawing
		should be resolved without changing swsusp's loop.
	Updated the patches to Linus' git tree as of approximately
		7/31/2008.
	Added Pavel and Serge's Acked-by lines to Acked patches

v4 (Almost all of these changes are confined to patch 3):
	Reworked the series to use task_lock() instead of RCU.
	Reworked the series to use write_string() and read_seq_string()
		cgroup methods.
	Fixed the race Paul Menage identified.
	Fixed up check_if_frozen() to do more than just test the FROZEN
		flag. In some cases tasks could be stopped (T) and marked
		FREEZING. When that happens we can safely assume that it
		will be frozen immediately upon waking up in the kernel.
		Waiting for it to get marked with PF_FROZEN in order to
		transition to the FROZEN state would block unnecessarily.
	Removed freezer_ prefix from static functions in cgroup_freezer.c.
	Simplified STATE_ switch.
	Updated the locking comments.

v3:
	Ported to 2.6.26-rc5-mm2 with Rafael's freezer patches
	Tested on 24 combinations of 3 architectures (x86, x86_64, ppc64)
		with 8 different kernel configs varying power management
		and cgroup config variables. Each patch builds and boots
		in these 24 combinations.
	Passes functional testing.

v2 (roughly patches 3 and 5):
	Moved the "kill" file into a separate cgroup subsystem (signal) and
		it's own patch.
	Changed the name of the file from freezer.freeze to freezer.state.
	Switched from taking 1 and 0 as input to the strings "FROZEN" and 
		"RUNNING", respectively. This helps keep the interface
		human-usable if/when we need to more states.
	Checked that stopped or interrupted is "frozen enough"
		Since try_to_freeze() is called upon wakeup of these tasks
		this should be fine. This idea comes from recent changes to
		the freezer.
	Checked that if (task == current) whilst freezing cgroup we're ok
	Fixed bug where -EBUSY would always be returned when freezing
	Added code to handle userspace retries for any remaining -EBUSY

-- 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/5] Container Freezer: Add TIF_FREEZE flag to all architectures
  2008-08-11 23:53 [PATCH 0/5] Container Freezer v6: Reuse Suspend Freezer Matt Helsley
@ 2008-08-11 23:53 ` Matt Helsley
  2008-08-11 23:53 ` [PATCH 2/5] Container Freezer: Make refrigerator always available Matt Helsley
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Matt Helsley @ 2008-08-11 23:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rafael J. Wysocki, Paul Menage, Li Zefan, Linux-Kernel,
	Linux Containers, linux-pm, Cedric Le Goater, Pavel Machek,
	Serge E. Hallyn, Nigel Cunningham

This patch is the first step in making the refrigerator() available
to all architectures, even for those without power management.

The purpose of such a change is to be able to use the refrigerator()
in a new control group subsystem which will implement a control group
freezer.

Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Acked-by: Pavel Machek <pavel@suse.cz>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@tuxonice.net>
Tested-by: Matt Helsley <matthltc@us.ibm.com>
---
 arch/parisc/include/asm/thread_info.h   |    2 ++
 arch/sparc/include/asm/thread_info_32.h |    2 ++
 arch/sparc/include/asm/thread_info_64.h |    2 ++
 include/asm-alpha/thread_info.h         |    2 ++
 include/asm-avr32/thread_info.h         |    1 +
 include/asm-cris/thread_info.h          |    2 ++
 include/asm-h8300/thread_info.h         |    2 ++
 include/asm-m68k/thread_info.h          |    1 +
 include/asm-m68knommu/thread_info.h     |    2 ++
 include/asm-s390/thread_info.h          |    2 ++
 include/asm-um/thread_info.h            |    2 ++
 include/asm-xtensa/thread_info.h        |    2 ++
 12 files changed, 22 insertions(+)

Index: linux-2.6.27-rc1-mm1/arch/sparc/include/asm/thread_info_32.h
===================================================================
--- linux-2.6.27-rc1-mm1.orig/arch/sparc/include/asm/thread_info_32.h
+++ linux-2.6.27-rc1-mm1/arch/sparc/include/asm/thread_info_32.h
@@ -139,6 +139,7 @@ BTFIXUPDEF_CALL(void, free_thread_info, 
 #define TIF_POLLING_NRFLAG	9	/* true if poll_idle() is polling
 					 * TIF_NEED_RESCHED */
 #define TIF_MEMDIE		10
+#define TIF_FREEZE		11	/* is freezing for suspend */
 
 /* as above, but as bit values */
 #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)
@@ -152,6 +153,7 @@ BTFIXUPDEF_CALL(void, free_thread_info, 
 #define _TIF_DO_NOTIFY_RESUME_MASK	(_TIF_NOTIFY_RESUME | \
 					 _TIF_SIGPENDING | \
 					 _TIF_RESTORE_SIGMASK)
+#define TIF_FREEZE		(1<<TIF_FREEZE)
 
 #endif /* __KERNEL__ */
 
Index: linux-2.6.27-rc1-mm1/arch/sparc/include/asm/thread_info_64.h
===================================================================
--- linux-2.6.27-rc1-mm1.orig/arch/sparc/include/asm/thread_info_64.h
+++ linux-2.6.27-rc1-mm1/arch/sparc/include/asm/thread_info_64.h
@@ -237,6 +237,7 @@ register struct thread_info *current_thr
 #define TIF_ABI_PENDING		12
 #define TIF_MEMDIE		13
 #define TIF_POLLING_NRFLAG	14
+#define TIF_FREEZE		15	/* is freezing for suspend */
 
 #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)
 #define _TIF_NOTIFY_RESUME	(1<<TIF_NOTIFY_RESUME)
@@ -249,6 +250,7 @@ register struct thread_info *current_thr
 #define _TIF_SYSCALL_AUDIT	(1<<TIF_SYSCALL_AUDIT)
 #define _TIF_ABI_PENDING	(1<<TIF_ABI_PENDING)
 #define _TIF_POLLING_NRFLAG	(1<<TIF_POLLING_NRFLAG)
+#define _TIF_FREEZE		(1<<TIF_FREEZE)
 
 #define _TIF_USER_WORK_MASK	((0xff << TI_FLAG_WSAVED_SHIFT) | \
 				 _TIF_DO_NOTIFY_RESUME_MASK | \
Index: linux-2.6.27-rc1-mm1/include/asm-alpha/thread_info.h
===================================================================
--- linux-2.6.27-rc1-mm1.orig/include/asm-alpha/thread_info.h
+++ linux-2.6.27-rc1-mm1/include/asm-alpha/thread_info.h
@@ -74,12 +74,14 @@ register struct thread_info *__current_t
 #define TIF_UAC_SIGBUS		7
 #define TIF_MEMDIE		8
 #define TIF_RESTORE_SIGMASK	9	/* restore signal mask in do_signal */
+#define TIF_FREEZE		16	/* is freezing for suspend */
 
 #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)
 #define _TIF_SIGPENDING		(1<<TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED	(1<<TIF_NEED_RESCHED)
 #define _TIF_POLLING_NRFLAG	(1<<TIF_POLLING_NRFLAG)
 #define _TIF_RESTORE_SIGMASK	(1<<TIF_RESTORE_SIGMASK)
+#define _TIF_FREEZE		(1<<TIF_FREEZE)
 
 /* Work to do on interrupt/exception return.  */
 #define _TIF_WORK_MASK		(_TIF_SIGPENDING | _TIF_NEED_RESCHED)
Index: linux-2.6.27-rc1-mm1/include/asm-avr32/thread_info.h
===================================================================
--- linux-2.6.27-rc1-mm1.orig/include/asm-avr32/thread_info.h
+++ linux-2.6.27-rc1-mm1/include/asm-avr32/thread_info.h
@@ -96,6 +96,7 @@ static inline struct thread_info *curren
 #define _TIF_MEMDIE		(1 << TIF_MEMDIE)
 #define _TIF_RESTORE_SIGMASK	(1 << TIF_RESTORE_SIGMASK)
 #define _TIF_CPU_GOING_TO_SLEEP (1 << TIF_CPU_GOING_TO_SLEEP)
+#define _TIF_FREEZE		(1 << TIF_FREEZE)
 
 /* Note: The masks below must never span more than 16 bits! */
 
Index: linux-2.6.27-rc1-mm1/include/asm-cris/thread_info.h
===================================================================
--- linux-2.6.27-rc1-mm1.orig/include/asm-cris/thread_info.h
+++ linux-2.6.27-rc1-mm1/include/asm-cris/thread_info.h
@@ -88,6 +88,7 @@ struct thread_info {
 #define TIF_RESTORE_SIGMASK	9	/* restore signal mask in do_signal() */
 #define TIF_POLLING_NRFLAG	16	/* true if poll_idle() is polling TIF_NEED_RESCHED */
 #define TIF_MEMDIE		17
+#define TIF_FREEZE		18	/* is freezing for suspend */
 
 #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)
 #define _TIF_NOTIFY_RESUME	(1<<TIF_NOTIFY_RESUME)
@@ -95,6 +96,7 @@ struct thread_info {
 #define _TIF_NEED_RESCHED	(1<<TIF_NEED_RESCHED)
 #define _TIF_RESTORE_SIGMASK	(1<<TIF_RESTORE_SIGMASK)
 #define _TIF_POLLING_NRFLAG	(1<<TIF_POLLING_NRFLAG)
+#define _TIF_FREEZE		(1<<TIF_FREEZE)
 
 #define _TIF_WORK_MASK		0x0000FFFE	/* work to do on interrupt/exception return */
 #define _TIF_ALLWORK_MASK	0x0000FFFF	/* work to do on any return to u-space */
Index: linux-2.6.27-rc1-mm1/include/asm-h8300/thread_info.h
===================================================================
--- linux-2.6.27-rc1-mm1.orig/include/asm-h8300/thread_info.h
+++ linux-2.6.27-rc1-mm1/include/asm-h8300/thread_info.h
@@ -89,6 +89,7 @@ static inline struct thread_info *curren
 					   TIF_NEED_RESCHED */
 #define TIF_MEMDIE		4
 #define TIF_RESTORE_SIGMASK	5	/* restore signal mask in do_signal() */
+#define TIF_FREEZE		16	/* is freezing for suspend */
 
 /* as above, but as bit values */
 #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)
@@ -96,6 +97,7 @@ static inline struct thread_info *curren
 #define _TIF_NEED_RESCHED	(1<<TIF_NEED_RESCHED)
 #define _TIF_POLLING_NRFLAG	(1<<TIF_POLLING_NRFLAG)
 #define _TIF_RESTORE_SIGMASK	(1<<TIF_RESTORE_SIGMASK)
+#define _TIF_FREEZE		(1<<TIF_FREEZE)
 
 #define _TIF_WORK_MASK		0x0000FFFE	/* work to do on interrupt/exception return */
 
Index: linux-2.6.27-rc1-mm1/include/asm-m68k/thread_info.h
===================================================================
--- linux-2.6.27-rc1-mm1.orig/include/asm-m68k/thread_info.h
+++ linux-2.6.27-rc1-mm1/include/asm-m68k/thread_info.h
@@ -52,5 +52,6 @@ struct thread_info {
 #define TIF_DELAYED_TRACE	14	/* single step a syscall */
 #define TIF_SYSCALL_TRACE	15	/* syscall trace active */
 #define TIF_MEMDIE		16
+#define TIF_FREEZE		17	/* thread is freezing for suspend */
 
 #endif	/* _ASM_M68K_THREAD_INFO_H */
Index: linux-2.6.27-rc1-mm1/include/asm-m68knommu/thread_info.h
===================================================================
--- linux-2.6.27-rc1-mm1.orig/include/asm-m68knommu/thread_info.h
+++ linux-2.6.27-rc1-mm1/include/asm-m68knommu/thread_info.h
@@ -84,12 +84,14 @@ static inline struct thread_info *curren
 #define TIF_POLLING_NRFLAG	3	/* true if poll_idle() is polling
 					   TIF_NEED_RESCHED */
 #define TIF_MEMDIE		4
+#define TIF_FREEZE		16	/* is freezing for suspend */
 
 /* as above, but as bit values */
 #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)
 #define _TIF_SIGPENDING		(1<<TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED	(1<<TIF_NEED_RESCHED)
 #define _TIF_POLLING_NRFLAG	(1<<TIF_POLLING_NRFLAG)
+#define _TIF_FREEZE		(1<<TIF_FREEZE)
 
 #define _TIF_WORK_MASK		0x0000FFFE	/* work to do on interrupt/exception return */
 
Index: linux-2.6.27-rc1-mm1/include/asm-s390/thread_info.h
===================================================================
--- linux-2.6.27-rc1-mm1.orig/include/asm-s390/thread_info.h
+++ linux-2.6.27-rc1-mm1/include/asm-s390/thread_info.h
@@ -98,6 +98,7 @@ static inline struct thread_info *curren
 #define TIF_31BIT		18	/* 32bit process */ 
 #define TIF_MEMDIE		19
 #define TIF_RESTORE_SIGMASK	20	/* restore signal mask in do_signal() */
+#define TIF_FREEZE		21	/* thread is freezing for suspend */
 
 #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)
 #define _TIF_RESTORE_SIGMASK	(1<<TIF_RESTORE_SIGMASK)
@@ -110,6 +111,7 @@ static inline struct thread_info *curren
 #define _TIF_USEDFPU		(1<<TIF_USEDFPU)
 #define _TIF_POLLING_NRFLAG	(1<<TIF_POLLING_NRFLAG)
 #define _TIF_31BIT		(1<<TIF_31BIT)
+#define _TIF_FREEZE		(1<<TIF_FREEZE)
 
 #endif /* __KERNEL__ */
 
Index: linux-2.6.27-rc1-mm1/include/asm-um/thread_info.h
===================================================================
--- linux-2.6.27-rc1-mm1.orig/include/asm-um/thread_info.h
+++ linux-2.6.27-rc1-mm1/include/asm-um/thread_info.h
@@ -69,6 +69,7 @@ static inline struct thread_info *curren
 #define TIF_MEMDIE	 	5
 #define TIF_SYSCALL_AUDIT	6
 #define TIF_RESTORE_SIGMASK	7
+#define TIF_FREEZE		16	/* is freezing for suspend */
 
 #define _TIF_SYSCALL_TRACE	(1 << TIF_SYSCALL_TRACE)
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
@@ -77,5 +78,6 @@ static inline struct thread_info *curren
 #define _TIF_MEMDIE		(1 << TIF_MEMDIE)
 #define _TIF_SYSCALL_AUDIT	(1 << TIF_SYSCALL_AUDIT)
 #define _TIF_RESTORE_SIGMASK	(1 << TIF_RESTORE_SIGMASK)
+#define _TIF_FREEZE		(1 << TIF_FREEZE)
 
 #endif
Index: linux-2.6.27-rc1-mm1/include/asm-xtensa/thread_info.h
===================================================================
--- linux-2.6.27-rc1-mm1.orig/include/asm-xtensa/thread_info.h
+++ linux-2.6.27-rc1-mm1/include/asm-xtensa/thread_info.h
@@ -134,6 +134,7 @@ static inline struct thread_info *curren
 #define TIF_MEMDIE		5
 #define TIF_RESTORE_SIGMASK	6	/* restore signal mask in do_signal() */
 #define TIF_POLLING_NRFLAG	16	/* true if poll_idle() is polling TIF_NEED_RESCHED */
+#define TIF_FREEZE		17	/* is freezing for suspend */
 
 #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)
 #define _TIF_SIGPENDING		(1<<TIF_SIGPENDING)
@@ -142,6 +143,7 @@ static inline struct thread_info *curren
 #define _TIF_IRET		(1<<TIF_IRET)
 #define _TIF_POLLING_NRFLAG	(1<<TIF_POLLING_NRFLAG)
 #define _TIF_RESTORE_SIGMASK	(1<<TIF_RESTORE_SIGMASK)
+#define _TIF_FREEZE		(1<<TIF_FREEZE)
 
 #define _TIF_WORK_MASK		0x0000FFFE	/* work to do on interrupt/exception return */
 #define _TIF_ALLWORK_MASK	0x0000FFFF	/* work to do on any return to u-space */
Index: linux-2.6.27-rc1-mm1/arch/parisc/include/asm/thread_info.h
===================================================================
--- linux-2.6.27-rc1-mm1.orig/arch/parisc/include/asm/thread_info.h
+++ linux-2.6.27-rc1-mm1/arch/parisc/include/asm/thread_info.h
@@ -58,6 +58,7 @@ struct thread_info {
 #define TIF_32BIT               4       /* 32 bit binary */
 #define TIF_MEMDIE		5
 #define TIF_RESTORE_SIGMASK	6	/* restore saved signal mask */
+#define TIF_FREEZE		7	/* is freezing for suspend */
 
 #define _TIF_SYSCALL_TRACE	(1 << TIF_SYSCALL_TRACE)
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
@@ -65,6 +66,7 @@ struct thread_info {
 #define _TIF_POLLING_NRFLAG	(1 << TIF_POLLING_NRFLAG)
 #define _TIF_32BIT		(1 << TIF_32BIT)
 #define _TIF_RESTORE_SIGMASK	(1 << TIF_RESTORE_SIGMASK)
+#define _TIF_FREEZE		(1 << TIF_FREEZE)
 
 #define _TIF_USER_WORK_MASK     (_TIF_SIGPENDING | \
                                  _TIF_NEED_RESCHED | _TIF_RESTORE_SIGMASK)

-- 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 2/5] Container Freezer: Make refrigerator always available
  2008-08-11 23:53 [PATCH 0/5] Container Freezer v6: Reuse Suspend Freezer Matt Helsley
  2008-08-11 23:53 ` [PATCH 1/5] Container Freezer: Add TIF_FREEZE flag to all architectures Matt Helsley
@ 2008-08-11 23:53 ` Matt Helsley
  2008-08-12 20:49   ` Rafael J. Wysocki
  2008-08-11 23:53 ` [PATCH 3/5] Container Freezer: Implement freezer cgroup subsystem Matt Helsley
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Matt Helsley @ 2008-08-11 23:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rafael J. Wysocki, Paul Menage, Li Zefan, Linux-Kernel,
	Linux Containers, linux-pm, Cedric Le Goater, Serge E. Hallyn

Now that the TIF_FREEZE flag is available in all architectures,
extract the refrigerator() and freeze_task() from kernel/power/process.c
and make it available to all.

The refrigerator() can now be used in a control group subsystem
implementing a control group freezer.

Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Tested-by: Matt Helsley <matthltc@us.ibm.com>
---
 include/linux/freezer.h |   14 ++++-
 kernel/Makefile         |    1 
 kernel/freezer.c        |  122 ++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/power/Kconfig    |    3 +
 kernel/power/process.c  |  116 ---------------------------------------------
 5 files changed, 137 insertions(+), 119 deletions(-)
 create mode 100644 kernel/freezer.c

Index: linux-2.6.27-rc1-mm1/include/linux/freezer.h
===================================================================
--- linux-2.6.27-rc1-mm1.orig/include/linux/freezer.h
+++ linux-2.6.27-rc1-mm1/include/linux/freezer.h
@@ -6,7 +6,7 @@
 #include <linux/sched.h>
 #include <linux/wait.h>
 
-#ifdef CONFIG_PM_SLEEP
+#ifdef CONFIG_FREEZER
 /*
  * Check if a process has been frozen
  */
@@ -39,6 +39,11 @@ static inline void clear_freeze_flag(str
 	clear_tsk_thread_flag(p, TIF_FREEZE);
 }
 
+static inline bool should_send_signal(struct task_struct *p)
+{
+	return !(p->flags & PF_FREEZER_NOSIG);
+}
+
 /*
  * Wake up a frozen process
  *
@@ -75,6 +80,9 @@ static inline int try_to_freeze(void)
 		return 0;
 }
 
+extern bool freeze_task(struct task_struct *p, bool sig_only);
+extern void cancel_freezing(struct task_struct *p);
+
 /*
  * The PF_FREEZER_SKIP flag should be set by a vfork parent right before it
  * calls wait_for_completion(&vfork) and reset right after it returns from this
@@ -166,7 +174,7 @@ static inline void set_freezable_with_si
 	} while (try_to_freeze());					\
 	__retval;							\
 })
-#else /* !CONFIG_PM_SLEEP */
+#else /* !CONFIG_FREEZER */
 static inline int frozen(struct task_struct *p) { return 0; }
 static inline int freezing(struct task_struct *p) { return 0; }
 static inline void set_freeze_flag(struct task_struct *p) {}
@@ -191,6 +199,6 @@ static inline void set_freezable_with_si
 #define wait_event_freezable_timeout(wq, condition, timeout)		\
 		wait_event_interruptible_timeout(wq, condition, timeout)
 
-#endif /* !CONFIG_PM_SLEEP */
+#endif /* !CONFIG_FREEZER */
 
 #endif	/* FREEZER_H_INCLUDED */
Index: linux-2.6.27-rc1-mm1/kernel/Makefile
===================================================================
--- linux-2.6.27-rc1-mm1.orig/kernel/Makefile
+++ linux-2.6.27-rc1-mm1/kernel/Makefile
@@ -24,6 +24,7 @@ CFLAGS_REMOVE_sched_clock.o = -pg
 CFLAGS_REMOVE_sched.o = -mno-spe -pg
 endif
 
+obj-$(CONFIG_FREEZER) += freezer.o
 obj-$(CONFIG_PROFILING) += profile.o
 obj-$(CONFIG_SYSCTL_SYSCALL_CHECK) += sysctl_check.o
 obj-$(CONFIG_STACKTRACE) += stacktrace.o
Index: linux-2.6.27-rc1-mm1/kernel/freezer.c
===================================================================
--- /dev/null
+++ linux-2.6.27-rc1-mm1/kernel/freezer.c
@@ -0,0 +1,122 @@
+/*
+ * kernel/freezer.c - Function to freeze a process
+ *
+ * Originally from kernel/power/process.c
+ */
+
+#include <linux/interrupt.h>
+#include <linux/suspend.h>
+#include <linux/module.h>
+#include <linux/syscalls.h>
+#include <linux/freezer.h>
+
+/*
+ * freezing is complete, mark current process as frozen
+ */
+static inline void frozen_process(void)
+{
+	if (!unlikely(current->flags & PF_NOFREEZE)) {
+		current->flags |= PF_FROZEN;
+		wmb();
+	}
+	clear_freeze_flag(current);
+}
+
+/* Refrigerator is place where frozen processes are stored :-). */
+void refrigerator(void)
+{
+	/* Hmm, should we be allowed to suspend when there are realtime
+	   processes around? */
+	long save;
+
+	task_lock(current);
+	if (freezing(current)) {
+		frozen_process();
+		task_unlock(current);
+	} else {
+		task_unlock(current);
+		return;
+	}
+	save = current->state;
+	pr_debug("%s entered refrigerator\n", current->comm);
+
+	spin_lock_irq(&current->sighand->siglock);
+	recalc_sigpending(); /* We sent fake signal, clean it up */
+	spin_unlock_irq(&current->sighand->siglock);
+
+	for (;;) {
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		if (!frozen(current))
+			break;
+		schedule();
+	}
+	pr_debug("%s left refrigerator\n", current->comm);
+	__set_current_state(save);
+}
+EXPORT_SYMBOL(refrigerator);
+
+static void fake_signal_wake_up(struct task_struct *p)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&p->sighand->siglock, flags);
+	signal_wake_up(p, 0);
+	spin_unlock_irqrestore(&p->sighand->siglock, flags);
+}
+
+/**
+ *	freeze_task - send a freeze request to given task
+ *	@p: task to send the request to
+ *	@sig_only: if set, the request will only be sent if the task has the
+ *		PF_FREEZER_NOSIG flag unset
+ *	Return value: 'false', if @sig_only is set and the task has
+ *		PF_FREEZER_NOSIG set or the task is frozen, 'true', otherwise
+ *
+ *	The freeze request is sent by setting the tasks's TIF_FREEZE flag and
+ *	either sending a fake signal to it or waking it up, depending on whether
+ *	or not it has PF_FREEZER_NOSIG set.  If @sig_only is set and the task
+ *	has PF_FREEZER_NOSIG set (ie. it is a typical kernel thread), its
+ *	TIF_FREEZE flag will not be set.
+ */
+bool freeze_task(struct task_struct *p, bool sig_only)
+{
+	/*
+	 * We first check if the task is freezing and next if it has already
+	 * been frozen to avoid the race with frozen_process() which first marks
+	 * the task as frozen and next clears its TIF_FREEZE.
+	 */
+	if (!freezing(p)) {
+		rmb();
+		if (frozen(p))
+			return false;
+
+		if (!sig_only || should_send_signal(p))
+			set_freeze_flag(p);
+		else
+			return false;
+	}
+
+	if (should_send_signal(p)) {
+		if (!signal_pending(p))
+			fake_signal_wake_up(p);
+	} else if (sig_only) {
+		return false;
+	} else {
+		wake_up_state(p, TASK_INTERRUPTIBLE);
+	}
+
+	return true;
+}
+
+void cancel_freezing(struct task_struct *p)
+{
+	unsigned long flags;
+
+	if (freezing(p)) {
+		pr_debug("  clean up: %s\n", p->comm);
+		clear_freeze_flag(p);
+		spin_lock_irqsave(&p->sighand->siglock, flags);
+		recalc_sigpending_and_wake(p);
+		spin_unlock_irqrestore(&p->sighand->siglock, flags);
+	}
+}
Index: linux-2.6.27-rc1-mm1/kernel/power/process.c
===================================================================
--- linux-2.6.27-rc1-mm1.orig/kernel/power/process.c
+++ linux-2.6.27-rc1-mm1/kernel/power/process.c
@@ -28,121 +28,6 @@ static inline int freezeable(struct task
 	return 1;
 }
 
-/*
- * freezing is complete, mark current process as frozen
- */
-static inline void frozen_process(void)
-{
-	if (!unlikely(current->flags & PF_NOFREEZE)) {
-		current->flags |= PF_FROZEN;
-		wmb();
-	}
-	clear_freeze_flag(current);
-}
-
-/* Refrigerator is place where frozen processes are stored :-). */
-void refrigerator(void)
-{
-	/* Hmm, should we be allowed to suspend when there are realtime
-	   processes around? */
-	long save;
-
-	task_lock(current);
-	if (freezing(current)) {
-		frozen_process();
-		task_unlock(current);
-	} else {
-		task_unlock(current);
-		return;
-	}
-	save = current->state;
-	pr_debug("%s entered refrigerator\n", current->comm);
-
-	spin_lock_irq(&current->sighand->siglock);
-	recalc_sigpending(); /* We sent fake signal, clean it up */
-	spin_unlock_irq(&current->sighand->siglock);
-
-	for (;;) {
-		set_current_state(TASK_UNINTERRUPTIBLE);
-		if (!frozen(current))
-			break;
-		schedule();
-	}
-	pr_debug("%s left refrigerator\n", current->comm);
-	__set_current_state(save);
-}
-
-static void fake_signal_wake_up(struct task_struct *p)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&p->sighand->siglock, flags);
-	signal_wake_up(p, 0);
-	spin_unlock_irqrestore(&p->sighand->siglock, flags);
-}
-
-static inline bool should_send_signal(struct task_struct *p)
-{
-	return !(p->flags & PF_FREEZER_NOSIG);
-}
-
-/**
- *	freeze_task - send a freeze request to given task
- *	@p: task to send the request to
- *	@sig_only: if set, the request will only be sent if the task has the
- *		PF_FREEZER_NOSIG flag unset
- *	Return value: 'false', if @sig_only is set and the task has
- *		PF_FREEZER_NOSIG set or the task is frozen, 'true', otherwise
- *
- *	The freeze request is sent by setting the tasks's TIF_FREEZE flag and
- *	either sending a fake signal to it or waking it up, depending on whether
- *	or not it has PF_FREEZER_NOSIG set.  If @sig_only is set and the task
- *	has PF_FREEZER_NOSIG set (ie. it is a typical kernel thread), its
- *	TIF_FREEZE flag will not be set.
- */
-static bool freeze_task(struct task_struct *p, bool sig_only)
-{
-	/*
-	 * We first check if the task is freezing and next if it has already
-	 * been frozen to avoid the race with frozen_process() which first marks
-	 * the task as frozen and next clears its TIF_FREEZE.
-	 */
-	if (!freezing(p)) {
-		rmb();
-		if (frozen(p))
-			return false;
-
-		if (!sig_only || should_send_signal(p))
-			set_freeze_flag(p);
-		else
-			return false;
-	}
-
-	if (should_send_signal(p)) {
-		if (!signal_pending(p))
-			fake_signal_wake_up(p);
-	} else if (sig_only) {
-		return false;
-	} else {
-		wake_up_state(p, TASK_INTERRUPTIBLE);
-	}
-
-	return true;
-}
-
-static void cancel_freezing(struct task_struct *p)
-{
-	unsigned long flags;
-
-	if (freezing(p)) {
-		pr_debug("  clean up: %s\n", p->comm);
-		clear_freeze_flag(p);
-		spin_lock_irqsave(&p->sighand->siglock, flags);
-		recalc_sigpending_and_wake(p);
-		spin_unlock_irqrestore(&p->sighand->siglock, flags);
-	}
-}
-
 static int try_to_freeze_tasks(bool sig_only)
 {
 	struct task_struct *g, *p;
@@ -264,4 +149,3 @@ void thaw_processes(void)
 	printk("done.\n");
 }
 
-EXPORT_SYMBOL(refrigerator);
Index: linux-2.6.27-rc1-mm1/kernel/power/Kconfig
===================================================================
--- linux-2.6.27-rc1-mm1.orig/kernel/power/Kconfig
+++ linux-2.6.27-rc1-mm1/kernel/power/Kconfig
@@ -85,6 +85,9 @@ config PM_SLEEP
 	depends on SUSPEND || HIBERNATION || XEN_SAVE_RESTORE
 	default y
 
+config FREEZER
+	def_bool PM_SLEEP
+
 config SUSPEND
 	bool "Suspend to RAM and standby"
 	depends on PM && ARCH_SUSPEND_POSSIBLE

-- 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 3/5] Container Freezer: Implement freezer cgroup subsystem
  2008-08-11 23:53 [PATCH 0/5] Container Freezer v6: Reuse Suspend Freezer Matt Helsley
  2008-08-11 23:53 ` [PATCH 1/5] Container Freezer: Add TIF_FREEZE flag to all architectures Matt Helsley
  2008-08-11 23:53 ` [PATCH 2/5] Container Freezer: Make refrigerator always available Matt Helsley
@ 2008-08-11 23:53 ` Matt Helsley
  2008-08-12 22:56   ` Andrew Morton
  2008-11-04  5:43   ` Paul Menage
  2008-08-11 23:53 ` [PATCH 4/5] Container Freezer: Skip frozen cgroups during power management resume Matt Helsley
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 27+ messages in thread
From: Matt Helsley @ 2008-08-11 23:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rafael J. Wysocki, Paul Menage, Li Zefan, Linux-Kernel,
	Linux Containers, linux-pm, Cedric Le Goater, Serge E. Hallyn

This patch implements a new freezer subsystem in the control groups framework.
It provides a way to stop and resume execution of all tasks in a cgroup by
writing in the cgroup filesystem.

The freezer subsystem in the container filesystem defines a file named
freezer.state. Writing "FROZEN" to the state file will freeze all tasks in the
cgroup. Subsequently writing "RUNNING" will unfreeze the tasks in the cgroup.
Reading will return the current state.

* Examples of usage :

   # mkdir /containers/freezer
   # mount -t cgroup -ofreezer freezer  /containers
   # mkdir /containers/0
   # echo $some_pid > /containers/0/tasks

to get status of the freezer subsystem :

   # cat /containers/0/freezer.state
   RUNNING

to freeze all tasks in the container :

   # echo FROZEN > /containers/0/freezer.state
   # cat /containers/0/freezer.state
   FREEZING
   # cat /containers/0/freezer.state
   FROZEN

to unfreeze all tasks in the container :

   # echo RUNNING > /containers/0/freezer.state
   # cat /containers/0/freezer.state
   RUNNING

This is the basic mechanism which should do the right thing for user space task
in a simple scenario.

It's important to note that freezing can be incomplete. In that case we return
EBUSY. This means that some tasks in the cgroup are busy doing something that
prevents us from completely freezing the cgroup at this time. After EBUSY,
the cgroup will remain partially frozen -- reflected by freezer.state reporting
"FREEZING" when read. The state will remain "FREEZING" until one of these
things happens:

	1) Userspace cancels the freezing operation by writing "RUNNING" to
		the freezer.state file
	2) Userspace retries the freezing operation by writing "FROZEN" to
		the freezer.state file (writing "FREEZING" is not legal
		and returns EIO)
	3) The tasks that blocked the cgroup from entering the "FROZEN"
		state disappear from the cgroup's set of tasks.

Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Tested-by: Matt Helsley <matthltc@us.ibm.com>
---
Changes since v5:
v6:
	Use the new cgroup write_string method.
	Conditionally build and link the freezer code.
	Fix a lock ordering problem with the freezer->lock reacquire code
		Required order: freezer->lock, css_set_lock, task->alloc_lock
		Reacquiring: css_set_lock, task->alloc_lock, freezer->lock
		Solution: change freezer_fork() to not require any ordering
			between task->alloc_lock and freezer->lock

 include/linux/cgroup_subsys.h |    6 
 include/linux/freezer.h       |   22 ++
 init/Kconfig                  |    7 
 kernel/Makefile               |    1 
 kernel/cgroup_freezer.c       |  366 ++++++++++++++++++++++++++++++++++++++++++
 kernel/power/Kconfig          |    2 
 6 files changed, 399 insertions(+), 5 deletions(-)
 create mode 100644 include/linux/cgroup_freezer.h
 create mode 100644 kernel/cgroup_freezer.c

Index: linux-2.6.27-rc1-mm1/include/linux/cgroup_subsys.h
===================================================================
--- linux-2.6.27-rc1-mm1.orig/include/linux/cgroup_subsys.h
+++ linux-2.6.27-rc1-mm1/include/linux/cgroup_subsys.h
@@ -52,3 +52,9 @@ SUBSYS(memrlimit_cgroup)
 #endif
 
 /* */
+
+#ifdef CONFIG_CGROUP_FREEZER
+SUBSYS(freezer)
+#endif
+
+/* */
Index: linux-2.6.27-rc1-mm1/include/linux/freezer.h
===================================================================
--- linux-2.6.27-rc1-mm1.orig/include/linux/freezer.h
+++ linux-2.6.27-rc1-mm1/include/linux/freezer.h
@@ -47,22 +47,30 @@ static inline bool should_send_signal(st
 /*
  * Wake up a frozen process
  *
- * task_lock() is taken to prevent the race with refrigerator() which may
+ * task_lock() is needed to prevent the race with refrigerator() which may
  * occur if the freezing of tasks fails.  Namely, without the lock, if the
  * freezing of tasks failed, thaw_tasks() might have run before a task in
  * refrigerator() could call frozen_process(), in which case the task would be
  * frozen and no one would thaw it.
  */
-static inline int thaw_process(struct task_struct *p)
+static inline int __thaw_process(struct task_struct *p)
 {
-	task_lock(p);
 	if (frozen(p)) {
 		p->flags &= ~PF_FROZEN;
+		return 1;
+	}
+	clear_freeze_flag(p);
+	return 0;
+}
+
+static inline int thaw_process(struct task_struct *p)
+{
+	task_lock(p);
+	if (__thaw_process(p) == 1) {
 		task_unlock(p);
 		wake_up_process(p);
 		return 1;
 	}
-	clear_freeze_flag(p);
 	task_unlock(p);
 	return 0;
 }
@@ -83,6 +91,12 @@ static inline int try_to_freeze(void)
 extern bool freeze_task(struct task_struct *p, bool sig_only);
 extern void cancel_freezing(struct task_struct *p);
 
+#ifdef CONFIG_CGROUP_FREEZER
+extern int cgroup_frozen(struct task_struct *task);
+#else /* !CONFIG_CGROUP_FREEZER */
+static inline int cgroup_frozen(struct task_struct *task) { return 0; }
+#endif /* !CONFIG_CGROUP_FREEZER */
+
 /*
  * The PF_FREEZER_SKIP flag should be set by a vfork parent right before it
  * calls wait_for_completion(&vfork) and reset right after it returns from this
Index: linux-2.6.27-rc1-mm1/init/Kconfig
===================================================================
--- linux-2.6.27-rc1-mm1.orig/init/Kconfig
+++ linux-2.6.27-rc1-mm1/init/Kconfig
@@ -299,6 +299,13 @@ config CGROUP_NS
           for instance virtual servers and checkpoint/restart
           jobs.
 
+config CGROUP_FREEZER
+        bool "control group freezer subsystem"
+        depends on CGROUPS
+        help
+          Provides a way to freeze and unfreeze all tasks in a
+	  cgroup.
+
 config CGROUP_DEVICE
 	bool "Device controller for cgroups"
 	depends on CGROUPS && EXPERIMENTAL
Index: linux-2.6.27-rc1-mm1/kernel/Makefile
===================================================================
--- linux-2.6.27-rc1-mm1.orig/kernel/Makefile
+++ linux-2.6.27-rc1-mm1/kernel/Makefile
@@ -54,6 +54,7 @@ obj-$(CONFIG_KEXEC) += kexec.o
 obj-$(CONFIG_COMPAT) += compat.o
 obj-$(CONFIG_CGROUPS) += cgroup.o
 obj-$(CONFIG_CGROUP_DEBUG) += cgroup_debug.o
+obj-$(CONFIG_CGROUP_FREEZER) += cgroup_freezer.o
 obj-$(CONFIG_CPUSETS) += cpuset.o
 obj-$(CONFIG_CGROUP_NS) += ns_cgroup.o
 obj-$(CONFIG_UTS_NS) += utsname.o
Index: linux-2.6.27-rc1-mm1/kernel/cgroup_freezer.c
===================================================================
--- /dev/null
+++ linux-2.6.27-rc1-mm1/kernel/cgroup_freezer.c
@@ -0,0 +1,366 @@
+/*
+ * cgroup_freezer.c -  control group freezer subsystem
+ *
+ * Copyright IBM Corporation, 2007
+ *
+ * Author : Cedric Le Goater <clg@fr.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+ */
+
+#include <linux/module.h>
+#include <linux/cgroup.h>
+#include <linux/fs.h>
+#include <linux/uaccess.h>
+#include <linux/freezer.h>
+#include <linux/seq_file.h>
+
+enum freezer_state {
+	STATE_RUNNING = 0,
+	STATE_FREEZING,
+	STATE_FROZEN,
+};
+
+struct freezer {
+	struct cgroup_subsys_state css;
+	enum freezer_state state;
+	spinlock_t lock; /* protects _writes_ to state */
+};
+
+static inline struct freezer *cgroup_freezer(
+		struct cgroup *cgroup)
+{
+	return container_of(
+		cgroup_subsys_state(cgroup, freezer_subsys_id),
+		struct freezer, css);
+}
+
+static inline struct freezer *task_freezer(struct task_struct *task)
+{
+	return container_of(task_subsys_state(task, freezer_subsys_id),
+			    struct freezer, css);
+}
+
+int cgroup_frozen(struct task_struct *task)
+{
+	struct freezer *freezer;
+	enum freezer_state state;
+
+	task_lock(task);
+	freezer = task_freezer(task);
+	state = freezer->state;
+	task_unlock(task);
+
+	return state == STATE_FROZEN;
+}
+
+/*
+ * Buffer size for freezer state is limited by cgroups write_string()
+ * interface. See cgroups code for the current size.
+ */
+static const char *freezer_state_strs[] = {
+	"RUNNING",
+	"FREEZING",
+	"FROZEN",
+};
+
+/*
+ * State diagram
+ * Transitions are caused by userspace writes to the freezer.state file.
+ * The values in parenthesis are state labels. The rest are edge labels.
+ *
+ * (RUNNING) --FROZEN--> (FREEZING) --FROZEN--> (FROZEN)
+ *    ^ ^                     |                       |
+ *    | \_______RUNNING_______/                       |
+ *    \_____________________________RUNNING___________/
+ */
+
+struct cgroup_subsys freezer_subsys;
+
+/* Locks taken and their ordering
+ * ------------------------------
+ * css_set_lock
+ * cgroup_mutex (AKA cgroup_lock)
+ * task->alloc_lock (AKA task_lock)
+ * freezer->lock
+ * task->sighand->siglock
+ *
+ * cgroup code forces css_set_lock to be taken before task->alloc_lock
+ *
+ * freezer_create(), freezer_destroy():
+ * cgroup_mutex [ by cgroup core ]
+ *
+ * can_attach():
+ * cgroup_mutex
+ *
+ * cgroup_frozen():
+ * task->alloc_lock (to get task's cgroup)
+ *
+ * freezer_fork() (preserving fork() performance means can't take cgroup_mutex):
+ * task->alloc_lock (to get task's cgroup)
+ * freezer->lock
+ *  sighand->siglock (if the cgroup is freezing)
+ *
+ * freezer_read():
+ * cgroup_mutex
+ *  freezer->lock
+ *   read_lock css_set_lock (cgroup iterator start)
+ *
+ * freezer_write() (freeze):
+ * cgroup_mutex
+ *  freezer->lock
+ *   read_lock css_set_lock (cgroup iterator start)
+ *    sighand->siglock
+ *
+ * freezer_write() (unfreeze):
+ * cgroup_mutex
+ *  freezer->lock
+ *   read_lock css_set_lock (cgroup iterator start)
+ *    task->alloc_lock (to prevent races with freeze_task())
+ *     sighand->siglock
+ */
+static struct cgroup_subsys_state *freezer_create(struct cgroup_subsys *ss,
+						  struct cgroup *cgroup)
+{
+	struct freezer *freezer;
+
+	freezer = kzalloc(sizeof(struct freezer), GFP_KERNEL);
+	if (!freezer)
+		return ERR_PTR(-ENOMEM);
+
+	spin_lock_init(&freezer->lock);
+	freezer->state = STATE_RUNNING;
+	return &freezer->css;
+}
+
+static void freezer_destroy(struct cgroup_subsys *ss,
+			    struct cgroup *cgroup)
+{
+	kfree(cgroup_freezer(cgroup));
+}
+
+
+static int freezer_can_attach(struct cgroup_subsys *ss,
+			      struct cgroup *new_cgroup,
+			      struct task_struct *task)
+{
+	struct freezer *freezer;
+	int retval = 0;
+
+	/*
+	 * The call to cgroup_lock() in the freezer.state write method prevents
+	 * a write to that file racing against an attach, and hence the
+	 * can_attach() result will remain valid until the attach completes.
+	 */
+	freezer = cgroup_freezer(new_cgroup);
+	if (freezer->state == STATE_FROZEN)
+		retval = -EBUSY;
+	return retval;
+}
+
+static void freezer_fork(struct cgroup_subsys *ss, struct task_struct *task)
+{
+	struct freezer *freezer;
+
+	task_lock(task);
+	freezer = task_freezer(task);
+	task_unlock(task);
+
+	BUG_ON(freezer->state == STATE_FROZEN);
+	spin_lock_irq(&freezer->lock);
+	/* Locking avoids race with FREEZING -> RUNNING transitions. */
+	if (freezer->state == STATE_FREEZING)
+		freeze_task(task, true);
+	spin_unlock_irq(&freezer->lock);
+}
+
+/*
+ * caller must hold freezer->lock
+ */
+static void check_if_frozen(struct cgroup *cgroup,
+			     struct freezer *freezer)
+{
+	struct cgroup_iter it;
+	struct task_struct *task;
+	unsigned int nfrozen = 0, ntotal = 0;
+
+	cgroup_iter_start(cgroup, &it);
+	while ((task = cgroup_iter_next(cgroup, &it))) {
+		ntotal++;
+		/*
+		 * Task is frozen or will freeze immediately when next it gets
+		 * woken
+		 */
+		if (frozen(task) ||
+		    (task_is_stopped_or_traced(task) && freezing(task)))
+			nfrozen++;
+	}
+
+	/*
+	 * Transition to FROZEN when no new tasks can be added ensures
+	 * that we never exist in the FROZEN state while there are unfrozen
+	 * tasks.
+	 */
+	if (nfrozen == ntotal)
+		freezer->state = STATE_FROZEN;
+	cgroup_iter_end(cgroup, &it);
+}
+
+static int freezer_read(struct cgroup *cgroup, struct cftype *cft,
+    			struct seq_file *m)
+{
+	struct freezer *freezer;
+	enum freezer_state state;
+
+	if (!cgroup_lock_live_group(cgroup))
+		return -ENODEV;
+
+	freezer = cgroup_freezer(cgroup);
+	spin_lock_irq(&freezer->lock);
+	state = freezer->state;
+	if (state == STATE_FREEZING) {
+		/* We change from FREEZING to FROZEN lazily if the cgroup was
+		 * only partially frozen when we exitted write. */
+		check_if_frozen(cgroup, freezer);
+		state = freezer->state;
+	}
+	spin_unlock_irq(&freezer->lock);
+	cgroup_unlock();
+
+	seq_puts(m, freezer_state_strs[state]);
+	seq_putc(m, '\n');
+	return 0;
+}
+
+static int try_to_freeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
+{
+	struct cgroup_iter it;
+	struct task_struct *task;
+	unsigned int num_cant_freeze_now = 0;
+
+	freezer->state = STATE_FREEZING;
+	cgroup_iter_start(cgroup, &it);
+	while ((task = cgroup_iter_next(cgroup, &it))) {
+		if (!freeze_task(task, true))
+			continue;
+		if (task_is_stopped_or_traced(task) && freezing(task))
+			/*
+			 * The freeze flag is set so these tasks will
+			 * immediately go into the fridge upon waking.
+			 */
+			continue;
+		if (!freezing(task) && !freezer_should_skip(task))
+			num_cant_freeze_now++;
+	}
+	cgroup_iter_end(cgroup, &it);
+
+	return num_cant_freeze_now ? -EBUSY : 0;
+}
+
+static int unfreeze_cgroup(struct cgroup *cgroup, struct freezer *freezer)
+{
+	struct cgroup_iter it;
+	struct task_struct *task;
+
+	cgroup_iter_start(cgroup, &it);
+	while ((task = cgroup_iter_next(cgroup, &it))) {
+		int do_wake;
+
+		task_lock(task);
+		do_wake = __thaw_process(task);
+		task_unlock(task);
+		if (do_wake)
+			wake_up_process(task);
+	}
+	cgroup_iter_end(cgroup, &it);
+	freezer->state = STATE_RUNNING;
+
+	return 0;
+}
+
+static int freezer_change_state(struct cgroup *cgroup,
+				enum freezer_state goal_state)
+{
+	struct freezer *freezer;
+	int retval = 0;
+
+	freezer = cgroup_freezer(cgroup);
+	spin_lock_irq(&freezer->lock);
+	check_if_frozen(cgroup, freezer); /* may update freezer->state */
+	if (goal_state == freezer->state)
+		goto out;
+	switch (freezer->state) {
+	case STATE_RUNNING:
+		retval = try_to_freeze_cgroup(cgroup, freezer);
+		break;
+	case STATE_FREEZING:
+		if (goal_state == STATE_FROZEN) {
+			/* Userspace is retrying after
+			 * "/bin/echo FROZEN > freezer.state" returned -EBUSY */
+			retval = try_to_freeze_cgroup(cgroup, freezer);
+			break;
+		}
+		/* state == FREEZING and goal_state == RUNNING, so unfreeze */
+	case STATE_FROZEN:
+		retval = unfreeze_cgroup(cgroup, freezer);
+		break;
+	default:
+		break;
+	}
+out:
+	spin_unlock_irq(&freezer->lock);
+
+	return retval;
+}
+
+static int freezer_write(struct cgroup *cgroup,
+			 struct cftype *cft,
+			 const char *buffer)
+{
+	int retval;
+	enum freezer_state goal_state;
+
+	if (strcmp(buffer, freezer_state_strs[STATE_RUNNING]) == 0)
+		goal_state = STATE_RUNNING;
+	else if (strcmp(buffer, freezer_state_strs[STATE_FROZEN]) == 0)
+		goal_state = STATE_FROZEN;
+	else
+		return -EIO;
+
+	if (!cgroup_lock_live_group(cgroup))
+		return -ENODEV;
+	retval = freezer_change_state(cgroup, goal_state);
+	cgroup_unlock();
+	return retval;
+}
+
+static struct cftype files[] = {
+	{
+		.name = "state",
+		.read_seq_string = freezer_read,
+		.write_string = freezer_write,
+	},
+};
+
+static int freezer_populate(struct cgroup_subsys *ss, struct cgroup *cgroup)
+{
+	return cgroup_add_files(cgroup, ss, files, ARRAY_SIZE(files));
+}
+
+struct cgroup_subsys freezer_subsys = {
+	.name		= "freezer",
+	.create		= freezer_create,
+	.destroy	= freezer_destroy,
+	.populate	= freezer_populate,
+	.subsys_id	= freezer_subsys_id,
+	.can_attach	= freezer_can_attach,
+	.attach		= NULL,
+	.fork		= freezer_fork,
+	.exit		= NULL,
+};
Index: linux-2.6.27-rc1-mm1/kernel/power/Kconfig
===================================================================
--- linux-2.6.27-rc1-mm1.orig/kernel/power/Kconfig
+++ linux-2.6.27-rc1-mm1/kernel/power/Kconfig
@@ -86,7 +86,7 @@ config PM_SLEEP
 	default y
 
 config FREEZER
-	def_bool PM_SLEEP
+	def_bool PM_SLEEP || CGROUP_FREEZER
 
 config SUSPEND
 	bool "Suspend to RAM and standby"

-- 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 4/5] Container Freezer: Skip frozen cgroups during power management resume
  2008-08-11 23:53 [PATCH 0/5] Container Freezer v6: Reuse Suspend Freezer Matt Helsley
                   ` (2 preceding siblings ...)
  2008-08-11 23:53 ` [PATCH 3/5] Container Freezer: Implement freezer cgroup subsystem Matt Helsley
@ 2008-08-11 23:53 ` Matt Helsley
  2008-08-12 20:50   ` Rafael J. Wysocki
  2008-08-11 23:53 ` [PATCH 5/5] Container Freezer: Prevent frozen tasks or cgroups from changing Matt Helsley
  2008-08-12 22:44 ` [PATCH 0/5] Container Freezer v6: Reuse Suspend Freezer Andrew Morton
  5 siblings, 1 reply; 27+ messages in thread
From: Matt Helsley @ 2008-08-11 23:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rafael J. Wysocki, Paul Menage, Li Zefan, Linux-Kernel,
	Linux Containers, linux-pm, Cedric Le Goater, Serge E. Hallyn

When a system is resumed after a suspend, it will also unfreeze
frozen cgroups.

This patchs modifies the resume sequence to skip the tasks which
are part of a frozen control group.

Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Tested-by: Matt Helsley <matthltc@us.ibm.com>
---
 kernel/power/process.c |    3 +++
 1 file changed, 3 insertions(+)

Index: linux-2.6.27-rc1-mm1/kernel/power/process.c
===================================================================
--- linux-2.6.27-rc1-mm1.orig/kernel/power/process.c
+++ linux-2.6.27-rc1-mm1/kernel/power/process.c
@@ -135,6 +135,9 @@ static void thaw_tasks(bool nosig_only)
 		if (nosig_only && should_send_signal(p))
 			continue;
 
+		if (cgroup_frozen(p))
+			continue;
+
 		thaw_process(p);
 	} while_each_thread(g, p);
 	read_unlock(&tasklist_lock);

-- 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 5/5] Container Freezer: Prevent frozen tasks or cgroups from changing
  2008-08-11 23:53 [PATCH 0/5] Container Freezer v6: Reuse Suspend Freezer Matt Helsley
                   ` (3 preceding siblings ...)
  2008-08-11 23:53 ` [PATCH 4/5] Container Freezer: Skip frozen cgroups during power management resume Matt Helsley
@ 2008-08-11 23:53 ` Matt Helsley
  2008-08-12 22:44 ` [PATCH 0/5] Container Freezer v6: Reuse Suspend Freezer Andrew Morton
  5 siblings, 0 replies; 27+ messages in thread
From: Matt Helsley @ 2008-08-11 23:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rafael J. Wysocki, Paul Menage, Li Zefan, Linux-Kernel,
	Linux Containers, linux-pm

Don't let frozen tasks or cgroups change. This means frozen tasks can't
leave their current cgroup for another cgroup. It also means that tasks
cannot be added to or removed from a cgroup in the FROZEN state. We
enforce these rules by checking for frozen tasks and cgroups in the
can_attach() function.

Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
---
Changes since v4:
v5:
	Checked use of task alloc lock for races with swsusp freeze/thaw --
		looks safe because there are explicit barriers to handle
		freeze/thaw races for individual tasks, we explicitly
		handle partial group freezing, and partial group thawing
		should be resolved without changing swsusp's loop. This should
		answer Li Zefan's last comment re: races between freeze and
		thaw.

 kernel/cgroup_freezer.c |   43 ++++++++++++++++++++++++++-----------------
 1 file changed, 26 insertions(+), 17 deletions(-)

Index: linux-2.6.27-rc1-mm1/kernel/cgroup_freezer.c
===================================================================
--- linux-2.6.27-rc1-mm1.orig/kernel/cgroup_freezer.c
+++ linux-2.6.27-rc1-mm1/kernel/cgroup_freezer.c
@@ -145,22 +145,40 @@ static void freezer_destroy(struct cgrou
 	kfree(cgroup_freezer(cgroup));
 }
 
+/* Task is frozen or will freeze immediately when next it gets woken */
+static bool is_task_frozen_enough(struct task_struct *task)
+{
+	return frozen(task) ||
+		(task_is_stopped_or_traced(task) && freezing(task));
+}
 
+/*
+ * The call to cgroup_lock() in the freezer.state write method prevents
+ * a write to that file racing against an attach, and hence the
+ * can_attach() result will remain valid until the attach completes.
+ */
 static int freezer_can_attach(struct cgroup_subsys *ss,
 			      struct cgroup *new_cgroup,
 			      struct task_struct *task)
 {
 	struct freezer *freezer;
-	int retval = 0;
+	int retval;
+
+	/* Anything frozen can't move or be moved to/from */
+
+	if (is_task_frozen_enough(task))
+		return -EBUSY;
 
-	/*
-	 * The call to cgroup_lock() in the freezer.state write method prevents
-	 * a write to that file racing against an attach, and hence the
-	 * can_attach() result will remain valid until the attach completes.
-	 */
 	freezer = cgroup_freezer(new_cgroup);
 	if (freezer->state == STATE_FROZEN)
+		return -EBUSY;
+
+	retval = 0;
+	task_lock(task);
+	freezer = task_freezer(task);
+	if (freezer->state == STATE_FROZEN)
 		retval = -EBUSY;
+	task_unlock(task);
 	return retval;
 }
 
@@ -193,12 +211,7 @@ static void check_if_frozen(struct cgrou
 	cgroup_iter_start(cgroup, &it);
 	while ((task = cgroup_iter_next(cgroup, &it))) {
 		ntotal++;
-		/*
-		 * Task is frozen or will freeze immediately when next it gets
-		 * woken
-		 */
-		if (frozen(task) ||
-		    (task_is_stopped_or_traced(task) && freezing(task)))
+		if (is_task_frozen_enough(task))
 			nfrozen++;
 	}
 
@@ -249,11 +262,7 @@ static int try_to_freeze_cgroup(struct c
 	while ((task = cgroup_iter_next(cgroup, &it))) {
 		if (!freeze_task(task, true))
 			continue;
-		if (task_is_stopped_or_traced(task) && freezing(task))
-			/*
-			 * The freeze flag is set so these tasks will
-			 * immediately go into the fridge upon waking.
-			 */
+		if (is_task_frozen_enough(task))
 			continue;
 		if (!freezing(task) && !freezer_should_skip(task))
 			num_cant_freeze_now++;

-- 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/5] Container Freezer: Make refrigerator always available
  2008-08-11 23:53 ` [PATCH 2/5] Container Freezer: Make refrigerator always available Matt Helsley
@ 2008-08-12 20:49   ` Rafael J. Wysocki
  2008-08-13  1:01     ` [ProbableSpam]Re: " Matt Helsley
  0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2008-08-12 20:49 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Andrew Morton, Paul Menage, Li Zefan, Linux-Kernel,
	Linux Containers, linux-pm, Cedric Le Goater, Serge E. Hallyn

On Tuesday, 12 of August 2008, Matt Helsley wrote:
> Now that the TIF_FREEZE flag is available in all architectures,
> extract the refrigerator() and freeze_task() from kernel/power/process.c
> and make it available to all.
> 
> The refrigerator() can now be used in a control group subsystem
> implementing a control group freezer.
> 
> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
> Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
> Acked-by: Serge E. Hallyn <serue@us.ibm.com>
> Tested-by: Matt Helsley <matthltc@us.ibm.com>

Your Signed-off-by implies your Tested-by (at least it should ;-)).

> ---
[--snip--]
> Index: linux-2.6.27-rc1-mm1/kernel/power/Kconfig
> ===================================================================
> --- linux-2.6.27-rc1-mm1.orig/kernel/power/Kconfig
> +++ linux-2.6.27-rc1-mm1/kernel/power/Kconfig
> @@ -85,6 +85,9 @@ config PM_SLEEP
>  	depends on SUSPEND || HIBERNATION || XEN_SAVE_RESTORE
>  	default y
>  
> +config FREEZER
> +	def_bool PM_SLEEP
> +

I'd still prefer this to go into a Kconfig in the parent directory (ie. where
freezer.c and the Makefile building it are located).  Otherwise it's guaranteed
to confuse someone.

>  config SUSPEND
>  	bool "Suspend to RAM and standby"
>  	depends on PM && ARCH_SUSPEND_POSSIBLE
> 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/5] Container Freezer: Skip frozen cgroups during power management resume
  2008-08-11 23:53 ` [PATCH 4/5] Container Freezer: Skip frozen cgroups during power management resume Matt Helsley
@ 2008-08-12 20:50   ` Rafael J. Wysocki
  0 siblings, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2008-08-12 20:50 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Andrew Morton, Paul Menage, Li Zefan, Linux-Kernel,
	Linux Containers, linux-pm, Cedric Le Goater, Serge E. Hallyn

On Tuesday, 12 of August 2008, Matt Helsley wrote:
> When a system is resumed after a suspend, it will also unfreeze
> frozen cgroups.
> 
> This patchs modifies the resume sequence to skip the tasks which
> are part of a frozen control group.
> 
> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
> Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
> Acked-by: Serge E. Hallyn <serue@us.ibm.com>
> Tested-by: Matt Helsley <matthltc@us.ibm.com>

Acked-by: Rafael J. Wysocki <rjw@sisk.pl>

> ---
>  kernel/power/process.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> Index: linux-2.6.27-rc1-mm1/kernel/power/process.c
> ===================================================================
> --- linux-2.6.27-rc1-mm1.orig/kernel/power/process.c
> +++ linux-2.6.27-rc1-mm1/kernel/power/process.c
> @@ -135,6 +135,9 @@ static void thaw_tasks(bool nosig_only)
>  		if (nosig_only && should_send_signal(p))
>  			continue;
>  
> +		if (cgroup_frozen(p))
> +			continue;
> +
>  		thaw_process(p);
>  	} while_each_thread(g, p);
>  	read_unlock(&tasklist_lock);
> 



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/5] Container Freezer v6: Reuse Suspend Freezer
  2008-08-11 23:53 [PATCH 0/5] Container Freezer v6: Reuse Suspend Freezer Matt Helsley
                   ` (4 preceding siblings ...)
  2008-08-11 23:53 ` [PATCH 5/5] Container Freezer: Prevent frozen tasks or cgroups from changing Matt Helsley
@ 2008-08-12 22:44 ` Andrew Morton
  2008-08-13  3:47   ` [linux-pm] " Vivek Kashyap
  5 siblings, 1 reply; 27+ messages in thread
From: Andrew Morton @ 2008-08-12 22:44 UTC (permalink / raw)
  To: Matt Helsley; +Cc: rjw, menage, lizf, linux-kernel, containers, linux-pm

On Mon, 11 Aug 2008 16:53:23 -0700
Matt Helsley <matthltc@us.ibm.com> wrote:

> This patch series introduces a cgroup subsystem that utilizes the swsusp
> freezer to freeze a group of tasks. It's immediately useful for batch job
> management scripts. It should also be useful in the future for implementing
> container checkpoint/restart.

I don't think that this provides anything like sufficient detail to justify
merging a whole bunch of stuff into Linux.

What does "It's immediately useful for batch job management scripts."
mean?  How is it useful?  Examples?  Why would an operator want this
feature, and how would it be used?  _much_ more information is needed!

Once we've actually found out what this work is useful for, we can move
onto identification of and discussion of alternatives.  One would be "why not
use plain old SIGSTOP?"  Another alternative is, of course "that's not useful
enough to justify merging the code".  But we don't know yet, coz you didn't
tell us.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/5] Container Freezer: Implement freezer cgroup subsystem
  2008-08-11 23:53 ` [PATCH 3/5] Container Freezer: Implement freezer cgroup subsystem Matt Helsley
@ 2008-08-12 22:56   ` Andrew Morton
  2008-08-13  2:38     ` Matt Helsley
  2008-11-04  5:43   ` Paul Menage
  1 sibling, 1 reply; 27+ messages in thread
From: Andrew Morton @ 2008-08-12 22:56 UTC (permalink / raw)
  To: Matt Helsley
  Cc: rjw, menage, lizf, linux-kernel, containers, linux-pm, clg, serue

On Mon, 11 Aug 2008 16:53:26 -0700
Matt Helsley <matthltc@us.ibm.com> wrote:

> This patch implements a new freezer subsystem in the control groups framework.
> It provides a way to stop and resume execution of all tasks in a cgroup by
> writing in the cgroup filesystem.
> 
> The freezer subsystem in the container filesystem defines a file named
> freezer.state. Writing "FROZEN" to the state file will freeze all tasks in the
> cgroup. Subsequently writing "RUNNING" will unfreeze the tasks in the cgroup.
> Reading will return the current state.
> 
> * Examples of usage :
> 
>    # mkdir /containers/freezer
>    # mount -t cgroup -ofreezer freezer  /containers
>    # mkdir /containers/0
>    # echo $some_pid > /containers/0/tasks
> 
> to get status of the freezer subsystem :
> 
>    # cat /containers/0/freezer.state
>    RUNNING
> 
> to freeze all tasks in the container :
> 
>    # echo FROZEN > /containers/0/freezer.state
>    # cat /containers/0/freezer.state
>    FREEZING
>    # cat /containers/0/freezer.state
>    FROZEN
> 
> to unfreeze all tasks in the container :
> 
>    # echo RUNNING > /containers/0/freezer.state
>    # cat /containers/0/freezer.state
>    RUNNING
> 
> This is the basic mechanism which should do the right thing for user space task
> in a simple scenario.
> 
> It's important to note that freezing can be incomplete. In that case we return
> EBUSY. This means that some tasks in the cgroup are busy doing something that
> prevents us from completely freezing the cgroup at this time. After EBUSY,
> the cgroup will remain partially frozen -- reflected by freezer.state reporting
> "FREEZING" when read. The state will remain "FREEZING" until one of these
> things happens:
> 
> 	1) Userspace cancels the freezing operation by writing "RUNNING" to
> 		the freezer.state file
> 	2) Userspace retries the freezing operation by writing "FROZEN" to
> 		the freezer.state file (writing "FREEZING" is not legal
> 		and returns EIO)
> 	3) The tasks that blocked the cgroup from entering the "FROZEN"
> 		state disappear from the cgroup's set of tasks.
> 
> ...

Is a Documentation/ update planned?  Documentation/cgroups.txt might be
the place, or not.

> +
> +#ifdef CONFIG_CGROUP_FREEZER
> +SUBSYS(freezer)
> +#endif
> +
> +/* */
> Index: linux-2.6.27-rc1-mm1/include/linux/freezer.h
> ===================================================================
> --- linux-2.6.27-rc1-mm1.orig/include/linux/freezer.h
> +++ linux-2.6.27-rc1-mm1/include/linux/freezer.h
> @@ -47,22 +47,30 @@ static inline bool should_send_signal(st
>  /*
>   * Wake up a frozen process
>   *
> - * task_lock() is taken to prevent the race with refrigerator() which may
> + * task_lock() is needed to prevent the race with refrigerator() which may
>   * occur if the freezing of tasks fails.  Namely, without the lock, if the
>   * freezing of tasks failed, thaw_tasks() might have run before a task in
>   * refrigerator() could call frozen_process(), in which case the task would be
>   * frozen and no one would thaw it.
>   */
> -static inline int thaw_process(struct task_struct *p)
> +static inline int __thaw_process(struct task_struct *p)
>  {
> -	task_lock(p);
>  	if (frozen(p)) {
>  		p->flags &= ~PF_FROZEN;
> +		return 1;
> +	}
> +	clear_freeze_flag(p);
> +	return 0;
> +}
> +
> +static inline int thaw_process(struct task_struct *p)
> +{
> +	task_lock(p);
> +	if (__thaw_process(p) == 1) {
>  		task_unlock(p);
>  		wake_up_process(p);
>  		return 1;
>  	}
> -	clear_freeze_flag(p);
>  	task_unlock(p);
>  	return 0;
>  }

I wonder why these are inlined.

> @@ -83,6 +91,12 @@ static inline int try_to_freeze(void)
>  extern bool freeze_task(struct task_struct *p, bool sig_only);
>  extern void cancel_freezing(struct task_struct *p);
>  
> +#ifdef CONFIG_CGROUP_FREEZER
> +extern int cgroup_frozen(struct task_struct *task);
> +#else /* !CONFIG_CGROUP_FREEZER */
> +static inline int cgroup_frozen(struct task_struct *task) { return 0; }
> +#endif /* !CONFIG_CGROUP_FREEZER */
> +
>  /*
>   * The PF_FREEZER_SKIP flag should be set by a vfork parent right before it
>   * calls wait_for_completion(&vfork) and reset right after it returns from this
> Index: linux-2.6.27-rc1-mm1/init/Kconfig
> ===================================================================
> --- linux-2.6.27-rc1-mm1.orig/init/Kconfig
> +++ linux-2.6.27-rc1-mm1/init/Kconfig
> @@ -299,6 +299,13 @@ config CGROUP_NS
>            for instance virtual servers and checkpoint/restart
>            jobs.
>  
> +config CGROUP_FREEZER
> +        bool "control group freezer subsystem"
> +        depends on CGROUPS

Should it depend on FREEZER also?

oh,

> --- linux-2.6.27-rc1-mm1.orig/kernel/power/Kconfig
> +++ linux-2.6.27-rc1-mm1/kernel/power/Kconfig
> @@ -86,7 +86,7 @@ config PM_SLEEP
>  	default y
>  
>  config FREEZER
> -	def_bool PM_SLEEP
> +	def_bool PM_SLEEP || CGROUP_FREEZER
>  

we did it that way.  Spose that makes sense.

> +        help
> +          Provides a way to freeze and unfreeze all tasks in a
> +	  cgroup.
> +
>  config CGROUP_DEVICE
>  	bool "Device controller for cgroups"
>  	depends on CGROUPS && EXPERIMENTAL
> Index: linux-2.6.27-rc1-mm1/kernel/Makefile
> ===================================================================
> --- linux-2.6.27-rc1-mm1.orig/kernel/Makefile
> +++ linux-2.6.27-rc1-mm1/kernel/Makefile
> @@ -54,6 +54,7 @@ obj-$(CONFIG_KEXEC) += kexec.o
>  obj-$(CONFIG_COMPAT) += compat.o
>  obj-$(CONFIG_CGROUPS) += cgroup.o
>  obj-$(CONFIG_CGROUP_DEBUG) += cgroup_debug.o
> +obj-$(CONFIG_CGROUP_FREEZER) += cgroup_freezer.o
>  obj-$(CONFIG_CPUSETS) += cpuset.o
>  obj-$(CONFIG_CGROUP_NS) += ns_cgroup.o
>  obj-$(CONFIG_UTS_NS) += utsname.o
> Index: linux-2.6.27-rc1-mm1/kernel/cgroup_freezer.c
> ===================================================================
> --- /dev/null
> +++ linux-2.6.27-rc1-mm1/kernel/cgroup_freezer.c
> @@ -0,0 +1,366 @@
> +/*
> + * cgroup_freezer.c -  control group freezer subsystem
> + *
> + * Copyright IBM Corporation, 2007
> + *
> + * Author : Cedric Le Goater <clg@fr.ibm.com>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of version 2.1 of the GNU Lesser General Public License
> + * as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it would be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/cgroup.h>
> +#include <linux/fs.h>
> +#include <linux/uaccess.h>
> +#include <linux/freezer.h>
> +#include <linux/seq_file.h>
> +
> +enum freezer_state {
> +	STATE_RUNNING = 0,

That's a pretty vanilla-sounding identifier.  Let's hope this file
never ends up including drivers/net/sfc/net_driver.h by some means. 
That's rather unlikely, but someone could easily choose to implement a
new STATE_RUNNING somewhere else.

> +	STATE_FREEZING,
> +	STATE_FROZEN,
> +};
> +
> +struct freezer {
> +	struct cgroup_subsys_state css;
> +	enum freezer_state state;
> +	spinlock_t lock; /* protects _writes_ to state */
> +};
> +
> +static inline struct freezer *cgroup_freezer(
> +		struct cgroup *cgroup)
> +{
> +	return container_of(
> +		cgroup_subsys_state(cgroup, freezer_subsys_id),
> +		struct freezer, css);
> +}
> +
> +static inline struct freezer *task_freezer(struct task_struct *task)
> +{
> +	return container_of(task_subsys_state(task, freezer_subsys_id),
> +			    struct freezer, css);
> +}
> +
> +int cgroup_frozen(struct task_struct *task)
> +{
> +	struct freezer *freezer;
> +	enum freezer_state state;
> +
> +	task_lock(task);
> +	freezer = task_freezer(task);
> +	state = freezer->state;
> +	task_unlock(task);
> +
> +	return state == STATE_FROZEN;
> +}
> +
> +/*
> + * Buffer size for freezer state is limited by cgroups write_string()
> + * interface. See cgroups code for the current size.
> + */

Is this comment in the correct place?

> +static const char *freezer_state_strs[] = {
> +	"RUNNING",
> +	"FREEZING",
> +	"FROZEN",
> +};
> +
>
> ...
>
> +
> +/*
> + * caller must hold freezer->lock
> + */
> +static void check_if_frozen(struct cgroup *cgroup,
> +			     struct freezer *freezer)

check_if_frozen() is an unfortunate name, I suspect.  Normally one
would expect a check_foo() to return a bool and have no side-effects.

Perhaps some comments explaining what it does would help.

> +{
> +	struct cgroup_iter it;
> +	struct task_struct *task;
> +	unsigned int nfrozen = 0, ntotal = 0;
> +
> +	cgroup_iter_start(cgroup, &it);
> +	while ((task = cgroup_iter_next(cgroup, &it))) {
> +		ntotal++;
> +		/*
> +		 * Task is frozen or will freeze immediately when next it gets
> +		 * woken
> +		 */
> +		if (frozen(task) ||
> +		    (task_is_stopped_or_traced(task) && freezing(task)))
> +			nfrozen++;
> +	}
> +
> +	/*
> +	 * Transition to FROZEN when no new tasks can be added ensures
> +	 * that we never exist in the FROZEN state while there are unfrozen
> +	 * tasks.
> +	 */
> +	if (nfrozen == ntotal)
> +		freezer->state = STATE_FROZEN;
> +	cgroup_iter_end(cgroup, &it);
> +}
> +
>
> ...
>
> +static int freezer_write(struct cgroup *cgroup,
> +			 struct cftype *cft,
> +			 const char *buffer)
> +{
> +	int retval;
> +	enum freezer_state goal_state;
> +
> +	if (strcmp(buffer, freezer_state_strs[STATE_RUNNING]) == 0)

Did some higher-level code take care of removing the trailing \n?

> +		goal_state = STATE_RUNNING;
> +	else if (strcmp(buffer, freezer_state_strs[STATE_FROZEN]) == 0)
> +		goal_state = STATE_FROZEN;
> +	else
> +		return -EIO;
> +
> +	if (!cgroup_lock_live_group(cgroup))
> +		return -ENODEV;
> +	retval = freezer_change_state(cgroup, goal_state);
> +	cgroup_unlock();
> +	return retval;
> +}
> +
>
> ...
>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [ProbableSpam]Re: [PATCH 2/5] Container Freezer: Make refrigerator always available
  2008-08-12 20:49   ` Rafael J. Wysocki
@ 2008-08-13  1:01     ` Matt Helsley
  2008-08-13 14:31       ` Rafael J. Wysocki
  0 siblings, 1 reply; 27+ messages in thread
From: Matt Helsley @ 2008-08-13  1:01 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andrew Morton, Paul Menage, Li Zefan, Linux-Kernel,
	Linux Containers, linux-pm, Cedric Le Goater, Serge E. Hallyn


On Tue, 2008-08-12 at 22:49 +0200, Rafael J. Wysocki wrote:
> On Tuesday, 12 of August 2008, Matt Helsley wrote:
> > Now that the TIF_FREEZE flag is available in all architectures,
> > extract the refrigerator() and freeze_task() from kernel/power/process.c
> > and make it available to all.
> > 
> > The refrigerator() can now be used in a control group subsystem
> > implementing a control group freezer.
> > 
> > Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
> > Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
> > Acked-by: Serge E. Hallyn <serue@us.ibm.com>
> > Tested-by: Matt Helsley <matthltc@us.ibm.com>
> 
> Your Signed-off-by implies your Tested-by (at least it should ;-)).

I wasn't sure that was always true so I added it just in case. I'll take
it out of any future postings.

> > ---
> [--snip--]
> > Index: linux-2.6.27-rc1-mm1/kernel/power/Kconfig
> > ===================================================================
> > --- linux-2.6.27-rc1-mm1.orig/kernel/power/Kconfig
> > +++ linux-2.6.27-rc1-mm1/kernel/power/Kconfig
> > @@ -85,6 +85,9 @@ config PM_SLEEP
> >  	depends on SUSPEND || HIBERNATION || XEN_SAVE_RESTORE
> >  	default y
> >  
> > +config FREEZER
> > +	def_bool PM_SLEEP
> > +
> 
> I'd still prefer this to go into a Kconfig in the parent directory (ie. where
> freezer.c and the Makefile building it are located).  Otherwise it's guaranteed
> to confuse someone.

	I'm thinking of making a patch moving the cgroups config variables into
a kernel/Kconfig.cgroups file. Would moving config FREEZER to such a
file be satisfactory? Paul, what do you think?

Cheers,
	-Matt Helsley


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/5] Container Freezer: Implement freezer cgroup subsystem
  2008-08-12 22:56   ` Andrew Morton
@ 2008-08-13  2:38     ` Matt Helsley
  2008-08-13 14:51       ` Rafael J. Wysocki
  0 siblings, 1 reply; 27+ messages in thread
From: Matt Helsley @ 2008-08-13  2:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: rjw, menage, lizf, linux-kernel, containers, linux-pm, clg, serue


On Tue, 2008-08-12 at 15:56 -0700, Andrew Morton wrote:
> On Mon, 11 Aug 2008 16:53:26 -0700
> Matt Helsley <matthltc@us.ibm.com> wrote:
> 
> > This patch implements a new freezer subsystem in the control groups framework.
> > It provides a way to stop and resume execution of all tasks in a cgroup by
> > writing in the cgroup filesystem.
> > 
> > The freezer subsystem in the container filesystem defines a file named
> > freezer.state. Writing "FROZEN" to the state file will freeze all tasks in the
> > cgroup. Subsequently writing "RUNNING" will unfreeze the tasks in the cgroup.
> > Reading will return the current state.
> > 
> > * Examples of usage :
> > 
> >    # mkdir /containers/freezer
> >    # mount -t cgroup -ofreezer freezer  /containers
> >    # mkdir /containers/0
> >    # echo $some_pid > /containers/0/tasks
> > 
> > to get status of the freezer subsystem :
> > 
> >    # cat /containers/0/freezer.state
> >    RUNNING
> > 
> > to freeze all tasks in the container :
> > 
> >    # echo FROZEN > /containers/0/freezer.state
> >    # cat /containers/0/freezer.state
> >    FREEZING
> >    # cat /containers/0/freezer.state
> >    FROZEN
> > 
> > to unfreeze all tasks in the container :
> > 
> >    # echo RUNNING > /containers/0/freezer.state
> >    # cat /containers/0/freezer.state
> >    RUNNING
> > 
> > This is the basic mechanism which should do the right thing for user space task
> > in a simple scenario.
> > 
> > It's important to note that freezing can be incomplete. In that case we return
> > EBUSY. This means that some tasks in the cgroup are busy doing something that
> > prevents us from completely freezing the cgroup at this time. After EBUSY,
> > the cgroup will remain partially frozen -- reflected by freezer.state reporting
> > "FREEZING" when read. The state will remain "FREEZING" until one of these
> > things happens:
> > 
> > 	1) Userspace cancels the freezing operation by writing "RUNNING" to
> > 		the freezer.state file
> > 	2) Userspace retries the freezing operation by writing "FROZEN" to
> > 		the freezer.state file (writing "FREEZING" is not legal
> > 		and returns EIO)
> > 	3) The tasks that blocked the cgroup from entering the "FROZEN"
> > 		state disappear from the cgroup's set of tasks.
> > 
> > ...
> 
> Is a Documentation/ update planned?  Documentation/cgroups.txt might be
> the place, or not.

I'll post a patch for that.

> > +
> > +#ifdef CONFIG_CGROUP_FREEZER
> > +SUBSYS(freezer)
> > +#endif
> > +
> > +/* */
> > Index: linux-2.6.27-rc1-mm1/include/linux/freezer.h
> > ===================================================================
> > --- linux-2.6.27-rc1-mm1.orig/include/linux/freezer.h
> > +++ linux-2.6.27-rc1-mm1/include/linux/freezer.h
> > @@ -47,22 +47,30 @@ static inline bool should_send_signal(st
> >  /*
> >   * Wake up a frozen process
> >   *
> > - * task_lock() is taken to prevent the race with refrigerator() which may
> > + * task_lock() is needed to prevent the race with refrigerator() which may
> >   * occur if the freezing of tasks fails.  Namely, without the lock, if the
> >   * freezing of tasks failed, thaw_tasks() might have run before a task in
> >   * refrigerator() could call frozen_process(), in which case the task would be
> >   * frozen and no one would thaw it.
> >   */
> > -static inline int thaw_process(struct task_struct *p)
> > +static inline int __thaw_process(struct task_struct *p)
> >  {
> > -	task_lock(p);
> >  	if (frozen(p)) {
> >  		p->flags &= ~PF_FROZEN;
> > +		return 1;
> > +	}
> > +	clear_freeze_flag(p);
> > +	return 0;
> > +}
> > +
> > +static inline int thaw_process(struct task_struct *p)
> > +{
> > +	task_lock(p);
> > +	if (__thaw_process(p) == 1) {
> >  		task_unlock(p);
> >  		wake_up_process(p);
> >  		return 1;
> >  	}
> > -	clear_freeze_flag(p);
> >  	task_unlock(p);
> >  	return 0;
> >  }
> 
> I wonder why these are inlined.

I wanted the changes to be obvious. I think uninlining this would be a
separate improvement. I'll post a patch uninlining these.

> > @@ -83,6 +91,12 @@ static inline int try_to_freeze(void)
> >  extern bool freeze_task(struct task_struct *p, bool sig_only);
> >  extern void cancel_freezing(struct task_struct *p);
> >  
> > +#ifdef CONFIG_CGROUP_FREEZER
> > +extern int cgroup_frozen(struct task_struct *task);
> > +#else /* !CONFIG_CGROUP_FREEZER */
> > +static inline int cgroup_frozen(struct task_struct *task) { return 0; }
> > +#endif /* !CONFIG_CGROUP_FREEZER */
> > +
> >  /*
> >   * The PF_FREEZER_SKIP flag should be set by a vfork parent right before it
> >   * calls wait_for_completion(&vfork) and reset right after it returns from this
> > Index: linux-2.6.27-rc1-mm1/init/Kconfig
> > ===================================================================
> > --- linux-2.6.27-rc1-mm1.orig/init/Kconfig
> > +++ linux-2.6.27-rc1-mm1/init/Kconfig
> > @@ -299,6 +299,13 @@ config CGROUP_NS
> >            for instance virtual servers and checkpoint/restart
> >            jobs.
> >  
> > +config CGROUP_FREEZER
> > +        bool "control group freezer subsystem"
> > +        depends on CGROUPS
> 
> Should it depend on FREEZER also?
>
> oh,
> 
> > --- linux-2.6.27-rc1-mm1.orig/kernel/power/Kconfig
> > +++ linux-2.6.27-rc1-mm1/kernel/power/Kconfig
> > @@ -86,7 +86,7 @@ config PM_SLEEP
> >  	default y
> >  
> >  config FREEZER
> > -	def_bool PM_SLEEP
> > +	def_bool PM_SLEEP || CGROUP_FREEZER
> >  
> 
> we did it that way.  Spose that makes sense.

	I did consider a few alternatives for this. Makefile and cpp didn't
seem as nice as this. "select" didn't fit. Using "depends on" does
directly translate the build dependency. However I didn't think it would
be clear to everyone configuring a kernel that they had to enable
"FREEZER" before they could get PM_SLEEP or CGROUP_FREEZER.

	Also, Rafael has asked to see this in a kernel/Kconfig file instead
(see his reply to patch 2).

> > +        help
> > +          Provides a way to freeze and unfreeze all tasks in a
> > +	  cgroup.
> > +
> >  config CGROUP_DEVICE
> >  	bool "Device controller for cgroups"
> >  	depends on CGROUPS && EXPERIMENTAL
> > Index: linux-2.6.27-rc1-mm1/kernel/Makefile
> > ===================================================================
> > --- linux-2.6.27-rc1-mm1.orig/kernel/Makefile
> > +++ linux-2.6.27-rc1-mm1/kernel/Makefile
> > @@ -54,6 +54,7 @@ obj-$(CONFIG_KEXEC) += kexec.o
> >  obj-$(CONFIG_COMPAT) += compat.o
> >  obj-$(CONFIG_CGROUPS) += cgroup.o
> >  obj-$(CONFIG_CGROUP_DEBUG) += cgroup_debug.o
> > +obj-$(CONFIG_CGROUP_FREEZER) += cgroup_freezer.o
> >  obj-$(CONFIG_CPUSETS) += cpuset.o
> >  obj-$(CONFIG_CGROUP_NS) += ns_cgroup.o
> >  obj-$(CONFIG_UTS_NS) += utsname.o
> > Index: linux-2.6.27-rc1-mm1/kernel/cgroup_freezer.c
> > ===================================================================
> > --- /dev/null
> > +++ linux-2.6.27-rc1-mm1/kernel/cgroup_freezer.c
> > @@ -0,0 +1,366 @@
> > +/*
> > + * cgroup_freezer.c -  control group freezer subsystem
> > + *
> > + * Copyright IBM Corporation, 2007
> > + *
> > + * Author : Cedric Le Goater <clg@fr.ibm.com>
> > + *
> > + * This program is free software; you can redistribute it and/or modify it
> > + * under the terms of version 2.1 of the GNU Lesser General Public License
> > + * as published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it would be useful, but
> > + * WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> > + */
> > +
> > +#include <linux/module.h>
> > +#include <linux/cgroup.h>
> > +#include <linux/fs.h>
> > +#include <linux/uaccess.h>
> > +#include <linux/freezer.h>
> > +#include <linux/seq_file.h>
> > +
> > +enum freezer_state {
> > +	STATE_RUNNING = 0,
> 
> That's a pretty vanilla-sounding identifier.  Let's hope this file
> never ends up including drivers/net/sfc/net_driver.h by some means. 
> That's rather unlikely, but someone could easily choose to implement a
> new STATE_RUNNING somewhere else.

Good point. Do CGROUP_THAWED, CGROUP_FREEZING, CGROUP_FROZEN make
sensible substitutions?

> > +	STATE_FREEZING,
> > +	STATE_FROZEN,
> > +};
> > +
> > +struct freezer {
> > +	struct cgroup_subsys_state css;
> > +	enum freezer_state state;
> > +	spinlock_t lock; /* protects _writes_ to state */
> > +};
> > +
> > +static inline struct freezer *cgroup_freezer(
> > +		struct cgroup *cgroup)
> > +{
> > +	return container_of(
> > +		cgroup_subsys_state(cgroup, freezer_subsys_id),
> > +		struct freezer, css);
> > +}
> > +
> > +static inline struct freezer *task_freezer(struct task_struct *task)
> > +{
> > +	return container_of(task_subsys_state(task, freezer_subsys_id),
> > +			    struct freezer, css);
> > +}
> > +
> > +int cgroup_frozen(struct task_struct *task)
> > +{
> > +	struct freezer *freezer;
> > +	enum freezer_state state;
> > +
> > +	task_lock(task);
> > +	freezer = task_freezer(task);
> > +	state = freezer->state;
> > +	task_unlock(task);
> > +
> > +	return state == STATE_FROZEN;
> > +}
> > +
> > +/*
> > + * Buffer size for freezer state is limited by cgroups write_string()
> > + * interface. See cgroups code for the current size.
> > + */
> 
> Is this comment in the correct place?

I think so. Perhaps I should have more clearly connected it with
freezer_state_strs. How about:

/*
 * cgroups_write_string() limits the size of these strings to
 * CGROUP_LOCAL_BUFFER_SIZE
 */

> > +static const char *freezer_state_strs[] = {
> > +	"RUNNING",
> > +	"FREEZING",
> > +	"FROZEN",
> > +};
> > +
> >
> > ...
> >
> > +
> > +/*
> > + * caller must hold freezer->lock
> > + */
> > +static void check_if_frozen(struct cgroup *cgroup,
> > +			     struct freezer *freezer)
> 
> check_if_frozen() is an unfortunate name, I suspect.  Normally one
> would expect a check_foo() to return a bool and have no side-effects.
> 
> Perhaps some comments explaining what it does would help.

OK. I'll try to think up a better name and if that's not sufficiently
explanatory I'll add a comment explaining what it should do.

> > +{
> > +	struct cgroup_iter it;
> > +	struct task_struct *task;
> > +	unsigned int nfrozen = 0, ntotal = 0;
> > +
> > +	cgroup_iter_start(cgroup, &it);
> > +	while ((task = cgroup_iter_next(cgroup, &it))) {
> > +		ntotal++;
> > +		/*
> > +		 * Task is frozen or will freeze immediately when next it gets
> > +		 * woken
> > +		 */
> > +		if (frozen(task) ||
> > +		    (task_is_stopped_or_traced(task) && freezing(task)))
> > +			nfrozen++;
> > +	}
> > +
> > +	/*
> > +	 * Transition to FROZEN when no new tasks can be added ensures
> > +	 * that we never exist in the FROZEN state while there are unfrozen
> > +	 * tasks.
> > +	 */
> > +	if (nfrozen == ntotal)
> > +		freezer->state = STATE_FROZEN;
> > +	cgroup_iter_end(cgroup, &it);
> > +}
> > +
> >
> > ...
> >
> > +static int freezer_write(struct cgroup *cgroup,
> > +			 struct cftype *cft,
> > +			 const char *buffer)
> > +{
> > +	int retval;
> > +	enum freezer_state goal_state;
> > +
> > +	if (strcmp(buffer, freezer_state_strs[STATE_RUNNING]) == 0)
> 
> Did some higher-level code take care of removing the trailing \n?

Yes. cgroup_write_string() in kernel/cgroup.c does strstrip(buffer)

Thanks for the review!

Cheers,
	-Matt Helsley


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [linux-pm] [PATCH 0/5] Container Freezer v6: Reuse Suspend Freezer
  2008-08-12 22:44 ` [PATCH 0/5] Container Freezer v6: Reuse Suspend Freezer Andrew Morton
@ 2008-08-13  3:47   ` Vivek Kashyap
  2008-08-13  4:08     ` Andrew Morton
  0 siblings, 1 reply; 27+ messages in thread
From: Vivek Kashyap @ 2008-08-13  3:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matt Helsley, containers, lizf, linux-kernel, menage, linux-pm

On Tue, 12 Aug 2008, Andrew Morton wrote:

> On Mon, 11 Aug 2008 16:53:23 -0700
> Matt Helsley <matthltc@us.ibm.com> wrote:
>
>> This patch series introduces a cgroup subsystem that utilizes the swsusp
>> freezer to freeze a group of tasks. It's immediately useful for batch job
>> management scripts. It should also be useful in the future for implementing
>> container checkpoint/restart.
>
> I don't think that this provides anything like sufficient detail to justify
> merging a whole bunch of stuff into Linux.
>
> What does "It's immediately useful for batch job management scripts."
> mean?  How is it useful?  Examples?  Why would an operator want this
> feature, and how would it be used?  _much_ more information is needed!

A batch-manager/job scheduler (such as loadleveler) must at times stop all 
tasks associated with a workload being run in a container. The workload may 
constitute of multiple tasks - some of which are in different sessions. 
A signal (STOP/CONT) to the Containers 'init' wont be transmitted to all 
the tasks in the Container. The 'freezer' mechanism allows this control
to be implemented in a clean way.

Vivek
>
> Once we've actually found out what this work is useful for, we can move
> onto identification of and discussion of alternatives.  One would be "why not
> use plain old SIGSTOP?"  Another alternative is, of course "that's not useful
> enough to justify merging the code".  But we don't know yet, coz you didn't
> tell us.
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/linux-pm
>

__

Vivek Kashyap
Linux Technology Center, IBM

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [linux-pm] [PATCH 0/5] Container Freezer v6: Reuse Suspend Freezer
  2008-08-13  3:47   ` [linux-pm] " Vivek Kashyap
@ 2008-08-13  4:08     ` Andrew Morton
  2008-08-15 21:54       ` Matt Helsley
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Morton @ 2008-08-13  4:08 UTC (permalink / raw)
  To: Vivek Kashyap
  Cc: Matt Helsley, containers, lizf, linux-kernel, menage, linux-pm

On Tue, 12 Aug 2008 20:47:10 -0700 (Pacific Daylight Time) Vivek Kashyap <kashyapv@us.ibm.com> wrote:

> On Tue, 12 Aug 2008, Andrew Morton wrote:
> 
> > On Mon, 11 Aug 2008 16:53:23 -0700
> > Matt Helsley <matthltc@us.ibm.com> wrote:
> >
> >> This patch series introduces a cgroup subsystem that utilizes the swsusp
> >> freezer to freeze a group of tasks. It's immediately useful for batch job
> >> management scripts. It should also be useful in the future for implementing
> >> container checkpoint/restart.
> >
> > I don't think that this provides anything like sufficient detail to justify
> > merging a whole bunch of stuff into Linux.
> >
> > What does "It's immediately useful for batch job management scripts."
> > mean?  How is it useful?  Examples?  Why would an operator want this
> > feature, and how would it be used?  _much_ more information is needed!
> 
> A batch-manager/job scheduler (such as loadleveler)

what's that?

> must at times stop all 
> tasks associated with a workload being run in a container.

why?

I'm being deliberately obtuse here, but I'm afraid you guys haven't
come anywhere into the vague nearby neighbourhood of adequately describing
this feature.

Please provide proper and full reasons for merging this code into
Linux.  If they exist.  This shouldn't be too hard.

Please put yourself in my position:

me: [patch] <this stuff>
Linus: why are you sending me this?
me: I have not the faintest idea

trust me - many others will be in my position too.

> The workload may 
> constitute of multiple tasks - some of which are in different sessions. 
> A signal (STOP/CONT) to the Containers 'init' wont be transmitted to all 
> the tasks in the Container. The 'freezer' mechanism allows this control
> to be implemented in a clean way.

So why not implement a send-signal-to-all-tasks-in-a-container
controller?


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [ProbableSpam]Re: [PATCH 2/5] Container Freezer: Make refrigerator always available
  2008-08-13  1:01     ` [ProbableSpam]Re: " Matt Helsley
@ 2008-08-13 14:31       ` Rafael J. Wysocki
  0 siblings, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2008-08-13 14:31 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Andrew Morton, Paul Menage, Li Zefan, Linux-Kernel,
	Linux Containers, linux-pm, Cedric Le Goater, Serge E. Hallyn

On Wednesday, 13 of August 2008, Matt Helsley wrote:
> 
> On Tue, 2008-08-12 at 22:49 +0200, Rafael J. Wysocki wrote:
> > On Tuesday, 12 of August 2008, Matt Helsley wrote:
> > > Now that the TIF_FREEZE flag is available in all architectures,
> > > extract the refrigerator() and freeze_task() from kernel/power/process.c
> > > and make it available to all.
> > > 
> > > The refrigerator() can now be used in a control group subsystem
> > > implementing a control group freezer.
> > > 
> > > Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
> > > Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
> > > Acked-by: Serge E. Hallyn <serue@us.ibm.com>
> > > Tested-by: Matt Helsley <matthltc@us.ibm.com>
> > 
> > Your Signed-off-by implies your Tested-by (at least it should ;-)).
> 
> I wasn't sure that was always true so I added it just in case. I'll take
> it out of any future postings.
> 
> > > ---
> > [--snip--]
> > > Index: linux-2.6.27-rc1-mm1/kernel/power/Kconfig
> > > ===================================================================
> > > --- linux-2.6.27-rc1-mm1.orig/kernel/power/Kconfig
> > > +++ linux-2.6.27-rc1-mm1/kernel/power/Kconfig
> > > @@ -85,6 +85,9 @@ config PM_SLEEP
> > >  	depends on SUSPEND || HIBERNATION || XEN_SAVE_RESTORE
> > >  	default y
> > >  
> > > +config FREEZER
> > > +	def_bool PM_SLEEP
> > > +
> > 
> > I'd still prefer this to go into a Kconfig in the parent directory (ie. where
> > freezer.c and the Makefile building it are located).  Otherwise it's guaranteed
> > to confuse someone.
> 
> 	I'm thinking of making a patch moving the cgroups config variables into
> a kernel/Kconfig.cgroups file. Would moving config FREEZER to such a
> file be satisfactory? Paul, what do you think?

Well, in fact FREEZER is not directly dependent on cgroups, as it can also
depend on PM_SLEEP, even if cgroups are not used at all.

I would just add 'Kconfig.freezer' to 'kernel', put 'config FREEZER' in there
and make it depend on whatever needs it.  Of course,
'source "kernel/Kconfig.freezer"' would have to be added to top-level
Kconfigs for all architectures, but please note that only a few architectures
include 'kernel/power/Kconfig', so you'd have to change the top-level
Kconfigs anyway.

Alternatively, you can just add 'config FREEZER' directly to the top-level
Kconfigs.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/5] Container Freezer: Implement freezer cgroup subsystem
  2008-08-13  2:38     ` Matt Helsley
@ 2008-08-13 14:51       ` Rafael J. Wysocki
  0 siblings, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2008-08-13 14:51 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Andrew Morton, menage, lizf, linux-kernel, containers, linux-pm,
	clg, serue

On Wednesday, 13 of August 2008, Matt Helsley wrote:
> 
> On Tue, 2008-08-12 at 15:56 -0700, Andrew Morton wrote:
> > On Mon, 11 Aug 2008 16:53:26 -0700
> > Matt Helsley <matthltc@us.ibm.com> wrote:
> > 
> > > This patch implements a new freezer subsystem in the control groups framework.
> > > It provides a way to stop and resume execution of all tasks in a cgroup by
> > > writing in the cgroup filesystem.
> > > 
> > > The freezer subsystem in the container filesystem defines a file named
> > > freezer.state. Writing "FROZEN" to the state file will freeze all tasks in the
> > > cgroup. Subsequently writing "RUNNING" will unfreeze the tasks in the cgroup.
> > > Reading will return the current state.
> > > 
> > > * Examples of usage :
> > > 
> > >    # mkdir /containers/freezer
> > >    # mount -t cgroup -ofreezer freezer  /containers
> > >    # mkdir /containers/0
> > >    # echo $some_pid > /containers/0/tasks
> > > 
> > > to get status of the freezer subsystem :
> > > 
> > >    # cat /containers/0/freezer.state
> > >    RUNNING
> > > 
> > > to freeze all tasks in the container :
> > > 
> > >    # echo FROZEN > /containers/0/freezer.state
> > >    # cat /containers/0/freezer.state
> > >    FREEZING
> > >    # cat /containers/0/freezer.state
> > >    FROZEN
> > > 
> > > to unfreeze all tasks in the container :
> > > 
> > >    # echo RUNNING > /containers/0/freezer.state
> > >    # cat /containers/0/freezer.state
> > >    RUNNING
> > > 
> > > This is the basic mechanism which should do the right thing for user space task
> > > in a simple scenario.
> > > 
> > > It's important to note that freezing can be incomplete. In that case we return
> > > EBUSY. This means that some tasks in the cgroup are busy doing something that
> > > prevents us from completely freezing the cgroup at this time. After EBUSY,
> > > the cgroup will remain partially frozen -- reflected by freezer.state reporting
> > > "FREEZING" when read. The state will remain "FREEZING" until one of these
> > > things happens:
> > > 
> > > 	1) Userspace cancels the freezing operation by writing "RUNNING" to
> > > 		the freezer.state file
> > > 	2) Userspace retries the freezing operation by writing "FROZEN" to
> > > 		the freezer.state file (writing "FREEZING" is not legal
> > > 		and returns EIO)
> > > 	3) The tasks that blocked the cgroup from entering the "FROZEN"
> > > 		state disappear from the cgroup's set of tasks.
> > > 
> > > ...
> > 
> > Is a Documentation/ update planned?  Documentation/cgroups.txt might be
> > the place, or not.
> 
> I'll post a patch for that.
> 
> > > +
> > > +#ifdef CONFIG_CGROUP_FREEZER
> > > +SUBSYS(freezer)
> > > +#endif
> > > +
> > > +/* */
> > > Index: linux-2.6.27-rc1-mm1/include/linux/freezer.h
> > > ===================================================================
> > > --- linux-2.6.27-rc1-mm1.orig/include/linux/freezer.h
> > > +++ linux-2.6.27-rc1-mm1/include/linux/freezer.h
> > > @@ -47,22 +47,30 @@ static inline bool should_send_signal(st
> > >  /*
> > >   * Wake up a frozen process
> > >   *
> > > - * task_lock() is taken to prevent the race with refrigerator() which may
> > > + * task_lock() is needed to prevent the race with refrigerator() which may
> > >   * occur if the freezing of tasks fails.  Namely, without the lock, if the
> > >   * freezing of tasks failed, thaw_tasks() might have run before a task in
> > >   * refrigerator() could call frozen_process(), in which case the task would be
> > >   * frozen and no one would thaw it.
> > >   */
> > > -static inline int thaw_process(struct task_struct *p)
> > > +static inline int __thaw_process(struct task_struct *p)
> > >  {
> > > -	task_lock(p);
> > >  	if (frozen(p)) {
> > >  		p->flags &= ~PF_FROZEN;
> > > +		return 1;
> > > +	}
> > > +	clear_freeze_flag(p);
> > > +	return 0;
> > > +}
> > > +
> > > +static inline int thaw_process(struct task_struct *p)
> > > +{
> > > +	task_lock(p);
> > > +	if (__thaw_process(p) == 1) {
> > >  		task_unlock(p);
> > >  		wake_up_process(p);
> > >  		return 1;
> > >  	}
> > > -	clear_freeze_flag(p);
> > >  	task_unlock(p);
> > >  	return 0;
> > >  }
> > 
> > I wonder why these are inlined.
> 
> I wanted the changes to be obvious. I think uninlining this would be a
> separate improvement. I'll post a patch uninlining these.
> 
> > > @@ -83,6 +91,12 @@ static inline int try_to_freeze(void)
> > >  extern bool freeze_task(struct task_struct *p, bool sig_only);
> > >  extern void cancel_freezing(struct task_struct *p);
> > >  
> > > +#ifdef CONFIG_CGROUP_FREEZER
> > > +extern int cgroup_frozen(struct task_struct *task);
> > > +#else /* !CONFIG_CGROUP_FREEZER */
> > > +static inline int cgroup_frozen(struct task_struct *task) { return 0; }
> > > +#endif /* !CONFIG_CGROUP_FREEZER */
> > > +
> > >  /*
> > >   * The PF_FREEZER_SKIP flag should be set by a vfork parent right before it
> > >   * calls wait_for_completion(&vfork) and reset right after it returns from this
> > > Index: linux-2.6.27-rc1-mm1/init/Kconfig
> > > ===================================================================
> > > --- linux-2.6.27-rc1-mm1.orig/init/Kconfig
> > > +++ linux-2.6.27-rc1-mm1/init/Kconfig
> > > @@ -299,6 +299,13 @@ config CGROUP_NS
> > >            for instance virtual servers and checkpoint/restart
> > >            jobs.
> > >  
> > > +config CGROUP_FREEZER
> > > +        bool "control group freezer subsystem"
> > > +        depends on CGROUPS
> > 
> > Should it depend on FREEZER also?
> >
> > oh,
> > 
> > > --- linux-2.6.27-rc1-mm1.orig/kernel/power/Kconfig
> > > +++ linux-2.6.27-rc1-mm1/kernel/power/Kconfig
> > > @@ -86,7 +86,7 @@ config PM_SLEEP
> > >  	default y
> > >  
> > >  config FREEZER
> > > -	def_bool PM_SLEEP
> > > +	def_bool PM_SLEEP || CGROUP_FREEZER
> > >  
> > 
> > we did it that way.  Spose that makes sense.
> 
> 	I did consider a few alternatives for this. Makefile and cpp didn't
> seem as nice as this. "select" didn't fit. Using "depends on" does
> directly translate the build dependency. However I didn't think it would
> be clear to everyone configuring a kernel that they had to enable
> "FREEZER" before they could get PM_SLEEP or CGROUP_FREEZER.
> 
> 	Also, Rafael has asked to see this in a kernel/Kconfig file instead
> (see his reply to patch 2).

Yes, I have and there's a good reason for that IMO - you want FREEZER to be
used by architectures that don't include 'kernel/power/Kconfig', apparently.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [linux-pm] [PATCH 0/5] Container Freezer v6: Reuse Suspend Freezer
  2008-08-13  4:08     ` Andrew Morton
@ 2008-08-15 21:54       ` Matt Helsley
  0 siblings, 0 replies; 27+ messages in thread
From: Matt Helsley @ 2008-08-15 21:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vivek Kashyap, containers, lizf, linux-kernel, menage, linux-pm


On Tue, 2008-08-12 at 21:08 -0700, Andrew Morton wrote:
> On Tue, 12 Aug 2008 20:47:10 -0700 (Pacific Daylight Time) Vivek Kashyap <kashyapv@us.ibm.com> wrote:
> 
> > On Tue, 12 Aug 2008, Andrew Morton wrote:
> > 
> > > On Mon, 11 Aug 2008 16:53:23 -0700
> > > Matt Helsley <matthltc@us.ibm.com> wrote:
> > >
> > >> This patch series introduces a cgroup subsystem that utilizes the swsusp
> > >> freezer to freeze a group of tasks. It's immediately useful for batch job
> > >> management scripts. It should also be useful in the future for implementing
> > >> container checkpoint/restart.
> > >
> > > I don't think that this provides anything like sufficient detail to justify
> > > merging a whole bunch of stuff into Linux.
> > >
> > > What does "It's immediately useful for batch job management scripts."
> > > mean?  How is it useful?  Examples?  Why would an operator want this
> > > feature, and how would it be used?  _much_ more information is needed!
> >
> > A batch-manager/job scheduler (such as loadleveler)
> 
> what's that?
>
> > must at times stop all 
> > tasks associated with a workload being run in a container.
> 
> why?
> 
> I'm being deliberately obtuse here, but I'm afraid you guys haven't
> come anywhere into the vague nearby neighbourhood of adequately describing
> this feature.
> 
> Please provide proper and full reasons for merging this code into
> Linux.  If they exist.  This shouldn't be too hard.
> 
> Please put yourself in my position:
> 
> me: [patch] <this stuff>
> Linus: why are you sending me this?
> me: I have not the faintest idea
> 
> trust me - many others will be in my position too.

Hi Andrew,

	Sorry for being so quiet. I've been carefully considering your email
and composing what I hope is a much better description of why the code
should eventually be merged:

	The cgroup freezer is useful to batch job management system which start
and stop sets of tasks in order to schedule the resources of a machine
according to the desires of a system administrator. This sort of program
is often used on HPC clusters to schedule access to the cluster as a
whole. The cgroup freezer uses cgroups to describe the set of tasks to
be started/stopped by the batch job management system. It also provides
a means to start and stop the tasks composing the job.

	The cgroup freezer will also be useful for checkpointing running groups
of tasks. The freezer allows the checkpoint code to obtain a consistent
image of the tasks by attempting to force the tasks in a cgroup into a
quiescent state. Once the tasks are quiescent another task can
walk /proc or invoke a kernel interface to gather information about the
quiesced tasks. Checkpointed tasks can be restarted later should a
recoverable error occur. This also allows the checkpointed tasks to be
migrated between nodes in a cluster by copying the gathered information
to another node and restarting the tasks there.

	Sequences of SIGSTOP and SIGCONT are not always sufficient for stopping
and resuming tasks in userspace. Both of these signals are observable
from within the tasks we wish to freeze. While SIGSTOP cannot be caught,
blocked, or ignored it can be seen by waiting or ptracing parent tasks.
SIGCONT is especially unsuitable since it can be caught by the task. Any
programs designed to watch for SIGSTOP and SIGCONT could be broken by
attempting to use SIGSTOP and SIGCONT to stop and resume tasks. We can
demonstrate this problem using nested bash shells:

	$ echo $$
	16644
	$ bash
	$ echo $$
	16690

	From a second, unrelated bash shell:
	$ kill -SIGSTOP 16690
	$ kill -SIGCONT 16990

	<at this point 16990 exits and causes 16644 to exit too>

	This happens because bash can observe both signals and choose how it
responds to them.

	Another example of a program which catches and responds to these
signals is gdb. In fact any program designed to use ptrace is likely to
have a problem with this method of stopping and resuming tasks.

	 In contrast, the cgroup freezer uses the kernel freezer code to
prevent the freeze/unfreeze cycle from becoming visible to the tasks
being frozen. This allows the bash example above and gdb to run as
expected.

> > The workload may 
> > constitute of multiple tasks - some of which are in different sessions. 
> > A signal (STOP/CONT) to the Containers 'init' wont be transmitted to all 
> > the tasks in the Container. The 'freezer' mechanism allows this control
> > to be implemented in a clean way.
> 
> So why not implement a send-signal-to-all-tasks-in-a-container
> controller?

	I have posted such a controller to the containers list in the past. For
the reasons cited above I don't think its suitable as a replacement for
the freezer controller.

Please let me know if the reasons for merging this code remain unclear.

Cheers,
	-Matt Helsley


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/5] Container Freezer: Implement freezer cgroup subsystem
  2008-08-11 23:53 ` [PATCH 3/5] Container Freezer: Implement freezer cgroup subsystem Matt Helsley
  2008-08-12 22:56   ` Andrew Morton
@ 2008-11-04  5:43   ` Paul Menage
  2008-11-04  6:12     ` Li Zefan
  1 sibling, 1 reply; 27+ messages in thread
From: Paul Menage @ 2008-11-04  5:43 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Andrew Morton, Rafael J. Wysocki, Li Zefan, Linux-Kernel,
	Linux Containers, linux-pm, Cedric Le Goater, Serge E. Hallyn

On Mon, Aug 11, 2008 at 3:53 PM, Matt Helsley <matthltc@us.ibm.com> wrote:
> +}
> +
> +static void freezer_fork(struct cgroup_subsys *ss, struct task_struct *task)
> +{
> +       struct freezer *freezer;
> +
> +       task_lock(task);
> +       freezer = task_freezer(task);
> +       task_unlock(task);
> +
> +       BUG_ON(freezer->state == STATE_FROZEN);
> +       spin_lock_irq(&freezer->lock);
> +       /* Locking avoids race with FREEZING -> RUNNING transitions. */
> +       if (freezer->state == STATE_FREEZING)
> +               freeze_task(task, true);
> +       spin_unlock_irq(&freezer->lock);
> +}

Sorry for such a delayed response to this patch, but I just noticed
(in mainline now)
the change to move the task_lock() to only encompass the
task_freezer() call.

That results in absolutely zero protection from the task_lock() - as
soon as you drop it, then in theory the current task could move cgroup
and the old freezer structure be freed.

Having said that, I think that in this case any locking my be
unnecessary since task isn't on the tasklist yet, so can't be selected
to move cgroups. (Although this does make me wonder whether
cpuset.c:move_member_tasks_to_cpuset() can fail silently if it races
with a fork).

On top of that, for a system that configures in the cgroup freezer
system but doesn't ever use it, every task is in the same freezer
cgroup (the root cgroup) and task_freezer(task)->lock becomes a global
spinlock. I think this would certainly show up on some benchmarks
although I don't know how bad it would be in a practical sense. But it
might be worth considering making using of the cgroup bind() callback
to track whether or not the freezer subsystem is in use, and
short-circuiting this in freezer_fork() without any locking if you're
not active.

Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/5] Container Freezer: Implement freezer cgroup subsystem
  2008-11-04  5:43   ` Paul Menage
@ 2008-11-04  6:12     ` Li Zefan
  2008-11-04  6:40       ` Paul Menage
  2008-11-04  6:48       ` [PATCH] freezer_cg: remove task_lock from freezer_fork() Li Zefan
  0 siblings, 2 replies; 27+ messages in thread
From: Li Zefan @ 2008-11-04  6:12 UTC (permalink / raw)
  To: Paul Menage
  Cc: Matt Helsley, Andrew Morton, Rafael J. Wysocki, Linux-Kernel,
	Linux Containers, linux-pm, Cedric Le Goater, Serge E. Hallyn

Paul Menage wrote:
> On Mon, Aug 11, 2008 at 3:53 PM, Matt Helsley <matthltc@us.ibm.com> wrote:
>> +}
>> +
>> +static void freezer_fork(struct cgroup_subsys *ss, struct task_struct *task)
>> +{
>> +       struct freezer *freezer;
>> +
>> +       task_lock(task);
>> +       freezer = task_freezer(task);
>> +       task_unlock(task);
>> +
>> +       BUG_ON(freezer->state == STATE_FROZEN);
>> +       spin_lock_irq(&freezer->lock);
>> +       /* Locking avoids race with FREEZING -> RUNNING transitions. */
>> +       if (freezer->state == STATE_FREEZING)
>> +               freeze_task(task, true);
>> +       spin_unlock_irq(&freezer->lock);
>> +}
> 
> Sorry for such a delayed response to this patch, but I just noticed
> (in mainline now)
> the change to move the task_lock() to only encompass the
> task_freezer() call.
> 
> That results in absolutely zero protection from the task_lock() - as
> soon as you drop it, then in theory the current task could move cgroup
> and the old freezer structure be freed.
> 
> Having said that, I think that in this case any locking my be
> unnecessary since task isn't on the tasklist yet, so can't be selected
> to move cgroups. (Although this does make me wonder whether
> cpuset.c:move_member_tasks_to_cpuset() can fail silently if it races
> with a fork).
> 
> On top of that, for a system that configures in the cgroup freezer
> system but doesn't ever use it, every task is in the same freezer
> cgroup (the root cgroup) and task_freezer(task)->lock becomes a global
> spinlock. I think this would certainly show up on some benchmarks
> although I don't know how bad it would be in a practical sense. But it
> might be worth considering making using of the cgroup bind() callback
> to track whether or not the freezer subsystem is in use, and
> short-circuiting this in freezer_fork() without any locking if you're
> not active.
> 

I think another reasonable and easier way is to disable writing freezer.state
of top cgroup, so we can skip checks in freezer_fork() for tasks in top cgroup.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/5] Container Freezer: Implement freezer cgroup subsystem
  2008-11-04  6:12     ` Li Zefan
@ 2008-11-04  6:40       ` Paul Menage
  2008-11-04  7:25         ` Li Zefan
  2008-11-04  7:35         ` [RFC][PATCH] freezer_cg: disable writing freezer.state of root cgroup Li Zefan
  2008-11-04  6:48       ` [PATCH] freezer_cg: remove task_lock from freezer_fork() Li Zefan
  1 sibling, 2 replies; 27+ messages in thread
From: Paul Menage @ 2008-11-04  6:40 UTC (permalink / raw)
  To: Li Zefan
  Cc: Matt Helsley, Andrew Morton, Rafael J. Wysocki, Linux-Kernel,
	Linux Containers, linux-pm, Cedric Le Goater, Serge E. Hallyn

On Mon, Nov 3, 2008 at 10:12 PM, Li Zefan <lizf@cn.fujitsu.com> wrote:
>
> I think another reasonable and easier way is to disable writing freezer.state
> of top cgroup, so we can skip checks in freezer_fork() for tasks in top cgroup.
>

Yes, if freezing the root cgroup isn't needed then that is definitely
simpler - I wasn't sure if Matt wanted to be able to freeze the root
group with this subsystem.

Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH] freezer_cg: remove task_lock from freezer_fork()
  2008-11-04  6:12     ` Li Zefan
  2008-11-04  6:40       ` Paul Menage
@ 2008-11-04  6:48       ` Li Zefan
  1 sibling, 0 replies; 27+ messages in thread
From: Li Zefan @ 2008-11-04  6:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matt Helsley, Rafael J. Wysocki, Linux-Kernel, Linux Containers,
	linux-pm, Cedric Le Goater, Serge E. Hallyn, Paul Menage

In theory the task can be moved to another cgroup and the freezer will
be freed right after task_lock is dropped, so the lock results in zero
protection.

But in the case of freezer_fork() no lock is needed, since the task
is not in tasklist yet so it won't be moved to another cgroup, so
task->cgroups won't be changed or invalidated.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
 kernel/cgroup_freezer.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 7fa476f..6605907 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -184,9 +184,13 @@ static void freezer_fork(struct cgroup_subsys *ss, struct task_struct *task)
 {
 	struct freezer *freezer;
 
-	task_lock(task);
+	/*
+	 * No lock is needed, since the task isn't on tasklist yet,
+	 * so it can't be moved to another cgroup, which means the
+	 * freezer won't be removed and will be valid during this
+	 * function call.
+	 */
 	freezer = task_freezer(task);
-	task_unlock(task);
 
 	spin_lock_irq(&freezer->lock);
 	BUG_ON(freezer->state == CGROUP_FROZEN);
-- 
1.5.4.rc3

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/5] Container Freezer: Implement freezer cgroup subsystem
  2008-11-04  6:40       ` Paul Menage
@ 2008-11-04  7:25         ` Li Zefan
  2008-11-04  7:35         ` [RFC][PATCH] freezer_cg: disable writing freezer.state of root cgroup Li Zefan
  1 sibling, 0 replies; 27+ messages in thread
From: Li Zefan @ 2008-11-04  7:25 UTC (permalink / raw)
  To: Paul Menage
  Cc: Matt Helsley, Andrew Morton, Rafael J. Wysocki, Linux-Kernel,
	Linux Containers, linux-pm, Cedric Le Goater, Serge E. Hallyn

Paul Menage wrote:
> On Mon, Nov 3, 2008 at 10:12 PM, Li Zefan <lizf@cn.fujitsu.com> wrote:
>> I think another reasonable and easier way is to disable writing freezer.state
>> of top cgroup, so we can skip checks in freezer_fork() for tasks in top cgroup.
>>
> 
> Yes, if freezing the root cgroup isn't needed then that is definitely
> simpler - I wasn't sure if Matt wanted to be able to freeze the root
> group with this subsystem.
> 

If someone just mounts the freezer subsystem and freeze the root cgroup,
all the tasks will be freezed, and then seems all he can do is reset the
machine (or do a suspend/resume).

That's why I think it's reasonable to disable it.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC][PATCH] freezer_cg: disable writing freezer.state of root cgroup
  2008-11-04  6:40       ` Paul Menage
  2008-11-04  7:25         ` Li Zefan
@ 2008-11-04  7:35         ` Li Zefan
  2008-11-04  7:56           ` Paul Menage
  1 sibling, 1 reply; 27+ messages in thread
From: Li Zefan @ 2008-11-04  7:35 UTC (permalink / raw)
  To: Matt Helsley, Cedric Le Goater
  Cc: Paul Menage, Rafael J. Wysocki, Linux-Kernel, Linux Containers,
	linux-pm, Andrew Morton, Serge E. Hallyn

With this change, the root cgroup is unfreezable, and writing to
its freezer.state returns -EIO.

I think it's reasonable to disallow freezing all the tasks in the
root cgroup. And this avoids fork overhead when freezer subsystem
is compiled but not used.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
 Documentation/cgroups/freezer-subsystem.txt |   18 ++++++++++--------
 kernel/cgroup_freezer.c                     |    7 +++++++
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/Documentation/cgroups/freezer-subsystem.txt b/Documentation/cgroups/freezer-subsystem.txt
index c50ab58..e842bad 100644
--- a/Documentation/cgroups/freezer-subsystem.txt
+++ b/Documentation/cgroups/freezer-subsystem.txt
@@ -1,4 +1,4 @@
-	The cgroup freezer is useful to batch job management system which start
+The cgroup freezer is useful to batch job management system which start
 and stop sets of tasks in order to schedule the resources of a machine
 according to the desires of a system administrator. This sort of program
 is often used on HPC clusters to schedule access to the cluster as a
@@ -6,7 +6,7 @@ whole. The cgroup freezer uses cgroups to describe the set of tasks to
 be started/stopped by the batch job management system. It also provides
 a means to start and stop the tasks composing the job.
 
-	The cgroup freezer will also be useful for checkpointing running groups
+The cgroup freezer will also be useful for checkpointing running groups
 of tasks. The freezer allows the checkpoint code to obtain a consistent
 image of the tasks by attempting to force the tasks in a cgroup into a
 quiescent state. Once the tasks are quiescent another task can
@@ -16,7 +16,7 @@ recoverable error occur. This also allows the checkpointed tasks to be
 migrated between nodes in a cluster by copying the gathered information
 to another node and restarting the tasks there.
 
-	Sequences of SIGSTOP and SIGCONT are not always sufficient for stopping
+Sequences of SIGSTOP and SIGCONT are not always sufficient for stopping
 and resuming tasks in userspace. Both of these signals are observable
 from within the tasks we wish to freeze. While SIGSTOP cannot be caught,
 blocked, or ignored it can be seen by waiting or ptracing parent tasks.
@@ -37,26 +37,28 @@ demonstrate this problem using nested bash shells:
 
 	<at this point 16990 exits and causes 16644 to exit too>
 
-	This happens because bash can observe both signals and choose how it
+This happens because bash can observe both signals and choose how it
 responds to them.
 
-	Another example of a program which catches and responds to these
+Another example of a program which catches and responds to these
 signals is gdb. In fact any program designed to use ptrace is likely to
 have a problem with this method of stopping and resuming tasks.
 
-	 In contrast, the cgroup freezer uses the kernel freezer code to
+In contrast, the cgroup freezer uses the kernel freezer code to
 prevent the freeze/unfreeze cycle from becoming visible to the tasks
 being frozen. This allows the bash example above and gdb to run as
 expected.
 
-	The freezer subsystem in the container filesystem defines a file named
+The freezer subsystem in the container filesystem defines a file named
 freezer.state. Writing "FROZEN" to the state file will freeze all tasks in the
 cgroup. Subsequently writing "THAWED" will unfreeze the tasks in the cgroup.
 Reading will return the current state.
 
+Note it's not allowed to freeze the root cgroup.
+
 * Examples of usage :
 
-   # mkdir /containers/freezer
+   # mkdir /containers/
    # mount -t cgroup -ofreezer freezer  /containers
    # mkdir /containers/0
    # echo $some_pid > /containers/0/tasks
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index 6605907..6b5c45d 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -192,6 +192,9 @@ static void freezer_fork(struct cgroup_subsys *ss, struct task_struct *task)
 	 */
 	freezer = task_freezer(task);
 
+	if (!freezer->css.cgroup->parent)
+		return;
+
 	spin_lock_irq(&freezer->lock);
 	BUG_ON(freezer->state == CGROUP_FROZEN);
 
@@ -330,6 +333,10 @@ static int freezer_write(struct cgroup *cgroup,
 	int retval;
 	enum freezer_state goal_state;
 
+	/* It's not allowed to freeze the root cgroup */
+	if (!cgroup->parent)
+		return -EIO;
+
 	if (strcmp(buffer, freezer_state_strs[CGROUP_THAWED]) == 0)
 		goal_state = CGROUP_THAWED;
 	else if (strcmp(buffer, freezer_state_strs[CGROUP_FROZEN]) == 0)
-- 
1.5.4.rc3

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] freezer_cg: disable writing freezer.state of root  cgroup
  2008-11-04  7:35         ` [RFC][PATCH] freezer_cg: disable writing freezer.state of root cgroup Li Zefan
@ 2008-11-04  7:56           ` Paul Menage
  2008-11-04  8:01             ` Li Zefan
  0 siblings, 1 reply; 27+ messages in thread
From: Paul Menage @ 2008-11-04  7:56 UTC (permalink / raw)
  To: Li Zefan
  Cc: Matt Helsley, Cedric Le Goater, Rafael J. Wysocki, Linux-Kernel,
	Linux Containers, linux-pm, Andrew Morton, Serge E. Hallyn

On Mon, Nov 3, 2008 at 11:35 PM, Li Zefan <lizf@cn.fujitsu.com> wrote:
> With this change, the root cgroup is unfreezable, and writing to
> its freezer.state returns -EIO.

EINVAL might be more consistent with other systems like cpusets.

Or maybe just skip populating the file entirely for the root cgroup?
It's not useful if it can't be written and always returns the same
value.

Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] freezer_cg: disable writing freezer.state of root cgroup
  2008-11-04  7:56           ` Paul Menage
@ 2008-11-04  8:01             ` Li Zefan
  2008-11-05 10:18               ` Cedric Le Goater
  2008-11-06  0:48               ` Paul Menage
  0 siblings, 2 replies; 27+ messages in thread
From: Li Zefan @ 2008-11-04  8:01 UTC (permalink / raw)
  To: Paul Menage
  Cc: Matt Helsley, Cedric Le Goater, Rafael J. Wysocki, Linux-Kernel,
	Linux Containers, linux-pm, Andrew Morton, Serge E. Hallyn

Paul Menage wrote:
> On Mon, Nov 3, 2008 at 11:35 PM, Li Zefan <lizf@cn.fujitsu.com> wrote:
>> With this change, the root cgroup is unfreezable, and writing to
>> its freezer.state returns -EIO.
> 
> EINVAL might be more consistent with other systems like cpusets.
> 

Because writing FREEZE (or other invalid value) to freezer.state returns EIO..

Or that EIO should be also changed to EINVAL.

> Or maybe just skip populating the file entirely for the root cgroup?
> It's not useful if it can't be written and always returns the same
> value.
> 

I thought about this. I'll see which way Matt and Cedric prefer.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] freezer_cg: disable writing freezer.state of root cgroup
  2008-11-04  8:01             ` Li Zefan
@ 2008-11-05 10:18               ` Cedric Le Goater
  2008-11-06  0:48               ` Paul Menage
  1 sibling, 0 replies; 27+ messages in thread
From: Cedric Le Goater @ 2008-11-05 10:18 UTC (permalink / raw)
  To: Li Zefan
  Cc: Paul Menage, Matt Helsley, Rafael J. Wysocki, Linux-Kernel,
	Linux Containers, linux-pm, Andrew Morton, Serge E. Hallyn

Hello !

>> Or maybe just skip populating the file entirely for the root cgroup?
>> It's not useful if it can't be written and always returns the same
>> value.
> 
> I thought about this. I'll see which way Matt and Cedric prefer.

I think that Paul's suggestion of disabling the root freezer makes sense.

Thanks,

C.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] freezer_cg: disable writing freezer.state of root  cgroup
  2008-11-04  8:01             ` Li Zefan
  2008-11-05 10:18               ` Cedric Le Goater
@ 2008-11-06  0:48               ` Paul Menage
  1 sibling, 0 replies; 27+ messages in thread
From: Paul Menage @ 2008-11-06  0:48 UTC (permalink / raw)
  To: Li Zefan
  Cc: Matt Helsley, Cedric Le Goater, Rafael J. Wysocki, Linux-Kernel,
	Linux Containers, linux-pm, Andrew Morton, Serge E. Hallyn

On Tue, Nov 4, 2008 at 12:01 AM, Li Zefan <lizf@cn.fujitsu.com> wrote:
> Paul Menage wrote:
>> On Mon, Nov 3, 2008 at 11:35 PM, Li Zefan <lizf@cn.fujitsu.com> wrote:
>>> With this change, the root cgroup is unfreezable, and writing to
>>> its freezer.state returns -EIO.
>>
>> EINVAL might be more consistent with other systems like cpusets.
>>
>
> Because writing FREEZE (or other invalid value) to freezer.state returns EIO..
>
> Or that EIO should be also changed to EINVAL.

Yes, I think that should be EINVAL too - seems more consistent.

Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2008-11-06  0:48 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-08-11 23:53 [PATCH 0/5] Container Freezer v6: Reuse Suspend Freezer Matt Helsley
2008-08-11 23:53 ` [PATCH 1/5] Container Freezer: Add TIF_FREEZE flag to all architectures Matt Helsley
2008-08-11 23:53 ` [PATCH 2/5] Container Freezer: Make refrigerator always available Matt Helsley
2008-08-12 20:49   ` Rafael J. Wysocki
2008-08-13  1:01     ` [ProbableSpam]Re: " Matt Helsley
2008-08-13 14:31       ` Rafael J. Wysocki
2008-08-11 23:53 ` [PATCH 3/5] Container Freezer: Implement freezer cgroup subsystem Matt Helsley
2008-08-12 22:56   ` Andrew Morton
2008-08-13  2:38     ` Matt Helsley
2008-08-13 14:51       ` Rafael J. Wysocki
2008-11-04  5:43   ` Paul Menage
2008-11-04  6:12     ` Li Zefan
2008-11-04  6:40       ` Paul Menage
2008-11-04  7:25         ` Li Zefan
2008-11-04  7:35         ` [RFC][PATCH] freezer_cg: disable writing freezer.state of root cgroup Li Zefan
2008-11-04  7:56           ` Paul Menage
2008-11-04  8:01             ` Li Zefan
2008-11-05 10:18               ` Cedric Le Goater
2008-11-06  0:48               ` Paul Menage
2008-11-04  6:48       ` [PATCH] freezer_cg: remove task_lock from freezer_fork() Li Zefan
2008-08-11 23:53 ` [PATCH 4/5] Container Freezer: Skip frozen cgroups during power management resume Matt Helsley
2008-08-12 20:50   ` Rafael J. Wysocki
2008-08-11 23:53 ` [PATCH 5/5] Container Freezer: Prevent frozen tasks or cgroups from changing Matt Helsley
2008-08-12 22:44 ` [PATCH 0/5] Container Freezer v6: Reuse Suspend Freezer Andrew Morton
2008-08-13  3:47   ` [linux-pm] " Vivek Kashyap
2008-08-13  4:08     ` Andrew Morton
2008-08-15 21:54       ` Matt Helsley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).