LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH v10 0/9] freezer for cgroup v2
@ 2019-04-05 17:46 Roman Gushchin
  2019-04-05 17:47 ` [PATCH v10 1/9] cgroup: rename freezer.c into legacy_freezer.c Roman Gushchin
                   ` (9 more replies)
  0 siblings, 10 replies; 29+ messages in thread
From: Roman Gushchin @ 2019-04-05 17:46 UTC (permalink / raw)
  To: Tejun Heo, Oleg Nesterov
  Cc: kernel-team, cgroups, linux-kernel, Roman Gushchin

This patchset implements freezer for cgroup v2.

It provides similar functionality as v1 freezer, but the interface
conforms to the cgroup v2 interface design principles, and it
provides a better user experience: tasks can be killed, ptrace works,
there is no separate controller, which has to be enabled, etc.

Patches (1), (2) and (3) are some preparational work, patch (4) contains
the implementation, patch (5) is a small cgroup kselftest fix,
patch (6) covers freezer adds 6 new kselftests to cover the freezer
functionality. Patchse (7) and (8) adding tracing points to simplify
the debugging process. Patch (9) adds corresponding docs.

v10->v9:
  - removed redundant fatal_signal_pending() check
  - reworked vfork support
  - minor cleanups
  - rebase to cgroup/for-5.2

v9->v8:
  - added support for vfork
  - added a kselftest test for vfork case
  - several tests fixes/improvements
  - renamed stopped* into frozen* across the patchset
  - other minor fixes

v8->v7:
  - reworked/simplified cgroup frozen task accounting by merging nr_stopped
  and nr_frozen and removing nr_tasks_to_freeze
  - don't notify the parent process if the child is going from the stopped
  to the frozen state

v7->v6:
  - handle properly the case, when a task is both stopped and frozen
  - check for CGRP_FREEZE instead of CGRP_FROZEN in cgroup_exit()
  - minor cosmetic changes and rebase

v6->v5:
  - reverted to clear TIF_SIGPENDING with additional checks before schedule(),
  as proposed by Oleg Nesterov
  - made cgroup v2 freezer working with the system freezer (by using
  freezable_schedule())
  - make freezer working with SIGSTOPped and PTRACEd tasks
  - added tests to cover freezing a cgroup with SIGSTOPped and PTRACEd tasks

v5->v4:
  - rewored cgroup state transition code (suggested by Tejun Heo)
  - look at JOBCTL_TRAP_FREEZE instead of task->frozen in
    recalc_sigpending(), check for task->frozen and JOBCTL_TRAP_FREEZE
    in signal_pending_state() (suggested by Oleg Nesterov)
  - some cosmetic changes in signal.c (suggested by Oleg Nesterov)
  - cleaned up comments

v4->v3:
  - reading nr_descendants doesn't require taking css_set_lock anymore
  - fixed docs based on Mike Rapoport's feedback
  - fixed double irq lock found by Dan Carpenter

v3->v2:
  - dropped TASK_FROZEN for now, frozen tasks are put into TASK_INTERRUPTIBLE
  state; it's probably not the final version, but the API question can be
  discussed separately
  - don't clear TIF_SIGPENDING before going to sleep, instead add
  task->frozen check in signal_pending_state() and recalc_sigpending()
  - cgroup-level counter are now synchronized using css_set_lock,
  which simplified the whole code (e.g. per-cgroup works were removed)
  - the amount of comments increased significantly
  - many other improvements incorporating feedback from Tejun and Oleg

v2->v1:
  - fixed locking aroung calling cgroup_freezer_leave()
  - added docs

Roman Gushchin (9):
  cgroup: rename freezer.c into legacy_freezer.c
  cgroup: implement __cgroup_task_count() helper
  cgroup: protect cgroup->nr_(dying_)descendants by css_set_lock
  cgroup: cgroup v2 freezer
  kselftests: cgroup: don't fail on cg_kill_all() error in cg_destroy()
  kselftests: cgroup: add freezer controller self-tests
  cgroup: make TRACE_CGROUP_PATH irq-safe
  cgroup: add tracing points for cgroup v2 freezer
  cgroup: document cgroup v2 freezer interface

 Documentation/admin-guide/cgroup-v2.rst       |  27 +
 include/linux/cgroup-defs.h                   |  33 +
 include/linux/cgroup.h                        |  43 +
 include/linux/sched.h                         |   2 +
 include/linux/sched/jobctl.h                  |   2 +
 include/trace/events/cgroup.h                 |  55 ++
 kernel/cgroup/Makefile                        |   4 +-
 kernel/cgroup/cgroup-internal.h               |   8 +-
 kernel/cgroup/cgroup-v1.c                     |  16 -
 kernel/cgroup/cgroup.c                        | 151 +++-
 kernel/cgroup/freezer.c                       | 647 +++++--------
 kernel/cgroup/legacy_freezer.c                | 481 ++++++++++
 kernel/fork.c                                 |   2 +
 kernel/signal.c                               |  70 +-
 tools/testing/selftests/cgroup/.gitignore     |   1 +
 tools/testing/selftests/cgroup/Makefile       |   2 +
 tools/testing/selftests/cgroup/cgroup_util.c  |  58 +-
 tools/testing/selftests/cgroup/cgroup_util.h  |   5 +
 tools/testing/selftests/cgroup/test_freezer.c | 851 ++++++++++++++++++
 19 files changed, 2026 insertions(+), 432 deletions(-)
 create mode 100644 kernel/cgroup/legacy_freezer.c
 create mode 100644 tools/testing/selftests/cgroup/test_freezer.c

-- 
2.20.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v10 1/9] cgroup: rename freezer.c into legacy_freezer.c
  2019-04-05 17:46 [PATCH v10 0/9] freezer for cgroup v2 Roman Gushchin
@ 2019-04-05 17:47 ` Roman Gushchin
  2019-04-05 17:47 ` [PATCH v10 2/9] cgroup: implement __cgroup_task_count() helper Roman Gushchin
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 29+ messages in thread
From: Roman Gushchin @ 2019-04-05 17:47 UTC (permalink / raw)
  To: Tejun Heo, Oleg Nesterov
  Cc: kernel-team, cgroups, linux-kernel, Roman Gushchin

Freezer.c will contain an implementation of cgroup v2 freezer,
so let's rename the v1 freezer to avoid naming conflicts.

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: kernel-team@fb.com
---
 kernel/cgroup/Makefile                        | 2 +-
 kernel/cgroup/{freezer.c => legacy_freezer.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename kernel/cgroup/{freezer.c => legacy_freezer.c} (100%)

diff --git a/kernel/cgroup/Makefile b/kernel/cgroup/Makefile
index bfcdae896122..8d5689ca94b9 100644
--- a/kernel/cgroup/Makefile
+++ b/kernel/cgroup/Makefile
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-y := cgroup.o rstat.o namespace.o cgroup-v1.o
 
-obj-$(CONFIG_CGROUP_FREEZER) += freezer.o
+obj-$(CONFIG_CGROUP_FREEZER) += legacy_freezer.o
 obj-$(CONFIG_CGROUP_PIDS) += pids.o
 obj-$(CONFIG_CGROUP_RDMA) += rdma.o
 obj-$(CONFIG_CPUSETS) += cpuset.o
diff --git a/kernel/cgroup/freezer.c b/kernel/cgroup/legacy_freezer.c
similarity index 100%
rename from kernel/cgroup/freezer.c
rename to kernel/cgroup/legacy_freezer.c
-- 
2.20.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v10 2/9] cgroup: implement __cgroup_task_count() helper
  2019-04-05 17:46 [PATCH v10 0/9] freezer for cgroup v2 Roman Gushchin
  2019-04-05 17:47 ` [PATCH v10 1/9] cgroup: rename freezer.c into legacy_freezer.c Roman Gushchin
@ 2019-04-05 17:47 ` Roman Gushchin
  2019-04-05 17:47 ` [PATCH v10 3/9] cgroup: protect cgroup->nr_(dying_)descendants by css_set_lock Roman Gushchin
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 29+ messages in thread
From: Roman Gushchin @ 2019-04-05 17:47 UTC (permalink / raw)
  To: Tejun Heo, Oleg Nesterov
  Cc: kernel-team, cgroups, linux-kernel, Roman Gushchin

The helper is identical to the existing cgroup_task_count()
except it doesn't take the css_set_lock by itself, assuming
that the caller does.

Also, move cgroup_task_count() implementation into
kernel/cgroup/cgroup.c, as there is nothing specific to cgroup v1.

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: kernel-team@fb.com
---
 kernel/cgroup/cgroup-internal.h |  1 +
 kernel/cgroup/cgroup-v1.c       | 16 ----------------
 kernel/cgroup/cgroup.c          | 33 +++++++++++++++++++++++++++++++++
 3 files changed, 34 insertions(+), 16 deletions(-)

diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
index 30e39f3932ad..02c001ffe2e2 100644
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -240,6 +240,7 @@ int cgroup_rmdir(struct kernfs_node *kn);
 int cgroup_show_path(struct seq_file *sf, struct kernfs_node *kf_node,
 		     struct kernfs_root *kf_root);
 
+int __cgroup_task_count(const struct cgroup *cgrp);
 int cgroup_task_count(const struct cgroup *cgrp);
 
 /*
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index c126b34fd4ff..68ca5de7ec27 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -342,22 +342,6 @@ static struct cgroup_pidlist *cgroup_pidlist_find_create(struct cgroup *cgrp,
 	return l;
 }
 
-/**
- * cgroup_task_count - count the number of tasks in a cgroup.
- * @cgrp: the cgroup in question
- */
-int cgroup_task_count(const struct cgroup *cgrp)
-{
-	int count = 0;
-	struct cgrp_cset_link *link;
-
-	spin_lock_irq(&css_set_lock);
-	list_for_each_entry(link, &cgrp->cset_links, cset_link)
-		count += link->cset->nr_tasks;
-	spin_unlock_irq(&css_set_lock);
-	return count;
-}
-
 /*
  * Load a cgroup's pidarray with either procs' tgids or tasks' pids
  */
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index f219c195a9a5..3008ea684aa0 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -593,6 +593,39 @@ static void cgroup_get_live(struct cgroup *cgrp)
 	css_get(&cgrp->self);
 }
 
+/**
+ * __cgroup_task_count - count the number of tasks in a cgroup. The caller
+ * is responsible for taking the css_set_lock.
+ * @cgrp: the cgroup in question
+ */
+int __cgroup_task_count(const struct cgroup *cgrp)
+{
+	int count = 0;
+	struct cgrp_cset_link *link;
+
+	lockdep_assert_held(&css_set_lock);
+
+	list_for_each_entry(link, &cgrp->cset_links, cset_link)
+		count += link->cset->nr_tasks;
+
+	return count;
+}
+
+/**
+ * cgroup_task_count - count the number of tasks in a cgroup.
+ * @cgrp: the cgroup in question
+ */
+int cgroup_task_count(const struct cgroup *cgrp)
+{
+	int count;
+
+	spin_lock_irq(&css_set_lock);
+	count = __cgroup_task_count(cgrp);
+	spin_unlock_irq(&css_set_lock);
+
+	return count;
+}
+
 struct cgroup_subsys_state *of_css(struct kernfs_open_file *of)
 {
 	struct cgroup *cgrp = of->kn->parent->priv;
-- 
2.20.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v10 3/9] cgroup: protect cgroup->nr_(dying_)descendants by css_set_lock
  2019-04-05 17:46 [PATCH v10 0/9] freezer for cgroup v2 Roman Gushchin
  2019-04-05 17:47 ` [PATCH v10 1/9] cgroup: rename freezer.c into legacy_freezer.c Roman Gushchin
  2019-04-05 17:47 ` [PATCH v10 2/9] cgroup: implement __cgroup_task_count() helper Roman Gushchin
@ 2019-04-05 17:47 ` Roman Gushchin
  2019-04-05 17:47 ` [PATCH v10 4/9] cgroup: cgroup v2 freezer Roman Gushchin
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 29+ messages in thread
From: Roman Gushchin @ 2019-04-05 17:47 UTC (permalink / raw)
  To: Tejun Heo, Oleg Nesterov
  Cc: kernel-team, cgroups, linux-kernel, Roman Gushchin

The number of descendant cgroups and the number of dying
descendant cgroups are currently synchronized using the cgroup_mutex.

The number of descendant cgroups will be required by the cgroup v2
freezer, which will use it to determine if a cgroup is frozen
(depending on total number of descendants and number of frozen
descendants). It's not always acceptable to grab the cgroup_mutex,
especially from quite hot paths (e.g. exit()).

To avoid this, let's additionally synchronize these counters using
the css_set_lock.

So, it's safe to read these counters with either cgroup_mutex or
css_set_lock locked, and for changing both locks should be acquired.

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: kernel-team@fb.com
---
 include/linux/cgroup-defs.h | 5 +++++
 kernel/cgroup/cgroup.c      | 6 ++++++
 2 files changed, 11 insertions(+)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 1c70803e9f77..7d57890cec67 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -349,6 +349,11 @@ struct cgroup {
 	 * Dying cgroups are cgroups which were deleted by a user,
 	 * but are still existing because someone else is holding a reference.
 	 * max_descendants is a maximum allowed number of descent cgroups.
+	 *
+	 * nr_descendants and nr_dying_descendants are protected
+	 * by cgroup_mutex and css_set_lock. It's fine to read them holding
+	 * any of cgroup_mutex and css_set_lock; for writing both locks
+	 * should be held.
 	 */
 	int nr_descendants;
 	int nr_dying_descendants;
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 3008ea684aa0..786ceef2f222 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4811,9 +4811,11 @@ static void css_release_work_fn(struct work_struct *work)
 		if (cgroup_on_dfl(cgrp))
 			cgroup_rstat_flush(cgrp);
 
+		spin_lock_irq(&css_set_lock);
 		for (tcgrp = cgroup_parent(cgrp); tcgrp;
 		     tcgrp = cgroup_parent(tcgrp))
 			tcgrp->nr_dying_descendants--;
+		spin_unlock_irq(&css_set_lock);
 
 		cgroup_idr_remove(&cgrp->root->cgroup_idr, cgrp->id);
 		cgrp->id = -1;
@@ -5031,12 +5033,14 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
 	if (ret)
 		goto out_psi_free;
 
+	spin_lock_irq(&css_set_lock);
 	for (tcgrp = cgrp; tcgrp; tcgrp = cgroup_parent(tcgrp)) {
 		cgrp->ancestor_ids[tcgrp->level] = tcgrp->id;
 
 		if (tcgrp != cgrp)
 			tcgrp->nr_descendants++;
 	}
+	spin_unlock_irq(&css_set_lock);
 
 	if (notify_on_release(parent))
 		set_bit(CGRP_NOTIFY_ON_RELEASE, &cgrp->flags);
@@ -5321,10 +5325,12 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
 	if (parent && cgroup_is_threaded(cgrp))
 		parent->nr_threaded_children--;
 
+	spin_lock_irq(&css_set_lock);
 	for (tcgrp = cgroup_parent(cgrp); tcgrp; tcgrp = cgroup_parent(tcgrp)) {
 		tcgrp->nr_descendants--;
 		tcgrp->nr_dying_descendants++;
 	}
+	spin_unlock_irq(&css_set_lock);
 
 	cgroup1_check_for_release(parent);
 
-- 
2.20.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v10 4/9] cgroup: cgroup v2 freezer
  2019-04-05 17:46 [PATCH v10 0/9] freezer for cgroup v2 Roman Gushchin
                   ` (2 preceding siblings ...)
  2019-04-05 17:47 ` [PATCH v10 3/9] cgroup: protect cgroup->nr_(dying_)descendants by css_set_lock Roman Gushchin
@ 2019-04-05 17:47 ` Roman Gushchin
  2019-04-19 15:19   ` Oleg Nesterov
  2019-04-24 16:02   ` Oleg Nesterov
  2019-04-05 17:47 ` [PATCH v10 5/9] kselftests: cgroup: don't fail on cg_kill_all() error in cg_destroy() Roman Gushchin
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 29+ messages in thread
From: Roman Gushchin @ 2019-04-05 17:47 UTC (permalink / raw)
  To: Tejun Heo, Oleg Nesterov
  Cc: kernel-team, cgroups, linux-kernel, Roman Gushchin

Cgroup v1 implements the freezer controller, which provides an ability
to stop the workload in a cgroup and temporarily free up some
resources (cpu, io, network bandwidth and, potentially, memory)
for some other tasks. Cgroup v2 lacks this functionality.

This patch implements freezer for cgroup v2.

Cgroup v2 freezer tries to put tasks into a state similar to jobctl
stop. This means that tasks can be killed, ptraced (using
PTRACE_SEIZE*), and interrupted. It is possible to attach to
a frozen task, get some information (e.g. read registers) and detach.
It's also possible to migrate a frozen tasks to another cgroup.

This differs cgroup v2 freezer from cgroup v1 freezer, which mostly
tried to imitate the system-wide freezer. However uninterruptible
sleep is fine when all tasks are going to be frozen (hibernation case),
it's not the acceptable state for some subset of the system.

Cgroup v2 freezer is not supporting freezing kthreads.
If a non-root cgroup contains kthread, the cgroup still can be frozen,
but the kthread will remain running, the cgroup will be shown
as non-frozen, and the notification will not be delivered.

* PTRACE_ATTACH is not working because non-fatal signal delivery
is blocked in frozen state.

There are some interface differences between cgroup v1 and cgroup v2
freezer too, which are required to conform the cgroup v2 interface
design principles:
1) There is no separate controller, which has to be turned on:
the functionality is always available and is represented by
cgroup.freeze and cgroup.events cgroup control files.
2) The desired state is defined by the cgroup.freeze control file.
Any hierarchical configuration is allowed.
3) The interface is asynchronous. The actual state is available
using cgroup.events control file ("frozen" field). There are no
dedicated transitional states.
4) It's allowed to make any changes with the cgroup hierarchy
(create new cgroups, remove old cgroups, move tasks between cgroups)
no matter if some cgroups are frozen.

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: kernel-team@fb.com
---
 include/linux/cgroup-defs.h  |  28 ++++
 include/linux/cgroup.h       |  43 +++++
 include/linux/sched.h        |   2 +
 include/linux/sched/jobctl.h |   2 +
 kernel/cgroup/Makefile       |   2 +-
 kernel/cgroup/cgroup.c       | 110 +++++++++++-
 kernel/cgroup/freezer.c      | 317 +++++++++++++++++++++++++++++++++++
 kernel/fork.c                |   2 +
 kernel/signal.c              |  70 +++++++-
 9 files changed, 566 insertions(+), 10 deletions(-)
 create mode 100644 kernel/cgroup/freezer.c

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 7d57890cec67..77258d276f93 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -65,6 +65,12 @@ enum {
 	 * specified at mount time and thus is implemented here.
 	 */
 	CGRP_CPUSET_CLONE_CHILDREN,
+
+	/* Control group has to be frozen. */
+	CGRP_FREEZE,
+
+	/* Cgroup is frozen. */
+	CGRP_FROZEN,
 };
 
 /* cgroup_root->flags */
@@ -317,6 +323,25 @@ struct cgroup_rstat_cpu {
 	struct cgroup *updated_next;		/* NULL iff not on the list */
 };
 
+struct cgroup_freezer_state {
+	/* Should the cgroup and its descendants be frozen. */
+	bool freeze;
+
+	/* Should the cgroup actually be frozen? */
+	int e_freeze;
+
+	/* Fields below are protected by css_set_lock */
+
+	/* Number of frozen descendant cgroups */
+	int nr_frozen_descendants;
+
+	/*
+	 * Number of tasks, which are counted as frozen:
+	 * frozen, SIGSTOPped, and PTRACEd.
+	 */
+	int nr_frozen_tasks;
+};
+
 struct cgroup {
 	/* self css with NULL ->ss, points back to this cgroup */
 	struct cgroup_subsys_state self;
@@ -453,6 +478,9 @@ struct cgroup {
 	/* If there is block congestion on this cgroup. */
 	atomic_t congestion_count;
 
+	/* Used to store internal freezer state */
+	struct cgroup_freezer_state freezer;
+
 	/* ids of the ancestors at each level including self */
 	int ancestor_ids[];
 };
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 81f58b4a5418..3e2efd412dfa 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -881,4 +881,47 @@ static inline void put_cgroup_ns(struct cgroup_namespace *ns)
 		free_cgroup_ns(ns);
 }
 
+#ifdef CONFIG_CGROUPS
+
+void cgroup_enter_frozen(void);
+void cgroup_leave_frozen(bool always_leave);
+void cgroup_update_frozen(struct cgroup *cgrp);
+void cgroup_freeze(struct cgroup *cgrp, bool freeze);
+void cgroup_freezer_migrate_task(struct task_struct *task, struct cgroup *src,
+				 struct cgroup *dst);
+void cgroup_freezer_frozen_exit(struct task_struct *task);
+static inline bool cgroup_task_freeze(struct task_struct *task)
+{
+	bool ret;
+
+	if (task->flags & PF_KTHREAD)
+		return false;
+
+	rcu_read_lock();
+	ret = test_bit(CGRP_FREEZE, &task_dfl_cgroup(task)->flags);
+	rcu_read_unlock();
+
+	return ret;
+}
+
+static inline bool cgroup_task_frozen(struct task_struct *task)
+{
+	return task->frozen;
+}
+
+#else /* !CONFIG_CGROUPS */
+
+static inline void cgroup_enter_frozen(void) { }
+static inline void cgroup_leave_frozen(bool always_leave) { }
+static inline bool cgroup_task_freeze(struct task_struct *task)
+{
+	return false;
+}
+static inline bool cgroup_task_frozen(struct task_struct *task)
+{
+	return false;
+}
+
+#endif /* !CONFIG_CGROUPS */
+
 #endif /* _LINUX_CGROUP_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1549584a1538..45b2199bd683 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -726,6 +726,8 @@ struct task_struct {
 #ifdef CONFIG_CGROUPS
 	/* disallow userland-initiated cgroup migration */
 	unsigned			no_cgroup_migration:1;
+	/* task is frozen/stopped (used by the cgroup freezer) */
+	unsigned			frozen:1;
 #endif
 #ifdef CONFIG_BLK_CGROUP
 	/* to be used once the psi infrastructure lands upstream. */
diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index 98228bd48aee..fa067de9f1a9 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -18,6 +18,7 @@ struct task_struct;
 #define JOBCTL_TRAP_NOTIFY_BIT	20	/* trap for NOTIFY */
 #define JOBCTL_TRAPPING_BIT	21	/* switching to TRACED */
 #define JOBCTL_LISTENING_BIT	22	/* ptracer is listening for events */
+#define JOBCTL_TRAP_FREEZE_BIT	23	/* trap for cgroup freezer */
 
 #define JOBCTL_STOP_DEQUEUED	(1UL << JOBCTL_STOP_DEQUEUED_BIT)
 #define JOBCTL_STOP_PENDING	(1UL << JOBCTL_STOP_PENDING_BIT)
@@ -26,6 +27,7 @@ struct task_struct;
 #define JOBCTL_TRAP_NOTIFY	(1UL << JOBCTL_TRAP_NOTIFY_BIT)
 #define JOBCTL_TRAPPING		(1UL << JOBCTL_TRAPPING_BIT)
 #define JOBCTL_LISTENING	(1UL << JOBCTL_LISTENING_BIT)
+#define JOBCTL_TRAP_FREEZE	(1UL << JOBCTL_TRAP_FREEZE_BIT)
 
 #define JOBCTL_TRAP_MASK	(JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
 #define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
diff --git a/kernel/cgroup/Makefile b/kernel/cgroup/Makefile
index 8d5689ca94b9..5d7a76bfbbb7 100644
--- a/kernel/cgroup/Makefile
+++ b/kernel/cgroup/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
-obj-y := cgroup.o rstat.o namespace.o cgroup-v1.o
+obj-y := cgroup.o rstat.o namespace.o cgroup-v1.o freezer.o
 
 obj-$(CONFIG_CGROUP_FREEZER) += legacy_freezer.o
 obj-$(CONFIG_CGROUP_PIDS) += pids.o
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 786ceef2f222..6895464b54c6 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -2435,8 +2435,15 @@ static int cgroup_migrate_execute(struct cgroup_mgctx *mgctx)
 			get_css_set(to_cset);
 			to_cset->nr_tasks++;
 			css_set_move_task(task, from_cset, to_cset, true);
-			put_css_set_locked(from_cset);
 			from_cset->nr_tasks--;
+			/*
+			 * If the source or destination cgroup is frozen,
+			 * the task might require to change its state.
+			 */
+			cgroup_freezer_migrate_task(task, from_cset->dfl_cgrp,
+						    to_cset->dfl_cgrp);
+			put_css_set_locked(from_cset);
+
 		}
 	}
 	spin_unlock_irq(&css_set_lock);
@@ -3477,8 +3484,11 @@ static ssize_t cgroup_max_depth_write(struct kernfs_open_file *of,
 
 static int cgroup_events_show(struct seq_file *seq, void *v)
 {
-	seq_printf(seq, "populated %d\n",
-		   cgroup_is_populated(seq_css(seq)->cgroup));
+	struct cgroup *cgrp = seq_css(seq)->cgroup;
+
+	seq_printf(seq, "populated %d\n", cgroup_is_populated(cgrp));
+	seq_printf(seq, "frozen %d\n", test_bit(CGRP_FROZEN, &cgrp->flags));
+
 	return 0;
 }
 
@@ -3540,6 +3550,40 @@ static int cgroup_cpu_pressure_show(struct seq_file *seq, void *v)
 }
 #endif
 
+static int cgroup_freeze_show(struct seq_file *seq, void *v)
+{
+	struct cgroup *cgrp = seq_css(seq)->cgroup;
+
+	seq_printf(seq, "%d\n", cgrp->freezer.freeze);
+
+	return 0;
+}
+
+static ssize_t cgroup_freeze_write(struct kernfs_open_file *of,
+				   char *buf, size_t nbytes, loff_t off)
+{
+	struct cgroup *cgrp;
+	ssize_t ret;
+	int freeze;
+
+	ret = kstrtoint(strstrip(buf), 0, &freeze);
+	if (ret)
+		return ret;
+
+	if (freeze < 0 || freeze > 1)
+		return -ERANGE;
+
+	cgrp = cgroup_kn_lock_live(of->kn, false);
+	if (!cgrp)
+		return -ENOENT;
+
+	cgroup_freeze(cgrp, freeze);
+
+	cgroup_kn_unlock(of->kn);
+
+	return nbytes;
+}
+
 static int cgroup_file_open(struct kernfs_open_file *of)
 {
 	struct cftype *cft = of->kn->priv;
@@ -4683,6 +4727,12 @@ static struct cftype cgroup_base_files[] = {
 		.name = "cgroup.stat",
 		.seq_show = cgroup_stat_show,
 	},
+	{
+		.name = "cgroup.freeze",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = cgroup_freeze_show,
+		.write = cgroup_freeze_write,
+	},
 	{
 		.name = "cpu.stat",
 		.flags = CFTYPE_NOT_ON_ROOT,
@@ -5033,12 +5083,29 @@ static struct cgroup *cgroup_create(struct cgroup *parent)
 	if (ret)
 		goto out_psi_free;
 
+	/*
+	 * New cgroup inherits effective freeze counter, and
+	 * if the parent has to be frozen, the child has too.
+	 */
+	cgrp->freezer.e_freeze = parent->freezer.e_freeze;
+	if (cgrp->freezer.e_freeze)
+		set_bit(CGRP_FROZEN, &cgrp->flags);
+
 	spin_lock_irq(&css_set_lock);
 	for (tcgrp = cgrp; tcgrp; tcgrp = cgroup_parent(tcgrp)) {
 		cgrp->ancestor_ids[tcgrp->level] = tcgrp->id;
 
-		if (tcgrp != cgrp)
+		if (tcgrp != cgrp) {
 			tcgrp->nr_descendants++;
+
+			/*
+			 * If the new cgroup is frozen, all ancestor cgroups
+			 * get a new frozen descendant, but their state can't
+			 * change because of this.
+			 */
+			if (cgrp->freezer.e_freeze)
+				tcgrp->freezer.nr_frozen_descendants++;
+		}
 	}
 	spin_unlock_irq(&css_set_lock);
 
@@ -5329,6 +5396,12 @@ static int cgroup_destroy_locked(struct cgroup *cgrp)
 	for (tcgrp = cgroup_parent(cgrp); tcgrp; tcgrp = cgroup_parent(tcgrp)) {
 		tcgrp->nr_descendants--;
 		tcgrp->nr_dying_descendants++;
+		/*
+		 * If the dying cgroup is frozen, decrease frozen descendants
+		 * counters of ancestor cgroups.
+		 */
+		if (test_bit(CGRP_FROZEN, &cgrp->flags))
+			tcgrp->freezer.nr_frozen_descendants--;
 	}
 	spin_unlock_irq(&css_set_lock);
 
@@ -5782,6 +5855,29 @@ void cgroup_post_fork(struct task_struct *child)
 			cset->nr_tasks++;
 			css_set_move_task(child, NULL, cset, false);
 		}
+
+		/*
+		 * If the cgroup has to be frozen, the new task has too.
+		 * Let's set the JOBCTL_TRAP_FREEZE jobctl bit to get
+		 * the task into the frozen state.
+		 */
+		if (unlikely(cgroup_task_freeze(child))) {
+			struct cgroup *cgrp;
+
+			spin_lock(&child->sighand->siglock);
+			WARN_ON_ONCE(child->frozen);
+			cgrp = cset->dfl_cgrp;
+			child->jobctl |= JOBCTL_TRAP_FREEZE;
+			spin_unlock(&child->sighand->siglock);
+
+			/*
+			 * Calling cgroup_update_frozen() isn't required here,
+			 * because it will be called anyway a bit later
+			 * from do_freezer_trap(). So we avoid cgroup's
+			 * transient switch from the frozen state and back.
+			 */
+		}
+
 		spin_unlock_irq(&css_set_lock);
 	}
 
@@ -5830,6 +5926,12 @@ void cgroup_exit(struct task_struct *tsk)
 		spin_lock_irq(&css_set_lock);
 		css_set_move_task(tsk, cset, NULL, false);
 		cset->nr_tasks--;
+
+		if (unlikely(cgroup_task_frozen(tsk)))
+			cgroup_freezer_frozen_exit(tsk);
+		else if (unlikely(cgroup_task_freeze(tsk)))
+			cgroup_update_frozen(task_dfl_cgroup(tsk));
+
 		spin_unlock_irq(&css_set_lock);
 	} else {
 		get_css_set(cset);
diff --git a/kernel/cgroup/freezer.c b/kernel/cgroup/freezer.c
new file mode 100644
index 000000000000..9d8cda478fc9
--- /dev/null
+++ b/kernel/cgroup/freezer.c
@@ -0,0 +1,317 @@
+//SPDX-License-Identifier: GPL-2.0
+#include <linux/cgroup.h>
+#include <linux/sched.h>
+#include <linux/sched/task.h>
+#include <linux/sched/signal.h>
+
+#include "cgroup-internal.h"
+
+/*
+ * Propagate the cgroup frozen state upwards by the cgroup tree.
+ */
+static void cgroup_propagate_frozen(struct cgroup *cgrp, bool frozen)
+{
+	int desc = 1;
+
+	/*
+	 * If the new state is frozen, some freezing ancestor cgroups may change
+	 * their state too, depending on if all their descendants are frozen.
+	 *
+	 * Otherwise, all ancestor cgroups are forced into the non-frozen state.
+	 */
+	while ((cgrp = cgroup_parent(cgrp))) {
+		if (frozen) {
+			cgrp->freezer.nr_frozen_descendants += desc;
+			if (!test_bit(CGRP_FROZEN, &cgrp->flags) &&
+			    test_bit(CGRP_FREEZE, &cgrp->flags) &&
+			    cgrp->freezer.nr_frozen_descendants ==
+			    cgrp->nr_descendants) {
+				set_bit(CGRP_FROZEN, &cgrp->flags);
+				cgroup_file_notify(&cgrp->events_file);
+				desc++;
+			}
+		} else {
+			cgrp->freezer.nr_frozen_descendants -= desc;
+			if (test_bit(CGRP_FROZEN, &cgrp->flags)) {
+				clear_bit(CGRP_FROZEN, &cgrp->flags);
+				cgroup_file_notify(&cgrp->events_file);
+				desc++;
+			}
+		}
+	}
+}
+
+/*
+ * Revisit the cgroup frozen state.
+ * Checks if the cgroup is really frozen and perform all state transitions.
+ */
+void cgroup_update_frozen(struct cgroup *cgrp)
+{
+	bool frozen;
+
+	lockdep_assert_held(&css_set_lock);
+
+	/*
+	 * If the cgroup has to be frozen (CGRP_FREEZE bit set),
+	 * and all tasks are frozen and/or stopped, let's consider
+	 * the cgroup frozen. Otherwise it's not frozen.
+	 */
+	frozen = test_bit(CGRP_FREEZE, &cgrp->flags) &&
+		cgrp->freezer.nr_frozen_tasks == __cgroup_task_count(cgrp);
+
+	if (frozen) {
+		/* Already there? */
+		if (test_bit(CGRP_FROZEN, &cgrp->flags))
+			return;
+
+		set_bit(CGRP_FROZEN, &cgrp->flags);
+	} else {
+		/* Already there? */
+		if (!test_bit(CGRP_FROZEN, &cgrp->flags))
+			return;
+
+		clear_bit(CGRP_FROZEN, &cgrp->flags);
+	}
+	cgroup_file_notify(&cgrp->events_file);
+
+	/* Update the state of ancestor cgroups. */
+	cgroup_propagate_frozen(cgrp, frozen);
+}
+
+/*
+ * Increment cgroup's nr_frozen_tasks.
+ */
+static void cgroup_inc_frozen_cnt(struct cgroup *cgrp)
+{
+	cgrp->freezer.nr_frozen_tasks++;
+}
+
+/*
+ * Decrement cgroup's nr_frozen_tasks.
+ */
+static void cgroup_dec_frozen_cnt(struct cgroup *cgrp)
+{
+	cgrp->freezer.nr_frozen_tasks--;
+	WARN_ON_ONCE(cgrp->freezer.nr_frozen_tasks < 0);
+}
+
+/*
+ * Enter frozen/stopped state, if not yet there. Update cgroup's counters,
+ * and revisit the state of the cgroup, if necessary.
+ */
+void cgroup_enter_frozen(void)
+{
+	struct cgroup *cgrp;
+
+	if (current->frozen)
+		return;
+
+	spin_lock_irq(&css_set_lock);
+	current->frozen = true;
+	cgrp = task_dfl_cgroup(current);
+	cgroup_inc_frozen_cnt(cgrp);
+	cgroup_update_frozen(cgrp);
+	spin_unlock_irq(&css_set_lock);
+}
+
+/*
+ * Conditionally leave frozen/stopped state. Update cgroup's counters,
+ * and revisit the state of the cgroup, if necessary.
+ *
+ * If always_leave is not set, and the cgroup is freezing,
+ * we're racing with the cgroup freezing. In this case, we don't
+ * drop the frozen counter to avoid a transient switch to
+ * the unfrozen state.
+ */
+void cgroup_leave_frozen(bool always_leave)
+{
+	struct cgroup *cgrp;
+
+	spin_lock_irq(&css_set_lock);
+	cgrp = task_dfl_cgroup(current);
+	if (always_leave || !test_bit(CGRP_FREEZE, &cgrp->flags)) {
+		cgroup_dec_frozen_cnt(cgrp);
+		cgroup_update_frozen(cgrp);
+		WARN_ON_ONCE(!current->frozen);
+		current->frozen = false;
+	}
+	spin_unlock_irq(&css_set_lock);
+
+	if (unlikely(current->frozen)) {
+		/*
+		 * If the task remained in the frozen state,
+		 * make sure it won't reach userspace without
+		 * entering the signal handling loop.
+		 */
+		spin_lock_irq(&current->sighand->siglock);
+		recalc_sigpending();
+		spin_unlock_irq(&current->sighand->siglock);
+	}
+}
+
+/*
+ * Freeze or unfreeze the task by setting or clearing the JOBCTL_TRAP_FREEZE
+ * jobctl bit.
+ */
+static void cgroup_freeze_task(struct task_struct *task, bool freeze)
+{
+	unsigned long flags;
+
+	/* If the task is about to die, don't bother with freezing it. */
+	if (!lock_task_sighand(task, &flags))
+		return;
+
+	if (freeze) {
+		task->jobctl |= JOBCTL_TRAP_FREEZE;
+		signal_wake_up(task, false);
+	} else {
+		task->jobctl &= ~JOBCTL_TRAP_FREEZE;
+		wake_up_process(task);
+	}
+
+	unlock_task_sighand(task, &flags);
+}
+
+/*
+ * Freeze or unfreeze all tasks in the given cgroup.
+ */
+static void cgroup_do_freeze(struct cgroup *cgrp, bool freeze)
+{
+	struct css_task_iter it;
+	struct task_struct *task;
+
+	lockdep_assert_held(&cgroup_mutex);
+
+	spin_lock_irq(&css_set_lock);
+	if (freeze)
+		set_bit(CGRP_FREEZE, &cgrp->flags);
+	else
+		clear_bit(CGRP_FREEZE, &cgrp->flags);
+	spin_unlock_irq(&css_set_lock);
+
+	css_task_iter_start(&cgrp->self, 0, &it);
+	while ((task = css_task_iter_next(&it))) {
+		/*
+		 * Ignore kernel threads here. Freezing cgroups containing
+		 * kthreads isn't supported.
+		 */
+		if (task->flags & PF_KTHREAD)
+			continue;
+		cgroup_freeze_task(task, freeze);
+	}
+	css_task_iter_end(&it);
+
+	/*
+	 * Cgroup state should be revisited here to cover empty leaf cgroups
+	 * and cgroups which descendants are already in the desired state.
+	 */
+	spin_lock_irq(&css_set_lock);
+	if (cgrp->nr_descendants == cgrp->freezer.nr_frozen_descendants)
+		cgroup_update_frozen(cgrp);
+	spin_unlock_irq(&css_set_lock);
+}
+
+/*
+ * Adjust the task state (freeze or unfreeze) and revisit the state of
+ * source and destination cgroups.
+ */
+void cgroup_freezer_migrate_task(struct task_struct *task,
+				 struct cgroup *src, struct cgroup *dst)
+{
+	lockdep_assert_held(&css_set_lock);
+
+	/*
+	 * Kernel threads are not supposed to be frozen at all.
+	 */
+	if (task->flags & PF_KTHREAD)
+		return;
+
+	/*
+	 * Adjust counters of freezing and frozen tasks.
+	 * Note, that if the task is frozen, but the destination cgroup is not
+	 * frozen, we bump both counters to keep them balanced.
+	 */
+	if (task->frozen) {
+		cgroup_inc_frozen_cnt(dst);
+		cgroup_dec_frozen_cnt(src);
+	}
+	cgroup_update_frozen(dst);
+	cgroup_update_frozen(src);
+
+	/*
+	 * Force the task to the desired state.
+	 */
+	cgroup_freeze_task(task, test_bit(CGRP_FREEZE, &dst->flags));
+}
+
+void cgroup_freezer_frozen_exit(struct task_struct *task)
+{
+	struct cgroup *cgrp = task_dfl_cgroup(task);
+
+	lockdep_assert_held(&css_set_lock);
+
+	cgroup_dec_frozen_cnt(cgrp);
+	cgroup_update_frozen(cgrp);
+}
+
+void cgroup_freeze(struct cgroup *cgrp, bool freeze)
+{
+	struct cgroup_subsys_state *css;
+	struct cgroup *dsct;
+	bool applied = false;
+
+	lockdep_assert_held(&cgroup_mutex);
+
+	/*
+	 * Nothing changed? Just exit.
+	 */
+	if (cgrp->freezer.freeze == freeze)
+		return;
+
+	cgrp->freezer.freeze = freeze;
+
+	/*
+	 * Propagate changes downwards the cgroup tree.
+	 */
+	css_for_each_descendant_pre(css, &cgrp->self) {
+		dsct = css->cgroup;
+
+		if (cgroup_is_dead(dsct))
+			continue;
+
+		if (freeze) {
+			dsct->freezer.e_freeze++;
+			/*
+			 * Already frozen because of ancestor's settings?
+			 */
+			if (dsct->freezer.e_freeze > 1)
+				continue;
+		} else {
+			dsct->freezer.e_freeze--;
+			/*
+			 * Still frozen because of ancestor's settings?
+			 */
+			if (dsct->freezer.e_freeze > 0)
+				continue;
+
+			WARN_ON_ONCE(dsct->freezer.e_freeze < 0);
+		}
+
+		/*
+		 * Do change actual state: freeze or unfreeze.
+		 */
+		cgroup_do_freeze(dsct, freeze);
+		applied = true;
+	}
+
+	/*
+	 * Even if the actual state hasn't changed, let's notify a user.
+	 * The state can be enforced by an ancestor cgroup: the cgroup
+	 * can already be in the desired state or it can be locked in the
+	 * opposite state, so that the transition will never happen.
+	 * In both cases it's better to notify a user, that there is
+	 * nothing to wait for.
+	 */
+	if (!applied)
+		cgroup_file_notify(&cgrp->events_file);
+}
diff --git a/kernel/fork.c b/kernel/fork.c
index 9dcd18aa210b..8097a0cce4db 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1222,7 +1222,9 @@ static int wait_for_vfork_done(struct task_struct *child,
 	int killed;
 
 	freezer_do_not_count();
+	cgroup_enter_frozen();
 	killed = wait_for_completion_killable(vfork);
+	cgroup_leave_frozen(false);
 	freezer_count();
 
 	if (killed) {
diff --git a/kernel/signal.c b/kernel/signal.c
index f98448cf2def..095e0fc57b25 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -43,6 +43,7 @@
 #include <linux/compiler.h>
 #include <linux/posix-timers.h>
 #include <linux/livepatch.h>
+#include <linux/cgroup.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/signal.h>
@@ -146,9 +147,10 @@ static inline bool has_pending_signals(sigset_t *signal, sigset_t *blocked)
 
 static bool recalc_sigpending_tsk(struct task_struct *t)
 {
-	if ((t->jobctl & JOBCTL_PENDING_MASK) ||
+	if ((t->jobctl & (JOBCTL_PENDING_MASK | JOBCTL_TRAP_FREEZE)) ||
 	    PENDING(&t->pending, &t->blocked) ||
-	    PENDING(&t->signal->shared_pending, &t->blocked)) {
+	    PENDING(&t->signal->shared_pending, &t->blocked) ||
+	    cgroup_task_frozen(t)) {
 		set_tsk_thread_flag(t, TIF_SIGPENDING);
 		return true;
 	}
@@ -2108,6 +2110,7 @@ static void ptrace_stop(int exit_code, int why, int clear_code, kernel_siginfo_t
 		preempt_disable();
 		read_unlock(&tasklist_lock);
 		preempt_enable_no_resched();
+		cgroup_enter_frozen();
 		freezable_schedule();
 	} else {
 		/*
@@ -2286,6 +2289,7 @@ static bool do_signal_stop(int signr)
 		}
 
 		/* Now we don't run again until woken by SIGCONT or SIGKILL */
+		cgroup_enter_frozen();
 		freezable_schedule();
 		return true;
 	} else {
@@ -2332,6 +2336,43 @@ static void do_jobctl_trap(void)
 	}
 }
 
+/**
+ * do_freezer_trap - handle the freezer jobctl trap
+ *
+ * Puts the task into frozen state, if only the task is not about to quit.
+ * In this case it drops JOBCTL_TRAP_FREEZE.
+ *
+ * CONTEXT:
+ * Must be called with @current->sighand->siglock held,
+ * which is always released before returning.
+ */
+static void do_freezer_trap(void)
+	__releases(&current->sighand->siglock)
+{
+	/*
+	 * If there are other trap bits pending except JOBCTL_TRAP_FREEZE,
+	 * let's make another loop to give it a chance to be handled.
+	 * In any case, we'll return back.
+	 */
+	if ((current->jobctl & (JOBCTL_PENDING_MASK | JOBCTL_TRAP_FREEZE)) !=
+	     JOBCTL_TRAP_FREEZE) {
+		spin_unlock_irq(&current->sighand->siglock);
+		return;
+	}
+
+	/*
+	 * Now we're sure that there is no pending fatal signal and no
+	 * pending traps. Clear TIF_SIGPENDING to not get out of schedule()
+	 * immediately (if there is a non-fatal signal pending), and
+	 * put the task into sleep.
+	 */
+	__set_current_state(TASK_INTERRUPTIBLE);
+	clear_thread_flag(TIF_SIGPENDING);
+	spin_unlock_irq(&current->sighand->siglock);
+	cgroup_enter_frozen();
+	freezable_schedule();
+}
+
 static int ptrace_signal(int signr, kernel_siginfo_t *info)
 {
 	/*
@@ -2442,6 +2483,10 @@ bool get_signal(struct ksignal *ksig)
 		ksig->info.si_signo = signr = SIGKILL;
 		sigdelset(&current->pending.signal, SIGKILL);
 		recalc_sigpending();
+		current->jobctl &= ~JOBCTL_TRAP_FREEZE;
+		spin_unlock_irq(&sighand->siglock);
+		if (unlikely(cgroup_task_frozen(current)))
+			cgroup_leave_frozen(true);
 		goto fatal;
 	}
 
@@ -2452,9 +2497,24 @@ bool get_signal(struct ksignal *ksig)
 		    do_signal_stop(0))
 			goto relock;
 
-		if (unlikely(current->jobctl & JOBCTL_TRAP_MASK)) {
-			do_jobctl_trap();
+		if (unlikely(current->jobctl &
+			     (JOBCTL_TRAP_MASK | JOBCTL_TRAP_FREEZE))) {
+			if (current->jobctl & JOBCTL_TRAP_MASK) {
+				do_jobctl_trap();
+				spin_unlock_irq(&sighand->siglock);
+			} else if (current->jobctl & JOBCTL_TRAP_FREEZE)
+				do_freezer_trap();
+
+			goto relock;
+		}
+
+		/*
+		 * If the task is leaving the frozen state, let's update
+		 * cgroup counters and reset the frozen bit.
+		 */
+		if (unlikely(cgroup_task_frozen(current))) {
 			spin_unlock_irq(&sighand->siglock);
+			cgroup_leave_frozen(true);
 			goto relock;
 		}
 
@@ -2548,8 +2608,8 @@ bool get_signal(struct ksignal *ksig)
 			continue;
 		}
 
-	fatal:
 		spin_unlock_irq(&sighand->siglock);
+	fatal:
 
 		/*
 		 * Anything else is fatal, maybe with a core dump.
-- 
2.20.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v10 5/9] kselftests: cgroup: don't fail on cg_kill_all() error in cg_destroy()
  2019-04-05 17:46 [PATCH v10 0/9] freezer for cgroup v2 Roman Gushchin
                   ` (3 preceding siblings ...)
  2019-04-05 17:47 ` [PATCH v10 4/9] cgroup: cgroup v2 freezer Roman Gushchin
@ 2019-04-05 17:47 ` Roman Gushchin
  2019-04-05 17:47 ` [PATCH v10 6/9] kselftests: cgroup: add freezer controller self-tests Roman Gushchin
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 29+ messages in thread
From: Roman Gushchin @ 2019-04-05 17:47 UTC (permalink / raw)
  To: Tejun Heo, Oleg Nesterov
  Cc: kernel-team, cgroups, linux-kernel, Roman Gushchin, Shuah Khan,
	linux-kselftest

If the cgroup destruction races with an exit() of a belonging
process(es), cg_kill_all() may fail. It's not a good reason to make
cg_destroy() fail and leave the cgroup in place, potentially causing
next test runs to fail.

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: kernel-team@fb.com
Cc: linux-kselftest@vger.kernel.org
---
 tools/testing/selftests/cgroup/cgroup_util.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/tools/testing/selftests/cgroup/cgroup_util.c b/tools/testing/selftests/cgroup/cgroup_util.c
index 14c9fe284806..eba06f94433b 100644
--- a/tools/testing/selftests/cgroup/cgroup_util.c
+++ b/tools/testing/selftests/cgroup/cgroup_util.c
@@ -227,9 +227,7 @@ int cg_destroy(const char *cgroup)
 retry:
 	ret = rmdir(cgroup);
 	if (ret && errno == EBUSY) {
-		ret = cg_killall(cgroup);
-		if (ret)
-			return ret;
+		cg_killall(cgroup);
 		usleep(100);
 		goto retry;
 	}
-- 
2.20.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v10 6/9] kselftests: cgroup: add freezer controller self-tests
  2019-04-05 17:46 [PATCH v10 0/9] freezer for cgroup v2 Roman Gushchin
                   ` (4 preceding siblings ...)
  2019-04-05 17:47 ` [PATCH v10 5/9] kselftests: cgroup: don't fail on cg_kill_all() error in cg_destroy() Roman Gushchin
@ 2019-04-05 17:47 ` Roman Gushchin
  2019-07-16 14:48   ` Naresh Kamboju
  2019-04-05 17:47 ` [PATCH v10 7/9] cgroup: make TRACE_CGROUP_PATH irq-safe Roman Gushchin
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Roman Gushchin @ 2019-04-05 17:47 UTC (permalink / raw)
  To: Tejun Heo, Oleg Nesterov
  Cc: kernel-team, cgroups, linux-kernel, Roman Gushchin, Shuah Khan,
	linux-kselftest

This patch implements 9 tests for the freezer controller for
cgroup v2:
1) a simple test, which aims to freeze and unfreeze a cgroup with 100
processes
2) a more complicated tree test, which creates a hierarchy of cgroups,
puts some processes in some cgroups, and tries to freeze and unfreeze
different parts of the subtree
3) a forkbomb test: the test aims to freeze a forkbomb running in a
cgroup, kill all tasks in the cgroup and remove the cgroup without
the unfreezing.
4) rmdir test: the test creates two nested cgroups, freezes the parent
one, checks that the child can be successfully removed, and a new
child can be created
5) migration tests: the test checks migration of a task between
frozen cgroups: from a frozen to a running, from a running to a
frozen, and from a frozen to a frozen.
6) ptrace test: the test checks that it's possible to attach to
a process in a frozen cgroup, get some information and detach, and
the cgroup will remain frozen.
7) stopped test: the test checks that it's possible to freeze a cgroup
with a stopped task
8) ptraced test: the test checks that it's possible to freeze a cgroup
with a ptraced task
9) vfork test: the test checks that it's possible to freeze a cgroup
with a parent process waiting for the child process in vfork()

Expected output:
  $ ./test_freezer
  ok 1 test_cgfreezer_simple
  ok 2 test_cgfreezer_tree
  ok 3 test_cgfreezer_forkbomb
  ok 4 test_cgrreezer_rmdir
  ok 5 test_cgfreezer_migrate
  ok 6 test_cgfreezer_ptrace
  ok 7 test_cgfreezer_stopped
  ok 8 test_cgfreezer_ptraced
  ok 9 test_cgfreezer_vfork

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: kernel-team@fb.com
Cc: linux-kselftest@vger.kernel.org
---
 tools/testing/selftests/cgroup/.gitignore     |   1 +
 tools/testing/selftests/cgroup/Makefile       |   2 +
 tools/testing/selftests/cgroup/cgroup_util.c  |  54 +-
 tools/testing/selftests/cgroup/cgroup_util.h  |   5 +
 tools/testing/selftests/cgroup/test_freezer.c | 851 ++++++++++++++++++
 5 files changed, 912 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/cgroup/test_freezer.c

diff --git a/tools/testing/selftests/cgroup/.gitignore b/tools/testing/selftests/cgroup/.gitignore
index adacda50a4b2..7f9835624793 100644
--- a/tools/testing/selftests/cgroup/.gitignore
+++ b/tools/testing/selftests/cgroup/.gitignore
@@ -1,2 +1,3 @@
 test_memcontrol
 test_core
+test_freezer
diff --git a/tools/testing/selftests/cgroup/Makefile b/tools/testing/selftests/cgroup/Makefile
index 23fbaa4a9630..8d369b6a2069 100644
--- a/tools/testing/selftests/cgroup/Makefile
+++ b/tools/testing/selftests/cgroup/Makefile
@@ -5,8 +5,10 @@ all:
 
 TEST_GEN_PROGS = test_memcontrol
 TEST_GEN_PROGS += test_core
+TEST_GEN_PROGS += test_freezer
 
 include ../lib.mk
 
 $(OUTPUT)/test_memcontrol: cgroup_util.c
 $(OUTPUT)/test_core: cgroup_util.c
+$(OUTPUT)/test_freezer: cgroup_util.c
diff --git a/tools/testing/selftests/cgroup/cgroup_util.c b/tools/testing/selftests/cgroup/cgroup_util.c
index eba06f94433b..4c223266299a 100644
--- a/tools/testing/selftests/cgroup/cgroup_util.c
+++ b/tools/testing/selftests/cgroup/cgroup_util.c
@@ -74,6 +74,16 @@ char *cg_name_indexed(const char *root, const char *name, int index)
 	return ret;
 }
 
+char *cg_control(const char *cgroup, const char *control)
+{
+	size_t len = strlen(cgroup) + strlen(control) + 2;
+	char *ret = malloc(len);
+
+	snprintf(ret, len, "%s/%s", cgroup, control);
+
+	return ret;
+}
+
 int cg_read(const char *cgroup, const char *control, char *buf, size_t len)
 {
 	char path[PATH_MAX];
@@ -196,7 +206,32 @@ int cg_create(const char *cgroup)
 	return mkdir(cgroup, 0644);
 }
 
-static int cg_killall(const char *cgroup)
+int cg_wait_for_proc_count(const char *cgroup, int count)
+{
+	char buf[10 * PAGE_SIZE] = {0};
+	int attempts;
+	char *ptr;
+
+	for (attempts = 10; attempts >= 0; attempts--) {
+		int nr = 0;
+
+		if (cg_read(cgroup, "cgroup.procs", buf, sizeof(buf)))
+			break;
+
+		for (ptr = buf; *ptr; ptr++)
+			if (*ptr == '\n')
+				nr++;
+
+		if (nr >= count)
+			return 0;
+
+		usleep(100000);
+	}
+
+	return -1;
+}
+
+int cg_killall(const char *cgroup)
 {
 	char buf[PAGE_SIZE];
 	char *ptr = buf;
@@ -238,6 +273,14 @@ int cg_destroy(const char *cgroup)
 	return ret;
 }
 
+int cg_enter(const char *cgroup, int pid)
+{
+	char pidbuf[64];
+
+	snprintf(pidbuf, sizeof(pidbuf), "%d", pid);
+	return cg_write(cgroup, "cgroup.procs", pidbuf);
+}
+
 int cg_enter_current(const char *cgroup)
 {
 	char pidbuf[64];
@@ -367,3 +410,12 @@ int set_oom_adj_score(int pid, int score)
 	close(fd);
 	return 0;
 }
+
+char proc_read_text(int pid, const char *item, char *buf, size_t size)
+{
+	char path[PATH_MAX];
+
+	snprintf(path, sizeof(path), "/proc/%d/%s", pid, item);
+
+	return read_text(path, buf, size);
+}
diff --git a/tools/testing/selftests/cgroup/cgroup_util.h b/tools/testing/selftests/cgroup/cgroup_util.h
index 9ac8b7958f83..c72f28046bfa 100644
--- a/tools/testing/selftests/cgroup/cgroup_util.h
+++ b/tools/testing/selftests/cgroup/cgroup_util.h
@@ -18,6 +18,7 @@ static inline int values_close(long a, long b, int err)
 extern int cg_find_unified_root(char *root, size_t len);
 extern char *cg_name(const char *root, const char *name);
 extern char *cg_name_indexed(const char *root, const char *name, int index);
+extern char *cg_control(const char *cgroup, const char *control);
 extern int cg_create(const char *cgroup);
 extern int cg_destroy(const char *cgroup);
 extern int cg_read(const char *cgroup, const char *control,
@@ -32,6 +33,7 @@ extern int cg_write(const char *cgroup, const char *control, char *buf);
 extern int cg_run(const char *cgroup,
 		  int (*fn)(const char *cgroup, void *arg),
 		  void *arg);
+extern int cg_enter(const char *cgroup, int pid);
 extern int cg_enter_current(const char *cgroup);
 extern int cg_run_nowait(const char *cgroup,
 			 int (*fn)(const char *cgroup, void *arg),
@@ -41,3 +43,6 @@ extern int alloc_pagecache(int fd, size_t size);
 extern int alloc_anon(const char *cgroup, void *arg);
 extern int is_swap_enabled(void);
 extern int set_oom_adj_score(int pid, int score);
+extern int cg_wait_for_proc_count(const char *cgroup, int count);
+extern int cg_killall(const char *cgroup);
+extern char proc_read_text(int pid, const char *item, char *buf, size_t size);
diff --git a/tools/testing/selftests/cgroup/test_freezer.c b/tools/testing/selftests/cgroup/test_freezer.c
new file mode 100644
index 000000000000..2bfddb6d6d3b
--- /dev/null
+++ b/tools/testing/selftests/cgroup/test_freezer.c
@@ -0,0 +1,851 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <stdbool.h>
+#include <linux/limits.h>
+#include <sys/ptrace.h>
+#include <sys/types.h>
+#include <sys/mman.h>
+#include <unistd.h>
+#include <stdio.h>
+#include <errno.h>
+#include <poll.h>
+#include <stdlib.h>
+#include <sys/inotify.h>
+#include <string.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+
+#include "../kselftest.h"
+#include "cgroup_util.h"
+
+#define DEBUG
+#ifdef DEBUG
+#define debug(args...) fprintf(stderr, args)
+#else
+#define debug(args...)
+#endif
+
+/*
+ * Check if the cgroup is frozen by looking at the cgroup.events::frozen value.
+ */
+static int cg_check_frozen(const char *cgroup, bool frozen)
+{
+	if (frozen) {
+		if (cg_read_strstr(cgroup, "cgroup.events", "frozen 1") != 0) {
+			debug("Cgroup %s isn't frozen\n", cgroup);
+			return -1;
+		}
+	} else {
+		/*
+		 * Check the cgroup.events::frozen value.
+		 */
+		if (cg_read_strstr(cgroup, "cgroup.events", "frozen 0") != 0) {
+			debug("Cgroup %s is frozen\n", cgroup);
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * Freeze the given cgroup.
+ */
+static int cg_freeze_nowait(const char *cgroup, bool freeze)
+{
+	return cg_write(cgroup, "cgroup.freeze", freeze ? "1" : "0");
+}
+
+/*
+ * Prepare for waiting on cgroup.events file.
+ */
+static int cg_prepare_for_wait(const char *cgroup)
+{
+	int fd, ret = -1;
+
+	fd = inotify_init1(0);
+	if (fd == -1) {
+		debug("Error: inotify_init1() failed\n");
+		return fd;
+	}
+
+	ret = inotify_add_watch(fd, cg_control(cgroup, "cgroup.events"),
+				IN_MODIFY);
+	if (ret == -1) {
+		debug("Error: inotify_add_watch() failed\n");
+		close(fd);
+	}
+
+	return fd;
+}
+
+/*
+ * Wait for an event. If there are no events for 10 seconds,
+ * treat this an error.
+ */
+static int cg_wait_for(int fd)
+{
+	int ret = -1;
+	struct pollfd fds = {
+		.fd = fd,
+		.events = POLLIN,
+	};
+
+	while (true) {
+		ret = poll(&fds, 1, 10000);
+
+		if (ret == -1) {
+			if (errno == EINTR)
+				continue;
+			debug("Error: poll() failed\n");
+			break;
+		}
+
+		if (ret > 0 && fds.revents & POLLIN) {
+			ret = 0;
+			break;
+		}
+	}
+
+	return ret;
+}
+
+/*
+ * Attach a task to the given cgroup and wait for a cgroup frozen event.
+ * All transient events (e.g. populated) are ignored.
+ */
+static int cg_enter_and_wait_for_frozen(const char *cgroup, int pid,
+					bool frozen)
+{
+	int fd, ret = -1;
+	int attempts;
+
+	fd = cg_prepare_for_wait(cgroup);
+	if (fd < 0)
+		return fd;
+
+	ret = cg_enter(cgroup, pid);
+	if (ret)
+		goto out;
+
+	for (attempts = 0; attempts < 10; attempts++) {
+		ret = cg_wait_for(fd);
+		if (ret)
+			break;
+
+		ret = cg_check_frozen(cgroup, frozen);
+		if (ret)
+			continue;
+	}
+
+out:
+	close(fd);
+	return ret;
+}
+
+/*
+ * Freeze the given cgroup and wait for the inotify signal.
+ * If there are no events in 10 seconds, treat this as an error.
+ * Then check that the cgroup is in the desired state.
+ */
+static int cg_freeze_wait(const char *cgroup, bool freeze)
+{
+	int fd, ret = -1;
+
+	fd = cg_prepare_for_wait(cgroup);
+	if (fd < 0)
+		return fd;
+
+	ret = cg_freeze_nowait(cgroup, freeze);
+	if (ret) {
+		debug("Error: cg_freeze_nowait() failed\n");
+		goto out;
+	}
+
+	ret = cg_wait_for(fd);
+	if (ret)
+		goto out;
+
+	ret = cg_check_frozen(cgroup, freeze);
+out:
+	close(fd);
+	return ret;
+}
+
+/*
+ * A simple process running in a sleep loop until being
+ * re-parented.
+ */
+static int child_fn(const char *cgroup, void *arg)
+{
+	int ppid = getppid();
+
+	while (getppid() == ppid)
+		usleep(1000);
+
+	return getppid() == ppid;
+}
+
+/*
+ * A simple test for the cgroup freezer: populated the cgroup with 100
+ * running processes and freeze it. Then unfreeze it. Then it kills all
+ * processes and destroys the cgroup.
+ */
+static int test_cgfreezer_simple(const char *root)
+{
+	int ret = KSFT_FAIL;
+	char *cgroup = NULL;
+	int i;
+
+	cgroup = cg_name(root, "cg_test_simple");
+	if (!cgroup)
+		goto cleanup;
+
+	if (cg_create(cgroup))
+		goto cleanup;
+
+	for (i = 0; i < 100; i++)
+		cg_run_nowait(cgroup, child_fn, NULL);
+
+	if (cg_wait_for_proc_count(cgroup, 100))
+		goto cleanup;
+
+	if (cg_check_frozen(cgroup, false))
+		goto cleanup;
+
+	if (cg_freeze_wait(cgroup, true))
+		goto cleanup;
+
+	if (cg_freeze_wait(cgroup, false))
+		goto cleanup;
+
+	ret = KSFT_PASS;
+
+cleanup:
+	if (cgroup)
+		cg_destroy(cgroup);
+	free(cgroup);
+	return ret;
+}
+
+/*
+ * The test creates the following hierarchy:
+ *       A
+ *    / / \ \
+ *   B  E  I K
+ *  /\  |
+ * C  D F
+ *      |
+ *      G
+ *      |
+ *      H
+ *
+ * with a process in C, H and 3 processes in K.
+ * Then it tries to freeze and unfreeze the whole tree.
+ */
+static int test_cgfreezer_tree(const char *root)
+{
+	char *cgroup[10] = {0};
+	int ret = KSFT_FAIL;
+	int i;
+
+	cgroup[0] = cg_name(root, "cg_test_tree_A");
+	if (!cgroup[0])
+		goto cleanup;
+
+	cgroup[1] = cg_name(cgroup[0], "B");
+	if (!cgroup[1])
+		goto cleanup;
+
+	cgroup[2] = cg_name(cgroup[1], "C");
+	if (!cgroup[2])
+		goto cleanup;
+
+	cgroup[3] = cg_name(cgroup[1], "D");
+	if (!cgroup[3])
+		goto cleanup;
+
+	cgroup[4] = cg_name(cgroup[0], "E");
+	if (!cgroup[4])
+		goto cleanup;
+
+	cgroup[5] = cg_name(cgroup[4], "F");
+	if (!cgroup[5])
+		goto cleanup;
+
+	cgroup[6] = cg_name(cgroup[5], "G");
+	if (!cgroup[6])
+		goto cleanup;
+
+	cgroup[7] = cg_name(cgroup[6], "H");
+	if (!cgroup[7])
+		goto cleanup;
+
+	cgroup[8] = cg_name(cgroup[0], "I");
+	if (!cgroup[8])
+		goto cleanup;
+
+	cgroup[9] = cg_name(cgroup[0], "K");
+	if (!cgroup[9])
+		goto cleanup;
+
+	for (i = 0; i < 10; i++)
+		if (cg_create(cgroup[i]))
+			goto cleanup;
+
+	cg_run_nowait(cgroup[2], child_fn, NULL);
+	cg_run_nowait(cgroup[7], child_fn, NULL);
+	cg_run_nowait(cgroup[9], child_fn, NULL);
+	cg_run_nowait(cgroup[9], child_fn, NULL);
+	cg_run_nowait(cgroup[9], child_fn, NULL);
+
+	/*
+	 * Wait until all child processes will enter
+	 * corresponding cgroups.
+	 */
+
+	if (cg_wait_for_proc_count(cgroup[2], 1) ||
+	    cg_wait_for_proc_count(cgroup[7], 1) ||
+	    cg_wait_for_proc_count(cgroup[9], 3))
+		goto cleanup;
+
+	/*
+	 * Freeze B.
+	 */
+	if (cg_freeze_wait(cgroup[1], true))
+		goto cleanup;
+
+	/*
+	 * Freeze F.
+	 */
+	if (cg_freeze_wait(cgroup[5], true))
+		goto cleanup;
+
+	/*
+	 * Freeze G.
+	 */
+	if (cg_freeze_wait(cgroup[6], true))
+		goto cleanup;
+
+	/*
+	 * Check that A and E are not frozen.
+	 */
+	if (cg_check_frozen(cgroup[0], false))
+		goto cleanup;
+
+	if (cg_check_frozen(cgroup[4], false))
+		goto cleanup;
+
+	/*
+	 * Freeze A. Check that A, B and E are frozen.
+	 */
+	if (cg_freeze_wait(cgroup[0], true))
+		goto cleanup;
+
+	if (cg_check_frozen(cgroup[1], true))
+		goto cleanup;
+
+	if (cg_check_frozen(cgroup[4], true))
+		goto cleanup;
+
+	/*
+	 * Unfreeze B, F and G
+	 */
+	if (cg_freeze_nowait(cgroup[1], false))
+		goto cleanup;
+
+	if (cg_freeze_nowait(cgroup[5], false))
+		goto cleanup;
+
+	if (cg_freeze_nowait(cgroup[6], false))
+		goto cleanup;
+
+	/*
+	 * Check that C and H are still frozen.
+	 */
+	if (cg_check_frozen(cgroup[2], true))
+		goto cleanup;
+
+	if (cg_check_frozen(cgroup[7], true))
+		goto cleanup;
+
+	/*
+	 * Unfreeze A. Check that A, C and K are not frozen.
+	 */
+	if (cg_freeze_wait(cgroup[0], false))
+		goto cleanup;
+
+	if (cg_check_frozen(cgroup[2], false))
+		goto cleanup;
+
+	if (cg_check_frozen(cgroup[9], false))
+		goto cleanup;
+
+	ret = KSFT_PASS;
+
+cleanup:
+	for (i = 9; i >= 0 && cgroup[i]; i--) {
+		cg_destroy(cgroup[i]);
+		free(cgroup[i]);
+	}
+
+	return ret;
+}
+
+/*
+ * A fork bomb emulator.
+ */
+static int forkbomb_fn(const char *cgroup, void *arg)
+{
+	int ppid;
+
+	fork();
+	fork();
+
+	ppid = getppid();
+
+	while (getppid() == ppid)
+		usleep(1000);
+
+	return getppid() == ppid;
+}
+
+/*
+ * The test runs a fork bomb in a cgroup and tries to freeze it.
+ * Then it kills all processes and checks that cgroup isn't populated
+ * anymore.
+ */
+static int test_cgfreezer_forkbomb(const char *root)
+{
+	int ret = KSFT_FAIL;
+	char *cgroup = NULL;
+
+	cgroup = cg_name(root, "cg_forkbomb_test");
+	if (!cgroup)
+		goto cleanup;
+
+	if (cg_create(cgroup))
+		goto cleanup;
+
+	cg_run_nowait(cgroup, forkbomb_fn, NULL);
+
+	usleep(100000);
+
+	if (cg_freeze_wait(cgroup, true))
+		goto cleanup;
+
+	if (cg_killall(cgroup))
+		goto cleanup;
+
+	if (cg_wait_for_proc_count(cgroup, 0))
+		goto cleanup;
+
+	ret = KSFT_PASS;
+
+cleanup:
+	if (cgroup)
+		cg_destroy(cgroup);
+	free(cgroup);
+	return ret;
+}
+
+/*
+ * The test creates two nested cgroups, freezes the parent
+ * and removes the child. Then it checks that the parent cgroup
+ * remains frozen and it's possible to create a new child
+ * without unfreezing. The new child is frozen too.
+ */
+static int test_cgfreezer_rmdir(const char *root)
+{
+	int ret = KSFT_FAIL;
+	char *parent, *child = NULL;
+
+	parent = cg_name(root, "cg_test_rmdir_A");
+	if (!parent)
+		goto cleanup;
+
+	child = cg_name(parent, "cg_test_rmdir_B");
+	if (!child)
+		goto cleanup;
+
+	if (cg_create(parent))
+		goto cleanup;
+
+	if (cg_create(child))
+		goto cleanup;
+
+	if (cg_freeze_wait(parent, true))
+		goto cleanup;
+
+	if (cg_destroy(child))
+		goto cleanup;
+
+	if (cg_check_frozen(parent, true))
+		goto cleanup;
+
+	if (cg_create(child))
+		goto cleanup;
+
+	if (cg_check_frozen(child, true))
+		goto cleanup;
+
+	ret = KSFT_PASS;
+
+cleanup:
+	if (child)
+		cg_destroy(child);
+	free(child);
+	if (parent)
+		cg_destroy(parent);
+	free(parent);
+	return ret;
+}
+
+/*
+ * The test creates two cgroups: A and B, runs a process in A
+ * and performs several migrations:
+ * 1) A (running) -> B (frozen)
+ * 2) B (frozen) -> A (running)
+ * 3) A (frozen) -> B (frozen)
+ *
+ * On each step it checks the actual state of both cgroups.
+ */
+static int test_cgfreezer_migrate(const char *root)
+{
+	int ret = KSFT_FAIL;
+	char *cgroup[2] = {0};
+	int pid;
+
+	cgroup[0] = cg_name(root, "cg_test_migrate_A");
+	if (!cgroup[0])
+		goto cleanup;
+
+	cgroup[1] = cg_name(root, "cg_test_migrate_B");
+	if (!cgroup[1])
+		goto cleanup;
+
+	if (cg_create(cgroup[0]))
+		goto cleanup;
+
+	if (cg_create(cgroup[1]))
+		goto cleanup;
+
+	pid = cg_run_nowait(cgroup[0], child_fn, NULL);
+	if (pid < 0)
+		goto cleanup;
+
+	if (cg_wait_for_proc_count(cgroup[0], 1))
+		goto cleanup;
+
+	/*
+	 * Migrate from A (running) to B (frozen)
+	 */
+	if (cg_freeze_wait(cgroup[1], true))
+		goto cleanup;
+
+	if (cg_enter_and_wait_for_frozen(cgroup[1], pid, true))
+		goto cleanup;
+
+	if (cg_check_frozen(cgroup[0], false))
+		goto cleanup;
+
+	/*
+	 * Migrate from B (frozen) to A (running)
+	 */
+	if (cg_enter_and_wait_for_frozen(cgroup[0], pid, false))
+		goto cleanup;
+
+	if (cg_check_frozen(cgroup[1], true))
+		goto cleanup;
+
+	/*
+	 * Migrate from A (frozen) to B (frozen)
+	 */
+	if (cg_freeze_wait(cgroup[0], true))
+		goto cleanup;
+
+	if (cg_enter_and_wait_for_frozen(cgroup[1], pid, true))
+		goto cleanup;
+
+	if (cg_check_frozen(cgroup[0], true))
+		goto cleanup;
+
+	ret = KSFT_PASS;
+
+cleanup:
+	if (cgroup[0])
+		cg_destroy(cgroup[0]);
+	free(cgroup[0]);
+	if (cgroup[1])
+		cg_destroy(cgroup[1]);
+	free(cgroup[1]);
+	return ret;
+}
+
+/*
+ * The test checks that ptrace works with a tracing process in a frozen cgroup.
+ */
+static int test_cgfreezer_ptrace(const char *root)
+{
+	int ret = KSFT_FAIL;
+	char *cgroup = NULL;
+	siginfo_t siginfo;
+	int pid;
+
+	cgroup = cg_name(root, "cg_test_ptrace");
+	if (!cgroup)
+		goto cleanup;
+
+	if (cg_create(cgroup))
+		goto cleanup;
+
+	pid = cg_run_nowait(cgroup, child_fn, NULL);
+	if (pid < 0)
+		goto cleanup;
+
+	if (cg_wait_for_proc_count(cgroup, 1))
+		goto cleanup;
+
+	if (cg_freeze_wait(cgroup, true))
+		goto cleanup;
+
+	if (ptrace(PTRACE_SEIZE, pid, NULL, NULL))
+		goto cleanup;
+
+	if (ptrace(PTRACE_INTERRUPT, pid, NULL, NULL))
+		goto cleanup;
+
+	waitpid(pid, NULL, 0);
+
+	/*
+	 * Cgroup has to remain frozen, however the test task
+	 * is in traced state.
+	 */
+	if (cg_check_frozen(cgroup, true))
+		goto cleanup;
+
+	if (ptrace(PTRACE_GETSIGINFO, pid, NULL, &siginfo))
+		goto cleanup;
+
+	if (ptrace(PTRACE_DETACH, pid, NULL, NULL))
+		goto cleanup;
+
+	if (cg_check_frozen(cgroup, true))
+		goto cleanup;
+
+	ret = KSFT_PASS;
+
+cleanup:
+	if (cgroup)
+		cg_destroy(cgroup);
+	free(cgroup);
+	return ret;
+}
+
+/*
+ * Check if the process is stopped.
+ */
+static int proc_check_stopped(int pid)
+{
+	char buf[PAGE_SIZE];
+	int len;
+
+	len = proc_read_text(pid, "stat", buf, sizeof(buf));
+	if (len == -1) {
+		debug("Can't get %d stat\n", pid);
+		return -1;
+	}
+
+	if (strstr(buf, "(test_freezer) T ") == NULL) {
+		debug("Process %d in the unexpected state: %s\n", pid, buf);
+		return -1;
+	}
+
+	return 0;
+}
+
+/*
+ * Test that it's possible to freeze a cgroup with a stopped process.
+ */
+static int test_cgfreezer_stopped(const char *root)
+{
+	int pid, ret = KSFT_FAIL;
+	char *cgroup = NULL;
+
+	cgroup = cg_name(root, "cg_test_stopped");
+	if (!cgroup)
+		goto cleanup;
+
+	if (cg_create(cgroup))
+		goto cleanup;
+
+	pid = cg_run_nowait(cgroup, child_fn, NULL);
+
+	if (cg_wait_for_proc_count(cgroup, 1))
+		goto cleanup;
+
+	if (kill(pid, SIGSTOP))
+		goto cleanup;
+
+	if (cg_check_frozen(cgroup, false))
+		goto cleanup;
+
+	if (cg_freeze_wait(cgroup, true))
+		goto cleanup;
+
+	if (cg_freeze_wait(cgroup, false))
+		goto cleanup;
+
+	if (proc_check_stopped(pid))
+		goto cleanup;
+
+	ret = KSFT_PASS;
+
+cleanup:
+	if (cgroup)
+		cg_destroy(cgroup);
+	free(cgroup);
+	return ret;
+}
+
+/*
+ * Test that it's possible to freeze a cgroup with a ptraced process.
+ */
+static int test_cgfreezer_ptraced(const char *root)
+{
+	int pid, ret = KSFT_FAIL;
+	char *cgroup = NULL;
+	siginfo_t siginfo;
+
+	cgroup = cg_name(root, "cg_test_ptraced");
+	if (!cgroup)
+		goto cleanup;
+
+	if (cg_create(cgroup))
+		goto cleanup;
+
+	pid = cg_run_nowait(cgroup, child_fn, NULL);
+
+	if (cg_wait_for_proc_count(cgroup, 1))
+		goto cleanup;
+
+	if (ptrace(PTRACE_SEIZE, pid, NULL, NULL))
+		goto cleanup;
+
+	if (ptrace(PTRACE_INTERRUPT, pid, NULL, NULL))
+		goto cleanup;
+
+	waitpid(pid, NULL, 0);
+
+	if (cg_check_frozen(cgroup, false))
+		goto cleanup;
+
+	if (cg_freeze_wait(cgroup, true))
+		goto cleanup;
+
+	/*
+	 * cg_check_frozen(cgroup, true) will fail here,
+	 * because the task in in the TRACEd state.
+	 */
+	if (cg_freeze_wait(cgroup, false))
+		goto cleanup;
+
+	if (ptrace(PTRACE_GETSIGINFO, pid, NULL, &siginfo))
+		goto cleanup;
+
+	if (ptrace(PTRACE_DETACH, pid, NULL, NULL))
+		goto cleanup;
+
+	ret = KSFT_PASS;
+
+cleanup:
+	if (cgroup)
+		cg_destroy(cgroup);
+	free(cgroup);
+	return ret;
+}
+
+static int vfork_fn(const char *cgroup, void *arg)
+{
+	int pid = vfork();
+
+	if (pid == 0)
+		while (true)
+			sleep(1);
+
+	return pid;
+}
+
+/*
+ * Test that it's possible to freeze a cgroup with a process,
+ * which called vfork() and is waiting for a child.
+ */
+static int test_cgfreezer_vfork(const char *root)
+{
+	int ret = KSFT_FAIL;
+	char *cgroup = NULL;
+
+	cgroup = cg_name(root, "cg_test_vfork");
+	if (!cgroup)
+		goto cleanup;
+
+	if (cg_create(cgroup))
+		goto cleanup;
+
+	cg_run_nowait(cgroup, vfork_fn, NULL);
+
+	if (cg_wait_for_proc_count(cgroup, 2))
+		goto cleanup;
+
+	if (cg_freeze_wait(cgroup, true))
+		goto cleanup;
+
+	ret = KSFT_PASS;
+
+cleanup:
+	if (cgroup)
+		cg_destroy(cgroup);
+	free(cgroup);
+	return ret;
+}
+
+#define T(x) { x, #x }
+struct cgfreezer_test {
+	int (*fn)(const char *root);
+	const char *name;
+} tests[] = {
+	T(test_cgfreezer_simple),
+	T(test_cgfreezer_tree),
+	T(test_cgfreezer_forkbomb),
+	T(test_cgfreezer_rmdir),
+	T(test_cgfreezer_migrate),
+	T(test_cgfreezer_ptrace),
+	T(test_cgfreezer_stopped),
+	T(test_cgfreezer_ptraced),
+	T(test_cgfreezer_vfork),
+};
+#undef T
+
+int main(int argc, char *argv[])
+{
+	char root[PATH_MAX];
+	int i, ret = EXIT_SUCCESS;
+
+	if (cg_find_unified_root(root, sizeof(root)))
+		ksft_exit_skip("cgroup v2 isn't mounted\n");
+	for (i = 0; i < ARRAY_SIZE(tests); i++) {
+		switch (tests[i].fn(root)) {
+		case KSFT_PASS:
+			ksft_test_result_pass("%s\n", tests[i].name);
+			break;
+		case KSFT_SKIP:
+			ksft_test_result_skip("%s\n", tests[i].name);
+			break;
+		default:
+			ret = EXIT_FAILURE;
+			ksft_test_result_fail("%s\n", tests[i].name);
+			break;
+		}
+	}
+
+	return ret;
+}
-- 
2.20.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v10 7/9] cgroup: make TRACE_CGROUP_PATH irq-safe
  2019-04-05 17:46 [PATCH v10 0/9] freezer for cgroup v2 Roman Gushchin
                   ` (5 preceding siblings ...)
  2019-04-05 17:47 ` [PATCH v10 6/9] kselftests: cgroup: add freezer controller self-tests Roman Gushchin
@ 2019-04-05 17:47 ` Roman Gushchin
  2019-04-05 17:47 ` [PATCH v10 8/9] cgroup: add tracing points for cgroup v2 freezer Roman Gushchin
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 29+ messages in thread
From: Roman Gushchin @ 2019-04-05 17:47 UTC (permalink / raw)
  To: Tejun Heo, Oleg Nesterov
  Cc: kernel-team, cgroups, linux-kernel, Roman Gushchin

To use the TRACE_CGROUP_PATH() macro with css_set_lock
locked, let's make the macro irq-safe.
It's necessary in order to trace cgroup freezer state
transitions (frozen/not frozen), which are happening
with css_set_lock locked.

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 kernel/cgroup/cgroup-internal.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
index 02c001ffe2e2..809e34a3c017 100644
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -28,12 +28,15 @@ extern void __init enable_debug_cgroup(void);
 #define TRACE_CGROUP_PATH(type, cgrp, ...)				\
 	do {								\
 		if (trace_cgroup_##type##_enabled()) {			\
-			spin_lock(&trace_cgroup_path_lock);		\
+			unsigned long flags;				\
+			spin_lock_irqsave(&trace_cgroup_path_lock,	\
+					  flags);			\
 			cgroup_path(cgrp, trace_cgroup_path,		\
 				    TRACE_CGROUP_PATH_LEN);		\
 			trace_cgroup_##type(cgrp, trace_cgroup_path,	\
 					    ##__VA_ARGS__);		\
-			spin_unlock(&trace_cgroup_path_lock);		\
+			spin_unlock_irqrestore(&trace_cgroup_path_lock, \
+					       flags);			\
 		}							\
 	} while (0)
 
-- 
2.20.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v10 8/9] cgroup: add tracing points for cgroup v2 freezer
  2019-04-05 17:46 [PATCH v10 0/9] freezer for cgroup v2 Roman Gushchin
                   ` (6 preceding siblings ...)
  2019-04-05 17:47 ` [PATCH v10 7/9] cgroup: make TRACE_CGROUP_PATH irq-safe Roman Gushchin
@ 2019-04-05 17:47 ` Roman Gushchin
  2019-04-05 17:47 ` [PATCH v10 9/9] cgroup: document cgroup v2 freezer interface Roman Gushchin
  2019-04-19 18:29 ` [PATCH v10 0/9] freezer for cgroup v2 Tejun Heo
  9 siblings, 0 replies; 29+ messages in thread
From: Roman Gushchin @ 2019-04-05 17:47 UTC (permalink / raw)
  To: Tejun Heo, Oleg Nesterov
  Cc: kernel-team, cgroups, linux-kernel, Roman Gushchin

Add cgroup:cgroup_freeze and cgroup:cgroup_unfreeze events,
which are using the existing cgroup tracing infrastructure.

Add the cgroup_event event class, which is similar to the cgroup
class, but contains an additional integer field to store a new
value (the level field is dropped).

Also add two tracing events: cgroup_notify_populated and
cgroup_notify_frozen, which are raised in a generic way using
the TRACE_CGROUP_PATH() macro.

This allows to trace cgroup state transitions and is generally
helpful for debugging the cgroup freezer code.

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 include/trace/events/cgroup.h | 55 +++++++++++++++++++++++++++++++++++
 kernel/cgroup/cgroup.c        |  2 ++
 kernel/cgroup/freezer.c       | 15 +++++++++-
 3 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/include/trace/events/cgroup.h b/include/trace/events/cgroup.h
index a401ff5e7847..a566cc521476 100644
--- a/include/trace/events/cgroup.h
+++ b/include/trace/events/cgroup.h
@@ -103,6 +103,20 @@ DEFINE_EVENT(cgroup, cgroup_rename,
 	TP_ARGS(cgrp, path)
 );
 
+DEFINE_EVENT(cgroup, cgroup_freeze,
+
+	TP_PROTO(struct cgroup *cgrp, const char *path),
+
+	TP_ARGS(cgrp, path)
+);
+
+DEFINE_EVENT(cgroup, cgroup_unfreeze,
+
+	TP_PROTO(struct cgroup *cgrp, const char *path),
+
+	TP_ARGS(cgrp, path)
+);
+
 DECLARE_EVENT_CLASS(cgroup_migrate,
 
 	TP_PROTO(struct cgroup *dst_cgrp, const char *path,
@@ -149,6 +163,47 @@ DEFINE_EVENT(cgroup_migrate, cgroup_transfer_tasks,
 	TP_ARGS(dst_cgrp, path, task, threadgroup)
 );
 
+DECLARE_EVENT_CLASS(cgroup_event,
+
+	TP_PROTO(struct cgroup *cgrp, const char *path, int val),
+
+	TP_ARGS(cgrp, path, val),
+
+	TP_STRUCT__entry(
+		__field(	int,		root			)
+		__field(	int,		id			)
+		__field(	int,		level			)
+		__string(	path,		path			)
+		__field(	int,		val			)
+	),
+
+	TP_fast_assign(
+		__entry->root = cgrp->root->hierarchy_id;
+		__entry->id = cgrp->id;
+		__entry->level = cgrp->level;
+		__assign_str(path, path);
+		__entry->val = val;
+	),
+
+	TP_printk("root=%d id=%d level=%d path=%s val=%d",
+		  __entry->root, __entry->id, __entry->level, __get_str(path),
+		  __entry->val)
+);
+
+DEFINE_EVENT(cgroup_event, cgroup_notify_populated,
+
+	TP_PROTO(struct cgroup *cgrp, const char *path, int val),
+
+	TP_ARGS(cgrp, path, val)
+);
+
+DEFINE_EVENT(cgroup_event, cgroup_notify_frozen,
+
+	TP_PROTO(struct cgroup *cgrp, const char *path, int val),
+
+	TP_ARGS(cgrp, path, val)
+);
+
 #endif /* _TRACE_CGROUP_H */
 
 /* This part must be outside protection */
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 6895464b54c6..57edcf398d71 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -816,6 +816,8 @@ static void cgroup_update_populated(struct cgroup *cgrp, bool populated)
 			break;
 
 		cgroup1_check_for_release(cgrp);
+		TRACE_CGROUP_PATH(notify_populated, cgrp,
+				  cgroup_is_populated(cgrp));
 		cgroup_file_notify(&cgrp->events_file);
 
 		child = cgrp;
diff --git a/kernel/cgroup/freezer.c b/kernel/cgroup/freezer.c
index 9d8cda478fc9..3bfbb3c8baf3 100644
--- a/kernel/cgroup/freezer.c
+++ b/kernel/cgroup/freezer.c
@@ -6,6 +6,8 @@
 
 #include "cgroup-internal.h"
 
+#include <trace/events/cgroup.h>
+
 /*
  * Propagate the cgroup frozen state upwards by the cgroup tree.
  */
@@ -28,6 +30,7 @@ static void cgroup_propagate_frozen(struct cgroup *cgrp, bool frozen)
 			    cgrp->nr_descendants) {
 				set_bit(CGRP_FROZEN, &cgrp->flags);
 				cgroup_file_notify(&cgrp->events_file);
+				TRACE_CGROUP_PATH(notify_frozen, cgrp, 1);
 				desc++;
 			}
 		} else {
@@ -35,6 +38,7 @@ static void cgroup_propagate_frozen(struct cgroup *cgrp, bool frozen)
 			if (test_bit(CGRP_FROZEN, &cgrp->flags)) {
 				clear_bit(CGRP_FROZEN, &cgrp->flags);
 				cgroup_file_notify(&cgrp->events_file);
+				TRACE_CGROUP_PATH(notify_frozen, cgrp, 0);
 				desc++;
 			}
 		}
@@ -73,6 +77,7 @@ void cgroup_update_frozen(struct cgroup *cgrp)
 		clear_bit(CGRP_FROZEN, &cgrp->flags);
 	}
 	cgroup_file_notify(&cgrp->events_file);
+	TRACE_CGROUP_PATH(notify_frozen, cgrp, frozen);
 
 	/* Update the state of ancestor cgroups. */
 	cgroup_propagate_frozen(cgrp, frozen);
@@ -189,6 +194,11 @@ static void cgroup_do_freeze(struct cgroup *cgrp, bool freeze)
 		clear_bit(CGRP_FREEZE, &cgrp->flags);
 	spin_unlock_irq(&css_set_lock);
 
+	if (freeze)
+		TRACE_CGROUP_PATH(freeze, cgrp);
+	else
+		TRACE_CGROUP_PATH(unfreeze, cgrp);
+
 	css_task_iter_start(&cgrp->self, 0, &it);
 	while ((task = css_task_iter_next(&it))) {
 		/*
@@ -312,6 +322,9 @@ void cgroup_freeze(struct cgroup *cgrp, bool freeze)
 	 * In both cases it's better to notify a user, that there is
 	 * nothing to wait for.
 	 */
-	if (!applied)
+	if (!applied) {
+		TRACE_CGROUP_PATH(notify_frozen, cgrp,
+				  test_bit(CGRP_FROZEN, &cgrp->flags));
 		cgroup_file_notify(&cgrp->events_file);
+	}
 }
-- 
2.20.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v10 9/9] cgroup: document cgroup v2 freezer interface
  2019-04-05 17:46 [PATCH v10 0/9] freezer for cgroup v2 Roman Gushchin
                   ` (7 preceding siblings ...)
  2019-04-05 17:47 ` [PATCH v10 8/9] cgroup: add tracing points for cgroup v2 freezer Roman Gushchin
@ 2019-04-05 17:47 ` Roman Gushchin
  2019-04-19 18:29 ` [PATCH v10 0/9] freezer for cgroup v2 Tejun Heo
  9 siblings, 0 replies; 29+ messages in thread
From: Roman Gushchin @ 2019-04-05 17:47 UTC (permalink / raw)
  To: Tejun Heo, Oleg Nesterov
  Cc: kernel-team, cgroups, linux-kernel, Roman Gushchin,
	Mike Rapoport, linux-doc

Describe cgroup v2 freezer interface in the cgroup v2 admin guide.

Signed-off-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: linux-doc@vger.kernel.org
Cc: kernel-team@fb.com
---
 Documentation/admin-guide/cgroup-v2.rst | 27 +++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 20f92c16ffbf..88e746074252 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -864,6 +864,8 @@ All cgroup core files are prefixed with "cgroup."
 	  populated
 		1 if the cgroup or its descendants contains any live
 		processes; otherwise, 0.
+	  frozen
+		1 if the cgroup is frozen; otherwise, 0.
 
   cgroup.max.descendants
 	A read-write single value files.  The default is "max".
@@ -897,6 +899,31 @@ All cgroup core files are prefixed with "cgroup."
 		A dying cgroup can consume system resources not exceeding
 		limits, which were active at the moment of cgroup deletion.
 
+  cgroup.freeze
+	A read-write single value file which exists on non-root cgroups.
+	Allowed values are "0" and "1". The default is "0".
+
+	Writing "1" to the file causes freezing of the cgroup and all
+	descendant cgroups. This means that all belonging processes will
+	be stopped and will not run until the cgroup will be explicitly
+	unfrozen. Freezing of the cgroup may take some time; when this action
+	is completed, the "frozen" value in the cgroup.events control file
+	will be updated to "1" and the corresponding notification will be
+	issued.
+
+	A cgroup can be frozen either by its own settings, or by settings
+	of any ancestor cgroups. If any of ancestor cgroups is frozen, the
+	cgroup will remain frozen.
+
+	Processes in the frozen cgroup can be killed by a fatal signal.
+	They also can enter and leave a frozen cgroup: either by an explicit
+	move by a user, or if freezing of the cgroup races with fork().
+	If a process is moved to a frozen cgroup, it stops. If a process is
+	moved out of a frozen cgroup, it becomes running.
+
+	Frozen status of a cgroup doesn't affect any cgroup tree operations:
+	it's possible to delete a frozen (and empty) cgroup, as well as
+	create new sub-cgroups.
 
 Controllers
 ===========
-- 
2.20.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 4/9] cgroup: cgroup v2 freezer
  2019-04-05 17:47 ` [PATCH v10 4/9] cgroup: cgroup v2 freezer Roman Gushchin
@ 2019-04-19 15:19   ` Oleg Nesterov
  2019-04-19 16:08     ` Oleg Nesterov
  2019-04-19 16:11     ` Roman Gushchin
  2019-04-24 16:02   ` Oleg Nesterov
  1 sibling, 2 replies; 29+ messages in thread
From: Oleg Nesterov @ 2019-04-19 15:19 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Tejun Heo, kernel-team, cgroups, linux-kernel, Roman Gushchin

On 04/05, Roman Gushchin wrote:
>
> +void cgroup_leave_frozen(bool always_leave)
> +{
> +	struct cgroup *cgrp;
> +
> +	spin_lock_irq(&css_set_lock);
> +	cgrp = task_dfl_cgroup(current);
> +	if (always_leave || !test_bit(CGRP_FREEZE, &cgrp->flags)) {
> +		cgroup_dec_frozen_cnt(cgrp);
> +		cgroup_update_frozen(cgrp);
> +		WARN_ON_ONCE(!current->frozen);
> +		current->frozen = false;
> +	}
> +	spin_unlock_irq(&css_set_lock);
> +
> +	if (unlikely(current->frozen)) {
> +		/*
> +		 * If the task remained in the frozen state,
> +		 * make sure it won't reach userspace without
> +		 * entering the signal handling loop.
> +		 */
> +		spin_lock_irq(&current->sighand->siglock);
> +		recalc_sigpending();
> +		spin_unlock_irq(&current->sighand->siglock);

I still can't understand this logic.

Once again, suppose we race with CGRP_FREEZE. If JOBCTL_TRAP_FREEZE is already
set then signal_pending() must be already T and we do not need recalc_sigpending?
If JOBCTL_TRAP_FREEZE is not set yet, how can recalc_sigpending() help?

> +static void cgroup_freeze_task(struct task_struct *task, bool freeze)
> +{
> +	unsigned long flags;
> +
> +	/* If the task is about to die, don't bother with freezing it. */
> +	if (!lock_task_sighand(task, &flags))
> +		return;
> +
> +	if (freeze) {
> +		task->jobctl |= JOBCTL_TRAP_FREEZE;
> +		signal_wake_up(task, false);
> +	} else {
> +		task->jobctl &= ~JOBCTL_TRAP_FREEZE;
> +		wake_up_process(task);

wake_up_interruptible() ?

>  static int ptrace_signal(int signr, kernel_siginfo_t *info)
>  {
>  	/*
> @@ -2442,6 +2483,10 @@ bool get_signal(struct ksignal *ksig)
>  		ksig->info.si_signo = signr = SIGKILL;
>  		sigdelset(&current->pending.signal, SIGKILL);
>  		recalc_sigpending();
> +		current->jobctl &= ~JOBCTL_TRAP_FREEZE;
> +		spin_unlock_irq(&sighand->siglock);
> +		if (unlikely(cgroup_task_frozen(current)))
> +			cgroup_leave_frozen(true);

Oh, and another leave_frozen below...

I feel this must be simplified somehow, but nothing comes to my mind right now.

> +		/*
> +		 * If the task is leaving the frozen state, let's update
> +		 * cgroup counters and reset the frozen bit.
> +		 */
> +		if (unlikely(cgroup_task_frozen(current))) {
>  			spin_unlock_irq(&sighand->siglock);
> +			cgroup_leave_frozen(true);
>  			goto relock;
>  		}

afaics cgroup_leave_frozen(false) makes more sense here.

Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 4/9] cgroup: cgroup v2 freezer
  2019-04-19 15:19   ` Oleg Nesterov
@ 2019-04-19 16:08     ` Oleg Nesterov
  2019-04-19 16:36       ` Roman Gushchin
  2019-04-19 16:11     ` Roman Gushchin
  1 sibling, 1 reply; 29+ messages in thread
From: Oleg Nesterov @ 2019-04-19 16:08 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Tejun Heo, kernel-team, cgroups, linux-kernel, Roman Gushchin

Anyway, I do not want to delay this feature. Even if I am right we can
cleanup this code later. I see nothing really wrong, so no objections
from me. Sorry for delay.


On 04/19, Oleg Nesterov wrote:
>
> On 04/05, Roman Gushchin wrote:
> >
> > +void cgroup_leave_frozen(bool always_leave)
> > +{
> > +	struct cgroup *cgrp;
> > +
> > +	spin_lock_irq(&css_set_lock);
> > +	cgrp = task_dfl_cgroup(current);
> > +	if (always_leave || !test_bit(CGRP_FREEZE, &cgrp->flags)) {
> > +		cgroup_dec_frozen_cnt(cgrp);
> > +		cgroup_update_frozen(cgrp);
> > +		WARN_ON_ONCE(!current->frozen);
> > +		current->frozen = false;
> > +	}
> > +	spin_unlock_irq(&css_set_lock);
> > +
> > +	if (unlikely(current->frozen)) {
> > +		/*
> > +		 * If the task remained in the frozen state,
> > +		 * make sure it won't reach userspace without
> > +		 * entering the signal handling loop.
> > +		 */
> > +		spin_lock_irq(&current->sighand->siglock);
> > +		recalc_sigpending();
> > +		spin_unlock_irq(&current->sighand->siglock);
>
> I still can't understand this logic.
>
> Once again, suppose we race with CGRP_FREEZE. If JOBCTL_TRAP_FREEZE is already
> set then signal_pending() must be already T and we do not need recalc_sigpending?
> If JOBCTL_TRAP_FREEZE is not set yet, how can recalc_sigpending() help?
>
> > +static void cgroup_freeze_task(struct task_struct *task, bool freeze)
> > +{
> > +	unsigned long flags;
> > +
> > +	/* If the task is about to die, don't bother with freezing it. */
> > +	if (!lock_task_sighand(task, &flags))
> > +		return;
> > +
> > +	if (freeze) {
> > +		task->jobctl |= JOBCTL_TRAP_FREEZE;
> > +		signal_wake_up(task, false);
> > +	} else {
> > +		task->jobctl &= ~JOBCTL_TRAP_FREEZE;
> > +		wake_up_process(task);
>
> wake_up_interruptible() ?
>
> >  static int ptrace_signal(int signr, kernel_siginfo_t *info)
> >  {
> >  	/*
> > @@ -2442,6 +2483,10 @@ bool get_signal(struct ksignal *ksig)
> >  		ksig->info.si_signo = signr = SIGKILL;
> >  		sigdelset(&current->pending.signal, SIGKILL);
> >  		recalc_sigpending();
> > +		current->jobctl &= ~JOBCTL_TRAP_FREEZE;
> > +		spin_unlock_irq(&sighand->siglock);
> > +		if (unlikely(cgroup_task_frozen(current)))
> > +			cgroup_leave_frozen(true);
>
> Oh, and another leave_frozen below...
>
> I feel this must be simplified somehow, but nothing comes to my mind right now.
>
> > +		/*
> > +		 * If the task is leaving the frozen state, let's update
> > +		 * cgroup counters and reset the frozen bit.
> > +		 */
> > +		if (unlikely(cgroup_task_frozen(current))) {
> >  			spin_unlock_irq(&sighand->siglock);
> > +			cgroup_leave_frozen(true);
> >  			goto relock;
> >  		}
>
> afaics cgroup_leave_frozen(false) makes more sense here.
>
> Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 4/9] cgroup: cgroup v2 freezer
  2019-04-19 15:19   ` Oleg Nesterov
  2019-04-19 16:08     ` Oleg Nesterov
@ 2019-04-19 16:11     ` Roman Gushchin
  2019-04-19 16:26       ` Oleg Nesterov
  1 sibling, 1 reply; 29+ messages in thread
From: Roman Gushchin @ 2019-04-19 16:11 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Roman Gushchin, Tejun Heo, Kernel Team, cgroups, linux-kernel

Hi Oleg!

On Fri, Apr 19, 2019 at 05:19:12PM +0200, Oleg Nesterov wrote:
> On 04/05, Roman Gushchin wrote:
> >
> > +void cgroup_leave_frozen(bool always_leave)
> > +{
> > +	struct cgroup *cgrp;
> > +
> > +	spin_lock_irq(&css_set_lock);
> > +	cgrp = task_dfl_cgroup(current);
> > +	if (always_leave || !test_bit(CGRP_FREEZE, &cgrp->flags)) {
> > +		cgroup_dec_frozen_cnt(cgrp);
> > +		cgroup_update_frozen(cgrp);
> > +		WARN_ON_ONCE(!current->frozen);
> > +		current->frozen = false;
> > +	}
> > +	spin_unlock_irq(&css_set_lock);
> > +
> > +	if (unlikely(current->frozen)) {
> > +		/*
> > +		 * If the task remained in the frozen state,
> > +		 * make sure it won't reach userspace without
> > +		 * entering the signal handling loop.
> > +		 */
> > +		spin_lock_irq(&current->sighand->siglock);
> > +		recalc_sigpending();
> > +		spin_unlock_irq(&current->sighand->siglock);
> 
> I still can't understand this logic.
> 
> Once again, suppose we race with CGRP_FREEZE. If JOBCTL_TRAP_FREEZE is already
> set then signal_pending() must be already T and we do not need recalc_sigpending?
> If JOBCTL_TRAP_FREEZE is not set yet, how can recalc_sigpending() help?

This is paired with cgroup_task_frozen() check in recalc_sigpending_tsk().
If the task is waking from waiting in vfork(), and it races with
unfreezing of the cgroup, we should guarantee that the task won't
return to userspace with task->frozen flag set, otherwise it would break
accounting of frozen tasks.
We can't rely solely on JOBCTL_TRAP_FREEZE bit, as it can be cleared
in parallel at any moment. So we backup it with the task->frozen check.

> 
> > +static void cgroup_freeze_task(struct task_struct *task, bool freeze)
> > +{
> > +	unsigned long flags;
> > +
> > +	/* If the task is about to die, don't bother with freezing it. */
> > +	if (!lock_task_sighand(task, &flags))
> > +		return;
> > +
> > +	if (freeze) {
> > +		task->jobctl |= JOBCTL_TRAP_FREEZE;
> > +		signal_wake_up(task, false);
> > +	} else {
> > +		task->jobctl &= ~JOBCTL_TRAP_FREEZE;
> > +		wake_up_process(task);
> 
> wake_up_interruptible() ?

Wait_up_interruptible() is supposed to work with a workqueue,
but here there is nothing like this. Probably, I didn't understand your idea.
Can you, please, elaborate a bit more?

> 
> >  static int ptrace_signal(int signr, kernel_siginfo_t *info)
> >  {
> >  	/*
> > @@ -2442,6 +2483,10 @@ bool get_signal(struct ksignal *ksig)
> >  		ksig->info.si_signo = signr = SIGKILL;
> >  		sigdelset(&current->pending.signal, SIGKILL);
> >  		recalc_sigpending();
> > +		current->jobctl &= ~JOBCTL_TRAP_FREEZE;
> > +		spin_unlock_irq(&sighand->siglock);
> > +		if (unlikely(cgroup_task_frozen(current)))
> > +			cgroup_leave_frozen(true);
> 
> Oh, and another leave_frozen below...

Yeah, because of this new "goto fatal" shortcut.

> 
> I feel this must be simplified somehow, but nothing comes to my mind right now.
> 
> > +		/*
> > +		 * If the task is leaving the frozen state, let's update
> > +		 * cgroup counters and reset the frozen bit.
> > +		 */
> > +		if (unlikely(cgroup_task_frozen(current))) {
> >  			spin_unlock_irq(&sighand->siglock);
> > +			cgroup_leave_frozen(true);
> >  			goto relock;
> >  		}
> 
> afaics cgroup_leave_frozen(false) makes more sense here.

Why? I don't see any reasons why the task should remain in the frozen
state after this point. Can you, please, provide an example?

Thank you for looking into it!

Roman

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 4/9] cgroup: cgroup v2 freezer
  2019-04-19 16:11     ` Roman Gushchin
@ 2019-04-19 16:26       ` Oleg Nesterov
  2019-04-19 16:56         ` Roman Gushchin
  0 siblings, 1 reply; 29+ messages in thread
From: Oleg Nesterov @ 2019-04-19 16:26 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Roman Gushchin, Tejun Heo, Kernel Team, cgroups, linux-kernel

On 04/19, Roman Gushchin wrote:
>
> > Once again, suppose we race with CGRP_FREEZE. If JOBCTL_TRAP_FREEZE is already
> > set then signal_pending() must be already T and we do not need recalc_sigpending?
> > If JOBCTL_TRAP_FREEZE is not set yet, how can recalc_sigpending() help?
>
> This is paired with cgroup_task_frozen() check in recalc_sigpending_tsk().

Ooh, I didn't notice this version added cgroup_task_frozen() into
recalc_sigpending_tsk() ...

Honestly, I don't like this. But see another email I sent, we can cleanup
this code later.

> > > +static void cgroup_freeze_task(struct task_struct *task, bool freeze)
> > > +{
> > > +	unsigned long flags;
> > > +
> > > +	/* If the task is about to die, don't bother with freezing it. */
> > > +	if (!lock_task_sighand(task, &flags))
> > > +		return;
> > > +
> > > +	if (freeze) {
> > > +		task->jobctl |= JOBCTL_TRAP_FREEZE;
> > > +		signal_wake_up(task, false);
> > > +	} else {
> > > +		task->jobctl &= ~JOBCTL_TRAP_FREEZE;
> > > +		wake_up_process(task);
> >
> > wake_up_interruptible() ?
>
> Wait_up_interruptible() is supposed to work with a workqueue,
> but here there is nothing like this. Probably, I didn't understand your idea.
> Can you, please, elaborate a bit more?

Not sure I understand... We need to wake up the task if it sleeps in
do_freezer_trap(), right? do_freezer_trap() uses TASK_INTERRUPTIBLE, so
why can't wake_up_interruptible() == __wake_up(TASK_INTERRUPTIBLE) work?

> > >  static int ptrace_signal(int signr, kernel_siginfo_t *info)
> > >  {
> > >  	/*
> > > @@ -2442,6 +2483,10 @@ bool get_signal(struct ksignal *ksig)
> > >  		ksig->info.si_signo = signr = SIGKILL;
> > >  		sigdelset(&current->pending.signal, SIGKILL);
> > >  		recalc_sigpending();
> > > +		current->jobctl &= ~JOBCTL_TRAP_FREEZE;
> > > +		spin_unlock_irq(&sighand->siglock);
> > > +		if (unlikely(cgroup_task_frozen(current)))
> > > +			cgroup_leave_frozen(true);
> >
> > Oh, and another leave_frozen below...
>
> Yeah, because of this new "goto fatal" shortcut.

I understand, but the code doesn't look nice... but again, I can't suggest
anything better at least right now, so please forget.

> > > +		if (unlikely(cgroup_task_frozen(current))) {
> > >  			spin_unlock_irq(&sighand->siglock);
> > > +			cgroup_leave_frozen(true);
> > >  			goto relock;
> > >  		}
> >
> > afaics cgroup_leave_frozen(false) makes more sense here.
>
> Why? I don't see any reasons why the task should remain in the frozen
> state after this point.

But cgroup_leave_frozen(false) will equally clear ->frozen if !CGRP_FREEZE ?
OTOH, if CGRP_FREEZE is set again, why do we need to clear ->frozen?

Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 4/9] cgroup: cgroup v2 freezer
  2019-04-19 16:08     ` Oleg Nesterov
@ 2019-04-19 16:36       ` Roman Gushchin
  0 siblings, 0 replies; 29+ messages in thread
From: Roman Gushchin @ 2019-04-19 16:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Roman Gushchin, Tejun Heo, Kernel Team, cgroups, linux-kernel

On Fri, Apr 19, 2019 at 06:08:06PM +0200, Oleg Nesterov wrote:
> Anyway, I do not want to delay this feature. Even if I am right we can
> cleanup this code later. I see nothing really wrong, so no objections
> from me. Sorry for delay.

This is great!

Thank you so much for your time spent reviewing all these iterations!
I hope the patchset is in a much better state now.

Roman

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 4/9] cgroup: cgroup v2 freezer
  2019-04-19 16:26       ` Oleg Nesterov
@ 2019-04-19 16:56         ` Roman Gushchin
  2019-04-20 10:58           ` Oleg Nesterov
  0 siblings, 1 reply; 29+ messages in thread
From: Roman Gushchin @ 2019-04-19 16:56 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Roman Gushchin, Tejun Heo, Kernel Team, cgroups, linux-kernel

On Fri, Apr 19, 2019 at 06:26:00PM +0200, Oleg Nesterov wrote:
> On 04/19, Roman Gushchin wrote:
> >
> > > Once again, suppose we race with CGRP_FREEZE. If JOBCTL_TRAP_FREEZE is already
> > > set then signal_pending() must be already T and we do not need recalc_sigpending?
> > > If JOBCTL_TRAP_FREEZE is not set yet, how can recalc_sigpending() help?
> >
> > This is paired with cgroup_task_frozen() check in recalc_sigpending_tsk().
> 
> Ooh, I didn't notice this version added cgroup_task_frozen() into
> recalc_sigpending_tsk() ...
> 
> Honestly, I don't like this. But see another email I sent, we can cleanup
> this code later.

Yeah, totally agree: it's not pretty. But honestly I've no better ideas,
so let's fix it later.

> 
> > > > +static void cgroup_freeze_task(struct task_struct *task, bool freeze)
> > > > +{
> > > > +	unsigned long flags;
> > > > +
> > > > +	/* If the task is about to die, don't bother with freezing it. */
> > > > +	if (!lock_task_sighand(task, &flags))
> > > > +		return;
> > > > +
> > > > +	if (freeze) {
> > > > +		task->jobctl |= JOBCTL_TRAP_FREEZE;
> > > > +		signal_wake_up(task, false);
> > > > +	} else {
> > > > +		task->jobctl &= ~JOBCTL_TRAP_FREEZE;
> > > > +		wake_up_process(task);
> > >
> > > wake_up_interruptible() ?
> >
> > Wait_up_interruptible() is supposed to work with a workqueue,
> > but here there is nothing like this. Probably, I didn't understand your idea.
> > Can you, please, elaborate a bit more?
> 
> Not sure I understand... We need to wake up the task if it sleeps in
> do_freezer_trap(), right? do_freezer_trap() uses TASK_INTERRUPTIBLE, so
> why can't wake_up_interruptible() == __wake_up(TASK_INTERRUPTIBLE) work?

Right, but __wake_up is supposed to wake threads blocked on a waitqueue:

    /**
     * __wake_up - wake up threads blocked on a waitqueue.
     * @wq_head: the waitqueue
     * @mode: which threads
     * @nr_exclusive: how many wake-one or wake-many threads to wake up
     * @key: is directly passed to the wakeup function
     *
     * If this function wakes up a task, it executes a full memory barrier before
     * accessing the task state.
     */
    void __wake_up(struct wait_queue_head *wq_head, unsigned int mode,
			int nr_exclusive, void *key)

What should I pass as wq_head?

> 
> > > >  static int ptrace_signal(int signr, kernel_siginfo_t *info)
> > > >  {
> > > >  	/*
> > > > @@ -2442,6 +2483,10 @@ bool get_signal(struct ksignal *ksig)
> > > >  		ksig->info.si_signo = signr = SIGKILL;
> > > >  		sigdelset(&current->pending.signal, SIGKILL);
> > > >  		recalc_sigpending();
> > > > +		current->jobctl &= ~JOBCTL_TRAP_FREEZE;
> > > > +		spin_unlock_irq(&sighand->siglock);
> > > > +		if (unlikely(cgroup_task_frozen(current)))
> > > > +			cgroup_leave_frozen(true);
> > >
> > > Oh, and another leave_frozen below...
> >
> > Yeah, because of this new "goto fatal" shortcut.
> 
> I understand, but the code doesn't look nice... but again, I can't suggest
> anything better at least right now, so please forget.
> 
> > > > +		if (unlikely(cgroup_task_frozen(current))) {
> > > >  			spin_unlock_irq(&sighand->siglock);
> > > > +			cgroup_leave_frozen(true);
> > > >  			goto relock;
> > > >  		}
> > >
> > > afaics cgroup_leave_frozen(false) makes more sense here.
> >
> > Why? I don't see any reasons why the task should remain in the frozen
> > state after this point.
> 
> But cgroup_leave_frozen(false) will equally clear ->frozen if !CGRP_FREEZE ?
> OTOH, if CGRP_FREEZE is set again, why do we need to clear ->frozen?

Hm, it might work too, but I'm not sure I like it more. IMO, the best option
is to have a single cgroup_leave_frozen(true) in signal.c, it's just simpler.
If a user changed the desired state of cgroup twice, there is no need to avoid
state transitions. Or maybe I don't see it yet.

Thank you!

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 0/9] freezer for cgroup v2
  2019-04-05 17:46 [PATCH v10 0/9] freezer for cgroup v2 Roman Gushchin
                   ` (8 preceding siblings ...)
  2019-04-05 17:47 ` [PATCH v10 9/9] cgroup: document cgroup v2 freezer interface Roman Gushchin
@ 2019-04-19 18:29 ` Tejun Heo
  9 siblings, 0 replies; 29+ messages in thread
From: Tejun Heo @ 2019-04-19 18:29 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Oleg Nesterov, kernel-team, cgroups, linux-kernel, Roman Gushchin

On Fri, Apr 05, 2019 at 10:46:59AM -0700, Roman Gushchin wrote:
> This patchset implements freezer for cgroup v2.
> 
> It provides similar functionality as v1 freezer, but the interface
> conforms to the cgroup v2 interface design principles, and it
> provides a better user experience: tasks can be killed, ptrace works,
> there is no separate controller, which has to be enabled, etc.
> 
> Patches (1), (2) and (3) are some preparational work, patch (4) contains
> the implementation, patch (5) is a small cgroup kselftest fix,
> patch (6) covers freezer adds 6 new kselftests to cover the freezer
> functionality. Patchse (7) and (8) adding tracing points to simplify
> the debugging process. Patch (9) adds corresponding docs.

All patches applied to cgroup/for-5.2 w/ Oleg's
No-objection-from-me-by added to (4).

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 4/9] cgroup: cgroup v2 freezer
  2019-04-19 16:56         ` Roman Gushchin
@ 2019-04-20 10:58           ` Oleg Nesterov
  2019-04-22 22:11             ` Roman Gushchin
  0 siblings, 1 reply; 29+ messages in thread
From: Oleg Nesterov @ 2019-04-20 10:58 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Roman Gushchin, Tejun Heo, Kernel Team, cgroups, linux-kernel

On 04/19, Roman Gushchin wrote:
>
> > > >
> > > > wake_up_interruptible() ?
> > >
> > > Wait_up_interruptible() is supposed to work with a workqueue,
> > > but here there is nothing like this. Probably, I didn't understand your idea.
> > > Can you, please, elaborate a bit more?
> >
> > Not sure I understand... We need to wake up the task if it sleeps in
> > do_freezer_trap(), right? do_freezer_trap() uses TASK_INTERRUPTIBLE, so
> > why can't wake_up_interruptible() == __wake_up(TASK_INTERRUPTIBLE) work?
>
> Right, but __wake_up is supposed to wake threads blocked on a waitqueue:

Ugh sorry ;) of course I meant wake_up_state(task, TASK_INTERRUPTIBLE).

> > > > > +		if (unlikely(cgroup_task_frozen(current))) {
> > > > >  			spin_unlock_irq(&sighand->siglock);
> > > > > +			cgroup_leave_frozen(true);
> > > > >  			goto relock;
> > > > >  		}
> > > >
> > > > afaics cgroup_leave_frozen(false) makes more sense here.
> > >
> > > Why? I don't see any reasons why the task should remain in the frozen
> > > state after this point.
> >
> > But cgroup_leave_frozen(false) will equally clear ->frozen if !CGRP_FREEZE ?
> > OTOH, if CGRP_FREEZE is set again, why do we need to clear ->frozen?
>
> Hm, it might work too, but I'm not sure I like it more. IMO, the best option
> is to have a single cgroup_leave_frozen(true) in signal.c, it's just simpler.
> If a user changed the desired state of cgroup twice, there is no need to avoid
> state transitions. Or maybe I don't see it yet.

Then why do we need cgroup_leave_frozen(false) in wait_for_vfork_done() ? How
does it differ from get_signal() ?

If nothing else. Suppose that wait_for_vfork_done() calls leave(false) and this
races with freezer, CGRP_FREEZE is already set but JOBCTL_TRAP_FREEZE is not.

This sets TIF_SIGPENDING to ensure the task won't return to user mode, thus it
calls get_signal().

get_signal() doesn't see JOBCTL_TRAP_FREEZE, it notices ->frozen == T and does
cgroup_leave_frozen(true) which clears ->frozen.

Then the task calls dequeue_signal(), clears TIF_SIGPENDING and returns to user
mode?

Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 4/9] cgroup: cgroup v2 freezer
  2019-04-20 10:58           ` Oleg Nesterov
@ 2019-04-22 22:11             ` Roman Gushchin
  2019-04-24 15:46               ` Oleg Nesterov
  0 siblings, 1 reply; 29+ messages in thread
From: Roman Gushchin @ 2019-04-22 22:11 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Roman Gushchin, Tejun Heo, Kernel Team, cgroups, linux-kernel

On Sat, Apr 20, 2019 at 12:58:38PM +0200, Oleg Nesterov wrote:
> On 04/19, Roman Gushchin wrote:
> >
> > > > >
> > > > > wake_up_interruptible() ?
> > > >
> > > > Wait_up_interruptible() is supposed to work with a workqueue,
> > > > but here there is nothing like this. Probably, I didn't understand your idea.
> > > > Can you, please, elaborate a bit more?
> > >
> > > Not sure I understand... We need to wake up the task if it sleeps in
> > > do_freezer_trap(), right? do_freezer_trap() uses TASK_INTERRUPTIBLE, so
> > > why can't wake_up_interruptible() == __wake_up(TASK_INTERRUPTIBLE) work?
> >
> > Right, but __wake_up is supposed to wake threads blocked on a waitqueue:
> 
> Ugh sorry ;) of course I meant wake_up_state(task, TASK_INTERRUPTIBLE).

Agh, then it makes total sense to me. I'll master a follow-up patch.

> 
> > > > > > +		if (unlikely(cgroup_task_frozen(current))) {
> > > > > >  			spin_unlock_irq(&sighand->siglock);
> > > > > > +			cgroup_leave_frozen(true);
> > > > > >  			goto relock;
> > > > > >  		}
> > > > >
> > > > > afaics cgroup_leave_frozen(false) makes more sense here.
> > > >
> > > > Why? I don't see any reasons why the task should remain in the frozen
> > > > state after this point.
> > >
> > > But cgroup_leave_frozen(false) will equally clear ->frozen if !CGRP_FREEZE ?
> > > OTOH, if CGRP_FREEZE is set again, why do we need to clear ->frozen?
> >
> > Hm, it might work too, but I'm not sure I like it more. IMO, the best option
> > is to have a single cgroup_leave_frozen(true) in signal.c, it's just simpler.
> > If a user changed the desired state of cgroup twice, there is no need to avoid
> > state transitions. Or maybe I don't see it yet.
> 
> Then why do we need cgroup_leave_frozen(false) in wait_for_vfork_done() ? How
> does it differ from get_signal() ?

We need it because sleeping in vfork is a special state which we want to
account as frozen. And if the parent process wakes up while the cgroup is frozen
(because of the child death, for example), we want to push it into the "proper"
frozen state without changing the state of the cgroup.

> 
> If nothing else. Suppose that wait_for_vfork_done() calls leave(false) and this
> races with freezer, CGRP_FREEZE is already set but JOBCTL_TRAP_FREEZE is not.
> 
> This sets TIF_SIGPENDING to ensure the task won't return to user mode, thus it
> calls get_signal().
> 
> get_signal() doesn't see JOBCTL_TRAP_FREEZE, it notices ->frozen == T and does
> cgroup_leave_frozen(true) which clears ->frozen.
> 
> Then the task calls dequeue_signal(), clears TIF_SIGPENDING and returns to user
> mode?

Got it, a good catch! So if the freezer races with vfork() completion, we might
have a spurious frozen->unfrozen->frozen transition of the cgroup state.

Switching to cgroup_leave_frozen(false) seems to solve it, but I'm slightly
concerned that we're basically putting the task in a busy loop between
the setting CGRP_FREEZE and setting TRAP_FREEZE. Do you think it's ok?
I wonder if there are better solutions.

Thank you!

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 4/9] cgroup: cgroup v2 freezer
  2019-04-22 22:11             ` Roman Gushchin
@ 2019-04-24 15:46               ` Oleg Nesterov
  2019-04-24 22:06                 ` Roman Gushchin
  0 siblings, 1 reply; 29+ messages in thread
From: Oleg Nesterov @ 2019-04-24 15:46 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Roman Gushchin, Tejun Heo, Kernel Team, cgroups, linux-kernel

On 04/22, Roman Gushchin wrote:
>
> > > Hm, it might work too, but I'm not sure I like it more. IMO, the best option
> > > is to have a single cgroup_leave_frozen(true) in signal.c, it's just simpler.
> > > If a user changed the desired state of cgroup twice, there is no need to avoid
> > > state transitions. Or maybe I don't see it yet.
> >
> > Then why do we need cgroup_leave_frozen(false) in wait_for_vfork_done() ? How
> > does it differ from get_signal() ?
>
> We need it because sleeping in vfork is a special state which we want to
> account as frozen. And if the parent process wakes up while the cgroup is frozen
> (because of the child death, for example), we want to push it into the "proper"
> frozen state without changing the state of the cgroup.

Again, I do not see how vfork() differs from get_signal() in this respect.

Let me provide another example. A TASK_STOPPED task reacts to SIGCONT and
returns to get_signal(), current->frozen is true.

If this races with CGRP_FREEZE, the task should not return to user-space,
just like vfork(). I see no difference.

They differ in that wait_for_vfork_done() should guarentee TIF_SIGPENDING
in this case, but this is another story...

>
> > If nothing else. Suppose that wait_for_vfork_done() calls leave(false) and this
> > races with freezer, CGRP_FREEZE is already set but JOBCTL_TRAP_FREEZE is not.
> >
> > This sets TIF_SIGPENDING to ensure the task won't return to user mode, thus it
> > calls get_signal().
> >
> > get_signal() doesn't see JOBCTL_TRAP_FREEZE, it notices ->frozen == T and does
> > cgroup_leave_frozen(true) which clears ->frozen.
> >
> > Then the task calls dequeue_signal(), clears TIF_SIGPENDING and returns to user
> > mode?
>
> Got it, a good catch! So if the freezer races with vfork() completion, we might
> have a spurious frozen->unfrozen->frozen transition of the cgroup state.
>
> Switching to cgroup_leave_frozen(false) seems to solve it, but I'm slightly
> concerned that we're basically putting the task in a busy loop between
> the setting CGRP_FREEZE and setting TRAP_FREEZE.

Yes, yes. Didn't I say I dislike the new ->frozen check in recalc() ? ;)

OK, how about the ABSOLUTELY UNTESTED patch below? For the start.

Oleg.

diff --git a/kernel/cgroup/freezer.c b/kernel/cgroup/freezer.c
index 3bfbb3c..b5ccc87 100644
--- a/kernel/cgroup/freezer.c
+++ b/kernel/cgroup/freezer.c
@@ -132,26 +132,21 @@ void cgroup_leave_frozen(bool always_leave)
 {
 	struct cgroup *cgrp;
 
+	WARN_ON_ONCE(!current->frozen);
+
 	spin_lock_irq(&css_set_lock);
 	cgrp = task_dfl_cgroup(current);
 	if (always_leave || !test_bit(CGRP_FREEZE, &cgrp->flags)) {
 		cgroup_dec_frozen_cnt(cgrp);
 		cgroup_update_frozen(cgrp);
-		WARN_ON_ONCE(!current->frozen);
 		current->frozen = false;
+	} else if (!(current->jobctl & JOBCTL_TRAP_FREEZE)) {
+		spin_lock(&current->sighand->siglock);
+		current->jobctl |= JOBCTL_TRAP_FREEZE;
+		set_thread_flag(TIF_SIGPENDING);
+		spin_unlock(&current->sighand->siglock);
 	}
 	spin_unlock_irq(&css_set_lock);
-
-	if (unlikely(current->frozen)) {
-		/*
-		 * If the task remained in the frozen state,
-		 * make sure it won't reach userspace without
-		 * entering the signal handling loop.
-		 */
-		spin_lock_irq(&current->sighand->siglock);
-		recalc_sigpending();
-		spin_unlock_irq(&current->sighand->siglock);
-	}
 }
 
 /*
diff --git a/kernel/signal.c b/kernel/signal.c
index e46d527..155b273 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2515,7 +2515,7 @@ bool get_signal(struct ksignal *ksig)
 		 */
 		if (unlikely(cgroup_task_frozen(current))) {
 			spin_unlock_irq(&sighand->siglock);
-			cgroup_leave_frozen(true);
+			cgroup_leave_frozen(false);
 			goto relock;
 		}
 



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 4/9] cgroup: cgroup v2 freezer
  2019-04-05 17:47 ` [PATCH v10 4/9] cgroup: cgroup v2 freezer Roman Gushchin
  2019-04-19 15:19   ` Oleg Nesterov
@ 2019-04-24 16:02   ` Oleg Nesterov
  2019-04-24 22:10     ` Roman Gushchin
  1 sibling, 1 reply; 29+ messages in thread
From: Oleg Nesterov @ 2019-04-24 16:02 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Tejun Heo, kernel-team, cgroups, linux-kernel, Roman Gushchin

On 04/05, Roman Gushchin wrote:
>
> @@ -5830,6 +5926,12 @@ void cgroup_exit(struct task_struct *tsk)
>  		spin_lock_irq(&css_set_lock);
>  		css_set_move_task(tsk, cset, NULL, false);
>  		cset->nr_tasks--;
> +
> +		if (unlikely(cgroup_task_frozen(tsk)))
> +			cgroup_freezer_frozen_exit(tsk);
> +		else if (unlikely(cgroup_task_freeze(tsk)))
> +			cgroup_update_frozen(task_dfl_cgroup(tsk));
> +

Now that I actually applied these patches, I can't understand how can
cgroup_exit() hit cgroup_task_frozen(current)...

A cgroup_task_frozen() task must never return to user-mode or escape
from get_signal() without leave_frozen() or we have a bug?

I must have missed something, could you explain why the patch below is
wrong?

Oleg.


diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 3e2efd4..24b8757 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -889,7 +889,6 @@ void cgroup_update_frozen(struct cgroup *cgrp);
 void cgroup_freeze(struct cgroup *cgrp, bool freeze);
 void cgroup_freezer_migrate_task(struct task_struct *task, struct cgroup *src,
 				 struct cgroup *dst);
-void cgroup_freezer_frozen_exit(struct task_struct *task);
 static inline bool cgroup_task_freeze(struct task_struct *task)
 {
 	bool ret;
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 6f09f9b..3d5f76a 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6008,11 +6008,8 @@ void cgroup_exit(struct task_struct *tsk)
 		css_set_move_task(tsk, cset, NULL, false);
 		cset->nr_tasks--;
 
-		if (unlikely(cgroup_task_frozen(tsk)))
-			cgroup_freezer_frozen_exit(tsk);
-		else if (unlikely(cgroup_task_freeze(tsk)))
+		if (unlikely(cgroup_task_freeze(tsk)))
 			cgroup_update_frozen(task_dfl_cgroup(tsk));
-
 		spin_unlock_irq(&css_set_lock);
 	} else {
 		get_css_set(cset);
diff --git a/kernel/cgroup/freezer.c b/kernel/cgroup/freezer.c
index b5ccc87..68aa506 100644
--- a/kernel/cgroup/freezer.c
+++ b/kernel/cgroup/freezer.c
@@ -249,16 +249,6 @@ void cgroup_freezer_migrate_task(struct task_struct *task,
 	cgroup_freeze_task(task, test_bit(CGRP_FREEZE, &dst->flags));
 }
 
-void cgroup_freezer_frozen_exit(struct task_struct *task)
-{
-	struct cgroup *cgrp = task_dfl_cgroup(task);
-
-	lockdep_assert_held(&css_set_lock);
-
-	cgroup_dec_frozen_cnt(cgrp);
-	cgroup_update_frozen(cgrp);
-}
-
 void cgroup_freeze(struct cgroup *cgrp, bool freeze)
 {
 	struct cgroup_subsys_state *css;


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 4/9] cgroup: cgroup v2 freezer
  2019-04-24 15:46               ` Oleg Nesterov
@ 2019-04-24 22:06                 ` Roman Gushchin
  2019-04-26 17:40                   ` Oleg Nesterov
  0 siblings, 1 reply; 29+ messages in thread
From: Roman Gushchin @ 2019-04-24 22:06 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Roman Gushchin, Tejun Heo, Kernel Team, cgroups, linux-kernel

On Wed, Apr 24, 2019 at 05:46:19PM +0200, Oleg Nesterov wrote:
> On 04/22, Roman Gushchin wrote:
> >
> > > > Hm, it might work too, but I'm not sure I like it more. IMO, the best option
> > > > is to have a single cgroup_leave_frozen(true) in signal.c, it's just simpler.
> > > > If a user changed the desired state of cgroup twice, there is no need to avoid
> > > > state transitions. Or maybe I don't see it yet.
> > >
> > > Then why do we need cgroup_leave_frozen(false) in wait_for_vfork_done() ? How
> > > does it differ from get_signal() ?
> >
> > We need it because sleeping in vfork is a special state which we want to
> > account as frozen. And if the parent process wakes up while the cgroup is frozen
> > (because of the child death, for example), we want to push it into the "proper"
> > frozen state without changing the state of the cgroup.
> 
> Again, I do not see how vfork() differs from get_signal() in this respect.
> 
> Let me provide another example. A TASK_STOPPED task reacts to SIGCONT and
> returns to get_signal(), current->frozen is true.
> 
> If this races with CGRP_FREEZE, the task should not return to user-space,
> just like vfork(). I see no difference.
> 
> They differ in that wait_for_vfork_done() should guarentee TIF_SIGPENDING
> in this case, but this is another story...

Right, I agree.

> 
> >
> > > If nothing else. Suppose that wait_for_vfork_done() calls leave(false) and this
> > > races with freezer, CGRP_FREEZE is already set but JOBCTL_TRAP_FREEZE is not.
> > >
> > > This sets TIF_SIGPENDING to ensure the task won't return to user mode, thus it
> > > calls get_signal().
> > >
> > > get_signal() doesn't see JOBCTL_TRAP_FREEZE, it notices ->frozen == T and does
> > > cgroup_leave_frozen(true) which clears ->frozen.
> > >
> > > Then the task calls dequeue_signal(), clears TIF_SIGPENDING and returns to user
> > > mode?
> >
> > Got it, a good catch! So if the freezer races with vfork() completion, we might
> > have a spurious frozen->unfrozen->frozen transition of the cgroup state.
> >
> > Switching to cgroup_leave_frozen(false) seems to solve it, but I'm slightly
> > concerned that we're basically putting the task in a busy loop between
> > the setting CGRP_FREEZE and setting TRAP_FREEZE.
> 
> Yes, yes. Didn't I say I dislike the new ->frozen check in recalc() ? ;)
> 
> OK, how about the ABSOLUTELY UNTESTED patch below? For the start.

It looks good to me (and all freezer selftests pass).

Just to be sure, is it a solution to avoid the busy loop in the signal handling
loop, right? Because it doesn't allow to drop the ->frozen check from recalc().

The JOBCTL_TRAP_FREEZE check without siglock initially looked dangerous to me,
but after some thoughts I didn't find any case when it's wrong.

Do you prefer me to master a patch or to do it by yourself?

Thank you!

Roman

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 4/9] cgroup: cgroup v2 freezer
  2019-04-24 16:02   ` Oleg Nesterov
@ 2019-04-24 22:10     ` Roman Gushchin
  2019-04-26 17:30       ` Oleg Nesterov
  0 siblings, 1 reply; 29+ messages in thread
From: Roman Gushchin @ 2019-04-24 22:10 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Roman Gushchin, Tejun Heo, Kernel Team, cgroups, linux-kernel

On Wed, Apr 24, 2019 at 06:02:38PM +0200, Oleg Nesterov wrote:
> On 04/05, Roman Gushchin wrote:
> >
> > @@ -5830,6 +5926,12 @@ void cgroup_exit(struct task_struct *tsk)
> >  		spin_lock_irq(&css_set_lock);
> >  		css_set_move_task(tsk, cset, NULL, false);
> >  		cset->nr_tasks--;
> > +
> > +		if (unlikely(cgroup_task_frozen(tsk)))
> > +			cgroup_freezer_frozen_exit(tsk);
> > +		else if (unlikely(cgroup_task_freeze(tsk)))
> > +			cgroup_update_frozen(task_dfl_cgroup(tsk));
> > +

Hello, Oleg!

A good catch! It's actually a leftover from one of the previous versions
of the patchset. In the current iteration I'd replace it with:

	WARN_ON_ONCE(cgroup_task_frozen(tsk));

just to make sure we're not leaking the frozen bit.
Do you agree?

Thank you!

Roman

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 4/9] cgroup: cgroup v2 freezer
  2019-04-24 22:10     ` Roman Gushchin
@ 2019-04-26 17:30       ` Oleg Nesterov
  0 siblings, 0 replies; 29+ messages in thread
From: Oleg Nesterov @ 2019-04-26 17:30 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Roman Gushchin, Tejun Heo, Kernel Team, cgroups, linux-kernel

On 04/24, Roman Gushchin wrote:
>
> of the patchset. In the current iteration I'd replace it with:
> 
> 	WARN_ON_ONCE(cgroup_task_frozen(tsk));
> 
> just to make sure we're not leaking the frozen bit.
> Do you agree?

Yes sure,

Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 4/9] cgroup: cgroup v2 freezer
  2019-04-24 22:06                 ` Roman Gushchin
@ 2019-04-26 17:40                   ` Oleg Nesterov
  0 siblings, 0 replies; 29+ messages in thread
From: Oleg Nesterov @ 2019-04-26 17:40 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Roman Gushchin, Tejun Heo, Kernel Team, cgroups, linux-kernel

On 04/24, Roman Gushchin wrote:
>
> On Wed, Apr 24, 2019 at 05:46:19PM +0200, Oleg Nesterov wrote:
> >
> > OK, how about the ABSOLUTELY UNTESTED patch below? For the start.
>
> It looks good to me (and all freezer selftests pass).
>
> Just to be sure, is it a solution to avoid the busy loop in the signal handling
> loop, right?

Yes,

> Because it doesn't allow to drop the ->frozen check from recalc().

Yes. Because we can race with unfreeze after leave_frozen().

> The JOBCTL_TRAP_FREEZE check without siglock initially looked dangerous to me,
> but after some thoughts I didn't find any case when it's wrong.

I think this is fine... Yes, JOBCTL_TRAP_FREEZE can be already set when we take
siglock, but I don't think we need to recheck this flag.

The only important thing (afaics) is that CGRP_FREEZE is stable under css_set_lock,
so we can't wrongly set TRAP_FREEZE.

> Do you prefer me to master a patch

Yes please ;)

Oleg.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 6/9] kselftests: cgroup: add freezer controller self-tests
  2019-04-05 17:47 ` [PATCH v10 6/9] kselftests: cgroup: add freezer controller self-tests Roman Gushchin
@ 2019-07-16 14:48   ` Naresh Kamboju
  2019-07-17  0:49     ` Roman Gushchin
  0 siblings, 1 reply; 29+ messages in thread
From: Naresh Kamboju @ 2019-07-16 14:48 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Tejun Heo, Oleg Nesterov, kernel-team, cgroups, open list,
	Roman Gushchin, Shuah Khan, open list:KERNEL SELFTEST FRAMEWORK,
	lkft-triage

Hi Roman,

Just want to share information here on what we notice on running this test case,

On Fri, 5 Apr 2019 at 23:17, Roman Gushchin <guroan@gmail.com> wrote:
>
> This patch implements 9 tests for the freezer controller for
> cgroup v2:
...
> 6) ptrace test: the test checks that it's possible to attach to
> a process in a frozen cgroup, get some information and detach, and
> the cgroup will remain frozen.

selftests cgroup test_freezer failed because of the sys entry path not found.
 Cgroup /sys/fs/cgroup/unified/cg_test_ptrace isn't frozen
/sys/fs/cgroup/unified/cg_test_ptrace: isn't_frozen #
# not ok 6 test_cgfreezer_ptrace

This test case fails intermittently.

Test output:
-------------
# selftests cgroup test_freezer
cgroup: test_freezer_ #
# Cgroup /sys/fs/cgroup/unified/cg_test_ptrace isn't frozen
/sys/fs/cgroup/unified/cg_test_ptrace: isn't_frozen #
# ok 1 test_cgfreezer_simple
1: test_cgfreezer_simple_ #
# ok 2 test_cgfreezer_tree
2: test_cgfreezer_tree_ #
# ok 3 test_cgfreezer_forkbomb
3: test_cgfreezer_forkbomb_ #
# ok 4 test_cgfreezer_rmdir
4: test_cgfreezer_rmdir_ #
# ok 5 test_cgfreezer_migrate
5: test_cgfreezer_migrate_ #
# not ok 6 test_cgfreezer_ptrace
ok: 6_test_cgfreezer_ptrace #
# ok 7 test_cgfreezer_stopped
7: test_cgfreezer_stopped_ #
# ok 8 test_cgfreezer_ptraced
8: test_cgfreezer_ptraced_ #
# ok 9 test_cgfreezer_vfork
9: test_cgfreezer_vfork_ #
[FAIL] 3 selftests cgroup test_freezer
selftests: cgroup_test_freezer [FAIL]

Test results link,
https://qa-reports.linaro.org/lkft/linux-mainline-oe/tests/kselftest/cgroup_test_freezer?page=1

- Naresh

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 6/9] kselftests: cgroup: add freezer controller self-tests
  2019-07-16 14:48   ` Naresh Kamboju
@ 2019-07-17  0:49     ` Roman Gushchin
  2019-07-17  8:56       ` Naresh Kamboju
  0 siblings, 1 reply; 29+ messages in thread
From: Roman Gushchin @ 2019-07-17  0:49 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: Roman Gushchin, Tejun Heo, Oleg Nesterov, Kernel Team, cgroups,
	open list, Shuah Khan, open list:KERNEL SELFTEST FRAMEWORK,
	lkft-triage

On Tue, Jul 16, 2019 at 08:18:39PM +0530, Naresh Kamboju wrote:
> Hi Roman,
> 
> Just want to share information here on what we notice on running this test case,

Hi Naresh!

Thank you for the report!

The interaction between ptrace and freezer is complicated
(as ptrace in general),so there are known cases when spurious
cgroup transitions (frozen <-> non frozen <-> frozen ) can happen.

Does it create any difficulties? If so, I'll send a diff to ignore
the result of this test for now.

Thank you!

> 
> On Fri, 5 Apr 2019 at 23:17, Roman Gushchin <guroan@gmail.com> wrote:
> >
> > This patch implements 9 tests for the freezer controller for
> > cgroup v2:
> ...
> > 6) ptrace test: the test checks that it's possible to attach to
> > a process in a frozen cgroup, get some information and detach, and
> > the cgroup will remain frozen.
> 
> selftests cgroup test_freezer failed because of the sys entry path not found.

Can you, please, elaborate on this?

>  Cgroup /sys/fs/cgroup/unified/cg_test_ptrace isn't frozen
> /sys/fs/cgroup/unified/cg_test_ptrace: isn't_frozen #
> # not ok 6 test_cgfreezer_ptrace
> 
> This test case fails intermittently.

Thank you!

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 6/9] kselftests: cgroup: add freezer controller self-tests
  2019-07-17  0:49     ` Roman Gushchin
@ 2019-07-17  8:56       ` Naresh Kamboju
  2019-07-18 18:19         ` Roman Gushchin
  0 siblings, 1 reply; 29+ messages in thread
From: Naresh Kamboju @ 2019-07-17  8:56 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Roman Gushchin, Tejun Heo, Oleg Nesterov, Kernel Team, cgroups,
	open list, Shuah Khan, open list:KERNEL SELFTEST FRAMEWORK,
	lkft-triage

Hi Roman,

> The interaction between ptrace and freezer is complicated
> (as ptrace in general),so there are known cases when spurious
> cgroup transitions (frozen <-> non frozen <-> frozen ) can happen.

When test ran for 8 times it got failed 5 times.
I have noticed intermittent failure on x86_64, arm64 hikey device and
qemu_arm64.

>
> Does it create any difficulties? If so, I'll send a diff to ignore
> the result of this test for now.

No difficulties. I will mark this test case as known failure.

> > > This patch implements 9 tests for the freezer controller for
> > > cgroup v2:
> > ...
> > > 6) ptrace test: the test checks that it's possible to attach to
> > > a process in a frozen cgroup, get some information and detach, and
> > > the cgroup will remain frozen.
> >
> > selftests cgroup test_freezer failed because of the sys entry path not found.
>
> Can you, please, elaborate on this?

I see this failure output on screen and sharing log here.

Please find full test log here,
https://lkft.validation.linaro.org/scheduler/job/825346#L673

- Naresh

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v10 6/9] kselftests: cgroup: add freezer controller self-tests
  2019-07-17  8:56       ` Naresh Kamboju
@ 2019-07-18 18:19         ` Roman Gushchin
  0 siblings, 0 replies; 29+ messages in thread
From: Roman Gushchin @ 2019-07-18 18:19 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: Roman Gushchin, Tejun Heo, Oleg Nesterov, Kernel Team, cgroups,
	open list, Shuah Khan, open list:KERNEL SELFTEST FRAMEWORK,
	lkft-triage

On Wed, Jul 17, 2019 at 02:26:49PM +0530, Naresh Kamboju wrote:
> Hi Roman,
> 
> > The interaction between ptrace and freezer is complicated
> > (as ptrace in general),so there are known cases when spurious
> > cgroup transitions (frozen <-> non frozen <-> frozen ) can happen.
> 
> When test ran for 8 times it got failed 5 times.
> I have noticed intermittent failure on x86_64, arm64 hikey device and
> qemu_arm64.

Oh, it wasn't nearly as bad when I ran it. I'll take a look.

> 
> >
> > Does it create any difficulties? If so, I'll send a diff to ignore
> > the result of this test for now.
> 
> No difficulties. I will mark this test case as known failure.

Sure.

Thanks!

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2019-07-18 18:19 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-05 17:46 [PATCH v10 0/9] freezer for cgroup v2 Roman Gushchin
2019-04-05 17:47 ` [PATCH v10 1/9] cgroup: rename freezer.c into legacy_freezer.c Roman Gushchin
2019-04-05 17:47 ` [PATCH v10 2/9] cgroup: implement __cgroup_task_count() helper Roman Gushchin
2019-04-05 17:47 ` [PATCH v10 3/9] cgroup: protect cgroup->nr_(dying_)descendants by css_set_lock Roman Gushchin
2019-04-05 17:47 ` [PATCH v10 4/9] cgroup: cgroup v2 freezer Roman Gushchin
2019-04-19 15:19   ` Oleg Nesterov
2019-04-19 16:08     ` Oleg Nesterov
2019-04-19 16:36       ` Roman Gushchin
2019-04-19 16:11     ` Roman Gushchin
2019-04-19 16:26       ` Oleg Nesterov
2019-04-19 16:56         ` Roman Gushchin
2019-04-20 10:58           ` Oleg Nesterov
2019-04-22 22:11             ` Roman Gushchin
2019-04-24 15:46               ` Oleg Nesterov
2019-04-24 22:06                 ` Roman Gushchin
2019-04-26 17:40                   ` Oleg Nesterov
2019-04-24 16:02   ` Oleg Nesterov
2019-04-24 22:10     ` Roman Gushchin
2019-04-26 17:30       ` Oleg Nesterov
2019-04-05 17:47 ` [PATCH v10 5/9] kselftests: cgroup: don't fail on cg_kill_all() error in cg_destroy() Roman Gushchin
2019-04-05 17:47 ` [PATCH v10 6/9] kselftests: cgroup: add freezer controller self-tests Roman Gushchin
2019-07-16 14:48   ` Naresh Kamboju
2019-07-17  0:49     ` Roman Gushchin
2019-07-17  8:56       ` Naresh Kamboju
2019-07-18 18:19         ` Roman Gushchin
2019-04-05 17:47 ` [PATCH v10 7/9] cgroup: make TRACE_CGROUP_PATH irq-safe Roman Gushchin
2019-04-05 17:47 ` [PATCH v10 8/9] cgroup: add tracing points for cgroup v2 freezer Roman Gushchin
2019-04-05 17:47 ` [PATCH v10 9/9] cgroup: document cgroup v2 freezer interface Roman Gushchin
2019-04-19 18:29 ` [PATCH v10 0/9] freezer for cgroup v2 Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).