LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH v3 0/4] Use obj_cgroup APIs to charge kmem pages
@ 2021-03-09 10:07 Muchun Song
  2021-03-09 10:07 ` [PATCH v3 1/4] mm: memcontrol: introduce obj_cgroup_{un}charge_pages Muchun Song
                   ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: Muchun Song @ 2021-03-09 10:07 UTC (permalink / raw)
  To: guro, hannes, mhocko, akpm, shakeelb, vdavydov.dev
  Cc: linux-kernel, linux-mm, duanxiongchun, Muchun Song

Since Roman series "The new cgroup slab memory controller" applied. All
slab objects are charged with the new APIs of obj_cgroup. The new APIs
introduce a struct obj_cgroup to charge slab objects. It prevents
long-living objects from pinning the original memory cgroup in the memory.
But there are still some corner objects (e.g. allocations larger than
order-1 page on SLUB) which are not charged with the new APIs. Those
objects (include the pages which are allocated from buddy allocator
directly) are charged as kmem pages which still hold a reference to
the memory cgroup.

E.g. We know that the kernel stack is charged as kmem pages because the
size of the kernel stack can be greater than 2 pages (e.g. 16KB on x86_64
or arm64). If we create a thread (suppose the thread stack is charged to
memory cgroup A) and then move it from memory cgroup A to memory cgroup
B. Because the kernel stack of the thread hold a reference to the memory
cgroup A. The thread can pin the memory cgroup A in the memory even if
we remove the cgroup A. If we want to see this scenario by using the
following script. We can see that the system has added 500 dying cgroups
(This is not a real world issue, just a script to show that the large
kmallocs are charged as kmem pages which can pin the memory cgroup in the
memory).

	#!/bin/bash

	cat /proc/cgroups | grep memory

	cd /sys/fs/cgroup/memory
	echo 1 > memory.move_charge_at_immigrate

	for i in range{1..500}
	do
		mkdir kmem_test
		echo $$ > kmem_test/cgroup.procs
		sleep 3600 &
		echo $$ > cgroup.procs
		echo `cat kmem_test/cgroup.procs` > cgroup.procs
		rmdir kmem_test
	done

	cat /proc/cgroups | grep memory

This patchset aims to make those kmem pages to drop the reference to memory
cgroup by using the APIs of obj_cgroup. Finally, we can see that the number
of the dying cgroups will not increase if we run the above test script.

Changlogs in v3:
  1. Drop "remote objcg charging APIs" patch.
  2. Rename obj_cgroup_{un}charge_page to obj_cgroup_{un}charge_pages.
  3. Make page_memcg/page_memcg_rcu safe for adding new memcg_data flags.
  4. Reuse the ug infrastructure to uncharge the kmem pages.
  5. Add a new patch to move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM.

  Thanks to Roman's review and suggestions.

Changlogs in v2:
  1. Fix some types in the commit log (Thanks Roman).
  2. Do not introduce page_memcg_kmem helper (Thanks to Johannes and Shakeel).
  3. Reduce the CC list to mm/memcg folks (Thanks to Johannes).
  4. Introduce remote objcg charging APIs instead of convert "remote memcg
     charging APIs" to "remote objcg charging APIs".

Muchun Song (4):
  mm: memcontrol: introduce obj_cgroup_{un}charge_pages
  mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem
    page
  mm: memcontrol: use obj_cgroup APIs to charge kmem pages
  mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM

 include/linux/memcontrol.h | 103 ++++++++++++++++++++++++------
 mm/memcontrol.c            | 156 +++++++++++++++++++++++++++++++--------------
 mm/page_alloc.c            |   4 +-
 3 files changed, 191 insertions(+), 72 deletions(-)

-- 
2.11.0


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 1/4] mm: memcontrol: introduce obj_cgroup_{un}charge_pages
  2021-03-09 10:07 [PATCH v3 0/4] Use obj_cgroup APIs to charge kmem pages Muchun Song
@ 2021-03-09 10:07 ` Muchun Song
  2021-03-11 12:30   ` Johannes Weiner
  2021-03-11 18:56   ` Shakeel Butt
  2021-03-09 10:07 ` [PATCH v3 2/4] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page Muchun Song
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 28+ messages in thread
From: Muchun Song @ 2021-03-09 10:07 UTC (permalink / raw)
  To: guro, hannes, mhocko, akpm, shakeelb, vdavydov.dev
  Cc: linux-kernel, linux-mm, duanxiongchun, Muchun Song

We know that the unit of slab object charging is bytes, the unit of
kmem page charging is PAGE_SIZE. If we want to reuse obj_cgroup APIs
to charge the kmem pages, we should pass PAGE_SIZE (as third parameter)
to obj_cgroup_charge(). Because the size is already PAGE_SIZE, we can
skip touch the objcg stock. And obj_cgroup_{un}charge_pages() are
introduced to charge in units of page level.

In the later patch, we also can reuse those two helpers to charge or
uncharge a number of kernel pages to a object cgroup. This is just
a code movement without any functional changes.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Roman Gushchin <guro@fb.com>
---
 mm/memcontrol.c | 46 +++++++++++++++++++++++++++++++---------------
 1 file changed, 31 insertions(+), 15 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 845eec01ef9d..fc22da9805fb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3056,6 +3056,34 @@ static void memcg_free_cache_id(int id)
 	ida_simple_remove(&memcg_cache_ida, id);
 }
 
+static inline void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
+					     unsigned int nr_pages)
+{
+	rcu_read_lock();
+	__memcg_kmem_uncharge(obj_cgroup_memcg(objcg), nr_pages);
+	rcu_read_unlock();
+}
+
+static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
+				   unsigned int nr_pages)
+{
+	struct mem_cgroup *memcg;
+	int ret;
+
+	rcu_read_lock();
+retry:
+	memcg = obj_cgroup_memcg(objcg);
+	if (unlikely(!css_tryget(&memcg->css)))
+		goto retry;
+	rcu_read_unlock();
+
+	ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
+
+	css_put(&memcg->css);
+
+	return ret;
+}
+
 /**
  * __memcg_kmem_charge: charge a number of kernel pages to a memcg
  * @memcg: memory cgroup to charge
@@ -3180,11 +3208,8 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock)
 		unsigned int nr_pages = stock->nr_bytes >> PAGE_SHIFT;
 		unsigned int nr_bytes = stock->nr_bytes & (PAGE_SIZE - 1);
 
-		if (nr_pages) {
-			rcu_read_lock();
-			__memcg_kmem_uncharge(obj_cgroup_memcg(old), nr_pages);
-			rcu_read_unlock();
-		}
+		if (nr_pages)
+			obj_cgroup_uncharge_pages(old, nr_pages);
 
 		/*
 		 * The leftover is flushed to the centralized per-memcg value.
@@ -3242,7 +3267,6 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 
 int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
 {
-	struct mem_cgroup *memcg;
 	unsigned int nr_pages, nr_bytes;
 	int ret;
 
@@ -3259,24 +3283,16 @@ int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
 	 * refill_obj_stock(), called from this function or
 	 * independently later.
 	 */
-	rcu_read_lock();
-retry:
-	memcg = obj_cgroup_memcg(objcg);
-	if (unlikely(!css_tryget(&memcg->css)))
-		goto retry;
-	rcu_read_unlock();
-
 	nr_pages = size >> PAGE_SHIFT;
 	nr_bytes = size & (PAGE_SIZE - 1);
 
 	if (nr_bytes)
 		nr_pages += 1;
 
-	ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
+	ret = obj_cgroup_charge_pages(objcg, gfp, nr_pages);
 	if (!ret && nr_bytes)
 		refill_obj_stock(objcg, PAGE_SIZE - nr_bytes);
 
-	css_put(&memcg->css);
 	return ret;
 }
 
-- 
2.11.0


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 2/4] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page
  2021-03-09 10:07 [PATCH v3 0/4] Use obj_cgroup APIs to charge kmem pages Muchun Song
  2021-03-09 10:07 ` [PATCH v3 1/4] mm: memcontrol: introduce obj_cgroup_{un}charge_pages Muchun Song
@ 2021-03-09 10:07 ` Muchun Song
  2021-03-10 19:57   ` Roman Gushchin
                     ` (2 more replies)
  2021-03-09 10:07 ` [PATCH v3 3/4] mm: memcontrol: use obj_cgroup APIs to charge kmem pages Muchun Song
  2021-03-09 10:07 ` [PATCH v3 4/4] mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM Muchun Song
  3 siblings, 3 replies; 28+ messages in thread
From: Muchun Song @ 2021-03-09 10:07 UTC (permalink / raw)
  To: guro, hannes, mhocko, akpm, shakeelb, vdavydov.dev
  Cc: linux-kernel, linux-mm, duanxiongchun, Muchun Song

We want to reuse the obj_cgroup APIs to charge the kmem pages.
If we do that, we should store an object cgroup pointer to
page->memcg_data for the kmem pages.

Finally, page->memcg_data can have 3 different meanings.

  1) For the slab pages, page->memcg_data points to an object cgroups
     vector.

  2) For the kmem pages (exclude the slab pages), page->memcg_data
     points to an object cgroup.

  3) For the user pages (e.g. the LRU pages), page->memcg_data points
     to a memory cgroup.

Currently we always get the memory cgroup associated with a page via
page_memcg() or page_memcg_rcu(). page_memcg_check() is special, it
has to be used in cases when it's not known if a page has an
associated memory cgroup pointer or an object cgroups vector. Because
the page->memcg_data of the kmem page is not pointing to a memory
cgroup in the later patch, the page_memcg() and page_memcg_rcu()
cannot be applicable for the kmem pages. In this patch, make
page_memcg() and page_memcg_rcu() no longer apply to the kmem pages.
We do not change the behavior of the page_memcg_check(), it is also
applicable for the kmem pages.

In the end, there are 3 helpers to get the memcg associated with a page.
Usage is as follows.

  1) Get the memory cgroup associated with a non-kmem page (e.g. the LRU
     pages).

     - page_memcg()
     - page_memcg_rcu()

  2) Get the memory cgroup associated with a page. It has to be used in
     cases when it's not known if a page has an associated memory cgroup
     pointer or an object cgroups vector. Returns NULL for slab pages or
     uncharged pages. Otherwise, returns memory cgroup for charged pages
     (e.g. the kmem pages, the LRU pages).

     - page_memcg_check()

In some place, we use page_memcg() to check whether the page is charged.
Now introduce page_memcg_charged() helper to do that.

This is a preparation for reparenting the kmem pages.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 include/linux/memcontrol.h | 33 +++++++++++++++++++++++++++------
 mm/memcontrol.c            | 23 +++++++++++++----------
 mm/page_alloc.c            |  4 ++--
 3 files changed, 42 insertions(+), 18 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e6dc793d587d..83cbcdcfcc92 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -358,14 +358,26 @@ enum page_memcg_data_flags {
 
 #define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1)
 
+/* Return true for charged page, otherwise false. */
+static inline bool page_memcg_charged(struct page *page)
+{
+	unsigned long memcg_data = page->memcg_data;
+
+	VM_BUG_ON_PAGE(PageSlab(page), page);
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
+
+	return !!memcg_data;
+}
+
 /*
- * page_memcg - get the memory cgroup associated with a page
+ * page_memcg - get the memory cgroup associated with a non-kmem page
  * @page: a pointer to the page struct
  *
  * Returns a pointer to the memory cgroup associated with the page,
  * or NULL. This function assumes that the page is known to have a
  * proper memory cgroup pointer. It's not safe to call this function
- * against some type of pages, e.g. slab pages or ex-slab pages.
+ * against some type of pages, e.g. slab pages, kmem pages or ex-slab
+ * pages.
  *
  * Any of the following ensures page and memcg binding stability:
  * - the page lock
@@ -378,27 +390,31 @@ static inline struct mem_cgroup *page_memcg(struct page *page)
 	unsigned long memcg_data = page->memcg_data;
 
 	VM_BUG_ON_PAGE(PageSlab(page), page);
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, page);
 	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
 
 	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
 /*
- * page_memcg_rcu - locklessly get the memory cgroup associated with a page
+ * page_memcg_rcu - locklessly get the memory cgroup associated with a non-kmem page
  * @page: a pointer to the page struct
  *
  * Returns a pointer to the memory cgroup associated with the page,
  * or NULL. This function assumes that the page is known to have a
  * proper memory cgroup pointer. It's not safe to call this function
- * against some type of pages, e.g. slab pages or ex-slab pages.
+ * against some type of pages, e.g. slab pages, kmem pages or ex-slab
+ * pages.
  */
 static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
 {
+	unsigned long memcg_data = READ_ONCE(page->memcg_data);
+
 	VM_BUG_ON_PAGE(PageSlab(page), page);
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, page);
 	WARN_ON_ONCE(!rcu_read_lock_held());
 
-	return (struct mem_cgroup *)(READ_ONCE(page->memcg_data) &
-				     ~MEMCG_DATA_FLAGS_MASK);
+	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
 /*
@@ -1072,6 +1088,11 @@ void mem_cgroup_split_huge_fixup(struct page *head);
 
 struct mem_cgroup;
 
+static inline bool page_memcg_charged(struct page *page)
+{
+	return false;
+}
+
 static inline struct mem_cgroup *page_memcg(struct page *page)
 {
 	return NULL;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fc22da9805fb..e1dc73ceb98a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -855,10 +855,11 @@ void __mod_lruvec_page_state(struct page *page, enum node_stat_item idx,
 			     int val)
 {
 	struct page *head = compound_head(page); /* rmap on tail pages */
-	struct mem_cgroup *memcg = page_memcg(head);
+	struct mem_cgroup *memcg;
 	pg_data_t *pgdat = page_pgdat(page);
 	struct lruvec *lruvec;
 
+	memcg = page_memcg_check(head);
 	/* Untracked pages have no memcg, no lruvec. Update only the node */
 	if (!memcg) {
 		__mod_node_page_state(pgdat, idx, val);
@@ -3166,12 +3167,13 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
  */
 void __memcg_kmem_uncharge_page(struct page *page, int order)
 {
-	struct mem_cgroup *memcg = page_memcg(page);
+	struct mem_cgroup *memcg;
 	unsigned int nr_pages = 1 << order;
 
-	if (!memcg)
+	if (!page_memcg_charged(page))
 		return;
 
+	memcg = page_memcg_check(page);
 	VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
 	__memcg_kmem_uncharge(memcg, nr_pages);
 	page->memcg_data = 0;
@@ -6827,24 +6829,25 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 {
 	unsigned long nr_pages;
+	struct mem_cgroup *memcg;
 
 	VM_BUG_ON_PAGE(PageLRU(page), page);
 
-	if (!page_memcg(page))
+	if (!page_memcg_charged(page))
 		return;
 
 	/*
 	 * Nobody should be changing or seriously looking at
-	 * page_memcg(page) at this point, we have fully
-	 * exclusive access to the page.
+	 * page memcg at this point, we have fully exclusive
+	 * access to the page.
 	 */
-
-	if (ug->memcg != page_memcg(page)) {
+	memcg = page_memcg_check(page);
+	if (ug->memcg != memcg) {
 		if (ug->memcg) {
 			uncharge_batch(ug);
 			uncharge_gather_clear(ug);
 		}
-		ug->memcg = page_memcg(page);
+		ug->memcg = memcg;
 
 		/* pairs with css_put in uncharge_batch */
 		css_get(&ug->memcg->css);
@@ -6877,7 +6880,7 @@ void mem_cgroup_uncharge(struct page *page)
 		return;
 
 	/* Don't touch page->lru of any random page, pre-check: */
-	if (!page_memcg(page))
+	if (!page_memcg_charged(page))
 		return;
 
 	uncharge_gather_clear(&ug);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f10966e3b4a5..bcb58ae15e24 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1124,7 +1124,7 @@ static inline bool page_expected_state(struct page *page,
 	if (unlikely((unsigned long)page->mapping |
 			page_ref_count(page) |
 #ifdef CONFIG_MEMCG
-			(unsigned long)page_memcg(page) |
+			page_memcg_charged(page) |
 #endif
 			(page->flags & check_flags)))
 		return false;
@@ -1149,7 +1149,7 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
 			bad_reason = "PAGE_FLAGS_CHECK_AT_FREE flag(s) set";
 	}
 #ifdef CONFIG_MEMCG
-	if (unlikely(page_memcg(page)))
+	if (unlikely(page_memcg_charged(page)))
 		bad_reason = "page still charged to cgroup";
 #endif
 	return bad_reason;
-- 
2.11.0


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 3/4] mm: memcontrol: use obj_cgroup APIs to charge kmem pages
  2021-03-09 10:07 [PATCH v3 0/4] Use obj_cgroup APIs to charge kmem pages Muchun Song
  2021-03-09 10:07 ` [PATCH v3 1/4] mm: memcontrol: introduce obj_cgroup_{un}charge_pages Muchun Song
  2021-03-09 10:07 ` [PATCH v3 2/4] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page Muchun Song
@ 2021-03-09 10:07 ` Muchun Song
  2021-03-10 19:53   ` Roman Gushchin
                     ` (2 more replies)
  2021-03-09 10:07 ` [PATCH v3 4/4] mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM Muchun Song
  3 siblings, 3 replies; 28+ messages in thread
From: Muchun Song @ 2021-03-09 10:07 UTC (permalink / raw)
  To: guro, hannes, mhocko, akpm, shakeelb, vdavydov.dev
  Cc: linux-kernel, linux-mm, duanxiongchun, Muchun Song

Since Roman series "The new cgroup slab memory controller" applied. All
slab objects are charged via the new APIs of obj_cgroup. The new APIs
introduce a struct obj_cgroup to charge slab objects. It prevents
long-living objects from pinning the original memory cgroup in the memory.
But there are still some corner objects (e.g. allocations larger than
order-1 page on SLUB) which are not charged via the new APIs. Those
objects (include the pages which are allocated from buddy allocator
directly) are charged as kmem pages which still hold a reference to
the memory cgroup.

This patch aims to charge the kmem pages by using the new APIs of
obj_cgroup. Finally, the page->memcg_data of the kmem page points to
an object cgroup. We can use the page_objcg() to get the object
cgroup associated with a kmem page. Or we can use page_memcg_check()
to get the memory cgroup associated with a kmem page, but caller must
ensure that the returned memcg won't be released (e.g. acquire the
rcu_read_lock or css_set_lock).

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 include/linux/memcontrol.h |  63 ++++++++++++++++++------
 mm/memcontrol.c            | 119 ++++++++++++++++++++++++++++++---------------
 2 files changed, 128 insertions(+), 54 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 83cbcdcfcc92..07c449af9c0f 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -370,6 +370,18 @@ static inline bool page_memcg_charged(struct page *page)
 }
 
 /*
+ * After the initialization objcg->memcg is always pointing at
+ * a valid memcg, but can be atomically swapped to the parent memcg.
+ *
+ * The caller must ensure that the returned memcg won't be released:
+ * e.g. acquire the rcu_read_lock or css_set_lock.
+ */
+static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
+{
+	return READ_ONCE(objcg->memcg);
+}
+
+/*
  * page_memcg - get the memory cgroup associated with a non-kmem page
  * @page: a pointer to the page struct
  *
@@ -422,15 +434,19 @@ static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
  * @page: a pointer to the page struct
  *
  * Returns a pointer to the memory cgroup associated with the page,
- * or NULL. This function unlike page_memcg() can take any  page
+ * or NULL. This function unlike page_memcg() can take any page
  * as an argument. It has to be used in cases when it's not known if a page
- * has an associated memory cgroup pointer or an object cgroups vector.
+ * has an associated memory cgroup pointer or an object cgroups vector or
+ * an object cgroup.
  *
  * Any of the following ensures page and memcg binding stability:
  * - the page lock
  * - LRU isolation
  * - lock_page_memcg()
  * - exclusive reference
+ *
+ * Should be called under rcu lock which can protect memcg associated with a
+ * kmem page from being released.
  */
 static inline struct mem_cgroup *page_memcg_check(struct page *page)
 {
@@ -443,6 +459,13 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
 	if (memcg_data & MEMCG_DATA_OBJCGS)
 		return NULL;
 
+	if (memcg_data & MEMCG_DATA_KMEM) {
+		struct obj_cgroup *objcg;
+
+		objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+		return obj_cgroup_memcg(objcg);
+	}
+
 	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
@@ -501,6 +524,25 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
 	return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
+/*
+ * page_objcg - get the object cgroup associated with a kmem page
+ * @page: a pointer to the page struct
+ *
+ * Returns a pointer to the object cgroup associated with the kmem page,
+ * or NULL. This function assumes that the page is known to have an
+ * associated object cgroup. It's only safe to call this function
+ * against kmem pages (PageMemcgKmem() returns true).
+ */
+static inline struct obj_cgroup *page_objcg(struct page *page)
+{
+	unsigned long memcg_data = page->memcg_data;
+
+	VM_BUG_ON_PAGE(PageSlab(page), page);
+	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
+	VM_BUG_ON_PAGE(!(memcg_data & MEMCG_DATA_KMEM), page);
+
+	return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
+}
 #else
 static inline struct obj_cgroup **page_objcgs(struct page *page)
 {
@@ -511,6 +553,11 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
 {
 	return NULL;
 }
+
+static inline struct obj_cgroup *page_objcg(struct page *page)
+{
+	return NULL;
+}
 #endif
 
 static __always_inline bool memcg_stat_item_in_bytes(int idx)
@@ -729,18 +776,6 @@ static inline void obj_cgroup_put(struct obj_cgroup *objcg)
 	percpu_ref_put(&objcg->refcnt);
 }
 
-/*
- * After the initialization objcg->memcg is always pointing at
- * a valid memcg, but can be atomically swapped to the parent memcg.
- *
- * The caller must ensure that the returned memcg won't be released:
- * e.g. acquire the rcu_read_lock or css_set_lock.
- */
-static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
-{
-	return READ_ONCE(objcg->memcg);
-}
-
 static inline void mem_cgroup_put(struct mem_cgroup *memcg)
 {
 	if (memcg)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e1dc73ceb98a..38376f9d6659 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -859,15 +859,26 @@ void __mod_lruvec_page_state(struct page *page, enum node_stat_item idx,
 	pg_data_t *pgdat = page_pgdat(page);
 	struct lruvec *lruvec;
 
-	memcg = page_memcg_check(head);
-	/* Untracked pages have no memcg, no lruvec. Update only the node */
-	if (!memcg) {
-		__mod_node_page_state(pgdat, idx, val);
-		return;
+	if (PageMemcgKmem(head)) {
+		rcu_read_lock();
+		memcg = obj_cgroup_memcg(page_objcg(page));
+	} else {
+		memcg = page_memcg(head);
+		/*
+		 * Untracked pages have no memcg, no lruvec. Update only the
+		 * node.
+		 */
+		if (!memcg) {
+			__mod_node_page_state(pgdat, idx, val);
+			return;
+		}
 	}
 
 	lruvec = mem_cgroup_lruvec(memcg, pgdat);
 	__mod_lruvec_state(lruvec, idx, val);
+
+	if (PageMemcgKmem(head))
+		rcu_read_unlock();
 }
 EXPORT_SYMBOL(__mod_lruvec_page_state);
 
@@ -2906,6 +2917,20 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg)
 	page->memcg_data = (unsigned long)memcg;
 }
 
+static inline struct mem_cgroup *obj_cgroup_memcg_get(struct obj_cgroup *objcg)
+{
+	struct mem_cgroup *memcg;
+
+	rcu_read_lock();
+retry:
+	memcg = obj_cgroup_memcg(objcg);
+	if (unlikely(!css_tryget(&memcg->css)))
+		goto retry;
+	rcu_read_unlock();
+
+	return memcg;
+}
+
 #ifdef CONFIG_MEMCG_KMEM
 int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
 				 gfp_t gfp, bool new_page)
@@ -3071,15 +3096,8 @@ static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
 	struct mem_cgroup *memcg;
 	int ret;
 
-	rcu_read_lock();
-retry:
-	memcg = obj_cgroup_memcg(objcg);
-	if (unlikely(!css_tryget(&memcg->css)))
-		goto retry;
-	rcu_read_unlock();
-
+	memcg = obj_cgroup_memcg_get(objcg);
 	ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
-
 	css_put(&memcg->css);
 
 	return ret;
@@ -3144,18 +3162,18 @@ static void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_page
  */
 int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
 {
-	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg;
 	int ret = 0;
 
-	memcg = get_mem_cgroup_from_current();
-	if (memcg && !mem_cgroup_is_root(memcg)) {
-		ret = __memcg_kmem_charge(memcg, gfp, 1 << order);
+	objcg = get_obj_cgroup_from_current();
+	if (objcg) {
+		ret = obj_cgroup_charge_pages(objcg, gfp, 1 << order);
 		if (!ret) {
-			page->memcg_data = (unsigned long)memcg |
+			page->memcg_data = (unsigned long)objcg |
 				MEMCG_DATA_KMEM;
 			return 0;
 		}
-		css_put(&memcg->css);
+		obj_cgroup_put(objcg);
 	}
 	return ret;
 }
@@ -3167,17 +3185,16 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
  */
 void __memcg_kmem_uncharge_page(struct page *page, int order)
 {
-	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg;
 	unsigned int nr_pages = 1 << order;
 
 	if (!page_memcg_charged(page))
 		return;
 
-	memcg = page_memcg_check(page);
-	VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
-	__memcg_kmem_uncharge(memcg, nr_pages);
+	objcg = page_objcg(page);
+	obj_cgroup_uncharge_pages(objcg, nr_pages);
 	page->memcg_data = 0;
-	css_put(&memcg->css);
+	obj_cgroup_put(objcg);
 }
 
 static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
@@ -6806,11 +6823,23 @@ static inline void uncharge_gather_clear(struct uncharge_gather *ug)
 static void uncharge_batch(const struct uncharge_gather *ug)
 {
 	unsigned long flags;
+	unsigned long nr_pages;
 
-	if (!mem_cgroup_is_root(ug->memcg)) {
-		page_counter_uncharge(&ug->memcg->memory, ug->nr_pages);
+	/*
+	 * The kmem pages can be reparented to the root memcg, in
+	 * order to prevent the memory counter of root memcg from
+	 * increasing indefinitely. We should decrease the memory
+	 * counter when unchange.
+	 */
+	if (mem_cgroup_is_root(ug->memcg))
+		nr_pages = ug->nr_kmem;
+	else
+		nr_pages = ug->nr_pages;
+
+	if (nr_pages) {
+		page_counter_uncharge(&ug->memcg->memory, nr_pages);
 		if (do_memsw_account())
-			page_counter_uncharge(&ug->memcg->memsw, ug->nr_pages);
+			page_counter_uncharge(&ug->memcg->memsw, nr_pages);
 		if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && ug->nr_kmem)
 			page_counter_uncharge(&ug->memcg->kmem, ug->nr_kmem);
 		memcg_oom_recover(ug->memcg);
@@ -6828,7 +6857,7 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 
 static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 {
-	unsigned long nr_pages;
+	unsigned long nr_pages, nr_kmem;
 	struct mem_cgroup *memcg;
 
 	VM_BUG_ON_PAGE(PageLRU(page), page);
@@ -6836,34 +6865,44 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 	if (!page_memcg_charged(page))
 		return;
 
+	nr_pages = compound_nr(page);
 	/*
 	 * Nobody should be changing or seriously looking at
-	 * page memcg at this point, we have fully exclusive
-	 * access to the page.
+	 * page memcg or objcg at this point, we have fully
+	 * exclusive access to the page.
 	 */
-	memcg = page_memcg_check(page);
+	if (PageMemcgKmem(page)) {
+		struct obj_cgroup *objcg;
+
+		objcg = page_objcg(page);
+		memcg = obj_cgroup_memcg_get(objcg);
+
+		page->memcg_data = 0;
+		obj_cgroup_put(objcg);
+		nr_kmem = nr_pages;
+	} else {
+		memcg = page_memcg(page);
+		page->memcg_data = 0;
+		nr_kmem = 0;
+	}
+
 	if (ug->memcg != memcg) {
 		if (ug->memcg) {
 			uncharge_batch(ug);
 			uncharge_gather_clear(ug);
 		}
 		ug->memcg = memcg;
+		ug->dummy_page = page;
 
 		/* pairs with css_put in uncharge_batch */
 		css_get(&ug->memcg->css);
 	}
 
-	nr_pages = compound_nr(page);
 	ug->nr_pages += nr_pages;
+	ug->nr_kmem += nr_kmem;
+	ug->pgpgout += !nr_kmem;
 
-	if (PageMemcgKmem(page))
-		ug->nr_kmem += nr_pages;
-	else
-		ug->pgpgout++;
-
-	ug->dummy_page = page;
-	page->memcg_data = 0;
-	css_put(&ug->memcg->css);
+	css_put(&memcg->css);
 }
 
 /**
-- 
2.11.0


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 4/4] mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM
  2021-03-09 10:07 [PATCH v3 0/4] Use obj_cgroup APIs to charge kmem pages Muchun Song
                   ` (2 preceding siblings ...)
  2021-03-09 10:07 ` [PATCH v3 3/4] mm: memcontrol: use obj_cgroup APIs to charge kmem pages Muchun Song
@ 2021-03-09 10:07 ` Muchun Song
  2021-03-10 19:30   ` Roman Gushchin
                     ` (2 more replies)
  3 siblings, 3 replies; 28+ messages in thread
From: Muchun Song @ 2021-03-09 10:07 UTC (permalink / raw)
  To: guro, hannes, mhocko, akpm, shakeelb, vdavydov.dev
  Cc: linux-kernel, linux-mm, duanxiongchun, Muchun Song

The page only can be marked as kmem when CONFIG_MEMCG_KMEM is enabled.
So move PageMemcgKmem() to the scope of the CONFIG_MEMCG_KMEM.

As a bonus, on !CONFIG_MEMCG_KMEM build some code can be compiled out.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 include/linux/memcontrol.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 07c449af9c0f..d3ca8c8e7fc3 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -469,6 +469,7 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
 	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 
+#ifdef CONFIG_MEMCG_KMEM
 /*
  * PageMemcgKmem - check if the page has MemcgKmem flag set
  * @page: a pointer to the page struct
@@ -483,7 +484,6 @@ static inline bool PageMemcgKmem(struct page *page)
 	return page->memcg_data & MEMCG_DATA_KMEM;
 }
 
-#ifdef CONFIG_MEMCG_KMEM
 /*
  * page_objcgs - get the object cgroups vector associated with a page
  * @page: a pointer to the page struct
@@ -544,6 +544,11 @@ static inline struct obj_cgroup *page_objcg(struct page *page)
 	return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
 }
 #else
+static inline bool PageMemcgKmem(struct page *page)
+{
+	return false;
+}
+
 static inline struct obj_cgroup **page_objcgs(struct page *page)
 {
 	return NULL;
-- 
2.11.0


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 4/4] mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM
  2021-03-09 10:07 ` [PATCH v3 4/4] mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM Muchun Song
@ 2021-03-10 19:30   ` Roman Gushchin
  2021-03-12  3:26   ` Shakeel Butt
  2021-03-12 19:24   ` Johannes Weiner
  2 siblings, 0 replies; 28+ messages in thread
From: Roman Gushchin @ 2021-03-10 19:30 UTC (permalink / raw)
  To: Muchun Song
  Cc: hannes, mhocko, akpm, shakeelb, vdavydov.dev, linux-kernel,
	linux-mm, duanxiongchun

On Tue, Mar 09, 2021 at 06:07:17PM +0800, Muchun Song wrote:
> The page only can be marked as kmem when CONFIG_MEMCG_KMEM is enabled.
> So move PageMemcgKmem() to the scope of the CONFIG_MEMCG_KMEM.
> 
> As a bonus, on !CONFIG_MEMCG_KMEM build some code can be compiled out.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

Nice!

Acked-by: Roman Gushchin <guro@fb.com>

Thanks!

> ---
>  include/linux/memcontrol.h | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 07c449af9c0f..d3ca8c8e7fc3 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -469,6 +469,7 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
>  	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
>  }
>  
> +#ifdef CONFIG_MEMCG_KMEM
>  /*
>   * PageMemcgKmem - check if the page has MemcgKmem flag set
>   * @page: a pointer to the page struct
> @@ -483,7 +484,6 @@ static inline bool PageMemcgKmem(struct page *page)
>  	return page->memcg_data & MEMCG_DATA_KMEM;
>  }
>  
> -#ifdef CONFIG_MEMCG_KMEM
>  /*
>   * page_objcgs - get the object cgroups vector associated with a page
>   * @page: a pointer to the page struct
> @@ -544,6 +544,11 @@ static inline struct obj_cgroup *page_objcg(struct page *page)
>  	return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
>  }
>  #else
> +static inline bool PageMemcgKmem(struct page *page)
> +{
> +	return false;
> +}
> +
>  static inline struct obj_cgroup **page_objcgs(struct page *page)
>  {
>  	return NULL;
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/4] mm: memcontrol: use obj_cgroup APIs to charge kmem pages
  2021-03-09 10:07 ` [PATCH v3 3/4] mm: memcontrol: use obj_cgroup APIs to charge kmem pages Muchun Song
@ 2021-03-10 19:53   ` Roman Gushchin
  2021-03-11  6:50     ` [External] " Muchun Song
  2021-03-10 22:05   ` Johannes Weiner
  2021-03-11 17:48   ` kernel test robot
  2 siblings, 1 reply; 28+ messages in thread
From: Roman Gushchin @ 2021-03-10 19:53 UTC (permalink / raw)
  To: Muchun Song
  Cc: hannes, mhocko, akpm, shakeelb, vdavydov.dev, linux-kernel,
	linux-mm, duanxiongchun

On Tue, Mar 09, 2021 at 06:07:16PM +0800, Muchun Song wrote:
> Since Roman series "The new cgroup slab memory controller" applied. All
> slab objects are charged via the new APIs of obj_cgroup. The new APIs
> introduce a struct obj_cgroup to charge slab objects. It prevents
> long-living objects from pinning the original memory cgroup in the memory.
> But there are still some corner objects (e.g. allocations larger than
> order-1 page on SLUB) which are not charged via the new APIs. Those
> objects (include the pages which are allocated from buddy allocator
> directly) are charged as kmem pages which still hold a reference to
> the memory cgroup.
> 
> This patch aims to charge the kmem pages by using the new APIs of
> obj_cgroup. Finally, the page->memcg_data of the kmem page points to
> an object cgroup. We can use the page_objcg() to get the object
> cgroup associated with a kmem page. Or we can use page_memcg_check()
> to get the memory cgroup associated with a kmem page, but caller must
> ensure that the returned memcg won't be released (e.g. acquire the
> rcu_read_lock or css_set_lock).
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  include/linux/memcontrol.h |  63 ++++++++++++++++++------
>  mm/memcontrol.c            | 119 ++++++++++++++++++++++++++++++---------------
>  2 files changed, 128 insertions(+), 54 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 83cbcdcfcc92..07c449af9c0f 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -370,6 +370,18 @@ static inline bool page_memcg_charged(struct page *page)
>  }
>  
>  /*
> + * After the initialization objcg->memcg is always pointing at
> + * a valid memcg, but can be atomically swapped to the parent memcg.
> + *
> + * The caller must ensure that the returned memcg won't be released:
> + * e.g. acquire the rcu_read_lock or css_set_lock.
> + */
> +static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
> +{
> +	return READ_ONCE(objcg->memcg);
> +}
> +
> +/*
>   * page_memcg - get the memory cgroup associated with a non-kmem page
>   * @page: a pointer to the page struct
>   *
> @@ -422,15 +434,19 @@ static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
>   * @page: a pointer to the page struct
>   *
>   * Returns a pointer to the memory cgroup associated with the page,
> - * or NULL. This function unlike page_memcg() can take any  page
> + * or NULL. This function unlike page_memcg() can take any page
>   * as an argument. It has to be used in cases when it's not known if a page
> - * has an associated memory cgroup pointer or an object cgroups vector.
> + * has an associated memory cgroup pointer or an object cgroups vector or
> + * an object cgroup.
>   *
>   * Any of the following ensures page and memcg binding stability:
>   * - the page lock
>   * - LRU isolation
>   * - lock_page_memcg()
>   * - exclusive reference
> + *
> + * Should be called under rcu lock which can protect memcg associated with a
> + * kmem page from being released.

How about this:

For a non-kmem page any of the following ensures page and memcg binding stability:
- the page lock
- LRU isolation
- lock_page_memcg()
- exclusive reference

For a kmem page a caller should hold an rcu read lock to protect memcg associated
with a kmem page from being released.

>   */
>  static inline struct mem_cgroup *page_memcg_check(struct page *page)
>  {
> @@ -443,6 +459,13 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
>  	if (memcg_data & MEMCG_DATA_OBJCGS)
>  		return NULL;
>  
> +	if (memcg_data & MEMCG_DATA_KMEM) {
> +		struct obj_cgroup *objcg;
> +
> +		objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
> +		return obj_cgroup_memcg(objcg);
> +	}
> +
>  	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
>  }
>  
> @@ -501,6 +524,25 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
>  	return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
>  }
>  
> +/*
> + * page_objcg - get the object cgroup associated with a kmem page
> + * @page: a pointer to the page struct
> + *
> + * Returns a pointer to the object cgroup associated with the kmem page,
> + * or NULL. This function assumes that the page is known to have an
> + * associated object cgroup. It's only safe to call this function
> + * against kmem pages (PageMemcgKmem() returns true).
> + */
> +static inline struct obj_cgroup *page_objcg(struct page *page)
> +{
> +	unsigned long memcg_data = page->memcg_data;
> +
> +	VM_BUG_ON_PAGE(PageSlab(page), page);
> +	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
> +	VM_BUG_ON_PAGE(!(memcg_data & MEMCG_DATA_KMEM), page);
> +
> +	return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
> +}
>  #else
>  static inline struct obj_cgroup **page_objcgs(struct page *page)
>  {
> @@ -511,6 +553,11 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
>  {
>  	return NULL;
>  }
> +
> +static inline struct obj_cgroup *page_objcg(struct page *page)
> +{
> +	return NULL;
> +}
>  #endif
>  
>  static __always_inline bool memcg_stat_item_in_bytes(int idx)
> @@ -729,18 +776,6 @@ static inline void obj_cgroup_put(struct obj_cgroup *objcg)
>  	percpu_ref_put(&objcg->refcnt);
>  }
>  
> -/*
> - * After the initialization objcg->memcg is always pointing at
> - * a valid memcg, but can be atomically swapped to the parent memcg.
> - *
> - * The caller must ensure that the returned memcg won't be released:
> - * e.g. acquire the rcu_read_lock or css_set_lock.
> - */
> -static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
> -{
> -	return READ_ONCE(objcg->memcg);
> -}
> -
>  static inline void mem_cgroup_put(struct mem_cgroup *memcg)
>  {
>  	if (memcg)
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index e1dc73ceb98a..38376f9d6659 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -859,15 +859,26 @@ void __mod_lruvec_page_state(struct page *page, enum node_stat_item idx,
>  	pg_data_t *pgdat = page_pgdat(page);
>  	struct lruvec *lruvec;
>  
> -	memcg = page_memcg_check(head);
> -	/* Untracked pages have no memcg, no lruvec. Update only the node */
> -	if (!memcg) {
> -		__mod_node_page_state(pgdat, idx, val);
> -		return;
> +	if (PageMemcgKmem(head)) {
> +		rcu_read_lock();
> +		memcg = obj_cgroup_memcg(page_objcg(page));
> +	} else {
> +		memcg = page_memcg(head);
> +		/*
> +		 * Untracked pages have no memcg, no lruvec. Update only the
> +		 * node.
> +		 */
> +		if (!memcg) {
> +			__mod_node_page_state(pgdat, idx, val);
> +			return;
> +		}
>  	}
>  
>  	lruvec = mem_cgroup_lruvec(memcg, pgdat);
>  	__mod_lruvec_state(lruvec, idx, val);
> +
> +	if (PageMemcgKmem(head))
> +		rcu_read_unlock();
>  }
>  EXPORT_SYMBOL(__mod_lruvec_page_state);
>  
> @@ -2906,6 +2917,20 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg)
>  	page->memcg_data = (unsigned long)memcg;
>  }
>  
> +static inline struct mem_cgroup *obj_cgroup_memcg_get(struct obj_cgroup *objcg)

I'd prefer get_obj_cgroup_memcg(), if you don't mind.

> +{
> +	struct mem_cgroup *memcg;
> +
> +	rcu_read_lock();
> +retry:
> +	memcg = obj_cgroup_memcg(objcg);
> +	if (unlikely(!css_tryget(&memcg->css)))
> +		goto retry;
> +	rcu_read_unlock();
> +
> +	return memcg;
> +}
> +
>  #ifdef CONFIG_MEMCG_KMEM
>  int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
>  				 gfp_t gfp, bool new_page)
> @@ -3071,15 +3096,8 @@ static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
>  	struct mem_cgroup *memcg;
>  	int ret;
>  
> -	rcu_read_lock();
> -retry:
> -	memcg = obj_cgroup_memcg(objcg);
> -	if (unlikely(!css_tryget(&memcg->css)))
> -		goto retry;
> -	rcu_read_unlock();
> -
> +	memcg = obj_cgroup_memcg_get(objcg);
>  	ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
> -
>  	css_put(&memcg->css);
>  
>  	return ret;
> @@ -3144,18 +3162,18 @@ static void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_page
>   */
>  int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
>  {
> -	struct mem_cgroup *memcg;
> +	struct obj_cgroup *objcg;
>  	int ret = 0;
>  
> -	memcg = get_mem_cgroup_from_current();
> -	if (memcg && !mem_cgroup_is_root(memcg)) {
> -		ret = __memcg_kmem_charge(memcg, gfp, 1 << order);
> +	objcg = get_obj_cgroup_from_current();
> +	if (objcg) {
> +		ret = obj_cgroup_charge_pages(objcg, gfp, 1 << order);
>  		if (!ret) {
> -			page->memcg_data = (unsigned long)memcg |
> +			page->memcg_data = (unsigned long)objcg |
>  				MEMCG_DATA_KMEM;
>  			return 0;
>  		}
> -		css_put(&memcg->css);
> +		obj_cgroup_put(objcg);
>  	}
>  	return ret;
>  }
> @@ -3167,17 +3185,16 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
>   */
>  void __memcg_kmem_uncharge_page(struct page *page, int order)
>  {
> -	struct mem_cgroup *memcg;
> +	struct obj_cgroup *objcg;
>  	unsigned int nr_pages = 1 << order;
>  
>  	if (!page_memcg_charged(page))
>  		return;
>  
> -	memcg = page_memcg_check(page);
> -	VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
> -	__memcg_kmem_uncharge(memcg, nr_pages);
> +	objcg = page_objcg(page);
> +	obj_cgroup_uncharge_pages(objcg, nr_pages);
>  	page->memcg_data = 0;
> -	css_put(&memcg->css);
> +	obj_cgroup_put(objcg);
>  }
>  
>  static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
> @@ -6806,11 +6823,23 @@ static inline void uncharge_gather_clear(struct uncharge_gather *ug)
>  static void uncharge_batch(const struct uncharge_gather *ug)
>  {
>  	unsigned long flags;
> +	unsigned long nr_pages;
>  
> -	if (!mem_cgroup_is_root(ug->memcg)) {
> -		page_counter_uncharge(&ug->memcg->memory, ug->nr_pages);
> +	/*
> +	 * The kmem pages can be reparented to the root memcg, in
> +	 * order to prevent the memory counter of root memcg from
> +	 * increasing indefinitely. We should decrease the memory
> +	 * counter when unchange.

I guess the correct syntax is
"The kmem pages can be reparented to the root memcg. In
order to prevent the memory counter of root memcg from
increasing indefinitely, we should decrease the memory
counter when unchange."

> +	 */
> +	if (mem_cgroup_is_root(ug->memcg))
> +		nr_pages = ug->nr_kmem;
> +	else
> +		nr_pages = ug->nr_pages;
> +
> +	if (nr_pages) {
> +		page_counter_uncharge(&ug->memcg->memory, nr_pages);
>  		if (do_memsw_account())
> -			page_counter_uncharge(&ug->memcg->memsw, ug->nr_pages);
> +			page_counter_uncharge(&ug->memcg->memsw, nr_pages);
>  		if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && ug->nr_kmem)
>  			page_counter_uncharge(&ug->memcg->kmem, ug->nr_kmem);
>  		memcg_oom_recover(ug->memcg);
> @@ -6828,7 +6857,7 @@ static void uncharge_batch(const struct uncharge_gather *ug)
>  
>  static void uncharge_page(struct page *page, struct uncharge_gather *ug)
>  {
> -	unsigned long nr_pages;
> +	unsigned long nr_pages, nr_kmem;
>  	struct mem_cgroup *memcg;
>  
>  	VM_BUG_ON_PAGE(PageLRU(page), page);
> @@ -6836,34 +6865,44 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
>  	if (!page_memcg_charged(page))
>  		return;
>  
> +	nr_pages = compound_nr(page);
>  	/*
>  	 * Nobody should be changing or seriously looking at
> -	 * page memcg at this point, we have fully exclusive
> -	 * access to the page.
> +	 * page memcg or objcg at this point, we have fully
> +	 * exclusive access to the page.
>  	 */
> -	memcg = page_memcg_check(page);
> +	if (PageMemcgKmem(page)) {
> +		struct obj_cgroup *objcg;
> +
> +		objcg = page_objcg(page);
> +		memcg = obj_cgroup_memcg_get(objcg);
> +
> +		page->memcg_data = 0;
> +		obj_cgroup_put(objcg);
> +		nr_kmem = nr_pages;
> +	} else {
> +		memcg = page_memcg(page);
> +		page->memcg_data = 0;
> +		nr_kmem = 0;
> +	}
> +
>  	if (ug->memcg != memcg) {
>  		if (ug->memcg) {
>  			uncharge_batch(ug);
>  			uncharge_gather_clear(ug);
>  		}
>  		ug->memcg = memcg;
> +		ug->dummy_page = page;
>  
>  		/* pairs with css_put in uncharge_batch */
>  		css_get(&ug->memcg->css);
>  	}
>  
> -	nr_pages = compound_nr(page);
>  	ug->nr_pages += nr_pages;
> +	ug->nr_kmem += nr_kmem;
> +	ug->pgpgout += !nr_kmem;
>  
> -	if (PageMemcgKmem(page))
> -		ug->nr_kmem += nr_pages;
> -	else
> -		ug->pgpgout++;
> -
> -	ug->dummy_page = page;
> -	page->memcg_data = 0;
> -	css_put(&ug->memcg->css);
> +	css_put(&memcg->css);
>  }
>  
>  /**
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 2/4] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page
  2021-03-09 10:07 ` [PATCH v3 2/4] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page Muchun Song
@ 2021-03-10 19:57   ` Roman Gushchin
  2021-03-11  6:45     ` [External] " Muchun Song
  2021-03-11 13:12   ` Johannes Weiner
  2021-03-12  3:22   ` Shakeel Butt
  2 siblings, 1 reply; 28+ messages in thread
From: Roman Gushchin @ 2021-03-10 19:57 UTC (permalink / raw)
  To: Muchun Song
  Cc: hannes, mhocko, akpm, shakeelb, vdavydov.dev, linux-kernel,
	linux-mm, duanxiongchun

On Tue, Mar 09, 2021 at 06:07:15PM +0800, Muchun Song wrote:
> We want to reuse the obj_cgroup APIs to charge the kmem pages.
> If we do that, we should store an object cgroup pointer to
> page->memcg_data for the kmem pages.
> 
> Finally, page->memcg_data can have 3 different meanings.
> 
>   1) For the slab pages, page->memcg_data points to an object cgroups
>      vector.
> 
>   2) For the kmem pages (exclude the slab pages), page->memcg_data
>      points to an object cgroup.
> 
>   3) For the user pages (e.g. the LRU pages), page->memcg_data points
>      to a memory cgroup.
> 
> Currently we always get the memory cgroup associated with a page via
> page_memcg() or page_memcg_rcu(). page_memcg_check() is special, it
> has to be used in cases when it's not known if a page has an
> associated memory cgroup pointer or an object cgroups vector. Because
> the page->memcg_data of the kmem page is not pointing to a memory
> cgroup in the later patch, the page_memcg() and page_memcg_rcu()
> cannot be applicable for the kmem pages. In this patch, make
> page_memcg() and page_memcg_rcu() no longer apply to the kmem pages.
> We do not change the behavior of the page_memcg_check(), it is also
> applicable for the kmem pages.
> 
> In the end, there are 3 helpers to get the memcg associated with a page.
> Usage is as follows.
> 
>   1) Get the memory cgroup associated with a non-kmem page (e.g. the LRU
>      pages).
> 
>      - page_memcg()
>      - page_memcg_rcu()
> 
>   2) Get the memory cgroup associated with a page. It has to be used in
>      cases when it's not known if a page has an associated memory cgroup
>      pointer or an object cgroups vector. Returns NULL for slab pages or
>      uncharged pages. Otherwise, returns memory cgroup for charged pages
>      (e.g. the kmem pages, the LRU pages).
> 
>      - page_memcg_check()
> 
> In some place, we use page_memcg() to check whether the page is charged.
> Now introduce page_memcg_charged() helper to do that.
> 
> This is a preparation for reparenting the kmem pages.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  include/linux/memcontrol.h | 33 +++++++++++++++++++++++++++------
>  mm/memcontrol.c            | 23 +++++++++++++----------
>  mm/page_alloc.c            |  4 ++--
>  3 files changed, 42 insertions(+), 18 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index e6dc793d587d..83cbcdcfcc92 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -358,14 +358,26 @@ enum page_memcg_data_flags {
>  
>  #define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1)
>  
> +/* Return true for charged page, otherwise false. */
> +static inline bool page_memcg_charged(struct page *page)
> +{
> +	unsigned long memcg_data = page->memcg_data;
> +
> +	VM_BUG_ON_PAGE(PageSlab(page), page);
> +	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
> +
> +	return !!memcg_data;
> +}
> +
>  /*
> - * page_memcg - get the memory cgroup associated with a page
> + * page_memcg - get the memory cgroup associated with a non-kmem page
>   * @page: a pointer to the page struct
>   *
>   * Returns a pointer to the memory cgroup associated with the page,
>   * or NULL. This function assumes that the page is known to have a
>   * proper memory cgroup pointer. It's not safe to call this function
> - * against some type of pages, e.g. slab pages or ex-slab pages.
> + * against some type of pages, e.g. slab pages, kmem pages or ex-slab
> + * pages.
>   *
>   * Any of the following ensures page and memcg binding stability:
>   * - the page lock
> @@ -378,27 +390,31 @@ static inline struct mem_cgroup *page_memcg(struct page *page)
>  	unsigned long memcg_data = page->memcg_data;
>  
>  	VM_BUG_ON_PAGE(PageSlab(page), page);
> +	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, page);
>  	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
>  
>  	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
>  }
>  
>  /*
> - * page_memcg_rcu - locklessly get the memory cgroup associated with a page
> + * page_memcg_rcu - locklessly get the memory cgroup associated with a non-kmem page
>   * @page: a pointer to the page struct
>   *
>   * Returns a pointer to the memory cgroup associated with the page,
>   * or NULL. This function assumes that the page is known to have a
>   * proper memory cgroup pointer. It's not safe to call this function
> - * against some type of pages, e.g. slab pages or ex-slab pages.
> + * against some type of pages, e.g. slab pages, kmem pages or ex-slab
> + * pages.
>   */
>  static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
>  {
> +	unsigned long memcg_data = READ_ONCE(page->memcg_data);
> +
>  	VM_BUG_ON_PAGE(PageSlab(page), page);
> +	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, page);
>  	WARN_ON_ONCE(!rcu_read_lock_held());
>  
> -	return (struct mem_cgroup *)(READ_ONCE(page->memcg_data) &
> -				     ~MEMCG_DATA_FLAGS_MASK);
> +	return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
>  }
>  
>  /*
> @@ -1072,6 +1088,11 @@ void mem_cgroup_split_huge_fixup(struct page *head);
>  
>  struct mem_cgroup;
>  
> +static inline bool page_memcg_charged(struct page *page)
> +{
> +	return false;
> +}
> +
>  static inline struct mem_cgroup *page_memcg(struct page *page)
>  {
>  	return NULL;
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index fc22da9805fb..e1dc73ceb98a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -855,10 +855,11 @@ void __mod_lruvec_page_state(struct page *page, enum node_stat_item idx,
>  			     int val)
>  {
>  	struct page *head = compound_head(page); /* rmap on tail pages */
> -	struct mem_cgroup *memcg = page_memcg(head);
> +	struct mem_cgroup *memcg;
>  	pg_data_t *pgdat = page_pgdat(page);
>  	struct lruvec *lruvec;
>  
> +	memcg = page_memcg_check(head);

In general, this and the next patch look good to me (aside from some small things,
commented separately).

But I wonder if it's better to have two separate versions of __mod_lruvec_page_state()
for kmem and non-kmem pages, rather then rely on PageMemcgKmem flag. It's a hot path,
so if we can have fewer conditions here, that would be nice.
I take a brief look (and could be wrong), but it seems like we know in advance
which version should be used.

Thanks!

>  	/* Untracked pages have no memcg, no lruvec. Update only the node */
>  	if (!memcg) {
>  		__mod_node_page_state(pgdat, idx, val);
> @@ -3166,12 +3167,13 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
>   */
>  void __memcg_kmem_uncharge_page(struct page *page, int order)
>  {
> -	struct mem_cgroup *memcg = page_memcg(page);
> +	struct mem_cgroup *memcg;
>  	unsigned int nr_pages = 1 << order;
>  
> -	if (!memcg)
> +	if (!page_memcg_charged(page))
>  		return;
>  
> +	memcg = page_memcg_check(page);
>  	VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
>  	__memcg_kmem_uncharge(memcg, nr_pages);
>  	page->memcg_data = 0;
> @@ -6827,24 +6829,25 @@ static void uncharge_batch(const struct uncharge_gather *ug)
>  static void uncharge_page(struct page *page, struct uncharge_gather *ug)
>  {
>  	unsigned long nr_pages;
> +	struct mem_cgroup *memcg;
>  
>  	VM_BUG_ON_PAGE(PageLRU(page), page);
>  
> -	if (!page_memcg(page))
> +	if (!page_memcg_charged(page))
>  		return;
>  
>  	/*
>  	 * Nobody should be changing or seriously looking at
> -	 * page_memcg(page) at this point, we have fully
> -	 * exclusive access to the page.
> +	 * page memcg at this point, we have fully exclusive
> +	 * access to the page.
>  	 */
> -
> -	if (ug->memcg != page_memcg(page)) {
> +	memcg = page_memcg_check(page);
> +	if (ug->memcg != memcg) {
>  		if (ug->memcg) {
>  			uncharge_batch(ug);
>  			uncharge_gather_clear(ug);
>  		}
> -		ug->memcg = page_memcg(page);
> +		ug->memcg = memcg;
>  
>  		/* pairs with css_put in uncharge_batch */
>  		css_get(&ug->memcg->css);
> @@ -6877,7 +6880,7 @@ void mem_cgroup_uncharge(struct page *page)
>  		return;
>  
>  	/* Don't touch page->lru of any random page, pre-check: */
> -	if (!page_memcg(page))
> +	if (!page_memcg_charged(page))
>  		return;
>  
>  	uncharge_gather_clear(&ug);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f10966e3b4a5..bcb58ae15e24 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1124,7 +1124,7 @@ static inline bool page_expected_state(struct page *page,
>  	if (unlikely((unsigned long)page->mapping |
>  			page_ref_count(page) |
>  #ifdef CONFIG_MEMCG
> -			(unsigned long)page_memcg(page) |
> +			page_memcg_charged(page) |
>  #endif
>  			(page->flags & check_flags)))
>  		return false;
> @@ -1149,7 +1149,7 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
>  			bad_reason = "PAGE_FLAGS_CHECK_AT_FREE flag(s) set";
>  	}
>  #ifdef CONFIG_MEMCG
> -	if (unlikely(page_memcg(page)))
> +	if (unlikely(page_memcg_charged(page)))
>  		bad_reason = "page still charged to cgroup";
>  #endif
>  	return bad_reason;
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/4] mm: memcontrol: use obj_cgroup APIs to charge kmem pages
  2021-03-09 10:07 ` [PATCH v3 3/4] mm: memcontrol: use obj_cgroup APIs to charge kmem pages Muchun Song
  2021-03-10 19:53   ` Roman Gushchin
@ 2021-03-10 22:05   ` Johannes Weiner
  2021-03-12  9:22     ` [External] " Muchun Song
  2021-03-11 17:48   ` kernel test robot
  2 siblings, 1 reply; 28+ messages in thread
From: Johannes Weiner @ 2021-03-10 22:05 UTC (permalink / raw)
  To: Muchun Song
  Cc: guro, mhocko, akpm, shakeelb, vdavydov.dev, linux-kernel,
	linux-mm, duanxiongchun

Hello Munchun,

On Tue, Mar 09, 2021 at 06:07:16PM +0800, Muchun Song wrote:
> @@ -6806,11 +6823,23 @@ static inline void uncharge_gather_clear(struct uncharge_gather *ug)
>  static void uncharge_batch(const struct uncharge_gather *ug)
>  {
>  	unsigned long flags;
> +	unsigned long nr_pages;
>  
> -	if (!mem_cgroup_is_root(ug->memcg)) {
> -		page_counter_uncharge(&ug->memcg->memory, ug->nr_pages);
> +	/*
> +	 * The kmem pages can be reparented to the root memcg, in
> +	 * order to prevent the memory counter of root memcg from
> +	 * increasing indefinitely. We should decrease the memory
> +	 * counter when unchange.
> +	 */
> +	if (mem_cgroup_is_root(ug->memcg))
> +		nr_pages = ug->nr_kmem;
> +	else
> +		nr_pages = ug->nr_pages;

Correct or not, I find this unreadable. We're uncharging nr_kmem on
the root, and nr_pages against leaf groups?

It implies several things that might not be immediately obvious to the
reader of this function. Namely, that nr_kmem is a subset of nr_pages.
Or that we don't *want* to account LRU pages for the root cgroup.

The old code followed a very simple pattern: the root memcg's page
counters aren't touched.

This is no longer true: we modify them depending on very specific
circumstances. But that's too clever for the stupid uncharge_batch()
which is only supposed to flush a number of accumulators into their
corresponding page counters.

This distinction really needs to be moved down to uncharge_page() now.

> @@ -6828,7 +6857,7 @@ static void uncharge_batch(const struct uncharge_gather *ug)
>  
>  static void uncharge_page(struct page *page, struct uncharge_gather *ug)
>  {
> -	unsigned long nr_pages;
> +	unsigned long nr_pages, nr_kmem;
>  	struct mem_cgroup *memcg;
>  
>  	VM_BUG_ON_PAGE(PageLRU(page), page);
> @@ -6836,34 +6865,44 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
>  	if (!page_memcg_charged(page))
>  		return;
>  
> +	nr_pages = compound_nr(page);
>  	/*
>  	 * Nobody should be changing or seriously looking at
> -	 * page memcg at this point, we have fully exclusive
> -	 * access to the page.
> +	 * page memcg or objcg at this point, we have fully
> +	 * exclusive access to the page.
>  	 */
> -	memcg = page_memcg_check(page);
> +	if (PageMemcgKmem(page)) {
> +		struct obj_cgroup *objcg;
> +
> +		objcg = page_objcg(page);
> +		memcg = obj_cgroup_memcg_get(objcg);
> +
> +		page->memcg_data = 0;
> +		obj_cgroup_put(objcg);
> +		nr_kmem = nr_pages;
> +	} else {
> +		memcg = page_memcg(page);
> +		page->memcg_data = 0;
> +		nr_kmem = 0;
> +	}

Why is all this moved above the uncharge_batch() call?

It separates the pointer manipulations from the refcounting, which
makes the code very difficult to follow.

> +
>  	if (ug->memcg != memcg) {
>  		if (ug->memcg) {
>  			uncharge_batch(ug);
>  			uncharge_gather_clear(ug);
>  		}
>  		ug->memcg = memcg;
> +		ug->dummy_page = page;

Why this change?

>  		/* pairs with css_put in uncharge_batch */
>  		css_get(&ug->memcg->css);
>  	}
>  
> -	nr_pages = compound_nr(page);
>  	ug->nr_pages += nr_pages;
> +	ug->nr_kmem += nr_kmem;
> +	ug->pgpgout += !nr_kmem;

Oof.

Yes, this pgpgout line is an equivalent transformation for counting
LRU compound pages. But unless you already know that, it's completely
impossible to understand what the intent here is.

Please avoid clever tricks like this. If you need to check whether the
page is kmem, test PageMemcgKmem() instead of abusing the counters as
boolean flags. This is supposed to be read by human beings, too.

> -	if (PageMemcgKmem(page))
> -		ug->nr_kmem += nr_pages;
> -	else
> -		ug->pgpgout++;
> -
> -	ug->dummy_page = page;
> -	page->memcg_data = 0;
> -	css_put(&ug->memcg->css);
> +	css_put(&memcg->css);

Sorry, these two functions are no longer readable after your changes.

Please retain the following sequence as discrete steps:

1. look up memcg from the page
2. flush existing batch if memcg changed
3. add page's various counts to the current batch
4. clear page->memcg and decrease the referece count to whatever it was pointing to

And as per above, step 3. is where we should check whether to uncharge
the memcg's page counter at all:

	if (PageMemcgKmem(page)) {
		ug->nr_pages += nr_pages;
		ug->nr_kmem += nr_pages;
	} else {
		/* LRU pages aren't accounted at the root level */
		if (!mem_cgroup_is_root(memcg))
			ug->nr_pages += nr_pages;
		ug->pgpgout++;
	}

In fact, it might be a good idea to rename ug->nr_pages to
ug->nr_memory to highlight how it maps to the page_counter.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [External] Re: [PATCH v3 2/4] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page
  2021-03-10 19:57   ` Roman Gushchin
@ 2021-03-11  6:45     ` Muchun Song
  0 siblings, 0 replies; 28+ messages in thread
From: Muchun Song @ 2021-03-11  6:45 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Johannes Weiner, Michal Hocko, Andrew Morton, Shakeel Butt,
	Vladimir Davydov, LKML, Linux Memory Management List,
	Xiongchun duan

On Thu, Mar 11, 2021 at 3:58 AM Roman Gushchin <guro@fb.com> wrote:
>
> On Tue, Mar 09, 2021 at 06:07:15PM +0800, Muchun Song wrote:
> > We want to reuse the obj_cgroup APIs to charge the kmem pages.
> > If we do that, we should store an object cgroup pointer to
> > page->memcg_data for the kmem pages.
> >
> > Finally, page->memcg_data can have 3 different meanings.
> >
> >   1) For the slab pages, page->memcg_data points to an object cgroups
> >      vector.
> >
> >   2) For the kmem pages (exclude the slab pages), page->memcg_data
> >      points to an object cgroup.
> >
> >   3) For the user pages (e.g. the LRU pages), page->memcg_data points
> >      to a memory cgroup.
> >
> > Currently we always get the memory cgroup associated with a page via
> > page_memcg() or page_memcg_rcu(). page_memcg_check() is special, it
> > has to be used in cases when it's not known if a page has an
> > associated memory cgroup pointer or an object cgroups vector. Because
> > the page->memcg_data of the kmem page is not pointing to a memory
> > cgroup in the later patch, the page_memcg() and page_memcg_rcu()
> > cannot be applicable for the kmem pages. In this patch, make
> > page_memcg() and page_memcg_rcu() no longer apply to the kmem pages.
> > We do not change the behavior of the page_memcg_check(), it is also
> > applicable for the kmem pages.
> >
> > In the end, there are 3 helpers to get the memcg associated with a page.
> > Usage is as follows.
> >
> >   1) Get the memory cgroup associated with a non-kmem page (e.g. the LRU
> >      pages).
> >
> >      - page_memcg()
> >      - page_memcg_rcu()
> >
> >   2) Get the memory cgroup associated with a page. It has to be used in
> >      cases when it's not known if a page has an associated memory cgroup
> >      pointer or an object cgroups vector. Returns NULL for slab pages or
> >      uncharged pages. Otherwise, returns memory cgroup for charged pages
> >      (e.g. the kmem pages, the LRU pages).
> >
> >      - page_memcg_check()
> >
> > In some place, we use page_memcg() to check whether the page is charged.
> > Now introduce page_memcg_charged() helper to do that.
> >
> > This is a preparation for reparenting the kmem pages.
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  include/linux/memcontrol.h | 33 +++++++++++++++++++++++++++------
> >  mm/memcontrol.c            | 23 +++++++++++++----------
> >  mm/page_alloc.c            |  4 ++--
> >  3 files changed, 42 insertions(+), 18 deletions(-)
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index e6dc793d587d..83cbcdcfcc92 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -358,14 +358,26 @@ enum page_memcg_data_flags {
> >
> >  #define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1)
> >
> > +/* Return true for charged page, otherwise false. */
> > +static inline bool page_memcg_charged(struct page *page)
> > +{
> > +     unsigned long memcg_data = page->memcg_data;
> > +
> > +     VM_BUG_ON_PAGE(PageSlab(page), page);
> > +     VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
> > +
> > +     return !!memcg_data;
> > +}
> > +
> >  /*
> > - * page_memcg - get the memory cgroup associated with a page
> > + * page_memcg - get the memory cgroup associated with a non-kmem page
> >   * @page: a pointer to the page struct
> >   *
> >   * Returns a pointer to the memory cgroup associated with the page,
> >   * or NULL. This function assumes that the page is known to have a
> >   * proper memory cgroup pointer. It's not safe to call this function
> > - * against some type of pages, e.g. slab pages or ex-slab pages.
> > + * against some type of pages, e.g. slab pages, kmem pages or ex-slab
> > + * pages.
> >   *
> >   * Any of the following ensures page and memcg binding stability:
> >   * - the page lock
> > @@ -378,27 +390,31 @@ static inline struct mem_cgroup *page_memcg(struct page *page)
> >       unsigned long memcg_data = page->memcg_data;
> >
> >       VM_BUG_ON_PAGE(PageSlab(page), page);
> > +     VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, page);
> >       VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
> >
> >       return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
> >  }
> >
> >  /*
> > - * page_memcg_rcu - locklessly get the memory cgroup associated with a page
> > + * page_memcg_rcu - locklessly get the memory cgroup associated with a non-kmem page
> >   * @page: a pointer to the page struct
> >   *
> >   * Returns a pointer to the memory cgroup associated with the page,
> >   * or NULL. This function assumes that the page is known to have a
> >   * proper memory cgroup pointer. It's not safe to call this function
> > - * against some type of pages, e.g. slab pages or ex-slab pages.
> > + * against some type of pages, e.g. slab pages, kmem pages or ex-slab
> > + * pages.
> >   */
> >  static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
> >  {
> > +     unsigned long memcg_data = READ_ONCE(page->memcg_data);
> > +
> >       VM_BUG_ON_PAGE(PageSlab(page), page);
> > +     VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, page);
> >       WARN_ON_ONCE(!rcu_read_lock_held());
> >
> > -     return (struct mem_cgroup *)(READ_ONCE(page->memcg_data) &
> > -                                  ~MEMCG_DATA_FLAGS_MASK);
> > +     return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
> >  }
> >
> >  /*
> > @@ -1072,6 +1088,11 @@ void mem_cgroup_split_huge_fixup(struct page *head);
> >
> >  struct mem_cgroup;
> >
> > +static inline bool page_memcg_charged(struct page *page)
> > +{
> > +     return false;
> > +}
> > +
> >  static inline struct mem_cgroup *page_memcg(struct page *page)
> >  {
> >       return NULL;
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index fc22da9805fb..e1dc73ceb98a 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -855,10 +855,11 @@ void __mod_lruvec_page_state(struct page *page, enum node_stat_item idx,
> >                            int val)
> >  {
> >       struct page *head = compound_head(page); /* rmap on tail pages */
> > -     struct mem_cgroup *memcg = page_memcg(head);
> > +     struct mem_cgroup *memcg;
> >       pg_data_t *pgdat = page_pgdat(page);
> >       struct lruvec *lruvec;
> >
> > +     memcg = page_memcg_check(head);
>
> In general, this and the next patch look good to me (aside from some small things,
> commented separately).
>
> But I wonder if it's better to have two separate versions of __mod_lruvec_page_state()
> for kmem and non-kmem pages, rather then rely on PageMemcgKmem flag. It's a hot path,
> so if we can have fewer conditions here, that would be nice.
> I take a brief look (and could be wrong), but it seems like we know in advance
> which version should be used.

Right. The user should know whether the page is kmem.
IIUC,  __mod_lruvec_kmem_state is the version of the
kmem. It is enough to reuse it. And make __mod_lruvec_page_state()
only suitable for non-kmem page.

>
> Thanks!
>
> >       /* Untracked pages have no memcg, no lruvec. Update only the node */
> >       if (!memcg) {
> >               __mod_node_page_state(pgdat, idx, val);
> > @@ -3166,12 +3167,13 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
> >   */
> >  void __memcg_kmem_uncharge_page(struct page *page, int order)
> >  {
> > -     struct mem_cgroup *memcg = page_memcg(page);
> > +     struct mem_cgroup *memcg;
> >       unsigned int nr_pages = 1 << order;
> >
> > -     if (!memcg)
> > +     if (!page_memcg_charged(page))
> >               return;
> >
> > +     memcg = page_memcg_check(page);
> >       VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
> >       __memcg_kmem_uncharge(memcg, nr_pages);
> >       page->memcg_data = 0;
> > @@ -6827,24 +6829,25 @@ static void uncharge_batch(const struct uncharge_gather *ug)
> >  static void uncharge_page(struct page *page, struct uncharge_gather *ug)
> >  {
> >       unsigned long nr_pages;
> > +     struct mem_cgroup *memcg;
> >
> >       VM_BUG_ON_PAGE(PageLRU(page), page);
> >
> > -     if (!page_memcg(page))
> > +     if (!page_memcg_charged(page))
> >               return;
> >
> >       /*
> >        * Nobody should be changing or seriously looking at
> > -      * page_memcg(page) at this point, we have fully
> > -      * exclusive access to the page.
> > +      * page memcg at this point, we have fully exclusive
> > +      * access to the page.
> >        */
> > -
> > -     if (ug->memcg != page_memcg(page)) {
> > +     memcg = page_memcg_check(page);
> > +     if (ug->memcg != memcg) {
> >               if (ug->memcg) {
> >                       uncharge_batch(ug);
> >                       uncharge_gather_clear(ug);
> >               }
> > -             ug->memcg = page_memcg(page);
> > +             ug->memcg = memcg;
> >
> >               /* pairs with css_put in uncharge_batch */
> >               css_get(&ug->memcg->css);
> > @@ -6877,7 +6880,7 @@ void mem_cgroup_uncharge(struct page *page)
> >               return;
> >
> >       /* Don't touch page->lru of any random page, pre-check: */
> > -     if (!page_memcg(page))
> > +     if (!page_memcg_charged(page))
> >               return;
> >
> >       uncharge_gather_clear(&ug);
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index f10966e3b4a5..bcb58ae15e24 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1124,7 +1124,7 @@ static inline bool page_expected_state(struct page *page,
> >       if (unlikely((unsigned long)page->mapping |
> >                       page_ref_count(page) |
> >  #ifdef CONFIG_MEMCG
> > -                     (unsigned long)page_memcg(page) |
> > +                     page_memcg_charged(page) |
> >  #endif
> >                       (page->flags & check_flags)))
> >               return false;
> > @@ -1149,7 +1149,7 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
> >                       bad_reason = "PAGE_FLAGS_CHECK_AT_FREE flag(s) set";
> >       }
> >  #ifdef CONFIG_MEMCG
> > -     if (unlikely(page_memcg(page)))
> > +     if (unlikely(page_memcg_charged(page)))
> >               bad_reason = "page still charged to cgroup";
> >  #endif
> >       return bad_reason;
> > --
> > 2.11.0
> >

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [External] Re: [PATCH v3 3/4] mm: memcontrol: use obj_cgroup APIs to charge kmem pages
  2021-03-10 19:53   ` Roman Gushchin
@ 2021-03-11  6:50     ` Muchun Song
  0 siblings, 0 replies; 28+ messages in thread
From: Muchun Song @ 2021-03-11  6:50 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Johannes Weiner, Michal Hocko, Andrew Morton, Shakeel Butt,
	Vladimir Davydov, LKML, Linux Memory Management List,
	Xiongchun duan

On Thu, Mar 11, 2021 at 3:53 AM Roman Gushchin <guro@fb.com> wrote:
>
> On Tue, Mar 09, 2021 at 06:07:16PM +0800, Muchun Song wrote:
> > Since Roman series "The new cgroup slab memory controller" applied. All
> > slab objects are charged via the new APIs of obj_cgroup. The new APIs
> > introduce a struct obj_cgroup to charge slab objects. It prevents
> > long-living objects from pinning the original memory cgroup in the memory.
> > But there are still some corner objects (e.g. allocations larger than
> > order-1 page on SLUB) which are not charged via the new APIs. Those
> > objects (include the pages which are allocated from buddy allocator
> > directly) are charged as kmem pages which still hold a reference to
> > the memory cgroup.
> >
> > This patch aims to charge the kmem pages by using the new APIs of
> > obj_cgroup. Finally, the page->memcg_data of the kmem page points to
> > an object cgroup. We can use the page_objcg() to get the object
> > cgroup associated with a kmem page. Or we can use page_memcg_check()
> > to get the memory cgroup associated with a kmem page, but caller must
> > ensure that the returned memcg won't be released (e.g. acquire the
> > rcu_read_lock or css_set_lock).
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  include/linux/memcontrol.h |  63 ++++++++++++++++++------
> >  mm/memcontrol.c            | 119 ++++++++++++++++++++++++++++++---------------
> >  2 files changed, 128 insertions(+), 54 deletions(-)
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index 83cbcdcfcc92..07c449af9c0f 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -370,6 +370,18 @@ static inline bool page_memcg_charged(struct page *page)
> >  }
> >
> >  /*
> > + * After the initialization objcg->memcg is always pointing at
> > + * a valid memcg, but can be atomically swapped to the parent memcg.
> > + *
> > + * The caller must ensure that the returned memcg won't be released:
> > + * e.g. acquire the rcu_read_lock or css_set_lock.
> > + */
> > +static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
> > +{
> > +     return READ_ONCE(objcg->memcg);
> > +}
> > +
> > +/*
> >   * page_memcg - get the memory cgroup associated with a non-kmem page
> >   * @page: a pointer to the page struct
> >   *
> > @@ -422,15 +434,19 @@ static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
> >   * @page: a pointer to the page struct
> >   *
> >   * Returns a pointer to the memory cgroup associated with the page,
> > - * or NULL. This function unlike page_memcg() can take any  page
> > + * or NULL. This function unlike page_memcg() can take any page
> >   * as an argument. It has to be used in cases when it's not known if a page
> > - * has an associated memory cgroup pointer or an object cgroups vector.
> > + * has an associated memory cgroup pointer or an object cgroups vector or
> > + * an object cgroup.
> >   *
> >   * Any of the following ensures page and memcg binding stability:
> >   * - the page lock
> >   * - LRU isolation
> >   * - lock_page_memcg()
> >   * - exclusive reference
> > + *
> > + * Should be called under rcu lock which can protect memcg associated with a
> > + * kmem page from being released.
>
> How about this:
>
> For a non-kmem page any of the following ensures page and memcg binding stability:
> - the page lock
> - LRU isolation
> - lock_page_memcg()
> - exclusive reference
>
> For a kmem page a caller should hold an rcu read lock to protect memcg associated
> with a kmem page from being released.

OK. I will use this. Thanks Roman.

>
> >   */
> >  static inline struct mem_cgroup *page_memcg_check(struct page *page)
> >  {
> > @@ -443,6 +459,13 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page)
> >       if (memcg_data & MEMCG_DATA_OBJCGS)
> >               return NULL;
> >
> > +     if (memcg_data & MEMCG_DATA_KMEM) {
> > +             struct obj_cgroup *objcg;
> > +
> > +             objcg = (void *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
> > +             return obj_cgroup_memcg(objcg);
> > +     }
> > +
> >       return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
> >  }
> >
> > @@ -501,6 +524,25 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
> >       return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
> >  }
> >
> > +/*
> > + * page_objcg - get the object cgroup associated with a kmem page
> > + * @page: a pointer to the page struct
> > + *
> > + * Returns a pointer to the object cgroup associated with the kmem page,
> > + * or NULL. This function assumes that the page is known to have an
> > + * associated object cgroup. It's only safe to call this function
> > + * against kmem pages (PageMemcgKmem() returns true).
> > + */
> > +static inline struct obj_cgroup *page_objcg(struct page *page)
> > +{
> > +     unsigned long memcg_data = page->memcg_data;
> > +
> > +     VM_BUG_ON_PAGE(PageSlab(page), page);
> > +     VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
> > +     VM_BUG_ON_PAGE(!(memcg_data & MEMCG_DATA_KMEM), page);
> > +
> > +     return (struct obj_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
> > +}
> >  #else
> >  static inline struct obj_cgroup **page_objcgs(struct page *page)
> >  {
> > @@ -511,6 +553,11 @@ static inline struct obj_cgroup **page_objcgs_check(struct page *page)
> >  {
> >       return NULL;
> >  }
> > +
> > +static inline struct obj_cgroup *page_objcg(struct page *page)
> > +{
> > +     return NULL;
> > +}
> >  #endif
> >
> >  static __always_inline bool memcg_stat_item_in_bytes(int idx)
> > @@ -729,18 +776,6 @@ static inline void obj_cgroup_put(struct obj_cgroup *objcg)
> >       percpu_ref_put(&objcg->refcnt);
> >  }
> >
> > -/*
> > - * After the initialization objcg->memcg is always pointing at
> > - * a valid memcg, but can be atomically swapped to the parent memcg.
> > - *
> > - * The caller must ensure that the returned memcg won't be released:
> > - * e.g. acquire the rcu_read_lock or css_set_lock.
> > - */
> > -static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
> > -{
> > -     return READ_ONCE(objcg->memcg);
> > -}
> > -
> >  static inline void mem_cgroup_put(struct mem_cgroup *memcg)
> >  {
> >       if (memcg)
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index e1dc73ceb98a..38376f9d6659 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -859,15 +859,26 @@ void __mod_lruvec_page_state(struct page *page, enum node_stat_item idx,
> >       pg_data_t *pgdat = page_pgdat(page);
> >       struct lruvec *lruvec;
> >
> > -     memcg = page_memcg_check(head);
> > -     /* Untracked pages have no memcg, no lruvec. Update only the node */
> > -     if (!memcg) {
> > -             __mod_node_page_state(pgdat, idx, val);
> > -             return;
> > +     if (PageMemcgKmem(head)) {
> > +             rcu_read_lock();
> > +             memcg = obj_cgroup_memcg(page_objcg(page));
> > +     } else {
> > +             memcg = page_memcg(head);
> > +             /*
> > +              * Untracked pages have no memcg, no lruvec. Update only the
> > +              * node.
> > +              */
> > +             if (!memcg) {
> > +                     __mod_node_page_state(pgdat, idx, val);
> > +                     return;
> > +             }
> >       }
> >
> >       lruvec = mem_cgroup_lruvec(memcg, pgdat);
> >       __mod_lruvec_state(lruvec, idx, val);
> > +
> > +     if (PageMemcgKmem(head))
> > +             rcu_read_unlock();
> >  }
> >  EXPORT_SYMBOL(__mod_lruvec_page_state);
> >
> > @@ -2906,6 +2917,20 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg)
> >       page->memcg_data = (unsigned long)memcg;
> >  }
> >
> > +static inline struct mem_cgroup *obj_cgroup_memcg_get(struct obj_cgroup *objcg)
>
> I'd prefer get_obj_cgroup_memcg(), if you don't mind.

LGTM, will do.

>
> > +{
> > +     struct mem_cgroup *memcg;
> > +
> > +     rcu_read_lock();
> > +retry:
> > +     memcg = obj_cgroup_memcg(objcg);
> > +     if (unlikely(!css_tryget(&memcg->css)))
> > +             goto retry;
> > +     rcu_read_unlock();
> > +
> > +     return memcg;
> > +}
> > +
> >  #ifdef CONFIG_MEMCG_KMEM
> >  int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
> >                                gfp_t gfp, bool new_page)
> > @@ -3071,15 +3096,8 @@ static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
> >       struct mem_cgroup *memcg;
> >       int ret;
> >
> > -     rcu_read_lock();
> > -retry:
> > -     memcg = obj_cgroup_memcg(objcg);
> > -     if (unlikely(!css_tryget(&memcg->css)))
> > -             goto retry;
> > -     rcu_read_unlock();
> > -
> > +     memcg = obj_cgroup_memcg_get(objcg);
> >       ret = __memcg_kmem_charge(memcg, gfp, nr_pages);
> > -
> >       css_put(&memcg->css);
> >
> >       return ret;
> > @@ -3144,18 +3162,18 @@ static void __memcg_kmem_uncharge(struct mem_cgroup *memcg, unsigned int nr_page
> >   */
> >  int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
> >  {
> > -     struct mem_cgroup *memcg;
> > +     struct obj_cgroup *objcg;
> >       int ret = 0;
> >
> > -     memcg = get_mem_cgroup_from_current();
> > -     if (memcg && !mem_cgroup_is_root(memcg)) {
> > -             ret = __memcg_kmem_charge(memcg, gfp, 1 << order);
> > +     objcg = get_obj_cgroup_from_current();
> > +     if (objcg) {
> > +             ret = obj_cgroup_charge_pages(objcg, gfp, 1 << order);
> >               if (!ret) {
> > -                     page->memcg_data = (unsigned long)memcg |
> > +                     page->memcg_data = (unsigned long)objcg |
> >                               MEMCG_DATA_KMEM;
> >                       return 0;
> >               }
> > -             css_put(&memcg->css);
> > +             obj_cgroup_put(objcg);
> >       }
> >       return ret;
> >  }
> > @@ -3167,17 +3185,16 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
> >   */
> >  void __memcg_kmem_uncharge_page(struct page *page, int order)
> >  {
> > -     struct mem_cgroup *memcg;
> > +     struct obj_cgroup *objcg;
> >       unsigned int nr_pages = 1 << order;
> >
> >       if (!page_memcg_charged(page))
> >               return;
> >
> > -     memcg = page_memcg_check(page);
> > -     VM_BUG_ON_PAGE(mem_cgroup_is_root(memcg), page);
> > -     __memcg_kmem_uncharge(memcg, nr_pages);
> > +     objcg = page_objcg(page);
> > +     obj_cgroup_uncharge_pages(objcg, nr_pages);
> >       page->memcg_data = 0;
> > -     css_put(&memcg->css);
> > +     obj_cgroup_put(objcg);
> >  }
> >
> >  static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
> > @@ -6806,11 +6823,23 @@ static inline void uncharge_gather_clear(struct uncharge_gather *ug)
> >  static void uncharge_batch(const struct uncharge_gather *ug)
> >  {
> >       unsigned long flags;
> > +     unsigned long nr_pages;
> >
> > -     if (!mem_cgroup_is_root(ug->memcg)) {
> > -             page_counter_uncharge(&ug->memcg->memory, ug->nr_pages);
> > +     /*
> > +      * The kmem pages can be reparented to the root memcg, in
> > +      * order to prevent the memory counter of root memcg from
> > +      * increasing indefinitely. We should decrease the memory
> > +      * counter when unchange.
>
> I guess the correct syntax is
> "The kmem pages can be reparented to the root memcg. In
> order to prevent the memory counter of root memcg from
> increasing indefinitely, we should decrease the memory
> counter when unchange."

Right. I will combine your and Johannes suggestions about
how to rework the code here.

>
> > +      */
> > +     if (mem_cgroup_is_root(ug->memcg))
> > +             nr_pages = ug->nr_kmem;
> > +     else
> > +             nr_pages = ug->nr_pages;
> > +
> > +     if (nr_pages) {
> > +             page_counter_uncharge(&ug->memcg->memory, nr_pages);
> >               if (do_memsw_account())
> > -                     page_counter_uncharge(&ug->memcg->memsw, ug->nr_pages);
> > +                     page_counter_uncharge(&ug->memcg->memsw, nr_pages);
> >               if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && ug->nr_kmem)
> >                       page_counter_uncharge(&ug->memcg->kmem, ug->nr_kmem);
> >               memcg_oom_recover(ug->memcg);
> > @@ -6828,7 +6857,7 @@ static void uncharge_batch(const struct uncharge_gather *ug)
> >
> >  static void uncharge_page(struct page *page, struct uncharge_gather *ug)
> >  {
> > -     unsigned long nr_pages;
> > +     unsigned long nr_pages, nr_kmem;
> >       struct mem_cgroup *memcg;
> >
> >       VM_BUG_ON_PAGE(PageLRU(page), page);
> > @@ -6836,34 +6865,44 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
> >       if (!page_memcg_charged(page))
> >               return;
> >
> > +     nr_pages = compound_nr(page);
> >       /*
> >        * Nobody should be changing or seriously looking at
> > -      * page memcg at this point, we have fully exclusive
> > -      * access to the page.
> > +      * page memcg or objcg at this point, we have fully
> > +      * exclusive access to the page.
> >        */
> > -     memcg = page_memcg_check(page);
> > +     if (PageMemcgKmem(page)) {
> > +             struct obj_cgroup *objcg;
> > +
> > +             objcg = page_objcg(page);
> > +             memcg = obj_cgroup_memcg_get(objcg);
> > +
> > +             page->memcg_data = 0;
> > +             obj_cgroup_put(objcg);
> > +             nr_kmem = nr_pages;
> > +     } else {
> > +             memcg = page_memcg(page);
> > +             page->memcg_data = 0;
> > +             nr_kmem = 0;
> > +     }
> > +
> >       if (ug->memcg != memcg) {
> >               if (ug->memcg) {
> >                       uncharge_batch(ug);
> >                       uncharge_gather_clear(ug);
> >               }
> >               ug->memcg = memcg;
> > +             ug->dummy_page = page;
> >
> >               /* pairs with css_put in uncharge_batch */
> >               css_get(&ug->memcg->css);
> >       }
> >
> > -     nr_pages = compound_nr(page);
> >       ug->nr_pages += nr_pages;
> > +     ug->nr_kmem += nr_kmem;
> > +     ug->pgpgout += !nr_kmem;
> >
> > -     if (PageMemcgKmem(page))
> > -             ug->nr_kmem += nr_pages;
> > -     else
> > -             ug->pgpgout++;
> > -
> > -     ug->dummy_page = page;
> > -     page->memcg_data = 0;
> > -     css_put(&ug->memcg->css);
> > +     css_put(&memcg->css);
> >  }
> >
> >  /**
> > --
> > 2.11.0
> >

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 1/4] mm: memcontrol: introduce obj_cgroup_{un}charge_pages
  2021-03-09 10:07 ` [PATCH v3 1/4] mm: memcontrol: introduce obj_cgroup_{un}charge_pages Muchun Song
@ 2021-03-11 12:30   ` Johannes Weiner
  2021-03-11 18:56   ` Shakeel Butt
  1 sibling, 0 replies; 28+ messages in thread
From: Johannes Weiner @ 2021-03-11 12:30 UTC (permalink / raw)
  To: Muchun Song
  Cc: guro, mhocko, akpm, shakeelb, vdavydov.dev, linux-kernel,
	linux-mm, duanxiongchun

On Tue, Mar 09, 2021 at 06:07:14PM +0800, Muchun Song wrote:
> We know that the unit of slab object charging is bytes, the unit of
> kmem page charging is PAGE_SIZE. If we want to reuse obj_cgroup APIs
> to charge the kmem pages, we should pass PAGE_SIZE (as third parameter)
> to obj_cgroup_charge(). Because the size is already PAGE_SIZE, we can
> skip touch the objcg stock. And obj_cgroup_{un}charge_pages() are
> introduced to charge in units of page level.
> 
> In the later patch, we also can reuse those two helpers to charge or
> uncharge a number of kernel pages to a object cgroup. This is just
> a code movement without any functional changes.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Acked-by: Roman Gushchin <guro@fb.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 2/4] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page
  2021-03-09 10:07 ` [PATCH v3 2/4] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page Muchun Song
  2021-03-10 19:57   ` Roman Gushchin
@ 2021-03-11 13:12   ` Johannes Weiner
  2021-03-12  7:14     ` [External] " Muchun Song
  2021-03-12  3:22   ` Shakeel Butt
  2 siblings, 1 reply; 28+ messages in thread
From: Johannes Weiner @ 2021-03-11 13:12 UTC (permalink / raw)
  To: Muchun Song
  Cc: guro, mhocko, akpm, shakeelb, vdavydov.dev, linux-kernel,
	linux-mm, duanxiongchun

On Tue, Mar 09, 2021 at 06:07:15PM +0800, Muchun Song wrote:
> We want to reuse the obj_cgroup APIs to charge the kmem pages.
> If we do that, we should store an object cgroup pointer to
> page->memcg_data for the kmem pages.
> 
> Finally, page->memcg_data can have 3 different meanings.
> 
>   1) For the slab pages, page->memcg_data points to an object cgroups
>      vector.
> 
>   2) For the kmem pages (exclude the slab pages), page->memcg_data
>      points to an object cgroup.
> 
>   3) For the user pages (e.g. the LRU pages), page->memcg_data points
>      to a memory cgroup.
> 
> Currently we always get the memory cgroup associated with a page via
> page_memcg() or page_memcg_rcu(). page_memcg_check() is special, it
> has to be used in cases when it's not known if a page has an
> associated memory cgroup pointer or an object cgroups vector. Because
> the page->memcg_data of the kmem page is not pointing to a memory
> cgroup in the later patch, the page_memcg() and page_memcg_rcu()
> cannot be applicable for the kmem pages. In this patch, make
> page_memcg() and page_memcg_rcu() no longer apply to the kmem pages.
> We do not change the behavior of the page_memcg_check(), it is also
> applicable for the kmem pages.
> 
> In the end, there are 3 helpers to get the memcg associated with a page.
> Usage is as follows.
> 
>   1) Get the memory cgroup associated with a non-kmem page (e.g. the LRU
>      pages).
> 
>      - page_memcg()
>      - page_memcg_rcu()
> 
>   2) Get the memory cgroup associated with a page. It has to be used in
>      cases when it's not known if a page has an associated memory cgroup
>      pointer or an object cgroups vector. Returns NULL for slab pages or
>      uncharged pages. Otherwise, returns memory cgroup for charged pages
>      (e.g. the kmem pages, the LRU pages).
> 
>      - page_memcg_check()
> 
> In some place, we use page_memcg() to check whether the page is charged.
> Now introduce page_memcg_charged() helper to do that.
> 
> This is a preparation for reparenting the kmem pages.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

I'm pretty excited about the direction this series is taking us. The
direct/raw pinning of memcgs from objects and allocations of various
lifetimes has been causing chronic problems with dying cgroups piling
up, consuming memory, and gumming up the works in everything that
needs to iterate the cgroup tree (page reclaim comes to mind).

The more allocation types we can convert to objcg, the better.

This patch in particular looks mostly good to me too. Some comments
inline:

> ---
>  include/linux/memcontrol.h | 33 +++++++++++++++++++++++++++------
>  mm/memcontrol.c            | 23 +++++++++++++----------
>  mm/page_alloc.c            |  4 ++--
>  3 files changed, 42 insertions(+), 18 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index e6dc793d587d..83cbcdcfcc92 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -358,14 +358,26 @@ enum page_memcg_data_flags {
>  
>  #define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1)
>  
> +/* Return true for charged page, otherwise false. */
> +static inline bool page_memcg_charged(struct page *page)
> +{
> +	unsigned long memcg_data = page->memcg_data;
> +
> +	VM_BUG_ON_PAGE(PageSlab(page), page);
> +	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
> +
> +	return !!memcg_data;
> +}

This is mosntly used right before a page_memcg_check(), which makes it
somewhat redundant except for the VM_BUG_ON_PAGE() for slab pages.

But it's also a bit of a confusing name: slab pages are charged too,
but this function would crash if you called it on one.

In light of this, and in light of what I wrote above about hopefully
converting more and more allocations from raw memcg pins to
reparentable objcg, it would be bettor to have

	page_memcg() for 1:1 page-memcg mappings, i.e. LRU & kmem
	page_objcg() for 1:n page-memcg mappings, i.e. slab pages
	page_memcg_check() for the very rare ambiguous cases
	drop page_memcg_rcu() since page_memcg() is now rcu-safe

If we wanted to optimize, we could identify places that could do a
page_memcg_raw() that does page->memcg_data & ~MEMCG_DATA_FLAGS_MASK -
without READ_ONCE and without the kmem branch. However, I think the
stat functions are probably the hottest path when it comes to that,
and they now already include the kmem branch*.

* Roman mentioned splitting up the stats interface to optimize that,
  but I think we should be careful optimizing prematurely here. It's a
  bit of a maintainability concern, and it would also get in the way
  of easily converting more allocation types to objcg.

> @@ -855,10 +855,11 @@ void __mod_lruvec_page_state(struct page *page, enum node_stat_item idx,
>  			     int val)
>  {
>  	struct page *head = compound_head(page); /* rmap on tail pages */
> -	struct mem_cgroup *memcg = page_memcg(head);
> +	struct mem_cgroup *memcg;
>  	pg_data_t *pgdat = page_pgdat(page);
>  	struct lruvec *lruvec;
>  
> +	memcg = page_memcg_check(head);

With the proposed variants above, this should be page_memcg() and
actually warn/crash when we pass a slab page to this function.

> @@ -3166,12 +3167,13 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
>   */
>  void __memcg_kmem_uncharge_page(struct page *page, int order)
>  {
> -	struct mem_cgroup *memcg = page_memcg(page);
> +	struct mem_cgroup *memcg;
>  	unsigned int nr_pages = 1 << order;
>  
> -	if (!memcg)
> +	if (!page_memcg_charged(page))
>  		return;
>  
> +	memcg = page_memcg_check(page);

This would remain unchanged:

	memcg = page_memcg(page);
	if (!memcg)
		return;

> @@ -6827,24 +6829,25 @@ static void uncharge_batch(const struct uncharge_gather *ug)
>  static void uncharge_page(struct page *page, struct uncharge_gather *ug)
>  {
>  	unsigned long nr_pages;
> +	struct mem_cgroup *memcg;
>  
>  	VM_BUG_ON_PAGE(PageLRU(page), page);
>  
> -	if (!page_memcg(page))
> +	if (!page_memcg_charged(page))
>  		return;
>  
>  	/*
>  	 * Nobody should be changing or seriously looking at
> -	 * page_memcg(page) at this point, we have fully
> -	 * exclusive access to the page.
> +	 * page memcg at this point, we have fully exclusive
> +	 * access to the page.
>  	 */
> -
> -	if (ug->memcg != page_memcg(page)) {
> +	memcg = page_memcg_check(page);

Same situation here:

	memcg = page_memcg(page);
	if (!memcg)
		return;

> @@ -6877,7 +6880,7 @@ void mem_cgroup_uncharge(struct page *page)
>  		return;
>  
>  	/* Don't touch page->lru of any random page, pre-check: */
> -	if (!page_memcg(page))
> +	if (!page_memcg_charged(page))
>  		return;

Same:

	if (!page_memcg(page))
		return;

>  	uncharge_gather_clear(&ug);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f10966e3b4a5..bcb58ae15e24 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1124,7 +1124,7 @@ static inline bool page_expected_state(struct page *page,
>  	if (unlikely((unsigned long)page->mapping |
>  			page_ref_count(page) |
>  #ifdef CONFIG_MEMCG
> -			(unsigned long)page_memcg(page) |
> +			page_memcg_charged(page) |

Actually, I think we might want to just check the raw

			page->memcg_data

here, as neither lru, nor kmem, nor slab page should have anything
left in there by the time the page is freed.

> @@ -1149,7 +1149,7 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
>  			bad_reason = "PAGE_FLAGS_CHECK_AT_FREE flag(s) set";
>  	}
>  #ifdef CONFIG_MEMCG
> -	if (unlikely(page_memcg(page)))
> +	if (unlikely(page_memcg_charged(page)))
>  		bad_reason = "page still charged to cgroup";

Same here.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/4] mm: memcontrol: use obj_cgroup APIs to charge kmem pages
  2021-03-09 10:07 ` [PATCH v3 3/4] mm: memcontrol: use obj_cgroup APIs to charge kmem pages Muchun Song
  2021-03-10 19:53   ` Roman Gushchin
  2021-03-10 22:05   ` Johannes Weiner
@ 2021-03-11 17:48   ` kernel test robot
  2 siblings, 0 replies; 28+ messages in thread
From: kernel test robot @ 2021-03-11 17:48 UTC (permalink / raw)
  To: Muchun Song, guro, hannes, mhocko, akpm, shakeelb, vdavydov.dev
  Cc: kbuild-all, linux-kernel, linux-mm, duanxiongchun, Muchun Song

[-- Attachment #1: Type: text/plain, Size: 6175 bytes --]

Hi Muchun,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.12-rc2 next-20210311]
[cannot apply to hnaz-linux-mm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Muchun-Song/Use-obj_cgroup-APIs-to-charge-kmem-pages/20210309-181121
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 144c79ef33536b4ecb4951e07dbc1f2b7fa99d32
config: i386-randconfig-s002-20210311 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.3-262-g5e674421-dirty
        # https://github.com/0day-ci/linux/commit/202c922730a115143cf9e15ab26633f247e00229
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Muchun-Song/Use-obj_cgroup-APIs-to-charge-kmem-pages/20210309-181121
        git checkout 202c922730a115143cf9e15ab26633f247e00229
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


"sparse warnings: (new ones prefixed by >>)"
   mm/memcontrol.c:4194:21: sparse: sparse: incompatible types in comparison expression (different address spaces):
   mm/memcontrol.c:4194:21: sparse:    struct mem_cgroup_threshold_ary [noderef] __rcu *
   mm/memcontrol.c:4194:21: sparse:    struct mem_cgroup_threshold_ary *
   mm/memcontrol.c:4196:21: sparse: sparse: incompatible types in comparison expression (different address spaces):
   mm/memcontrol.c:4196:21: sparse:    struct mem_cgroup_threshold_ary [noderef] __rcu *
   mm/memcontrol.c:4196:21: sparse:    struct mem_cgroup_threshold_ary *
   mm/memcontrol.c:4352:9: sparse: sparse: incompatible types in comparison expression (different address spaces):
   mm/memcontrol.c:4352:9: sparse:    struct mem_cgroup_threshold_ary [noderef] __rcu *
   mm/memcontrol.c:4352:9: sparse:    struct mem_cgroup_threshold_ary *
   mm/memcontrol.c:4446:9: sparse: sparse: incompatible types in comparison expression (different address spaces):
   mm/memcontrol.c:4446:9: sparse:    struct mem_cgroup_threshold_ary [noderef] __rcu *
   mm/memcontrol.c:4446:9: sparse:    struct mem_cgroup_threshold_ary *
   mm/memcontrol.c:6002:23: sparse: sparse: incompatible types in comparison expression (different address spaces):
   mm/memcontrol.c:6002:23: sparse:    struct task_struct [noderef] __rcu *
   mm/memcontrol.c:6002:23: sparse:    struct task_struct *
>> mm/memcontrol.c:854:6: sparse: sparse: context imbalance in '__mod_lruvec_page_state' - different lock contexts for basic block
   mm/memcontrol.c: note: in included file:
   include/linux/memcontrol.h:707:9: sparse: sparse: context imbalance in 'lock_page_lruvec' - wrong count at exit
   include/linux/memcontrol.h:707:9: sparse: sparse: context imbalance in 'lock_page_lruvec_irq' - wrong count at exit
   include/linux/memcontrol.h:707:9: sparse: sparse: context imbalance in 'lock_page_lruvec_irqsave' - wrong count at exit
   mm/memcontrol.c:2137:19: sparse: sparse: context imbalance in 'lock_page_memcg' - wrong count at exit
   mm/memcontrol.c:2199:17: sparse: sparse: context imbalance in '__unlock_page_memcg' - unexpected unlock
   mm/memcontrol.c:5853:28: sparse: sparse: context imbalance in 'mem_cgroup_count_precharge_pte_range' - unexpected unlock
   mm/memcontrol.c:6047:36: sparse: sparse: context imbalance in 'mem_cgroup_move_charge_pte_range' - unexpected unlock

vim +/__mod_lruvec_page_state +854 mm/memcontrol.c

eedc4e5a142cc3 Roman Gushchin 2020-08-06  853  
c47d5032ed3002 Shakeel Butt   2020-12-14 @854  void __mod_lruvec_page_state(struct page *page, enum node_stat_item idx,
c47d5032ed3002 Shakeel Butt   2020-12-14  855  			     int val)
c47d5032ed3002 Shakeel Butt   2020-12-14  856  {
c47d5032ed3002 Shakeel Butt   2020-12-14  857  	struct page *head = compound_head(page); /* rmap on tail pages */
286d6cff5d9973 Muchun Song    2021-03-09  858  	struct mem_cgroup *memcg;
c47d5032ed3002 Shakeel Butt   2020-12-14  859  	pg_data_t *pgdat = page_pgdat(page);
c47d5032ed3002 Shakeel Butt   2020-12-14  860  	struct lruvec *lruvec;
c47d5032ed3002 Shakeel Butt   2020-12-14  861  
202c922730a115 Muchun Song    2021-03-09  862  	if (PageMemcgKmem(head)) {
202c922730a115 Muchun Song    2021-03-09  863  		rcu_read_lock();
202c922730a115 Muchun Song    2021-03-09  864  		memcg = obj_cgroup_memcg(page_objcg(page));
202c922730a115 Muchun Song    2021-03-09  865  	} else {
202c922730a115 Muchun Song    2021-03-09  866  		memcg = page_memcg(head);
202c922730a115 Muchun Song    2021-03-09  867  		/*
202c922730a115 Muchun Song    2021-03-09  868  		 * Untracked pages have no memcg, no lruvec. Update only the
202c922730a115 Muchun Song    2021-03-09  869  		 * node.
202c922730a115 Muchun Song    2021-03-09  870  		 */
d635a69dd4981c Linus Torvalds 2020-12-15  871  		if (!memcg) {
c47d5032ed3002 Shakeel Butt   2020-12-14  872  			__mod_node_page_state(pgdat, idx, val);
c47d5032ed3002 Shakeel Butt   2020-12-14  873  			return;
c47d5032ed3002 Shakeel Butt   2020-12-14  874  		}
202c922730a115 Muchun Song    2021-03-09  875  	}
c47d5032ed3002 Shakeel Butt   2020-12-14  876  
d635a69dd4981c Linus Torvalds 2020-12-15  877  	lruvec = mem_cgroup_lruvec(memcg, pgdat);
c47d5032ed3002 Shakeel Butt   2020-12-14  878  	__mod_lruvec_state(lruvec, idx, val);
202c922730a115 Muchun Song    2021-03-09  879  
202c922730a115 Muchun Song    2021-03-09  880  	if (PageMemcgKmem(head))
202c922730a115 Muchun Song    2021-03-09  881  		rcu_read_unlock();
c47d5032ed3002 Shakeel Butt   2020-12-14  882  }
f0c0c115fb8194 Shakeel Butt   2020-12-14  883  EXPORT_SYMBOL(__mod_lruvec_page_state);
c47d5032ed3002 Shakeel Butt   2020-12-14  884  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 27946 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 1/4] mm: memcontrol: introduce obj_cgroup_{un}charge_pages
  2021-03-09 10:07 ` [PATCH v3 1/4] mm: memcontrol: introduce obj_cgroup_{un}charge_pages Muchun Song
  2021-03-11 12:30   ` Johannes Weiner
@ 2021-03-11 18:56   ` Shakeel Butt
  1 sibling, 0 replies; 28+ messages in thread
From: Shakeel Butt @ 2021-03-11 18:56 UTC (permalink / raw)
  To: Muchun Song
  Cc: Roman Gushchin, Johannes Weiner, Michal Hocko, Andrew Morton,
	Vladimir Davydov, LKML, Linux MM, Xiongchun duan

On Tue, Mar 9, 2021 at 2:09 AM Muchun Song <songmuchun@bytedance.com> wrote:
>
> We know that the unit of slab object charging is bytes, the unit of
> kmem page charging is PAGE_SIZE. If we want to reuse obj_cgroup APIs
> to charge the kmem pages, we should pass PAGE_SIZE (as third parameter)
> to obj_cgroup_charge(). Because the size is already PAGE_SIZE, we can
> skip touch the objcg stock. And obj_cgroup_{un}charge_pages() are
> introduced to charge in units of page level.
>
> In the later patch, we also can reuse those two helpers to charge or
> uncharge a number of kernel pages to a object cgroup. This is just
> a code movement without any functional changes.
>
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Acked-by: Roman Gushchin <guro@fb.com>

Reviewed-by: Shakeel Butt <shakeelb@google.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 2/4] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page
  2021-03-09 10:07 ` [PATCH v3 2/4] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page Muchun Song
  2021-03-10 19:57   ` Roman Gushchin
  2021-03-11 13:12   ` Johannes Weiner
@ 2021-03-12  3:22   ` Shakeel Butt
  2021-03-12  5:02     ` [External] " Muchun Song
  2 siblings, 1 reply; 28+ messages in thread
From: Shakeel Butt @ 2021-03-12  3:22 UTC (permalink / raw)
  To: Muchun Song
  Cc: Roman Gushchin, Johannes Weiner, Michal Hocko, Andrew Morton,
	Vladimir Davydov, LKML, Linux MM, Xiongchun duan

On Tue, Mar 9, 2021 at 2:09 AM Muchun Song <songmuchun@bytedance.com> wrote:
>
> We want to reuse the obj_cgroup APIs to charge the kmem pages.
> If we do that, we should store an object cgroup pointer to
> page->memcg_data for the kmem pages.
>
> Finally, page->memcg_data can have 3 different meanings.

replace 'can' with 'will'

Other than that I think Roman and Johannes have already given very
good feedback.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 4/4] mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM
  2021-03-09 10:07 ` [PATCH v3 4/4] mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM Muchun Song
  2021-03-10 19:30   ` Roman Gushchin
@ 2021-03-12  3:26   ` Shakeel Butt
  2021-03-12 19:24   ` Johannes Weiner
  2 siblings, 0 replies; 28+ messages in thread
From: Shakeel Butt @ 2021-03-12  3:26 UTC (permalink / raw)
  To: Muchun Song
  Cc: Roman Gushchin, Johannes Weiner, Michal Hocko, Andrew Morton,
	Vladimir Davydov, LKML, Linux MM, Xiongchun duan

On Tue, Mar 9, 2021 at 2:09 AM Muchun Song <songmuchun@bytedance.com> wrote:
>
> The page only can be marked as kmem when CONFIG_MEMCG_KMEM is enabled.
> So move PageMemcgKmem() to the scope of the CONFIG_MEMCG_KMEM.
>
> As a bonus, on !CONFIG_MEMCG_KMEM build some code can be compiled out.
>
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

Reviewed-by: Shakeel Butt <shakeelb@google.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [External] Re: [PATCH v3 2/4] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page
  2021-03-12  3:22   ` Shakeel Butt
@ 2021-03-12  5:02     ` Muchun Song
  0 siblings, 0 replies; 28+ messages in thread
From: Muchun Song @ 2021-03-12  5:02 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Roman Gushchin, Johannes Weiner, Michal Hocko, Andrew Morton,
	Vladimir Davydov, LKML, Linux MM, Xiongchun duan

On Fri, Mar 12, 2021 at 11:22 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Tue, Mar 9, 2021 at 2:09 AM Muchun Song <songmuchun@bytedance.com> wrote:
> >
> > We want to reuse the obj_cgroup APIs to charge the kmem pages.
> > If we do that, we should store an object cgroup pointer to
> > page->memcg_data for the kmem pages.
> >
> > Finally, page->memcg_data can have 3 different meanings.
>
> replace 'can' with 'will'

Will do. Thanks.

>
> Other than that I think Roman and Johannes have already given very
> good feedback.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [External] Re: [PATCH v3 2/4] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page
  2021-03-11 13:12   ` Johannes Weiner
@ 2021-03-12  7:14     ` Muchun Song
  2021-03-12 19:23       ` Johannes Weiner
  0 siblings, 1 reply; 28+ messages in thread
From: Muchun Song @ 2021-03-12  7:14 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Roman Gushchin, Michal Hocko, Andrew Morton, Shakeel Butt,
	Vladimir Davydov, LKML, Linux Memory Management List,
	Xiongchun duan

On Thu, Mar 11, 2021 at 9:12 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Tue, Mar 09, 2021 at 06:07:15PM +0800, Muchun Song wrote:
> > We want to reuse the obj_cgroup APIs to charge the kmem pages.
> > If we do that, we should store an object cgroup pointer to
> > page->memcg_data for the kmem pages.
> >
> > Finally, page->memcg_data can have 3 different meanings.
> >
> >   1) For the slab pages, page->memcg_data points to an object cgroups
> >      vector.
> >
> >   2) For the kmem pages (exclude the slab pages), page->memcg_data
> >      points to an object cgroup.
> >
> >   3) For the user pages (e.g. the LRU pages), page->memcg_data points
> >      to a memory cgroup.
> >
> > Currently we always get the memory cgroup associated with a page via
> > page_memcg() or page_memcg_rcu(). page_memcg_check() is special, it
> > has to be used in cases when it's not known if a page has an
> > associated memory cgroup pointer or an object cgroups vector. Because
> > the page->memcg_data of the kmem page is not pointing to a memory
> > cgroup in the later patch, the page_memcg() and page_memcg_rcu()
> > cannot be applicable for the kmem pages. In this patch, make
> > page_memcg() and page_memcg_rcu() no longer apply to the kmem pages.
> > We do not change the behavior of the page_memcg_check(), it is also
> > applicable for the kmem pages.
> >
> > In the end, there are 3 helpers to get the memcg associated with a page.
> > Usage is as follows.
> >
> >   1) Get the memory cgroup associated with a non-kmem page (e.g. the LRU
> >      pages).
> >
> >      - page_memcg()
> >      - page_memcg_rcu()
> >
> >   2) Get the memory cgroup associated with a page. It has to be used in
> >      cases when it's not known if a page has an associated memory cgroup
> >      pointer or an object cgroups vector. Returns NULL for slab pages or
> >      uncharged pages. Otherwise, returns memory cgroup for charged pages
> >      (e.g. the kmem pages, the LRU pages).
> >
> >      - page_memcg_check()
> >
> > In some place, we use page_memcg() to check whether the page is charged.
> > Now introduce page_memcg_charged() helper to do that.
> >
> > This is a preparation for reparenting the kmem pages.
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>
> I'm pretty excited about the direction this series is taking us. The
> direct/raw pinning of memcgs from objects and allocations of various
> lifetimes has been causing chronic problems with dying cgroups piling
> up, consuming memory, and gumming up the works in everything that
> needs to iterate the cgroup tree (page reclaim comes to mind).
>
> The more allocation types we can convert to objcg, the better.
>
> This patch in particular looks mostly good to me too. Some comments
> inline:

Hi Johannes,

Very thanks for your suggestions. But I have some questions as below.


>
> > ---
> >  include/linux/memcontrol.h | 33 +++++++++++++++++++++++++++------
> >  mm/memcontrol.c            | 23 +++++++++++++----------
> >  mm/page_alloc.c            |  4 ++--
> >  3 files changed, 42 insertions(+), 18 deletions(-)
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index e6dc793d587d..83cbcdcfcc92 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -358,14 +358,26 @@ enum page_memcg_data_flags {
> >
> >  #define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1)
> >
> > +/* Return true for charged page, otherwise false. */
> > +static inline bool page_memcg_charged(struct page *page)
> > +{
> > +     unsigned long memcg_data = page->memcg_data;
> > +
> > +     VM_BUG_ON_PAGE(PageSlab(page), page);
> > +     VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
> > +
> > +     return !!memcg_data;
> > +}
>
> This is mosntly used right before a page_memcg_check(), which makes it
> somewhat redundant except for the VM_BUG_ON_PAGE() for slab pages.

Should I rename page_memcg_charged to page_memcg_raw?
And use page_memcg_raw to check whether the page is charged.

static inline bool page_memcg_charged(struct page *page)
{
        return page->memcg_data;
}

>
> But it's also a bit of a confusing name: slab pages are charged too,
> but this function would crash if you called it on one.
>
> In light of this, and in light of what I wrote above about hopefully
> converting more and more allocations from raw memcg pins to
> reparentable objcg, it would be bettor to have
>
>         page_memcg() for 1:1 page-memcg mappings, i.e. LRU & kmem

Sorry. I do not get the point. Because in the next patch, the kmem
page will use objcg to charge memory. So the page_memcg()
should not be suitable for the kmem pages. So I add a VM_BUG_ON
in the page_memcg() to catch invalid usage.

So I changed some page_memcg() calling to page_memcg_check()
in this patch, but you suggest using page_memcg(). I am very
confused. Are you worried about the extra overhead brought by calling
page_memcg_rcu()? In the next patch, I will remove page_memcg_check()
calling and use objcg APIs.

>         page_objcg() for 1:n page-memcg mappings, i.e. slab pages
>         page_memcg_check() for the very rare ambiguous cases
>         drop page_memcg_rcu() since page_memcg() is now rcu-safe
                                         ^^^
                                      page_memcg_check()

Here you mean page_memcg_check()? Right? I see a READ_ONCE
in page_memcg_check(), but page_memcg() doesn't.


>
> If we wanted to optimize, we could identify places that could do a
> page_memcg_raw() that does page->memcg_data & ~MEMCG_DATA_FLAGS_MASK -
> without READ_ONCE and without the kmem branch. However, I think the
> stat functions are probably the hottest path when it comes to that,
> and they now already include the kmem branch*.
>
> * Roman mentioned splitting up the stats interface to optimize that,
>   but I think we should be careful optimizing prematurely here. It's a
>   bit of a maintainability concern, and it would also get in the way
>   of easily converting more allocation types to objcg.
>
> > @@ -855,10 +855,11 @@ void __mod_lruvec_page_state(struct page *page, enum node_stat_item idx,
> >                            int val)
> >  {
> >       struct page *head = compound_head(page); /* rmap on tail pages */
> > -     struct mem_cgroup *memcg = page_memcg(head);
> > +     struct mem_cgroup *memcg;
> >       pg_data_t *pgdat = page_pgdat(page);
> >       struct lruvec *lruvec;
> >
> > +     memcg = page_memcg_check(head);
>
> With the proposed variants above, this should be page_memcg() and
> actually warn/crash when we pass a slab page to this function.
>
> > @@ -3166,12 +3167,13 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
> >   */
> >  void __memcg_kmem_uncharge_page(struct page *page, int order)
> >  {
> > -     struct mem_cgroup *memcg = page_memcg(page);
> > +     struct mem_cgroup *memcg;
> >       unsigned int nr_pages = 1 << order;
> >
> > -     if (!memcg)
> > +     if (!page_memcg_charged(page))
> >               return;
> >
> > +     memcg = page_memcg_check(page);
>
> This would remain unchanged:
>
>         memcg = page_memcg(page);
>         if (!memcg)
>                 return;
>
> > @@ -6827,24 +6829,25 @@ static void uncharge_batch(const struct uncharge_gather *ug)
> >  static void uncharge_page(struct page *page, struct uncharge_gather *ug)
> >  {
> >       unsigned long nr_pages;
> > +     struct mem_cgroup *memcg;
> >
> >       VM_BUG_ON_PAGE(PageLRU(page), page);
> >
> > -     if (!page_memcg(page))
> > +     if (!page_memcg_charged(page))
> >               return;
> >
> >       /*
> >        * Nobody should be changing or seriously looking at
> > -      * page_memcg(page) at this point, we have fully
> > -      * exclusive access to the page.
> > +      * page memcg at this point, we have fully exclusive
> > +      * access to the page.
> >        */
> > -
> > -     if (ug->memcg != page_memcg(page)) {
> > +     memcg = page_memcg_check(page);
>
> Same situation here:
>
>         memcg = page_memcg(page);
>         if (!memcg)
>                 return;
>
> > @@ -6877,7 +6880,7 @@ void mem_cgroup_uncharge(struct page *page)
> >               return;
> >
> >       /* Don't touch page->lru of any random page, pre-check: */
> > -     if (!page_memcg(page))
> > +     if (!page_memcg_charged(page))
> >               return;
>
> Same:
>
>         if (!page_memcg(page))
>                 return;
>
> >       uncharge_gather_clear(&ug);
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index f10966e3b4a5..bcb58ae15e24 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1124,7 +1124,7 @@ static inline bool page_expected_state(struct page *page,
> >       if (unlikely((unsigned long)page->mapping |
> >                       page_ref_count(page) |
> >  #ifdef CONFIG_MEMCG
> > -                     (unsigned long)page_memcg(page) |
> > +                     page_memcg_charged(page) |
>
> Actually, I think we might want to just check the raw
>
>                         page->memcg_data
>
> here, as neither lru, nor kmem, nor slab page should have anything
> left in there by the time the page is freed.
>
> > @@ -1149,7 +1149,7 @@ static const char *page_bad_reason(struct page *page, unsigned long flags)
> >                       bad_reason = "PAGE_FLAGS_CHECK_AT_FREE flag(s) set";
> >       }
> >  #ifdef CONFIG_MEMCG
> > -     if (unlikely(page_memcg(page)))
> > +     if (unlikely(page_memcg_charged(page)))
> >               bad_reason = "page still charged to cgroup";
>
> Same here.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [External] Re: [PATCH v3 3/4] mm: memcontrol: use obj_cgroup APIs to charge kmem pages
  2021-03-10 22:05   ` Johannes Weiner
@ 2021-03-12  9:22     ` Muchun Song
  2021-03-12 15:59       ` Johannes Weiner
  0 siblings, 1 reply; 28+ messages in thread
From: Muchun Song @ 2021-03-12  9:22 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Roman Gushchin, Michal Hocko, Andrew Morton, Shakeel Butt,
	Vladimir Davydov, LKML, Linux Memory Management List,
	Xiongchun duan

On Thu, Mar 11, 2021 at 6:05 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> Hello Munchun,
>
> On Tue, Mar 09, 2021 at 06:07:16PM +0800, Muchun Song wrote:
> > @@ -6806,11 +6823,23 @@ static inline void uncharge_gather_clear(struct uncharge_gather *ug)
> >  static void uncharge_batch(const struct uncharge_gather *ug)
> >  {
> >       unsigned long flags;
> > +     unsigned long nr_pages;
> >
> > -     if (!mem_cgroup_is_root(ug->memcg)) {
> > -             page_counter_uncharge(&ug->memcg->memory, ug->nr_pages);
> > +     /*
> > +      * The kmem pages can be reparented to the root memcg, in
> > +      * order to prevent the memory counter of root memcg from
> > +      * increasing indefinitely. We should decrease the memory
> > +      * counter when unchange.
> > +      */
> > +     if (mem_cgroup_is_root(ug->memcg))
> > +             nr_pages = ug->nr_kmem;
> > +     else
> > +             nr_pages = ug->nr_pages;
>
> Correct or not, I find this unreadable. We're uncharging nr_kmem on
> the root, and nr_pages against leaf groups?
>
> It implies several things that might not be immediately obvious to the
> reader of this function. Namely, that nr_kmem is a subset of nr_pages.
> Or that we don't *want* to account LRU pages for the root cgroup.
>
> The old code followed a very simple pattern: the root memcg's page
> counters aren't touched.
>
> This is no longer true: we modify them depending on very specific
> circumstances. But that's too clever for the stupid uncharge_batch()
> which is only supposed to flush a number of accumulators into their
> corresponding page counters.
>
> This distinction really needs to be moved down to uncharge_page() now.

OK. I will rework the code here to make it readable.

>
> > @@ -6828,7 +6857,7 @@ static void uncharge_batch(const struct uncharge_gather *ug)
> >
> >  static void uncharge_page(struct page *page, struct uncharge_gather *ug)
> >  {
> > -     unsigned long nr_pages;
> > +     unsigned long nr_pages, nr_kmem;
> >       struct mem_cgroup *memcg;
> >
> >       VM_BUG_ON_PAGE(PageLRU(page), page);
> > @@ -6836,34 +6865,44 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
> >       if (!page_memcg_charged(page))
> >               return;
> >
> > +     nr_pages = compound_nr(page);
> >       /*
> >        * Nobody should be changing or seriously looking at
> > -      * page memcg at this point, we have fully exclusive
> > -      * access to the page.
> > +      * page memcg or objcg at this point, we have fully
> > +      * exclusive access to the page.
> >        */
> > -     memcg = page_memcg_check(page);
> > +     if (PageMemcgKmem(page)) {
> > +             struct obj_cgroup *objcg;
> > +
> > +             objcg = page_objcg(page);
> > +             memcg = obj_cgroup_memcg_get(objcg);
> > +
> > +             page->memcg_data = 0;
> > +             obj_cgroup_put(objcg);
> > +             nr_kmem = nr_pages;
> > +     } else {
> > +             memcg = page_memcg(page);
> > +             page->memcg_data = 0;
> > +             nr_kmem = 0;
> > +     }
>
> Why is all this moved above the uncharge_batch() call?

Before calling obj_cgroup_put(), we need set page->memcg_data
to zero. So I move "page->memcg_data = 0" to here.

>
> It separates the pointer manipulations from the refcounting, which
> makes the code very difficult to follow.
>
> > +
> >       if (ug->memcg != memcg) {
> >               if (ug->memcg) {
> >                       uncharge_batch(ug);
> >                       uncharge_gather_clear(ug);
> >               }
> >               ug->memcg = memcg;
> > +             ug->dummy_page = page;
>
> Why this change?

Just like ug->memcg, we do not need to set
ug->dummy_page in every loop.


>
> >               /* pairs with css_put in uncharge_batch */
> >               css_get(&ug->memcg->css);
> >       }
> >
> > -     nr_pages = compound_nr(page);
> >       ug->nr_pages += nr_pages;
> > +     ug->nr_kmem += nr_kmem;
> > +     ug->pgpgout += !nr_kmem;
>
> Oof.
>
> Yes, this pgpgout line is an equivalent transformation for counting
> LRU compound pages. But unless you already know that, it's completely
> impossible to understand what the intent here is.
>
> Please avoid clever tricks like this. If you need to check whether the
> page is kmem, test PageMemcgKmem() instead of abusing the counters as
> boolean flags. This is supposed to be read by human beings, too.

Got it.

>
> > -     if (PageMemcgKmem(page))
> > -             ug->nr_kmem += nr_pages;
> > -     else
> > -             ug->pgpgout++;
> > -
> > -     ug->dummy_page = page;
> > -     page->memcg_data = 0;
> > -     css_put(&ug->memcg->css);
> > +     css_put(&memcg->css);
>
> Sorry, these two functions are no longer readable after your changes.
>
> Please retain the following sequence as discrete steps:
>
> 1. look up memcg from the page
> 2. flush existing batch if memcg changed
> 3. add page's various counts to the current batch
> 4. clear page->memcg and decrease the referece count to whatever it was pointing to

Got it.

>
> And as per above, step 3. is where we should check whether to uncharge
> the memcg's page counter at all:
>
>         if (PageMemcgKmem(page)) {
>                 ug->nr_pages += nr_pages;
>                 ug->nr_kmem += nr_pages;
>         } else {
>                 /* LRU pages aren't accounted at the root level */
>                 if (!mem_cgroup_is_root(memcg))
>                         ug->nr_pages += nr_pages;
>                 ug->pgpgout++;
>         }
>
> In fact, it might be a good idea to rename ug->nr_pages to
> ug->nr_memory to highlight how it maps to the page_counter.

I will rework the code in the next version. Thanks for your
suggestions.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [External] Re: [PATCH v3 3/4] mm: memcontrol: use obj_cgroup APIs to charge kmem pages
  2021-03-12  9:22     ` [External] " Muchun Song
@ 2021-03-12 15:59       ` Johannes Weiner
  2021-03-12 16:46         ` Muchun Song
  0 siblings, 1 reply; 28+ messages in thread
From: Johannes Weiner @ 2021-03-12 15:59 UTC (permalink / raw)
  To: Muchun Song
  Cc: Roman Gushchin, Michal Hocko, Andrew Morton, Shakeel Butt,
	Vladimir Davydov, LKML, Linux Memory Management List,
	Xiongchun duan

On Fri, Mar 12, 2021 at 05:22:55PM +0800, Muchun Song wrote:
> On Thu, Mar 11, 2021 at 6:05 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > @@ -6828,7 +6857,7 @@ static void uncharge_batch(const struct uncharge_gather *ug)
> > >
> > >  static void uncharge_page(struct page *page, struct uncharge_gather *ug)
> > >  {
> > > -     unsigned long nr_pages;
> > > +     unsigned long nr_pages, nr_kmem;
> > >       struct mem_cgroup *memcg;
> > >
> > >       VM_BUG_ON_PAGE(PageLRU(page), page);
> > > @@ -6836,34 +6865,44 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
> > >       if (!page_memcg_charged(page))
> > >               return;
> > >
> > > +     nr_pages = compound_nr(page);
> > >       /*
> > >        * Nobody should be changing or seriously looking at
> > > -      * page memcg at this point, we have fully exclusive
> > > -      * access to the page.
> > > +      * page memcg or objcg at this point, we have fully
> > > +      * exclusive access to the page.
> > >        */
> > > -     memcg = page_memcg_check(page);
> > > +     if (PageMemcgKmem(page)) {
> > > +             struct obj_cgroup *objcg;
> > > +
> > > +             objcg = page_objcg(page);
> > > +             memcg = obj_cgroup_memcg_get(objcg);
> > > +
> > > +             page->memcg_data = 0;
> > > +             obj_cgroup_put(objcg);
> > > +             nr_kmem = nr_pages;
> > > +     } else {
> > > +             memcg = page_memcg(page);
> > > +             page->memcg_data = 0;
> > > +             nr_kmem = 0;
> > > +     }
> >
> > Why is all this moved above the uncharge_batch() call?
> 
> Before calling obj_cgroup_put(), we need set page->memcg_data
> to zero. So I move "page->memcg_data = 0" to here.

Yeah, it makes sense to keep those together, but we can move them both
down to after the uncharge, right?

> > It separates the pointer manipulations from the refcounting, which
> > makes the code very difficult to follow.
> >
> > > +
> > >       if (ug->memcg != memcg) {
> > >               if (ug->memcg) {
> > >                       uncharge_batch(ug);
> > >                       uncharge_gather_clear(ug);
> > >               }
> > >               ug->memcg = memcg;
> > > +             ug->dummy_page = page;
> >
> > Why this change?
> 
> Just like ug->memcg, we do not need to set
> ug->dummy_page in every loop.

Ah, okay. That's a reasonable change, it's just confusing because I
thought this was a requirement for the new code to work. But I didn't
see how it relied on that, and it made me think I'm not understanding
your code ;) It's better to split that into a separate patch.

> I will rework the code in the next version.

Thanks!

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [External] Re: [PATCH v3 3/4] mm: memcontrol: use obj_cgroup APIs to charge kmem pages
  2021-03-12 15:59       ` Johannes Weiner
@ 2021-03-12 16:46         ` Muchun Song
  0 siblings, 0 replies; 28+ messages in thread
From: Muchun Song @ 2021-03-12 16:46 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Roman Gushchin, Michal Hocko, Andrew Morton, Shakeel Butt,
	Vladimir Davydov, LKML, Linux Memory Management List,
	Xiongchun duan

On Fri, Mar 12, 2021 at 11:59 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Fri, Mar 12, 2021 at 05:22:55PM +0800, Muchun Song wrote:
> > On Thu, Mar 11, 2021 at 6:05 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > > @@ -6828,7 +6857,7 @@ static void uncharge_batch(const struct uncharge_gather *ug)
> > > >
> > > >  static void uncharge_page(struct page *page, struct uncharge_gather *ug)
> > > >  {
> > > > -     unsigned long nr_pages;
> > > > +     unsigned long nr_pages, nr_kmem;
> > > >       struct mem_cgroup *memcg;
> > > >
> > > >       VM_BUG_ON_PAGE(PageLRU(page), page);
> > > > @@ -6836,34 +6865,44 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
> > > >       if (!page_memcg_charged(page))
> > > >               return;
> > > >
> > > > +     nr_pages = compound_nr(page);
> > > >       /*
> > > >        * Nobody should be changing or seriously looking at
> > > > -      * page memcg at this point, we have fully exclusive
> > > > -      * access to the page.
> > > > +      * page memcg or objcg at this point, we have fully
> > > > +      * exclusive access to the page.
> > > >        */
> > > > -     memcg = page_memcg_check(page);
> > > > +     if (PageMemcgKmem(page)) {
> > > > +             struct obj_cgroup *objcg;
> > > > +
> > > > +             objcg = page_objcg(page);
> > > > +             memcg = obj_cgroup_memcg_get(objcg);
> > > > +
> > > > +             page->memcg_data = 0;
> > > > +             obj_cgroup_put(objcg);
> > > > +             nr_kmem = nr_pages;
> > > > +     } else {
> > > > +             memcg = page_memcg(page);
> > > > +             page->memcg_data = 0;
> > > > +             nr_kmem = 0;
> > > > +     }
> > >
> > > Why is all this moved above the uncharge_batch() call?
> >
> > Before calling obj_cgroup_put(), we need set page->memcg_data
> > to zero. So I move "page->memcg_data = 0" to here.
>
> Yeah, it makes sense to keep those together, but we can move them both
> down to after the uncharge, right?

Right. I am doing this.

>
> > > It separates the pointer manipulations from the refcounting, which
> > > makes the code very difficult to follow.
> > >
> > > > +
> > > >       if (ug->memcg != memcg) {
> > > >               if (ug->memcg) {
> > > >                       uncharge_batch(ug);
> > > >                       uncharge_gather_clear(ug);
> > > >               }
> > > >               ug->memcg = memcg;
> > > > +             ug->dummy_page = page;
> > >
> > > Why this change?
> >
> > Just like ug->memcg, we do not need to set
> > ug->dummy_page in every loop.
>
> Ah, okay. That's a reasonable change, it's just confusing because I
> thought this was a requirement for the new code to work. But I didn't
> see how it relied on that, and it made me think I'm not understanding
> your code ;) It's better to split that into a separate patch.

Sorry for confusing you. I will split that into a separate patch.
Thanks.

>
> > I will rework the code in the next version.
>
> Thanks!

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [External] Re: [PATCH v3 2/4] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page
  2021-03-12  7:14     ` [External] " Muchun Song
@ 2021-03-12 19:23       ` Johannes Weiner
  2021-03-12 22:42         ` Shakeel Butt
  2021-03-14 13:56         ` Muchun Song
  0 siblings, 2 replies; 28+ messages in thread
From: Johannes Weiner @ 2021-03-12 19:23 UTC (permalink / raw)
  To: Muchun Song
  Cc: Roman Gushchin, Michal Hocko, Andrew Morton, Shakeel Butt,
	Vladimir Davydov, LKML, Linux Memory Management List,
	Xiongchun duan

Hello Muchun,

On Fri, Mar 12, 2021 at 03:14:07PM +0800, Muchun Song wrote:
> On Thu, Mar 11, 2021 at 9:12 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > @@ -358,14 +358,26 @@ enum page_memcg_data_flags {
> > >
> > >  #define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1)
> > >
> > > +/* Return true for charged page, otherwise false. */
> > > +static inline bool page_memcg_charged(struct page *page)
> > > +{
> > > +     unsigned long memcg_data = page->memcg_data;
> > > +
> > > +     VM_BUG_ON_PAGE(PageSlab(page), page);
> > > +     VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
> > > +
> > > +     return !!memcg_data;
> > > +}
> >
> > This is mosntly used right before a page_memcg_check(), which makes it
> > somewhat redundant except for the VM_BUG_ON_PAGE() for slab pages.
> 
> Should I rename page_memcg_charged to page_memcg_raw?
> And use page_memcg_raw to check whether the page is charged.
> 
> static inline bool page_memcg_charged(struct page *page)
> {
>         return page->memcg_data;
> }

You can just directly access page->memcg_data in places where you'd
use this helper. I think it's only the two places in mm/page_alloc.c,
and they already have CONFIG_MEMCG in place, so raw access works.

> > But it's also a bit of a confusing name: slab pages are charged too,
> > but this function would crash if you called it on one.
> >
> > In light of this, and in light of what I wrote above about hopefully
> > converting more and more allocations from raw memcg pins to
> > reparentable objcg, it would be bettor to have
> >
> >         page_memcg() for 1:1 page-memcg mappings, i.e. LRU & kmem
> 
> Sorry. I do not get the point. Because in the next patch, the kmem
> page will use objcg to charge memory. So the page_memcg()
> should not be suitable for the kmem pages. So I add a VM_BUG_ON
> in the page_memcg() to catch invalid usage.
> 
> So I changed some page_memcg() calling to page_memcg_check()
> in this patch, but you suggest using page_memcg().

It would be better if page_memcg() worked on LRU and kmem pages. I'm
proposing to change its implementation.

The reason is that page_memcg_check() allows everything and does no
sanity checking. You need page_memcg_charged() for the sanity checks
that it's LRU or kmem, but that's a bit difficult to understand, and
it's possible people will add more callsites to page_memcg_check()
without the page_memcg_charged() checks. It makes the code less safe.

We should discourage page_memcg_check() and make page_memcg() more
useful instead.

> I am very confused. Are you worried about the extra overhead brought
> by calling page_memcg_rcu()? In the next patch, I will remove
> page_memcg_check() calling and use objcg APIs.

I'm just worried about the usability of the interface. It should be
easy to use, and make it obvious if there is a user bug.

For example, in your next patch, mod_lruvec_page_state does this:

       if (PageMemcgKmem(head)) {
	       rcu_read_lock();
	       memcg = obj_cgroup_memcg(page_objcg(page));
       } else {
	       memcg = page_memcg(head);
	       /*
		* Untracked pages have no memcg, no lruvec. Update only the
		* node.
		*/
	       if (!memcg) {
		       __mod_node_page_state(pgdat, idx, val);
		       return;
	       }
	}

	lruvec = mem_cgroup_lruvec(memcg, pgdat);
	__mod_lruvec_state(lruvec, idx, val);

       if (PageMemcgKmem(head))
	       rcu_read_unlock();

I'm proposing to implement page_memcg() in a way where you can do this:

	rcu_read_lock();
	memcg = page_memcg(page);
	if (!memcg) {
		rcu_read_unlock();
		__mod_node_page_state();
		return;
	}
	lruvec = mem_cgroup_lruvec(memcg, pgdat);
	__mod_lruvec_state(lruvec, idx, val);
	rcu_read_unlock();

[ page_memcg() is:

	if (PageMemcgKmem(page))
		return obj_cgroup_memcg(__page_objcg(page));
	else
		return __page_memcg(page);

  and __page_objcg() and __page_memcg() do the raw page->memcg_data
  translation and the VM_BUG_ON_PAGE() checks for MEMCG_DATA_* ]

This is a lot simpler and less error prone.

It does take rcu_read_lock() for LRU pages too, which strictly it
doesn't need to right now. But it's cheap enough (and actually saves a
branch).

Longer term we most likely need it there anyway. The issue you are
describing in the cover letter - allocations pinning memcgs for a long
time - it exists at a larger scale and is causing recurring problems
in the real world: page cache doesn't get reclaimed for a long time,
or is used by the second, third, fourth, ... instance of the same job
that was restarted into a new cgroup every time. Unreclaimable dying
cgroups pile up, waste memory, and make page reclaim very inefficient.

We likely have to convert LRU pages and most other raw memcg pins to
the objcg direction to fix this problem, and then the page->memcg
lookup will always require the rcu read lock anyway.

Finally, a universal page_memcg() should also make uncharge_page()
simpler. Instead of obj_cgroup_memcg_get(), you could use the new
page_memcg() to implement a universal get_mem_cgroup_from_page():

	rcu_read_lock();
retry:
	memcg = page_memcg(page);
	if (unlikely(!css_tryget(&memcg->css)))
		goto retry;
	rcu_read_unlock();
	return memcg;

and then uncharge_page() becomes something like this:

	/* Look up page's memcg & prepare the batch */
	memcg = get_mem_cgroup_from_page(page);
	if (!memcg)
		return;
	if (ug->memcg != memcg) {
		...
		css_get(&memcg->css); /* batch ref, put in uncharge_batch() */
	}
	mem_cgroup_put(memcg);

	/* Add page to batch */
	nr_pages = compound_nr(page);
	...

	/* Clear the page->memcg link */
	if (PageMemcgKmem(page))
		obj_cgroup_put(__page_objcg(page));
	else
		css_put(__page_memcg(&memcg->css));
	page->memcg_data = 0;

Does that sound reasonable?

PS: We have several page_memcg() callsites that could use the raw
__page_memcg() directly for now. Is it worth switching them and saving
the branch? I think probably not, because these paths aren't hot, and
as per above, we should switch them to objcg sooner or later anyway.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 4/4] mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM
  2021-03-09 10:07 ` [PATCH v3 4/4] mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM Muchun Song
  2021-03-10 19:30   ` Roman Gushchin
  2021-03-12  3:26   ` Shakeel Butt
@ 2021-03-12 19:24   ` Johannes Weiner
  2 siblings, 0 replies; 28+ messages in thread
From: Johannes Weiner @ 2021-03-12 19:24 UTC (permalink / raw)
  To: Muchun Song
  Cc: guro, mhocko, akpm, shakeelb, vdavydov.dev, linux-kernel,
	linux-mm, duanxiongchun

On Tue, Mar 09, 2021 at 06:07:17PM +0800, Muchun Song wrote:
> The page only can be marked as kmem when CONFIG_MEMCG_KMEM is enabled.
> So move PageMemcgKmem() to the scope of the CONFIG_MEMCG_KMEM.
> 
> As a bonus, on !CONFIG_MEMCG_KMEM build some code can be compiled out.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [External] Re: [PATCH v3 2/4] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page
  2021-03-12 19:23       ` Johannes Weiner
@ 2021-03-12 22:42         ` Shakeel Butt
  2021-03-12 23:07           ` Johannes Weiner
  2021-03-14 13:56         ` Muchun Song
  1 sibling, 1 reply; 28+ messages in thread
From: Shakeel Butt @ 2021-03-12 22:42 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Muchun Song, Roman Gushchin, Michal Hocko, Andrew Morton,
	Vladimir Davydov, LKML, Linux Memory Management List,
	Xiongchun duan

Hi Johannes,

On Fri, Mar 12, 2021 at 11:23 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
[...]
>
> Longer term we most likely need it there anyway. The issue you are
> describing in the cover letter - allocations pinning memcgs for a long
> time - it exists at a larger scale and is causing recurring problems
> in the real world: page cache doesn't get reclaimed for a long time,
> or is used by the second, third, fourth, ... instance of the same job
> that was restarted into a new cgroup every time. Unreclaimable dying
> cgroups pile up, waste memory, and make page reclaim very inefficient.
>

For the scenario described above, do we really want to reparent the
page cache pages? Shouldn't we recharge the pages to the second,
third, fourth and so on, memcgs? My concern is that we will see a big
chunk of page cache pages charged to root and will only get reclaimed
on global pressure.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [External] Re: [PATCH v3 2/4] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page
  2021-03-12 22:42         ` Shakeel Butt
@ 2021-03-12 23:07           ` Johannes Weiner
  2021-03-12 23:18             ` Shakeel Butt
  0 siblings, 1 reply; 28+ messages in thread
From: Johannes Weiner @ 2021-03-12 23:07 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Muchun Song, Roman Gushchin, Michal Hocko, Andrew Morton,
	Vladimir Davydov, LKML, Linux Memory Management List,
	Xiongchun duan

On Fri, Mar 12, 2021 at 02:42:45PM -0800, Shakeel Butt wrote:
> Hi Johannes,
> 
> On Fri, Mar 12, 2021 at 11:23 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> [...]
> >
> > Longer term we most likely need it there anyway. The issue you are
> > describing in the cover letter - allocations pinning memcgs for a long
> > time - it exists at a larger scale and is causing recurring problems
> > in the real world: page cache doesn't get reclaimed for a long time,
> > or is used by the second, third, fourth, ... instance of the same job
> > that was restarted into a new cgroup every time. Unreclaimable dying
> > cgroups pile up, waste memory, and make page reclaim very inefficient.
> >
> 
> For the scenario described above, do we really want to reparent the
> page cache pages? Shouldn't we recharge the pages to the second,
> third, fourth and so on, memcgs? My concern is that we will see a big
> chunk of page cache pages charged to root and will only get reclaimed
> on global pressure.

Sorry, I'm proposing to reparent to the ancestor, not root. It's an
optimization, not a change in user-visible behavior.

As far as the user can tell, the pages already belong to the parent
after deletion: they'll show up in the parent's stats, naturally, and
they will get reclaimed as part of the parent being reclaimed.

The dead cgroup doesn't even have its own limit anymore after
.css_reset() has run. And we already physically reparent slab objects
in memcg_reparent_objcgs() and memcg_drain_all_list_lrus().

I'm just saying we should do the same thing for LRU pages.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [External] Re: [PATCH v3 2/4] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page
  2021-03-12 23:07           ` Johannes Weiner
@ 2021-03-12 23:18             ` Shakeel Butt
  0 siblings, 0 replies; 28+ messages in thread
From: Shakeel Butt @ 2021-03-12 23:18 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Muchun Song, Roman Gushchin, Michal Hocko, Andrew Morton,
	Vladimir Davydov, LKML, Linux Memory Management List,
	Xiongchun duan

On Fri, Mar 12, 2021 at 3:07 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Fri, Mar 12, 2021 at 02:42:45PM -0800, Shakeel Butt wrote:
> > Hi Johannes,
> >
> > On Fri, Mar 12, 2021 at 11:23 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > >
> > [...]
> > >
> > > Longer term we most likely need it there anyway. The issue you are
> > > describing in the cover letter - allocations pinning memcgs for a long
> > > time - it exists at a larger scale and is causing recurring problems
> > > in the real world: page cache doesn't get reclaimed for a long time,
> > > or is used by the second, third, fourth, ... instance of the same job
> > > that was restarted into a new cgroup every time. Unreclaimable dying
> > > cgroups pile up, waste memory, and make page reclaim very inefficient.
> > >
> >
> > For the scenario described above, do we really want to reparent the
> > page cache pages? Shouldn't we recharge the pages to the second,
> > third, fourth and so on, memcgs? My concern is that we will see a big
> > chunk of page cache pages charged to root and will only get reclaimed
> > on global pressure.
>
> Sorry, I'm proposing to reparent to the ancestor, not root. It's an
> optimization, not a change in user-visible behavior.
>
> As far as the user can tell, the pages already belong to the parent
> after deletion: they'll show up in the parent's stats, naturally, and
> they will get reclaimed as part of the parent being reclaimed.
>
> The dead cgroup doesn't even have its own limit anymore after
> .css_reset() has run. And we already physically reparent slab objects
> in memcg_reparent_objcgs() and memcg_drain_all_list_lrus().
>
> I'm just saying we should do the same thing for LRU pages.

I understand the proposal and I agree it makes total sense when a job
is recycling sub-job/sub-container.

I was talking about the (recycling of the) top level cgroups. Though
for that to be an issue, I suppose the file system has to be shared
between the jobs on the system. I was wondering if a page cache
reaches the root memcg on multiple reparenting, should the next access
cause that page to be charged to the accessor?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [External] Re: [PATCH v3 2/4] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page
  2021-03-12 19:23       ` Johannes Weiner
  2021-03-12 22:42         ` Shakeel Butt
@ 2021-03-14 13:56         ` Muchun Song
  1 sibling, 0 replies; 28+ messages in thread
From: Muchun Song @ 2021-03-14 13:56 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Roman Gushchin, Michal Hocko, Andrew Morton, Shakeel Butt,
	Vladimir Davydov, LKML, Linux Memory Management List,
	Xiongchun duan

On Sat, Mar 13, 2021 at 3:23 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> Hello Muchun,
>
> On Fri, Mar 12, 2021 at 03:14:07PM +0800, Muchun Song wrote:
> > On Thu, Mar 11, 2021 at 9:12 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > > @@ -358,14 +358,26 @@ enum page_memcg_data_flags {
> > > >
> > > >  #define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1)
> > > >
> > > > +/* Return true for charged page, otherwise false. */
> > > > +static inline bool page_memcg_charged(struct page *page)
> > > > +{
> > > > +     unsigned long memcg_data = page->memcg_data;
> > > > +
> > > > +     VM_BUG_ON_PAGE(PageSlab(page), page);
> > > > +     VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_OBJCGS, page);
> > > > +
> > > > +     return !!memcg_data;
> > > > +}
> > >
> > > This is mosntly used right before a page_memcg_check(), which makes it
> > > somewhat redundant except for the VM_BUG_ON_PAGE() for slab pages.
> >
> > Should I rename page_memcg_charged to page_memcg_raw?
> > And use page_memcg_raw to check whether the page is charged.
> >
> > static inline bool page_memcg_charged(struct page *page)
> > {
> >         return page->memcg_data;
> > }
>
> You can just directly access page->memcg_data in places where you'd
> use this helper. I think it's only the two places in mm/page_alloc.c,
> and they already have CONFIG_MEMCG in place, so raw access works.

OK.

>
> > > But it's also a bit of a confusing name: slab pages are charged too,
> > > but this function would crash if you called it on one.
> > >
> > > In light of this, and in light of what I wrote above about hopefully
> > > converting more and more allocations from raw memcg pins to
> > > reparentable objcg, it would be bettor to have
> > >
> > >         page_memcg() for 1:1 page-memcg mappings, i.e. LRU & kmem
> >
> > Sorry. I do not get the point. Because in the next patch, the kmem
> > page will use objcg to charge memory. So the page_memcg()
> > should not be suitable for the kmem pages. So I add a VM_BUG_ON
> > in the page_memcg() to catch invalid usage.
> >
> > So I changed some page_memcg() calling to page_memcg_check()
> > in this patch, but you suggest using page_memcg().
>
> It would be better if page_memcg() worked on LRU and kmem pages. I'm
> proposing to change its implementation.
>
> The reason is that page_memcg_check() allows everything and does no
> sanity checking. You need page_memcg_charged() for the sanity checks
> that it's LRU or kmem, but that's a bit difficult to understand, and
> it's possible people will add more callsites to page_memcg_check()
> without the page_memcg_charged() checks. It makes the code less safe.
>
> We should discourage page_memcg_check() and make page_memcg() more
> useful instead.
>
> > I am very confused. Are you worried about the extra overhead brought
> > by calling page_memcg_rcu()? In the next patch, I will remove
> > page_memcg_check() calling and use objcg APIs.
>
> I'm just worried about the usability of the interface. It should be
> easy to use, and make it obvious if there is a user bug.
>
> For example, in your next patch, mod_lruvec_page_state does this:
>
>        if (PageMemcgKmem(head)) {
>                rcu_read_lock();
>                memcg = obj_cgroup_memcg(page_objcg(page));
>        } else {
>                memcg = page_memcg(head);
>                /*
>                 * Untracked pages have no memcg, no lruvec. Update only the
>                 * node.
>                 */
>                if (!memcg) {
>                        __mod_node_page_state(pgdat, idx, val);
>                        return;
>                }
>         }
>
>         lruvec = mem_cgroup_lruvec(memcg, pgdat);
>         __mod_lruvec_state(lruvec, idx, val);
>
>        if (PageMemcgKmem(head))
>                rcu_read_unlock();
>
> I'm proposing to implement page_memcg() in a way where you can do this:
>
>         rcu_read_lock();
>         memcg = page_memcg(page);
>         if (!memcg) {
>                 rcu_read_unlock();
>                 __mod_node_page_state();
>                 return;
>         }
>         lruvec = mem_cgroup_lruvec(memcg, pgdat);
>         __mod_lruvec_state(lruvec, idx, val);
>         rcu_read_unlock();
>
> [ page_memcg() is:
>
>         if (PageMemcgKmem(page))
>                 return obj_cgroup_memcg(__page_objcg(page));
>         else
>                 return __page_memcg(page);
>
>   and __page_objcg() and __page_memcg() do the raw page->memcg_data
>   translation and the VM_BUG_ON_PAGE() checks for MEMCG_DATA_* ]

Thanks for your suggestions. I will rework the code like this.

>
> This is a lot simpler and less error prone.
>
> It does take rcu_read_lock() for LRU pages too, which strictly it
> doesn't need to right now. But it's cheap enough (and actually saves a
> branch).
>
> Longer term we most likely need it there anyway. The issue you are
> describing in the cover letter - allocations pinning memcgs for a long
> time - it exists at a larger scale and is causing recurring problems
> in the real world: page cache doesn't get reclaimed for a long time,
> or is used by the second, third, fourth, ... instance of the same job
> that was restarted into a new cgroup every time. Unreclaimable dying
> cgroups pile up, waste memory, and make page reclaim very inefficient.
>
> We likely have to convert LRU pages and most other raw memcg pins to
> the objcg direction to fix this problem, and then the page->memcg
> lookup will always require the rcu read lock anyway.

Yeah. I agree with you. I am doing this (it is already on my todo list).

>
> Finally, a universal page_memcg() should also make uncharge_page()
> simpler. Instead of obj_cgroup_memcg_get(), you could use the new
> page_memcg() to implement a universal get_mem_cgroup_from_page():
>
>         rcu_read_lock();
> retry:
>         memcg = page_memcg(page);
>         if (unlikely(!css_tryget(&memcg->css)))
>                 goto retry;
>         rcu_read_unlock();
>         return memcg;
>
> and then uncharge_page() becomes something like this:
>
>         /* Look up page's memcg & prepare the batch */
>         memcg = get_mem_cgroup_from_page(page);
>         if (!memcg)
>                 return;
>         if (ug->memcg != memcg) {
>                 ...
>                 css_get(&memcg->css); /* batch ref, put in uncharge_batch() */
>         }
>         mem_cgroup_put(memcg);
>
>         /* Add page to batch */
>         nr_pages = compound_nr(page);
>         ...
>
>         /* Clear the page->memcg link */
>         if (PageMemcgKmem(page))
>                 obj_cgroup_put(__page_objcg(page));
>         else
>                 css_put(__page_memcg(&memcg->css));
>         page->memcg_data = 0;
>
> Does that sound reasonable?

Make sense to me.

>
> PS: We have several page_memcg() callsites that could use the raw
> __page_memcg() directly for now. Is it worth switching them and saving
> the branch? I think probably not, because these paths aren't hot, and
> as per above, we should switch them to objcg sooner or later anyway.

Got it.

Very thanks for your explanation.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2021-03-14 13:58 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-09 10:07 [PATCH v3 0/4] Use obj_cgroup APIs to charge kmem pages Muchun Song
2021-03-09 10:07 ` [PATCH v3 1/4] mm: memcontrol: introduce obj_cgroup_{un}charge_pages Muchun Song
2021-03-11 12:30   ` Johannes Weiner
2021-03-11 18:56   ` Shakeel Butt
2021-03-09 10:07 ` [PATCH v3 2/4] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page Muchun Song
2021-03-10 19:57   ` Roman Gushchin
2021-03-11  6:45     ` [External] " Muchun Song
2021-03-11 13:12   ` Johannes Weiner
2021-03-12  7:14     ` [External] " Muchun Song
2021-03-12 19:23       ` Johannes Weiner
2021-03-12 22:42         ` Shakeel Butt
2021-03-12 23:07           ` Johannes Weiner
2021-03-12 23:18             ` Shakeel Butt
2021-03-14 13:56         ` Muchun Song
2021-03-12  3:22   ` Shakeel Butt
2021-03-12  5:02     ` [External] " Muchun Song
2021-03-09 10:07 ` [PATCH v3 3/4] mm: memcontrol: use obj_cgroup APIs to charge kmem pages Muchun Song
2021-03-10 19:53   ` Roman Gushchin
2021-03-11  6:50     ` [External] " Muchun Song
2021-03-10 22:05   ` Johannes Weiner
2021-03-12  9:22     ` [External] " Muchun Song
2021-03-12 15:59       ` Johannes Weiner
2021-03-12 16:46         ` Muchun Song
2021-03-11 17:48   ` kernel test robot
2021-03-09 10:07 ` [PATCH v3 4/4] mm: memcontrol: move PageMemcgKmem to the scope of CONFIG_MEMCG_KMEM Muchun Song
2021-03-10 19:30   ` Roman Gushchin
2021-03-12  3:26   ` Shakeel Butt
2021-03-12 19:24   ` Johannes Weiner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).