LKML Archive on lore.kernel.org
* memcg memory accounting in vmalloc is broken
@ 2021-10-07  8:04 Vasily Averin
  2021-10-07  8:13 ` Michal Hocko
  0 siblings, 1 reply; 28+ messages in thread
From: Vasily Averin @ 2021-10-07  8:04 UTC (permalink / raw)
  To: Michal Hocko, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Cgroups, Linux MM, linux-kernel, kernel

vmalloc was switched to __alloc_pages_bulk, but it does not account the allocated memory to memcg.

Is it a known issue, perhaps?

thank you,
	Vasily Averin

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: memcg memory accounting in vmalloc is broken
  2021-10-07  8:04 memcg memory accounting in vmalloc is broken Vasily Averin
@ 2021-10-07  8:13 ` Michal Hocko
  2021-10-07  8:16   ` Michal Hocko
  0 siblings, 1 reply; 28+ messages in thread
From: Michal Hocko @ 2021-10-07  8:13 UTC (permalink / raw)
  To: Vasily Averin
  Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Cgroups,
	Linux MM, linux-kernel, kernel

On Thu 07-10-21 11:04:40, Vasily Averin wrote:
> vmalloc was switched to __alloc_pages_bulk but it does not account the memory to memcg.
> 
> Is it known issue perhaps?

No, I think this was just overlooked. Definitely doesn't look
intentional to me.
-- 
Michal Hocko
SUSE Labs


* Re: memcg memory accounting in vmalloc is broken
  2021-10-07  8:13 ` Michal Hocko
@ 2021-10-07  8:16   ` Michal Hocko
  2021-10-07  8:50     ` Vasily Averin
  0 siblings, 1 reply; 28+ messages in thread
From: Michal Hocko @ 2021-10-07  8:16 UTC (permalink / raw)
  To: Vasily Averin
  Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Cgroups,
	Linux MM, linux-kernel, kernel, Mel Gorman, Uladzislau Rezki

Cc Mel and Uladzislau

On Thu 07-10-21 10:13:23, Michal Hocko wrote:
> On Thu 07-10-21 11:04:40, Vasily Averin wrote:
> > vmalloc was switched to __alloc_pages_bulk but it does not account the memory to memcg.
> > 
> > Is it known issue perhaps?
> 
> No, I think this was just overlooked. Definitely doesn't look
> intentional to me.
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs


* Re: memcg memory accounting in vmalloc is broken
  2021-10-07  8:16   ` Michal Hocko
@ 2021-10-07  8:50     ` Vasily Averin
  2021-10-07 10:08       ` Michal Hocko
                         ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: Vasily Averin @ 2021-10-07  8:50 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Cgroups,
	Linux MM, linux-kernel, kernel, Mel Gorman, Uladzislau Rezki

On 10/7/21 11:16 AM, Michal Hocko wrote:
> Cc Mel and Uladzislau
> 
> On Thu 07-10-21 10:13:23, Michal Hocko wrote:
>> On Thu 07-10-21 11:04:40, Vasily Averin wrote:
>>> vmalloc was switched to __alloc_pages_bulk but it does not account the memory to memcg.
>>>
>>> Is it known issue perhaps?
>>
>> No, I think this was just overlooked. Definitely doesn't look
>> intentional to me.

I use the following patch as a quick fix;
it helps, though it is far from ideal and can be optimized.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b37435c274cf..e6abe2cac159 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5290,6 +5290,12 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 
 		page = __rmqueue_pcplist(zone, 0, ac.migratetype, alloc_flags,
 								pcp, pcp_list);
+
+		if (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT) && page &&
+		    unlikely(__memcg_kmem_charge_page(page, gfp, 0) != 0)) {
+			__free_pages(page, 0);
+			page = NULL;
+		}
 		if (unlikely(!page)) {
 			/* Try and get at least one page */
 			if (!nr_populated)
-- 
2.31.1



* Re: memcg memory accounting in vmalloc is broken
  2021-10-07  8:50     ` Vasily Averin
@ 2021-10-07 10:08       ` Michal Hocko
  2021-10-07 10:20       ` Mel Gorman
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2021-10-07 10:08 UTC (permalink / raw)
  To: Vasily Averin
  Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Cgroups,
	Linux MM, linux-kernel, kernel, Mel Gorman, Uladzislau Rezki

On Thu 07-10-21 11:50:44, Vasily Averin wrote:
> On 10/7/21 11:16 AM, Michal Hocko wrote:
> > Cc Mel and Uladzislau
> > 
> > On Thu 07-10-21 10:13:23, Michal Hocko wrote:
> >> On Thu 07-10-21 11:04:40, Vasily Averin wrote:
> >>> vmalloc was switched to __alloc_pages_bulk but it does not account the memory to memcg.
> >>>
> >>> Is it known issue perhaps?
> >>
> >> No, I think this was just overlooked. Definitely doesn't look
> >> intentional to me.
> 
> I use following patch as a quick fix,
> it helps though it is far from ideal and can be optimized.
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b37435c274cf..e6abe2cac159 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5290,6 +5290,12 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>  
>  		page = __rmqueue_pcplist(zone, 0, ac.migratetype, alloc_flags,
>  								pcp, pcp_list);
> +
> +		if (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT) && page &&
> +		    unlikely(__memcg_kmem_charge_page(page, gfp, 0) != 0)) {
> +			__free_pages(page, 0);
> +			page = NULL;
> +		}
>  		if (unlikely(!page)) {
>  			/* Try and get at least one page */
>  			if (!nr_populated)
> -- 
> 2.31.1

Yes, this makes sense to me.

-- 
Michal Hocko
SUSE Labs


* Re: memcg memory accounting in vmalloc is broken
  2021-10-07  8:50     ` Vasily Averin
  2021-10-07 10:08       ` Michal Hocko
@ 2021-10-07 10:20       ` Mel Gorman
  2021-10-07 14:02         ` Vlastimil Babka
  2021-10-07 14:00       ` Vlastimil Babka
  2021-10-07 19:33       ` Vasily Averin
  3 siblings, 1 reply; 28+ messages in thread
From: Mel Gorman @ 2021-10-07 10:20 UTC (permalink / raw)
  To: Vasily Averin
  Cc: Michal Hocko, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Cgroups, Linux MM, linux-kernel, kernel, Mel Gorman,
	Uladzislau Rezki

On Thu, Oct 07, 2021 at 11:50:44AM +0300, Vasily Averin wrote:
> On 10/7/21 11:16 AM, Michal Hocko wrote:
> > Cc Mel and Uladzislau
> > 
> > On Thu 07-10-21 10:13:23, Michal Hocko wrote:
> >> On Thu 07-10-21 11:04:40, Vasily Averin wrote:
> >>> vmalloc was switched to __alloc_pages_bulk but it does not account the memory to memcg.
> >>>
> >>> Is it known issue perhaps?
> >>
> >> No, I think this was just overlooked. Definitely doesn't look
> >> intentional to me.
> 
> I use following patch as a quick fix,
> it helps though it is far from ideal and can be optimized.

Thanks Vasily.

This papers over the problem but it could certainly be optimized. At
minimum:

1. Test (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT)) in the
   function preamble and store the result in a bool
2. Avoid the temptation to batch the accounting because if the
   accounting fails, there is no information on how many pages could be
   allocated before the limits were hit. I guess you could pre-charge the
   pages and uncharge the number of pages that failed to be allocated,
   but that should be a separate patch.
3. If an allocation fails due to memcg accounting, break
   out of the loop because all remaining bulk allocations are
   also likely to fail.

As it's not vmalloc's fault, I would suggest the patch
have
Fixes: 387ba26fb1cb ("mm/page_alloc: add a bulk page allocator")
and
Cc: <stable@vger.kernel.org>

Note the Cc should just be in the patch and not mailed directly to
stable@ as it'll simply trigger a form letter about the patch having to
be merged to mainline first.

-- 
Mel Gorman
SUSE Labs


* Re: memcg memory accounting in vmalloc is broken
  2021-10-07  8:50     ` Vasily Averin
  2021-10-07 10:08       ` Michal Hocko
  2021-10-07 10:20       ` Mel Gorman
@ 2021-10-07 14:00       ` Vlastimil Babka
  2021-10-07 14:09         ` Michal Hocko
  2021-10-07 19:33       ` Vasily Averin
  3 siblings, 1 reply; 28+ messages in thread
From: Vlastimil Babka @ 2021-10-07 14:00 UTC (permalink / raw)
  To: Vasily Averin, Michal Hocko
  Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Cgroups,
	Linux MM, linux-kernel, kernel, Mel Gorman, Uladzislau Rezki

On 10/7/21 10:50, Vasily Averin wrote:
> On 10/7/21 11:16 AM, Michal Hocko wrote:
>> Cc Mel and Uladzislau
>> 
>> On Thu 07-10-21 10:13:23, Michal Hocko wrote:
>>> On Thu 07-10-21 11:04:40, Vasily Averin wrote:
>>>> vmalloc was switched to __alloc_pages_bulk but it does not account the memory to memcg.
>>>>
>>>> Is it known issue perhaps?
>>>
>>> No, I think this was just overlooked. Definitely doesn't look
>>> intentional to me.
> 
> I use following patch as a quick fix,
> it helps though it is far from ideal and can be optimized.
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b37435c274cf..e6abe2cac159 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5290,6 +5290,12 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>  
>  		page = __rmqueue_pcplist(zone, 0, ac.migratetype, alloc_flags,
>  								pcp, pcp_list);
> +
> +		if (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT) && page &&
> +		    unlikely(__memcg_kmem_charge_page(page, gfp, 0) != 0)) {
> +			__free_pages(page, 0);

It's too early for this here: we haven't gone through prep_new_page() yet,
minimally the post_alloc_hook() -> set_page_refcounted() part, which is
required for put_page_testzero() in __free_pages() to work properly, and
probably other stuff is missing as well.
Either do the full prep+free or a minimal reversion of __rmqueue_pcplist,
i.e. something that just puts the page back on the pcplist. Probably the
former, so we don't introduce subtle errors.

> +			page = NULL;
> +		}
>  		if (unlikely(!page)) {
>  			/* Try and get at least one page */
>  			if (!nr_populated)
> 



* Re: memcg memory accounting in vmalloc is broken
  2021-10-07 10:20       ` Mel Gorman
@ 2021-10-07 14:02         ` Vlastimil Babka
  0 siblings, 0 replies; 28+ messages in thread
From: Vlastimil Babka @ 2021-10-07 14:02 UTC (permalink / raw)
  To: Mel Gorman, Vasily Averin
  Cc: Michal Hocko, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Cgroups, Linux MM, linux-kernel, kernel, Mel Gorman,
	Uladzislau Rezki

On 10/7/21 12:20, Mel Gorman wrote:
> On Thu, Oct 07, 2021 at 11:50:44AM +0300, Vasily Averin wrote:
>> On 10/7/21 11:16 AM, Michal Hocko wrote:
>> > Cc Mel and Uladzislau
>> > 
>> > On Thu 07-10-21 10:13:23, Michal Hocko wrote:
>> >> On Thu 07-10-21 11:04:40, Vasily Averin wrote:
>> >>> vmalloc was switched to __alloc_pages_bulk but it does not account the memory to memcg.
>> >>>
>> >>> Is it known issue perhaps?
>> >>
>> >> No, I think this was just overlooked. Definitely doesn't look
>> >> intentional to me.
>> 
>> I use following patch as a quick fix,
>> it helps though it is far from ideal and can be optimized.
> 
> Thanks Vasily.
> 
> This papers over the problem but it could certainly be optimized. At
> minimum;
> 
> 1. Test (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT)) in the
>    function preamble and store the result in a bool

memcg_kmem_enabled() is a static key based check so caching that defeats its
purpose and changes it to a normal branch. That leaves gfp & __GFP_ACCOUNT,
which may perhaps still benefit from such caching.

> 2. Avoid the temptation to batch the accounting because if the
>    accounting fails, there is no information on how many pages could be
>    allocated before the limits were hit. I guess you could pre-charge the
>    pages and uncharging the number of pages that failed to be allocated
>    but it should be a separate patch.
> 3. If an allocation fails due to memcg accounting, break
>    out of the loop because all remaining bulk allocations are
>    also likely to fail.
> 
> As it's not vmalloc's fault, I would suggest the patch
> have
> Fixes: 387ba26fb1cb ("mm/page_alloc: add a bulk page allocator")
> and
> Cc: <stable@vger.kernel.org>
> 
> Note the Cc should just be in the patch and not mailed directly to
> stable@ as it'll simply trigger a form letter about the patch having to
> be merged to mainline first.
> 



* Re: memcg memory accounting in vmalloc is broken
  2021-10-07 14:00       ` Vlastimil Babka
@ 2021-10-07 14:09         ` Michal Hocko
  0 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2021-10-07 14:09 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Vasily Averin, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Cgroups, Linux MM, linux-kernel, kernel, Mel Gorman,
	Uladzislau Rezki

On Thu 07-10-21 16:00:21, Vlastimil Babka wrote:
> On 10/7/21 10:50, Vasily Averin wrote:
> > On 10/7/21 11:16 AM, Michal Hocko wrote:
> >> Cc Mel and Uladzislau
> >> 
> >> On Thu 07-10-21 10:13:23, Michal Hocko wrote:
> >>> On Thu 07-10-21 11:04:40, Vasily Averin wrote:
> >>>> vmalloc was switched to __alloc_pages_bulk but it does not account the memory to memcg.
> >>>>
> >>>> Is it known issue perhaps?
> >>>
> >>> No, I think this was just overlooked. Definitely doesn't look
> >>> intentional to me.
> > 
> > I use following patch as a quick fix,
> > it helps though it is far from ideal and can be optimized.
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index b37435c274cf..e6abe2cac159 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -5290,6 +5290,12 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> >  
> >  		page = __rmqueue_pcplist(zone, 0, ac.migratetype, alloc_flags,
> >  								pcp, pcp_list);
> > +
> > +		if (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT) && page &&
> > +		    unlikely(__memcg_kmem_charge_page(page, gfp, 0) != 0)) {
> > +			__free_pages(page, 0);
> 
> It's too early for this here, we didn't go through prep_new_page() yet,
> minimally the post_alloc_hook() -> set_page_refcounted() part, required for
> put_page_testzero() in __free_pages() to work properly, and probably other
> stuff.
> Either do the full prep+free or a minimal reversion of  __rmqueue_pcplist -
> something to just put the page back to pcplist. Probably the former so we
> don't introduce subtle errors.

Very well spotted!
-- 
Michal Hocko
SUSE Labs


* Re: memcg memory accounting in vmalloc is broken
  2021-10-07  8:50     ` Vasily Averin
                         ` (2 preceding siblings ...)
  2021-10-07 14:00       ` Vlastimil Babka
@ 2021-10-07 19:33       ` Vasily Averin
  2021-10-08  9:23         ` [PATCH memcg] memcg: enable memory accounting in __alloc_pages_bulk Vasily Averin
  3 siblings, 1 reply; 28+ messages in thread
From: Vasily Averin @ 2021-10-07 19:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Cgroups,
	Linux MM, linux-kernel, kernel, Mel Gorman, Uladzislau Rezki

On 10/7/21 11:50 AM, Vasily Averin wrote:
> On 10/7/21 11:16 AM, Michal Hocko wrote:
>> Cc Mel and Uladzislau
>>
>> On Thu 07-10-21 10:13:23, Michal Hocko wrote:
>>> On Thu 07-10-21 11:04:40, Vasily Averin wrote:
>>>> vmalloc was switched to __alloc_pages_bulk but it does not account the memory to memcg.
>>>>
>>>> Is it known issue perhaps?
>>>
>>> No, I think this was just overlooked. Definitely doesn't look
>>> intentional to me.
> 
> I use following patch as a quick fix,
> it helps though it is far from ideal and can be optimized.

Dear all,
thanks a lot for your comments and suggestions!

Unfortunately, the current patch version does not work properly.
We cannot charge each page inside this loop while the pagesets spinlock
is held; lockdep reports a possible deadlock.
I am trying to use pre-charging instead, as suggested by Mel Gorman.
 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b37435c274cf..e6abe2cac159 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5290,6 +5290,12 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>  
>  		page = __rmqueue_pcplist(zone, 0, ac.migratetype, alloc_flags,
>  								pcp, pcp_list);
> +
> +		if (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT) && page &&
> +		    unlikely(__memcg_kmem_charge_page(page, gfp, 0) != 0)) {
> +			__free_pages(page, 0);
> +			page = NULL;
> +		}
>  		if (unlikely(!page)) {
>  			/* Try and get at least one page */
>  			if (!nr_populated)
> 



* [PATCH memcg] memcg: enable memory accounting in __alloc_pages_bulk
  2021-10-07 19:33       ` Vasily Averin
@ 2021-10-08  9:23         ` Vasily Averin
  2021-10-08 17:35           ` Shakeel Butt
  0 siblings, 1 reply; 28+ messages in thread
From: Vasily Averin @ 2021-10-08  9:23 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, cgroups,
	linux-mm, linux-kernel, kernel, Mel Gorman, Uladzislau Rezki,
	Vlastimil Babka

Enable memory accounting for the bulk page allocator.

Fixes: 387ba26fb1cb ("mm/page_alloc: add a bulk page allocator")
Cc: <stable@vger.kernel.org>
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
---
 mm/page_alloc.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 62 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b37435c274cf..602819a232e5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5172,6 +5172,55 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
 	return true;
 }
 
+#ifdef CONFIG_MEMCG_KMEM
+static bool memcg_bulk_pre_charge_hook(struct obj_cgroup **objcgp, gfp_t gfp,
+					unsigned int nr_pages)
+{
+	struct obj_cgroup *objcg = NULL;
+
+	if (!memcg_kmem_enabled() || !(gfp & __GFP_ACCOUNT))
+		return true;
+
+	objcg = get_obj_cgroup_from_current();
+
+	if (objcg && obj_cgroup_charge(objcg, gfp, nr_pages << PAGE_SHIFT)) {
+		obj_cgroup_put(objcg);
+		return false;
+	}
+	obj_cgroup_get_many(objcg, nr_pages);
+	*objcgp = objcg;
+	return true;
+}
+
+static void memcg_bulk_charge_hook(struct obj_cgroup *objcg,
+					struct page *page)
+{
+	page->memcg_data = (unsigned long)objcg | MEMCG_DATA_KMEM;
+}
+
+static void memcg_bulk_post_charge_hook(struct obj_cgroup *objcg,
+					unsigned int nr_pages)
+{
+	obj_cgroup_uncharge(objcg, nr_pages << PAGE_SHIFT);
+	percpu_ref_put_many(&objcg->refcnt, nr_pages + 1);
+}
+#else
+static bool memcg_bulk_pre_charge_hook(struct obj_cgroup **objcgp, gfp_t gfp,
+					unsigned int nr_pages)
+{
+	return true;
+}
+
+static void memcg_bulk_charge_hook(struct obj_cgroup *objcgp,
+					struct page *page)
+{
+}
+
+static void memcg_bulk_post_charge_hook(struct obj_cgroup *objcg,
+					unsigned int nr_pages)
+{
+}
+#endif
 /*
  * __alloc_pages_bulk - Allocate a number of order-0 pages to a list or array
  * @gfp: GFP flags for the allocation
@@ -5207,6 +5256,8 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	gfp_t alloc_gfp;
 	unsigned int alloc_flags = ALLOC_WMARK_LOW;
 	int nr_populated = 0, nr_account = 0;
+	unsigned int nr_pre_charge = 0;
+	struct obj_cgroup *objcg = NULL;
 
 	/*
 	 * Skip populated array elements to determine if any pages need
@@ -5275,6 +5326,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	if (unlikely(!zone))
 		goto failed;
 
+	nr_pre_charge = nr_pages - nr_populated;
+	if (!memcg_bulk_pre_charge_hook(&objcg, gfp, nr_pre_charge))
+		goto failed;
+
 	/* Attempt the batch allocation */
 	local_lock_irqsave(&pagesets.lock, flags);
 	pcp = this_cpu_ptr(zone->per_cpu_pageset);
@@ -5287,9 +5342,9 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 			nr_populated++;
 			continue;
 		}
-
 		page = __rmqueue_pcplist(zone, 0, ac.migratetype, alloc_flags,
 								pcp, pcp_list);
+
 		if (unlikely(!page)) {
 			/* Try and get at least one page */
 			if (!nr_populated)
@@ -5297,6 +5352,8 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 			break;
 		}
 		nr_account++;
+		if (objcg)
+			memcg_bulk_charge_hook(objcg, page);
 
 		prep_new_page(page, 0, gfp, 0);
 		if (page_list)
@@ -5310,13 +5367,16 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 
 	__count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account);
 	zone_statistics(ac.preferred_zoneref->zone, zone, nr_account);
+	if (objcg)
+		memcg_bulk_post_charge_hook(objcg, nr_pre_charge - nr_account);
 
 out:
 	return nr_populated;
 
 failed_irq:
 	local_unlock_irqrestore(&pagesets.lock, flags);
-
+	if (objcg)
+		memcg_bulk_post_charge_hook(objcg, nr_pre_charge);
 failed:
 	page = __alloc_pages(gfp, 0, preferred_nid, nodemask);
 	if (page) {
-- 
2.31.1



* Re: [PATCH memcg] memcg: enable memory accounting in __alloc_pages_bulk
  2021-10-08  9:23         ` [PATCH memcg] memcg: enable memory accounting in __alloc_pages_bulk Vasily Averin
@ 2021-10-08 17:35           ` Shakeel Butt
  2021-10-12 10:18             ` [PATCH mm v2] " Vasily Averin
  0 siblings, 1 reply; 28+ messages in thread
From: Shakeel Butt @ 2021-10-08 17:35 UTC (permalink / raw)
  To: Vasily Averin, Roman Gushchin
  Cc: Michal Hocko, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Cgroups, Linux MM, LKML, kernel, Mel Gorman, Uladzislau Rezki,
	Vlastimil Babka

+Roman

On Fri, Oct 8, 2021 at 2:23 AM Vasily Averin <vvs@virtuozzo.com> wrote:
>
> Enable memory accounting for bulk page allocator.
>
> Fixes: 387ba26fb1cb ("mm/page_alloc: add a bulk page allocator")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
> ---
>  mm/page_alloc.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 62 insertions(+), 2 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b37435c274cf..602819a232e5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5172,6 +5172,55 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
>         return true;
>  }
>

Please move the following memcg functions to memcontrol.[h|c] files.

> +#ifdef CONFIG_MEMCG_KMEM
> +static bool memcg_bulk_pre_charge_hook(struct obj_cgroup **objcgp, gfp_t gfp,
> +                                       unsigned int nr_pages)
> +{
> +       struct obj_cgroup *objcg = NULL;
> +
> +       if (!memcg_kmem_enabled() || !(gfp & __GFP_ACCOUNT))
> +               return true;
> +
> +       objcg = get_obj_cgroup_from_current();
> +
> +       if (objcg && obj_cgroup_charge(objcg, gfp, nr_pages << PAGE_SHIFT)) {

Please use obj_cgroup_charge_pages() when you move this code to memcontrol.c

> +               obj_cgroup_put(objcg);
> +               return false;
> +       }
> +       obj_cgroup_get_many(objcg, nr_pages);
> +       *objcgp = objcg;
> +       return true;
> +}
> +
> +static void memcg_bulk_charge_hook(struct obj_cgroup *objcg,
> +                                       struct page *page)
> +{
> +       page->memcg_data = (unsigned long)objcg | MEMCG_DATA_KMEM;
> +}
> +
> +static void memcg_bulk_post_charge_hook(struct obj_cgroup *objcg,
> +                                       unsigned int nr_pages)
> +{
> +       obj_cgroup_uncharge(objcg, nr_pages << PAGE_SHIFT);
> +       percpu_ref_put_many(&objcg->refcnt, nr_pages + 1);

Introduce the obj_cgroup_put_many() and you don't need to keep the
extra ref from the pre hook i.e. put the ref in the pre hook.

> +}
> +#else
> +static bool memcg_bulk_pre_charge_hook(struct obj_cgroup **objcgp, gfp_t gfp,
> +                                       unsigned int nr_pages)
> +{
> +       return true;
> +}
> +
> +static void memcg_bulk_charge_hook(struct obj_cgroup *objcgp,
> +                                       struct page *page)
> +{
> +}
> +
> +static void memcg_bulk_post_charge_hook(struct obj_cgroup *objcg,
> +                                       unsigned int nr_pages)
> +{
> +}
> +#endif
>  /*
>   * __alloc_pages_bulk - Allocate a number of order-0 pages to a list or array
>   * @gfp: GFP flags for the allocation
> @@ -5207,6 +5256,8 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>         gfp_t alloc_gfp;
>         unsigned int alloc_flags = ALLOC_WMARK_LOW;
>         int nr_populated = 0, nr_account = 0;
> +       unsigned int nr_pre_charge = 0;
> +       struct obj_cgroup *objcg = NULL;
>
>         /*
>          * Skip populated array elements to determine if any pages need
> @@ -5275,6 +5326,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>         if (unlikely(!zone))
>                 goto failed;
>
> +       nr_pre_charge = nr_pages - nr_populated;
> +       if (!memcg_bulk_pre_charge_hook(&objcg, gfp, nr_pre_charge))
> +               goto failed;
> +
>         /* Attempt the batch allocation */
>         local_lock_irqsave(&pagesets.lock, flags);
>         pcp = this_cpu_ptr(zone->per_cpu_pageset);
> @@ -5287,9 +5342,9 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>                         nr_populated++;
>                         continue;
>                 }
> -
>                 page = __rmqueue_pcplist(zone, 0, ac.migratetype, alloc_flags,
>                                                                 pcp, pcp_list);
> +
>                 if (unlikely(!page)) {
>                         /* Try and get at least one page */
>                         if (!nr_populated)
> @@ -5297,6 +5352,8 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>                         break;
>                 }
>                 nr_account++;
> +               if (objcg)
> +                       memcg_bulk_charge_hook(objcg, page);

Logically, the above should come after prep_new_page().

>
>                 prep_new_page(page, 0, gfp, 0);
>                 if (page_list)
> @@ -5310,13 +5367,16 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>
>         __count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account);
>         zone_statistics(ac.preferred_zoneref->zone, zone, nr_account);
> +       if (objcg)
> +               memcg_bulk_post_charge_hook(objcg, nr_pre_charge - nr_account);
>
>  out:
>         return nr_populated;
>
>  failed_irq:
>         local_unlock_irqrestore(&pagesets.lock, flags);
> -
> +       if (objcg)
> +               memcg_bulk_post_charge_hook(objcg, nr_pre_charge);
>  failed:
>         page = __alloc_pages(gfp, 0, preferred_nid, nodemask);
>         if (page) {
> --
> 2.31.1
>


* [PATCH mm v2] memcg: enable memory accounting in __alloc_pages_bulk
  2021-10-08 17:35           ` Shakeel Butt
@ 2021-10-12 10:18             ` Vasily Averin
  2021-10-12 13:10               ` Mel Gorman
  0 siblings, 1 reply; 28+ messages in thread
From: Vasily Averin @ 2021-10-12 10:18 UTC (permalink / raw)
  To: Michal Hocko, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Shakeel Butt
  Cc: Roman Gushchin, Mel Gorman, Uladzislau Rezki, Vlastimil Babka,
	cgroups, linux-mm, linux-kernel, kernel

Enable memory accounting for the bulk page allocator.

Fixes: 387ba26fb1cb ("mm/page_alloc: add a bulk page allocator")
Cc: <stable@vger.kernel.org>
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
---
v2: modified according to Shakeel Butt's remarks
---
 include/linux/memcontrol.h | 11 +++++++++
 mm/memcontrol.c            | 48 +++++++++++++++++++++++++++++++++++++-
 mm/page_alloc.c            | 14 ++++++++++-
 3 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 3096c9a0ee01..990acd70c846 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -810,6 +810,12 @@ static inline void obj_cgroup_put(struct obj_cgroup *objcg)
 	percpu_ref_put(&objcg->refcnt);
 }
 
+static inline void obj_cgroup_put_many(struct obj_cgroup *objcg,
+				       unsigned long nr)
+{
+	percpu_ref_put_many(&objcg->refcnt, nr);
+}
+
 static inline void mem_cgroup_put(struct mem_cgroup *memcg)
 {
 	if (memcg)
@@ -1746,4 +1752,9 @@ static inline struct mem_cgroup *mem_cgroup_from_obj(void *p)
 
 #endif /* CONFIG_MEMCG_KMEM */
 
+bool memcg_bulk_pre_charge_hook(struct obj_cgroup **objcgp, gfp_t gfp,
+				unsigned int nr_pages);
+void memcg_bulk_charge_hook(struct obj_cgroup *objcgp, struct page *page);
+void memcg_bulk_post_charge_hook(struct obj_cgroup *objcg,
+				 unsigned int nr_pages);
 #endif /* _LINUX_MEMCONTROL_H */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 87e41c3cac10..16fe3384c12c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3239,7 +3239,53 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
 	refill_obj_stock(objcg, size, true);
 }
 
-#endif /* CONFIG_MEMCG_KMEM */
+bool memcg_bulk_pre_charge_hook(struct obj_cgroup **objcgp, gfp_t gfp,
+				unsigned int nr_pages)
+{
+	struct obj_cgroup *objcg = NULL;
+
+	if (!memcg_kmem_enabled() || !(gfp & __GFP_ACCOUNT))
+		return true;
+
+	objcg = get_obj_cgroup_from_current();
+
+	if (objcg && obj_cgroup_charge_pages(objcg, gfp, nr_pages)) {
+		obj_cgroup_put(objcg);
+		return false;
+	}
+	obj_cgroup_get_many(objcg, nr_pages - 1);
+	*objcgp = objcg;
+	return true;
+}
+
+void memcg_bulk_charge_hook(struct obj_cgroup *objcg, struct page *page)
+{
+	page->memcg_data = (unsigned long)objcg | MEMCG_DATA_KMEM;
+}
+
+void memcg_bulk_post_charge_hook(struct obj_cgroup *objcg,
+				 unsigned int nr_pages)
+{
+	obj_cgroup_uncharge_pages(objcg, nr_pages);
+	obj_cgroup_put_many(objcg, nr_pages);
+}
+#else /* !CONFIG_MEMCG_KMEM */
+bool memcg_bulk_pre_charge_hook(struct obj_cgroup **objcgp, gfp_t gfp,
+				unsigned int nr_pages)
+{
+	return true;
+}
+
+void memcg_bulk_charge_hook(struct obj_cgroup *objcgp, struct page *page)
+{
+}
+
+void memcg_bulk_post_charge_hook(struct obj_cgroup *objcg,
+				 unsigned int nr_pages)
+{
+}
+#endif
+
 
 /*
  * Because page_memcg(head) is not set on tails, set it now.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b37435c274cf..eb37177bf507 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5207,6 +5207,8 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	gfp_t alloc_gfp;
 	unsigned int alloc_flags = ALLOC_WMARK_LOW;
 	int nr_populated = 0, nr_account = 0;
+	unsigned int nr_pre_charge = 0;
+	struct obj_cgroup *objcg = NULL;
 
 	/*
 	 * Skip populated array elements to determine if any pages need
@@ -5275,6 +5277,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	if (unlikely(!zone))
 		goto failed;
 
+	nr_pre_charge = nr_pages - nr_populated;
+	if (!memcg_bulk_pre_charge_hook(&objcg, gfp, nr_pre_charge))
+		goto failed;
+
 	/* Attempt the batch allocation */
 	local_lock_irqsave(&pagesets.lock, flags);
 	pcp = this_cpu_ptr(zone->per_cpu_pageset);
@@ -5299,6 +5305,9 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 		nr_account++;
 
 		prep_new_page(page, 0, gfp, 0);
+		if (objcg)
+			memcg_bulk_charge_hook(objcg, page);
+
 		if (page_list)
 			list_add(&page->lru, page_list);
 		else
@@ -5310,13 +5319,16 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 
 	__count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account);
 	zone_statistics(ac.preferred_zoneref->zone, zone, nr_account);
+	if (objcg)
+		memcg_bulk_post_charge_hook(objcg, nr_pre_charge - nr_account);
 
 out:
 	return nr_populated;
 
 failed_irq:
 	local_unlock_irqrestore(&pagesets.lock, flags);
-
+	if (objcg)
+		memcg_bulk_post_charge_hook(objcg, nr_pre_charge);
 failed:
 	page = __alloc_pages(gfp, 0, preferred_nid, nodemask);
 	if (page) {
-- 
2.31.1



* Re: [PATCH mm v2] memcg: enable memory accounting in __alloc_pages_bulk
  2021-10-12 10:18             ` [PATCH mm v2] " Vasily Averin
@ 2021-10-12 13:10               ` Mel Gorman
  2021-10-12 13:40                 ` Vasily Averin
  0 siblings, 1 reply; 28+ messages in thread
From: Mel Gorman @ 2021-10-12 13:10 UTC (permalink / raw)
  To: Vasily Averin
  Cc: Michal Hocko, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Shakeel Butt, Roman Gushchin, Uladzislau Rezki, Vlastimil Babka,
	cgroups, linux-mm, linux-kernel, kernel

On Tue, Oct 12, 2021 at 01:18:39PM +0300, Vasily Averin wrote:
> Enable memory accounting for bulk page allocator.
> 
> Fixes: 387ba26fb1cb ("mm/page_alloc: add a bulk page allocator")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
> ---
> v2: modified according to Shakeel Butt's remarks
> ---
>  include/linux/memcontrol.h | 11 +++++++++
>  mm/memcontrol.c            | 48 +++++++++++++++++++++++++++++++++++++-
>  mm/page_alloc.c            | 14 ++++++++++-
>  3 files changed, 71 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 3096c9a0ee01..990acd70c846 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -810,6 +810,12 @@ static inline void obj_cgroup_put(struct obj_cgroup *objcg)
>  	percpu_ref_put(&objcg->refcnt);
>  }
>  
> +static inline void obj_cgroup_put_many(struct obj_cgroup *objcg,
> +				       unsigned long nr)
> +{
> +	percpu_ref_put_many(&objcg->refcnt, nr);
> +}
> +
>  static inline void mem_cgroup_put(struct mem_cgroup *memcg)
>  {
>  	if (memcg)
> @@ -1746,4 +1752,9 @@ static inline struct mem_cgroup *mem_cgroup_from_obj(void *p)
>  
>  #endif /* CONFIG_MEMCG_KMEM */
>  
> +bool memcg_bulk_pre_charge_hook(struct obj_cgroup **objcgp, gfp_t gfp,
> +				unsigned int nr_pages);
> +void memcg_bulk_charge_hook(struct obj_cgroup *objcgp, struct page *page);
> +void memcg_bulk_post_charge_hook(struct obj_cgroup *objcg,
> +				 unsigned int nr_pages);
>  #endif /* _LINUX_MEMCONTROL_H */
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 87e41c3cac10..16fe3384c12c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3239,7 +3239,53 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
>  	refill_obj_stock(objcg, size, true);
>  }
>  
> -#endif /* CONFIG_MEMCG_KMEM */
> +bool memcg_bulk_pre_charge_hook(struct obj_cgroup **objcgp, gfp_t gfp,
> +				unsigned int nr_pages)
> +{
> +	struct obj_cgroup *objcg = NULL;
> +
> +	if (!memcg_kmem_enabled() || !(gfp & __GFP_ACCOUNT))
> +		return true;
> +
> +	objcg = get_obj_cgroup_from_current();
> +
> +	if (objcg && obj_cgroup_charge_pages(objcg, gfp, nr_pages)) {
> +		obj_cgroup_put(objcg);
> +		return false;
> +	}
> +	obj_cgroup_get_many(objcg, nr_pages - 1);
> +	*objcgp = objcg;
> +	return true;
> +}
> +

This is probably a stupid question but why is it necessary to get many
references instead of taking one reference here and dropping it in
memcg_bulk_post_charge_hook?

> +void memcg_bulk_charge_hook(struct obj_cgroup *objcg, struct page *page)
> +{
> +	page->memcg_data = (unsigned long)objcg | MEMCG_DATA_KMEM;
> +}
> +
> +void memcg_bulk_post_charge_hook(struct obj_cgroup *objcg,
> +				 unsigned int nr_pages)
> +{
> +	obj_cgroup_uncharge_pages(objcg, nr_pages);
> +	obj_cgroup_put_many(objcg, nr_pages);
> +}

And are you sure obj_cgroup_uncharge_pages should be called here? I
thought the pages get uncharged on free.

> +#else /* !CONFIG_MEMCG_KMEM */
> +bool memcg_bulk_pre_charge_hook(struct obj_cgroup **objcgp, gfp_t gfp,
> +				unsigned int nr_pages)
> +{
> +	return true;
> +}
> +
> +void memcg_bulk_charge_hook(struct obj_cgroup *objcgp, struct page *page)
> +{
> +}
> +
> +void memcg_bulk_post_charge_hook(struct obj_cgroup *objcg,
> +				 unsigned int nr_pages)
> +{
> +}
> +#endif
> +
>  
>  /*
>   * Because page_memcg(head) is not set on tails, set it now.
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b37435c274cf..eb37177bf507 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5207,6 +5207,8 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>  	gfp_t alloc_gfp;
>  	unsigned int alloc_flags = ALLOC_WMARK_LOW;
>  	int nr_populated = 0, nr_account = 0;
> +	unsigned int nr_pre_charge = 0;
> +	struct obj_cgroup *objcg = NULL;
>  
>  	/*
>  	 * Skip populated array elements to determine if any pages need
> @@ -5275,6 +5277,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>  	if (unlikely(!zone))
>  		goto failed;
>  
> +	nr_pre_charge = nr_pages - nr_populated;
> +	if (!memcg_bulk_pre_charge_hook(&objcg, gfp, nr_pre_charge))
> +		goto failed;
> +
>  	/* Attempt the batch allocation */
>  	local_lock_irqsave(&pagesets.lock, flags);
>  	pcp = this_cpu_ptr(zone->per_cpu_pageset);
> @@ -5299,6 +5305,9 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>  		nr_account++;
>  
>  		prep_new_page(page, 0, gfp, 0);
> +		if (objcg)
> +			memcg_bulk_charge_hook(objcg, page);
> +
>  		if (page_list)
>  			list_add(&page->lru, page_list);
>  		else
> @@ -5310,13 +5319,16 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>  
>  	__count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account);
>  	zone_statistics(ac.preferred_zoneref->zone, zone, nr_account);
> +	if (objcg)
> +		memcg_bulk_post_charge_hook(objcg, nr_pre_charge - nr_account);
>  
>  out:
>  	return nr_populated;
>  
>  failed_irq:
>  	local_unlock_irqrestore(&pagesets.lock, flags);
> -
> +	if (objcg)
> +		memcg_bulk_post_charge_hook(objcg, nr_pre_charge);
>  failed:
>  	page = __alloc_pages(gfp, 0, preferred_nid, nodemask);
>  	if (page) {
> -- 
> 2.31.1
> 

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH mm v2] memcg: enable memory accounting in __alloc_pages_bulk
  2021-10-12 13:10               ` Mel Gorman
@ 2021-10-12 13:40                 ` Vasily Averin
  2021-10-12 14:58                   ` [PATCH mm v3] " Vasily Averin
  2021-10-14  8:02                   ` [PATCH mm v5] " Vasily Averin
  0 siblings, 2 replies; 28+ messages in thread
From: Vasily Averin @ 2021-10-12 13:40 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Michal Hocko, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Shakeel Butt, Roman Gushchin, Uladzislau Rezki, Vlastimil Babka,
	cgroups, linux-mm, linux-kernel, kernel

On 12.10.2021 16:10, Mel Gorman wrote:
> On Tue, Oct 12, 2021 at 01:18:39PM +0300, Vasily Averin wrote:
>> Enable memory accounting for bulk page allocator.
>>
>> Fixes: 387ba26fb1cb ("mm/page_alloc: add a bulk page allocator")
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
>> ---
>> v2: modified according to Shakeel Butt's remarks
>> ---
>>  include/linux/memcontrol.h | 11 +++++++++
>>  mm/memcontrol.c            | 48 +++++++++++++++++++++++++++++++++++++-
>>  mm/page_alloc.c            | 14 ++++++++++-
>>  3 files changed, 71 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index 3096c9a0ee01..990acd70c846 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -810,6 +810,12 @@ static inline void obj_cgroup_put(struct obj_cgroup *objcg)
>>  	percpu_ref_put(&objcg->refcnt);
>>  }
>>  
>> +static inline void obj_cgroup_put_many(struct obj_cgroup *objcg,
>> +				       unsigned long nr)
>> +{
>> +	percpu_ref_put_many(&objcg->refcnt, nr);
>> +}
>> +
>>  static inline void mem_cgroup_put(struct mem_cgroup *memcg)
>>  {
>>  	if (memcg)
>> @@ -1746,4 +1752,9 @@ static inline struct mem_cgroup *mem_cgroup_from_obj(void *p)
>>  
>>  #endif /* CONFIG_MEMCG_KMEM */
>>  
>> +bool memcg_bulk_pre_charge_hook(struct obj_cgroup **objcgp, gfp_t gfp,
>> +				unsigned int nr_pages);
>> +void memcg_bulk_charge_hook(struct obj_cgroup *objcgp, struct page *page);
>> +void memcg_bulk_post_charge_hook(struct obj_cgroup *objcg,
>> +				 unsigned int nr_pages);
>>  #endif /* _LINUX_MEMCONTROL_H */
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 87e41c3cac10..16fe3384c12c 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -3239,7 +3239,53 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
>>  	refill_obj_stock(objcg, size, true);
>>  }
>>  
>> -#endif /* CONFIG_MEMCG_KMEM */
>> +bool memcg_bulk_pre_charge_hook(struct obj_cgroup **objcgp, gfp_t gfp,
>> +				unsigned int nr_pages)
>> +{
>> +	struct obj_cgroup *objcg = NULL;
>> +
>> +	if (!memcg_kmem_enabled() || !(gfp & __GFP_ACCOUNT))
>> +		return true;
>> +
>> +	objcg = get_obj_cgroup_from_current();
>> +
>> +	if (objcg && obj_cgroup_charge_pages(objcg, gfp, nr_pages)) {
>> +		obj_cgroup_put(objcg);
>> +		return false;
>> +	}
>> +	obj_cgroup_get_many(objcg, nr_pages - 1);
>> +	*objcgp = objcg;
>> +	return true;
>> +}
>> +
> 
> This is probably a stupid question but why is it necessary to get many
> references instead of taking one reference here and dropping it in
> memcg_bulk_post_charge_hook?

Each allocated page keeps a reference and releases it on free.

>> +void memcg_bulk_charge_hook(struct obj_cgroup *objcg, struct page *page)
>> +{
>> +	page->memcg_data = (unsigned long)objcg | MEMCG_DATA_KMEM;
>> +}
>> +
>> +void memcg_bulk_post_charge_hook(struct obj_cgroup *objcg,
>> +				 unsigned int nr_pages)
>> +{
>> +	obj_cgroup_uncharge_pages(objcg, nr_pages);
>> +	obj_cgroup_put_many(objcg, nr_pages);
>> +}
> 
> And are you sure obj_cgroup_uncharge_pages should be called here? I
> thought the pages get uncharged on free.

Here we decrement the counters for pages that were pre-charged but never allocated.
If all pre-charged pages are allocated, we release nothing;
however, in that case we would be called with nr_pages = 0,
and I think it makes sense to add a check for that.
I will do it in the next patch version.

>> +#else /* !CONFIG_MEMCG_KMEM */
>> +bool memcg_bulk_pre_charge_hook(struct obj_cgroup **objcgp, gfp_t gfp,
>> +				unsigned int nr_pages)
>> +{
>> +	return true;
>> +}
>> +
>> +void memcg_bulk_charge_hook(struct obj_cgroup *objcgp, struct page *page)
>> +{
>> +}
>> +
>> +void memcg_bulk_post_charge_hook(struct obj_cgroup *objcg,
>> +				 unsigned int nr_pages)
>> +{
>> +}
>> +#endif
>> +
>>  
>>  /*
>>   * Because page_memcg(head) is not set on tails, set it now.
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index b37435c274cf..eb37177bf507 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5207,6 +5207,8 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>>  	gfp_t alloc_gfp;
>>  	unsigned int alloc_flags = ALLOC_WMARK_LOW;
>>  	int nr_populated = 0, nr_account = 0;
>> +	unsigned int nr_pre_charge = 0;
>> +	struct obj_cgroup *objcg = NULL;
>>  
>>  	/*
>>  	 * Skip populated array elements to determine if any pages need
>> @@ -5275,6 +5277,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>>  	if (unlikely(!zone))
>>  		goto failed;
>>  
>> +	nr_pre_charge = nr_pages - nr_populated;
>> +	if (!memcg_bulk_pre_charge_hook(&objcg, gfp, nr_pre_charge))
>> +		goto failed;
>> +
>>  	/* Attempt the batch allocation */
>>  	local_lock_irqsave(&pagesets.lock, flags);
>>  	pcp = this_cpu_ptr(zone->per_cpu_pageset);
>> @@ -5299,6 +5305,9 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>>  		nr_account++;
>>  
>>  		prep_new_page(page, 0, gfp, 0);
>> +		if (objcg)
>> +			memcg_bulk_charge_hook(objcg, page);
>> +
>>  		if (page_list)
>>  			list_add(&page->lru, page_list);
>>  		else
>> @@ -5310,13 +5319,16 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>>  
>>  	__count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account);
>>  	zone_statistics(ac.preferred_zoneref->zone, zone, nr_account);
>> +	if (objcg)
>> +		memcg_bulk_post_charge_hook(objcg, nr_pre_charge - nr_account);
>>  
>>  out:
>>  	return nr_populated;
>>  
>>  failed_irq:
>>  	local_unlock_irqrestore(&pagesets.lock, flags);
>> -
>> +	if (objcg)
>> +		memcg_bulk_post_charge_hook(objcg, nr_pre_charge);
>>  failed:
>>  	page = __alloc_pages(gfp, 0, preferred_nid, nodemask);
>>  	if (page) {
>> -- 
>> 2.31.1
>>
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH mm v3] memcg: enable memory accounting in __alloc_pages_bulk
  2021-10-12 13:40                 ` Vasily Averin
@ 2021-10-12 14:58                   ` Vasily Averin
  2021-10-12 15:19                     ` Shakeel Butt
                                       ` (2 more replies)
  2021-10-14  8:02                   ` [PATCH mm v5] " Vasily Averin
  1 sibling, 3 replies; 28+ messages in thread
From: Vasily Averin @ 2021-10-12 14:58 UTC (permalink / raw)
  To: Michal Hocko, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Shakeel Butt, Mel Gorman
  Cc: Roman Gushchin, Uladzislau Rezki, Vlastimil Babka, cgroups,
	linux-mm, linux-kernel, kernel

Enable memory accounting for bulk page allocator.

Fixes: 387ba26fb1cb ("mm/page_alloc: add a bulk page allocator")
Cc: <stable@vger.kernel.org>
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
---
v3: added comments,
    removed call of post charge hook for nr_pages = 0
v2: modified according to Shakeel Butt's remarks
---
 include/linux/memcontrol.h | 11 +++++++++
 mm/memcontrol.c            | 48 +++++++++++++++++++++++++++++++++++++-
 mm/page_alloc.c            | 21 ++++++++++++++++-
 3 files changed, 78 insertions(+), 2 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 3096c9a0ee01..990acd70c846 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -810,6 +810,12 @@ static inline void obj_cgroup_put(struct obj_cgroup *objcg)
 	percpu_ref_put(&objcg->refcnt);
 }
 
+static inline void obj_cgroup_put_many(struct obj_cgroup *objcg,
+				       unsigned long nr)
+{
+	percpu_ref_put_many(&objcg->refcnt, nr);
+}
+
 static inline void mem_cgroup_put(struct mem_cgroup *memcg)
 {
 	if (memcg)
@@ -1746,4 +1752,9 @@ static inline struct mem_cgroup *mem_cgroup_from_obj(void *p)
 
 #endif /* CONFIG_MEMCG_KMEM */
 
+bool memcg_bulk_pre_charge_hook(struct obj_cgroup **objcgp, gfp_t gfp,
+				unsigned int nr_pages);
+void memcg_bulk_charge_hook(struct obj_cgroup *objcgp, struct page *page);
+void memcg_bulk_post_charge_hook(struct obj_cgroup *objcg,
+				 unsigned int nr_pages);
 #endif /* _LINUX_MEMCONTROL_H */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 87e41c3cac10..16fe3384c12c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3239,7 +3239,53 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
 	refill_obj_stock(objcg, size, true);
 }
 
-#endif /* CONFIG_MEMCG_KMEM */
+bool memcg_bulk_pre_charge_hook(struct obj_cgroup **objcgp, gfp_t gfp,
+				unsigned int nr_pages)
+{
+	struct obj_cgroup *objcg = NULL;
+
+	if (!memcg_kmem_enabled() || !(gfp & __GFP_ACCOUNT))
+		return true;
+
+	objcg = get_obj_cgroup_from_current();
+
+	if (objcg && obj_cgroup_charge_pages(objcg, gfp, nr_pages)) {
+		obj_cgroup_put(objcg);
+		return false;
+	}
+	obj_cgroup_get_many(objcg, nr_pages - 1);
+	*objcgp = objcg;
+	return true;
+}
+
+void memcg_bulk_charge_hook(struct obj_cgroup *objcg, struct page *page)
+{
+	page->memcg_data = (unsigned long)objcg | MEMCG_DATA_KMEM;
+}
+
+void memcg_bulk_post_charge_hook(struct obj_cgroup *objcg,
+				 unsigned int nr_pages)
+{
+	obj_cgroup_uncharge_pages(objcg, nr_pages);
+	obj_cgroup_put_many(objcg, nr_pages);
+}
+#else /* !CONFIG_MEMCG_KMEM */
+bool memcg_bulk_pre_charge_hook(struct obj_cgroup **objcgp, gfp_t gfp,
+				unsigned int nr_pages)
+{
+	return true;
+}
+
+void memcg_bulk_charge_hook(struct obj_cgroup *objcgp, struct page *page)
+{
+}
+
+void memcg_bulk_post_charge_hook(struct obj_cgroup *objcg,
+				 unsigned int nr_pages)
+{
+}
+#endif
+
 
 /*
  * Because page_memcg(head) is not set on tails, set it now.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b37435c274cf..23a8b55df508 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5207,6 +5207,8 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	gfp_t alloc_gfp;
 	unsigned int alloc_flags = ALLOC_WMARK_LOW;
 	int nr_populated = 0, nr_account = 0;
+	unsigned int nr_pre_charge = 0;
+	struct obj_cgroup *objcg = NULL;
 
 	/*
 	 * Skip populated array elements to determine if any pages need
@@ -5275,6 +5277,11 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	if (unlikely(!zone))
 		goto failed;
 
+	nr_pre_charge = nr_pages - nr_populated;
+	/* pre-charge memory and take refcounts for each allocated page */
+	if (!memcg_bulk_pre_charge_hook(&objcg, gfp, nr_pre_charge))
+		goto failed;
+
 	/* Attempt the batch allocation */
 	local_lock_irqsave(&pagesets.lock, flags);
 	pcp = this_cpu_ptr(zone->per_cpu_pageset);
@@ -5299,6 +5306,9 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 		nr_account++;
 
 		prep_new_page(page, 0, gfp, 0);
+		if (objcg)
+			memcg_bulk_charge_hook(objcg, page);
+
 		if (page_list)
 			list_add(&page->lru, page_list);
 		else
@@ -5310,13 +5320,22 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 
 	__count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account);
 	zone_statistics(ac.preferred_zoneref->zone, zone, nr_account);
+	/*
+	 * Typically nr_pre_charge == nr_account,
+	 * otherwise a few extra pages was pre-charged,
+	 * and must be uncharged now.
+	 */
+	if (objcg && nr_pre_charge - nr_account)
+		memcg_bulk_post_charge_hook(objcg, nr_pre_charge - nr_account);
 
 out:
 	return nr_populated;
 
 failed_irq:
 	local_unlock_irqrestore(&pagesets.lock, flags);
-
+	/* uncharge memory and decrement refcounts for all pre-charged pages */
+	if (objcg)
+		memcg_bulk_post_charge_hook(objcg, nr_pre_charge);
 failed:
 	page = __alloc_pages(gfp, 0, preferred_nid, nodemask);
 	if (page) {
-- 
2.31.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH mm v3] memcg: enable memory accounting in __alloc_pages_bulk
  2021-10-12 14:58                   ` [PATCH mm v3] " Vasily Averin
@ 2021-10-12 15:19                     ` Shakeel Butt
  2021-10-12 15:20                     ` Mel Gorman
  2021-10-12 15:36                     ` Michal Hocko
  2 siblings, 0 replies; 28+ messages in thread
From: Shakeel Butt @ 2021-10-12 15:19 UTC (permalink / raw)
  To: Vasily Averin
  Cc: Michal Hocko, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Mel Gorman, Roman Gushchin, Uladzislau Rezki, Vlastimil Babka,
	Cgroups, Linux MM, LKML, kernel

On Tue, Oct 12, 2021 at 7:58 AM Vasily Averin <vvs@virtuozzo.com> wrote:
>
> Enable memory accounting for bulk page allocator.
>
> Fixes: 387ba26fb1cb ("mm/page_alloc: add a bulk page allocator")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
> ---
> v3: added comments,
>     removed call of post charge hook for nr_pages = 0
> v2: modified according to Shakeel Butt's remarks
> ---
>  include/linux/memcontrol.h | 11 +++++++++
>  mm/memcontrol.c            | 48 +++++++++++++++++++++++++++++++++++++-
>  mm/page_alloc.c            | 21 ++++++++++++++++-
>  3 files changed, 78 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 3096c9a0ee01..990acd70c846 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -810,6 +810,12 @@ static inline void obj_cgroup_put(struct obj_cgroup *objcg)
>         percpu_ref_put(&objcg->refcnt);
>  }
>
> +static inline void obj_cgroup_put_many(struct obj_cgroup *objcg,
> +                                      unsigned long nr)
> +{
> +       percpu_ref_put_many(&objcg->refcnt, nr);
> +}
> +
>  static inline void mem_cgroup_put(struct mem_cgroup *memcg)
>  {
>         if (memcg)
> @@ -1746,4 +1752,9 @@ static inline struct mem_cgroup *mem_cgroup_from_obj(void *p)
>
>  #endif /* CONFIG_MEMCG_KMEM */
>
> +bool memcg_bulk_pre_charge_hook(struct obj_cgroup **objcgp, gfp_t gfp,
> +                               unsigned int nr_pages);
> +void memcg_bulk_charge_hook(struct obj_cgroup *objcgp, struct page *page);
> +void memcg_bulk_post_charge_hook(struct obj_cgroup *objcg,
> +                                unsigned int nr_pages);
>  #endif /* _LINUX_MEMCONTROL_H */
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 87e41c3cac10..16fe3384c12c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3239,7 +3239,53 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
>         refill_obj_stock(objcg, size, true);
>  }
>
> -#endif /* CONFIG_MEMCG_KMEM */
> +bool memcg_bulk_pre_charge_hook(struct obj_cgroup **objcgp, gfp_t gfp,
> +                               unsigned int nr_pages)
> +{
> +       struct obj_cgroup *objcg = NULL;

No need to explicitly set it to NULL.

> +
> +       if (!memcg_kmem_enabled() || !(gfp & __GFP_ACCOUNT))
> +               return true;
> +
> +       objcg = get_obj_cgroup_from_current();
> +
> +       if (objcg && obj_cgroup_charge_pages(objcg, gfp, nr_pages)) {
> +               obj_cgroup_put(objcg);
> +               return false;
> +       }

get_obj_cgroup_from_current() can return NULL, so you would need to
return true early for that condition.

> +       obj_cgroup_get_many(objcg, nr_pages - 1);
> +       *objcgp = objcg;
> +       return true;
> +}
> +
> +void memcg_bulk_charge_hook(struct obj_cgroup *objcg, struct page *page)
> +{
> +       page->memcg_data = (unsigned long)objcg | MEMCG_DATA_KMEM;
> +}
> +
> +void memcg_bulk_post_charge_hook(struct obj_cgroup *objcg,
> +                                unsigned int nr_pages)
> +{
> +       obj_cgroup_uncharge_pages(objcg, nr_pages);
> +       obj_cgroup_put_many(objcg, nr_pages);
> +}

You can keep the following '#else' code block in the header file.

> +#else /* !CONFIG_MEMCG_KMEM */
> +bool memcg_bulk_pre_charge_hook(struct obj_cgroup **objcgp, gfp_t gfp,
> +                               unsigned int nr_pages)
> +{
> +       return true;
> +}
> +
> +void memcg_bulk_charge_hook(struct obj_cgroup *objcgp, struct page *page)
> +{
> +}
> +
> +void memcg_bulk_post_charge_hook(struct obj_cgroup *objcg,
> +                                unsigned int nr_pages)
> +{
> +}
> +#endif
> +

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH mm v3] memcg: enable memory accounting in __alloc_pages_bulk
  2021-10-12 14:58                   ` [PATCH mm v3] " Vasily Averin
  2021-10-12 15:19                     ` Shakeel Butt
@ 2021-10-12 15:20                     ` Mel Gorman
  2021-10-12 15:36                     ` Michal Hocko
  2 siblings, 0 replies; 28+ messages in thread
From: Mel Gorman @ 2021-10-12 15:20 UTC (permalink / raw)
  To: Vasily Averin
  Cc: Michal Hocko, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Shakeel Butt, Roman Gushchin, Uladzislau Rezki, Vlastimil Babka,
	cgroups, linux-mm, linux-kernel, kernel

On Tue, Oct 12, 2021 at 05:58:21PM +0300, Vasily Averin wrote:
> Enable memory accounting for bulk page allocator.
> 
> Fixes: 387ba26fb1cb ("mm/page_alloc: add a bulk page allocator")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Vasily Averin <vvs@virtuozzo.com>

Acked-by: Mel Gorman <mgorman@techsingularity.net>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH mm v3] memcg: enable memory accounting in __alloc_pages_bulk
  2021-10-12 14:58                   ` [PATCH mm v3] " Vasily Averin
  2021-10-12 15:19                     ` Shakeel Butt
  2021-10-12 15:20                     ` Mel Gorman
@ 2021-10-12 15:36                     ` Michal Hocko
  2021-10-12 16:08                       ` Shakeel Butt
  2021-10-12 18:45                       ` Vasily Averin
  2 siblings, 2 replies; 28+ messages in thread
From: Michal Hocko @ 2021-10-12 15:36 UTC (permalink / raw)
  To: Vasily Averin
  Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Shakeel Butt,
	Mel Gorman, Roman Gushchin, Uladzislau Rezki, Vlastimil Babka,
	cgroups, linux-mm, linux-kernel, kernel

On Tue 12-10-21 17:58:21, Vasily Averin wrote:
> Enable memory accounting for bulk page allocator.

ENOCHANGELOG
 
And I have to say I am not very happy about the solution. It adds a very
tricky code where it splits different charging steps apart.

Would it be just too inefficient to charge page-by-page once all pages
are already taken away from the pcp lists? This bulk should be small so
this shouldn't really cause massive problems. I mean something like

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b37435c274cf..8bcd69195ef5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5308,6 +5308,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 
 	local_unlock_irqrestore(&pagesets.lock, flags);
 
+	if (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT)) {
+		/* charge pages here */
+	}
+
 	__count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account);
 	zone_statistics(ac.preferred_zoneref->zone, zone, nr_account);
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH mm v3] memcg: enable memory accounting in __alloc_pages_bulk
  2021-10-12 15:36                     ` Michal Hocko
@ 2021-10-12 16:08                       ` Shakeel Butt
  2021-10-12 18:24                         ` Michal Hocko
  2021-10-12 18:45                       ` Vasily Averin
  1 sibling, 1 reply; 28+ messages in thread
From: Shakeel Butt @ 2021-10-12 16:08 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Vasily Averin, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Mel Gorman, Roman Gushchin, Uladzislau Rezki, Vlastimil Babka,
	Cgroups, Linux MM, LKML, kernel

On Tue, Oct 12, 2021 at 8:36 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 12-10-21 17:58:21, Vasily Averin wrote:
> > Enable memory accounting for bulk page allocator.
>
> ENOCHANGELOG
>
> And I have to say I am not very happy about the solution. It adds a very
> tricky code where it splits different charging steps apart.
>
> Would it be just too inefficient to charge page-by-page once all pages
> are already taken away from the pcp lists? This bulk should be small so
> this shouldn't really cause massive problems. I mean something like
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b37435c274cf..8bcd69195ef5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5308,6 +5308,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>
>         local_unlock_irqrestore(&pagesets.lock, flags);
>
> +       if (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT)) {
> +               /* charge pages here */
> +       }

It is not that simple because __alloc_pages_bulk only allocates pages
for empty slots in the page_array provided by the caller.

The failure handling for post charging would be more complicated.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH mm v3] memcg: enable memory accounting in __alloc_pages_bulk
  2021-10-12 16:08                       ` Shakeel Butt
@ 2021-10-12 18:24                         ` Michal Hocko
  2021-10-13 16:41                           ` Shakeel Butt
  0 siblings, 1 reply; 28+ messages in thread
From: Michal Hocko @ 2021-10-12 18:24 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Vasily Averin, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Mel Gorman, Roman Gushchin, Uladzislau Rezki, Vlastimil Babka,
	Cgroups, Linux MM, LKML, kernel

On Tue 12-10-21 09:08:38, Shakeel Butt wrote:
> On Tue, Oct 12, 2021 at 8:36 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Tue 12-10-21 17:58:21, Vasily Averin wrote:
> > > Enable memory accounting for bulk page allocator.
> >
> > ENOCHANGELOG
> >
> > And I have to say I am not very happy about the solution. It adds a very
> > tricky code where it splits different charging steps apart.
> >
> > Would it be just too inefficient to charge page-by-page once all pages
> > are already taken away from the pcp lists? This bulk should be small so
> > this shouldn't really cause massive problems. I mean something like
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index b37435c274cf..8bcd69195ef5 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -5308,6 +5308,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> >
> >         local_unlock_irqrestore(&pagesets.lock, flags);
> >
> > +       if (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT)) {
> > +               /* charge pages here */
> > +       }
> 
> It is not that simple because __alloc_pages_bulk only allocate pages
> for empty slots in the page_array provided by the caller.
> 
> The failure handling for post charging would be more complicated.

If this is really that complicated (I haven't tried) then it would be
much more simple to completely skip the bulk allocator for __GFP_ACCOUNT
rather than add a tricky code. The bulk allocator is meant to be used
for ultra hot paths and memcg charging along with the reclaim doesn't
really fit into that model anyway. Or are there any actual users who
really need bulk allocator optimization and also need memcg accounting?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH mm v3] memcg: enable memory accounting in __alloc_pages_bulk
  2021-10-12 15:36                     ` Michal Hocko
  2021-10-12 16:08                       ` Shakeel Butt
@ 2021-10-12 18:45                       ` Vasily Averin
  1 sibling, 0 replies; 28+ messages in thread
From: Vasily Averin @ 2021-10-12 18:45 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Johannes Weiner, Vladimir Davydov, Andrew Morton, Shakeel Butt,
	Mel Gorman, Roman Gushchin, Uladzislau Rezki, Vlastimil Babka,
	cgroups, linux-mm, linux-kernel, kernel

On 12.10.2021 18:36, Michal Hocko wrote:
> On Tue 12-10-21 17:58:21, Vasily Averin wrote:
>> Enable memory accounting for bulk page allocator.
> 
> ENOCHANGELOG
>  
> And I have to say I am not very happy about the solution. It adds a very
> tricky code where it splits different charging steps apart.
> 
> Would it be just too inefficient to charge page-by-page once all pages
> are already taken away from the pcp lists? This bulk should be small so
> this shouldn't really cause massive problems. I mean something like
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b37435c274cf..8bcd69195ef5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5308,6 +5308,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
>  
>  	local_unlock_irqrestore(&pagesets.lock, flags);
>  
> +	if (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT)) {
> +		/* charge pages here */
> +	}
> +
>  	__count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account);
>  	zone_statistics(ac.preferred_zoneref->zone, zone, nr_account);
> 

In general it looks like we can do it.

We can traverse the filled page_array or page_list.
For page_array we need to check whether a page is already accounted
(the incoming array can already contain some pages, both at the beginning and in the middle).
For each newly taken page we can try to charge it.
If it was charged successfully, we process the next page in the list/array.
When a charge fails, we need to remove the rest of the pages from the list/array and somehow release them.
At present I do not understand how to do that correctly -- perhaps just call free_page()?
Finally, we'll need to adjust nr_account and nr_populated properly.

I'll try to implement this tomorrow.

Thank you,
	Vasily Averin

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH mm v3] memcg: enable memory accounting in __alloc_pages_bulk
  2021-10-12 18:24                         ` Michal Hocko
@ 2021-10-13 16:41                           ` Shakeel Butt
  2021-10-13 17:16                             ` Michal Hocko
  0 siblings, 1 reply; 28+ messages in thread
From: Shakeel Butt @ 2021-10-13 16:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Vasily Averin, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Mel Gorman, Roman Gushchin, Uladzislau Rezki, Vlastimil Babka,
	Cgroups, Linux MM, LKML, kernel

On Tue, Oct 12, 2021 at 11:24 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 12-10-21 09:08:38, Shakeel Butt wrote:
> > On Tue, Oct 12, 2021 at 8:36 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Tue 12-10-21 17:58:21, Vasily Averin wrote:
> > > > Enable memory accounting for bulk page allocator.
> > >
> > > ENOCHANGELOG
> > >
> > > And I have to say I am not very happy about the solution. It adds a very
> > > tricky code where it splits different charging steps apart.
> > >
> > > Would it be just too inefficient to charge page-by-page once all pages
> > > are already taken away from the pcp lists? This bulk should be small so
> > > this shouldn't really cause massive problems. I mean something like
> > >
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index b37435c274cf..8bcd69195ef5 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -5308,6 +5308,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> > >
> > >         local_unlock_irqrestore(&pagesets.lock, flags);
> > >
> > > +       if (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT)) {
> > > +               /* charge pages here */
> > > +       }
> >
> > It is not that simple because __alloc_pages_bulk only allocates pages
> > for empty slots in the page_array provided by the caller.
> >
> > The failure handling for post charging would be more complicated.
>
> If this is really that complicated (I haven't tried) then it would be
> much more simple to completely skip the bulk allocator for __GFP_ACCOUNT
> rather than add a tricky code. The bulk allocator is meant to be used
> for ultra hot paths and memcg charging along with the reclaim doesn't
> really fit into that model anyway. Or are there any actual users who
> really need bulk allocator optimization and also need memcg accounting?

The bulk allocator is being used for vmalloc and we have several
kvmalloc() allocations with __GFP_ACCOUNT.

It seems like Vasily has some ideas, so let's wait for his next version.
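For readers unfamiliar with the array contract referred to above, a minimal userspace model (bulk_fill is an invented name, not the kernel function) shows why the caller may hand in a partially populated array that a post-hoc charging pass does not fully own:

```c
/* Tiny model of the __alloc_pages_bulk array semantics: only NULL
 * slots are filled, caller-owned slots are left alone, and a partial
 * result is legal.  Illustrative only; names are invented. */
#include <stddef.h>

/* Fill only NULL slots; return how many slots end up populated. */
int bulk_fill(void **array, int nr, void *(*alloc_one)(void))
{
	int i, nr_populated = 0;

	for (i = 0; i < nr; i++) {
		if (array[i]) {		/* caller-owned page: skip it */
			nr_populated++;
			continue;
		}
		array[i] = alloc_one();	/* freshly allocated, unaccounted */
		if (!array[i])
			break;		/* partial result is legal */
		nr_populated++;
	}
	return nr_populated;
}
```

Any charging pass run afterwards therefore has to distinguish pre-existing slots from freshly allocated ones, which is the complication discussed in this subthread.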

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH mm v3] memcg: enable memory accounting in __alloc_pages_bulk
  2021-10-13 16:41                           ` Shakeel Butt
@ 2021-10-13 17:16                             ` Michal Hocko
  2021-10-13 17:30                               ` Shakeel Butt
  0 siblings, 1 reply; 28+ messages in thread
From: Michal Hocko @ 2021-10-13 17:16 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Vasily Averin, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Mel Gorman, Roman Gushchin, Uladzislau Rezki, Vlastimil Babka,
	Cgroups, Linux MM, LKML, kernel

On Wed 13-10-21 09:41:15, Shakeel Butt wrote:
> On Tue, Oct 12, 2021 at 11:24 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Tue 12-10-21 09:08:38, Shakeel Butt wrote:
> > > On Tue, Oct 12, 2021 at 8:36 AM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Tue 12-10-21 17:58:21, Vasily Averin wrote:
> > > > > Enable memory accounting for bulk page allocator.
> > > >
> > > > ENOCHANGELOG
> > > >
> > > > And I have to say I am not very happy about the solution. It adds a very
> > > > tricky code where it splits different charging steps apart.
> > > >
> > > > Would it be just too inefficient to charge page-by-page once all pages
> > > > are already taken away from the pcp lists? This bulk should be small so
> > > > this shouldn't really cause massive problems. I mean something like
> > > >
> > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > > index b37435c274cf..8bcd69195ef5 100644
> > > > --- a/mm/page_alloc.c
> > > > +++ b/mm/page_alloc.c
> > > > @@ -5308,6 +5308,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> > > >
> > > >         local_unlock_irqrestore(&pagesets.lock, flags);
> > > >
> > > > +       if (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT)) {
> > > > +               /* charge pages here */
> > > > +       }
> > >
> > > It is not that simple because __alloc_pages_bulk only allocates pages
> > > for empty slots in the page_array provided by the caller.
> > >
> > > The failure handling for post charging would be more complicated.
> >
> > If this is really that complicated (I haven't tried) then it would be
> > much more simple to completely skip the bulk allocator for __GFP_ACCOUNT
> > rather than add a tricky code. The bulk allocator is meant to be used
> > for ultra hot paths and memcg charging along with the reclaim doesn't
> > really fit into that model anyway. Or are there any actual users who
> > really need bulk allocator optimization and also need memcg accounting?
> 
> The bulk allocator is being used for vmalloc and we have several
> kvmalloc() allocations with __GFP_ACCOUNT.

Do we really need to use the bulk allocator for these allocations?
The bulk allocator is a bypass of the page allocator for performance
reasons, and I can see why that can be useful, but considering that the
charging path can imply some heavy lifting, is all the code churn to
make the bulk allocator memcg aware really worth it? Why can't we
simply skip the bulk allocator for __GFP_ACCOUNT? That would be a
trivial fix.
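The trivial skip could be modeled like this (FAKE_GFP_ACCOUNT and bulk_alloc_try are illustrative stand-ins, not the real kernel definitions; the actual change would presumably test __GFP_ACCOUNT early in __alloc_pages_bulk and let the caller fall back to the regular, memcg-aware allocator):

```c
/* Model of "skip the bulk fast path for accounted allocations".
 * Illustrative stand-ins only; not the kernel implementation. */
#define FAKE_GFP_ACCOUNT 0x1u

/* Returns the number of pages the bulk path agreed to hand out;
 * 0 means "caller must fall back to allocating one page at a time
 * through the normal, charging-aware path". */
int bulk_alloc_try(unsigned int gfp, int nr_requested)
{
	if (gfp & FAKE_GFP_ACCOUNT)
		return 0;	/* punt accounted requests to the slow path */
	return nr_requested;	/* pretend the fast path succeeded */
}
```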
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH mm v3] memcg: enable memory accounting in __alloc_pages_bulk
  2021-10-13 17:16                             ` Michal Hocko
@ 2021-10-13 17:30                               ` Shakeel Butt
  0 siblings, 0 replies; 28+ messages in thread
From: Shakeel Butt @ 2021-10-13 17:30 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Vasily Averin, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Mel Gorman, Roman Gushchin, Uladzislau Rezki, Vlastimil Babka,
	Cgroups, Linux MM, LKML, kernel

On Wed, Oct 13, 2021 at 10:16 AM Michal Hocko <mhocko@suse.com> wrote:
>
[...]
> > > If this is really that complicated (I haven't tried) then it would be
> > > much more simple to completely skip the bulk allocator for __GFP_ACCOUNT
> > > rather than add a tricky code. The bulk allocator is meant to be used
> > > for ultra hot paths and memcg charging along with the reclaim doesn't
> > > really fit into that model anyway. Or are there any actual users who
> > > really need bulk allocator optimization and also need memcg accounting?
> >
> > The bulk allocator is being used for vmalloc and we have several
> > kvmalloc() allocations with __GFP_ACCOUNT.
>
> Do we really need to use the bulk allocator for these allocations?
> The bulk allocator is a bypass of the page allocator for performance
> reasons, and I can see why that can be useful, but considering that the
> charging path can imply some heavy lifting, is all the code churn to
> make the bulk allocator memcg aware really worth it? Why can't we
> simply skip the bulk allocator for __GFP_ACCOUNT? That would be a
> trivial fix.
> --
> --

Actually that might be the simplest solution and I agree to skip bulk
allocator for __GFP_ACCOUNT allocations.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH mm v5] memcg: enable memory accounting in __alloc_pages_bulk
  2021-10-12 13:40                 ` Vasily Averin
  2021-10-12 14:58                   ` [PATCH mm v3] " Vasily Averin
@ 2021-10-14  8:02                   ` Vasily Averin
  2021-10-15 21:34                     ` Andrew Morton
  1 sibling, 1 reply; 28+ messages in thread
From: Vasily Averin @ 2021-10-14  8:02 UTC (permalink / raw)
  To: Michal Hocko, Johannes Weiner, Vladimir Davydov, Andrew Morton,
	Shakeel Butt, Mel Gorman
  Cc: Roman Gushchin, Uladzislau Rezki, Vlastimil Babka, cgroups,
	linux-mm, linux-kernel, kernel

The bulk page allocator is used in vmalloc, where it can be called
with __GFP_ACCOUNT and must charge allocated pages to the memory cgroup.

Fixes: 387ba26fb1cb ("mm/page_alloc: add a bulk page allocator")
Cc: <stable@vger.kernel.org>
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
---
v5: remove pre-charge,
    use post-allocation per-page charges according to Michal Hocko's remarks
v4: updated according to Shakeel Butt's remarks,
    fixed wrong location of hooks declaration in memcontrol.h
v3: added comments,
    removed call of post charge hook for nr_pages = 0
v2: modified according to Shakeel Butt's remarks
---
 include/linux/memcontrol.h |  9 +++++++
 mm/memcontrol.c            | 50 ++++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c            | 12 +++++++--
 3 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 3096c9a0ee01..6bad3d6efd03 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1692,6 +1692,9 @@ static inline int memcg_cache_id(struct mem_cgroup *memcg)
 
 struct mem_cgroup *mem_cgroup_from_obj(void *p);
 
+int memcg_charge_bulk_pages(gfp_t gfp, int nr_pages,
+			    struct list_head *page_list,
+			    struct page **page_array);
 #else
 static inline bool mem_cgroup_kmem_disabled(void)
 {
@@ -1744,6 +1747,12 @@ static inline struct mem_cgroup *mem_cgroup_from_obj(void *p)
        return NULL;
 }
 
+int memcg_charge_bulk_pages(gfp_t gfp, int nr_pages,
+			    struct list_head *page_list,
+			    struct page **page_array)
+{
+	return 0;
+}
 #endif /* CONFIG_MEMCG_KMEM */
 
 #endif /* _LINUX_MEMCONTROL_H */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 87e41c3cac10..568e594179f5 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3239,6 +3239,56 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
 	refill_obj_stock(objcg, size, true);
 }
 
+/*
+ * memcg_charge_bulk_pages - Charge pages allocated by bulk allocator
+ * @gfp: GFP flags for the allocation
+ * @nr_pages: The number of pages added into the list or array
+ * @page_list: Optional list of allocated pages
+ * @page_array: Optional array of allocated pages
+ *
+ * Walks through the array or list of allocated pages
+ * and tries to charge each page to the memory cgroup.
+ * If a charge fails, that page and all remaining uncharged
+ * pages are removed from the array/list and freed.
+ *
+ * Returns the number of freed pages.
+ */
+int memcg_charge_bulk_pages(gfp_t gfp, int nr_pages,
+			    struct list_head *page_list,
+			    struct page **page_array)
+{
+	struct page *page, *np = NULL;
+	bool charge = true;
+	int i, nr_freed = 0;
+
+	if (page_list)
+		page = list_first_entry(page_list, struct page, lru);
+
+	for (i = 0; i < nr_pages; i++) {
+		if (page_list) {
+			if (np)
+				page = np;
+			np = list_next_entry(page, lru);
+		} else {
+			page = page_array[i];
+		}
+		/* some pages in incoming array can be charged already */
+		if (!page->memcg_data) {
+			if (charge && __memcg_kmem_charge_page(page, gfp, 0))
+				charge = false;
+
+			if (!charge) {
+				if (page_list)
+					list_del(&page->lru);
+				else
+					page_array[i] = NULL;
+				__free_pages(page, 0);
+				nr_freed++;
+			}
+		}
+	}
+	return nr_freed;
+}
 #endif /* CONFIG_MEMCG_KMEM */
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b37435c274cf..71c7f29ff8dc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5203,10 +5203,11 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	struct zoneref *z;
 	struct per_cpu_pages *pcp;
 	struct list_head *pcp_list;
+	LIST_HEAD(tpl);
 	struct alloc_context ac;
 	gfp_t alloc_gfp;
 	unsigned int alloc_flags = ALLOC_WMARK_LOW;
-	int nr_populated = 0, nr_account = 0;
+	int nr_populated = 0, nr_account = 0, nr_freed = 0;
 
 	/*
 	 * Skip populated array elements to determine if any pages need
@@ -5300,7 +5301,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 
 		prep_new_page(page, 0, gfp, 0);
 		if (page_list)
-			list_add(&page->lru, page_list);
+			list_add(&page->lru, &tpl);
 		else
 			page_array[nr_populated] = page;
 		nr_populated++;
@@ -5308,6 +5309,13 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 
 	local_unlock_irqrestore(&pagesets.lock, flags);
 
+	if (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT) && nr_account)
+		nr_freed = memcg_charge_bulk_pages(gfp, nr_populated,
+						   page_list ? &tpl : NULL,
+						   page_array);
+	nr_account -= nr_freed;
+	nr_populated -= nr_freed;
+	list_splice(&tpl, page_list);
 	__count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account);
 	zone_statistics(ac.preferred_zoneref->zone, zone, nr_account);
 
-- 
2.31.1
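The accounting arithmetic of the patch can be sanity-checked with a small userspace simulation (the sim_* names are invented; this is a model of the bookkeeping, not the kernel code):

```c
/* Model of the v5 flow: bulk-fill nr pages, run the post-allocation
 * charging pass, then reduce nr_populated by however many pages the
 * charging pass had to free.  Illustrative stand-ins only. */

struct sim_page { int charged, freed; };

/* Returns the nr_populated the caller finally sees. */
int sim_bulk_alloc(struct sim_page *pages, int nr, int charge_budget)
{
	int i, nr_populated = nr, nr_freed = 0;

	for (i = 0; i < nr; i++) {
		if (charge_budget > 0) {	/* charge succeeded */
			charge_budget--;
			pages[i].charged = 1;
		} else {			/* charge failed: free it */
			pages[i].freed = 1;
			nr_freed++;
		}
	}
	return nr_populated - nr_freed;
}
```

The invariant the patch maintains is that every page the caller receives is charged, and nr_populated/nr_account never count pages that were freed on charge failure.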


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH mm v5] memcg: enable memory accounting in __alloc_pages_bulk
  2021-10-14  8:02                   ` [PATCH mm v5] " Vasily Averin
@ 2021-10-15 21:34                     ` Andrew Morton
  2021-10-16  6:04                       ` Vasily Averin
  0 siblings, 1 reply; 28+ messages in thread
From: Andrew Morton @ 2021-10-15 21:34 UTC (permalink / raw)
  To: Vasily Averin
  Cc: Michal Hocko, Johannes Weiner, Vladimir Davydov, Shakeel Butt,
	Mel Gorman, Roman Gushchin, Uladzislau Rezki, Vlastimil Babka,
	cgroups, linux-mm, linux-kernel, kernel

On Thu, 14 Oct 2021 11:02:57 +0300 Vasily Averin <vvs@virtuozzo.com> wrote:

> The bulk page allocator is used in vmalloc, where it can be called
> with __GFP_ACCOUNT and must charge allocated pages to the memory cgroup.

Is this problem serious enough to justify -stable backporting?  Some
words explaining the reasoning for the backport would be helpful.

This patch makes Shakeel's "memcg: page_alloc: skip bulk allocator for
__GFP_ACCOUNT" unnecessary.  Which should we use?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH mm v5] memcg: enable memory accounting in __alloc_pages_bulk
  2021-10-15 21:34                     ` Andrew Morton
@ 2021-10-16  6:04                       ` Vasily Averin
  0 siblings, 0 replies; 28+ messages in thread
From: Vasily Averin @ 2021-10-16  6:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Johannes Weiner, Vladimir Davydov, Shakeel Butt,
	Mel Gorman, Roman Gushchin, Uladzislau Rezki, Vlastimil Babka,
	cgroups, linux-mm, linux-kernel, kernel

On 16.10.2021 00:34, Andrew Morton wrote:
> On Thu, 14 Oct 2021 11:02:57 +0300 Vasily Averin <vvs@virtuozzo.com> wrote:
> 
>> The bulk page allocator is used in vmalloc, where it can be called
>> with __GFP_ACCOUNT and must charge allocated pages to the memory cgroup.
> 
> Is this problem serious enough to justify -stable backporting?  Some
> words explaining the reasoning for the backport would be helpful.
> 
> This patch makes Shakeel's "memcg: page_alloc: skip bulk allocator for
> __GFP_ACCOUNT" unnecessary.  Which should we use?

Please use Shakeel's patch.

My patch at least requires review, so at present it should be delayed.
I've submitted it because it may be useful later.
Moreover, it currently has a minor issue: the function in the
!MEMCG_KMEM branch of memcontrol.h should be declared as static inline.

Thank you,
	Vasily Averin

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2021-10-16  6:04 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-07  8:04 memcg memory accounting in vmalloc is broken Vasily Averin
2021-10-07  8:13 ` Michal Hocko
2021-10-07  8:16   ` Michal Hocko
2021-10-07  8:50     ` Vasily Averin
2021-10-07 10:08       ` Michal Hocko
2021-10-07 10:20       ` Mel Gorman
2021-10-07 14:02         ` Vlastimil Babka
2021-10-07 14:00       ` Vlastimil Babka
2021-10-07 14:09         ` Michal Hocko
2021-10-07 19:33       ` Vasily Averin
2021-10-08  9:23         ` [PATCH memcg] memcg: enable memory accounting in __alloc_pages_bulk Vasily Averin
2021-10-08 17:35           ` Shakeel Butt
2021-10-12 10:18             ` [PATCH mm v2] " Vasily Averin
2021-10-12 13:10               ` Mel Gorman
2021-10-12 13:40                 ` Vasily Averin
2021-10-12 14:58                   ` [PATCH mm v3] " Vasily Averin
2021-10-12 15:19                     ` Shakeel Butt
2021-10-12 15:20                     ` Mel Gorman
2021-10-12 15:36                     ` Michal Hocko
2021-10-12 16:08                       ` Shakeel Butt
2021-10-12 18:24                         ` Michal Hocko
2021-10-13 16:41                           ` Shakeel Butt
2021-10-13 17:16                             ` Michal Hocko
2021-10-13 17:30                               ` Shakeel Butt
2021-10-12 18:45                       ` Vasily Averin
2021-10-14  8:02                   ` [PATCH mm v5] " Vasily Averin
2021-10-15 21:34                     ` Andrew Morton
2021-10-16  6:04                       ` Vasily Averin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).