LKML Archive on lore.kernel.org
* [patch rfc] memcg: correctly order reading PCG_USED and pc->mem_cgroup
@ 2011-01-19 12:03 Johannes Weiner
  2011-01-20  1:06 ` KAMEZAWA Hiroyuki
  2011-01-20  1:52 ` Daisuke Nishimura
  0 siblings, 2 replies; 4+ messages in thread
From: Johannes Weiner @ 2011-01-19 12:03 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KAMEZAWA Hiroyuki, Daisuke Nishimura, Balbir Singh, linux-mm,
	linux-kernel

The placement of the read-side barrier is confused: the writer first
sets pc->mem_cgroup, then PCG_USED.  The read-side barrier has to be
between testing PCG_USED and reading pc->mem_cgroup.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/memcontrol.c |   27 +++++++++------------------
 1 files changed, 9 insertions(+), 18 deletions(-)

I am a bit dumbfounded as to why this has never had any impact.  I see
two scenarios where charging can race with LRU operations:

One is shmem pages on swapoff.  They are on the LRU when charged as
page cache, which could race with isolation/putback.  This seems
sufficiently rare.

The other case is a swap cache page being charged while somebody else
had it isolated.  mem_cgroup_lru_del_before_commit_swapcache() would
see the page isolated and skip it.  The commit then has to race with
putback, which could see PCG_USED but not pc->mem_cgroup, and crash
with a NULL pointer dereference.  This does sound a bit more likely.

Any idea?  Am I missing something?
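[Editor's sketch] The publish/read pattern the patch fixes can be mimicked in userspace with C11 fences standing in for the kernel's smp_wmb()/smp_rmb(). All names here (page_cgroup_demo, PCG_USED_DEMO, commit_charge_demo, read_memcg_demo) are illustrative, not the kernel's:

```c
#include <stdatomic.h>
#include <stddef.h>

struct page_cgroup_demo {
	void *mem_cgroup;	/* payload the writer publishes first */
	atomic_int flags;	/* the USED bit lives here */
};

#define PCG_USED_DEMO 1

/* Writer: set pc->mem_cgroup, barrier, then set the USED bit. */
static void commit_charge_demo(struct page_cgroup_demo *pc, void *memcg)
{
	pc->mem_cgroup = memcg;
	atomic_thread_fence(memory_order_release);	/* smp_wmb() analog */
	atomic_fetch_or(&pc->flags, PCG_USED_DEMO);
}

/*
 * Reader: the barrier must sit BETWEEN the USED test and the
 * pc->mem_cgroup load -- that is the point of this patch.
 */
static void *read_memcg_demo(struct page_cgroup_demo *pc)
{
	if (!(atomic_load(&pc->flags) & PCG_USED_DEMO))
		return NULL;
	atomic_thread_fence(memory_order_acquire);	/* smp_rmb() analog */
	return pc->mem_cgroup;
}
```

The fences pair up: the release fence before the flag store guarantees that any reader whose acquire fence sits after the flag load observes a fully initialized pc->mem_cgroup.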

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5b562b3..db76ef7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -836,13 +836,12 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
 		return;
 
 	pc = lookup_page_cgroup(page);
-	/*
-	 * Used bit is set without atomic ops but after smp_wmb().
-	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
-	 */
-	smp_rmb();
 	/* unused or root page is not rotated. */
-	if (!PageCgroupUsed(pc) || mem_cgroup_is_root(pc->mem_cgroup))
+	if (!PageCgroupUsed(pc))
+		return;
+	/* Ensure pc->mem_cgroup is visible after reading PCG_USED. */
+	smp_rmb();
+	if (mem_cgroup_is_root(pc->mem_cgroup))
 		return;
 	mz = page_cgroup_zoneinfo(pc);
 	list_move(&pc->lru, &mz->lists[lru]);
@@ -857,14 +856,10 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
 		return;
 	pc = lookup_page_cgroup(page);
 	VM_BUG_ON(PageCgroupAcctLRU(pc));
-	/*
-	 * Used bit is set without atomic ops but after smp_wmb().
-	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
-	 */
-	smp_rmb();
 	if (!PageCgroupUsed(pc))
 		return;
-
+	/* Ensure pc->mem_cgroup is visible after reading PCG_USED. */
+	smp_rmb();
 	mz = page_cgroup_zoneinfo(pc);
 	/* huge page split is done under lru_lock. so, we have no races. */
 	MEM_CGROUP_ZSTAT(mz, lru) += 1 << compound_order(page);
@@ -1031,14 +1026,10 @@ mem_cgroup_get_reclaim_stat_from_page(struct page *page)
 		return NULL;
 
 	pc = lookup_page_cgroup(page);
-	/*
-	 * Used bit is set without atomic ops but after smp_wmb().
-	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
-	 */
-	smp_rmb();
 	if (!PageCgroupUsed(pc))
 		return NULL;
-
+	/* Ensure pc->mem_cgroup is visible after reading PCG_USED. */
+	smp_rmb();
 	mz = page_cgroup_zoneinfo(pc);
 	if (!mz)
 		return NULL;
-- 
1.7.3.4



* Re: [patch rfc] memcg: correctly order reading PCG_USED and pc->mem_cgroup
  2011-01-19 12:03 [patch rfc] memcg: correctly order reading PCG_USED and pc->mem_cgroup Johannes Weiner
@ 2011-01-20  1:06 ` KAMEZAWA Hiroyuki
  2011-01-20 10:49   ` Johannes Weiner
  2011-01-20  1:52 ` Daisuke Nishimura
  1 sibling, 1 reply; 4+ messages in thread
From: KAMEZAWA Hiroyuki @ 2011-01-20  1:06 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Daisuke Nishimura, Balbir Singh, linux-mm, linux-kernel

On Wed, 19 Jan 2011 13:03:19 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> The placement of the read-side barrier is confused: the writer first
> sets pc->mem_cgroup, then PCG_USED.  The read-side barrier has to be
> between testing PCG_USED and reading pc->mem_cgroup.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  mm/memcontrol.c |   27 +++++++++------------------
>  1 files changed, 9 insertions(+), 18 deletions(-)
> 
> I am a bit dumbfounded as to why this has never had any impact.  I see
> two scenarios where charging can race with LRU operations:
> 
> One is shmem pages on swapoff.  They are on the LRU when charged as
> page cache, which could race with isolation/putback.  This seems
> sufficiently rare.
> 
> The other case is a swap cache page being charged while somebody else
> had it isolated.  mem_cgroup_lru_del_before_commit_swapcache() would
> see the page isolated and skip it.  The commit then has to race with
> putback, which could see PCG_USED but not pc->mem_cgroup, and crash
> with a NULL pointer dereference.  This does sound a bit more likely.
> 
> Any idea?  Am I missing something?
> 

I think trouble happens only when the PCG_USED bit is found set but pc->mem_cgroup
is NULL.  Hmm.

  set pc->mem_cgroup
  write_barrier
  set USED bit.

  read_barrier
  check USED bit
  access pc->mem_cgroup

So, is there a case in which only the USED bit can be seen?
Anyway, your patch is right.

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5b562b3..db76ef7 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -836,13 +836,12 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
>  		return;
>  
>  	pc = lookup_page_cgroup(page);
> -	/*
> -	 * Used bit is set without atomic ops but after smp_wmb().
> -	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
> -	 */
> -	smp_rmb();
>  	/* unused or root page is not rotated. */
> -	if (!PageCgroupUsed(pc) || mem_cgroup_is_root(pc->mem_cgroup))
> +	if (!PageCgroupUsed(pc))
> +		return;
> +	/* Ensure pc->mem_cgroup is visible after reading PCG_USED. */
> +	smp_rmb();
> +	if (mem_cgroup_is_root(pc->mem_cgroup))
>  		return;
>  	mz = page_cgroup_zoneinfo(pc);
>  	list_move(&pc->lru, &mz->lists[lru]);
> @@ -857,14 +856,10 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
>  		return;
>  	pc = lookup_page_cgroup(page);
>  	VM_BUG_ON(PageCgroupAcctLRU(pc));
> -	/*
> -	 * Used bit is set without atomic ops but after smp_wmb().
> -	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
> -	 */
> -	smp_rmb();
>  	if (!PageCgroupUsed(pc))
>  		return;
> -
> +	/* Ensure pc->mem_cgroup is visible after reading PCG_USED. */
> +	smp_rmb();
>  	mz = page_cgroup_zoneinfo(pc);
>  	/* huge page split is done under lru_lock. so, we have no races. */
>  	MEM_CGROUP_ZSTAT(mz, lru) += 1 << compound_order(page);
> @@ -1031,14 +1026,10 @@ mem_cgroup_get_reclaim_stat_from_page(struct page *page)
>  		return NULL;
>  
>  	pc = lookup_page_cgroup(page);
> -	/*
> -	 * Used bit is set without atomic ops but after smp_wmb().
> -	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
> -	 */
> -	smp_rmb();
>  	if (!PageCgroupUsed(pc))
>  		return NULL;
> -
> +	/* Ensure pc->mem_cgroup is visible after reading PCG_USED. */
> +	smp_rmb();
>  	mz = page_cgroup_zoneinfo(pc);
>  	if (!mz)
>  		return NULL;
> -- 
> 1.7.3.4
> 



* Re: [patch rfc] memcg: correctly order reading PCG_USED and pc->mem_cgroup
  2011-01-19 12:03 [patch rfc] memcg: correctly order reading PCG_USED and pc->mem_cgroup Johannes Weiner
  2011-01-20  1:06 ` KAMEZAWA Hiroyuki
@ 2011-01-20  1:52 ` Daisuke Nishimura
  1 sibling, 0 replies; 4+ messages in thread
From: Daisuke Nishimura @ 2011-01-20  1:52 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, KAMEZAWA Hiroyuki, Balbir Singh, linux-mm,
	linux-kernel, Daisuke Nishimura

On Wed, 19 Jan 2011 13:03:19 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> The placement of the read-side barrier is confused: the writer first
> sets pc->mem_cgroup, then PCG_USED.  The read-side barrier has to be
> between testing PCG_USED and reading pc->mem_cgroup.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  mm/memcontrol.c |   27 +++++++++------------------
>  1 files changed, 9 insertions(+), 18 deletions(-)
> 
> I am a bit dumbfounded as to why this has never had any impact.  I see
> two scenarios where charging can race with LRU operations:
> 
> One is shmem pages on swapoff.  They are on the LRU when charged as
> page cache, which could race with isolation/putback.  This seems
> sufficiently rare.
> 
> The other case is a swap cache page being charged while somebody else
> had it isolated.  mem_cgroup_lru_del_before_commit_swapcache() would
> see the page isolated and skip it.  The commit then has to race with
> putback, which could see PCG_USED but not pc->mem_cgroup, and crash
> with a NULL pointer dereference.  This does sound a bit more likely.
> 
> Any idea?  Am I missing something?
> 
pc->mem_cgroup is not cleared even when the page is freed, so a NULL pointer
dereference can happen only the first time the page is used.
But yes, even if it's not the first time, pc->mem_cgroup may still be wrong.

Anyway, I welcome this patch.

Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>


Thanks,
Daisuke Nishimura.


* Re: [patch rfc] memcg: correctly order reading PCG_USED and pc->mem_cgroup
  2011-01-20  1:06 ` KAMEZAWA Hiroyuki
@ 2011-01-20 10:49   ` Johannes Weiner
  0 siblings, 0 replies; 4+ messages in thread
From: Johannes Weiner @ 2011-01-20 10:49 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, Daisuke Nishimura, Balbir Singh, linux-mm, linux-kernel

On Thu, Jan 20, 2011 at 10:06:54AM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 19 Jan 2011 13:03:19 +0100
> Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > The placement of the read-side barrier is confused: the writer first
> > sets pc->mem_cgroup, then PCG_USED.  The read-side barrier has to be
> > between testing PCG_USED and reading pc->mem_cgroup.
> > 
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > ---
> >  mm/memcontrol.c |   27 +++++++++------------------
> >  1 files changed, 9 insertions(+), 18 deletions(-)
> > 
> > I am a bit dumbfounded as to why this has never had any impact.  I see
> > two scenarios where charging can race with LRU operations:
> > 
> > One is shmem pages on swapoff.  They are on the LRU when charged as
> > page cache, which could race with isolation/putback.  This seems
> > sufficiently rare.
> > 
> > The other case is a swap cache page being charged while somebody else
> > had it isolated.  mem_cgroup_lru_del_before_commit_swapcache() would
> > see the page isolated and skip it.  The commit then has to race with
> > putback, which could see PCG_USED but not pc->mem_cgroup, and crash
> > with a NULL pointer dereference.  This does sound a bit more likely.
> > 
> > Any idea?  Am I missing something?
> > 
> 
> I think trouble happens only when the PCG_USED bit is found set but pc->mem_cgroup
> is NULL.  Hmm.

Correct.  Well, or the page could get linked to the wrong LRU list and
subsequently prevent removal of both cgroups, because it's impossible to
empty them.

>   set pc->mem_cgroup
>   write_barrier
>   set USED bit.
> 
>   read_barrier
>   check USED bit
>   access pc->mem_cgroup
> 
> So, is there a case in which only the USED bit can be seen?

That's what I am not quite sure about.  As said, I think it can happen
when swap cache charging races with reclaim or migration.

When the two loads of the used bit and the memcg pointer get
reordered, it could observe a set PCG_USED and a stale/unset
pc->mem_cgroup.

For example:

swap minor fault:			vmscan:

lookup_swap_cache()
					unlock_page()
lock_page()
mem_cgroup_try_charge_swapin()
					putback_lru_page()
					mem_cgroup_add_lru_list()
					  p = pc->mem_cgroup
  pc->mem_cgroup = FOO
  smp_wmb()
  pc->flags |= PCG_USED
					  f = pc->flags
					  if (!(f & PCG_USED))
						  return
					  *p /* bang */
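[Editor's sketch] The reader side of this timeline, as it looked before the patch, can be mimicked with a C11 acquire fence standing in for smp_rmb(); the type and names (page_cgroup_demo, PCG_USED_DEMO, read_memcg_broken) are illustrative, not the kernel's:

```c
#include <stdatomic.h>
#include <stddef.h>

struct page_cgroup_demo {
	void *mem_cgroup;	/* published before the USED bit is set */
	atomic_int flags;
};

#define PCG_USED_DEMO 1

static void *read_memcg_broken(struct page_cgroup_demo *pc)
{
	/*
	 * Pre-patch placement: the fence sits BEFORE both loads, so it
	 * orders them against earlier accesses but does nothing between
	 * the flags load and the mem_cgroup load below -- the CPU may
	 * still reorder those two.
	 */
	atomic_thread_fence(memory_order_acquire);
	if (!(atomic_load(&pc->flags) & PCG_USED_DEMO))
		return NULL;
	return pc->mem_cgroup;	/* can observe stale/NULL: the "bang" above */
}
```

Single-threaded the function behaves fine, of course; the misplaced fence only bites when the reader races with the charge commit, as in the timeline above.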

> Anyway, your patch is right.
> 
> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Thank you.

	Hannes

