From: Shaohua Li <shli@kernel.org>
To: Minchan Kim <minchan@kernel.org>
Cc: Michal Hocko <mhocko@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Rik van Riel <riel@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Mel Gorman <mgorman@suse.de>,
	Yalin.Wang@sonymobile.com
Subject: Re: [PATCH RFC 1/4] mm: throttle MADV_FREE
Date: Wed, 25 Feb 2015 10:37:48 -0800
Message-ID: <20150225183748.GA2551@kernel.org>
In-Reply-To: <20150225071118.GA19115@blaptop>

On Wed, Feb 25, 2015 at 04:11:18PM +0900, Minchan Kim wrote:
> On Wed, Feb 25, 2015 at 09:08:09AM +0900, Minchan Kim wrote:
> > Hi Michal,
> > 
> > On Tue, Feb 24, 2015 at 04:43:18PM +0100, Michal Hocko wrote:
> > > On Tue 24-02-15 17:18:14, Minchan Kim wrote:
> > > > Recently, Shaohua reported that MADV_FREE is much slower than
> > > > MADV_DONTNEED in his MADV_FREE bomb test. The reason is that many
> > > > applications stalled in direct reclaim, since kswapd's
> > > > reclaim speed isn't faster than the applications' allocation speed,
> > > > which causes lots of stalls and lock contention.
> > > 
> > > I am not sure I understand this correctly. So the issue is that there is
> > > a huge number of MADV_FREE pages on the LRU and they are not close to the
> > > tail of the list, so reclaim has to do a lot of work before it starts
> > > dropping them?
> > 
> > No, Shaohua already tested deactivating hinted pages to the head/tail
> > of the inactive anon LRU and he said it didn't solve his problem.
> > I thought the main culprit was scanning/rotating/throttling in
> > the direct reclaim path.
> 
> I investigated my workload and found most of the slowness came from swapin.
> 
> 1) dontneed: 1,612 swapin
> 2) madvfree: 879,585 swapin
> 
> If we find that hinted pages were already swapped out when the syscall is
> called, it's pointless to keep them in the pte. Instead, free the cold page,
> because swapin is more expensive than (alloc page + zeroing).
> 
> I tested the quick fix below and it reduced swapin from 879,585 to 1,878.
> Elapsed time was
> 
> 1) dontneed: 6.10user 233.50system 0:50.44elapsed
> 2) madvfree + below patch: 6.70user 339.14system 1:04.45elapsed
> 
> Although it was not as good as throttling, it's better than before, and
> it's orthogonal to throttling, so I hope to merge this first rather
> than the arguable throttling. Any comments?
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 6d0fcb8921c2..d41ae76d3e54 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -274,7 +274,9 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>  	spinlock_t *ptl;
>  	pte_t *pte, ptent;
>  	struct page *page;
> +	swp_entry_t entry;
>  	unsigned long next;
> +	int rss = 0;
>  
>  	next = pmd_addr_end(addr, end);
>  	if (pmd_trans_huge(*pmd)) {
> @@ -293,9 +295,19 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>  	for (; addr != end; pte++, addr += PAGE_SIZE) {
>  		ptent = *pte;
>  
> -		if (!pte_present(ptent))
> +		if (pte_none(ptent))
>  			continue;
>  
> +		if (!pte_present(ptent)) {
> +			entry = pte_to_swp_entry(ptent);
> +			if (non_swap_entry(entry))
> +				continue;
> +			rss--;
> +			free_swap_and_cache(entry);
> +			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
> +			continue;
> +		}
> +
>  		page = vm_normal_page(vma, addr, ptent);
>  		if (!page)
>  			continue;
> @@ -326,6 +338,14 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>  		set_pte_at(mm, addr, pte, ptent);
>  		tlb_remove_tlb_entry(tlb, pte, addr);
>  	}
> +
> +	if (rss) {
> +		if (current->mm == mm)
> +			sync_mm_rss(mm);
> +
> +		add_mm_counter(mm, MM_SWAPENTS, rss);
> +	}
> +

This makes sense, but I'm wondering why it helps and whether it helps a real
workload. Let me give an example. Say there is 1G of memory and the workload
uses 800M with DONTNEED; there should be no swap. With FREE, the same workload
might use more than 1G of memory and trigger swap. I think the case where
DONTNEED doesn't trigger swap is more suitable for evaluating the performance
of the patch.
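
The footprint difference in the example above can be observed from user space.
The following is a minimal sketch, not part of the patch: it dirties an
anonymous mapping, applies one madvise() hint, and asks mincore() how many
pages are still resident. The helper names resident_pages() and
resident_after_advice() are hypothetical, and the fallback value for MADV_FREE
is an assumption taken from the generic Linux UAPI header. On a kernel that
implements MADV_FREE lazily, one would expect the FREE case to report more
resident pages than the DONTNEED case until reclaim runs, but that depends on
kernel version and memory pressure.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_FREE
#define MADV_FREE 8	/* assumption: value from the generic Linux UAPI header */
#endif

/*
 * Hypothetical helper: number of pages in [addr, addr + len) that
 * mincore() reports as resident, or -1 on error.
 * addr must be page aligned (an mmap() return value is).
 */
static long resident_pages(void *addr, size_t len)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t i, n = (len + page - 1) / page;
	unsigned char *vec = malloc(n);
	long count = 0;

	if (!vec)
		return -1;
	if (mincore(addr, len, vec) != 0) {
		free(vec);
		return -1;
	}
	/* Only the least significant bit of each byte is defined. */
	for (i = 0; i < n; i++)
		count += vec[i] & 1;
	free(vec);
	return count;
}

/*
 * Hypothetical helper: map npages of anonymous memory, dirty every page
 * with 0xaa, apply `advice`, and return how many pages mincore() still
 * reports as resident. If first_byte is non-NULL, the first byte of the
 * mapping is read back after the advice and stored there.
 * Returns -1 if mmap(), madvise() or mincore() fails.
 */
static long resident_after_advice(size_t npages, int advice,
				  unsigned char *first_byte)
{
	size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *buf;
	long resident = -1;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	memset(buf, 0xaa, len);		/* fault in and dirty every page */

	if (madvise(buf, len, advice) == 0) {
		resident = resident_pages(buf, len);
		if (first_byte)
			*first_byte = buf[0];	/* read after measuring */
	}

	munmap(buf, len);
	return resident;
}
```

Calling resident_after_advice(16, MADV_DONTNEED, &b) leaves b == 0, because
anonymous pages dropped by DONTNEED read back as zero. With MADV_FREE the byte
may be either 0xaa or 0, depending on whether the page was reclaimed before the
read, and the call returns -1 on kernels that do not support the hint.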

Thanks,
Shaohua

