LKML Archive on lore.kernel.org
From: Minchan Kim <minchan@kernel.org>
To: Michal Hocko <mhocko@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Rik van Riel <riel@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Mel Gorman <mgorman@suse.de>, Shaohua Li <shli@kernel.org>,
Yalin.Wang@sonymobile.com
Subject: Re: [PATCH RFC 1/4] mm: throttle MADV_FREE
Date: Wed, 25 Feb 2015 09:08:09 +0900
Message-ID: <20150225000809.GA6468@blaptop>
In-Reply-To: <20150224154318.GA14939@dhcp22.suse.cz>

Hi Michal,

On Tue, Feb 24, 2015 at 04:43:18PM +0100, Michal Hocko wrote:
> On Tue 24-02-15 17:18:14, Minchan Kim wrote:
> > Recently, Shaohua reported that MADV_FREE is much slower than
> > MADV_DONTNEED in his MADV_FREE bomb test. The reason is that many
> > applications stalled in direct reclaim because kswapd's reclaim
> > speed isn't faster than the applications' allocation speed, which
> > causes lots of stalls and lock contention.
>
> I am not sure I understand this correctly. So the issue is that there is
> a huge number of MADV_FREE pages on the LRU and they are not close to the
> tail of the list, so the reclaim has to do a lot of work before it starts
> dropping them?

No. Shaohua already tested deactivating the hinted pages to the head/tail
of the inactive anon LRU, and he said it didn't solve his problem.
I thought the main culprit was the scanning/rotating/throttling in
the direct reclaim path.

>
> > This patch throttles MADV_FREE so that it works only if there
> > are enough free pages in the system that it will not trigger
> > background/direct reclaim. Otherwise, MADV_FREE falls back to
> > MADV_DONTNEED because there is no point in delaying the free if
> > we know the system is under memory pressure.
>
> Hmm, this is still conforming to the documentation because the kernel is
> free to free pages at its convenience. I am not sure this is a good
> idea, though. Why should some MADV_FREE calls be treated differently?

It's a hint for the VM to free pages, so I think it's okay to free them
instantly sometimes if that avoids something worse, like a system stall.
IOW, madvise is just a hint, not a strict rule.

> Wouldn't that lead to hard to predict behavior? E.g. LIFO reused blocks
> would work without long stalls most of the time - except when there is
> memory pressure.

True.

>
> Comparison to MADV_DONTNEED is not very fair IMHO because the scope of the
> two calls is different.

I agree it's not an apples-to-apples comparison.
Actually, MADV_FREE moves the cost from the hot path (i.e., the system call
path) to the slow path (i.e., reclaim context), so it would be slower under
continuous heavy memory pressure because of the large overhead of freeing
pages in reclaim context. So it would be good if the kernel detected that
and prevented the situation. This patch aims for that.

>
> > When I test the patch on my 3G machine + 12 CPU + 8G swap,
> > test: 12 processes
> >
> > loop = 5;
> > mmap(512M);
>
> Who is eating the rest of the memory?

As I wrote, there are 12 processes running the test below.
IOW, 512M * 12 = 6G, but system RAM is just 3G.

>
> > while (loop--) {
> > memset(512M);
> > madvise(MADV_FREE or MADV_DONTNEED);
> > }
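
For anyone who wants to reproduce the quoted test, here is a minimal userspace sketch of the loop. The function names madvise_worker() and run_workers() are made up for illustration and are not from the original test program; error handling and the fork/wait scaffolding are assumptions about how the 12 processes were started.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * One worker (hypothetical helper): map len bytes once, then dirty the
 * whole range and hand it back with madvise() `loops` times.
 * Returns 0 on success, -1 on any failure.
 */
int madvise_worker(size_t len, int loops, int advice)
{
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return -1;

	while (loops--) {
		memset(buf, 1, len);	/* fault in and dirty every page */
		if (madvise(buf, len, advice) != 0) {
			munmap(buf, len);
			return -1;
		}
	}
	return munmap(buf, len);
}

/*
 * Fork nproc workers and wait for all of them (hypothetical helper).
 * Returns 0 only if every worker was started and exited successfully.
 */
int run_workers(int nproc, size_t len, int loops, int advice)
{
	int started = 0;
	int failed = 0;

	for (int i = 0; i < nproc; i++) {
		pid_t pid = fork();

		if (pid < 0) {
			failed = 1;
			break;
		}
		if (pid == 0)
			_exit(madvise_worker(len, loops, advice) ? 1 : 0);
		started++;
	}

	while (started--) {
		int status;

		if (wait(&status) < 0 || !WIFEXITED(status) ||
		    WEXITSTATUS(status) != 0)
			failed = 1;
	}
	return failed ? -1 : 0;
}
```

The quoted numbers would correspond roughly to run_workers(12, 512UL << 20, 5, advice) run under time(1), with advice set to MADV_DONTNEED or MADV_FREE. MADV_FREE is only available where the kernel and the libc headers define it.
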
> >
> > 1) dontneed: 6.78user 234.09system 0:48.89elapsed
> > 2) madvfree: 6.03user 401.17system 1:30.67elapsed
> > 3) madvfree + this patch: 5.68user 113.42system 0:36.52elapsed
> >
> > It's a clear win.
> >
> > Reported-by: Shaohua Li <shli@kernel.org>
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
>
> I don't know. This looks like a hack with hard to predict consequences
> which might trigger pathological corner cases.

Yes, it might be. That's why I tagged it RFC; I hope others can suggest
a better idea.

>
> > ---
> > mm/madvise.c | 13 +++++++++++--
> > 1 file changed, 11 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 6d0fcb8921c2..81bb26ecf064 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -523,8 +523,17 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
> > * XXX: In this implementation, MADV_FREE works like
> > * MADV_DONTNEED on swapless system or full swap.
> > */
> > - if (get_nr_swap_pages() > 0)
> > - return madvise_free(vma, prev, start, end);
> > + if (get_nr_swap_pages() > 0) {
> > + unsigned long threshold;
> > + /*
> > + * If we have trouble with memory pressure (i.e.,
> > + * under high watermark), free pages instantly.
> > + */
> > + threshold = min_free_kbytes >> (PAGE_SHIFT - 10);
> > + threshold = threshold + (threshold >> 1);
>
> Why threshold += threshold >> 1 ?

I wanted to trigger this logic when free pages are under the high watermark.

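
To spell out the arithmetic: min_free_kbytes is in KiB, so shifting right by (PAGE_SHIFT - 10) converts it to pages, and adding half of that again gives 1.5 times the min value. As far as I understand the current per-zone setup, the high watermark is min + min/2, so this is a system-wide approximation of it, not an exact per-zone check. Below is a standalone sketch of the same calculation; the function names and the explicit page_shift parameter are only for illustration and do not exist in the kernel.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model of the patch's threshold:
 * convert min_free_kbytes (KiB) to pages, then scale by 1.5.
 */
unsigned long madv_free_threshold(unsigned long min_free_kbytes,
				  unsigned int page_shift)
{
	unsigned long threshold = min_free_kbytes >> (page_shift - 10);

	return threshold + (threshold >> 1);
}

/*
 * Hypothetical model of the decision in the patch: lazy free is used only
 * if swap is available and free pages exceed the threshold. Otherwise the
 * call falls through to the MADV_DONTNEED path.
 */
bool madv_free_allowed(long nr_swap_pages, unsigned long nr_free_pages,
		       unsigned long min_free_kbytes, unsigned int page_shift)
{
	if (nr_swap_pages <= 0)
		return false;
	return nr_free_pages > madv_free_threshold(min_free_kbytes, page_shift);
}
```

For example, with 4K pages (page_shift = 12) and min_free_kbytes = 8192, the min value is 2048 pages and the threshold is 3072 pages, so MADV_FREE stays lazy only while more than 3072 pages are free.
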
>
> > + if (nr_free_pages() > threshold)
> > + return madvise_free(vma, prev, start, end);
> > + }
> > /* passthrough */
> > case MADV_DONTNEED:
> > return madvise_dontneed(vma, prev, start, end);
> > --
> > 1.9.1
> >
>
> --
> Michal Hocko
> SUSE Labs
--
Kind regards,
Minchan Kim