LKML Archive on lore.kernel.org
From: Michal Hocko <mhocko@kernel.org>
To: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mgorman@techsingularity.net>,
Tejun Heo <tj@kernel.org>, Johannes Weiner <hannes@cmpxchg.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
cgroups@vger.kernel.org
Subject: Re: [PATCH 6/6] mm/vmscan: Don't mess with pgdat->flags in memcg reclaim.
Date: Wed, 21 Mar 2018 12:43:01 +0100
Message-ID: <20180321114301.GH23100@dhcp22.suse.cz>
In-Reply-To: <c3405049-222d-a045-4ce5-8e51817d89b6@virtuozzo.com>
On Wed 21-03-18 14:14:35, Andrey Ryabinin wrote:
>
>
> On 03/20/2018 06:29 PM, Michal Hocko wrote:
>
> >> Leave all pgdat->flags manipulations to kswapd. kswapd scans the whole
> >> pgdat, so it's reasonable to leave all decisions about node state
> >> to kswapd. Also add per-cgroup congestion state to avoid needlessly
> >> burning CPU in cgroup reclaim if heavy congestion is observed.
> >>
> >> Currently there is no need for per-cgroup PGDAT_WRITEBACK and PGDAT_DIRTY
> >> bits since they alter only kswapd behavior.
> >>
> >> The problem could be easily demonstrated by creating heavy congestion
> >> in one cgroup:
> >>
> >> echo "+memory" > /sys/fs/cgroup/cgroup.subtree_control
> >> mkdir -p /sys/fs/cgroup/congester
> >> echo 512M > /sys/fs/cgroup/congester/memory.max
> >> echo $$ > /sys/fs/cgroup/congester/cgroup.procs
> >> /* generate a lot of dirty data on slow HDD */
> >> while true; do dd if=/dev/zero of=/mnt/sdb/zeroes bs=1M count=1024; done &
> >> ....
> >> while true; do dd if=/dev/zero of=/mnt/sdb/zeroes bs=1M count=1024; done &
> >>
> >> and some job in another cgroup:
> >>
> >> mkdir /sys/fs/cgroup/victim
> >> echo 128M > /sys/fs/cgroup/victim/memory.max
> >>
> >> # time cat /dev/sda > /dev/null
> >> real 10m15.054s
> >> user 0m0.487s
> >> sys 1m8.505s
> >>
> >> According to the tracepoint in wait_iff_congested(), the 'cat' spent 50%
> >> of the time sleeping there.
> >>
> >> With the patch, cat doesn't waste time anymore:
> >>
> >> # time cat /dev/sda > /dev/null
> >> real 5m32.911s
> >> user 0m0.411s
> >> sys 0m56.664s
> >>
> >> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
> >> ---
> >> include/linux/backing-dev.h | 2 +-
> >> include/linux/memcontrol.h | 2 ++
> >> mm/backing-dev.c | 19 ++++------
> >> mm/vmscan.c | 84 ++++++++++++++++++++++++++++++++-------------
> >> 4 files changed, 70 insertions(+), 37 deletions(-)
> >
> > This patch seems overly complicated. Why don't you simply reduce the whole
> > pgdat_flags handling to global_reclaim()?
> >
>
> In that case cgroup2 reclaim wouldn't have any way of throttling if the
> cgroup is full of congested dirty pages.
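For illustration, the per-cgroup congestion state the patch description refers to could take roughly the following shape. This is only a sketch: the names (MEMCG_CONGESTED, a flags field in struct mem_cgroup, set_memcg_congestion()) are assumptions for illustration, not the interface from the posted patch.

/*
 * Sketch only: track congestion per memcg so that cgroup reclaim can
 * throttle on its own state instead of the node-wide pgdat->flags.
 * MEMCG_CONGESTED and the memcg->flags field are assumed names.
 */
static void set_memcg_congestion(struct mem_cgroup *memcg, bool congested)
{
	if (congested)
		set_bit(MEMCG_CONGESTED, &memcg->flags);
	else
		clear_bit(MEMCG_CONGESTED, &memcg->flags);
}

static bool memcg_congested(struct mem_cgroup *memcg)
{
	return test_bit(MEMCG_CONGESTED, &memcg->flags);
}

/* In cgroup (non-global) reclaim, throttle only on the memcg's own state: */
if (!global_reclaim(sc) && memcg_congested(sc->target_mem_cgroup))
	congestion_wait(BLK_RW_ASYNC, HZ/10);

With something like this, heavy congestion in one memcg throttles only reclaimers of that memcg, which is the behavior the numbers above (10m15s vs 5m32s for the victim cgroup) are meant to demonstrate.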
It's been some time since I've looked into the throttling code so pardon
my ignorance. Don't cgroup v2 users get throttled in the write path to
not dirty too many pages in the first place?
In other words, is your patch trying to fix two different things? One is
per-memcg reclaim influencing the global pgdat state, which is clearly
wrong; the other is cgroup v2 reclaim throttling that is not pgdat based?
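To make the simpler alternative concrete, here is a minimal sketch of confining pgdat->flags updates to global reclaim only; the helper name, call site, and the 'congested' condition are assumed here, not taken from mm/vmscan.c:

/*
 * Sketch only: let just global (kswapd/direct) reclaim update the
 * node-wide flags; memcg-limit reclaim leaves pgdat->flags untouched.
 */
static void update_pgdat_congestion(pg_data_t *pgdat, struct scan_control *sc,
				    bool congested)
{
	if (!global_reclaim(sc))
		return;

	if (congested)
		set_bit(PGDAT_CONGESTED, &pgdat->flags);
	else
		clear_bit(PGDAT_CONGESTED, &pgdat->flags);
}

That would cover the "memcg reclaim should not touch global pgdat state" half; whether cgroup v2 still needs its own reclaim throttling on top of the write-path dirty limiting is the open question above.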
--
Michal Hocko
SUSE Labs
Thread overview: 18+ messages
2018-03-15 16:45 [PATCH 1/6] mm/vmscan: Wake up flushers for legacy cgroups too Andrey Ryabinin
2018-03-15 16:45 ` [PATCH 2/6] mm/vmscan: Update stale comments Andrey Ryabinin
2018-03-20 15:00 ` Michal Hocko
2018-03-15 16:45 ` [PATCH 3/6] mm/vmscan: replace mm_vmscan_lru_shrink_inactive with shrink_page_list tracepoint Andrey Ryabinin
2018-03-15 16:45 ` [PATCH 4/6] mm/vmscan: remove redundant current_may_throttle() check Andrey Ryabinin
2018-03-20 15:11 ` Michal Hocko
2018-03-15 16:45 ` [PATCH 5/6] mm/vmscan: Don't change pgdat state on base of a single LRU list state Andrey Ryabinin
2018-03-20 15:25 ` Michal Hocko
2018-03-21 10:40 ` Andrey Ryabinin
2018-03-21 11:32 ` Michal Hocko
2018-03-21 15:57 ` Andrey Ryabinin
2018-03-15 16:45 ` [PATCH 6/6] mm/vmscan: Don't mess with pgdat->flags in memcg reclaim Andrey Ryabinin
2018-03-20 15:29 ` Michal Hocko
2018-03-21 11:14 ` Andrey Ryabinin
2018-03-21 11:43 ` Michal Hocko [this message]
2018-03-21 17:01 ` Andrey Ryabinin
2018-03-15 18:57 ` [PATCH 1/6] mm/vmscan: Wake up flushers for legacy cgroups too Shakeel Butt
2018-03-20 15:00 ` Michal Hocko