From: Andrey Ryabinin <aryabinin@virtuozzo.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Tejun Heo <tj@kernel.org>, Johannes Weiner <hannes@cmpxchg.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org
Subject: Re: [PATCH 6/6] mm/vmscan: Don't mess with pgdat->flags in memcg reclaim.
Date: Wed, 21 Mar 2018 20:01:32 +0300
Message-ID: <d29cdc69-718c-7c7e-bffb-d716d343a154@virtuozzo.com>
In-Reply-To: <20180321114301.GH23100@dhcp22.suse.cz>

On 03/21/2018 02:43 PM, Michal Hocko wrote:
> On Wed 21-03-18 14:14:35, Andrey Ryabinin wrote:
>>
>>
>> On 03/20/2018 06:29 PM, Michal Hocko wrote:
>>
>>>> Leave all pgdat->flags manipulations to kswapd. kswapd scans the whole
>>>> pgdat, so it's reasonable to leave all decisions about node state
>>>> to kswapd. Also add per-cgroup congestion state to avoid needlessly
>>>> burning CPU in cgroup reclaim if heavy congestion is observed.
>>>>
>>>> Currently there is no need for per-cgroup PGDAT_WRITEBACK and PGDAT_DIRTY
>>>> bits since they alter only kswapd behavior.
>>>>
>>>> The problem could be easily demonstrated by creating heavy congestion
>>>> in one cgroup:
>>>>
>>>>     echo "+memory" > /sys/fs/cgroup/cgroup.subtree_control
>>>>     mkdir -p /sys/fs/cgroup/congester
>>>>     echo 512M > /sys/fs/cgroup/congester/memory.max
>>>>     echo $$ > /sys/fs/cgroup/congester/cgroup.procs
>>>>     # generate a lot of dirty data on a slow HDD
>>>>     while true; do dd if=/dev/zero of=/mnt/sdb/zeroes bs=1M count=1024; done &
>>>>     ....
>>>>     while true; do dd if=/dev/zero of=/mnt/sdb/zeroes bs=1M count=1024; done &
>>>>
>>>> and some job in another cgroup:
>>>>
>>>>     mkdir /sys/fs/cgroup/victim
>>>>     echo 128M > /sys/fs/cgroup/victim/memory.max
>>>>
>>>>     # time cat /dev/sda > /dev/null
>>>>     real    10m15.054s
>>>>     user    0m0.487s
>>>>     sys     1m8.505s
>>>>
>>>> According to the tracepoint in wait_iff_congested(), the 'cat' spent 50%
>>>> of the time sleeping there.
>>>>
>>>> With the patch, cat doesn't waste time there anymore:
>>>>
>>>>     # time cat /dev/sda > /dev/null
>>>>     real    5m32.911s
>>>>     user    0m0.411s
>>>>     sys     0m56.664s
>>>>
>>>> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
>>>> ---
>>>>  include/linux/backing-dev.h |  2 +-
>>>>  include/linux/memcontrol.h  |  2 ++
>>>>  mm/backing-dev.c            | 19 ++++------
>>>>  mm/vmscan.c                 | 84 ++++++++++++++++++++++++++++++++-------------
>>>>  4 files changed, 70 insertions(+), 37 deletions(-)
>>>
>>> This patch seems overly complicated. Why don't you simply reduce the whole
>>> pgdat_flags handling to global_reclaim()?
>>>
>>
>> In that case cgroup v2 reclaim wouldn't have any way of throttling if
>> the cgroup is full of congested dirty pages.
> 
> It's been some time since I've looked into the throttling code so pardon
> my ignorance. Don't cgroup v2 users get throttled in the write path to
> not dirty too many pages in the first place?
 
Yes, they do, the same as users without cgroups. Basically, cgroup v2 mimics
all of the global machinery: reclaim, dirty rate limiting, throttling.
However, balance_dirty_pages() can't always protect against having too many
dirty pages. E.g. you could mmap() a file and start writing to it at the
speed of RAM. balance_dirty_pages() can't stop you there.
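
A minimal sketch of that access pattern, for anyone who wants to try it.
It is written in Python for brevity; the function name dirty_via_mmap and
the sizes are made up for illustration and are not part of the patch. It
only reproduces the "mmap() a file and store to it" pattern. How much
dirty memory actually accumulates depends on the kernel, the cgroup limits
and the backing device.

```python
import mmap
import os
import tempfile


def dirty_via_mmap(path: str, size: int, page: int = mmap.PAGESIZE) -> int:
    """Touch every page of a file through a shared, writable mapping.

    The writes are plain memory stores into the mapping, so no write()
    syscall is issued for them. Returns the number of pages touched.
    `size` must be greater than zero.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # The backing file must be as large as the mapping.
        os.ftruncate(fd, size)
        with mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as m:
            touched = 0
            for off in range(0, size, page):
                m[off] = 0xFF  # one store per page is enough to dirty it
                touched += 1
            return touched
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical usage: 64 MiB in a temporary directory. To mimic the
    # scenario above, point `path` at a file on the slow disk instead and
    # run the script from inside the memory-limited cgroup.
    with tempfile.TemporaryDirectory() as tmp:
        pages = dirty_via_mmap(os.path.join(tmp, "dirty.bin"), 64 << 20)
        print(f"touched {pages} pages")
```

Running this in a loop against a file on the slow device, with the shell
already moved into the limited cgroup, should give reclaim in that cgroup
plenty of dirty page cache to deal with.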


> In other words, is your patch trying to fix two different things? One is
> per-memcg reclaim influencing the global pgdat state, which is clearly
> wrong, and cgroup v2 reclaim throttling that is not pgdat based?
> 

Kinda, but the second problem is introduced by fixing the first one: without
the fix for the first problem, we do have congestion throttling in cgroup v2.
Yes, it's based on the global pgdat state, which is wrong, but it should
throttle cgroup v2 reclaim when the memcg is congested.

I didn't try to evaluate how useful this whole congestion throttling is,
or whether it's really needed. But if we need it for global reclaim, then
we probably need it in cgroup v2 reclaim too.


P.S. An observation: blk-mq doesn't seem to have any congestion mechanism.
I don't know whether this is intentional or just an oversight. It means that
congestion throttling never happens on such systems.
