From: Michal Hocko
To: Andrey Ryabinin
Cc: David Rientjes, "Li,Rongqing", "linux-kernel@vger.kernel.org", "linux-mm@kvack.org", "cgroups@vger.kernel.org", "hannes@cmpxchg.org"
Date: Wed, 21 Mar 2018 10:59:13 +0100
Subject: Re: Reply: Reply: [PATCH] mm/memcontrol.c: speed up to force empty a memory cgroup
Message-ID: <20180321095913.GE23100@dhcp22.suse.cz>
References: <20180319085355.GQ23100@dhcp22.suse.cz> <2AD939572F25A448A3AE3CAEA61328C23745764B@BC-MAIL-M28.internal.baidu.com> <20180319103756.GV23100@dhcp22.suse.cz> <2AD939572F25A448A3AE3CAEA61328C2374589DC@BC-MAIL-M28.internal.baidu.com> <20180320083950.GD23100@dhcp22.suse.cz> <56508bd0-e8d7-55fd-5109-c8dacf26b13e@virtuozzo.com>

On Wed 21-03-18 01:35:05, Andrey Ryabinin wrote:
> On 03/21/2018 01:15 AM, David Rientjes wrote:
> > On Wed, 21 Mar 2018, Andrey Ryabinin wrote:
> >
> >>>>> It would probably be best to limit the
> >>>>> nr_pages to the amount that needs to be reclaimed, though, rather than
> >>>>> over-reclaiming.
> >>>>
> >>>> How do you achieve that? The charging path is not synchronized with the
> >>>> shrinking one at all.
> >>>>
> >>>
> >>> The point is to get a better guess at how many pages, up to
> >>> SWAP_CLUSTER_MAX, that need to be reclaimed instead of 1.
> >>>
> >>>>> If you wanted to be invasive, you could change page_counter_limit() to
> >>>>> return the count - limit, fix up the callers that look for -EBUSY, and
> >>>>> then use max(val, SWAP_CLUSTER_MAX) as your nr_pages.
> >>>>
> >>>> I am not sure I understand
> >>>>
> >>>
> >>> Have page_counter_limit() return the number of pages over limit, i.e.
> >>> count - limit, since it compares the two anyway. Fix up existing callers
> >>> and then clamp that value to SWAP_CLUSTER_MAX in
> >>> mem_cgroup_resize_limit(). It's a more accurate guess than either 1 or
> >>> 1024.
> >>>
> >>
> >> JFYI, it's never 1, it's always SWAP_CLUSTER_MAX.
> >> See try_to_free_mem_cgroup_pages():
> >> ....
> >> 	struct scan_control sc = {
> >> 		.nr_to_reclaim = max(nr_pages, SWAP_CLUSTER_MAX),
> >>
> >
> > Is SWAP_CLUSTER_MAX the best answer if I'm lowering the limit by 1GB?
> >
>
> Absolutely not. I'm completely on your side here.
> I've tried to fix this recently - http://lkml.kernel.org/r/20180119132544.19569-2-aryabinin@virtuozzo.com
> I guess that Andrew decided not to take my patch, because Michal wasn't
> happy about it (see the mail archives if you want more details).

I was unhappy about the explanation and justification of the patch. It is
still not clear to me why try_to_free_mem_cgroup_pages with a single
target should be slower than multiple calls of this function with smaller
batches when the real reclaim is still SWAP_CLUSTER_MAX batched. There is
also a theoretical risk of over-reclaim, especially with large targets.
-- 
Michal Hocko
SUSE Labs