From: Waiman Long <longman@redhat.com>
To: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Kernel Team <Kernel-team@fb.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>, Rik van Riel <riel@surriel.com>,
	Christoph Lameter <cl@linux.com>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Cgroups <cgroups@vger.kernel.org>
Subject: Re: [PATCH v4 5/7] mm: rework non-root kmem_cache lifecycle management
Date: Tue, 21 May 2019 15:35:45 -0400
Message-ID: <e94301ee-b12d-597f-d195-6716b0af1363@redhat.com>
In-Reply-To: <20190521192320.GA6658@tower.DHCP.thefacebook.com>

On 5/21/19 3:23 PM, Roman Gushchin wrote:
> On Tue, May 21, 2019 at 02:39:50PM -0400, Waiman Long wrote:
>> On 5/14/19 8:06 PM, Shakeel Butt wrote:
>>>> @@ -2651,20 +2652,35 @@ struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep)
>>>>         struct mem_cgroup *memcg;
>>>>         struct kmem_cache *memcg_cachep;
>>>>         int kmemcg_id;
>>>> +       struct memcg_cache_array *arr;
>>>>
>>>>         VM_BUG_ON(!is_root_cache(cachep));
>>>>
>>>>         if (memcg_kmem_bypass())
>>>>                 return cachep;
>>>>
>>>> -       memcg = get_mem_cgroup_from_current();
>>>> +       rcu_read_lock();
>>>> +
>>>> +       if (unlikely(current->active_memcg))
>>>> +               memcg = current->active_memcg;
>>>> +       else
>>>> +               memcg = mem_cgroup_from_task(current);
>>>> +
>>>> +       if (!memcg || memcg == root_mem_cgroup)
>>>> +               goto out_unlock;
>>>> +
>>>>         kmemcg_id = READ_ONCE(memcg->kmemcg_id);
>>>>         if (kmemcg_id < 0)
>>>> -               goto out;
>>>> +               goto out_unlock;
>>>>
>>>> -       memcg_cachep = cache_from_memcg_idx(cachep, kmemcg_id);
>>>> -       if (likely(memcg_cachep))
>>>> -               return memcg_cachep;
>>>> +       arr = rcu_dereference(cachep->memcg_params.memcg_caches);
>>>> +
>>>> +       /*
>>>> +        * Make sure we will access the up-to-date value. The code updating
>>>> +        * memcg_caches issues a write barrier to match this (see
>>>> +        * memcg_create_kmem_cache()).
>>>> +        */
>>>> +       memcg_cachep = READ_ONCE(arr->entries[kmemcg_id]);
>>>>
>>>>         /*
>>>>          * If we are in a safe context (can wait, and not in interrupt
>>>> @@ -2677,10 +2693,20 @@ struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep)
>>>>          * memcg_create_kmem_cache, this means no further allocation
>>>>          * could happen with the slab_mutex held. So it's better to
>>>>          * defer everything.
>>>> +        *
>>>> +        * If the memcg is dying or memcg_cache is about to be released,
>>>> +        * don't bother creating new kmem_caches. Because memcg_cachep
>>>> +        * is ZEROed as the fist step of kmem offlining, we don't need
>>>> +        * percpu_ref_tryget() here. css_tryget_online() check in
>>> *percpu_ref_tryget_live()
>>>
>>>> +        * memcg_schedule_kmem_cache_create() will prevent us from
>>>> +        * creation of a new kmem_cache.
>>>>          */
>>>> -       memcg_schedule_kmem_cache_create(memcg, cachep);
>>>> -out:
>>>> -       css_put(&memcg->css);
>>>> +       if (unlikely(!memcg_cachep))
>>>> +               memcg_schedule_kmem_cache_create(memcg, cachep);
>>>> +       else if (percpu_ref_tryget(&memcg_cachep->memcg_params.refcnt))
>>>> +               cachep = memcg_cachep;
>>>> +out_unlock:
>>>> +       rcu_read_lock();
>> There is one more bug that causes the kernel to panic on bootup when I
>> turned on debugging options.
>>
>> [   49.871437] =============================
>> [   49.875452] WARNING: suspicious RCU usage
>> [   49.879476] 5.2.0-rc1.bz1699202_memcg_test+ #2 Not tainted
>> [   49.884967] -----------------------------
>> [   49.888991] include/linux/rcupdate.h:268 Illegal context switch in RCU read-side critical section!
>> [   49.897950]
>> [   49.897950] other info that might help us debug this:
>> [   49.897950]
>> [   49.905958]
>> [   49.905958] rcu_scheduler_active = 2, debug_locks = 1
>> [   49.912492] 3 locks held by systemd/1:
>> [   49.916252]  #0: 00000000633673c5 (&type->i_mutex_dir_key#5){.+.+}, at: lookup_slow+0x42/0x70
>> [   49.924788]  #1: 0000000029fa8c75 (rcu_read_lock){....}, at: memcg_kmem_get_cache+0x12b/0x910
>> [   49.933316]  #2: 0000000029fa8c75 (rcu_read_lock){....}, at: memcg_kmem_get_cache+0x3da/0x910
>>
>> It should be "rcu_read_unlock();" at the end.
> Oops. Good catch, thanks Waiman!
>
> I'm somewhat surprised it didn't show up in my tests, and that none of the
> test bots caught it. Anyway, I'll fix it and send v5.

In a non-preemptible kernel, rcu_read_lock() is almost a no-op, so you
probably won't see any ill effect from this bug.

>
> Does the rest of the patchset look sane to you?

I haven't done a full review of the patch, but it looks sane to me from
a cursory look. We hit a similar problem at Red Hat, which is why I am
looking at your patch. Looking forward to your v5 patch.

Cheers,
Longman

