LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Shile Zhang <shile.zhang@linux.alibaba.com>
To: Kirill Tkhai <ktkhai@virtuozzo.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Vlastimil Babka <vbabka@suse.cz>, Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH v2 1/1] mm: fix interrupt disabled long time inside deferred_init_memmap()
Date: Wed, 4 Mar 2020 10:34:03 +0800	[thread overview]
Message-ID: <386d7d5f-a57d-f5b1-acee-131ce23d35ec@linux.alibaba.com> (raw)
In-Reply-To: <fc22967d-0803-2e6f-26af-148a24f8f958@virtuozzo.com>

Hi Kirill,

Thanks for your quickly reply!

On 2020/3/4 00:52, Kirill Tkhai wrote:
> On 03.03.2020 19:15, Shile Zhang wrote:
>> When 'CONFIG_DEFERRED_STRUCT_PAGE_INIT' is set, 'pgdatinit' kthread will
>> initialise the deferred pages with local interrupts disabled. It is
>> introduced by commit 3a2d7fa8a3d5 ("mm: disable interrupts while
>> initializing deferred pages").
>>
>> The local interrupt will be disabled long time inside
>> deferred_init_memmap(), depends on memory size.
>> On machine with NCPUS <= 2, the 'pgdatinit' kthread could be pined on
>> boot CPU, then the tick timer will stuck long time, which caused the
>> system wall time inaccuracy.
>>
>> For example, the dmesg shown that:
>>
>>    [    0.197975] node 0 initialised, 32170688 pages in 1ms
>>
>> Obviously, 1ms is unreasonable.
>> Now, fix it by restore in the pending interrupts inside the while loop.
>> The reasonable demsg shown likes:
>>
>> [    1.069306] node 0 initialised, 32203456 pages in 894ms
> The way I understand the original problem, that Pavel fixed:
>
> we need disable irqs in deferred_init_memmap() since this function may be called
> in parallel with deferred_grow_zone() called from interrupt handler. So, Pavel
> added lock to fix the race.
>
> In case of we temporary unlock the lock, interrupt still be possible,
> so my previous proposition returns the problem back.
>
> Now thought again, I think we have to just add:
>
> 	pgdat_resize_unlock();
> 	pgdat_resize_lock();
>
> instead of releasing interrupts, since in case of we just release them with lock held,
> a call of interrupt->deferred_grow_zone() bring us to a deadlock.
>
> So, unlock the lock is must.

Yes, you're right! I missed this point.
Thanks for your comment!

>
>> Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
>> ---
>>   mm/page_alloc.c | 6 +++++-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 3c4eb750a199..d3f337f2e089 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1809,8 +1809,12 @@ static int __init deferred_init_memmap(void *data)
>>   	 * that we can avoid introducing any issues with the buddy
>>   	 * allocator.
>>   	 */
>> -	while (spfn < epfn)
>> +	while (spfn < epfn) {
>>   		nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
>> +		/* let in any pending interrupts */
>> +		local_irq_restore(flags);
>> +		local_irq_save(flags);
>> +	}
>>   zone_empty:
>>   	pgdat_resize_unlock(pgdat, &flags);
> I think we need here something like below (untested):
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 79e950d76ffc..323afa9a4db5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1828,7 +1828,7 @@ static int __init deferred_init_memmap(void *data)
>   {
>   	pg_data_t *pgdat = data;
>   	const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
> -	unsigned long spfn = 0, epfn = 0, nr_pages = 0;
> +	unsigned long spfn = 0, epfn = 0, nr_pages = 0, prev_nr_pages = 0;
>   	unsigned long first_init_pfn, flags;
>   	unsigned long start = jiffies;
>   	struct zone *zone;
> @@ -1869,8 +1869,18 @@ static int __init deferred_init_memmap(void *data)
>   	 * that we can avoid introducing any issues with the buddy
>   	 * allocator.
>   	 */
> -	while (spfn < epfn)
> +	while (spfn < epfn) {
>   		nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
> +		/*
> +		 * Release interrupts every 1Gb to give a possibility
> +		 * a timer to advance jiffies.
> +		 */
> +		if (nr_pages - prev_nr_pages > (1UL << (30 - PAGE_SHIFT))) {
> +			prev_nr_pages = nr_pages;
> +			pgdat_resize_unlock(pgdat, &flags);
> +			pgdat_resize_lock(pgdat, &flags);
> +		}
> +	}
>   zone_empty:
>   	pgdat_resize_unlock(pgdat, &flags);
>   
>
> (I believe the comment may be improved more).

Yeah, your patch is better!
I test your code and it works!
But it seems that 1G is still hold the interrupts too long, about 40ms 
in my env
with Intel(R) Xeon(R) 2.5GHz). I tried other size, it is OK to use 1024 
pages (4MB),
which suggested by Andrew's before.

Could you please help to review it again?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3c4eb750a199..5def66d3ffcd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1768,7 +1768,7 @@ static int __init deferred_init_memmap(void *data)
  {
         pg_data_t *pgdat = data;
         const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
-       unsigned long spfn = 0, epfn = 0, nr_pages = 0;
+       unsigned long spfn = 0, epfn = 0, nr_pages = 0, prev_nr_pages = 0;
         unsigned long first_init_pfn, flags;
         unsigned long start = jiffies;
         struct zone *zone;
@@ -1809,8 +1809,17 @@ static int __init deferred_init_memmap(void *data)
          * that we can avoid introducing any issues with the buddy
          * allocator.
          */
-       while (spfn < epfn)
+       while (spfn < epfn) {
                 nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
+               /*
+                * Restore pending interrupts every 1024 pages to give
+                * the chance tick timer to advance jiffies.
+                */
+               if (nr_pages - prev_nr_pages > 1024) {
+                       pgdat_resize_unlock(&flags);
+                       pgdat_resize_lock(&flags);
+               }
+       }
  zone_empty:
         pgdat_resize_unlock(pgdat, &flags);



  reply	other threads:[~2020-03-04  2:34 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-03 16:15 [PATCH v2 0/1] mm: fix interrupt disabled long time inside Shile Zhang
2020-03-03 16:15 ` [PATCH v2 1/1] mm: fix interrupt disabled long time inside deferred_init_memmap() Shile Zhang
2020-03-03 16:52   ` Kirill Tkhai
2020-03-04  2:34     ` Shile Zhang [this message]
2020-03-04 10:47       ` Kirill Tkhai
2020-03-11  1:44         ` Shile Zhang
2020-03-11 11:42           ` Kirill Tkhai
2020-03-11 12:00             ` Shile Zhang
2020-03-11 12:04               ` Kirill Tkhai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=386d7d5f-a57d-f5b1-acee-131ce23d35ec@linux.alibaba.com \
    --to=shile.zhang@linux.alibaba.com \
    --cc=akpm@linux-foundation.org \
    --cc=ktkhai@virtuozzo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=vbabka@suse.cz \
    --subject='Re: [PATCH v2 1/1] mm: fix interrupt disabled long time inside deferred_init_memmap()' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).