From: Russell King - ARM Linux <linux@armlinux.org.uk>
To: Matthew Wilcox <willy@infradead.org>
Cc: Jia He <hejianet@gmail.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will.deacon@arm.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.com>,
	Wei Yang <richard.weiyang@gmail.com>,
	Kees Cook <keescook@chromium.org>,
	Laura Abbott <labbott@redhat.com>,
	Vladimir Murzin <vladimir.murzin@arm.com>,
	Philip Derrin <philip@cog.systems>,
	AKASHI Takahiro <takahiro.akashi@linaro.org>,
	James Morse <james.morse@arm.com>,
	Steve Capper <steve.capper@arm.com>,
	Pavel Tatashin <pasha.tatashin@oracle.com>,
	Gioh Kim <gi-oh.kim@profitbricks.com>,
	Vlastimil Babka <vbabka@suse.cz>, Mel Gorman <mgorman@suse.de>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Kemi Wang <kemi.wang@intel.com>, Petr Tesarik <ptesarik@suse.com>,
	YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>,
	Andrey Ryabinin <aryabinin@virtuozzo.com>,
	Nikolay Borisov <nborisov@suse.com>,
	Daniel Jordan <daniel.m.jordan@oracle.com>,
	Daniel Vacek <neelx@redhat.com>,
	Eugeniu Rosca <erosca@de.adit-jv.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Jia He <jia.he@hxt-semitech.com>
Subject: Re: [PATCH v7 2/5] arm: arm64: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn()
Date: Fri, 6 Apr 2018 10:09:20 +0100
Message-ID: <20180406090920.GM16141@n2100.armlinux.org.uk>
In-Reply-To: <20180405125054.GC2647@bombadil.infradead.org>

On Thu, Apr 05, 2018 at 05:50:54AM -0700, Matthew Wilcox wrote:
> On Thu, Apr 05, 2018 at 08:44:12PM +0800, Jia He wrote:
> > 
> > 
> > On 4/5/2018 7:34 PM, Matthew Wilcox Wrote:
> > > On Thu, Apr 05, 2018 at 01:04:35AM -0700, Jia He wrote:
> > > > Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
> > > > where possible") optimized the loop in memmap_init_zone(). But there is
> > > > still some room for improvement. E.g. if pfn and pfn+1 are in the same
> > > > memblock region, we can simply pfn++ instead of doing the binary search
> > > > in memblock_next_valid_pfn.
> > > Sure, but I bet if we are >end_pfn, we're almost certainly going to the
> > > start_pfn of the next block, so why not test that as well?
> > > 
> > > > +	/* fast path, return pfn+1 if next pfn is in the same region */
> > > > +	if (early_region_idx != -1) {
> > > > +		start_pfn = PFN_DOWN(regions[early_region_idx].base);
> > > > +		end_pfn = PFN_DOWN(regions[early_region_idx].base +
> > > > +				regions[early_region_idx].size);
> > > > +
> > > > +		if (pfn >= start_pfn && pfn < end_pfn)
> > > > +			return pfn;
> > > 		early_region_idx++;
> > > 		start_pfn = PFN_DOWN(regions[early_region_idx].base);
> > > 		if (pfn >= end_pfn && pfn <= start_pfn)
> > > 			return start_pfn;
> > Thanks, thus the binary search in next step can be discarded?
> 
> I don't know all the circumstances in which this is called.  Maybe a linear
> search with memo is more appropriate than a binary search.

That's been brought up before, and the reasoning appears to be
something along the lines of...

Academic and published wisdom holds that on cached architectures, binary
searches are bad because they don't operate efficiently due to the
overhead of having to load cache lines.  Consequently, there seems
to be a knee-jerk reaction that "all binary searches are bad, we must
eliminate them."

What this fails to grasp, though, is that the number of entries in
this array is typically small, so the entire array takes up one or
two cache lines, maybe a maximum of four lines depending on your
cache line length and number of entries.
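
To put rough numbers on that, here is a minimal sketch of the footprint
arithmetic. The struct below is a hypothetical 16-byte descriptor (a base
and a size), not the kernel's actual struct memblock_region, and the 64-byte
line size is an assumption; real layouts and line sizes vary by
architecture and configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region descriptor, for illustration only.
 * This is NOT the kernel's struct memblock_region. */
struct region {
	uint64_t base;
	uint64_t size;
};

/*
 * Number of cache lines covered by an array of n entries, assuming the
 * array starts on a cache-line boundary.
 */
static size_t cache_lines_for(size_t n, size_t entry_size, size_t line_size)
{
	assert(line_size > 0);
	return (n * entry_size + line_size - 1) / line_size;
}
```

Under these assumptions, 4 entries fit in one 64-byte line, 8 entries in
two, and 16 entries in four, which is the range described above.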

This means that the binary search expense is reduced, and is lower
than a linear search for the majority of cases.
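
As a rough illustration of the comparison, the sketch below counts how many
array entries each search inspects on a small sorted array. The struct and
function names are hypothetical and do not match the kernel's memblock code.
Probe counts are only a proxy; they say nothing about real cache behaviour
or branch prediction, which would have to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region in page-frame units: [start_pfn, end_pfn). */
struct pfn_region {
	uint64_t start_pfn;
	uint64_t end_pfn;
};

/*
 * Binary search over regions sorted by start_pfn.
 * Returns the index of the region containing pfn, or -1 if pfn is in a
 * hole. *probes is incremented once per entry inspected.
 */
static int find_binary(const struct pfn_region *r, size_t n,
		       uint64_t pfn, unsigned int *probes)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		(*probes)++;
		if (pfn < r[mid].start_pfn)
			hi = mid;
		else if (pfn >= r[mid].end_pfn)
			lo = mid + 1;
		else
			return (int)mid;
	}
	return -1;
}

/*
 * Linear scan from the first entry, stopping early once pfn is known to
 * lie before the current region (the array is sorted).
 */
static int find_linear(const struct pfn_region *r, size_t n,
		       uint64_t pfn, unsigned int *probes)
{
	for (size_t i = 0; i < n; i++) {
		(*probes)++;
		if (pfn < r[i].start_pfn)
			return -1;
		if (pfn < r[i].end_pfn)
			return (int)i;
	}
	return -1;
}
```

With eight regions, a lookup that lands in the last region inspects three
entries with the binary search and eight with the linear scan; a lookup
near the front of the array can favour the linear scan instead, so the
result depends on where lookups actually fall.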

What is key here as far as performance is concerned is whether the
general usage of pfn_valid() by the kernel is optimal.  We should
not optimise only for the boot case, which means evaluating the
effect of these changes with _real_ workloads, not just "does my
machine boot a few milliseconds faster".

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up
