LKML Archive on lore.kernel.org
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hugh@veritas.com>,
	Mike Stroyan <mike.stroyan@hp.com>,
	"Luck, Tony" <tony.luck@intel.com>,
	linux-ia64@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
Date: Thu, 26 Apr 2007 17:53:49 +1000
Message-ID: <46305A8D.2080003@yahoo.com.au>
In-Reply-To: <20070425205548.fd51b301.akpm@linux-foundation.org>

Hi,

I had a couple of questions which I'm hoping someone would be kind
enough to explain :)

Andrew Morton wrote:
> guys, application crashes on million-dollar machines aren't nice.  Please review carefully
> and urgently?
> 
> 
> Begin forwarded message:
> 
> Date: Wed, 25 Apr 2007 18:16:15 -0600
> From: Mike Stroyan <mike.stroyan@hp.com>
> To: "Luck, Tony" <tony.luck@intel.com>
> Cc: linux-ia64@vger.kernel.org, linux-kernel@vger.kernel.org
> Subject: [PATCH] ia64: race flushing icache in do_no_page path
> 
> 
>   This is a very similar problem to a copy-on-write cache flushing problem
> that Tony Luck fixed in July 2006.  In this case the do_no_page function
> handles a fault in an executable or library that is mmapped from an
> NFS file system.  The code is copied into a newly reallocated page.
> The lazy_mmu_prot_update() function should be used to flush old entries
> from the icache for that page on ia64 processors.  But that call is made
> after a set_pte_at call that makes the page accessible to other threads
> executing the same code.  This was seen to cause application crashes
> when an OpenMP application ran many threads calling the same functions at
> the same time.  The first thread to reach a page starts to fault in the
> new code.  One of the other threads overtakes the first and executes old
> data from the icache.  That could result in bad instructions.  It is more
> obvious when an old cache line contains prefetched non-instruction bits
> that result in an illegal instruction trap.

I wonder how this is different to all the other code which calls
lazy_mmu_prot_update() after set_pte_at(). do_swap_page, for example,
_could_ fault in executable code, couldn't it?

Is it because do_swap_page uses flush_icache_page()? If so, why doesn't
the flush_icache_page() in do_no_page work as well? (It looks like a
superset of lazy_mmu_prot_update on ia64?!?)
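
To make the ordering in question concrete, here is a toy user-space
model. It is only a sketch, not kernel code: the struct, the toy_* names
and the "NEWCODE"/"OLDJUNK" contents are all made up for illustration,
and a memcpy stands in for whatever the real icache flush does on ia64.
It just checks whether another thread could fetch stale bytes between
the two steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_LINE 8 /* hypothetical size of the modelled cache line */

/* Hypothetical model: one line of memory, one icache line, one pte bit. */
struct toy_state {
	char memory[TOY_LINE];	/* new code as seen via memory/dcache */
	char icache[TOY_LINE];	/* what an instruction fetch would see */
	bool pte_present;	/* can other threads reach the page? */
};

enum toy_order { PUBLISH_THEN_FLUSH, FLUSH_THEN_PUBLISH };

/* Stand-in for the icache flush: make the icache agree with memory. */
static void toy_flush_icache(struct toy_state *s)
{
	assert(s != NULL);
	memcpy(s->icache, s->memory, TOY_LINE);
}

/*
 * Stand-in for another thread fetching from the page right now.
 * With no pte it would fault and wait, so it cannot see stale bytes.
 */
static bool toy_fetch_is_stale(const struct toy_state *s)
{
	if (!s->pte_present)
		return false;
	return memcmp(s->icache, s->memory, TOY_LINE) != 0;
}

/* Returns true if a racing fetch could have executed stale bytes. */
bool toy_fault_in(enum toy_order order)
{
	struct toy_state s;
	bool saw_stale = false;

	memcpy(s.memory, "NEWCODE", TOY_LINE);	/* page freshly copied in */
	memcpy(s.icache, "OLDJUNK", TOY_LINE);	/* leftovers from old use */
	s.pte_present = false;

	if (order == PUBLISH_THEN_FLUSH) {
		s.pte_present = true;			/* "set_pte_at" first */
		saw_stale |= toy_fetch_is_stale(&s);	/* racing thread runs */
		toy_flush_icache(&s);			/* flush arrives too late */
	} else {
		toy_flush_icache(&s);			/* flush first */
		saw_stale |= toy_fetch_is_stale(&s);	/* racing thread faults */
		s.pte_present = true;			/* then publish the pte */
	}
	saw_stale |= toy_fetch_is_stale(&s);		/* fetch after both steps */
	return saw_stale;
}
```

In this model toy_fault_in(PUBLISH_THEN_FLUSH) reports a stale fetch and
toy_fault_in(FLUSH_THEN_PUBLISH) does not, which is the effect the patch
is after by moving lazy_mmu_prot_update() ahead of set_pte_at().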

And while we're looking at flush_icache_page, why is there none in
do_wp_page (I admit, I'm not really up to scratch on d/i cache aliasing
handling, but cachetlb.txt seems to suggest that cow_user_page fits the
description). That is, if we're already trying to cover our butts wrt
SMC, then do_wp_page _could_ be cow'ing executable code, couldn't it?

And for that matter, I admit I don't understand how the icache flushing
can be done lazily, only at change-protection time. Why is any
flush_dcache_page() site not a problem for an _existing_ executable pte
wrt d/i cache aliases?

BTW, while I'm ranting, I hope all this stuff has become so complex for a
reason: namely, that the simpler alternative of more flushes (less lazy,
less complex, less buggy) was tested and found to be noticeably
slower... :)



> 
>   The problem has only been seen on montecito processors which have
> separate level 2 icache and dcache.  This dcache to icache coherency
> problem is more likely to occur there because of the much larger level
> 2 icache.  I suspect that the non-NFS case is working because direct
> DMA into the new page is making the instruction cache coherent.  Any
> file system that uses a non-DMA copy into the text page could show the
> same problem.
> 
> Signed-off-by: Mike Stroyan <mike.stroyan@hp.com>
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index e7066e7..50c8848 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2291,6 +2291,7 @@ retry:
>  		entry = mk_pte(new_page, vma->vm_page_prot);
>  		if (write_access)
>  			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> +		lazy_mmu_prot_update(entry);
>  		set_pte_at(mm, address, page_table, entry);
>  		if (anon) {
>  			inc_mm_counter(mm, anon_rss);
> @@ -2312,7 +2313,6 @@ retry:
>  
>  	/* no need to invalidate: a not-present page shouldn't be cached */
>  	update_mmu_cache(vma, address, entry);
> -	lazy_mmu_prot_update(entry);
>  unlock:
>  	pte_unmap_unlock(page_table, ptl);
>  	if (dirty_page) {
> 


-- 
SUSE Labs, Novell Inc.
