LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: "Anton Salikhmetov" <salikhmetov@gmail.com>
To: "Linus Torvalds" <torvalds@linux-foundation.org>
Cc: "Miklos Szeredi" <miklos@szeredi.hu>,
	peterz@infradead.org, linux-mm@kvack.org, jakob@unthought.net,
	linux-kernel@vger.kernel.org, valdis.kletnieks@vt.edu,
	riel@redhat.com, ksm@42.dk, staubach@redhat.com,
	jesper.juhl@gmail.com, akpm@linux-foundation.org,
	protasnb@gmail.com, r.e.wolff@bitwizard.nl,
	hidave.darkstar@gmail.com, hch@infradead.org
Subject: Re: [PATCH -v6 2/2] Updating ctime and mtime for memory-mapped files
Date: Sat, 19 Jan 2008 01:35:25 +0300	[thread overview]
Message-ID: <4df4ef0c0801181435y67fee713h83f8e7f2a5b4e803@mail.gmail.com> (raw)
In-Reply-To: <alpine.LFD.1.00.0801181406580.2957@woody.linux-foundation.org>

2008/1/19, Linus Torvalds <torvalds@linux-foundation.org>:
>
>
> On Sat, 19 Jan 2008, Anton Salikhmetov wrote:
> >
> > The page_check_address() function is called from the
> > page_mkclean_one() routine as follows:
>
> .. and the page_mkclean_one() function is totally different.
>
> Lookie here, this is the correct and complex sequence:
>
> >                 entry = ptep_clear_flush(vma, address, pte);
> >                 entry = pte_wrprotect(entry);
> >                 entry = pte_mkclean(entry);
> >                 set_pte_at(mm, address, pte, entry);
>
> That's a rather expensive sequence, but it's done exactly because it has
> to be done that way. What it does is to
>
>  - *atomically* load the pte entry _and_ clear the old one in memory.
>
>    That's the
>
>         entry = ptep_clear_flush(vma, address, pte);
>
>    thing, and it basically means that it's doing some
>    architecture-specific magic to make sure that another CPU that accesses
>    the PTE at the same time will never actually modify the pte (because
>    it's clear and not valid)
>
>  - it then - while the page table is actually clear and invalid - takes
>    the old value and turns it into the new one:
>
>         entry = pte_wrprotect(entry);
>         entry = pte_mkclean(entry);
>
>  - and finally, it replaces the entry with the new one:
>
>         set_pte_at(mm, address, pte, entry);
>
>    which takes care to write the new entry in some specific way that is
>    atomic wrt other CPU's (ie on 32-bit x86 with a 64-bit page table
>    entry it writes the high word first, see the write barriers in
>    "native_set_pte()" in include/asm-x86/pgtable-3level.h
>
> Now, compare that subtle and correct thing with what is *not* correct:
>
>         if (pte_dirty(*pte) && pte_write(*pte))
>                 *pte = pte_wrprotect(*pte);
>
> which makes no effort at all to make sure that it's safe in case another
> CPU updates the accessed bit.
>
> Now, arguably it's unlikely to cause horrible problems at least on x86,
> because:
>
>  - we only do this if the pte is already marked dirty, so while we can
>    lose the accessed bit, we can *not* lose the dirty bit. And the
>    accessed bit isn't such a big deal.
>
>  - it's not doing any of the "be careful about" ordering things, but since
>    the really important bits aren't changing, ordering probably won't
>    practically matter.
>
> But the problem is that we have something like 24 different architectures,
> it's hard to make sure that none of them have issues.
>
> In other words: it may well work in practice. But when these things go
> subtly wrong, they are *really* nasty to find, and the unsafe sequence is
> really not how it's supposed to be done. For example, you don't even flush
> the TLB, so even if there are no cross-CPU issues, there's probably going
> to be writable entries in the TLB that now don't match the page tables.
>
> Will it matter? Again, probably impossible to see in practice. But ...

Linus, I am very grateful to you for your extremely clear explanation
of the issue I have overlooked!

Back to the msync() issue, I'm going to come back with a new design
for the bug fix.

Thank you once again.

Anton

>
>                 Linus
>

  reply	other threads:[~2008-01-18 22:35 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-17 22:31 [PATCH -v6 0/2] Fixing the issue with memory-mapped file times Anton Salikhmetov
2008-01-17 22:31 ` [PATCH -v6 1/2] Massive code cleanup of sys_msync() Anton Salikhmetov
2008-01-18  9:33   ` Miklos Szeredi
2008-01-18 10:30     ` Anton Salikhmetov
2008-01-17 22:31 ` [PATCH -v6 2/2] Updating ctime and mtime for memory-mapped files Anton Salikhmetov
2008-01-18  9:51   ` Miklos Szeredi
2008-01-18 10:15     ` Peter Zijlstra
2008-01-18 10:25       ` Peter Zijlstra
2008-01-18 10:39         ` Anton Salikhmetov
2008-01-18 17:58         ` Linus Torvalds
2008-01-18 18:11           ` Miklos Szeredi
2008-01-18 18:28             ` Rik van Riel
2008-01-18 18:51               ` Miklos Szeredi
2008-01-18 18:43             ` Linus Torvalds
2008-01-18 18:57               ` Miklos Szeredi
2008-01-18 19:08                 ` Linus Torvalds
2008-01-18 19:22                   ` Miklos Szeredi
2008-01-18 19:35                     ` Linus Torvalds
2008-01-18 19:58                       ` Anton Salikhmetov
2008-01-18 20:22                         ` Linus Torvalds
2008-01-18 21:03                           ` Anton Salikhmetov
2008-01-18 21:27                             ` Linus Torvalds
2008-01-18 22:04                               ` Anton Salikhmetov
2008-01-18 22:21                                 ` Linus Torvalds
2008-01-18 22:35                                   ` Anton Salikhmetov [this message]
2008-01-18 22:32                       ` Ingo Oeser
2008-01-18 22:47                         ` Linus Torvalds
2008-01-18 22:54                           ` Rik van Riel
2008-01-19  0:50                             ` Matt Mackall
2008-01-19  4:25                               ` Rik van Riel
2008-01-19 10:22                               ` Miklos Szeredi
2008-01-19 15:49                                 ` Matt Mackall
2008-01-21 14:25                           ` Peter Staubach
2008-01-21 14:36                             ` Anton Salikhmetov
2008-01-18 10:38       ` Miklos Szeredi
2008-01-18 11:00         ` Peter Zijlstra
2008-01-18 11:17           ` Miklos Szeredi
2008-01-18 11:23             ` Peter Zijlstra
2008-01-18 11:36               ` Miklos Szeredi
2008-01-18  9:40 ` [PATCH -v6 0/2] Fixing the issue with memory-mapped file times Miklos Szeredi
2008-01-18 10:31   ` Anton Salikhmetov
2008-01-18 19:48   ` Anton Salikhmetov
2008-01-19 10:45     ` Miklos Szeredi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4df4ef0c0801181435y67fee713h83f8e7f2a5b4e803@mail.gmail.com \
    --to=salikhmetov@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=hch@infradead.org \
    --cc=hidave.darkstar@gmail.com \
    --cc=jakob@unthought.net \
    --cc=jesper.juhl@gmail.com \
    --cc=ksm@42.dk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=miklos@szeredi.hu \
    --cc=peterz@infradead.org \
    --cc=protasnb@gmail.com \
    --cc=r.e.wolff@bitwizard.nl \
    --cc=riel@redhat.com \
    --cc=staubach@redhat.com \
    --cc=torvalds@linux-foundation.org \
    --cc=valdis.kletnieks@vt.edu \
    --subject='Re: [PATCH -v6 2/2] Updating ctime and mtime for memory-mapped files' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).