From: Andrea Arcangeli <andrea@qumranet.com>
To: Jack Steiner <steiner@sgi.com>
Cc: Christoph Lameter <clameter@sgi.com>, Robin Holt <holt@sgi.com>,
	Avi Kivity <avi@qumranet.com>, Izik Eidus <izike@qumranet.com>,
	Nick Piggin <npiggin@suse.de>,
	kvm-devel@lists.sourceforge.net,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	daniel.blueman@quadrics.com, Hugh Dickins <hugh@veritas.com>
Subject: Re: [patch 1/6] mmu_notifier: Core code
Date: Wed, 30 Jan 2008 17:38:45 +0100
Message-ID: <20080130163845.GO7233@v2.random>
In-Reply-To: <20080130155306.GA13746@sgi.com>

On Wed, Jan 30, 2008 at 09:53:06AM -0600, Jack Steiner wrote:
> That will also resolve the problem we discussed yesterday. 
> I want to unregister my mmu_notifier when a GRU segment is
> unmapped. This would not necessarily be at task termination.

My proof that there is something wrong in the smp locking of the
current code is very simple: it can't be right to use
hlist_for_each_entry_safe_rcu and rcu_read_lock inside
mmu_notifier_release, and then to call hlist_del_rcu without any
spinlock or semaphore. If we walk the list with
hlist_for_each_entry_safe_rcu (and not with
hlist_for_each_entry_safe), it means the list _can_ change from under
us, and in turn the hlist_del_rcu must be surrounded by a spinlock or
semaphore too!

If by design the list _can't_ change from under us and calling
hlist_del_rcu was safe w/o locks, then hlist_for_each_entry_safe is
_sure_ enough for mmu_notifier_release, and rcu_read_lock most
certainly can be removed too.

To construct a use case where the race could trigger, I was thinking of
somebody bumping the mm_count (not mm_users) and registering a
notifier while mmu_notifier_release runs, and relying on ->release to
know if it has to run mmu_notifier_unregister. However, I have now
started wondering how it can rely on ->release to know that, if
->release is called after hlist_del_rcu, because with the latest
changes ->release will also allow the mn to release itself ;). It's
unsafe to call list_del_rcu twice (the second will crash on a poisoned
entry).
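
To illustrate the double-delete hazard, here is a minimal userspace
sketch, not kernel code. It models an hlist whose RCU-style delete keeps
->next intact and poisons ->pprev. MODEL_POISON2, model_hlist_del_rcu and
model_hlist_del_rcu_checked are hypothetical names invented for this
sketch; the real kernel has no such checked variant and would simply
fault when dereferencing the poisoned pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of the kernel hlist; illustrative only. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* Stand-in for the kernel's poison value (the number is arbitrary here). */
#define MODEL_POISON2 ((struct hlist_node **)0x200)

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/*
 * RCU-style delete: unlink the node, leave ->next untouched so that
 * concurrent readers can keep walking, and poison ->pprev.
 */
static void model_hlist_del_rcu(struct hlist_node *n)
{
	/* A second delete would write through the poison pointer. */
	assert(n->pprev != MODEL_POISON2);

	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->pprev = MODEL_POISON2;
}

/* Hypothetical guard: 0 on success, -1 if the node was already unlinked. */
static int model_hlist_del_rcu_checked(struct hlist_node *n)
{
	if (n->pprev == MODEL_POISON2)
		return -1;
	model_hlist_del_rcu(n);
	return 0;
}
```

In this model the first call on a node unlinks it and the second call is
rejected instead of crashing, which is the situation a ->release
implementation would hit if the core had already removed the mn from the
list before calling it.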

This starts to make me think we should remove the auto-disarming
feature and require the notifier-user to have the ->release call
mmu_notifier_unregister first and to free the "mn" inside ->release
too if needed. Alternatively, the notifier-user can bump mm_count
and call mmu_notifier_unregister before calling mmdrop (like kvm
could do).
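
The first option above (no auto-disarming, ->release unregisters and
frees its own mn) could look roughly like the following single-threaded
userspace sketch. All names (mm_model, notifier_model,
self_disarming_release, and so on) are hypothetical, locking is left out
on purpose, and the real mmu_notifier structures are not reproduced
here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace model of the kernel hlist; illustrative only. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

struct mm_model { struct hlist_head notifiers; };

struct notifier_model {
	struct hlist_node hlist;
	void (*release)(struct notifier_model *mn, struct mm_model *mm);
};

#define node_to_notifier(n) \
	((struct notifier_model *)((char *)(n) - offsetof(struct notifier_model, hlist)))

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

static void notifier_model_unregister(struct notifier_model *mn)
{
	*mn->hlist.pprev = mn->hlist.next;
	if (mn->hlist.next)
		mn->hlist.next->pprev = mn->hlist.pprev;
}

/* Core side: never unlinks anything itself, only invokes ->release. */
static void notifier_model_release(struct mm_model *mm)
{
	struct hlist_node *n, *next;

	for (n = mm->notifiers.first; n; n = next) {
		struct notifier_model *mn = node_to_notifier(n);

		next = n->next;	/* mn may be freed by its own ->release */
		if (mn->release)
			mn->release(mn, mm);
	}
}

/* User side: ->release unregisters first, then frees the mn. */
static int freed_count;

static void self_disarming_release(struct notifier_model *mn,
				   struct mm_model *mm)
{
	(void)mm;
	notifier_model_unregister(mn);
	free(mn);
	freed_count++;
}

static struct notifier_model *notifier_model_create(struct mm_model *mm)
{
	struct notifier_model *mn = malloc(sizeof(*mn));

	if (!mn)
		return NULL;
	mn->release = self_disarming_release;
	hlist_add_head(&mn->hlist, &mm->notifiers);
	return mn;
}
```

With this split there is exactly one place that unlinks a given mn, so
the question of who deletes it (core or user) cannot produce a second
delete. A notifier without a ->release would stay on the list, which is
why this design has to require the callback.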

Another approach is to simply define mmu_notifier_release as
implicitly serialized by other code design, with a real lock (not rcu)
against the whole register/unregister operations, so as to guarantee
the notifier list can't change from under us while mmu_notifier_release
runs. If we go this route, yes, the auto-disarming hlist_del can be
kept and the current code would have been safe, but to avoid confusion
mmu_notifier_release should become this:

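/*
 * Assumes register/unregister are serialized against this function by
 * an external lock, so the plain (non-RCU) list walk is sufficient.
 */
void mmu_notifier_release(struct mm_struct *mm)
{
	struct mmu_notifier *mn;
	struct hlist_node *n, *t;

	if (unlikely(!hlist_empty(&mm->mmu_notifier.head))) {
		hlist_for_each_entry_safe(mn, n, t,
					  &mm->mmu_notifier.head, hlist) {
			/* auto-disarm before calling back into the user */
			hlist_del(&mn->hlist);
			if (mn->ops->release)
				mn->ops->release(mn, mm);
		}
	}
}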
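
For readers who want to experiment outside the kernel, the serialized
variant can be approximated by the userspace sketch below. It assumes a
pthread mutex as the "real lock" and uses hypothetical names throughout
(mm_model, notifier_model, count_release, ...). Unlike the kernel's
hlist_del, the delete helper here resets the node so that a later
unregister becomes a no-op; that is an addition made for this sketch,
not something the patch does.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Userspace model of the kernel hlist; illustrative only. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

struct mm_model {
	struct hlist_head notifiers;
	pthread_mutex_t lock;	/* serializes register/unregister/release */
};

struct notifier_model {
	struct hlist_node hlist;
	void (*release)(struct notifier_model *mn, struct mm_model *mm);
};

#define node_to_notifier(n) \
	((struct notifier_model *)((char *)(n) - offsetof(struct notifier_model, hlist)))

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Unlink and reset the node; a no-op if it is not on a list. */
static void hlist_del_init_model(struct hlist_node *n)
{
	if (!n->pprev)
		return;
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->next = NULL;
	n->pprev = NULL;
}

static void mm_model_init(struct mm_model *mm)
{
	mm->notifiers.first = NULL;
	pthread_mutex_init(&mm->lock, NULL);
}

static void notifier_model_register(struct notifier_model *mn,
				    struct mm_model *mm)
{
	pthread_mutex_lock(&mm->lock);
	hlist_add_head(&mn->hlist, &mm->notifiers);
	pthread_mutex_unlock(&mm->lock);
}

static void notifier_model_unregister(struct notifier_model *mn,
				      struct mm_model *mm)
{
	pthread_mutex_lock(&mm->lock);
	hlist_del_init_model(&mn->hlist);
	pthread_mutex_unlock(&mm->lock);
}

/*
 * The list cannot change while the lock is held, so a plain walk that
 * saves the next pointer is enough. Callbacks run under the lock and
 * therefore must not call register/unregister themselves.
 */
static void notifier_model_release(struct mm_model *mm)
{
	struct hlist_node *n, *next;

	pthread_mutex_lock(&mm->lock);
	for (n = mm->notifiers.first; n; n = next) {
		struct notifier_model *mn = node_to_notifier(n);

		next = n->next;			/* the "_safe" part */
		hlist_del_init_model(n);	/* auto-disarm */
		if (mn->release)
			mn->release(mn, mm);
	}
	pthread_mutex_unlock(&mm->lock);
}

/* Example callback that only counts how often it was invoked. */
static int released_count;

static void count_release(struct notifier_model *mn, struct mm_model *mm)
{
	(void)mn;
	(void)mm;
	released_count++;
}
```

The trade-off in this sketch is that ->release runs with the lock held,
so a callback that tries to unregister itself would deadlock; that
restriction is the price of keeping the auto-disarming delete.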

> However, the mmap_sem is already held for write by the core
> VM at the point I would call the unregister function.
> Currently, there is no __mmu_notifier_unregister() defined.
> 
> Moving to a different lock solves the problem.

Unless mmu_notifier_release becomes like the above and we rely on the
user of the mmu notifiers to implement a high-level external lock that
definitely forbids bumping the mm_count of the mm and calling
register/unregister while mmu_notifier_release could run, 1) moving to a
different lock and 2) removing the auto-disarming hlist_del_rcu from
mmu_notifier_release sounds like the only possible SMP-safe way.

As far as KVM is concerned, mmu_notifier_release could be changed to
the version I wrote above and everything should be OK. For KVM the
mm_count bump is done by the task that also holds an mm_users
reference, so when exit_mmap runs I don't think the list could possibly
change anymore.

Anyway those are details that can be perfected after mainline merging,
so this isn't something to worry about too much right now. My idea is
to keep working to perfect it while I hope progress is being made by
Christoph to merge the mmu notifiers V3 patchset in mainline ;).
