LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Alexander Mihalicyn <alexander@mihalicyn.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org, Milton Miller <miltonm@bga.com>,
	Jack Miller <millerjo@us.ibm.com>,
	Pavel Tikhomirov <ptikhomirov@virtuozzo.com>,
	Manfred Spraul <manfred@colorfullife.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@suse.com>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Andrei Vagin <avagin@gmail.com>,
	Christian Brauner <christian@brauner.io>
Subject: Re: [PATCH 0/2] shm: omit forced shm destroy if task IPC namespace was changed
Date: Sat, 10 Jul 2021 13:55:33 +0300	[thread overview]
Message-ID: <CAJqdLrpx+xEMGQLZo7jS5BTAw-k2sWPrv9fCt0x8t=6Nbn7u+w@mail.gmail.com> (raw)
In-Reply-To: <20210709181241.cca57cf83c52964b2cd0dcf0@linux-foundation.org>

On Sat, Jul 10, 2021 at 4:12 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue,  6 Jul 2021 16:22:57 +0300 Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> wrote:
>
> > Hello,
> >
> > Task IPC namespace shm's has shm_rmid_forced feature which is per IPC namespace
> > and controlled by kernel.shm_rmid_forced sysctl. When feature is turned on,
> > then during task exit (and unshare(CLONE_NEWIPC)) all sysvshm's will be destroyed
> > by exit_shm(struct task_struct *task) function. But there is a problem if task
> > was changed IPC namespace since shmget() call. In such situation exit_shm() function
> > will try to call
> > shm_destroy(<new_ipc_namespace_ptr>, <sysvshmem_from_old_ipc_namespace>)
> > which leads to the situation when sysvshm object still attached to old
> > IPC namespace but freed; later during old IPC namespace cleanup we will try to
> > free such sysvshm object for the second time and will get the problem :)
> >
> > First patch solves this problem by postponing shm_destroy to the moment when
> > IPC namespace cleanup will be called.
> > Second patch is useful to prevent (or easy catch) such bugs in the future by
> > adding corresponding WARNings.
> >

Andrew,

thanks for your attention to the patches!

>
> (cc's added)
>
> I assume that a
>
> Fixes: b34a6b1da371ed8af ("ipc: introduce shm_rmid_forced sysctl") is
> appropriate here?

Really not, this patch looks fully safe because it always takes
objects to free from
concrete IPC namespace idr with appropriate locking. For example
/* Destroy all already created segments, but not mapped yet */
down_write(&shm_ids(ns).rw_mutex);
idr_for_each(&shm_ids(ns).ipcs_idr, &shm_try_destroy_current, ns);
up_write(&shm_ids(ns).rw_mutex);
here we take ns from current->nsproxy, lock idr, and destroy only shms
from this particular IPC ns.

But after b34a6b1da ("shm: make exit_shm work proportional to task
activity") we introduced
new field (struct sysv_shm sysvshm) on task_struct. exit_shm()
function was changed so:

list_for_each_entry_safe(shp, n, &task->sysvshm.shm_clist, shm_clist)
    shm_mark_orphan(shp, ns);

instead of previous idr_for_each(&shm_ids(ns).ipcs_idr,
&shm_try_destroy_current, ns);

Now, using setns() syscall, we can construct situation when on
task->sysvshm.shm_clist list
we have shm items from several (!) IPC namespaces. Before this patch
we always destroying
shmems only from current->nsproxy->ipc_ns, but now we can do something
like this:

shmget(0xAAAA, 4096, IPC_CREAT|0700); <-- add item to task->sysvshm.shm_clist
setns(fd, CLONE_NEWIPC);
shmget(0xAAAA, 4096, IPC_CREAT|0700); <-- add item to task->sysvshm.shm_clist
*now we have two items in task->sysvshm.shm_clist but from different
IPC namespaces*

(I've reproducer program and can send it privately)

That's a problem. If we take a look on
int ksys_unshare(unsigned long unshare_flags)

we can see following part:

if (unshare_flags & CLONE_NEWIPC) {
   /* Orphan segments in old ns (see sem above). */
   exit_shm(current);
   shm_init_task(current);
}

here we cleaning up this list BEFORE changing IPC namespace. So, if we
do something like:
shmget(0xAAAA, 4096, IPC_CREAT|0700); <-- add item to task->sysvshm.shm_clist
unshare(CLONE_NEWIPC); <-- task->sysvshm.shm_clist is cleaned and
reinitialized here
shmget(0xAAAA, 4096, IPC_CREAT|0700); <-- add item to task->sysvshm.shm_clist

and all fine!

So, semantics of setns() and unshare() is different here. We can fix
this problem by adding
analogical calls to exit_shm(), shm_init_task() into

static void commit_nsset(struct nsset *nsset)
...
#ifdef CONFIG_IPC_NS
if (flags & CLONE_NEWIPC) {
     exit_sem(me);
+   shm_init_task(current);
+  exit_shm(current);
}
#endif

with this change semantics of unshare() and setns() will be equal in
terms of the shm_rmid_forced
feature. But this may break some applications which using setns() and
IPC resources not destroying
after that syscall. (CRIU using setns() extensively and we have to
change our IPC ns C/R implementation
a little bit if we take this way of fixing the problem).

I've proposed a change which keeps the old behaviour of setns() but
fixes double free.

>
> A double-free is serious.  Should this fix be backported into earlier
> kernels?

Yes, the IPC namespace code was not changed seriously, so it means
that we can easily apply this patch to older kernels.
(I've CCed stable lists in the patch where the fix was)

CCed: Andrei Vagin and Christian Brauner

Regards,
Alex

  reply	other threads:[~2021-07-10 11:01 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-06 13:22 Alexander Mikhalitsyn
2021-07-06 13:22 ` [PATCH 1/2] shm: skip shm_destroy " Alexander Mikhalitsyn
2021-07-06 13:22 ` [PATCH 2/2] ipc: WARN if trying to remove ipc object which is absent Alexander Mikhalitsyn
2021-07-10  1:12 ` [PATCH 0/2] shm: omit forced shm destroy if task IPC namespace was changed Andrew Morton
2021-07-10 10:55   ` Alexander Mihalicyn [this message]
     [not found]     ` <CALgW_8VUk0us_umLncUv2DUMkOi3qixmT+YkHV4Dhpt_nNMZHw@mail.gmail.com>
2021-07-11 10:33       ` Alexander Mihalicyn
2021-07-11 11:46         ` Manfred Spraul
2021-07-12  9:54           ` Alexander Mihalicyn
2021-07-12 19:18             ` Eric W. Biederman
2021-07-12 19:27               ` Alexander Mihalicyn
2021-07-14 17:30                 ` [RFC PATCH] shm: extend forced shm destroy to support objects from several IPC nses Alexander Mikhalitsyn
2021-07-21  6:32                   ` Manfred Spraul
2021-07-22 18:46                   ` Manfred Spraul
2021-07-30 14:52                     ` Alexander Mihalicyn
2021-07-14 17:42                 ` [PATCH 0/2] shm: omit forced shm destroy if task IPC namespace was changed Alexander Mihalicyn
2021-09-23 16:36               ` Manfred Spraul
2021-09-24 16:45                 ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJqdLrpx+xEMGQLZo7jS5BTAw-k2sWPrv9fCt0x8t=6Nbn7u+w@mail.gmail.com' \
    --to=alexander@mihalicyn.com \
    --cc=akpm@linux-foundation.org \
    --cc=avagin@gmail.com \
    --cc=christian@brauner.io \
    --cc=dave@stgolabs.net \
    --cc=ebiederm@xmission.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=manfred@colorfullife.com \
    --cc=mhocko@suse.com \
    --cc=millerjo@us.ibm.com \
    --cc=miltonm@bga.com \
    --cc=ptikhomirov@virtuozzo.com \
    --cc=vdavydov.dev@gmail.com \
    --subject='Re: [PATCH 0/2] shm: omit forced shm destroy if task IPC namespace was changed' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).