Linux-Fsdevel Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Christoph Hellwig <hch@lst.de>
To: Jann Horn <jannh@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Christoph Hellwig <hch@lst.de>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-fsdevel@vger.kernel.org,
Alexander Viro <viro@zeniv.linux.org.uk>,
"Eric W . Biederman" <ebiederm@xmission.com>,
Oleg Nesterov <oleg@redhat.com>,
Russell King <linux@armlinux.org.uk>,
linux-arm-kernel@lists.infradead.org,
Mark Salter <msalter@redhat.com>,
Aurelien Jacquiot <jacquiot.aurelien@gmail.com>,
linux-c6x-dev@linux-c6x.org,
Yoshinori Sato <ysato@users.sourceforge.jp>,
Rich Felker <dalias@libc.org>,
linux-sh@vger.kernel.org
Subject: Re: [PATCH v2 4/5] binfmt_elf, binfmt_elf_fdpic: Use a VMA list snapshot
Date: Tue, 5 May 2020 13:03:58 +0200 [thread overview]
Message-ID: <20200505110358.GC17400@lst.de> (raw)
In-Reply-To: <20200429214954.44866-5-jannh@google.com>
On Wed, Apr 29, 2020 at 11:49:53PM +0200, Jann Horn wrote:
> In both binfmt_elf and binfmt_elf_fdpic, use a new helper
> dump_vma_snapshot() to take a snapshot of the VMA list (including the gate
> VMA, if we have one) while protected by the mmap_sem, and then use that
> snapshot instead of walking the VMA list without locking.
>
> An alternative approach would be to keep the mmap_sem held across the
> entire core dumping operation; however, keeping the mmap_sem locked while
> we may be blocked for an unbounded amount of time (e.g. because we're
> dumping to a FUSE filesystem or so) isn't really optimal; the mmap_sem
> blocks things like the ->release handler of userfaultfd, and we don't
> really want critical system daemons to grind to a halt just because someone
> "gifted" them SCM_RIGHTS to an eternally-locked userfaultfd, or something
> like that.
>
> Since both the normal ELF code and the FDPIC ELF code need this
> functionality (and if any other binfmt wants to add coredump support in the
> future, they'd probably need it, too), implement this with a common helper
> in fs/coredump.c.
>
> A downside of this approach is that we now need a bigger amount of kernel
> memory per userspace VMA in the normal ELF case, and that we need O(n)
> kernel memory in the FDPIC ELF case at all; but 40 bytes per VMA shouldn't
> be terribly bad.
>
> Signed-off-by: Jann Horn <jannh@google.com>
> ---
> fs/binfmt_elf.c | 152 +++++++++++++--------------------------
> fs/binfmt_elf_fdpic.c | 86 ++++++++++------------
> fs/coredump.c | 68 ++++++++++++++++++
> include/linux/coredump.h | 10 +++
> 4 files changed, 168 insertions(+), 148 deletions(-)
>
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index fb36469848323..dffe9dc8497ca 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -1292,8 +1292,12 @@ static bool always_dump_vma(struct vm_area_struct *vma)
> return false;
> }
>
> +#define DUMP_SIZE_MAYBE_ELFHDR_PLACEHOLDER 1
> +
> /*
> * Decide what to dump of a segment, part, all or none.
> + * The result must be fixed up via vma_dump_size_fixup() once we're in a context
> + * that's allowed to sleep arbitrarily long.
> */
> static unsigned long vma_dump_size(struct vm_area_struct *vma,
> unsigned long mm_flags)
> @@ -1348,30 +1352,15 @@ static unsigned long vma_dump_size(struct vm_area_struct *vma,
>
> /*
> * If this looks like the beginning of a DSO or executable mapping,
> - * check for an ELF header. If we find one, dump the first page to
> - * aid in determining what was mapped here.
> + * we'll check for an ELF header. If we find one, we'll dump the first
> + * page to aid in determining what was mapped here.
> + * However, we shouldn't sleep on userspace reads while holding the
> + * mmap_sem, so we just return a placeholder for now that will be fixed
> + * up later in vma_dump_size_fixup().
> */
> if (FILTER(ELF_HEADERS) &&
> - vma->vm_pgoff == 0 && (vma->vm_flags & VM_READ)) {
> - u32 __user *header = (u32 __user *) vma->vm_start;
> - u32 word;
> - /*
> - * Doing it this way gets the constant folded by GCC.
> - */
> - union {
> - u32 cmp;
> - char elfmag[SELFMAG];
> - } magic;
> - BUILD_BUG_ON(SELFMAG != sizeof word);
> - magic.elfmag[EI_MAG0] = ELFMAG0;
> - magic.elfmag[EI_MAG1] = ELFMAG1;
> - magic.elfmag[EI_MAG2] = ELFMAG2;
> - magic.elfmag[EI_MAG3] = ELFMAG3;
> - if (unlikely(get_user(word, header)))
> - word = 0;
> - if (word == magic.cmp)
> - return PAGE_SIZE;
> - }
> + vma->vm_pgoff == 0 && (vma->vm_flags & VM_READ))
> + return DUMP_SIZE_MAYBE_ELFHDR_PLACEHOLDER;
>
> #undef FILTER
>
> @@ -1381,6 +1370,22 @@ static unsigned long vma_dump_size(struct vm_area_struct *vma,
> return vma->vm_end - vma->vm_start;
> }
>
> +/* Fix up the result from vma_dump_size(), now that we're allowed to sleep. */
> +static void vma_dump_size_fixup(struct core_vma_metadata *meta)
> +{
> + char elfmag[SELFMAG];
> +
> + if (meta->dump_size != DUMP_SIZE_MAYBE_ELFHDR_PLACEHOLDER)
> + return;
> +
> + if (copy_from_user(elfmag, (void __user *)meta->start, SELFMAG)) {
> + meta->dump_size = 0;
> + return;
> + }
> + meta->dump_size =
> + (memcmp(elfmag, ELFMAG, SELFMAG) == 0) ? PAGE_SIZE : 0;
> +}
While this code looks entirely correct, it took me way too long to
follow. I'd take te DUMP_SIZE_MAYBE_ELFHDR_PLACEHOLDER into the caller,
and then have a simple function like this:
static void vma_dump_size_fixup(struct core_vma_metadata *meta)
{
char elfmag[SELFMAG];
if (copy_from_user(elfmag, (void __user *)meta->start, SELFMAG) ||
memcmp(elfmag, ELFMAG, sizeof(elfmag)) != 0)
meta->dump_size = 0;
else
meta->dump_size = PAGE_SIZE;
}
Also a few comments explaining why we do this fixup would help readers
of the code.
> - if (vma->vm_flags & VM_WRITE)
> - phdr.p_flags |= PF_W;
> - if (vma->vm_flags & VM_EXEC)
> - phdr.p_flags |= PF_X;
> + phdr.p_flags = meta->flags & VM_READ ? PF_R : 0;
> + phdr.p_flags |= meta->flags & VM_WRITE ? PF_W : 0;
> + phdr.p_flags |= meta->flags & VM_EXEC ? PF_X : 0;
Sorry for another nitpick, but I find the spelled out version with the
if a lot easier to read.
> +int dump_vma_snapshot(struct coredump_params *cprm, int *vma_count,
> + struct core_vma_metadata **vma_meta,
> + unsigned long (*dump_size_cb)(struct vm_area_struct *, unsigned long))
Plase don't use single tabs for indentating parameter continuations
(we actually have two styles - either two tabs or aligned after the
opening brace, pick your poison :))
> + *vma_meta = kvmalloc_array(*vma_count, sizeof(**vma_meta), GFP_KERNEL);
> + if (!*vma_meta) {
> + up_read(&mm->mmap_sem);
> + return -ENOMEM;
> + }
> +
> + for (i = 0, vma = first_vma(current, gate_vma); vma != NULL;
> + vma = next_vma(vma, gate_vma)) {
> + (*vma_meta)[i++] = (struct core_vma_metadata) {
> + .start = vma->vm_start,
> + .end = vma->vm_end,
> + .flags = vma->vm_flags,
> + .dump_size = dump_size_cb(vma, cprm->mm_flags)
> + };
This looks a little weird. Why not kcalloc + just initialize the four
fields we actually fill out here?
next prev parent reply other threads:[~2020-05-05 11:04 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-04-29 21:49 [PATCH v2 0/5] Fix ELF / FDPIC ELF core dumping, and use mmap_sem properly in there Jann Horn
2020-04-29 21:49 ` [PATCH v2 1/5] binfmt_elf_fdpic: Stop using dump_emit() on user pointers on !MMU Jann Horn
2020-05-05 10:48 ` Christoph Hellwig
2020-05-05 11:42 ` Jann Horn
2020-05-05 12:15 ` Christoph Hellwig
2020-08-11 3:05 ` Jann Horn
2020-04-29 21:49 ` [PATCH v2 2/5] coredump: Let dump_emit() bail out on short writes Jann Horn
2020-04-29 21:49 ` [PATCH v2 3/5] coredump: Refactor page range dumping into common helper Jann Horn
2020-05-05 10:50 ` Christoph Hellwig
2020-05-05 11:44 ` Jann Horn
2020-04-29 21:49 ` [PATCH v2 4/5] binfmt_elf, binfmt_elf_fdpic: Use a VMA list snapshot Jann Horn
2020-05-05 11:03 ` Christoph Hellwig [this message]
2020-05-05 12:11 ` Jann Horn
2020-04-29 21:49 ` [PATCH v2 5/5] mm/gup: Take mmap_sem in get_dump_page() Jann Horn
2020-04-29 21:56 ` [PATCH v2 0/5] Fix ELF / FDPIC ELF core dumping, and use mmap_sem properly in there Russell King - ARM Linux admin
2020-04-29 23:03 ` Linus Torvalds
2020-04-30 1:27 ` Nicolas Pitre
2020-04-30 14:10 ` Greg Ungerer
2020-04-30 14:51 ` Rich Felker
2020-04-30 21:13 ` Rob Landley
2020-05-01 6:00 ` Greg Ungerer
2020-05-01 19:09 ` Rob Landley
2020-04-30 16:54 ` Linus Torvalds
2020-04-30 19:07 ` Eric W. Biederman
2020-05-01 5:44 ` Greg Ungerer
2020-05-01 11:13 ` Eric W. Biederman
2020-05-01 7:14 ` Greg Ungerer
2020-04-30 1:59 ` Nicolas Pitre
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200505110358.GC17400@lst.de \
--to=hch@lst.de \
--cc=akpm@linux-foundation.org \
--cc=dalias@libc.org \
--cc=ebiederm@xmission.com \
--cc=jacquiot.aurelien@gmail.com \
--cc=jannh@google.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-c6x-dev@linux-c6x.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-sh@vger.kernel.org \
--cc=linux@armlinux.org.uk \
--cc=msalter@redhat.com \
--cc=oleg@redhat.com \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
--cc=ysato@users.sourceforge.jp \
--subject='Re: [PATCH v2 4/5] binfmt_elf, binfmt_elf_fdpic: Use a VMA list snapshot' \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).