Netdev Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Kees Cook <keescook@chromium.org>
To: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>,
	Will Drewry <wad@chromium.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	netdev <netdev@vger.kernel.org>,
	bpf@vger.kernel.org, alaaemadhossney.ae@gmail.com,
	syzkaller@googlegroups.com, Jann Horn <jannh@google.com>,
	Tycho Andersen <tycho@tycho.pizza>,
	Sargun Dhillon <sargun@sargun.me>,
	Christian Brauner <christian.brauner@ubuntu.com>
Subject: Re: memory leak in do_seccomp
Date: Sat, 31 Jul 2021 20:25:58 -0700	[thread overview]
Message-ID: <202107311901.8CDF235F65@keescook> (raw)
In-Reply-To: <CADVatmPShADZ0F133eS3KjeKj1ZjTNAQfy_QOoJVBan02wuR+Q@mail.gmail.com>

On Sat, Jul 31, 2021 at 08:20:29PM +0100, Sudip Mukherjee wrote:
> Hi All,
> 
> We had been running syzkaller on v5.10.y and a "memory leak in
> do_seccomp" was being reported on it. I got some time to check that
> today and have managed to get a syzkaller
> reproducer. I dont have a C reproducer which I can share but I can use
> the syz-reproducer to reproduce this with next-20210730.
> The old report on v5.10.y is at
> https://elisa-builder-00.iol.unh.edu/syzkaller/report?id=f6ddd3b592f00e95f9cbd2e74f70a5b04b015c6f

Thanks for the details!

Is this the same as what syzbot saw here (with a C reproducer)?
https://syzkaller.appspot.com/bug?id=2809bb0ac77ad9aa3f4afe42d6a610aba594a987

I can't figure out what happened with the "Patch testing request" that
was made; there's no link?

> 
> BUG: memory leak
> unreferenced object 0xffff888019282c00 (size 512):
>   comm "syz-executor.1", pid 7389, jiffies 4294761829 (age 17.841s)
>   hex dump (first 32 bytes):
>     01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<00000000762c0963>] do_seccomp+0x2d5/0x27d0

Can you run "./scripts/faddr2line do_seccomp+0x2d5/0x27d0" for this? I
expect it'll be:
        sfilter = kzalloc(sizeof(*sfilter), GFP_KERNEL | __GFP_NOWARN);

>     [<0000000006e512d1>] do_syscall_64+0x3b/0x90
>     [<0000000094ae9ff8>] entry_SYSCALL_64_after_hwframe+0x44/0xae

The "size 512" in your v5.10.y report is from seccomp_prepare_filter()
(noted above). seccomp_prepare_filter() cleans up its error paths.

> 
> BUG: memory leak
> unreferenced object 0xffffc900006b5000 (size 4096):
>   comm "syz-executor.1", pid 7389, jiffies 4294761829 (age 17.841s)
>   hex dump (first 32 bytes):
>     01 00 00 00 00 00 00 00 00 00 00 00 05 00 00 00  ................
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<00000000854901e5>] __vmalloc_node_range+0x550/0x9a0
>     [<000000002686628f>] __vmalloc_node+0xb5/0x100
>     [<0000000004cbd298>] bpf_prog_alloc_no_stats+0x38/0x350
>     [<0000000009149728>] bpf_prog_alloc+0x24/0x170
>     [<000000000fe7f1e7>] bpf_prog_create_from_user+0xad/0x2e0
>     [<000000000c70eb02>] do_seccomp+0x325/0x27d0
>     [<0000000006e512d1>] do_syscall_64+0x3b/0x90
>     [<0000000094ae9ff8>] entry_SYSCALL_64_after_hwframe+0x44/0xae

Again, I'm curious about where do_seccomp+0x325/0x27d0 is for this, but
the matching one in v5.10 shows:
        ret = bpf_prog_create_from_user(&sfilter->prog, fprog,
                                        seccomp_check_filter, save_orig);

This and everything remaining below else has bpf_prog_create_from_user()
in the allocation path.

> 
> BUG: memory leak
> unreferenced object 0xffff888026eb1000 (size 2048):
>   comm "syz-executor.1", pid 7389, jiffies 4294761829 (age 17.842s)
>   hex dump (first 32 bytes):
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<0000000072de7240>] bpf_prog_alloc_no_stats+0xeb/0x350
>     [<0000000009149728>] bpf_prog_alloc+0x24/0x170
>     [<000000000fe7f1e7>] bpf_prog_create_from_user+0xad/0x2e0
>     [<000000000c70eb02>] do_seccomp+0x325/0x27d0
>     [<0000000006e512d1>] do_syscall_64+0x3b/0x90
>     [<0000000094ae9ff8>] entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> BUG: memory leak
> unreferenced object 0xffff888014dddac0 (size 16):
>   comm "syz-executor.1", pid 7389, jiffies 4294761829 (age 17.842s)
>   hex dump (first 16 bytes):
>     01 00 ca 08 80 88 ff ff c8 ef df 14 80 88 ff ff  ................

These are two kernel pointers:
	0xffff888008ca0001 (unaligned by 1 byte?!)
	0xffff888014dfefc8

Ah, no, this is from:

struct sock_fprog_kern {
        u16                     len;
        struct sock_filter      *filter;
};

The "ca 08 80 88 ff ff" bytes are uninitialized padding. ;) "len" has
a value of 1 (which matches the syzkaller reproducer args below of a
single BPF instruction).

        fp->orig_prog = kmalloc(sizeof(*fkprog), GFP_KERNEL);
        if (!fp->orig_prog)
                return -ENOMEM;

        fkprog = fp->orig_prog;
        fkprog->len = fprog->len;
	...
        fkprog->filter = kmemdup(fp->insns, fsize,
                                 GFP_KERNEL | __GFP_NOWARN);

>   backtrace:
>     [<00000000c5d4ed93>] bpf_prog_store_orig_filter+0x7b/0x1e0
>     [<000000007cb21c2a>] bpf_prog_create_from_user+0x1c6/0x2e0
>     [<000000000c70eb02>] do_seccomp+0x325/0x27d0
>     [<0000000006e512d1>] do_syscall_64+0x3b/0x90
>     [<0000000094ae9ff8>] entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> BUG: memory leak
> unreferenced object 0xffff888014dfefc8 (size 8):
>   comm "syz-executor.1", pid 7389, jiffies 4294761829 (age 17.842s)
>   hex dump (first 8 bytes):
>     06 00 00 00 ff ff ff 7f                          ........

This contains a userspace (likely stack) pointer, and is referenced
by the second pointer above. (i.e. kmemdup() above, but how have the
contents become a user stack pointer?)

>   backtrace:
>     [<00000000ee5550f8>] kmemdup+0x23/0x50
>     [<00000000f1acd067>] bpf_prog_store_orig_filter+0x103/0x1e0
>     [<000000007cb21c2a>] bpf_prog_create_from_user+0x1c6/0x2e0
>     [<000000000c70eb02>] do_seccomp+0x325/0x27d0
>     [<0000000006e512d1>] do_syscall_64+0x3b/0x90
>     [<0000000094ae9ff8>] entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> Not sure if this has been already reported or not, but I will be happy
> to test if you have a fix for this.

I was suspecting a missing error path free near bpf_prepare_filter()
as called by bpf_prog_create_from_user() here:

        /* bpf_prepare_filter() already takes care of freeing
         * memory in case something goes wrong.
         */
        fp = bpf_prepare_filter(fp, trans);
        if (IS_ERR(fp))
                return PTR_ERR(fp);

Since only seccomp and af_packet use bpf_prog_create_from_user(),
and af_packet sets neither a "trans" callback nor save_orig. But if
"trans" fails (due to some BPF instructions seccomp doesn't support),
I'd expect this leak to be detected more often.

bpf_prepare_filter() is documented as cleaning up allocations on failure,
though I notice its cleanup differs from bpf_prog_create_from_user()'s,
which uses __bpf_prog_free() instead of __bfp_prog_release(). But
that should only make a difference for orig_prog getting freed,
and bpf_prog_store_orig_filter() should already be freeing that on
failures too.

Similarly, bpf_migrate_filter() cleanups up on failure too, so this
doesn't seem to be it:

        if (!fp->jited)
                fp = bpf_migrate_filter(fp);

        return fp;

So, I'm going to assume the missing free is somehow related to
process management, since I see the Syzkaller reproducer mentions
SECCOMP_SET_MODE_FILTER_LISTENER, fork(), and ptrace(). :)

Quoting from the v5.10.y report:
> # {Threaded:true Collide:true Repeat:true RepeatTimes:0 Procs:8 Slowdown:1 Sandbox:none Fault:false FaultCall:-1 FaultNth:0 Leak:true NetInjection:true NetDevices:true NetReset:true Cgroups:true BinfmtMisc:true CloseFDs:true KCSAN:false DevlinkPCI:false USB:true VhciInjection:false Wifi:false IEEE802154:false Sysctl:true UseTmpDir:true HandleSegv:true Repro:false Trace:false}
> seccomp$SECCOMP_SET_MODE_FILTER_LISTENER(0x1, 0x0, &(0x7f0000000000)={0x1, &(0x7f0000000040)=[{0x6, 0x0, 0x0, 0x7fffffff}]})

0x1 is SECCOMP_SET_MODE_FILTER
0x0 is empty flags
{0x6, 0x0, 0x0, 0x7fffffff} is
	BPF_STMT(BPF_RET, SECCOMP_RET_ALLOW | 0xffff)

For "SECCOMP_SET_MODE_FILTER_LISTENER", defined here:
https://github.com/google/syzkaller/blob/master/sys/linux/seccomp.txt#L15
I was expecting flags to include SECCOMP_FILTER_FLAG_NEW_LISTENER:
	seccomp$SECCOMP_SET_MODE_FILTER_LISTENER(
			op const[SECCOMP_SET_MODE_FILTER],
			flags flags[seccomp_flags_listener],
			arg ptr[in, sock_fprog]) fd_seccomp (breaks_returns)

For the flags:

	seccomp_flags_listener = SECCOMP_FILTER_FLAG_NEW_LISTENER,
				 SECCOMP_FILTER_FLAG_LOG_LISTENER,
				 SECCOMP_FILTER_FLAG_SPEC_ALLOW_LISTENER

which is:

	SECCOMP_FILTER_FLAG_LOG_LISTENER = 10
	SECCOMP_FILTER_FLAG_NEW_LISTENER = 8
	SECCOMP_FILTER_FLAG_SPEC_ALLOW = 4
	SECCOMP_FILTER_FLAG_SPEC_ALLOW_LISTENER = 12

How is flags 0 above? (Maybe I don't understand the syzkaller reproducer
meaning fully?)

> r0 = fork()
> ptrace(0x10, r0)

0x10 is PTRACE_ATTACH

My best guess is there is some LISTENER refcount state we can get into
where all the processes die, but a reference is left alive.

-Kees

-- 
Kees Cook

  reply	other threads:[~2021-08-01  3:30 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-31 19:20 Sudip Mukherjee
2021-08-01  3:25 ` Kees Cook [this message]
2021-08-01 21:10   ` Sudip Mukherjee
  -- strict thread matches above, loose matches on Subject: below --
2020-08-11 17:06 syzbot
2020-08-31  3:50 ` syzbot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=202107311901.8CDF235F65@keescook \
    --to=keescook@chromium.org \
    --cc=alaaemadhossney.ae@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=christian.brauner@ubuntu.com \
    --cc=daniel@iogearbox.net \
    --cc=jannh@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=kpsingh@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=netdev@vger.kernel.org \
    --cc=sargun@sargun.me \
    --cc=songliubraving@fb.com \
    --cc=sudipm.mukherjee@gmail.com \
    --cc=syzkaller@googlegroups.com \
    --cc=tycho@tycho.pizza \
    --cc=wad@chromium.org \
    --cc=yhs@fb.com \
    --subject='Re: memory leak in do_seccomp' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).