Linux-Fsdevel Archive on lore.kernel.org
help / color / mirror / Atom feed
* fcntl(F_DUPFD) causing apparent file descriptor table corruption
@ 2020-05-19 21:03 Thiago Macieira
  2020-05-19 21:45 ` Al Viro
  0 siblings, 1 reply; 5+ messages in thread
From: Thiago Macieira @ 2020-05-19 21:03 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 2732 bytes --]

Hello Al & others

While doing something I shouldn't be doing in my code, I realised that my code 
stopped responding. Turns out that after some F_DUPFD forcing the file 
descriptor preposterously high, the low file descriptors become EBADF. 
Something must be wrong in expand_files, but I don't have the expertise to 
debug it. I'm hoping someone else will.

I've attached a testcase for it. It needs to be run as root. It's not threaded 
and it reproduces the problem 100% of the time on my stable kernel (5.6.13). 
This is not a security issue, since it affects only the calling process 
itself.

On my machine, /proc/sys/fs/nr_open is 1073741816 and I have 32 GB of RAM (if 
the problem is related to memory consumption).

The problem only occurs when growing the table.

strace shows something like:
fcntl(2, F_DUPFD, 1024)                 = 1024
close(1024)                             = 0
fcntl(2, F_DUPFD, 2048)                 = 2048
close(2048)                             = 0
fcntl(2, F_DUPFD, 4096)                 = 4096
close(4096)                             = 0
fcntl(2, F_DUPFD, 8192)                 = 8192
close(8192)                             = 0
fcntl(2, F_DUPFD, 16384)                = 16384
close(16384)                            = 0
fcntl(2, F_DUPFD, 32768)                = 32768
close(32768)                            = 0
fcntl(2, F_DUPFD, 65536)                = 65536
close(65536)                            = 0
fcntl(2, F_DUPFD, 131072)               = 131072
close(131072)                           = 0
fcntl(2, F_DUPFD, 262144)               = 262144
close(262144)                           = 0
fcntl(2, F_DUPFD, 524288)               = 524288
close(524288)                           = 0
fcntl(2, F_DUPFD, 1048576)              = 1048576
close(1048576)                          = 0
fcntl(2, F_DUPFD, 2097152)              = 2097152
close(2097152)                          = 0
fcntl(2, F_DUPFD, 4194304)              = 4194304
close(4194304)                          = 0
fcntl(2, F_DUPFD, 8388608)              = 8388608
close(8388608)                          = 0
fcntl(2, F_DUPFD, 16777216)             = 16777216
close(16777216)                         = 0
fcntl(2, F_DUPFD, 33554432)             = 33554432
close(33554432)                         = 0
fcntl(2, F_DUPFD, 67108864)             = 67108864
close(67108864)                         = 0
fcntl(2, F_DUPFD, 134217728)            = 134217728
close(134217728)                        = 0
fcntl(2, F_DUPFD, 536870912)            = 536870912
close(536870912)                        = 0
write(1, "success\n", 8)                = EBADF
-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel System Software Products

[-- Attachment #2: dupfd-bug.c --]
[-- Type: text/plain, Size: 1419 bytes --]

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <unistd.h>

void run_test(int maxfd)
{
    int srcfd = STDERR_FILENO;
    int minfd = 1024;

    /* Linux kernel expands the file descriptor table exponentially, so
     * keep requesting a minimum file descriptor exponentially. */
    for ( ; minfd < maxfd; minfd *= 2) {
        int fd;
        do {
            fd = fcntl(srcfd, F_DUPFD, minfd);
        } while (fd == -1 && errno == EINTR);

        if (fd == -1) {
            if (errno != EMFILE)
                perror("fcntl");
            return;
        }
        close(fd);
    }
}

int main(int argc, char **argv)
{
    struct rlimit lim;
    if (argc > 1) {
        lim.rlim_max = lim.rlim_cur = strtol(argv[1], NULL, 0);
    } else {
        int n;
        FILE *f = fopen("/proc/sys/fs/nr_open","r");
        if (!f) {
            perror("fopen");
            return EXIT_FAILURE;
        }
        if (fscanf(f, "%d", &n) != 1)
            return EXIT_FAILURE;
        fclose(f);
        lim.rlim_max = lim.rlim_cur = n;
    }

    if (setrlimit(RLIMIT_NOFILE, &lim) == -1) {
        perror("setrlimit");
        return EXIT_FAILURE;
    }

    run_test(lim.rlim_cur);

    static const char msg[] = "success\n";
    int ok1 = write(STDOUT_FILENO, msg, strlen(msg)) == strlen(msg);
    return ok1 ? EXIT_SUCCESS : 255;
}

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: fcntl(F_DUPFD) causing apparent file descriptor table corruption
  2020-05-19 21:03 fcntl(F_DUPFD) causing apparent file descriptor table corruption Thiago Macieira
@ 2020-05-19 21:45 ` Al Viro
  2020-05-19 22:04   ` Al Viro
  2020-05-19 22:18   ` Thiago Macieira
  0 siblings, 2 replies; 5+ messages in thread
From: Al Viro @ 2020-05-19 21:45 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: linux-fsdevel, Linus Torvalds

On Tue, May 19, 2020 at 02:03:03PM -0700, Thiago Macieira wrote:

> On my machine, /proc/sys/fs/nr_open is 1073741816 and I have 32 GB of RAM (if 
> the problem is related to memory consumption).
> 
> The problem only occurs when growing the table.
> 
> strace shows something like:
> fcntl(2, F_DUPFD, 1024)                 = 1024
> close(1024)                             = 0
> fcntl(2, F_DUPFD, 2048)                 = 2048
> close(2048)                             = 0
> fcntl(2, F_DUPFD, 4096)                 = 4096
> close(4096)                             = 0
> fcntl(2, F_DUPFD, 8192)                 = 8192
> close(8192)                             = 0
> fcntl(2, F_DUPFD, 16384)                = 16384
> close(16384)                            = 0
> fcntl(2, F_DUPFD, 32768)                = 32768
> close(32768)                            = 0
> fcntl(2, F_DUPFD, 65536)                = 65536
> close(65536)                            = 0
> fcntl(2, F_DUPFD, 131072)               = 131072
> close(131072)                           = 0
> fcntl(2, F_DUPFD, 262144)               = 262144
> close(262144)                           = 0
> fcntl(2, F_DUPFD, 524288)               = 524288
> close(524288)                           = 0
> fcntl(2, F_DUPFD, 1048576)              = 1048576
> close(1048576)                          = 0
> fcntl(2, F_DUPFD, 2097152)              = 2097152
> close(2097152)                          = 0
> fcntl(2, F_DUPFD, 4194304)              = 4194304
> close(4194304)                          = 0
> fcntl(2, F_DUPFD, 8388608)              = 8388608
> close(8388608)                          = 0
> fcntl(2, F_DUPFD, 16777216)             = 16777216
> close(16777216)                         = 0
> fcntl(2, F_DUPFD, 33554432)             = 33554432
> close(33554432)                         = 0
> fcntl(2, F_DUPFD, 67108864)             = 67108864
> close(67108864)                         = 0
> fcntl(2, F_DUPFD, 134217728)            = 134217728
> close(134217728)                        = 0
> fcntl(2, F_DUPFD, 536870912)            = 536870912
> close(536870912)                        = 0
> write(1, "success\n", 8)                = EBADF

Er...  It's going by powers of two, isn't it?  So where's
268435456 between 134217728 and 536870912?  And, assuming
it's a 64bit box, 536870912 pointers is 4Gb, which suggests
that we are running into 32bit overflow on calculating
some size...  Let's see -
static int expand_fdtable(struct files_struct *files, unsigned int nr)
...
        cur_fdt = files_fdtable(files);
        BUG_ON(nr < cur_fdt->max_fds);
        copy_fdtable(new_fdt, cur_fdt);
so we probably want copy_fdtable().
static void copy_fdtable(struct fdtable *nfdt, struct fdtable *ofdt)
{
        unsigned int cpy, set;

        BUG_ON(nfdt->max_fds < ofdt->max_fds);

        cpy = ofdt->max_fds * sizeof(struct file *);

Right, here's your multiplication overflow.

        set = (nfdt->max_fds - ofdt->max_fds) * sizeof(struct file *);
        memcpy(nfdt->fd, ofdt->fd, cpy);
... and here's not getting the things copied.  Which means that pointer
is left uninitialized and the damn thing might very well be a security
problem - you'd lucked out and ran into NULL, but had there been a pointer
to something, you would've gotten a memory corruptor.

The obvious fix would be to turn cpy and set into size_t - as in
ed fs/file.c <<'EOF'
/copy_fdtable/+2s/unsigned int/size_t/
w
q
EOF

On size_t overflow you would've failed allocation before getting to that
point - see sysctl_nr_open_max initializer.  Overflow in alloc_fdtable()
(nr is unsigned int there) also can't happen, AFAICS - the worst you
can get is 1U<<31, which will fail sysctl_nr_open comparison.

I really wonder about the missing couple of syscalls in your strace, though;
could you verify that they _are_ missing and see what the fix above does to
your testcase?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: fcntl(F_DUPFD) causing apparent file descriptor table corruption
  2020-05-19 21:45 ` Al Viro
@ 2020-05-19 22:04   ` Al Viro
  2020-05-19 22:18   ` Thiago Macieira
  1 sibling, 0 replies; 5+ messages in thread
From: Al Viro @ 2020-05-19 22:04 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: linux-fsdevel, Linus Torvalds

On Tue, May 19, 2020 at 10:45:20PM +0100, Al Viro wrote:

> The obvious fix would be to turn cpy and set into size_t - as in
> ed fs/file.c <<'EOF'
> /copy_fdtable/+2s/unsigned int/size_t/
> w
> q
> EOF
> 
> On size_t overflow you would've failed allocation before getting to that
> point - see sysctl_nr_open_max initializer.  Overflow in alloc_fdtable()
> (nr is unsigned int there) also can't happen, AFAICS - the worst you
> can get is 1U<<31, which will fail sysctl_nr_open comparison.
> 
> I really wonder about the missing couple of syscalls in your strace, though;
> could you verify that they _are_ missing and see what the fix above does to
> your testcase?

Anyway, whether it's all there is to your reproducers or not, the bug
is obvious; I've pushed the following into #fixes.

commit 784233a6d4a56f1d0e6e4490fbf38d3cce5742ec
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Tue May 19 17:48:52 2020 -0400

    fix multiplication overflow in copy_fdtable()
    
    cpy and set really should be size_t; we won't get an overflow on that,
    since sysctl_nr_open can't be set above ~(size_t)0 / sizeof(void *),
    so nr that would've managed to overflow size_t on that multiplication
    won't get anywhere near copy_fdtable() - we'll fail with EMFILE
    before that.
    
    Cc: stable@kernel.org # v2.6.25+
    Fixes: 9cfe015aa424 (get rid of NR_OPEN and introduce a sysctl_nr_open)
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

diff --git a/fs/file.c b/fs/file.c
index c8a4e4c86e55..abb8b7081d7a 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -70,7 +70,7 @@ static void copy_fd_bitmaps(struct fdtable *nfdt, struct fdtable *ofdt,
  */
 static void copy_fdtable(struct fdtable *nfdt, struct fdtable *ofdt)
 {
-	unsigned int cpy, set;
+	size_t cpy, set;
 
 	BUG_ON(nfdt->max_fds < ofdt->max_fds);
 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: fcntl(F_DUPFD) causing apparent file descriptor table corruption
  2020-05-19 21:45 ` Al Viro
  2020-05-19 22:04   ` Al Viro
@ 2020-05-19 22:18   ` Thiago Macieira
  2020-05-19 22:28     ` Al Viro
  1 sibling, 1 reply; 5+ messages in thread
From: Thiago Macieira @ 2020-05-19 22:18 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 2067 bytes --]

On Tuesday, 19 May 2020 14:45:20 PDT Al Viro wrote:
> ... and here's not getting the things copied.  Which means that pointer
> is left uninitialized and the damn thing might very well be a security
> problem - you'd lucked out and ran into NULL, but had there been a pointer
> to something, you would've gotten a memory corruptor.

Not sure I'd call it a security issue. Only root (CAP_SYS_RESOURCE) can cause 
it by raising the file descriptor limit from the defaults. The kernel still 
defaults to 4096 and systemd raises it on boot to 512k, which is 3 orders of 
magnitude less than what is needed to cause the issue.

> I really wonder about the missing couple of syscalls in your strace, though;
> could you verify that they _are_ missing and see what the fix above does to
> your testcase?

Looking at my terminal backtrace, I might have made a copy & paste mistake of 
the trace while flipping pages. Unfortunately, the trace file I had in /tmp 
was lost because I needed to reboot the machine. The other traces I have in my 
terminal show:

fcntl(2, F_DUPFD, 134217728)            = 134217728
close(134217728)                        = 0
fcntl(2, F_DUPFD, 268435456)            = 268435456
close(268435456)                        = 0
fcntl(2, F_DUPFD, 536870912)            = 536870912
close(536870912)                        = 0
write(1, "success\n", 8)                = ?
^C^Czsh: killed     sudo strace ./dupfd-bug

I had to killall -9 strace at this point. See the attached oops.

Then I insisted:

fcntl(2, F_DUPFD, 67108864)             = 67108864
close(67108864)                         = 0
fcntl(2, F_DUPFD, 134217728)            = 134217728
close(134217728)                        = 0
fcntl(2, F_DUPFD, 268435456Shared connection to <REDACTED> closed.

At this point, I need to drive to the office to reboot the machine. Building 
the kernel and testing will take a few days.

Note to self: don't play with possible kernel bugs without a VM.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel System Software Products

[-- Attachment #2: oops.txt --]
[-- Type: text/plain, Size: 4435 bytes --]

[19186.822144] alloc_fd: slot 536870912 not NULL!
[19186.822963] BUG: unable to handle page fault for address: 00000000000d73ef
[19186.823725] #PF: supervisor read access in kernel mode
[19186.824331] #PF: error_code(0x0000) - not-present page
[19186.824950] PGD 0 P4D 0 
[19186.825344] Oops: 0000 [#6] SMP PTI
[19186.825813] CPU: 2 PID: 71323 Comm: dupfd-bug Tainted: G      D   I       5.6.13-952.native #1
[19186.826724] Hardware name: Gigabyte Technology Co., Ltd. X299 AORUS Gaming 3 Pro/X299 AORUS Gaming 3 Pro-CF, BIOS F3 12/28/2017
[19186.827837] RIP: 0010:__fget_light+0x63/0x70
[19186.828374] Code: c0 5d c3 48 8b 4f 20 8b 01 41 39 c0 73 22 44 89 c7 48 39 c7 48 19 ff 48 8b 41 08 41 21 f8 4a 8d 04 c0 48 8b 00 48 85 c0 74 06 <23> 50 44 75 01 c3 31 c0 c3 0f 1f 40 00 55 be 00 40 00 00 48 89 e5
[19186.830091] RSP: 0018:ffffaf8281fb7ed0 EFLAGS: 00010202
[19186.830750] RAX: 00000000000d73ab RBX: 0000000000000001 RCX: ffff8ab156879300
[19186.831537] RDX: 0000000000004000 RSI: 0000000000004000 RDI: ffffffffffffffff
[19186.832326] RBP: ffffaf8281fb7ee0 R08: 0000000000000001 R09: ffff8ab0d5988000
[19186.833121] R10: ffffaf8280127e6d R11: 000000000000000c R12: ffffaf8281fb7f58
[19186.833898] R13: 00005651ef8ac034 R14: 0000000000000008 R15: 0000000000000000
[19186.834672] FS:  00007ff99749d540(0000) GS:ffff8ab69f680000(0000) knlGS:0000000000000000
[19186.835529] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[19186.836192] CR2: 00000000000d73ef CR3: 0000000255988006 CR4: 00000000003606e0
[19186.836967] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[19186.837741] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[19186.838520] Call Trace:
[19186.838920]  ? __fdget_pos+0x12/0x50
[19186.839411]  ksys_write+0x1a/0xd0
[19186.839877]  __x64_sys_write+0x15/0x20
[19186.840381]  do_syscall_64+0x55/0xf0
[19186.840866]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[19186.841467] RIP: 0033:0x7ff9973b789a
[19186.841945] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb ba 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 20 b8 01 00 00 00 c5 fc 77 0f 05 <66> 0f 1f 44 00 00 48 3d 00 f0 ff ff 77 60 c3 0f 1f 80 00 00 00 00
[19186.843655] RSP: 002b:00007ffd0f3801c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[19186.844475] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff9973b789a
[19186.845255] RDX: 0000000000000008 RSI: 00005651ef8ac034 RDI: 0000000000000001
[19186.846041] RBP: 00005651f0ae92a0 R08: 0000000000000000 R09: 0000000000000000
[19186.846825] R10: 0000000000000000 R11: 0000000000000246 R12: 00005651ef8ab1d0
[19186.847606] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[19186.848392] Modules linked in: xt_REDIRECT iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_tables bpfilter intel_wmi_thunderbolt wmi_bmof snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio nfit edac_core libnvdimm nouveau snd_hda_codec_hdmi encrypted_keys trusted tpm snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core psmouse isst_if_common snd_hwdep mxm_wmi serio_raw snd_pcm nvidiafb snd_timer vgastate snd fb_ddc e1000e soundcore mei_me i2c_i801 mei wmi atkbd libps2 i8042
[19186.852589] CR2: 00000000000d73ef
[19186.853068] ---[ end trace de4959d19d1789ea ]---
[19189.681955] RIP: 0010:__fget_files+0x32/0x70
[19189.682535] Code: d0 48 8b 57 20 89 ce 8b 02 41 39 c2 73 49 49 39 c2 48 19 c0 48 8b 52 08 44 21 c0 48 8d 04 c2 4c 8b 18 4d 85 db 74 30 44 89 c8 <41> 23 43 44 75 27 49 8b 43 38 49 8d 53 38 48 85 c0 74 0f 48 8d 0c
[19189.684289] RSP: 0018:ffffaf8281883ec0 EFLAGS: 00010206
[19189.684945] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 0000000000000001
[19189.685750] RDX: ffffaf849e009000 RSI: 0000000000000001 RDI: ffff8aaff7dc1340
[19189.686559] RBP: ffffaf8281883ec8 R08: 0000000000000008 R09: 0000000000004000
[19189.687363] R10: 0000000000000008 R11: 0000000a00000000 R12: ffffaf8281883f58
[19189.688172] R13: 000000000593d323 R14: 0000000000000001 R15: 0000000000000000
[19189.688981] FS:  00007ff99749d540(0000) GS:ffff8ab69f680000(0000) knlGS:0000000000000000
[19189.689860] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[19189.690556] CR2: 00000000000d73ef CR3: 0000000255988006 CR4: 00000000003606e0
[19189.691360] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[19189.692170] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[19189.693774] ------------[ cut here ]------------

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: fcntl(F_DUPFD) causing apparent file descriptor table corruption
  2020-05-19 22:18   ` Thiago Macieira
@ 2020-05-19 22:28     ` Al Viro
  0 siblings, 0 replies; 5+ messages in thread
From: Al Viro @ 2020-05-19 22:28 UTC (permalink / raw)
  To: Thiago Macieira; +Cc: linux-fsdevel, Linus Torvalds

On Tue, May 19, 2020 at 03:18:13PM -0700, Thiago Macieira wrote:

> > I really wonder about the missing couple of syscalls in your strace, though;
> > could you verify that they _are_ missing and see what the fix above does to
> > your testcase?
> 
> Looking at my terminal backtrace, I might have made a copy & paste mistake of 
> the trace while flipping pages. Unfortunately, the trace file I had in /tmp 
> was lost because I needed to reboot the machine. The other traces I have in my 
> terminal show:
> 
> fcntl(2, F_DUPFD, 134217728)            = 134217728
> close(134217728)                        = 0
> fcntl(2, F_DUPFD, 268435456)            = 268435456
> close(268435456)                        = 0
> fcntl(2, F_DUPFD, 536870912)            = 536870912
> close(536870912)                        = 0
> write(1, "success\n", 8)                = ?
> ^C^Czsh: killed     sudo strace ./dupfd-bug
> 
> I had to killall -9 strace at this point. See the attached oops.

BS values in the array of struct file pointers due to the problem above.
And very likely a memory corruption as well.

> Then I insisted:
> 
> fcntl(2, F_DUPFD, 67108864)             = 67108864
> close(67108864)                         = 0
> fcntl(2, F_DUPFD, 134217728)            = 134217728
> close(134217728)                        = 0
> fcntl(2, F_DUPFD, 268435456Shared connection to <REDACTED> closed.
> 
> At this point, I need to drive to the office to reboot the machine. Building 
> the kernel and testing will take a few days.
> 
> Note to self: don't play with possible kernel bugs without a VM.

... at least not without remote console, complete with ability to
powercycle the box.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-05-19 22:28 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-19 21:03 fcntl(F_DUPFD) causing apparent file descriptor table corruption Thiago Macieira
2020-05-19 21:45 ` Al Viro
2020-05-19 22:04   ` Al Viro
2020-05-19 22:18   ` Thiago Macieira
2020-05-19 22:28     ` Al Viro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).