LKML Archive on lore.kernel.org
From: "Serge E. Hallyn" <serge@hallyn.com>
To: Soren Hansen <soren@linux2go.dk>
Cc: linux-kernel@vger.kernel.org, Dave Hansen <haveblue@us.ibm.com>
Subject: Re: nbd locking problems
Date: Fri, 28 Jan 2011 21:18:26 +0000
Message-ID: <20110128211825.GA20707@mail.hallyn.com>
In-Reply-To: <AANLkTimP=5A-BNHUwV0qhvjw8y4HevMvXtOBpRr6vFYR@mail.gmail.com>

Quoting Soren Hansen (soren@linux2go.dk):
> I'm seeing the following lockup on a current kernel:
> 
> [117360.880311] INFO: task qemu-nbd:16115 blocked for more than 120 seconds.
> [117360.880319] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [117360.880326] qemu-nbd      D 0000000000000001     0 16115  16114 0x00000004
> [117360.880337]  ffff88007d775d98 0000000000000082 ffff88007d775fd8 ffff88007d774000
> [117360.880348]  0000000000013a80 ffff8800020347e0 ffff88007d775fd8 0000000000013a80
> [117360.880359]  ffff880133730000 ffff880002034440 ffffea0004333db8 ffffffffa071c020
> [117360.880369] Call Trace:
> [117360.880389]  [<ffffffff815b9997>] __mutex_lock_slowpath+0xf7/0x180
> [117360.880400]  [<ffffffff81132f13>] ? handle_mm_fault+0x1c3/0x410
> [117360.880408]  [<ffffffff815b93eb>] mutex_lock+0x2b/0x50
> [117360.880419]  [<ffffffffa071a21c>] nbd_ioctl+0x6c/0x1c0 [nbd]
> [117360.880429]  [<ffffffff812cb970>] blkdev_ioctl+0x230/0x730
> [117360.880438]  [<ffffffff811967a1>] block_ioctl+0x41/0x50
> [117360.880447]  [<ffffffff81175c03>] do_vfs_ioctl+0x93/0x370
> [117360.880455]  [<ffffffff8116fa83>] ? putname+0x33/0x50
> [117360.880463]  [<ffffffff81175f61>] sys_ioctl+0x81/0xa0
> [117360.880473]  [<ffffffff8100c0c2>] system_call_fastpath+0x16/0x1b
> 
> It seems to be caused by commit 2a48fc0ab24241755dc93bfd4f01d68efab47f5a,
> which replaced lock_kernel() with mutex operations.
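>
> The conversion is essentially this pattern (a sketch, not the literal
> diff -- the commit wraps the whole ioctl body):
>
>         static DEFINE_MUTEX(nbd_mutex);
>
>         static int nbd_ioctl(struct block_device *bdev, fmode_t mode,
>                              unsigned int cmd, unsigned long arg)
>         {
>                 int error;
>
>                 mutex_lock(&nbd_mutex);         /* was: lock_kernel(); */
>                 error = ...;                    /* existing ioctl switch */
>                 mutex_unlock(&nbd_mutex);       /* was: unlock_kernel(); */
>                 return error;
>         }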
> 
> In my attempts to debug this problem, I reverted that commit.
> 
> There's a printk statement that logs each ioctl call right before the
> lock_kernel call. I extended it to log the pid of the caller, and added
> a similar printk statement after unlock_kernel.
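>
> Concretely, the instrumentation is just this (a sketch --
> ioctl_cmd_to_ascii() is the helper nbd.c already uses for its debug
> output, the rest is approximate):
>
>         printk(KERN_INFO "%s: in: nbd_ioctl pid=(%d) cmd=%s(0x%x) arg=%lu\n",
>                lo->disk->disk_name, task_pid_nr(current),
>                ioctl_cmd_to_ascii(cmd), cmd, arg);
>         lock_kernel();
>         /* ... existing ioctl switch ... */
>         unlock_kernel();
>         printk(KERN_INFO "%s: out: nbd_ioctl pid=(%d) cmd=%s(0x%x) arg=%lu\n",
>                lo->disk->disk_name, task_pid_nr(current),
>                ioctl_cmd_to_ascii(cmd), cmd, arg);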
> 
> This worked for me, and I saw something like this in my dmesg:
> 
> [122322.650443] nbd9: in: nbd_ioctl pid=(24396) cmd=set-blksize(0xab01) arg=1024
> [122322.650454] nbd9: out: nbd_ioctl pid=(24396) cmd=set-blksize(0xab01) arg=1024
> [122322.650462] nbd9: in: nbd_ioctl pid=(24396) cmd=set-size-blocks(0xab07) arg=153600
> [122322.650469] nbd9: out: nbd_ioctl pid=(24396) cmd=set-size-blocks(0xab07) arg=153600
> [122322.650476] nbd9: in: nbd_ioctl pid=(24396) cmd=clear-sock(0xab04) arg=153600
> [122322.650483] nbd9: out: nbd_ioctl pid=(24396) cmd=clear-sock(0xab04) arg=153600
> [122322.650490] nbd9: in: nbd_ioctl pid=(24396) cmd=set-sock(0xab00) arg=3
> [122322.650497] nbd9: out: nbd_ioctl pid=(24396) cmd=set-sock(0xab00) arg=3
> [122322.651178] nbd9: in: nbd_ioctl pid=(24396) cmd=do-it(0xab03) arg=139925697560832
> [122322.652454]  nbd9: unknown partition table
> [122323.070522] nbd9: in: nbd_ioctl pid=(24427) cmd=clear-que(0xab05) arg=140267460190640
> [122323.070527] nbd9: out: nbd_ioctl pid=(24427) cmd=clear-que(0xab05) arg=140267460190640
> [122323.070530] nbd9: in: nbd_ioctl pid=(24427) cmd=disconnect(0xab08) arg=140267460190640
> [122323.070532] nbd9: NBD_DISCONNECT
> [122323.070555] nbd9: out: nbd_ioctl pid=(24427) cmd=disconnect(0xab08) arg=140267460190640
> [122323.070558] nbd9: in: nbd_ioctl pid=(24427) cmd=clear-sock(0xab04) arg=140267460190640
> [122323.070561] nbd9: out: nbd_ioctl pid=(24427) cmd=clear-sock(0xab04) arg=140267460190640
> [122323.070667] nbd9: Receive control failed (result -32)
> [122323.070710] nbd9: queue cleared
> [122323.071186] nbd9: out: nbd_ioctl pid=(24396) cmd=do-it(0xab03) arg=139925697560832
> [122323.071194] nbd9: in: nbd_ioctl pid=(24396) cmd=clear-que(0xab05) arg=32
> [122323.071197] nbd9: out: nbd_ioctl pid=(24396) cmd=clear-que(0xab05) arg=32
> [122323.071200] nbd9: in: nbd_ioctl pid=(24396) cmd=clear-sock(0xab04) arg=32
> [122323.071203] nbd9: out: nbd_ioctl pid=(24396) cmd=clear-sock(0xab04) arg=32
> 
> As I understand it, lock_kernel is implemented using a semaphore plus a
> per-task depth counter, which allows a single process to invoke
> lock_kernel more than once without deadlocking. However, as can be seen
> above (timestamp 122322.651178), pid 24396 invokes the NBD_DO_IT ioctl.
> This calls lock_kernel and starts a kernel thread to do the heavy
> lifting of nbd. Later on, pid 24427 comes along, wanting to shut down
> the connection. It works, but I don't understand how it makes it past
> the lock_kernel() call. There's only one instance each of lock_kernel
> and unlock_kernel in the nbd driver, so I'm fairly certain
> unlock_kernel isn't being called (at least not by nbd directly).
> 
> With the mutex code introduced by
> 2a48fc0ab24241755dc93bfd4f01d68efab47f5a but with my extra printk's, this
> is what I get:
> 
> [125474.139952] nbd9: in: nbd_ioctl pid=(26015) cmd=set-blksize(0xab01) arg=1024
> [125474.139963] nbd9: out: nbd_ioctl pid=(26015) cmd=set-blksize(0xab01) arg=1024
> [125474.139971] nbd9: in: nbd_ioctl pid=(26015) cmd=set-size-blocks(0xab07) arg=153600
> [125474.139979] nbd9: out: nbd_ioctl pid=(26015) cmd=set-size-blocks(0xab07) arg=153600
> [125474.139986] nbd9: in: nbd_ioctl pid=(26015) cmd=clear-sock(0xab04) arg=153600
> [125474.139994] nbd9: out: nbd_ioctl pid=(26015) cmd=clear-sock(0xab04) arg=153600
> [125474.140001] nbd9: in: nbd_ioctl pid=(26015) cmd=set-sock(0xab00) arg=3
> [125474.140036] nbd9: out: nbd_ioctl pid=(26015) cmd=set-sock(0xab00) arg=3
> [125474.140660] nbd9: in: nbd_ioctl pid=(26015) cmd=do-it(0xab03) arg=140041726828800
> [125474.142255]  nbd9: unknown partition table
> [125475.604613] nbd9: in: nbd_ioctl pid=(26047) cmd=clear-que(0xab05) arg=139640717427120
> 
> ...and then it sits there until I kill -9 26015.  This makes sense to
> me, though: the mutex is held by 26015, so of course 26047 can't grab
> it too.  The mystery is why it works with lock_kernel: nbd never
> releases the kernel lock explicitly, and it doesn't (explicitly, at
> least) call schedule().

Thanks for looking into this, Soren.

As Dave just explained to me, the BKL is released when you sleep :)  I
assume that's how it gets released around the kthread_create().  I
*think* you're right that the new mutex is superfluous, but I'd like to
look through the ioctl code and make sure there's no shared state which
we need to protect.  I don't see how there could be, or rather, if there
were, then it was broken before.
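
For the record, the piece that makes the BKL behave that way lives in
the scheduler -- roughly this, abbreviated from kernel/sched.c of that
era:

        asmlinkage void __sched schedule(void)
        {
                ...
                release_kernel_lock(prev);      /* drop BKL if lock_depth >= 0 */
                ...
                context_switch(rq, prev, next);
                ...
                reacquire_kernel_lock(current); /* take it back on wakeup */
                ...
        }

So the moment the NBD_DO_IT caller sleeps, the BKL is quietly dropped
and the second pid's ioctls can get in; the mutex, of course, stays
held across the sleep, which is exactly the hang you're seeing.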

In the meantime, could you please post your patch?  (Is it the
three-liner I'd expect?)
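
If it is, I'd guess it looks roughly like this -- hand-written here,
whitespace and context approximate:

        -static DEFINE_MUTEX(nbd_mutex);
         ...
        -       mutex_lock(&nbd_mutex);
         ...
        -       mutex_unlock(&nbd_mutex);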

thanks,
-serge

Thread overview: 8+ messages
2011-01-26 10:52 Soren Hansen
2011-01-28 21:18 ` Serge E. Hallyn [this message]
2011-01-28 21:30   ` Arnd Bergmann
2011-01-28 23:11   ` Soren Hansen
2011-03-13 10:28     ` Pavel Machek
2011-01-31 19:35   ` Serge E. Hallyn
2011-01-28 23:23 ` [PATCH] nbd: Remove module-level ioctl mutex Soren Hansen
2011-01-30 15:07 nbd locking problems Paul Clements
