LKML Archive on lore.kernel.org
* dget BUG from proc_exe_link in 2.6.6-mm2
@ 2004-05-17  0:37 Neil Brown
From: Neil Brown @ 2004-05-17  0:37 UTC
  To: linux-kernel; +Cc: Andrew Morton, Andrea Arcangeli


It is entirely possible that this is a known problem, as I saw a bit
of discussion in linux-kernel about locking and VMAs, but..

I had 2.6.6-mm2 running over the weekend, and some nightly jobs,
including /usr/sbin/dpkg-statoverride, triggered:


        ___      ______
      0--,|    /OOOOOO\
     {_o  /  /OO plop OO\
       \__\_/OO oh dear OOO\s
          \OOOOOOOOOOOOOOOO/
           __XXX__   __XXX__
------------[ cut here ]------------
kernel BUG at include/linux/dcache.h:277!
invalid operand: 0000 [#5]
SMP DEBUG_PAGEALLOC
Modules linked in:
CPU:    0
EIP:    0060:[<c0194c97>]    Not tainted VLI
EFLAGS: 00010246   (2.6.6-mm2) 
EIP is at proc_exe_link+0x107/0x150
eax: 00000000   ebx: e7d17dc4   ecx: e2be1f60   edx: ecfa4f58
esi: e7d17de4   edi: de698fc8   ebp: e2be1f60   esp: e2be1f24
ds: 007b   es: 007b   ss: 0068
Process dpkg-statoverri (pid: 26426, threadinfo=e2be1000 task=de698a50)
Stack: c0d8a000 c011505b fffffffe e2be1f5c 00000282 0024c5a3 00000000 ead86e94 
       00000000 00000fff c0195cca ead86e94 ead86e94 bfffec40 40a7ce68 f7fa1720 
       40a7ce68 1de1cfe4 c043dee0 ead86e94 00000fff e2be1f80 c0169187 c67ccf58 
Call Trace:
 [<c011505b>] kernel_map_pages+0x2b/0x68
 [<c0195cca>] proc_pid_readlink+0x8a/0x160
 [<c0169187>] sys_readlink+0x87/0x90
 [<c014efdb>] sys_brk+0xeb/0x120
 [<c0105dff>] syscall_call+0x7/0xb


It would seem that a readlink of /proc/$PID/exe is happening while the
vma is being torn down, and when we dget(vma->vm_file->f_dentry), the
dentry has already been dput by someone.  So presumably whoever calls
fput on vma->vm_file isn't doing it under mm->mmap_sem.

Well, that's my guess.  I'll let you know if anything happens under
2.6.6-mm3.

NeilBrown


* Re: dget BUG from proc_exe_link in 2.6.6-mm2
@ 2004-05-17  2:31 Neil Brown
From: Neil Brown @ 2004-05-17  2:31 UTC
  To: Andrew Morton; +Cc: linux-kernel, andrea, Hugh Dickins

On Sunday May 16, akpm@osdl.org wrote:
> exit_mmap() (at least) doesn't hold down_write(mmap_sem), and never has -
> it assumes that there are no more references to the going-away mm's vma
> tree.  It forgot about /proc.  I don't immediately see why this is a new
> bug.
> 
> I dunno if this will work, but I do know that it'll cause deadlocks every
> time when the oops code tries to kill off the oopsing task via do_exit(),
> which is a bit unfortunate.
> 

You don't really need to protect the remove_vm_struct.  You only need
to protect mm->mmap.  It should be sufficient to 'down' and 'up' the
semaphore after "mm->mmap = NULL" and before calling remove_vm_struct.
That will synchronise with proc_exe_link.

NeilBrown


 ----------- Diffstat output ------------
 ./mm/mmap.c |    7 +++++++
 1 files changed, 7 insertions(+)

diff ./mm/mmap.c~current~ ./mm/mmap.c
--- ./mm/mmap.c~current~	2004-05-17 12:28:07.000000000 +1000
+++ ./mm/mmap.c	2004-05-17 12:28:09.000000000 +1000
@@ -1493,6 +1493,13 @@ void exit_mmap(struct mm_struct *mm)
 
 	spin_unlock(&mm->page_table_lock);
 
+	down_write(&mm->mmap_sem);
+	/* anyone who might have grabbed mm->mmap before we NULLed it
+	 * should have done so under mm->mmap_sem (e.g. proc_exe_link)
+	 * and so will have let go of it by now, so it is safe to tear it down
+	 */
+	up_write(&mm->mmap_sem);
+
 	/*
 	 * Walk the list again, actually closing and freeing it
 	 * without holding any MM locks.
