LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!
@ 2006-08-08 14:39 Jesper Juhl
  2006-08-09  5:53 ` Grant Coady
  2006-08-17  6:49 ` [NFS] " Neil Brown
  0 siblings, 2 replies; 16+ messages in thread
From: Jesper Juhl @ 2006-08-08 14:39 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Trond Myklebust, nfs

I have some webservers that have recently started reporting the
following message in their logs :

  do_vfs_lock: VFS is out of sync with lock manager!

The serveres kernels were upgraded to 2.6.17.8 and since the upgrade
the message started appearing.
The servers were previously running 2.6.13.4 without experiencing this problem.
Nothing has changed except the kernel.

I've googled a bit and found this mail
(http://lkml.org/lkml/2005/8/23/254) from Trond saying that
"The above is a lockd error that states that the VFS is failing to track
your NFS locks correctly".
Ok, but that doesn't really help me resolve the issue. The servers are
indeed running NFS and access their apache DocumentRoots from a NFS
mount.

Is there anything I can do to help track down this issue?

-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!
  2006-08-08 14:39 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager! Jesper Juhl
@ 2006-08-09  5:53 ` Grant Coady
  2006-08-09  8:07   ` Jesper Juhl
  2006-08-17  6:49 ` [NFS] " Neil Brown
  1 sibling, 1 reply; 16+ messages in thread
From: Grant Coady @ 2006-08-09  5:53 UTC (permalink / raw)
  To: Jesper Juhl; +Cc: Linux Kernel Mailing List, Trond Myklebust, nfs

On Tue, 8 Aug 2006 16:39:54 +0200, "Jesper Juhl" <jesper.juhl@gmail.com> wrote:

>I have some webservers that have recently started reporting the
>following message in their logs :
>
>  do_vfs_lock: VFS is out of sync with lock manager!
>
>The serveres kernels were upgraded to 2.6.17.8 and since the upgrade
>the message started appearing.
>The servers were previously running 2.6.13.4 without experiencing this problem.
>Nothing has changed except the kernel.
>
>I've googled a bit and found this mail
>(http://lkml.org/lkml/2005/8/23/254) from Trond saying that
>"The above is a lockd error that states that the VFS is failing to track
>your NFS locks correctly".
>Ok, but that doesn't really help me resolve the issue. The servers are
>indeed running NFS and access their apache DocumentRoots from a NFS
>mount.
>
>Is there anything I can do to help track down this issue?

I don't have an answer, but offer this observation: five boxen running 
2.6.17.8 doing six simultaneous

  bzcat /home/share/linux-2.6/patch-2.6.18-rc4.bz2|patch -p1

didn't burp.  The /home/share/ is an NFS export from another box running 
2.4.33-rc3a, me not sure if this was exercising any NFS locking as the 
NFS source file was only opened for non-exclusive read-only.

Grant.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!
  2006-08-09  5:53 ` Grant Coady
@ 2006-08-09  8:07   ` Jesper Juhl
  2006-08-10 22:37     ` Jesper Juhl
  0 siblings, 1 reply; 16+ messages in thread
From: Jesper Juhl @ 2006-08-09  8:07 UTC (permalink / raw)
  To: Grant Coady; +Cc: Linux Kernel Mailing List, Trond Myklebust, nfs

On 09/08/06, Grant Coady <gcoady.lk@gmail.com> wrote:
> On Tue, 8 Aug 2006 16:39:54 +0200, "Jesper Juhl" <jesper.juhl@gmail.com> wrote:
>
> >I have some webservers that have recently started reporting the
> >following message in their logs :
> >
> >  do_vfs_lock: VFS is out of sync with lock manager!
> >
> >The serveres kernels were upgraded to 2.6.17.8 and since the upgrade
> >the message started appearing.
> >The servers were previously running 2.6.13.4 without experiencing this problem.
> >Nothing has changed except the kernel.
> >
> >I've googled a bit and found this mail
> >(http://lkml.org/lkml/2005/8/23/254) from Trond saying that
> >"The above is a lockd error that states that the VFS is failing to track
> >your NFS locks correctly".
> >Ok, but that doesn't really help me resolve the issue. The servers are
> >indeed running NFS and access their apache DocumentRoots from a NFS
> >mount.
> >
> >Is there anything I can do to help track down this issue?
>
> I don't have an answer, but offer this observation: five boxen running
> 2.6.17.8 doing six simultaneous
>
>   bzcat /home/share/linux-2.6/patch-2.6.18-rc4.bz2|patch -p1
>
> didn't burp.  The /home/share/ is an NFS export from another box running
> 2.4.33-rc3a, me not sure if this was exercising any NFS locking as the
> NFS source file was only opened for non-exclusive read-only.
>
The NFS server here is running 2.6.11.11 and doesn't seem to be
reporting any problems. But I now have two more of my webservers (both
running 2.6.17.8) that have started to complain about "do_vfs_lock:
VFS is out of sync with lock manager!"

I've not found a way to cause the message to be repported at will unfortunately.

-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!
  2006-08-09  8:07   ` Jesper Juhl
@ 2006-08-10 22:37     ` Jesper Juhl
  2006-08-11  0:30       ` Grant Coady
  2006-08-13 23:08       ` Grant Coady
  0 siblings, 2 replies; 16+ messages in thread
From: Jesper Juhl @ 2006-08-10 22:37 UTC (permalink / raw)
  To: Grant Coady; +Cc: Linux Kernel Mailing List, Trond Myklebust, nfs

On 09/08/06, Jesper Juhl <jesper.juhl@gmail.com> wrote:
> On 09/08/06, Grant Coady <gcoady.lk@gmail.com> wrote:
> > On Tue, 8 Aug 2006 16:39:54 +0200, "Jesper Juhl" <jesper.juhl@gmail.com> wrote:
> >
> > >I have some webservers that have recently started reporting the
> > >following message in their logs :
> > >
> > >  do_vfs_lock: VFS is out of sync with lock manager!
> > >
> > >The serveres kernels were upgraded to 2.6.17.8 and since the upgrade
> > >the message started appearing.
> > >The servers were previously running 2.6.13.4 without experiencing this problem.
> > >Nothing has changed except the kernel.
> > >
> > >I've googled a bit and found this mail
> > >(http://lkml.org/lkml/2005/8/23/254) from Trond saying that
> > >"The above is a lockd error that states that the VFS is failing to track
> > >your NFS locks correctly".
> > >Ok, but that doesn't really help me resolve the issue. The servers are
> > >indeed running NFS and access their apache DocumentRoots from a NFS
> > >mount.
> > >
> > >Is there anything I can do to help track down this issue?
> >
> > I don't have an answer, but offer this observation: five boxen running
> > 2.6.17.8 doing six simultaneous
> >
> >   bzcat /home/share/linux-2.6/patch-2.6.18-rc4.bz2|patch -p1
> >
> > didn't burp.  The /home/share/ is an NFS export from another box running
> > 2.4.33-rc3a, me not sure if this was exercising any NFS locking as the
> > NFS source file was only opened for non-exclusive read-only.
> >
> The NFS server here is running 2.6.11.11 and doesn't seem to be
> reporting any problems. But I now have two more of my webservers (both
> running 2.6.17.8) that have started to complain about "do_vfs_lock:
> VFS is out of sync with lock manager!"
>
> I've not found a way to cause the message to be repported at will unfortunately.
>
Today 3 more of my webservers running 2.6.17.8 reported this message.
The machines all seem to be running fine still, so it doesn't seem to
be a serious problem, but it would still be nice to get it fixed ;)


-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!
  2006-08-10 22:37     ` Jesper Juhl
@ 2006-08-11  0:30       ` Grant Coady
  2006-08-13 23:08       ` Grant Coady
  1 sibling, 0 replies; 16+ messages in thread
From: Grant Coady @ 2006-08-11  0:30 UTC (permalink / raw)
  To: Jesper Juhl; +Cc: Linux Kernel Mailing List, Trond Myklebust, nfs

On Fri, 11 Aug 2006 00:37:35 +0200, "Jesper Juhl" <jesper.juhl@gmail.com> wrote:

>On 09/08/06, Jesper Juhl <jesper.juhl@gmail.com> wrote:
>> On 09/08/06, Grant Coady <gcoady.lk@gmail.com> wrote:
>> > On Tue, 8 Aug 2006 16:39:54 +0200, "Jesper Juhl" <jesper.juhl@gmail.com> wrote:
>> >
>> > >I have some webservers that have recently started reporting the
>> > >following message in their logs :
>> > >
>> > >  do_vfs_lock: VFS is out of sync with lock manager!
>> > >
>> > >The serveres kernels were upgraded to 2.6.17.8 and since the upgrade
>> > >the message started appearing.
>> > >The servers were previously running 2.6.13.4 without experiencing this problem.
>> > >Nothing has changed except the kernel.
>> > >
>> > >I've googled a bit and found this mail
>> > >(http://lkml.org/lkml/2005/8/23/254) from Trond saying that
>> > >"The above is a lockd error that states that the VFS is failing to track
>> > >your NFS locks correctly".
>> > >Ok, but that doesn't really help me resolve the issue. The servers are
>> > >indeed running NFS and access their apache DocumentRoots from a NFS
>> > >mount.
>> > >
>> > >Is there anything I can do to help track down this issue?
>> >
>> > I don't have an answer, but offer this observation: five boxen running
>> > 2.6.17.8 doing six simultaneous
>> >
>> >   bzcat /home/share/linux-2.6/patch-2.6.18-rc4.bz2|patch -p1
>> >
>> > didn't burp.  The /home/share/ is an NFS export from another box running
>> > 2.4.33-rc3a, me not sure if this was exercising any NFS locking as the
>> > NFS source file was only opened for non-exclusive read-only.
>> >
>> The NFS server here is running 2.6.11.11 and doesn't seem to be
>> reporting any problems. But I now have two more of my webservers (both
>> running 2.6.17.8) that have started to complain about "do_vfs_lock:
>> VFS is out of sync with lock manager!"
>>
>> I've not found a way to cause the message to be repported at will unfortunately.
>>
>Today 3 more of my webservers running 2.6.17.8 reported this message.
>The machines all seem to be running fine still, so it doesn't seem to
>be a serious problem, but it would still be nice to get it fixed ;)

Do you have some test case other than web server that triggers it?  
I can try it here, recently did much NFS testing reported under 
2.4.33-rc1 or -rc2 on lkml to try sort a vfs_unlink issue.

My web server here is low volume, not a good test situation -- plus 
it no longer runs under 2.6 and I don't want to take it down too often 
to sort out why.  Once it could dual boot 2.4 or 2.6 without trouble, 
but that option fell apart when 2.6.16 came out.

Grant.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!
  2006-08-10 22:37     ` Jesper Juhl
  2006-08-11  0:30       ` Grant Coady
@ 2006-08-13 23:08       ` Grant Coady
  1 sibling, 0 replies; 16+ messages in thread
From: Grant Coady @ 2006-08-13 23:08 UTC (permalink / raw)
  To: Jesper Juhl; +Cc: Linux Kernel Mailing List, Trond Myklebust, nfs

On Fri, 11 Aug 2006 00:37:35 +0200, "Jesper Juhl" <jesper.juhl@gmail.com> wrote:

>On 09/08/06, Jesper Juhl <jesper.juhl@gmail.com> wrote:
>> On 09/08/06, Grant Coady <gcoady.lk@gmail.com> wrote:
>> > On Tue, 8 Aug 2006 16:39:54 +0200, "Jesper Juhl" <jesper.juhl@gmail.com> wrote:
>> >
>> > >I have some webservers that have recently started reporting the
>> > >following message in their logs :
>> > >
>> > >  do_vfs_lock: VFS is out of sync with lock manager!
>> > >
>> > >The serveres kernels were upgraded to 2.6.17.8 and since the upgrade
>> > >the message started appearing.
>> > >The servers were previously running 2.6.13.4 without experiencing this problem.
>> > >Nothing has changed except the kernel.
>> > >
>> > >I've googled a bit and found this mail
>> > >(http://lkml.org/lkml/2005/8/23/254) from Trond saying that
>> > >"The above is a lockd error that states that the VFS is failing to track
>> > >your NFS locks correctly".
>> > >Ok, but that doesn't really help me resolve the issue. The servers are
>> > >indeed running NFS and access their apache DocumentRoots from a NFS
>> > >mount.
>> > >
>> > >Is there anything I can do to help track down this issue?
>> >
>> > I don't have an answer, but offer this observation: five boxen running
>> > 2.6.17.8 doing six simultaneous
>> >
>> >   bzcat /home/share/linux-2.6/patch-2.6.18-rc4.bz2|patch -p1
>> >
>> > didn't burp.  The /home/share/ is an NFS export from another box running
>> > 2.4.33-rc3a, me not sure if this was exercising any NFS locking as the
>> > NFS source file was only opened for non-exclusive read-only.
>> >
>> The NFS server here is running 2.6.11.11 and doesn't seem to be
>> reporting any problems. But I now have two more of my webservers (both
>> running 2.6.17.8) that have started to complain about "do_vfs_lock:
>> VFS is out of sync with lock manager!"
>>
>> I've not found a way to cause the message to be repported at will unfortunately.
>>
>Today 3 more of my webservers running 2.6.17.8 reported this message.
>The machines all seem to be running fine still, so it doesn't seem to
>be a serious problem, but it would still be nice to get it fixed ;)

I'm running continuous kernel 2.4.33 rebuild from make mrproper plus 
another console extracting tarball, diff tree against last_extracted, 
on pair of 2.6.17.8 boxen overnight with NFS TCP support, no problems, 
now testing without TCP support.  Report again only if I see problems.  

Let me know if you want to see test scripts.

Grant.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [NFS] 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!
  2006-08-08 14:39 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager! Jesper Juhl
  2006-08-09  5:53 ` Grant Coady
@ 2006-08-17  6:49 ` Neil Brown
  2006-08-17  9:58   ` Jesper Juhl
  1 sibling, 1 reply; 16+ messages in thread
From: Neil Brown @ 2006-08-17  6:49 UTC (permalink / raw)
  To: Jesper Juhl; +Cc: Linux Kernel Mailing List, nfs, Trond Myklebust

On Tuesday August 8, jesper.juhl@gmail.com wrote:
> I have some webservers that have recently started reporting the
> following message in their logs :
> 
>   do_vfs_lock: VFS is out of sync with lock manager!


I can imagine that happening if you mount with '-o nolocks'.
Then a non-blocking lock could cause that message (I think).
Can you conform that you aren't using 'nolocks'.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [NFS] 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!
  2006-08-17  6:49 ` [NFS] " Neil Brown
@ 2006-08-17  9:58   ` Jesper Juhl
  2006-08-21  3:34     ` Neil Brown
  0 siblings, 1 reply; 16+ messages in thread
From: Jesper Juhl @ 2006-08-17  9:58 UTC (permalink / raw)
  To: Neil Brown; +Cc: Linux Kernel Mailing List, nfs, Trond Myklebust

On 17/08/06, Neil Brown <neilb@suse.de> wrote:
> On Tuesday August 8, jesper.juhl@gmail.com wrote:
> > I have some webservers that have recently started reporting the
> > following message in their logs :
> >
> >   do_vfs_lock: VFS is out of sync with lock manager!
>
>
> I can imagine that happening if you mount with '-o nolocks'.
> Then a non-blocking lock could cause that message (I think).
> Can you conform that you aren't using 'nolocks'.
>
Confirmed.

All my webservers (the ones that generate this message) are identical
and this is the filesystems they have and their mount options:

/ is on a local scsi disk, ext3 fs, mounted with (rw)
/boot is on a local scsi disk, ext3 fs, mounted with (rw)
users homedirs (where the DocumentRoots are) are NFS mounted with
mount options (rw,rsize=8192,wsize=8192,hard,intr,addr=xxx.xxx.xxx.xxx)

-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [NFS] 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!
  2006-08-17  9:58   ` Jesper Juhl
@ 2006-08-21  3:34     ` Neil Brown
  2006-08-21 19:54       ` Trond Myklebust
  0 siblings, 1 reply; 16+ messages in thread
From: Neil Brown @ 2006-08-21  3:34 UTC (permalink / raw)
  To: Jesper Juhl; +Cc: nfs, Linux Kernel Mailing List, Trond Myklebust

On Thursday August 17, jesper.juhl@gmail.com wrote:
> On 17/08/06, Neil Brown <neilb@suse.de> wrote:
> > On Tuesday August 8, jesper.juhl@gmail.com wrote:
> > > I have some webservers that have recently started reporting the
> > > following message in their logs :
> > >
> > >   do_vfs_lock: VFS is out of sync with lock manager!
> >
> >
> > I can imagine that happening if you mount with '-o nolocks'.
> > Then a non-blocking lock could cause that message (I think).
> > Can you conform that you aren't using 'nolocks'.
> >
> Confirmed.

Thanks.  I suspected as much but don't like to assume.

I've look more thoroughly at this code and I think the message is
meaningless and can be ignored.

Looking in fs/nfs/file.c (at 2.6.18-rc4-mm1 if it matters, but 2.6.17
is much the same)

 - do_vfs_lock is only called when the filesystem was mounted with
    -o nolock  EXCEPT
 - If a lock request to the server in interrupted (when mounted with
     -o intr) then do_vfs_lock is called to try to get the lock
    locally.  Normally equivalent code will be called inside
    fs/lockd/clntproc.c when the server replies that the lock has been
    gained.  In the case of an interrupt though this doesn't happen
    but the lock may still have happened on the server.  So we record
    locally that the lock was gained, to ensure that it gets unlocked
    when the process exits.

As you don't have '-o nolocks' you must be hitting the second case.
The lock call to the server returns -EINTR or -ERESTARTSYS and
do_vfs_lock is called just-in-case.  
As this is a just-in-case call, it is quite possible that the lock is
held by some other process, so getting an error is entirely possible.
So printing the message in this case seems wrong.

On the other hand, printing the message in any other case seems wrong
too, as server locking is not being used, so there is nothing to get
out of sync with.

As a further complication, I don't think that in the just-in-case
situation that it should risk waiting for the lock.
Now maybe we can be sure there is a pending signal which will break
out of any wait (though I'm worried about -ERESTARTSYS - that doesn't
imply a signal does it?), but I would feel more comfortable if
FL_SLEEP were turned off in that path.

So: Trond:  Any obvious errors in the above?
Is the following patch ok?

NeilBrown


Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./fs/nfs/file.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff .prev/fs/nfs/file.c ./fs/nfs/file.c
--- .prev/fs/nfs/file.c	2006-08-21 13:28:25.000000000 +1000
+++ ./fs/nfs/file.c	2006-08-21 13:30:27.000000000 +1000
@@ -452,9 +452,6 @@ static int do_vfs_lock(struct file *file
 		default:
 			BUG();
 	}
-	if (res < 0)
-		printk(KERN_WARNING "%s: VFS is out of sync with lock manager!\n",
-				__FUNCTION__);
 	return res;
 }
 
@@ -504,10 +501,13 @@ static int do_setlk(struct file *filp, i
 		 * we clean up any state on the server. We therefore
 		 * record the lock call as having succeeded in order to
 		 * ensure that locks_remove_posix() cleans it out when
-		 * the process exits.
+		 * the process exits.  Make sure not to sleep if
+		 * someone else holds the lock.
 		 */
-		if (status == -EINTR || status == -ERESTARTSYS)
+		if (status == -EINTR || status == -ERESTARTSYS) {
+			fl->fl_flags &= ~FL_SLEEP;
 			do_vfs_lock(filp, fl);
+		}
 	} else
 		status = do_vfs_lock(filp, fl);
 	unlock_kernel();

 
  

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [NFS] 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!
  2006-08-21  3:34     ` Neil Brown
@ 2006-08-21 19:54       ` Trond Myklebust
  2006-11-21 12:43         ` Jesper Juhl
  2007-01-29  5:08         ` Neil Brown
  0 siblings, 2 replies; 16+ messages in thread
From: Trond Myklebust @ 2006-08-21 19:54 UTC (permalink / raw)
  To: Neil Brown; +Cc: Jesper Juhl, nfs, Linux Kernel Mailing List

On Mon, 2006-08-21 at 13:34 +1000, Neil Brown wrote:
> Looking in fs/nfs/file.c (at 2.6.18-rc4-mm1 if it matters, but 2.6.17
> is much the same)
> 
>  - do_vfs_lock is only called when the filesystem was mounted with
>     -o nolock  EXCEPT
>  - If a lock request to the server in interrupted (when mounted with
>      -o intr) then do_vfs_lock is called to try to get the lock
>     locally.  Normally equivalent code will be called inside
>     fs/lockd/clntproc.c when the server replies that the lock has been
>     gained.  In the case of an interrupt though this doesn't happen
>     but the lock may still have happened on the server.  So we record
>     locally that the lock was gained, to ensure that it gets unlocked
>     when the process exits.
> 
> As you don't have '-o nolocks' you must be hitting the second case.
> The lock call to the server returns -EINTR or -ERESTARTSYS and
> do_vfs_lock is called just-in-case.  
> As this is a just-in-case call, it is quite possible that the lock is
> held by some other process, so getting an error is entirely possible.
> So printing the message in this case seems wrong.
> 
> On the other hand, printing the message in any other case seems wrong
> too, as server locking is not being used, so there is nothing to get
> out of sync with.
> 
> As a further complication, I don't think that in the just-in-case
> situation that it should risk waiting for the lock.
> Now maybe we can be sure there is a pending signal which will break
> out of any wait (though I'm worried about -ERESTARTSYS - that doesn't
> imply a signal does it?), but I would feel more comfortable if
> FL_SLEEP were turned off in that path.
> 
> So: Trond:  Any obvious errors in the above?
> Is the following patch ok?

Could we instead replace it with a dprintk() that returns the value of
"res"? That will keep it useful for debugging purposes.

Cheers,
  Trond


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [NFS] 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!
  2006-08-21 19:54       ` Trond Myklebust
@ 2006-11-21 12:43         ` Jesper Juhl
  2006-11-27  9:19           ` Jesper Juhl
  2007-01-29  5:08         ` Neil Brown
  1 sibling, 1 reply; 16+ messages in thread
From: Jesper Juhl @ 2006-11-21 12:43 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Neil Brown, nfs, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 3882 bytes --]

On 21/08/06, Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> On Mon, 2006-08-21 at 13:34 +1000, Neil Brown wrote:
> > Looking in fs/nfs/file.c (at 2.6.18-rc4-mm1 if it matters, but 2.6.17
> > is much the same)
> >
> >  - do_vfs_lock is only called when the filesystem was mounted with
> >     -o nolock  EXCEPT
> >  - If a lock request to the server in interrupted (when mounted with
> >      -o intr) then do_vfs_lock is called to try to get the lock
> >     locally.  Normally equivalent code will be called inside
> >     fs/lockd/clntproc.c when the server replies that the lock has been
> >     gained.  In the case of an interrupt though this doesn't happen
> >     but the lock may still have happened on the server.  So we record
> >     locally that the lock was gained, to ensure that it gets unlocked
> >     when the process exits.
> >
> > As you don't have '-o nolocks' you must be hitting the second case.
> > The lock call to the server returns -EINTR or -ERESTARTSYS and
> > do_vfs_lock is called just-in-case.
> > As this is a just-in-case call, it is quite possible that the lock is
> > held by some other process, so getting an error is entirely possible.
> > So printing the message in this case seems wrong.
> >
> > On the other hand, printing the message in any other case seems wrong
> > too, as server locking is not being used, so there is nothing to get
> > out of sync with.
> >
> > As a further complication, I don't think that in the just-in-case
> > situation that it should risk waiting for the lock.
> > Now maybe we can be sure there is a pending signal which will break
> > out of any wait (though I'm worried about -ERESTARTSYS - that doesn't
> > imply a signal does it?), but I would feel more comfortable if
> > FL_SLEEP were turned off in that path.
> >
> > So: Trond:  Any obvious errors in the above?
> > Is the following patch ok?
>
> Could we instead replace it with a dprintk() that returns the value of
> "res"? That will keep it useful for debugging purposes.
>

How about the below?
(compile tested only)

Neil: I left your Signed-off-by line since I just modified your patch slightly.

Since Gmail will probably mangle the inline patch, it is attached as well.


Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
--

 fs/nfs/file.c |   11 +++++++----
 1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index cc93865..22572af 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -428,8 +428,8 @@ static int do_vfs_lock(struct file *file
                        BUG();
        }
        if (res < 0)
-               printk(KERN_WARNING "%s: VFS is out of sync with lock
manager!\n",
-                               __FUNCTION__);
+               dprintk("%s: VFS is out of sync with lock manager (res
= %d)!\n",
+                               __FUNCTION__, res);
        return res;
 }

@@ -479,10 +479,13 @@ static int do_setlk(struct file *filp, i
                 * we clean up any state on the server. We therefore
                 * record the lock call as having succeeded in order to
                 * ensure that locks_remove_posix() cleans it out when
-                * the process exits.
+                * the process exits. Make sure not to sleep if
+                * someone else holds the lock.
                 */
-               if (status == -EINTR || status == -ERESTARTSYS)
+               if (status == -EINTR || status == -ERESTARTSYS) {
+                       fl->fl_flags &= ~FL_SLEEP;
                        do_vfs_lock(filp, fl);
+               }
        } else
                status = do_vfs_lock(filp, fl);
        unlock_kernel();



-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

[-- Attachment #2: VFS_is_out_of_sync_with_lock_manager.diff --]
[-- Type: text/x-patch, Size: 1176 bytes --]


Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
-- 

 fs/nfs/file.c |   11 +++++++----
 1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index cc93865..22572af 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -428,8 +428,8 @@ static int do_vfs_lock(struct file *file
 			BUG();
 	}
 	if (res < 0)
-		printk(KERN_WARNING "%s: VFS is out of sync with lock manager!\n",
-				__FUNCTION__);
+		dprintk("%s: VFS is out of sync with lock manager (res = %d)!\n",
+				__FUNCTION__, res);
 	return res;
 }
 
@@ -479,10 +479,13 @@ static int do_setlk(struct file *filp, i
 		 * we clean up any state on the server. We therefore
 		 * record the lock call as having succeeded in order to
 		 * ensure that locks_remove_posix() cleans it out when
-		 * the process exits.
+		 * the process exits. Make sure not to sleep if
+		 * someone else holds the lock.
 		 */
-		if (status == -EINTR || status == -ERESTARTSYS)
+		if (status == -EINTR || status == -ERESTARTSYS) {
+			fl->fl_flags &= ~FL_SLEEP;
 			do_vfs_lock(filp, fl);
+		}
 	} else
 		status = do_vfs_lock(filp, fl);
 	unlock_kernel();


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [NFS] 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!
  2006-11-21 12:43         ` Jesper Juhl
@ 2006-11-27  9:19           ` Jesper Juhl
  0 siblings, 0 replies; 16+ messages in thread
From: Jesper Juhl @ 2006-11-27  9:19 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Neil Brown, nfs, Linux Kernel Mailing List

Any chance we could get the patch below (or something similar) pushed
into 2.6.19?


On 21/11/06, Jesper Juhl <jesper.juhl@gmail.com> wrote:
> On 21/08/06, Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> > On Mon, 2006-08-21 at 13:34 +1000, Neil Brown wrote:
> > > Looking in fs/nfs/file.c (at 2.6.18-rc4-mm1 if it matters, but 2.6.17
> > > is much the same)
> > >
> > >  - do_vfs_lock is only called when the filesystem was mounted with
> > >     -o nolock  EXCEPT
> > >  - If a lock request to the server in interrupted (when mounted with
> > >      -o intr) then do_vfs_lock is called to try to get the lock
> > >     locally.  Normally equivalent code will be called inside
> > >     fs/lockd/clntproc.c when the server replies that the lock has been
> > >     gained.  In the case of an interrupt though this doesn't happen
> > >     but the lock may still have happened on the server.  So we record
> > >     locally that the lock was gained, to ensure that it gets unlocked
> > >     when the process exits.
> > >
> > > As you don't have '-o nolocks' you must be hitting the second case.
> > > The lock call to the server returns -EINTR or -ERESTARTSYS and
> > > do_vfs_lock is called just-in-case.
> > > As this is a just-in-case call, it is quite possible that the lock is
> > > held by some other process, so getting an error is entirely possible.
> > > So printing the message in this case seems wrong.
> > >
> > > On the other hand, printing the message in any other case seems wrong
> > > too, as server locking is not being used, so there is nothing to get
> > > out of sync with.
> > >
> > > As a further complication, I don't think that in the just-in-case
> > > situation that it should risk waiting for the lock.
> > > Now maybe we can be sure there is a pending signal which will break
> > > out of any wait (though I'm worried about -ERESTARTSYS - that doesn't
> > > imply a signal does it?), but I would feel more comfortable if
> > > FL_SLEEP were turned off in that path.
> > >
> > > So: Trond:  Any obvious errors in the above?
> > > Is the following patch ok?
> >
> > Could we instead replace it with a dprintk() that returns the value of
> > "res"? That will keep it useful for debugging purposes.
> >
>
> How about the below?
> (compile tested only)
>
> Neil: I left your Signed-off-by line since I just modified your patch slightly.
>
> Since Gmail will probably mangle the inline patch, it is attached as well.
>
>
> Signed-off-by: Neil Brown <neilb@suse.de>
> Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
> --
>
>  fs/nfs/file.c |   11 +++++++----
>  1 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index cc93865..22572af 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -428,8 +428,8 @@ static int do_vfs_lock(struct file *file
>                         BUG();
>         }
>         if (res < 0)
> -               printk(KERN_WARNING "%s: VFS is out of sync with lock
> manager!\n",
> -                               __FUNCTION__);
> +               dprintk("%s: VFS is out of sync with lock manager (res
> = %d)!\n",
> +                               __FUNCTION__, res);
>         return res;
>  }
>
> @@ -479,10 +479,13 @@ static int do_setlk(struct file *filp, i
>                  * we clean up any state on the server. We therefore
>                  * record the lock call as having succeeded in order to
>                  * ensure that locks_remove_posix() cleans it out when
> -                * the process exits.
> +                * the process exits. Make sure not to sleep if
> +                * someone else holds the lock.
>                  */
> -               if (status == -EINTR || status == -ERESTARTSYS)
> +               if (status == -EINTR || status == -ERESTARTSYS) {
> +                       fl->fl_flags &= ~FL_SLEEP;
>                         do_vfs_lock(filp, fl);
> +               }
>         } else
>                 status = do_vfs_lock(filp, fl);
>         unlock_kernel();
>



-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [NFS] 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!
  2006-08-21 19:54       ` Trond Myklebust
  2006-11-21 12:43         ` Jesper Juhl
@ 2007-01-29  5:08         ` Neil Brown
  2007-01-29 14:16           ` Trond Myklebust
  1 sibling, 1 reply; 16+ messages in thread
From: Neil Brown @ 2007-01-29  5:08 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Jesper Juhl, nfs, Linux Kernel Mailing List

On Monday August 21, trond.myklebust@fys.uio.no wrote:
> 
> Could we instead replace it with a dprintk() that returns the value of
> "res"? That will keep it useful for debugging purposes.

(only 5 months later...)

Sure, how about this?

Thanks,
NeilBrown


Remove warning: VFS is out of sync with lock manager.

But keep it as a dprintk

The message can be generated in a quite normal situation:
 If a 'lock' request is interrupted, then the lock client needs to
  record that the server has the lock, incase it does.
 When we come the unlock, the server might say it doesn't, even
  though we think it does (or might) and this generates the message.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./fs/nfs/file.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff .prev/fs/nfs/file.c ./fs/nfs/file.c
--- .prev/fs/nfs/file.c	2007-01-29 16:04:09.000000000 +1100
+++ ./fs/nfs/file.c	2007-01-29 16:04:45.000000000 +1100
@@ -434,8 +434,9 @@ static int do_vfs_lock(struct file *file
 			BUG();
 	}
 	if (res < 0)
-		printk(KERN_WARNING "%s: VFS is out of sync with lock manager!\n",
-				__FUNCTION__);
+		dprintk(KERN_WARNING "%s: VFS is out of sync with lock manager"
+			" - error %d!\n",
+				__FUNCTION__, res);
 	return res;
 }
 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [NFS] 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!
  2007-01-29  5:08         ` Neil Brown
@ 2007-01-29 14:16           ` Trond Myklebust
  2007-01-30 23:42             ` Jesper Juhl
  0 siblings, 1 reply; 16+ messages in thread
From: Trond Myklebust @ 2007-01-29 14:16 UTC (permalink / raw)
  To: Neil Brown; +Cc: Jesper Juhl, nfs, Linux Kernel Mailing List

On Mon, 2007-01-29 at 16:08 +1100, Neil Brown wrote:
> On Monday August 21, trond.myklebust@fys.uio.no wrote:
> > 
> > Could we instead replace it with a dprintk() that returns the value of
> > "res"? That will keep it useful for debugging purposes.
> 
> (only 5 months later...)
> 
> Sure, how about this?
> 
> Thanks,
> NeilBrown

ACKed.

Thanks Neil!

Trond

> 
> Remove warning: VFS is out of sync with lock manager.
> 
> But keep it as a dprintk
> 
> The message can be generated in a quite normal situation:
>  If a 'lock' request is interrupted, then the lock client needs to
>   record that the server has the lock, incase it does.
>  When we come the unlock, the server might say it doesn't, even
>   though we think it does (or might) and this generates the message.
> 
> Signed-off-by: Neil Brown <neilb@suse.de>
> 
> ### Diffstat output
>  ./fs/nfs/file.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff .prev/fs/nfs/file.c ./fs/nfs/file.c
> --- .prev/fs/nfs/file.c	2007-01-29 16:04:09.000000000 +1100
> +++ ./fs/nfs/file.c	2007-01-29 16:04:45.000000000 +1100
> @@ -434,8 +434,9 @@ static int do_vfs_lock(struct file *file
>  			BUG();
>  	}
>  	if (res < 0)
> -		printk(KERN_WARNING "%s: VFS is out of sync with lock manager!\n",
> -				__FUNCTION__);
> +		dprintk(KERN_WARNING "%s: VFS is out of sync with lock manager"
> +			" - error %d!\n",
> +				__FUNCTION__, res);
>  	return res;
>  }
>  


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [NFS] 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!
  2007-01-29 14:16           ` Trond Myklebust
@ 2007-01-30 23:42             ` Jesper Juhl
  2007-02-01 22:39               ` Neil Brown
  0 siblings, 1 reply; 16+ messages in thread
From: Jesper Juhl @ 2007-01-30 23:42 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Neil Brown, nfs, Linux Kernel Mailing List

On 29/01/07, Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> On Mon, 2007-01-29 at 16:08 +1100, Neil Brown wrote:
> > On Monday August 21, trond.myklebust@fys.uio.no wrote:
> > >
> > > Could we instead replace it with a dprintk() that returns the value of
> > > "res"? That will keep it useful for debugging purposes.
> >
> > (only 5 months later...)
> >
> > Sure, how about this?
> >
> > Thanks,
> > NeilBrown
>
> ACKed.
>
> Thanks Neil!
>

Finally getting that in will be sooooo nice :-)  Thank you.

Btw: any reason why not to include the
  fl->fl_flags &= ~FL_SLEEP;
bit as well?  As in http://lkml.org/lkml/2006/11/27/41  ??


-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [NFS] 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager!
  2007-01-30 23:42             ` Jesper Juhl
@ 2007-02-01 22:39               ` Neil Brown
  0 siblings, 0 replies; 16+ messages in thread
From: Neil Brown @ 2007-02-01 22:39 UTC (permalink / raw)
  To: Jesper Juhl; +Cc: Trond Myklebust, nfs, Linux Kernel Mailing List

On Wednesday January 31, jesper.juhl@gmail.com wrote:
> On 29/01/07, Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> 
> Finally getting that in will be sooooo nice :-)  Thank you.
> 
> Btw: any reason why not to include the
>   fl->fl_flags &= ~FL_SLEEP;
> bit as well?  As in http://lkml.org/lkml/2006/11/27/41  ??
> 

Uhmm... I guess I had forgotten it ....

Looking again, I cannot convince myself that it is needed.
The comment says "If we were signalled ...", and if we were signalled
then do_vfs_lock won't block anyway.  I think I was probably being
over-cautious as I didn't know the significance of testing for
-ERESTARTSYS.
However it seems to only get returned if a signal is pending (so why
EINTR isn't returned I still don't know) so a signal must be pending
on that piece of code, so there is no need to clear FL_SLEEP.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2007-02-01 22:39 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-08-08 14:39 2.6.17.8 - do_vfs_lock: VFS is out of sync with lock manager! Jesper Juhl
2006-08-09  5:53 ` Grant Coady
2006-08-09  8:07   ` Jesper Juhl
2006-08-10 22:37     ` Jesper Juhl
2006-08-11  0:30       ` Grant Coady
2006-08-13 23:08       ` Grant Coady
2006-08-17  6:49 ` [NFS] " Neil Brown
2006-08-17  9:58   ` Jesper Juhl
2006-08-21  3:34     ` Neil Brown
2006-08-21 19:54       ` Trond Myklebust
2006-11-21 12:43         ` Jesper Juhl
2006-11-27  9:19           ` Jesper Juhl
2007-01-29  5:08         ` Neil Brown
2007-01-29 14:16           ` Trond Myklebust
2007-01-30 23:42             ` Jesper Juhl
2007-02-01 22:39               ` Neil Brown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).