Linux-Fsdevel Archive on lore.kernel.org
help / color / mirror / Atom feed
* Re: [fuse-devel] Cross-host entry caching and file open/create
       [not found] <DM6PR12MB33857B8B2E49DF0DD0C4F950DD5D0@DM6PR12MB3385.namprd12.prod.outlook.com>
@ 2020-08-21 15:49 ` Miklos Szeredi
  2020-08-26 20:28   ` Ken Schalk
  0 siblings, 1 reply; 7+ messages in thread
From: Miklos Szeredi @ 2020-08-21 15:49 UTC (permalink / raw)
  To: Ken Schalk; +Cc: fuse-devel, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 751 bytes --]

On Thu, Aug 20, 2020 at 12:24 AM Ken Schalk <kschalk@nvidia.com> wrote:
>

> If the open flags include O_EXCL, then we're seeing a failure with
> EEXIST without any call to our FUSE filesystem's create operation (or
> any other FUSE operations).  The kernel makes this failure decision
> based on its cached information about the previously accessed (now
> deleted) file.  If the open flags do not include O_EXCL, then we're
> seeing a failure with ENOENT from our open operation (because the file
> does not actually exist anymore), with no call to our create operation
> (because the kernel believed that the file existed, causing it to make
> a FUSE open request rather than a FUSE create request).

Does the attached patch fix it?

Thanks,
Miklos

[-- Attachment #2: fuse-reval-exclusive-create.patch --]
[-- Type: text/x-patch, Size: 508 bytes --]

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 26f028bc760b..ec5552ff9826 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -204,7 +204,7 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
 	if (inode && is_bad_inode(inode))
 		goto invalid;
 	else if (time_before64(fuse_dentry_time(entry), get_jiffies_64()) ||
-		 (flags & LOOKUP_REVAL)) {
+		 (flags & (LOOKUP_EXCL | LOOKUP_REVAL))) {
 		struct fuse_entry_out outarg;
 		FUSE_ARGS(args);
 		struct fuse_forget_link *forget;

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [fuse-devel] Cross-host entry caching and file open/create
  2020-08-21 15:49 ` [fuse-devel] Cross-host entry caching and file open/create Miklos Szeredi
@ 2020-08-26 20:28   ` Ken Schalk
  2020-08-28 21:01     ` Ken Schalk
  0 siblings, 1 reply; 7+ messages in thread
From: Ken Schalk @ 2020-08-26 20:28 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: fuse-devel, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 1430 bytes --]

On Aug 21, 2020 Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, Aug 20, 2020 at 12:24 AM Ken Schalk <kschalk@nvidia.com> wrote:
> > If the open flags include O_EXCL, then we're seeing a failure with
> > EEXIST without any call to our FUSE filesystem's create operation (or
> > any other FUSE operations).  The kernel makes this failure decision
> > based on its cached information about the previously accessed (now
> > deleted) file.  If the open flags do not include O_EXCL, then we're
> > seeing a failure with ENOENT from our open operation (because the file
> > does not actually exist anymore), with no call to our create operation
> > (because the kernel believed that the file existed, causing it to make
> > a FUSE open request rather than a FUSE create request).

> Does the attached patch fix it?

Thanks very much for your help.  The patch you provided does solve the
problem in the O_CREAT|O_EXCL case (by making a lookup call to
re-validate the entry of the since deleted file), but not in the
O_CREAT case.  (In that case the kernel still winds up making a FUSE
open request rather than a FUSE create request.)  I'd like to suggest
the slightly different attached patch instead, which triggers
re-validation in both cases.

I wonder if maybe re-validation should be triggered on LOOKUP_OPEN as
well, but I suspect that might have more implications I haven't
discovered.

--Ken Schalk

[-- Attachment #2: fuse-reval-create.patch --]
[-- Type: application/octet-stream, Size: 513 bytes --]

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 26f028b..1565e19 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -204,7 +204,7 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
 	if (inode && is_bad_inode(inode))
 		goto invalid;
 	else if (time_before64(fuse_dentry_time(entry), get_jiffies_64()) ||
-		 (flags & LOOKUP_REVAL)) {
+                (flags & (LOOKUP_CREATE | LOOKUP_REVAL))) {
 		struct fuse_entry_out outarg;
 		FUSE_ARGS(args);
 		struct fuse_forget_link *forget;

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [fuse-devel] Cross-host entry caching and file open/create
  2020-08-26 20:28   ` Ken Schalk
@ 2020-08-28 21:01     ` Ken Schalk
  2020-10-01 18:25       ` Miklos Szeredi
  0 siblings, 1 reply; 7+ messages in thread
From: Ken Schalk @ 2020-08-28 21:01 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: fuse-devel, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 2251 bytes --]

On Aug 26, 2020 Ken Schalk <kschalk@nvidia.com> wrote:
> On Aug 21, 2020 Miklos Szeredi <miklos@szeredi.hu> wrote:
> > On Thu, Aug 20, 2020 at 12:24 AM Ken Schalk <kschalk@nvidia.com> wrote:
> > > If the open flags include O_EXCL, then we're seeing a failure
> > > with EEXIST without any call to our FUSE filesystem's create
> > > operation (or any other FUSE operations).  The kernel makes this
> > > failure decision based on its cached information about the
> > > previously accessed (now deleted) file.  If the open flags do
> > > not include O_EXCL, then we're seeing a failure with ENOENT from
> > > our open operation (because the file does not actually exist
> > > anymore), with no call to our create operation (because the
> > > kernel believed that the file existed, causing it to make a FUSE
> > > open request rather than a FUSE create request).

> > Does the attached patch fix it?

> Thanks very much for your help.  The patch you provided does solve
> the problem in the O_CREAT|O_EXCL case (by making a lookup call to
> re-validate the entry of the since deleted file), but not in the
> O_CREAT case.  (In that case the kernel still winds up making a FUSE
> open request rather than a FUSE create request.)  I'd like to
> suggest the slightly different attached patch instead, which
> triggers re-validation in both cases.

I'm going to make one additional suggestion here, included in the
attached patch: always re-validate a negative entry for file open.
This is sort of the dual of the problem of a file recently unlinked on
a remote host, as it involves a file recently created on a remote
host.

If the kernel has cached a negative dentry, it will fail an open
system call without any FUSE requests up to the user-space end.  This
patch gives the user-space side the opportunity to report that a file
that previously did not exist has recently been created.  (In
distributed build automation workloads, opening a file recently
created through a mount on a peer host is pretty common.)  I would say
that this brings the open system call behavior under FUSE into line
with what we observe on NFS for cases where a file has been recently
created or unlinked on a remote host.

--Ken Schalk

[-- Attachment #2: fuse-reval-create-or-open-negative.patch --]
[-- Type: application/octet-stream, Size: 567 bytes --]

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 26f028b..c4d768c 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -204,7 +204,8 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
 	if (inode && is_bad_inode(inode))
 		goto invalid;
 	else if (time_before64(fuse_dentry_time(entry), get_jiffies_64()) ||
-		 (flags & LOOKUP_REVAL)) {
+                (!inode && (flags & LOOKUP_OPEN)) ||
+                (flags & (LOOKUP_CREATE | LOOKUP_REVAL))) {
 		struct fuse_entry_out outarg;
 		FUSE_ARGS(args);
 		struct fuse_forget_link *forget;

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [fuse-devel] Cross-host entry caching and file open/create
  2020-08-28 21:01     ` Ken Schalk
@ 2020-10-01 18:25       ` Miklos Szeredi
  2020-11-23 21:07         ` Ken Schalk
  0 siblings, 1 reply; 7+ messages in thread
From: Miklos Szeredi @ 2020-10-01 18:25 UTC (permalink / raw)
  To: Ken Schalk; +Cc: fuse-devel, linux-fsdevel

On Fri, Aug 28, 2020 at 11:01 PM Ken Schalk <kschalk@nvidia.com> wrote:
>
> On Aug 26, 2020 Ken Schalk <kschalk@nvidia.com> wrote:
> > On Aug 21, 2020 Miklos Szeredi <miklos@szeredi.hu> wrote:
> > > On Thu, Aug 20, 2020 at 12:24 AM Ken Schalk <kschalk@nvidia.com> wrote:
> > > > If the open flags include O_EXCL, then we're seeing a failure
> > > > with EEXIST without any call to our FUSE filesystem's create
> > > > operation (or any other FUSE operations).  The kernel makes this
> > > > failure decision based on its cached information about the
> > > > previously accessed (now deleted) file.  If the open flags do
> > > > not include O_EXCL, then we're seeing a failure with ENOENT from
> > > > our open operation (because the file does not actually exist
> > > > anymore), with no call to our create operation (because the
> > > > kernel believed that the file existed, causing it to make a FUSE
> > > > open request rather than a FUSE create request).
>
> > > Does the attached patch fix it?
>
> > Thanks very much for your help.  The patch you provided does solve
> > the problem in the O_CREAT|O_EXCL case (by making a lookup call to
> > re-validate the entry of the since deleted file), but not in the
> > O_CREAT case.  (In that case the kernel still winds up making a FUSE
> > open request rather than a FUSE create request.)  I'd like to
> > suggest the slightly different attached patch instead, which
> > triggers re-validation in both cases.

Which is a problem, because that makes O_CREAT on existing files (a
fairly common case) add a new synchronous request, possibly resulting
in a performance regression.

I don't see an easy way this can be fixed, and I'm not sure this needs
to be fixed.

Are you seeing a real issue with just O_CREAT?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [fuse-devel] Cross-host entry caching and file open/create
  2020-10-01 18:25       ` Miklos Szeredi
@ 2020-11-23 21:07         ` Ken Schalk
  2020-11-24  8:16           ` Miklos Szeredi
  0 siblings, 1 reply; 7+ messages in thread
From: Ken Schalk @ 2020-11-23 21:07 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: fuse-devel, linux-fsdevel

On Thu, Oct 1, 2020 Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Fri, Aug 28, 2020 at 11:01 PM Ken Schalk <kschalk@nvidia.com> wrote:
> > > Thanks very much for your help.  The patch you provided does solve 
> > > the problem in the O_CREAT|O_EXCL case (by making a lookup call to 
> > > re-validate the entry of the since deleted file), but not in the 
> > > O_CREAT case.  (In that case the kernel still winds up making a FUSE 
> > > open request rather than a FUSE create request.)  I'd like to 
> > > suggest the slightly different attached patch instead, which 
> > > triggers re-validation in both cases.

> Which is a problem, because that makes O_CREAT on existing files (a
> fairly common case) add a new synchronous request, possibly
> resulting in a performance regression.

> I don't see an easy way this can be fixed, and I'm not sure this
> needs to be fixed.

> Are you seeing a real issue with just O_CREAT?

Yes, we definitely do see issues with just O_CREAT.  The specific
sequence that involves O_CREAT without O_EXCL is:

1. A file exists and is accessed through our FUSE distributed
   filesystem on host X.  The kernel on host X caches the directory
   entry for the file.

2. The file is unlinked through our FUSE distributed filesystem on
   host Y.

3. An open(2) call with O_CREAT for the file occurs on host X.
   Because the kernel has a cached dentry for the now deleted file, it
   makes a FUSE open request to our filesystem (rather than a FUSE
   create request).

4. Our filesystem's open handler finds that the file does not exist
   (because it was unlinked in step 2) and replies to the open request
   with ENOENT.  (The FUSE open handler cannot tell that O_CREAT was
   specified in the flags of the syscall as that bit is not passed
   through in the flags in the FUSE open request, so it can't
   automatically handle this case by creating the file.)

5. The kernel passes the ENOENT error code on as the result of the
   open(2) system call, so the file is not created and not opened.

To me this seems clearly incorrect in terms of observable behavior.
The file does not exist at the point of the open(2) syscall with
O_CREAT in step 3 (although the kernel on host X has not become aware
of its deletion).  The file should be created and opened.  An open(2)
syscall with O_CREAT shouldn't fail with ENOENT because the file does
not exist (which is what happens in this situation currently).

I don't see how to avoid this without some kernel-level change with
acceptable performance.  (We could make the unlink on host Y
synchronously perform an entry invalidation across all other hosts
where our FUSE daemon is running, but that would be a huge performance
problem and a significant addition of complexity in our FUSE daemon.)

I believe that there are at least two other ways to resolve this
without adding the synchronous lookup request on every open syscall
with O_CREAT:

- Preserve the O_CREAT bit in the flags passed through in the FUSE
  open request.  (I believe the place where this bit is maked out is
  in fuse_send_open in fs/fuse/file.c.)  That would allow the FUSE
  open handler to know that file creation was requested and to perform
  the creation in this case.  (I'll mention that this would be similar
  to a behavior we've implemented in our FUSE create handler to open
  an existing file when O_EXCL is not in the flags, which allows
  handling cases where a file was recently created on a remote host.)

- If a FUSE open call fails with ENOENT when O_CREAT is used, have the
  kernel drop the cached dentry and then make a FUSE create request.

Thanks.

--Ken Schalk

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [fuse-devel] Cross-host entry caching and file open/create
  2020-11-23 21:07         ` Ken Schalk
@ 2020-11-24  8:16           ` Miklos Szeredi
  2020-12-18  4:39             ` Ken Schalk
  0 siblings, 1 reply; 7+ messages in thread
From: Miklos Szeredi @ 2020-11-24  8:16 UTC (permalink / raw)
  To: Ken Schalk; +Cc: fuse-devel, linux-fsdevel

On Mon, Nov 23, 2020 at 10:07 PM Ken Schalk <kschalk@nvidia.com> wrote:
>
> On Thu, Oct 1, 2020 Miklos Szeredi <miklos@szeredi.hu> wrote:
> > On Fri, Aug 28, 2020 at 11:01 PM Ken Schalk <kschalk@nvidia.com> wrote:
> > > > Thanks very much for your help.  The patch you provided does solve
> > > > the problem in the O_CREAT|O_EXCL case (by making a lookup call to
> > > > re-validate the entry of the since deleted file), but not in the
> > > > O_CREAT case.  (In that case the kernel still winds up making a FUSE
> > > > open request rather than a FUSE create request.)  I'd like to
> > > > suggest the slightly different attached patch instead, which
> > > > triggers re-validation in both cases.
>
> > Which is a problem, because that makes O_CREAT on existing files (a
> > fairly common case) add a new synchronous request, possibly
> > resulting in a performance regression.
>
> > I don't see an easy way this can be fixed, and I'm not sure this
> > needs to be fixed.
>
> > Are you seeing a real issue with just O_CREAT?
>
> Yes, we definitely do see issues with just O_CREAT.  The specific
> sequence that involves O_CREAT without O_EXCL is:
>
> 1. A file exists and is accessed through our FUSE distributed
>    filesystem on host X.  The kernel on host X caches the directory
>    entry for the file.
>
> 2. The file is unlinked through our FUSE distributed filesystem on
>    host Y.
>
> 3. An open(2) call with O_CREAT for the file occurs on host X.
>    Because the kernel has a cached dentry for the now deleted file, it
>    makes a FUSE open request to our filesystem (rather than a FUSE
>    create request).
>
> 4. Our filesystem's open handler finds that the file does not exist
>    (because it was unlinked in step 2) and replies to the open request
>    with ENOENT.

ESTALE is the correct error value here, not ENOENT.   I agree this is
subtle and not well documented, but it's quite logical:  the cache
contains stale lookup information and hence open finds the file
non-existent.  In no case shall the OPEN request return ENOENT, that's
up to the LOOKUP request.

Exclusive create is a different matter.  It must not open an existing
file, and so it must never send an OPEN request.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [fuse-devel] Cross-host entry caching and file open/create
  2020-11-24  8:16           ` Miklos Szeredi
@ 2020-12-18  4:39             ` Ken Schalk
  0 siblings, 0 replies; 7+ messages in thread
From: Ken Schalk @ 2020-12-18  4:39 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: fuse-devel, linux-fsdevel

On Tue, Nov 24, 2020 Miklos Szeredi <miklos@szeredi.hu> wrote:
> ESTALE is the correct error value here, not ENOENT.  I agree this is
> subtle and not well documented, but it's quite logical: the cache
> contains stale lookup information and hence open finds the file
> non-existent.

Thank you for explaining the use of ESTALE here.  It definitely does
resolve my O_CREAT without O_EXCL issue in a better way.  I would
agree that this is not well documented, as other than the kernel
source itself the only reference I can find to this is in
Documentation/filesystems/path-lookup.rst (which I admit I had not
read prior to this).

Do you have any recommendation on the portion of my patch relating to
negative directory entries?  This is another area where currently we
don't seem to be able to get precisely the same behavior under FUSE as
under NFS.  Specifically, a file open under NFS when there is a
negative directory entry in the kernel's cache always seems to result
in a re-validating GETATTR RPC to the NFS server (which, if the file
has been recently created, allows the client to become aware of its
existence).  Is there a better way we can achieve this with FUSE than
the change in the other portion of my patch (forcing a lookup call on
any open when there is a negative directory entry for the target of
the open)?

To again be specific, the sequence of events that I'm trying to get to
work differently is:

1. A lookup occurs on host X for name which does not exist, and our
   FUSE daemon's response causes the kernel to cache a negative
   directory entry.  (I know that we can disable kernel-level caching
   of negative lookup results, but we would rather not do so for
   performance.)

2. The name is created as a file through our through our FUSE
   distributed filesystem on host Y.

3. An open(2) call for the recently created file occurs on host X.
   Because the kernel has a cached negative dentry for the file (which
   now exists), the open(2) call fails with ENOENT without any FUSE
   requests.

This comes up in the context of a distributed build system where the
work of building an intermediate file may be dispatched to a remote
host.  Under NFS, the open(2) consults the NFS server and discovers
that the name now exists.

--Ken Schalk


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-12-18  4:40 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <DM6PR12MB33857B8B2E49DF0DD0C4F950DD5D0@DM6PR12MB3385.namprd12.prod.outlook.com>
2020-08-21 15:49 ` [fuse-devel] Cross-host entry caching and file open/create Miklos Szeredi
2020-08-26 20:28   ` Ken Schalk
2020-08-28 21:01     ` Ken Schalk
2020-10-01 18:25       ` Miklos Szeredi
2020-11-23 21:07         ` Ken Schalk
2020-11-24  8:16           ` Miklos Szeredi
2020-12-18  4:39             ` Ken Schalk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).