LKML Archive on lore.kernel.org
* [PATCH 0/3] enhanced ESTALE error handling
From: Peter Staubach @ 2008-01-18 15:35 UTC
  To: Linux Kernel Mailing List
  Cc: linux-nfs, Andrew Morton, Trond Myklebust, linux-fsdevel

Hi.

Here is a patch set which modifies the system to enhance the
ESTALE error handling for system calls which take pathnames
as arguments.

The error, ESTALE, was originally introduced to handle the
situation where a file handle, which NFS uses to uniquely
identify a file on the server, no longer refers to a valid file
on the server.  This can happen when the file is removed on the
server, either by an application on the server, some other
client accessing the server, or sometimes even by another
mounted file system from the same client.  It can also happen
when the file resides upon a file system which is no longer
exported.

The error, ESTALE, is usually seen when cached directory
information is used to convert a pathname to a dentry/inode pair.
The information is discovered to be out of date or stale when a
subsequent operation is sent to the NFS server.  This can easily
happen in system calls such as stat(2) when the pathname is
converted to a dentry/inode pair using cached information, but then
a subsequent GETATTR call to the server discovers that the file
handle is no longer valid.

System calls which take pathnames as arguments should never see
ESTALE errors from situations like this.  These system calls
should either fail with an ENOENT error if the pathname cannot
be successfully translated to a dentry/inode pair or succeed
or fail based on their own semantics.

ESTALE errors which occur during the lookup process can be
handled by dropping the dentry which refers to the non-existent
file from the dcache and then restarting the lookup process.
Care can be taken to ensure that forward progress is always
being made in order to avoid infinite loops.

ESTALE errors which occur during operations subsequent to the
lookup process can be handled by unwinding appropriately and
then performing the lookup process again.  Eventually, either
the lookup process will succeed or fail correctly or the
subsequent operation will succeed or fail on its own merits.

This support is desired in order to tighten up recovery from
discovering stale resources due to the loose cache consistency
semantics that file systems such as NFS employ.  In particular,
there are several large Red Hat customers, converting from
Solaris to Linux, who desire this support in order that their
application environments continue to work.

Please note that system calls which do not take pathnames as
arguments or perhaps use file descriptors to identify the
file to be manipulated may still fail with ESTALE errors.
There is no recovery possible with these system calls like
there is with system calls which take pathnames as arguments.

This support was tested using the attached programs and
running multiple copies on mounted file systems which do not
share superblocks.  When two or more copies of this program
are running, many ESTALE errors can be seen over the network.

Comments?

	Thanx...
		ps

[-- Attachment #2: syscallgen.c --]
[-- Type: text/x-csrc, Size: 15004 bytes --]

#
#define _XOPEN_SOURCE 500
#define _LARGEFILE64_SOURCE
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/statfs.h>
#include <sys/inotify.h>
#include <sys/time.h>		/* utimes() */
#include <sys/wait.h>		/* wait() */
#include <sys/xattr.h>		/* getxattr() and friends */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <utime.h>		/* utime() */

void mkdir_test(void);
void link_test(void);
void open_test(void);
void access_test(void);
void chmod_test(void);
void chown_test(void);
void readlink_test(void);
void utimes_test(void);
void chdir_test(void);
void chroot_test(void);
void rename_test(void);
void exec_test(void);
void mknod_test(void);
void statfs_test(void);
void truncate_test(void);
void xattr_test(void);
void inotify_test(void);

struct tests {
	void (*test)(void);
};

struct tests tests[] = {
	mkdir_test,
	link_test,
	open_test,
	access_test,
	chmod_test,
	chown_test,
	readlink_test,
	utimes_test,
	chdir_test,
	chroot_test,
	rename_test,
	exec_test,
	mknod_test,
	statfs_test,
	truncate_test,
	xattr_test,
	inotify_test
};

pid_t test_pids[sizeof(tests) / sizeof(tests[0])];

pid_t parent_pid;

void kill_tests(int);

/*
 * Fork one child per test; each child runs its test in an endless loop.
 * The parent waits for SIGINT, which a child sends when it sees ESTALE.
 */
int
main(int argc, char *argv[])
{
	int i;

	parent_pid = getpid();

	sigset(SIGINT, kill_tests);

	sighold(SIGINT);

	for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
		test_pids[i] = fork();
		if (test_pids[i] == 0) {
			for (;;)
				(*tests[i].test)();
			/* NOTREACHED */
		}
	}

	sigrelse(SIGINT);

	pause();
}

/* SIGINT handler in the parent: terminate every test child and exit. */
void
kill_tests(int sig)
{
	int i;

	for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
		if (test_pids[i] != -1) {
			if (kill(test_pids[i], SIGTERM) < 0)
				perror("kill");
		}
	}

	exit(0);
}

/* Only ESTALE counts as a failure; report it and stop the whole run. */
void
check_error(int error, char *operation)
{

	if (error < 0 && errno == ESTALE) {
		perror(operation);
		kill(parent_pid, SIGINT);
		pause();
	}
}

/* Same as check_error(), for grandchildren forked by a test: exit. */
void
check_error_child(int error, char *operation)
{

	if (error < 0 && errno == ESTALE) {
		perror(operation);
		kill(parent_pid, SIGINT);
		exit(1);
	}
}

void
do_stats(char *file)
{
	int error;
	struct stat stbuf;
	struct stat64 stbuf64;

	error = stat(file, &stbuf);
	check_error(error, "stat");

	error = stat64(file, &stbuf64);
	check_error(error, "stat64");

	error = lstat(file, &stbuf);
	check_error(error, "lstat");

	error = lstat64(file, &stbuf64);
	check_error(error, "lstat64");
}

void
do_stats_child(char *file)
{
	int error;
	struct stat stbuf;
	struct stat64 stbuf64;

	error = stat(file, &stbuf);
	check_error_child(error, "stat");

	error = stat64(file, &stbuf64);
	check_error_child(error, "stat64");

	error = lstat(file, &stbuf);
	check_error_child(error, "lstat");

	error = lstat64(file, &stbuf64);
	check_error_child(error, "lstat64");
}

char *mkdir_dirs[] = {
	"mkdir/a",
	"mkdir/a/b",
	"mkdir/a/b/c",
	"mkdir/a/b/c/d",
	"mkdir/a/b/c/d/e",
	"mkdir/a/b/c/d/e/f",
	"mkdir/a/b/c/d/e/f/g",
	"mkdir/a/b/c/d/e/f/g/h",
	"mkdir/a/b/c/d/e/f/g/h/i",
	"mkdir/a/b/c/d/e/f/g/h/i/j",
	"mkdir/a/b/c/d/e/f/g/h/i/j/k",
	"mkdir/a/b/c/d/e/f/g/h/i/j/k/l",
	"mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m",
	"mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n",
	"mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o",
	"mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p",
	"mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q",
	"mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r",
	"mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s",
	"mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t",
	"mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u",
	"mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v",
	"mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w",
	"mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x",
	"mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y",
	"mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z",
	NULL
};

/* Build a 26-level directory chain, then tear it down deepest first. */
void
mkdir_test()
{
	int i;
	int error;

	error = mkdir("mkdir", 0755);
	check_error(error, "mkdir");

	for (i = 0; mkdir_dirs[i] != NULL; i++) {
		error = mkdir(mkdir_dirs[i], 0755);
		check_error(error, "mkdir");
		do_stats(mkdir_dirs[i]);
	}

	while (--i >= 0) {
		do_stats(mkdir_dirs[i]);
		error = rmdir(mkdir_dirs[i]);
		check_error(error, "rmdir");
	}

	error = rmdir("mkdir");
	check_error(error, "rmdir");
}

char *link_file_a = "link/a";
char *link_file_b = "link/b";

void
link_test()
{
	int error;
	int fd;

	error = mkdir("link", 0755);
	check_error(error, "mkdir");

	fd = open(link_file_a, O_CREAT, 0644);
	check_error(fd, "open");

	(void) close(fd);

	do_stats(link_file_a);

	error = link(link_file_a, link_file_b);
	check_error(error, "link");
	do_stats(link_file_a);
	do_stats(link_file_b);

	error = unlink(link_file_a);
	check_error(error, "unlink");
	do_stats(link_file_a);
	do_stats(link_file_b);

	error = link(link_file_b, link_file_a);
	check_error(error, "link");
	do_stats(link_file_a);
	do_stats(link_file_b);

	error = unlink(link_file_b);
	check_error(error, "unlink");
	do_stats(link_file_a);
	do_stats(link_file_b);

	error = unlink(link_file_a);
	check_error(error, "unlink");
	do_stats(link_file_a);
	do_stats(link_file_b);

	error = rmdir("link");
	check_error(error, "rmdir");
}

char *open_file = "open/a";

void
open_test()
{
	int error;
	int fd;

	error = mkdir("open", 0755);
	check_error(error, "mkdir");

	fd = open(open_file, O_CREAT | O_RDWR, 0644);
	check_error(fd, "open: O_CREAT");

	(void) close(fd);

	do_stats(open_file);

	fd = open(open_file, O_RDWR);
	check_error(fd, "open: O_RDWR");

	(void) close(fd);

	do_stats(open_file);

	error = unlink(open_file);
	check_error(error, "unlink");

	error = rmdir("open");
	check_error(error, "rmdir");
}

char *access_file = "access/a";

void
access_test()
{
	int error;
	int fd;

	error = mkdir("access", 0755);
	check_error(error, "mkdir");

	fd = open(access_file, O_CREAT | O_RDWR, 0644);
	check_error(fd, "open: O_CREAT");

	(void) close(fd);

	do_stats(access_file);

	error = access(access_file, F_OK);
	check_error(error, "access");

	do_stats(access_file);

	error = unlink(access_file);
	check_error(error, "unlink");

	error = rmdir("access");
	check_error(error, "rmdir");
}

char *chmod_file = "chmod/a";

void
chmod_test()
{
	int error;
	int fd;

	error = mkdir("chmod", 0755);
	check_error(error, "mkdir");

	fd = open(chmod_file, O_CREAT | O_RDWR, 0644);
	check_error(fd, "open: O_CREAT");

	(void) close(fd);

	do_stats(chmod_file);

	error = chmod(chmod_file, 0600);
	check_error(error, "chmod");

	do_stats(chmod_file);

	error = unlink(chmod_file);
	check_error(error, "unlink");

	error = rmdir("chmod");
	check_error(error, "rmdir");
}

char *chown_file = "chown/a";

void
chown_test()
{
	int error;
	int fd;

	error = mkdir("chown", 0755);
	check_error(error, "mkdir");

	fd = open(chown_file, O_CREAT | O_RDWR, 0644);
	check_error(fd, "open: O_CREAT");

	(void) close(fd);

	do_stats(chown_file);

	error = chown(chown_file, 4597, 4597);
	check_error(error, "chown");

	do_stats(chown_file);

	error = lchown(chown_file, 4596, 4596);
	check_error(error, "lchown");

	do_stats(chown_file);

	error = unlink(chown_file);
	check_error(error, "unlink");

	error = rmdir("chown");
	check_error(error, "rmdir");
}

char *readlink_file = "readlink/a";

void
readlink_test()
{
	int error;
	char buf[BUFSIZ];

	error = mkdir("readlink", 0755);
	check_error(error, "mkdir");

	error = symlink("b", readlink_file);
	check_error(error, "symlink");

	do_stats(readlink_file);

	error = readlink(readlink_file, buf, sizeof(buf));
	check_error(error, "readlink");

	do_stats(readlink_file);

	error = unlink(readlink_file);
	check_error(error, "unlink");

	error = rmdir("readlink");
	check_error(error, "rmdir");
}

char *utimes_file = "utimes/a";

void
utimes_test()
{
	int error;
	int fd;

	error = mkdir("utimes", 0755);
	check_error(error, "mkdir");

	fd = open(utimes_file, O_CREAT | O_RDWR, 0644);
	check_error(fd, "open: O_CREAT");

	(void) close(fd);

	do_stats(utimes_file);

	error = utime(utimes_file, NULL);
	check_error(error, "utime");

	do_stats(utimes_file);

	error = utimes(utimes_file, NULL);
	check_error(error, "utimes");

	do_stats(utimes_file);

	error = unlink(utimes_file);
	check_error(error, "unlink");

	error = rmdir("utimes");
	check_error(error, "rmdir");
}

char *chdir_dir = "chdir/dir";

void
chdir_test()
{
	int error;
	int pid;
	int status;

	error = mkdir("chdir", 0755);
	check_error(error, "mkdir");

	/* chdir() in a child so the test's own working directory is kept. */
	pid = fork();
	if (pid == 0) {
		error = mkdir(chdir_dir, 0755);
		check_error_child(error, "mkdir");

		do_stats_child(chdir_dir);

		error = chdir(chdir_dir);
		check_error_child(error, "chdir");

		do_stats_child(chdir_dir);

		exit(0);
	}

	(void) wait(&status);

	do_stats(chdir_dir);

	error = rmdir(chdir_dir);
	check_error(error, "rmdir");

	error = rmdir("chdir");
	check_error(error, "rmdir");
}

char *chroot_dir = "chroot/dir";

void
chroot_test()
{
	int error;
	int pid;
	int status;

	error = mkdir("chroot", 0755);
	check_error(error, "mkdir");

	/* chroot() in a child so the test's own root is left untouched. */
	pid = fork();
	if (pid == 0) {
		error = mkdir(chroot_dir, 0755);
		check_error_child(error, "mkdir");

		do_stats_child(chroot_dir);

		error = chroot(chroot_dir);
		check_error_child(error, "chroot");

		do_stats_child(chroot_dir);

		exit(0);
	}

	(void) wait(&status);

	do_stats(chroot_dir);

	error = rmdir(chroot_dir);
	check_error(error, "rmdir");

	error = rmdir("chroot");
	check_error(error, "rmdir");
}

char *rename_file_a = "rename/a";
char *rename_file_b = "rename/b";

void
rename_test()
{
	int error;
	int fd;

	error = mkdir("rename", 0755);
	check_error(error, "mkdir");

	fd = open(rename_file_a, O_CREAT, 0644);
	check_error(fd, "open");

	(void) close(fd);

	do_stats(rename_file_a);

	error = rename(rename_file_a, rename_file_b);
	check_error(error, "rename");

	do_stats(rename_file_a);
	do_stats(rename_file_b);

	error = rename(rename_file_b, rename_file_a);
	check_error(error, "rename");

	do_stats(rename_file_a);
	do_stats(rename_file_b);

	error = unlink(rename_file_a);
	check_error(error, "unlink");

	error = rmdir("rename");
	check_error(error, "rmdir");
}

char *exec_file = "exec/a";
char *exec_source_file = "exec_test";

void
exec_test()
{
	int error;
	int pid;
	int status;

	error = mkdir("exec", 0755);
	check_error(error, "mkdir");

	/* exec_test is the binary built from the second attachment. */
	error = link(exec_source_file, exec_file);
	check_error(error, "link");
	do_stats(exec_file);

	pid = fork();
	if (pid == 0) {
		error = execl(exec_file, exec_file, NULL);
		check_error_child(error, "execl");

		exit(1);
	}

	wait(&status);

	do_stats(exec_file);

	error = unlink(exec_file);
	check_error(error, "unlink");

	error = rmdir("exec");
	check_error(error, "rmdir");
}

char *mknod_file = "mknod/a";

void
mknod_test()
{
	int error;

	error = mkdir("mknod", 0755);
	check_error(error, "mkdir");

	error = mknod(mknod_file, S_IFCHR | 0644, 0);
	check_error(error, "mknod");

	do_stats(mknod_file);

	error = unlink(mknod_file);
	check_error(error, "unlink");

	error = rmdir("mknod");
	check_error(error, "rmdir");
}

void
statfs_test()
{
	int error;
	struct statfs stbuf;
	struct statfs64 stbuf64;

	error = mkdir("statfs", 0755);
	check_error(error, "mkdir");

	do_stats("statfs");

	error = statfs("statfs", &stbuf);
	check_error(error, "statfs");

	error = statfs64("statfs", &stbuf64);
	check_error(error, "statfs64");

	error = rmdir("statfs");
	check_error(error, "rmdir");
}

char *truncate_file = "truncate/a";

void
truncate_test()
{
	int error;
	int fd;

	error = mkdir("truncate", 0755);
	check_error(error, "mkdir");

	fd = open(truncate_file, O_CREAT | O_RDWR, 0644);
	check_error(fd, "open: O_CREAT");

	(void) close(fd);

	do_stats(truncate_file);

	error = truncate(truncate_file, 1024);
	check_error(error, "truncate");

	do_stats(truncate_file);

	error = unlink(truncate_file);
	check_error(error, "unlink");

	error = rmdir("truncate");
	check_error(error, "rmdir");
}

char *xattr_file = "xattr/a";

#define ACL_USER_OBJ	(0x01)
#define ACL_USER	(0x02)
#define ACL_GROUP_OBJ	(0x04)
#define ACL_MASK	(0x10)
#define ACL_OTHER	(0x20)

struct posix_acl_xattr_entry {
	unsigned short e_tag;
	unsigned short e_perm;
	unsigned int e_id;
};

#define POSIX_ACL_XATTR_VERSION	0x0002

struct posix_acl_xattr_header {
	unsigned int a_version;
	struct posix_acl_xattr_entry a_entries[5];
};

void
xattr_test()
{
	int error;
	int fd;
	char buf[1024];
	struct posix_acl_xattr_header ents;

	error = mkdir("xattr", 0755);
	check_error(error, "mkdir");

	fd = open(xattr_file, O_CREAT | O_RDWR, 0444);
	check_error(fd, "open: O_CREAT");

	(void) close(fd);

	do_stats(xattr_file);

	error = getxattr(xattr_file, "system.posix_acl_access", buf,
			sizeof (buf));
	check_error(error, "getxattr");
	error = lgetxattr(xattr_file, "system.posix_acl_access", buf,
			sizeof (buf));
	check_error(error, "lgetxattr");

	ents.a_version = POSIX_ACL_XATTR_VERSION;
	ents.a_entries[0].e_tag = ACL_USER_OBJ;
	ents.a_entries[0].e_perm = 06;
	ents.a_entries[0].e_id = -1;
	ents.a_entries[1].e_tag = ACL_USER;
	ents.a_entries[1].e_perm = 06;
	ents.a_entries[1].e_id = 10;
	ents.a_entries[2].e_tag = ACL_GROUP_OBJ;
	ents.a_entries[2].e_perm = 06;
	ents.a_entries[2].e_id = -1;
	ents.a_entries[3].e_tag = ACL_MASK;
	ents.a_entries[3].e_perm = 06;
	ents.a_entries[3].e_id = -1;
	ents.a_entries[4].e_tag = ACL_OTHER;
	ents.a_entries[4].e_perm = 06;
	ents.a_entries[4].e_id = -1;

	error = setxattr(xattr_file, "system.posix_acl_access",
			&ents, sizeof (ents), 0);
	check_error(error, "setxattr");

	do_stats(xattr_file);

	error = lsetxattr(xattr_file, "system.posix_acl_access",
			&ents, sizeof (ents), 0);
	check_error(error, "lsetxattr");

	do_stats(xattr_file);

	error = getxattr(xattr_file, "system.posix_acl_access", buf,
			sizeof (buf));
	check_error(error, "getxattr");
	error = lgetxattr(xattr_file, "system.posix_acl_access", buf,
			sizeof (buf));
	check_error(error, "lgetxattr");

	error = listxattr(xattr_file, buf, sizeof (buf));
	check_error(error, "listxattr");
	error = llistxattr(xattr_file, buf, sizeof (buf));
	check_error(error, "llistxattr");

	error = removexattr(xattr_file, "system.posix_acl_access");
	check_error(error, "removexattr");

	do_stats(xattr_file);

	error = setxattr(xattr_file, "system.posix_acl_access",
			&ents, sizeof (ents), 0);
	check_error(error, "setxattr");

	do_stats(xattr_file);

	error = lremovexattr(xattr_file, "system.posix_acl_access");
	check_error(error, "lremovexattr");

	do_stats(xattr_file);

	error = unlink(xattr_file);
	check_error(error, "unlink");

	error = rmdir("xattr");
	check_error(error, "rmdir");
}

char *inotify_file = "inotify/a";

void
inotify_test()
{
	int error;
	int fd;
	int wd;

	error = mkdir("inotify", 0755);
	check_error(error, "mkdir");

	fd = open(inotify_file, O_CREAT | O_RDWR, 0644);
	check_error(fd, "open: O_CREAT");

	(void) close(fd);

	do_stats(inotify_file);

	/* Check the descriptor itself, not the stale value left in error. */
	fd = inotify_init();
	check_error(fd, "inotify_init");

	do_stats(inotify_file);

	wd = inotify_add_watch(fd, inotify_file, IN_ALL_EVENTS);
	check_error(wd, "inotify_add_watch");

	do_stats(inotify_file);

	error = inotify_rm_watch(fd, wd);
	check_error(error, "inotify_rm_watch");

	(void) close(fd);

	do_stats(inotify_file);

	error = unlink(inotify_file);
	check_error(error, "unlink");

	error = rmdir("inotify");
	check_error(error, "rmdir");
}

[-- Attachment #3: exec_test.c --]
[-- Type: text/x-csrc, Size: 42 bytes --]

#include <stdlib.h>

int
main(void)
{
	exit(0);
}
* Re: [PATCH 0/3] enhanced ESTALE error handling 2008-01-18 15:35 [PATCH 0/3] enhanced ESTALE error handling Peter Staubach @ 2008-01-18 15:46 ` J. Bruce Fields 2008-01-18 16:41 ` Chuck Lever 2008-02-01 20:57 ` [PATCH 0/3] enhanced ESTALE error handling (v2) Peter Staubach 2 siblings, 0 replies; 14+ messages in thread From: J. Bruce Fields @ 2008-01-18 15:46 UTC (permalink / raw) To: Peter Staubach Cc: Linux Kernel Mailing List, linux-nfs, Andrew Morton, Trond Myklebust, linux-fsdevel On Fri, Jan 18, 2008 at 10:35:50AM -0500, Peter Staubach wrote: > Hi. > > Here is a patch set which modifies the system to enhance the > ESTALE error handling for system calls which take pathnames > as arguments. I think your cover letter may be bigger than any of the actual patches.... I'm not complaining! But would it be worth adding this explanation and test code to Documentation/filesystems/ just to keep it around? --b. > > The error, ESTALE, was originally introduced to handle the > situation where a file handle, which NFS uses to uniquely > identify a file on the server, no longer refers to a valid file > on the server. This can happen when the file is removed on the > server, either by an application on the server, some other > client accessing the server, or sometimes even by another > mounted file system from the same client. It can also happen > when the file resides upon a file system which is no longer > exported. > > The error, ESTALE, is usually seen when cached directory > information is used to convert a pathname to a dentry/inode pair. > The information is discovered to be out of date or stale when a > subsequent operation is sent to the NFS server. This can easily > happen in system calls such as stat(2) when the pathname is > converted a dentry/inode pair using cached information, but then > a subsequent GETATTR call to the server discovers that the file > handle is no longer valid. 
> > System calls which take pathnames as arguments should never see > ESTALE errors from situations like this. These system calls > should either fail with an ENOENT error if the pathname can not > be successfully be translated to a dentry/inode pair or succeed > or fail based on their own semantics. > > ESTALE errors which occur during the lookup process can be > handled by dropping the dentry which refers to the non-existent > file from the dcache and then restarting the lookup process. > Care can be taken to ensure that forward progress is always > being made in order to avoiding infinite loops. > > ESTALE errors which occur during operations subsequent to the > lookup process can be handled by unwinding appropriately and > then performing the lookup process again. Eventually, either > the lookup process will succeed or fail correctly or the > subsequent operation will succeed or fail on its own merits. > > This support is desired in order to tighten up recovery from > discovering stale resources due to the loose cache consistency > semantics that file systems such as NFS employ. In particular, > there are several large Red Hat customers, converting from > Solaris to Linux, who desire this support in order that their > applications environments continue to work. > > Please note that system calls which do not take pathnames as > arguments or perhaps use file descriptors to identify the > file to be manipulated may still fail with ESTALE errors. > There is no recovery possible with these systems calls like > there is with system calls which take pathnames as arguments. > > This support was tested using the attached programs and > running multiple copies on mounted file systems which do not > share superblocks. When two or more copies of this program > are running, many ESTALE errors can be seen over the network. > > Comments? > > Thanx... 
> > ps > # > #define _XOPEN_SOURCE 500 > #define _LARGEFILE64_SOURCE > #include <sys/types.h> > #include <sys/stat.h> > #include <sys/statfs.h> > #include <sys/inotify.h> > #include <errno.h> > #include <fcntl.h> > #include <stdio.h> > #include <stdlib.h> > #include <unistd.h> > #include <signal.h> > > void mkdir_test(void); > void link_test(void); > void open_test(void); > void access_test(void); > void chmod_test(void); > void chown_test(void); > void readlink_test(void); > void utimes_test(void); > void chdir_test(void); > void chroot_test(void); > void rename_test(void); > void exec_test(void); > void mknod_test(void); > void statfs_test(void); > void truncate_test(void); > void xattr_test(void); > void inotify_test(void); > > struct tests { > void (*test)(void); > }; > > struct tests tests[] = { > mkdir_test, > link_test, > open_test, > access_test, > chmod_test, > chown_test, > readlink_test, > utimes_test, > chdir_test, > chroot_test, > rename_test, > exec_test, > mknod_test, > statfs_test, > truncate_test, > xattr_test, > inotify_test > }; > > pid_t test_pids[sizeof(tests) / sizeof(tests[0])]; > > pid_t parent_pid; > > void kill_tests(int); > > int > main(int argc, char *argv[]) > { > int i; > > parent_pid = getpid(); > > sigset(SIGINT, kill_tests); > > sighold(SIGINT); > > for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) { > test_pids[i] = fork(); > if (test_pids[i] == 0) { > for (;;) > (*tests[i].test)(); > /* NOTREACHED */ > } > } > > sigrelse(SIGINT); > > pause(); > } > > void > kill_tests(int sig) > { > int i; > > for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) { > if (test_pids[i] != -1) { > if (kill(test_pids[i], SIGTERM) < 0) > perror("kill"); > } > } > > exit(0); > } > > void > check_error(int error, char *operation) > { > > if (error < 0 && errno == ESTALE) { > perror(operation); > kill(parent_pid, SIGINT); > pause(); > } > } > > void > check_error_child(int error, char *operation) > { > > if (error < 0 && errno == ESTALE) { > 
perror(operation); > kill(parent_pid, SIGINT); > exit(1); > } > } > > void > do_stats(char *file) > { > int error; > struct stat stbuf; > struct stat64 stbuf64; > > error = stat(file, &stbuf); > check_error(error, "stat"); > > error = stat64(file, &stbuf64); > check_error(error, "stat64"); > > error = lstat(file, &stbuf); > check_error(error, "lstat"); > > error = lstat64(file, &stbuf64); > check_error(error, "lstat64"); > } > > void > do_stats_child(char *file) > { > int error; > struct stat stbuf; > struct stat64 stbuf64; > > error = stat(file, &stbuf); > check_error_child(error, "stat"); > > error = stat64(file, &stbuf64); > check_error_child(error, "stat64"); > > error = lstat(file, &stbuf); > check_error_child(error, "lstat"); > > error = lstat64(file, &stbuf64); > check_error_child(error, "lstat64"); > } > > char *mkdir_dirs[] = { > "mkdir/a", > "mkdir/a/b", > "mkdir/a/b/c", > "mkdir/a/b/c/d", > "mkdir/a/b/c/d/e", > "mkdir/a/b/c/d/e/f", > "mkdir/a/b/c/d/e/f/g", > "mkdir/a/b/c/d/e/f/g/h", > "mkdir/a/b/c/d/e/f/g/h/i", > "mkdir/a/b/c/d/e/f/g/h/i/j", > "mkdir/a/b/c/d/e/f/g/h/i/j/k", > "mkdir/a/b/c/d/e/f/g/h/i/j/k/l", > "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m", > "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n", > "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o", > "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p", > "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q", > "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r", > "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s", > "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t", > "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u", > "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v", > "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w", > "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x", > "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y", > "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z", > NULL > }; > > void > mkdir_test() > { > int i; > int error; > > error = mkdir("mkdir", 0755); > check_error(error, "mkdir"); > > for (i = 0; mkdir_dirs[i] != 
NULL; i++) { > error = mkdir(mkdir_dirs[i], 0755); > check_error(error, "mkdir"); > do_stats(mkdir_dirs[i]); > } > > while (--i >= 0) { > do_stats(mkdir_dirs[i]); > error = rmdir(mkdir_dirs[i]); > check_error(error, "rmdir"); > } > > error = rmdir("mkdir"); > check_error(error, "rmdir"); > } > > char *link_file_a = "link/a"; > char *link_file_b = "link/b"; > > void > link_test() > { > int error; > int fd; > > error = mkdir("link", 0755); > check_error(error, "mkdir"); > > fd = open(link_file_a, O_CREAT, 0644); > check_error(fd, "open"); > > (void) close(fd); > > do_stats(link_file_a); > > error = link(link_file_a, link_file_b); > check_error(error, "link"); > do_stats(link_file_a); > do_stats(link_file_b); > > error = unlink(link_file_a); > check_error(error, "unlink"); > do_stats(link_file_a); > do_stats(link_file_b); > > error = link(link_file_b, link_file_a); > check_error(error, "link"); > do_stats(link_file_a); > do_stats(link_file_b); > > error = unlink(link_file_b); > check_error(error, "unlink"); > do_stats(link_file_a); > do_stats(link_file_b); > > error = unlink(link_file_a); > check_error(error, "unlink"); > do_stats(link_file_a); > do_stats(link_file_b); > > error = rmdir("link"); > check_error(error, "rmdir"); > } > > char *open_file = "open/a"; > > void > open_test() > { > int error; > int fd; > > error = mkdir("open", 0755); > check_error(error, "mkdir"); > > fd = open(open_file, O_CREAT | O_RDWR, 0644); > check_error(fd, "open: O_CREAT"); > > (void) close(fd); > > do_stats(open_file); > > fd = open(open_file, O_RDWR); > check_error(fd, "open: O_RDWR"); > > (void) close(fd); > > do_stats(open_file); > > error = unlink(open_file); > check_error(error, "unlink"); > > error = rmdir("open"); > check_error(error, "rmdir"); > } > > char *access_file = "access/a"; > > void > access_test() > { > int error; > int fd; > > error = mkdir("access", 0755); > check_error(error, "mkdir"); > > fd = open(access_file, O_CREAT | O_RDWR, 0644); > check_error(fd, "open: 
O_CREAT"); > > (void) close(fd); > > do_stats(access_file); > > error = access(access_file, F_OK); > check_error(error, "access"); > > do_stats(access_file); > > error = unlink(access_file); > check_error(error, "unlink"); > > error = rmdir("access"); > check_error(error, "rmdir"); > } > > char *chmod_file = "chmod/a"; > > void > chmod_test() > { > int error; > int fd; > > error = mkdir("chmod", 0755); > check_error(error, "mkdir"); > > fd = open(chmod_file, O_CREAT | O_RDWR, 0644); > check_error(fd, "open: O_CREAT"); > > (void) close(fd); > > do_stats(chmod_file); > > error = chmod(chmod_file, 0600); > check_error(error, "chmod"); > > do_stats(chmod_file); > > error = unlink(chmod_file); > check_error(error, "unlink"); > > error = rmdir("chmod"); > check_error(error, "rmdir"); > } > > char *chown_file = "chown/a"; > > void > chown_test() > { > int error; > int fd; > > error = mkdir("chown", 0755); > check_error(error, "mkdir"); > > fd = open(chown_file, O_CREAT | O_RDWR, 0644); > check_error(fd, "open: O_CREAT"); > > (void) close(fd); > > do_stats(chown_file); > > error = chown(chown_file, 4597, 4597); > check_error(error, "chown"); > > do_stats(chown_file); > > error = lchown(chown_file, 4596, 4596); > check_error(error, "lchown"); > > do_stats(chown_file); > > error = unlink(chown_file); > check_error(error, "unlink"); > > error = rmdir("chown"); > check_error(error, "rmdir"); > } > > char *readlink_file = "readlink/a"; > > void > readlink_test() > { > int error; > char buf[BUFSIZ]; > > error = mkdir("readlink", 0755); > check_error(error, "mkdir"); > > error = symlink("b", readlink_file); > check_error(error, "symlink"); > > do_stats(readlink_file); > > error = readlink(readlink_file, buf, sizeof(buf)); > check_error(error, "readlink"); > > do_stats(readlink_file); > > error = unlink(readlink_file); > check_error(error, "unlink"); > > error = rmdir("readlink"); > check_error(error, "rmdir"); > } > > char *utimes_file = "utimes/a"; > > void > utimes_test() > { > 
int error; > int fd; > > error = mkdir("utimes", 0755); > check_error(error, "mkdir"); > > fd = open(utimes_file, O_CREAT | O_RDWR, 0644); > check_error(fd, "open: O_CREAT"); > > (void) close(fd); > > do_stats(utimes_file); > > error = utime(utimes_file, NULL); > check_error(error, "utime"); > > do_stats(utimes_file); > > error = utimes(utimes_file, NULL); > check_error(error, "utimes"); > > do_stats(utimes_file); > > error = unlink(utimes_file); > check_error(error, "unlink"); > > error = rmdir("utimes"); > check_error(error, "rmdir"); > } > > char *chdir_dir = "chdir/dir"; > > void > chdir_test() > { > int error; > int pid; > int status; > > error = mkdir("chdir", 0755); > check_error(error, "mkdir"); > > pid = fork(); > if (pid == 0) { > error = mkdir(chdir_dir, 0755); > check_error_child(error, "mkdir"); > > do_stats_child(chdir_dir); > > error = chdir(chdir_dir); > check_error_child(error, "chdir"); > > do_stats_child(chdir_dir); > > exit(0); > } > > (void) wait(&status); > > do_stats(chdir_dir); > > error = rmdir(chdir_dir); > check_error(error, "rmdir"); > > error = rmdir("chdir"); > check_error(error, "rmdir"); > } > > char *chroot_dir = "chroot/dir"; > > void > chroot_test() > { > int error; > int pid; > int status; > > error = mkdir("chroot", 0755); > check_error(error, "mkdir"); > > pid = fork(); > if (pid == 0) { > error = mkdir(chroot_dir, 0755); > check_error_child(error, "mkdir"); > > do_stats_child(chroot_dir); > > error = chroot(chroot_dir); > check_error_child(error, "chroot"); > > do_stats_child(chroot_dir); > > exit(0); > } > > (void) wait(&status); > > do_stats(chroot_dir); > > error = rmdir(chroot_dir); > check_error(error, "rmdir"); > > error = rmdir("chroot"); > check_error(error, "rmdir"); > } > > char *rename_file_a = "rename/a"; > char *rename_file_b = "rename/b"; > > void > rename_test() > { > int error; > int fd; > > error = mkdir("rename", 0755); > check_error(error, "mkdir"); > > fd = open(rename_file_a, O_CREAT, 0644); > 
check_error(fd, "open"); > > (void) close(fd); > > do_stats(rename_file_a); > > error = rename(rename_file_a, rename_file_b); > check_error(error, "rename"); > > do_stats(rename_file_a); > do_stats(rename_file_b); > > error = rename(rename_file_b, rename_file_a); > check_error(error, "rename"); > > do_stats(rename_file_a); > do_stats(rename_file_b); > > error = unlink(rename_file_a); > check_error(error, "unlink"); > > error = rmdir("rename"); > check_error(error, "rmdir"); > } > > char *exec_file = "exec/a"; > char *exec_source_file = "exec_test"; > > void > exec_test() > { > int error; > int pid; > int status; > > error = mkdir("exec", 0755); > check_error(error, "mkdir"); > > error = link(exec_source_file, exec_file); > check_error(error, "link"); > do_stats(exec_file); > > pid = fork(); > if (pid == 0) { > error = execl(exec_file, exec_file, NULL); > check_error_child(error, "execl"); > > exit(1); > } > > wait(&status); > > do_stats(exec_file); > > error = unlink(exec_file); > check_error(error, "unlink"); > > error = rmdir("exec"); > check_error(error, "rmdir"); > } > > char *mknod_file = "mknod/a"; > > void > mknod_test() > { > int error; > > error = mkdir("mknod", 0755); > check_error(error, "mkdir"); > > error = mknod(mknod_file, S_IFCHR | 0644, 0); > check_error(error, "mknod"); > > do_stats(mknod_file); > > error = unlink(mknod_file); > check_error(error, "unlink"); > > error = rmdir("mknod"); > check_error(error, "rmdir"); > } > > void > statfs_test() > { > int error; > struct statfs stbuf; > struct statfs64 stbuf64; > > error = mkdir("statfs", 0755); > check_error(error, "mkdir"); > > do_stats("statfs"); > > error = statfs("statfs", &stbuf); > check_error(error, "statfs"); > > error = statfs64("statfs", &stbuf64); > check_error(error, "statfs64"); > > error = rmdir("statfs"); > check_error(error, "rmdir"); > } > > char *truncate_file = "truncate/a"; > > void > truncate_test() > { > int error; > int fd; > > error = mkdir("truncate", 0755); > 
check_error(error, "mkdir"); > > fd = open(truncate_file, O_CREAT | O_RDWR, 0644); > check_error(fd, "open: O_CREAT"); > > (void) close(fd); > > do_stats(truncate_file); > > error = truncate(truncate_file, 1024); > check_error(error, "truncate"); > > do_stats(truncate_file); > > error = unlink(truncate_file); > check_error(error, "unlink"); > > error = rmdir("truncate"); > check_error(error, "rmdir"); > } > > char *xattr_file = "xattr/a"; > > #define ACL_USER_OBJ (0x01) > #define ACL_USER (0x02) > #define ACL_GROUP_OBJ (0x04) > #define ACL_MASK (0x10) > #define ACL_OTHER (0x20) > > struct posix_acl_xattr_entry { > unsigned short e_tag; > unsigned short e_perm; > unsigned int e_id; > }; > > #define POSIX_ACL_XATTR_VERSION 0x0002 > > struct posix_acl_xattr_header { > unsigned int a_version; > struct posix_acl_xattr_entry a_entries[5]; > }; > > void > xattr_test() > { > int error; > int fd; > char buf[1024]; > struct posix_acl_xattr_header ents; > > error = mkdir("xattr", 0755); > check_error(error, "mkdir"); > > fd = open(xattr_file, O_CREAT | O_RDWR, 0444); > check_error(fd, "open: O_CREAT"); > > (void) close(fd); > > do_stats(xattr_file); > > error = getxattr(xattr_file, "system.posix_acl_access", buf, > sizeof (buf)); > check_error(error, "getxattr"); > error = lgetxattr(xattr_file, "system.posix_acl_access", buf, > sizeof (buf)); > check_error(error, "lgetxattr"); > > ents.a_version = POSIX_ACL_XATTR_VERSION; > ents.a_entries[0].e_tag = ACL_USER_OBJ; > ents.a_entries[0].e_perm = 06; > ents.a_entries[0].e_id = -1; > ents.a_entries[1].e_tag = ACL_USER; > ents.a_entries[1].e_perm = 06; > ents.a_entries[1].e_id = 10; > ents.a_entries[2].e_tag = ACL_GROUP_OBJ; > ents.a_entries[2].e_perm = 06; > ents.a_entries[2].e_id = -1; > ents.a_entries[3].e_tag = ACL_MASK; > ents.a_entries[3].e_perm = 06; > ents.a_entries[3].e_id = -1; > ents.a_entries[4].e_tag = ACL_OTHER; > ents.a_entries[4].e_perm = 06; > ents.a_entries[4].e_id = -1; > > error = setxattr(xattr_file, 
"system.posix_acl_access", > &ents, sizeof (ents), 0); > check_error(error, "setxattr"); > > do_stats(xattr_file); > > error = lsetxattr(xattr_file, "system.posix_acl_access", > &ents, sizeof (ents), 0); > check_error(error, "lsetxattr"); > > do_stats(xattr_file); > > error = getxattr(xattr_file, "system.posix_acl_access", buf, > sizeof (buf)); > check_error(error, "getxattr"); > error = lgetxattr(xattr_file, "system.posix_acl_access", buf, > sizeof (buf)); > check_error(error, "lgetxattr"); > > error = listxattr(xattr_file, buf, sizeof (buf)); > check_error(error, "listxattr"); > error = llistxattr(xattr_file, buf, sizeof (buf)); > check_error(error, "llistxattr"); > > error = removexattr(xattr_file, "system.posix_acl_access"); > check_error(error, "removexattr"); > > do_stats(xattr_file); > > error = setxattr(xattr_file, "system.posix_acl_access", > &ents, sizeof (ents), 0); > check_error(error, "setxattr"); > > do_stats(xattr_file); > > error = lremovexattr(xattr_file, "system.posix_acl_access"); > check_error(error, "lremovexattr"); > > do_stats(xattr_file); > > error = unlink(xattr_file); > check_error(error, "unlink"); > > error = rmdir("xattr"); > check_error(error, "rmdir"); > } > > char *inotify_file = "inotify/a"; > > void > inotify_test() > { > int error; > int fd; > int wd; > > error = mkdir("inotify", 0755); > check_error(error, "mkdir"); > > fd = open(inotify_file, O_CREAT | O_RDWR, 0644); > check_error(fd, "open: O_CREAT"); > > (void) close(fd); > > do_stats(inotify_file); > > fd = inotify_init(); > check_error(fd, "inotify_init"); > > do_stats(inotify_file); > > wd = inotify_add_watch(fd, inotify_file, IN_ALL_EVENTS); > check_error(wd, "inotify_add_watch"); > > do_stats(inotify_file); > > error = inotify_rm_watch(fd, wd); > check_error(error, "inotify_rm_watch"); > > (void) close(fd); > > do_stats(inotify_file); > > error = unlink(inotify_file); > check_error(error, "unlink"); > > error = rmdir("inotify"); > check_error(error, "rmdir"); > } > 
#include <stdlib.h> > > int > main(void) > { > exit(0); > } ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] enhanced ESTALE error handling 2008-01-18 15:35 [PATCH 0/3] enhanced ESTALE error handling Peter Staubach 2008-01-18 15:46 ` J. Bruce Fields @ 2008-01-18 16:41 ` Chuck Lever 2008-01-18 16:55 ` Peter Staubach 2008-02-01 20:57 ` [PATCH 0/3] enhanced ESTALE error handling (v2) Peter Staubach 2 siblings, 1 reply; 14+ messages in thread From: Chuck Lever @ 2008-01-18 16:41 UTC (permalink / raw) To: Peter Staubach Cc: Linux Kernel Mailing List, linux-nfs, Andrew Morton, Trond Myklebust, linux-fsdevel Hi Peter- On Jan 18, 2008, at 10:35 AM, Peter Staubach wrote: > Hi. > > Here is a patch set which modifies the system to enhance the > ESTALE error handling for system calls which take pathnames > as arguments. The VFS already handles ESTALE. If a pathname resolution encounters an ESTALE at any point, the resolution is restarted exactly once, and an additional flag is passed to the file system during each lookup that forces each component in the path to be revalidated on the server. This has no possibility of causing an infinite loop. Is there some part of this logic that is no longer working? > -- Chuck Lever chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 14+ messages in thread
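For readers following the thread, the "restart exactly once, with a revalidation flag" behavior described above can be sketched in user space roughly as follows. This is only an illustration under stated assumptions, not the kernel's code: `lookup_fn`, `resolve_two_try`, and `fake_lookup` are hypothetical names invented for this sketch, and the real VFS works on dentries rather than callbacks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical lookup callback (an assumption for this sketch).
 * Returns 0 on success or a negative errno value. `revalidate` is
 * nonzero when every cached path component must be rechecked. */
typedef int (*lookup_fn)(const char *path, int revalidate, void *ctx);

/* Resolve `path`, retrying exactly once with forced revalidation if
 * the first, cache-based attempt reports a stale file handle. */
int resolve_two_try(const char *path, lookup_fn lookup, void *ctx)
{
    int err;

    assert(lookup != NULL);

    err = lookup(path, 0, ctx);        /* first pass: trust the cache */
    if (err == -ESTALE)
        err = lookup(path, 1, ctx);    /* second pass: revalidate */

    return err;                        /* a second ESTALE is passed up */
}

/* Toy lookup for demonstration: reports ESTALE while *ctx is > 0. */
int fake_lookup(const char *path, int revalidate, void *ctx)
{
    int *stale_left = ctx;

    (void) path;
    (void) revalidate;

    if (*stale_left > 0) {
        (*stale_left)--;
        return -ESTALE;
    }
    return 0;
}
```

With a counter of 1 the second attempt succeeds; with a counter of 2 the caller still sees ESTALE, which is the case the rest of the thread argues about.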
* Re: [PATCH 0/3] enhanced ESTALE error handling 2008-01-18 16:41 ` Chuck Lever @ 2008-01-18 16:55 ` Peter Staubach 2008-01-18 17:17 ` Chuck Lever 0 siblings, 1 reply; 14+ messages in thread From: Peter Staubach @ 2008-01-18 16:55 UTC (permalink / raw) To: Chuck Lever Cc: Linux Kernel Mailing List, linux-nfs, Andrew Morton, Trond Myklebust, linux-fsdevel Chuck Lever wrote: > Hi Peter- > > On Jan 18, 2008, at 10:35 AM, Peter Staubach wrote: >> Hi. >> >> Here is a patch set which modifies the system to enhance the >> ESTALE error handling for system calls which take pathnames >> as arguments. > > The VFS already handles ESTALE. > > If a pathname resolution encounters an ESTALE at any point, the > resolution is restarted exactly once, and an additional flag is passed > to the file system during each lookup that forces each component in > the path to be revalidated on the server. This has no possibility of > causing an infinite loop. > > Is there some part of this logic that is no longer working? The VFS does not fully handle ESTALE. An ESTALE error can occur during the second pathname resolution attempt. There are lots of reasons, some of which are the 1 second resolution from some file systems on the server and the window in between the revalidation and the actual use of the file handle associated with each dentry/inode pair. Also, there was no support for ESTALE errors which occur during subsequent operations to the pathname resolution process. For example, during a mkdir(2) operation, the ESTALE can occur from the over the wire MKDIR operation after the LOOKUP operations have all succeeded. Thanx... ps ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] enhanced ESTALE error handling 2008-01-18 16:55 ` Peter Staubach @ 2008-01-18 17:17 ` Chuck Lever 2008-01-18 17:30 ` Peter Staubach 0 siblings, 1 reply; 14+ messages in thread From: Chuck Lever @ 2008-01-18 17:17 UTC (permalink / raw) To: Peter Staubach Cc: Linux Kernel Mailing List, linux-nfs, Andrew Morton, Trond Myklebust, linux-fsdevel On Jan 18, 2008, at 11:55 AM, Peter Staubach wrote: > Chuck Lever wrote: >> Hi Peter- >> >> On Jan 18, 2008, at 10:35 AM, Peter Staubach wrote: >>> Hi. >>> >>> Here is a patch set which modifies the system to enhance the >>> ESTALE error handling for system calls which take pathnames >>> as arguments. >> >> The VFS already handles ESTALE. >> >> If a pathname resolution encounters an ESTALE at any point, the >> resolution is restarted exactly once, and an additional flag is >> passed to the file system during each lookup that forces each >> component in the path to be revalidated on the server. This has >> no possibility of causing an infinite loop. >> >> Is there some part of this logic that is no longer working? > > The VFS does not fully handle ESTALE. An ESTALE error can occur > during the second pathname resolution attempt. If an ESTALE occurs during the second resolution attempt, we should give up. When I addressed this issue two years ago, the two-try logic was the only acceptable solution because there's no way to guarantee the pathname resolution will ever finish unless we put a hard limit on it. > There are lots of > reasons, some of which are the 1 second resolution from some file > systems on the server Which is a server bug, AFAICS. It's simply impossible to close all the windows that result from sloppy file time stamps without completely disabling client-side caching. The NFS protocol relies on file time stamps to manage cache coherence. If the server is lying about time stamps, there's no way the client can cache coherently. 
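The time stamp point can be illustrated with a small user-space model. This is a sketch under assumptions, not NFS client code: `stored_mtime` and `cache_looks_valid` are hypothetical helpers, and the only behavior modeled is a server that keeps modification times in whole seconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Model (assumption) of a server file system that records
 * modification times with one-second granularity. */
time_t stored_mtime(double real_time_seconds)
{
    assert(real_time_seconds >= 0.0);
    return (time_t) real_time_seconds;   /* fractional part is lost */
}

/* Model (assumption) of a client that treats its cached directory
 * contents as valid while the server's mtime is unchanged. */
bool cache_looks_valid(time_t cached_mtime, time_t server_mtime)
{
    return cached_mtime == server_mtime;
}
```

In this model a directory cached at t = 100.2 and modified again at t = 100.7 still compares equal, so the second change goes unnoticed; a change at t = 101.1 is detected.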
> and the window in between the revalidation > and the actual use of the file handle associated with each > dentry/inode pair. A use case or two would be useful to explore (on linux-nfs or linux- fsdevel, rather than lkml). > Also, there was no support for ESTALE errors which occur during > subsequent operations to the pathname resolution process. For > example, during a mkdir(2) operation, the ESTALE can occur from > the over the wire MKDIR operation after the LOOKUP operations > have all succeeded. If the final operation fails after a pathname resolution, then it's a real error. Is there a fixed and valid recovery script for the client in this case that will allow the mkdir to proceed? Admittedly, the NFS client could recover more cleanly from some of these problems, but given the architecture of the Linux VFS, it will be difficult to address some of the corner cases. -- Chuck Lever chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] enhanced ESTALE error handling 2008-01-18 17:17 ` Chuck Lever @ 2008-01-18 17:30 ` Peter Staubach 2008-01-18 17:52 ` Chuck Lever 2008-01-18 18:17 ` Chuck Lever 0 siblings, 2 replies; 14+ messages in thread From: Peter Staubach @ 2008-01-18 17:30 UTC (permalink / raw) To: Chuck Lever Cc: Linux Kernel Mailing List, linux-nfs, Andrew Morton, Trond Myklebust, linux-fsdevel Chuck Lever wrote: > On Jan 18, 2008, at 11:55 AM, Peter Staubach wrote: >> Chuck Lever wrote: >>> Hi Peter- >>> >>> On Jan 18, 2008, at 10:35 AM, Peter Staubach wrote: >>>> Hi. >>>> >>>> Here is a patch set which modifies the system to enhance the >>>> ESTALE error handling for system calls which take pathnames >>>> as arguments. >>> >>> The VFS already handles ESTALE. >>> >>> If a pathname resolution encounters an ESTALE at any point, the >>> resolution is restarted exactly once, and an additional flag is >>> passed to the file system during each lookup that forces each >>> component in the path to be revalidated on the server. This has no >>> possibility of causing an infinite loop. >>> >>> Is there some part of this logic that is no longer working? >> >> The VFS does not fully handle ESTALE. An ESTALE error can occur >> during the second pathname resolution attempt. > > If an ESTALE occurs during the second resolution attempt, we should > give up. When I addressed this issue two years ago, the two-try logic > was the only acceptable solution because there's no way to guarantee > the pathname resolution will ever finish unless we put a hard limit on > it. > I can probably imagine a situation where the pathname resolution would never finish, but I am not sure that it could ever happen in nature. >> There are lots of >> reasons, some of which are the 1 second resolution from some file >> systems on the server > > Which is a server bug, AFAICS. 
It's simply impossible to close all > the windows that result from sloppy file time stamps without > completely disabling client-side caching. The NFS protocol relies on > file time stamps to manage cache coherence. If the server is lying > about time stamps, there's no way the client can cache coherently. > Server bug or not, it is something that the client has to live with. We can't get the server file system fixed, so it is something that we should find a way to live with. This support can help. >> and the window in between the revalidation >> and the actual use of the file handle associated with each >> dentry/inode pair. > > A use case or two would be useful to explore (on linux-nfs or > linux-fsdevel, rather than lkml). > I created a bunch of use cases in the gensyscall.c program that I attached to the original description of the problem and my proposed solution. It was very useful in generating many, many ESTALE errors over the wire from a variety of different over-the-wire operations, which were originally getting returned to the user level. >> Also, there was no support for ESTALE errors which occur during >> subsequent operations to the pathname resolution process. For >> example, during a mkdir(2) operation, the ESTALE can occur from >> the over the wire MKDIR operation after the LOOKUP operations >> have all succeeded. > > If the final operation fails after a pathname resolution, then it's a > real error. Is there a fixed and valid recovery script for the client > in this case that will allow the mkdir to proceed? > Why do you think that it is an error? It can easily occur if the directory in which the new directory is to be created disappears after it is looked up and before the MKDIR is issued. The recovery is to perform the lookup again. > Admittedly, the NFS client could recover more cleanly from some of > these problems, but given the architecture of the Linux VFS, it will > be difficult to address some of the corner cases. 
Could you outline some of these corner cases that this proposal would not address, please? I ran the test program for many hours, against several different servers, and although I can't prove completeness, was not able to show any ESTALE errors being returned unexpectedly. Thanx... ps ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] enhanced ESTALE error handling 2008-01-18 17:30 ` Peter Staubach @ 2008-01-18 17:52 ` Chuck Lever 2008-01-18 18:12 ` Peter Staubach 2008-01-18 18:17 ` Chuck Lever 1 sibling, 1 reply; 14+ messages in thread From: Chuck Lever @ 2008-01-18 17:52 UTC (permalink / raw) To: Peter Staubach Cc: Linux Kernel Mailing List, linux-nfs, Andrew Morton, Trond Myklebust, linux-fsdevel On Jan 18, 2008, at 12:30 PM, Peter Staubach wrote: > Chuck Lever wrote: >> On Jan 18, 2008, at 11:55 AM, Peter Staubach wrote: >>> Chuck Lever wrote: >>>> Hi Peter- >>>> >>>> On Jan 18, 2008, at 10:35 AM, Peter Staubach wrote: >>>>> Hi. >>>>> >>>>> Here is a patch set which modifies the system to enhance the >>>>> ESTALE error handling for system calls which take pathnames >>>>> as arguments. >>>> >>>> The VFS already handles ESTALE. >>>> >>>> If a pathname resolution encounters an ESTALE at any point, the >>>> resolution is restarted exactly once, and an additional flag is >>>> passed to the file system during each lookup that forces each >>>> component in the path to be revalidated on the server. This has >>>> no possibility of causing an infinite loop. >>>> >>>> Is there some part of this logic that is no longer working? >>> >>> The VFS does not fully handle ESTALE. An ESTALE error can occur >>> during the second pathname resolution attempt. >> >> If an ESTALE occurs during the second resolution attempt, we >> should give up. When I addressed this issue two years ago, the >> two-try logic was the only acceptable solution because there's no >> way to guarantee the pathname resolution will ever finish unless >> we put a hard limit on it. >> > > I can probably imagine a situation where the pathname resolution > would never finish, but I am not sure that it could ever happen > in nature. Unless someone is doing something malicious. Or if the server is repeatedly returning ESTALE for some reason. 
>>> There are lots of >>> reasons, some of which are the 1 second resolution from some file >>> systems on the server >> >> Which is a server bug, AFAICS. It's simply impossible to close >> all the windows that result from sloppy file time stamps without >> completely disabling client-side caching. The NFS protocol relies >> on file time stamps to manage cache coherence. If the server is >> lying about time stamps, there's no way the client can cache >> coherently. >> > > Server bug or not, it is something that the client has to live > with. We can't get the server file system fixed, so it is > something that we should find a way to live with. This support > can help. We haven't identified a server-side solution yet, but that doesn't mean it doesn't exist. If we address the time stamp problem in the client, should we also go to lengths to address it in every other corner of the NFS client? Should we also address every other server bug we discover with a client side fix? >>> Also, there was no support for ESTALE errors which occur during >>> subsequent operations to the pathname resolution process. For >>> example, during a mkdir(2) operation, the ESTALE can occur from >>> the over the wire MKDIR operation after the LOOKUP operations >>> have all succeeded. >> >> If the final operation fails after a pathname resolution, then >> it's a real error. Is there a fixed and valid recovery script for >> the client in this case that will allow the mkdir to proceed? >> > > Why do you think that it is an error? Because this is a problem that sometimes requires application-level recovery. Can we guarantee that retrying the mkdir is the right thing to do every time? > It can easily occur if the directory in which the new directory > is to be created disppears after it is looked up and before the > MKDIR is issued. > > The recovery is to perform the lookup again. Have you tried this client against a file server when you unexport the filesystem under test? 
The server returns ESTALE no matter what the client does. Should the client continue to retry the request if the file system has been permanently taken offline? >> Admittedly, the NFS client could recover more cleanly from some of >> these problems, but given the architecture of the Linux VFS, it >> will be difficult to address some of the corner cases. > > Could you outline some of these corner cases that this proposal > would not address, please? I think we have one right here: should the client retry a mkdir if gets an ESTALE? -- Chuck Lever chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] enhanced ESTALE error handling 2008-01-18 17:52 ` Chuck Lever @ 2008-01-18 18:12 ` Peter Staubach 2008-01-18 18:37 ` J. Bruce Fields 0 siblings, 1 reply; 14+ messages in thread From: Peter Staubach @ 2008-01-18 18:12 UTC (permalink / raw) To: Chuck Lever Cc: Linux Kernel Mailing List, linux-nfs, Andrew Morton, Trond Myklebust, linux-fsdevel Chuck Lever wrote: > On Jan 18, 2008, at 12:30 PM, Peter Staubach wrote: >> Chuck Lever wrote: >>> On Jan 18, 2008, at 11:55 AM, Peter Staubach wrote: >>>> Chuck Lever wrote: >>>>> Hi Peter- >>>>> >>>>> On Jan 18, 2008, at 10:35 AM, Peter Staubach wrote: >>>>>> Hi. >>>>>> >>>>>> Here is a patch set which modifies the system to enhance the >>>>>> ESTALE error handling for system calls which take pathnames >>>>>> as arguments. >>>>> >>>>> The VFS already handles ESTALE. >>>>> >>>>> If a pathname resolution encounters an ESTALE at any point, the >>>>> resolution is restarted exactly once, and an additional flag is >>>>> passed to the file system during each lookup that forces each >>>>> component in the path to be revalidated on the server. This has >>>>> no possibility of causing an infinite loop. >>>>> >>>>> Is there some part of this logic that is no longer working? >>>> >>>> The VFS does not fully handle ESTALE. An ESTALE error can occur >>>> during the second pathname resolution attempt. >>> >>> If an ESTALE occurs during the second resolution attempt, we should >>> give up. When I addressed this issue two years ago, the two-try >>> logic was the only acceptable solution because there's no way to >>> guarantee the pathname resolution will ever finish unless we put a >>> hard limit on it. >>> >> >> I can probably imagine a situation where the pathname resolution >> would never finish, but I am not sure that it could ever happen >> in nature. > > Unless someone is doing something malicious. Or if the server is > repeatedly returning ESTALE for some reason. 
> If the server is repeatedly returning ESTALE, then the pathname resolution will fail to make progress, give up, and return ENOENT to the user level. A malicious user on the network can cause many other problems besides this one. But, in this case, the user would have to predict why and when the client was issuing a specific operation and know whether or not to return ESTALE. This seems quite far-fetched and quite unlikely to me. >>>> There are lots of >>>> reasons, some of which are the 1 second resolution from some file >>>> systems on the server >>> >>> Which is a server bug, AFAICS. It's simply impossible to close all >>> the windows that result from sloppy file time stamps without >>> completely disabling client-side caching. The NFS protocol relies >>> on file time stamps to manage cache coherence. If the server is >>> lying about time stamps, there's no way the client can cache >>> coherently. >>> >> >> Server bug or not, it is something that the client has to live >> with. We can't get the server file system fixed, so it is >> something that we should find a way to live with. This support >> can help. > > We haven't identified a server-side solution yet, but that doesn't > mean it doesn't exist. > No, it doesn't, and I, like most everyone else, would also like to see such a solution. That said, I am pretty sure that we are not going to get a fix for ext3, and forcing everyone to move away from ext3 is not a good solution either. > If we address the time stamp problem in the client, should we also go > to lengths to address it in every other corner of the NFS client? > Should we also address every other server bug we discover with a > client side fix? > These aren't asked seriously, are they? When possible, we get the server bug fixed. When not possible, such as the time stamp issue with ext3, we attempt to work around it as best as possible. 
>>>> Also, there was no support for ESTALE errors which occur during >>>> subsequent operations to the pathname resolution process. For >>>> example, during a mkdir(2) operation, the ESTALE can occur from >>>> the over the wire MKDIR operation after the LOOKUP operations >>>> have all succeeded. >>> >>> If the final operation fails after a pathname resolution, then it's >>> a real error. Is there a fixed and valid recovery script for the >>> client in this case that will allow the mkdir to proceed? >>> >> >> Why do you think that it is an error? > > Because this is a problem that sometimes requires application-level > recovery. Can we guarantee that retrying the mkdir is the right thing > to do every time? > When would not retrying the MKDIR be the right thing to do? When doing a mkdir("a/b"), the user can not tell nor cares which instance of directory "a" is the one that gets "b" created in it. Which cases are the ones that you see that require user level recovery? >> It can easily occur if the directory in which the new directory >> is to be created disppears after it is looked up and before the >> MKDIR is issued. >> >> The recovery is to perform the lookup again. > > Have you tried this client against a file server when you unexport the > filesystem under test? The server returns ESTALE no matter what the > client does. Should the client continue to retry the request if the > file system has been permanently taken offline? > Since the NFS client supports "intr", then why not continue to retry the request? It certainly won't hurt the network, trying at most once every acdirmin timeout seconds. This, by default, would be once every 30 seconds. This would alleviate a long standing complaint that when an admin uses a poor administrative procedure that users become completely hosed. 
>>> Admittedly, the NFS client could recover more cleanly from some of >>> these problems, but given the architecture of the Linux VFS, it will >>> be difficult to address some of the corner cases. >> >> Could you outline some of these corner cases that this proposal >> would not address, please? > > I think we have one right here: should the client retry a mkdir if > gets an ESTALE? Yes. Why not? Please describe more specifically why you think that it should not. Thanx... ps ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] enhanced ESTALE error handling 2008-01-18 18:12 ` Peter Staubach @ 2008-01-18 18:37 ` J. Bruce Fields 2008-01-18 19:12 ` Peter Staubach 0 siblings, 1 reply; 14+ messages in thread From: J. Bruce Fields @ 2008-01-18 18:37 UTC (permalink / raw) To: Peter Staubach Cc: Chuck Lever, Linux Kernel Mailing List, linux-nfs, Andrew Morton, Trond Myklebust, linux-fsdevel On Fri, Jan 18, 2008 at 01:12:03PM -0500, Peter Staubach wrote: > Chuck Lever wrote: >> On Jan 18, 2008, at 12:30 PM, Peter Staubach wrote: >>> I can probably imagine a situation where the pathname resolution >>> would never finish, but I am not sure that it could ever happen >>> in nature. >> >> Unless someone is doing something malicious. Or if the server is >> repeatedly returning ESTALE for some reason. >> > > If the server is repeatedly returning ESTALE, then the pathname > resolution will fail to make progress and give up, return ENOENT > to the user level. > > A malicious user on the network can cause so many other problems > than just something like this too. But, in this case, the user > would have to predict why and when the client was issuing a > specific operation and know whether or not to return ESTALE. > This seems quite far fetched and quite unlikely to me. Any idea what the consequences would be in this case? It at least shouldn't overflow the stack, or freeze the whole machine (because it spins indefinitely under some crucial lock), or panic, etc. (If the one filesystem just becomes unusable--well, fine, what better can you hope for in the presence of a malicious server or network?) --b. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/3] enhanced ESTALE error handling 2008-01-18 18:37 ` J. Bruce Fields @ 2008-01-18 19:12 ` Peter Staubach 0 siblings, 0 replies; 14+ messages in thread From: Peter Staubach @ 2008-01-18 19:12 UTC (permalink / raw) To: J. Bruce Fields Cc: Chuck Lever, Linux Kernel Mailing List, linux-nfs, Andrew Morton, Trond Myklebust, linux-fsdevel J. Bruce Fields wrote: > On Fri, Jan 18, 2008 at 01:12:03PM -0500, Peter Staubach wrote: > >> Chuck Lever wrote: >> >>> On Jan 18, 2008, at 12:30 PM, Peter Staubach wrote: >>> >>>> I can probably imagine a situation where the pathname resolution >>>> would never finish, but I am not sure that it could ever happen >>>> in nature. >>>> >>> Unless someone is doing something malicious. Or if the server is >>> repeatedly returning ESTALE for some reason. >>> >>> >> If the server is repeatedly returning ESTALE, then the pathname >> resolution will fail to make progress and give up, return ENOENT >> to the user level. >> >> A malicious user on the network can cause so many other problems >> than just something like this too. But, in this case, the user >> would have to predict why and when the client was issuing a >> specific operation and know whether or not to return ESTALE. >> This seems quite far fetched and quite unlikely to me. >> > > Any idea what the consequences would be in this case? It at least > shouldn't overflow the stack, or freeze the whole machine (because it > spins indefinitely under some crucial lock), or panic, etc. (If the one > filesystem just becomes unusable--well, fine, what better can you hope > for in the presence of a malicious server or network?) Assuming that such a user could precisely and accurately predict when to return ESTALE, the particular system call would just stay in the kernel, sending out requests to the NFS server. 
It wouldn't overflow the stack because the recovery is done by looping and not by recursion and, unless there is a bug that needs to be fixed, all necessary resources are released before the retries occur. The machine wouldn't freeze because as soon as the request is sent, the process blocks and some other process can be scheduled. The process should be interruptible, so it could even be signaled to stop the activity. It seems to me that mostly, the file system will become unusable, but as Bruce points out, what do you expect in the presence of a malicious entity? If such entities are a concern, then measures such as stronger security can be employed to prevent them from wreaking havoc. Thanx... ps ^ permalink raw reply [flat|nested] 14+ messages in thread
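The shape of the recovery described above (loop rather than recurse, redo the lookup after a stale result, stay interruptible) can be sketched in user space as follows. This is not the code from the patch set: `struct path_op`, `run_with_estale_retry`, and the `demo_*` helpers are hypothetical names introduced only for illustration, and the forward-progress check mentioned elsewhere in the thread is left out for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical two-step operation, e.g. LOOKUP of the parent followed
 * by MKDIR. Each callback returns 0 or a negative errno value. */
struct path_op {
    int (*lookup)(void *ctx);       /* resolve the pathname */
    int (*operate)(void *ctx);      /* perform the final operation */
    int (*interrupted)(void *ctx);  /* nonzero if a signal is pending */
};

/* Repeat lookup + operation while the result is ESTALE. The retry is
 * a loop, so stack usage stays constant however often it repeats. */
int run_with_estale_retry(const struct path_op *op, void *ctx)
{
    assert(op != NULL);

    for (;;) {
        int err = op->lookup(ctx);

        if (err == 0)
            err = op->operate(ctx);
        if (err != -ESTALE)
            return err;             /* success or a real error */
        if (op->interrupted(ctx))
            return -EINTR;          /* let a signal break the loop */
    }
}

/* Toy callbacks for demonstration only. */
struct demo {
    int stale_ops;      /* how many operations will report ESTALE */
    int lookups;        /* how many lookups have been performed */
    int signal_after;   /* pretend a signal arrives after this many */
};

int demo_lookup(void *c)
{
    struct demo *d = c;

    d->lookups++;
    return 0;
}

int demo_operate(void *c)
{
    struct demo *d = c;

    if (d->stale_ops > 0) {
        d->stale_ops--;
        return -ESTALE;
    }
    return 0;
}

int demo_interrupted(void *c)
{
    struct demo *d = c;

    return d->lookups >= d->signal_after;
}
```

A server that keeps returning ESTALE keeps the caller in the loop until the interruption check fires, which matches the "stays in the kernel, sending out requests" behavior described in the message.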
* Re: [PATCH 0/3] enhanced ESTALE error handling 2008-01-18 17:30 ` Peter Staubach 2008-01-18 17:52 ` Chuck Lever @ 2008-01-18 18:17 ` Chuck Lever 1 sibling, 0 replies; 14+ messages in thread From: Chuck Lever @ 2008-01-18 18:17 UTC (permalink / raw) To: Peter Staubach Cc: Linux Kernel Mailing List, linux-nfs, Andrew Morton, Trond Myklebust, linux-fsdevel Hi Peter- On Jan 18, 2008, at 12:30 PM, Peter Staubach wrote: > Chuck Lever wrote: >> On Jan 18, 2008, at 11:55 AM, Peter Staubach wrote: >>> Chuck Lever wrote: >>>> Hi Peter- >>>> >>>> On Jan 18, 2008, at 10:35 AM, Peter Staubach wrote: >>> and the window in between the revalidation >>> and the actual use of the file handle associated with each >>> dentry/inode pair. >> >> A use case or two would be useful to explore (on linux-nfs or >> linux-fsdevel, rather than lkml). > > I created a bunch of use cases in the gensyscall.c program that > I attached to the original description of the problem and my > proposed solution. It was very useful in generating many, many > ESTALE errors over the wire from a variety of different over the > wire operations, which were originally getting returned to the > user level. The gensyscall.c program is what I would call a set of unit tests, btw. This is not the same as a use case, which would include information about the application environment, its users, a detailed description of current system behavior, and some discussion of alternatives for improving it (including doing nothing). A test case is written in a programming language; a use case is written in a natural language. -- Chuck Lever chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 0/3] enhanced ESTALE error handling (v2) 2008-01-18 15:35 [PATCH 0/3] enhanced ESTALE error handling Peter Staubach 2008-01-18 15:46 ` J. Bruce Fields 2008-01-18 16:41 ` Chuck Lever @ 2008-02-01 20:57 ` Peter Staubach 2008-03-10 20:23 ` [PATCH 0/3] enhanced ESTALE error handling (v3) Peter Staubach 2 siblings, 1 reply; 14+ messages in thread From: Peter Staubach @ 2008-02-01 20:57 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: linux-nfs, Andrew Morton, Trond Myklebust, linux-fsdevel [-- Attachment #1: Type: text/plain, Size: 4144 bytes --] Hi. Here is version 2 of a patch set which modifies the system to enhance the ESTALE error handling for system calls which take pathnames as arguments. The error, ESTALE, was originally introduced to handle the situation where a file handle, which NFS uses to uniquely identify a file on the server, no longer refers to a valid file on the server. This can happen when the file is removed on the server, either by an application on the server, some other client accessing the server, or sometimes even by another mounted file system from the same client. The NFS server also returns this error when the file resides upon a file system which is no longer exported. Additionally, some NFS servers even change the file handle when a file is renamed, although this practice is discouraged. This error occurs even if a file or directory, with the same name, is recreated on the server without the client being aware of it. The file handle refers to a specific instance of a file and deleting the file and then recreating it creates a new instance of the file. The error, ESTALE, is usually seen when cached directory information is used to convert a pathname to a dentry/inode pair. The information is discovered to be out of date or stale when a subsequent operation is sent to the NFS server. 
This can easily happen in system calls such as stat(2) when the pathname is converted to a dentry/inode pair using cached information, but then a subsequent GETATTR call to the server discovers that the file handle is no longer valid. This error can also occur when a change is made on the server in between looking up different components of the pathname or between a successful lookup and a subsequent operation.

System calls which take pathnames as arguments should never see ESTALE errors from situations like this. These system calls should either fail with an ENOENT error if the pathname cannot be successfully translated to a dentry/inode pair, or succeed or fail based on their own semantics. In the above example, stat(2), restarting at the pathname lookup will cause the system call to either succeed or fail, depending upon whether the file really exists or not.

ESTALE errors which occur during the lookup process can be handled by dropping the dentry which refers to the non-existent file from the dcache and then restarting the lookup process. Care is taken to ensure that forward progress is always being made in order to avoid infinite loops.

ESTALE errors which occur during operations subsequent to the lookup process can be handled by unwinding appropriately and then performing the lookup process again. Eventually, either the lookup process will succeed or fail correctly or the subsequent operation will succeed or fail on its own merits.

This support is desired in order to tighten up recovery from discovering stale resources due to the loose cache consistency semantics that file systems such as NFS employ. In particular, there are several large Red Hat customers, converting from Solaris to Linux, who desire this support in order that their application environments continue to work. The loose consistency model of file systems such as NFS is exacerbated by the large granularity of timestamps available for files on file systems such as ext3.
The NFS client may not be able to detect changes in directories due to multiple changes occurring in the same second, for example.

Please note that system calls which do not take pathnames as arguments, or which use file descriptors to identify the file to be manipulated, may still fail with ESTALE errors. There is no recovery possible with these system calls like there is with system calls which take pathnames as arguments.

This support was tested using the attached programs and running multiple copies on mounted file systems which do not share superblocks. When two or more copies of this program are running, many ESTALE errors can be seen over the network. Without these patches, the test program errors out almost immediately. With these patches, the test program runs for as long as one desires.

Comments?

Thanx...

ps

[-- Attachment #2: syscallgen.c --]
[-- Type: text/x-csrc, Size: 15188 bytes --]

#
#define _XOPEN_SOURCE 500
#define _LARGEFILE64_SOURCE
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/statfs.h>
#include <sys/inotify.h>
#include <sys/time.h>	/* utimes() */
#include <sys/wait.h>	/* wait() */
#include <sys/xattr.h>	/* getxattr() and friends */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <utime.h>	/* utime() */

void mkdir_test(void);
void link_test(void);
void open_test(void);
void access_test(void);
void chmod_test(void);
void chown_test(void);
void readlink_test(void);
void utimes_test(void);
void chdir_test(void);
void chroot_test(void);
void rename_test(void);
void exec_test(void);
void mknod_test(void);
void statfs_test(void);
void truncate_test(void);
void xattr_test(void);
void inotify_test(void);

struct tests {
	void (*test)(void);
};

struct tests tests[] = {
	mkdir_test, link_test, open_test, access_test, chmod_test, chown_test,
	readlink_test, utimes_test, chdir_test, chroot_test, rename_test,
	exec_test, mknod_test, statfs_test, truncate_test, xattr_test,
	inotify_test
};

pid_t test_pids[sizeof(tests) / sizeof(tests[0])];
pid_t parent_pid;

void kill_tests(int);

int main(int argc, char *argv[]) { int i; parent_pid
= getpid(); sigset(SIGINT, kill_tests); sighold(SIGINT); for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) { test_pids[i] = fork(); if (test_pids[i] == 0) { for (;;) (*tests[i].test)(); /* NOTREACHED */ } } sigrelse(SIGINT); pause(); } void kill_tests(int sig) { int i; for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) { if (test_pids[i] != -1) { if (kill(test_pids[i], SIGTERM) < 0) perror("kill"); } } exit(0); } void check_error(int error, char *operation) { if (error < 0 && errno == ESTALE) { perror(operation); kill(parent_pid, SIGINT); pause(); } } void check_error_child(int error, char *operation) { if (error < 0 && errno == ESTALE) { perror(operation); kill(parent_pid, SIGINT); exit(1); } } void do_stats(char *file) { int error; struct stat stbuf; struct stat64 stbuf64; error = stat(file, &stbuf); check_error(error, "stat"); error = stat64(file, &stbuf64); check_error(error, "stat64"); error = lstat(file, &stbuf); check_error(error, "lstat"); error = lstat64(file, &stbuf64); check_error(error, "lstat64"); } void do_stats_child(char *file) { int error; struct stat stbuf; struct stat64 stbuf64; error = stat(file, &stbuf); check_error_child(error, "stat"); error = stat64(file, &stbuf64); check_error_child(error, "stat64"); error = lstat(file, &stbuf); check_error_child(error, "lstat"); error = lstat64(file, &stbuf64); check_error_child(error, "lstat64"); } char *mkdir_dirs[] = { "mkdir/a", "mkdir/a/b", "mkdir/a/b/c", "mkdir/a/b/c/d", "mkdir/a/b/c/d/e", "mkdir/a/b/c/d/e/f", "mkdir/a/b/c/d/e/f/g", "mkdir/a/b/c/d/e/f/g/h", "mkdir/a/b/c/d/e/f/g/h/i", "mkdir/a/b/c/d/e/f/g/h/i/j", "mkdir/a/b/c/d/e/f/g/h/i/j/k", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s", 
"mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z", NULL }; void mkdir_test() { int i; int error; error = mkdir("mkdir", 0755); check_error(error, "mkdir"); for (i = 0; mkdir_dirs[i] != NULL; i++) { error = mkdir(mkdir_dirs[i], 0755); check_error(error, "mkdir"); do_stats(mkdir_dirs[i]); } while (--i >= 0) { do_stats(mkdir_dirs[i]); error = rmdir(mkdir_dirs[i]); check_error(error, "rmdir"); } error = rmdir("mkdir"); check_error(error, "rmdir"); } char *link_file_a = "link/a"; char *link_file_b = "link/b"; void link_test() { int error; int fd; error = mkdir("link", 0755); check_error(error, "mkdir"); fd = open(link_file_a, O_CREAT, 0644); check_error(fd, "open"); (void) close(fd); do_stats(link_file_a); error = link(link_file_a, link_file_b); check_error(error, "link"); do_stats(link_file_a); do_stats(link_file_b); error = unlink(link_file_a); check_error(error, "unlink"); do_stats(link_file_a); do_stats(link_file_b); error = link(link_file_b, link_file_a); check_error(error, "link"); do_stats(link_file_a); do_stats(link_file_b); error = unlink(link_file_b); check_error(error, "unlink"); do_stats(link_file_a); do_stats(link_file_b); error = unlink(link_file_a); check_error(error, "unlink"); do_stats(link_file_a); do_stats(link_file_b); error = rmdir("link"); check_error(error, "rmdir"); } char *open_file = "open/a"; void open_test() { int error; int fd; error = mkdir("open", 0755); check_error(error, "mkdir"); fd = open(open_file, O_CREAT | O_RDWR, 0644); check_error(fd, "open: O_CREAT"); (void) close(fd); do_stats(open_file); fd = open(open_file, O_RDWR); check_error(fd, "open: O_RDWR"); (void) close(fd); do_stats(open_file); error = 
unlink(open_file); check_error(error, "unlink"); error = rmdir("open"); check_error(error, "rmdir"); } char *access_file = "access/a"; void access_test() { int error; int fd; error = mkdir("access", 0755); check_error(error, "mkdir"); fd = open(access_file, O_CREAT | O_RDWR, 0644); check_error(fd, "open: O_CREAT"); (void) close(fd); do_stats(access_file); error = access(access_file, F_OK); check_error(error, "access"); do_stats(access_file); error = unlink(access_file); check_error(error, "unlink"); error = rmdir("access"); check_error(error, "rmdir"); } char *chmod_file = "chmod/a"; void chmod_test() { int error; int fd; error = mkdir("chmod", 0755); check_error(error, "mkdir"); fd = open(chmod_file, O_CREAT | O_RDWR, 0644); check_error(fd, "open: O_CREAT"); (void) close(fd); do_stats(chmod_file); error = chmod(chmod_file, 0600); check_error(error, "chmod"); do_stats(chmod_file); error = unlink(chmod_file); check_error(error, "unlink"); error = rmdir("chmod"); check_error(error, "rmdir"); } char *chown_file = "chown/a"; void chown_test() { int error; int fd; error = mkdir("chown", 0755); check_error(error, "mkdir"); fd = open(chown_file, O_CREAT | O_RDWR, 0644); check_error(fd, "open: O_CREAT"); (void) close(fd); do_stats(chown_file); error = chown(chown_file, 4597, 4597); check_error(error, "chown"); do_stats(chown_file); error = lchown(chown_file, 4596, 4596); check_error(error, "lchown"); do_stats(chown_file); error = unlink(chown_file); check_error(error, "unlink"); error = rmdir("chown"); check_error(error, "rmdir"); } char *readlink_file = "readlink/a"; void readlink_test() { int error; char buf[BUFSIZ]; error = mkdir("readlink", 0755); check_error(error, "mkdir"); error = symlink("b", readlink_file); check_error(error, "symlink"); do_stats(readlink_file); error = readlink(readlink_file, buf, sizeof(buf)); check_error(error, "readlink"); do_stats(readlink_file); error = unlink(readlink_file); check_error(error, "unlink"); error = rmdir("readlink"); 
check_error(error, "rmdir"); } char *utimes_file = "utimes/a"; void utimes_test() { int error; int fd; error = mkdir("utimes", 0755); check_error(error, "mkdir"); fd = open(utimes_file, O_CREAT | O_RDWR, 0644); check_error(fd, "open: O_CREAT"); (void) close(fd); do_stats(utimes_file); error = utime(utimes_file, NULL); check_error(error, "utime"); do_stats(utimes_file); error = utimes(utimes_file, NULL); check_error(error, "utimes"); do_stats(utimes_file); error = unlink(utimes_file); check_error(error, "unlink"); error = rmdir("utimes"); check_error(error, "rmdir"); } char *chdir_dir = "chdir/dir"; void chdir_test() { int error; int pid; int status; error = mkdir("chdir", 0755); check_error(error, "mkdir"); pid = fork(); if (pid == 0) { error = mkdir(chdir_dir, 0755); check_error_child(error, "mkdir"); do_stats_child(chdir_dir); error = chdir(chdir_dir); check_error_child(error, "chdir"); do_stats_child(chdir_dir); exit(0); } (void) wait(&status); do_stats(chdir_dir); error = rmdir(chdir_dir); check_error(error, "rmdir"); error = rmdir("chdir"); check_error(error, "rmdir"); } char *chroot_dir = "chroot/dir"; void chroot_test() { int error; int pid; int status; error = mkdir("chroot", 0755); check_error(error, "mkdir"); pid = fork(); if (pid == 0) { error = mkdir(chroot_dir, 0755); check_error_child(error, "mkdir"); do_stats_child(chroot_dir); error = chroot(chroot_dir); check_error_child(error, "chroot"); do_stats_child(chroot_dir); exit(0); } (void) wait(&status); do_stats(chroot_dir); error = rmdir(chroot_dir); check_error(error, "rmdir"); error = rmdir("chroot"); check_error(error, "rmdir"); } char *rename_file_a = "rename/a"; char *rename_file_b = "rename/b"; void rename_test() { int error; int fd; error = mkdir("rename", 0755); check_error(error, "mkdir"); fd = open(rename_file_a, O_CREAT, 0644); check_error(fd, "open"); (void) close(fd); do_stats(rename_file_a); error = rename(rename_file_a, rename_file_b); check_error(error, "rename"); 
do_stats(rename_file_a); do_stats(rename_file_b); error = rename(rename_file_b, rename_file_a); check_error(error, "rename"); do_stats(rename_file_a); do_stats(rename_file_b); error = unlink(rename_file_a); check_error(error, "unlink"); error = rmdir("rename"); check_error(error, "rmdir"); } char *exec_file = "exec/a"; char *exec_source_file = "exec_test"; void exec_test() { int error; int pid; int status; error = mkdir("exec", 0755); check_error(error, "mkdir"); error = link(exec_source_file, exec_file); check_error(error, "link"); do_stats(exec_file); pid = fork(); if (pid == 0) { error = execl(exec_file, exec_file, NULL); check_error_child(error, "execl"); exit(1); } wait(&status); do_stats(exec_file); error = unlink(exec_file); check_error(error, "unlink"); error = rmdir("exec"); check_error(error, "rmdir"); } char *mknod_file = "mknod/a"; void mknod_test() { int error; error = mkdir("mknod", 0755); check_error(error, "mkdir"); error = mknod(mknod_file, S_IFCHR | 0644, 0); check_error(error, "mknod"); do_stats(mknod_file); error = unlink(mknod_file); check_error(error, "unlink"); error = rmdir("mknod"); check_error(error, "rmdir"); } char *statfs_dir = "statfs/a"; void statfs_test() { int error; struct statfs stbuf; struct statfs64 stbuf64; error = mkdir("statfs", 0755); check_error(error, "mkdir"); do_stats("statfs"); error = mkdir(statfs_dir, 0755); check_error(error, "mkdir"); do_stats(statfs_dir); error = statfs(statfs_dir, &stbuf); check_error(error, "statfs"); error = statfs64(statfs_dir, &stbuf64); check_error(error, "statfs64"); error = rmdir(statfs_dir); check_error(error, "rmdir"); error = rmdir("statfs"); check_error(error, "rmdir"); } char *truncate_file = "truncate/a"; void truncate_test() { int error; int fd; error = mkdir("truncate", 0755); check_error(error, "mkdir"); fd = open(truncate_file, O_CREAT | O_RDWR, 0644); check_error(fd, "open: O_CREAT"); (void) close(fd); do_stats(truncate_file); error = truncate(truncate_file, 1024); 
check_error(error, "truncate"); do_stats(truncate_file); error = unlink(truncate_file); check_error(error, "unlink"); error = rmdir("truncate"); check_error(error, "rmdir"); } char *xattr_file = "xattr/a"; #define ACL_USER_OBJ (0x01) #define ACL_USER (0x02) #define ACL_GROUP_OBJ (0x04) #define ACL_MASK (0x10) #define ACL_OTHER (0x20) struct posix_acl_xattr_entry { unsigned short e_tag; unsigned short e_perm; unsigned int e_id; }; #define POSIX_ACL_XATTR_VERSION 0x0002 struct posix_acl_xattr_header { unsigned int a_version; struct posix_acl_xattr_entry a_entries[5]; }; void xattr_test() { int error; int fd; char buf[1024]; struct posix_acl_xattr_header ents; error = mkdir("xattr", 0755); check_error(error, "mkdir"); fd = open(xattr_file, O_CREAT | O_RDWR, 0444); check_error(fd, "open: O_CREAT"); (void) close(fd); do_stats(xattr_file); error = getxattr(xattr_file, "system.posix_acl_access", buf, sizeof (buf)); check_error(error, "getxattr"); error = lgetxattr(xattr_file, "system.posix_acl_access", buf, sizeof (buf)); check_error(error, "lgetxattr"); ents.a_version = POSIX_ACL_XATTR_VERSION; ents.a_entries[0].e_tag = ACL_USER_OBJ; ents.a_entries[0].e_perm = 06; ents.a_entries[0].e_id = -1; ents.a_entries[1].e_tag = ACL_USER; ents.a_entries[1].e_perm = 06; ents.a_entries[1].e_id = 10; ents.a_entries[2].e_tag = ACL_GROUP_OBJ; ents.a_entries[2].e_perm = 06; ents.a_entries[2].e_id = -1; ents.a_entries[3].e_tag = ACL_MASK; ents.a_entries[3].e_perm = 06; ents.a_entries[3].e_id = -1; ents.a_entries[4].e_tag = ACL_OTHER; ents.a_entries[4].e_perm = 06; ents.a_entries[4].e_id = -1; error = setxattr(xattr_file, "system.posix_acl_access", &ents, sizeof (ents), 0); check_error(error, "setxattr"); do_stats(xattr_file); error = lsetxattr(xattr_file, "system.posix_acl_access", &ents, sizeof (ents), 0); check_error(error, "lsetxattr"); do_stats(xattr_file); error = getxattr(xattr_file, "system.posix_acl_access", buf, sizeof (buf)); check_error(error, "getxattr"); error = 
lgetxattr(xattr_file, "system.posix_acl_access", buf, sizeof (buf)); check_error(error, "lgetxattr"); error = listxattr(xattr_file, buf, sizeof (buf)); check_error(error, "listxattr"); error = llistxattr(xattr_file, buf, sizeof (buf)); check_error(error, "llistxattr"); error = removexattr(xattr_file, "system.posix_acl_access"); check_error(error, "removexattr"); do_stats(xattr_file); error = setxattr(xattr_file, "system.posix_acl_access", &ents, sizeof (ents), 0); check_error(error, "setxattr"); do_stats(xattr_file); error = lremovexattr(xattr_file, "system.posix_acl_access"); check_error(error, "lremovexattr"); do_stats(xattr_file); error = unlink(xattr_file); check_error(error, "unlink"); error = rmdir("xattr"); check_error(error, "rmdir"); } char *inotify_file = "inotify/a"; void inotify_test() { int error; int fd; int wd; error = mkdir("inotify", 0755); check_error(error, "mkdir"); fd = open(inotify_file, O_CREAT | O_RDWR, 0644); check_error(fd, "open: O_CREAT"); (void) close(fd); do_stats(inotify_file); fd = inotify_init(); check_error(fd, "inotify_init"); do_stats(inotify_file); wd = inotify_add_watch(fd, inotify_file, IN_ALL_EVENTS); check_error(wd, "inotify_add_watch"); do_stats(inotify_file); error = inotify_rm_watch(fd, wd); check_error(error, "inotify_rm_watch"); (void) close(fd); do_stats(inotify_file); error = unlink(inotify_file); check_error(error, "unlink"); error = rmdir("inotify"); check_error(error, "rmdir"); }

[-- Attachment #3: exec_test.c --]
[-- Type: text/x-csrc, Size: 42 bytes --]

#include <stdlib.h>

int main(void)
{
	exit(0);
}

^ permalink raw reply [flat|nested] 14+ messages in thread
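The recovery idea described in the cover letter (on ESTALE, discard the stale result and redo the pathname lookup, while guaranteeing forward progress) can be approximated in user space. The following is only a minimal sketch under stated assumptions, not the kernel change proposed in the patches: the helper name stat_retry_estale, the stat_fn typedef, the max_retries bound, and the fake_stat stand-in are all hypothetical names introduced here for illustration.

```c
#include <errno.h>
#include <sys/stat.h>

/* Signature shared by stat(2)-like calls, so a stand-in can be substituted. */
typedef int (*stat_fn)(const char *path, struct stat *st);

/*
 * Hypothetical helper (not part of the patch set): repeat a pathname-based
 * call while it fails with ESTALE.  The max_retries bound guarantees that
 * the loop terminates, mirroring the "forward progress" concern above.
 */
static int stat_retry_estale(stat_fn fn, const char *path, struct stat *st,
			     int max_retries)
{
	int attempt;
	int ret;

	for (attempt = 0; ; attempt++) {
		ret = fn(path, st);
		/* Stop on success, on any error other than ESTALE, or when out of retries. */
		if (ret == 0 || errno != ESTALE || attempt >= max_retries)
			return ret;
		/* ESTALE: the name may now resolve differently, so look it up again. */
	}
}

/* Stand-in for illustration only: reports ESTALE twice, then succeeds. */
static int fake_calls;

static int fake_stat(const char *path, struct stat *st)
{
	(void) path;
	(void) st;
	if (++fake_calls <= 2) {
		errno = ESTALE;
		return -1;
	}
	return 0;
}
```

A caller would pass the real call, for example stat_retry_estale(stat, path, &stbuf, 3). A user-space retry like this only re-issues the system call; whether the repeated lookup avoids the stale cached entry depends on the client, which is why the patches handle the restart inside the kernel.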
* [PATCH 0/3] enhanced ESTALE error handling (v3) 2008-02-01 20:57 ` [PATCH 0/3] enhanced ESTALE error handling (v2) Peter Staubach @ 2008-03-10 20:23 ` Peter Staubach 2008-03-10 22:42 ` Andreas Dilger 0 siblings, 1 reply; 14+ messages in thread From: Peter Staubach @ 2008-03-10 20:23 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: linux-nfs, Andrew Morton, Trond Myklebust, linux-fsdevel [-- Attachment #1: Type: text/plain, Size: 4272 bytes --] Hi. Here is version 3 of a patch set which modifies the system to enhance the ESTALE error handling for system calls which take pathnames as arguments. This patch set is essentially the same as the v2 patches, but updated to reflect the current state of the code around them. The error, ESTALE, was originally introduced to handle the situation where a file handle, which NFS uses to uniquely identify a file on the server, no longer refers to a valid file on the server. This can happen when the file is removed on the server, either by an application on the server, some other client accessing the server, or sometimes even by another mounted file system from the same client. The NFS server also returns this error when the file resides upon a file system which is no longer exported. Additionally, some NFS servers even change the file handle when a file is renamed, although this practice is discouraged. This error occurs even if a file or directory, with the same name, is recreated on the server without the client being aware of it. The file handle refers to a specific instance of a file and deleting the file and then recreating it creates a new instance of the file. The error, ESTALE, is usually seen when cached directory information is used to convert a pathname to a dentry/inode pair. The information is discovered to be out of date or stale when a subsequent operation is sent to the NFS server. 
This can easily happen in system calls such as stat(2) when the pathname is converted to a dentry/inode pair using cached information, but then a subsequent GETATTR call to the server discovers that the file handle is no longer valid. This error can also occur when a change is made on the server in between looking up different components of the pathname or between a successful lookup and a subsequent operation.

System calls which take pathnames as arguments should never see ESTALE errors from situations like this. These system calls should either fail with an ENOENT error if the pathname cannot be successfully translated to a dentry/inode pair, or succeed or fail based on their own semantics. In the above example, stat(2), restarting at the pathname lookup will cause the system call to either succeed or fail, depending upon whether the file really exists or not.

ESTALE errors which occur during the lookup process can be handled by dropping the dentry which refers to the non-existent file from the dcache and then restarting the lookup process. Care is taken to ensure that forward progress is always being made in order to avoid infinite loops.

ESTALE errors which occur during operations subsequent to the lookup process can be handled by unwinding appropriately and then performing the lookup process again. Eventually, either the lookup process will succeed or fail correctly or the subsequent operation will succeed or fail on its own merits.

This support is desired in order to tighten up recovery from discovering stale resources due to the loose cache consistency semantics that file systems such as NFS employ. In particular, there are several large Red Hat customers, converting from Solaris to Linux, who desire this support in order that their application environments continue to work. The loose consistency model of file systems such as NFS is exacerbated by the large granularity of timestamps available for files on file systems such as ext3.
The NFS client may not be able to detect changes in directories due to multiple changes occurring in the same second, for example.

Please note that system calls which do not take pathnames as arguments, or which use file descriptors to identify the file to be manipulated, may still fail with ESTALE errors. There is no recovery possible with these system calls like there is with system calls which take pathnames as arguments.

This support was tested using the attached programs and running multiple copies on mounted file systems which do not share superblocks. When two or more copies of this program are running, many ESTALE errors can be seen over the network. Without these patches, the test program errors out almost immediately. With these patches, the test program runs for as long as one desires.

Comments?

Thanx...

ps

[-- Attachment #2: syscallgen.c --]
[-- Type: text/x-csrc, Size: 15188 bytes --]

#
#define _XOPEN_SOURCE 500
#define _LARGEFILE64_SOURCE
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/statfs.h>
#include <sys/inotify.h>
#include <sys/time.h>	/* utimes() */
#include <sys/wait.h>	/* wait() */
#include <sys/xattr.h>	/* getxattr() and friends */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <utime.h>	/* utime() */

void mkdir_test(void);
void link_test(void);
void open_test(void);
void access_test(void);
void chmod_test(void);
void chown_test(void);
void readlink_test(void);
void utimes_test(void);
void chdir_test(void);
void chroot_test(void);
void rename_test(void);
void exec_test(void);
void mknod_test(void);
void statfs_test(void);
void truncate_test(void);
void xattr_test(void);
void inotify_test(void);

struct tests {
	void (*test)(void);
};

struct tests tests[] = {
	mkdir_test, link_test, open_test, access_test, chmod_test, chown_test,
	readlink_test, utimes_test, chdir_test, chroot_test, rename_test,
	exec_test, mknod_test, statfs_test, truncate_test, xattr_test,
	inotify_test
};

pid_t test_pids[sizeof(tests) / sizeof(tests[0])];
pid_t parent_pid;

void kill_tests(int);

int main(int argc, char *argv[]) { int i; parent_pid
= getpid(); sigset(SIGINT, kill_tests); sighold(SIGINT); for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) { test_pids[i] = fork(); if (test_pids[i] == 0) { for (;;) (*tests[i].test)(); /* NOTREACHED */ } } sigrelse(SIGINT); pause(); } void kill_tests(int sig) { int i; for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) { if (test_pids[i] != -1) { if (kill(test_pids[i], SIGTERM) < 0) perror("kill"); } } exit(0); } void check_error(int error, char *operation) { if (error < 0 && errno == ESTALE) { perror(operation); kill(parent_pid, SIGINT); pause(); } } void check_error_child(int error, char *operation) { if (error < 0 && errno == ESTALE) { perror(operation); kill(parent_pid, SIGINT); exit(1); } } void do_stats(char *file) { int error; struct stat stbuf; struct stat64 stbuf64; error = stat(file, &stbuf); check_error(error, "stat"); error = stat64(file, &stbuf64); check_error(error, "stat64"); error = lstat(file, &stbuf); check_error(error, "lstat"); error = lstat64(file, &stbuf64); check_error(error, "lstat64"); } void do_stats_child(char *file) { int error; struct stat stbuf; struct stat64 stbuf64; error = stat(file, &stbuf); check_error_child(error, "stat"); error = stat64(file, &stbuf64); check_error_child(error, "stat64"); error = lstat(file, &stbuf); check_error_child(error, "lstat"); error = lstat64(file, &stbuf64); check_error_child(error, "lstat64"); } char *mkdir_dirs[] = { "mkdir/a", "mkdir/a/b", "mkdir/a/b/c", "mkdir/a/b/c/d", "mkdir/a/b/c/d/e", "mkdir/a/b/c/d/e/f", "mkdir/a/b/c/d/e/f/g", "mkdir/a/b/c/d/e/f/g/h", "mkdir/a/b/c/d/e/f/g/h/i", "mkdir/a/b/c/d/e/f/g/h/i/j", "mkdir/a/b/c/d/e/f/g/h/i/j/k", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s", 
"mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y", "mkdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z", NULL }; void mkdir_test() { int i; int error; error = mkdir("mkdir", 0755); check_error(error, "mkdir"); for (i = 0; mkdir_dirs[i] != NULL; i++) { error = mkdir(mkdir_dirs[i], 0755); check_error(error, "mkdir"); do_stats(mkdir_dirs[i]); } while (--i >= 0) { do_stats(mkdir_dirs[i]); error = rmdir(mkdir_dirs[i]); check_error(error, "rmdir"); } error = rmdir("mkdir"); check_error(error, "rmdir"); } char *link_file_a = "link/a"; char *link_file_b = "link/b"; void link_test() { int error; int fd; error = mkdir("link", 0755); check_error(error, "mkdir"); fd = open(link_file_a, O_CREAT, 0644); check_error(fd, "open"); (void) close(fd); do_stats(link_file_a); error = link(link_file_a, link_file_b); check_error(error, "link"); do_stats(link_file_a); do_stats(link_file_b); error = unlink(link_file_a); check_error(error, "unlink"); do_stats(link_file_a); do_stats(link_file_b); error = link(link_file_b, link_file_a); check_error(error, "link"); do_stats(link_file_a); do_stats(link_file_b); error = unlink(link_file_b); check_error(error, "unlink"); do_stats(link_file_a); do_stats(link_file_b); error = unlink(link_file_a); check_error(error, "unlink"); do_stats(link_file_a); do_stats(link_file_b); error = rmdir("link"); check_error(error, "rmdir"); } char *open_file = "open/a"; void open_test() { int error; int fd; error = mkdir("open", 0755); check_error(error, "mkdir"); fd = open(open_file, O_CREAT | O_RDWR, 0644); check_error(fd, "open: O_CREAT"); (void) close(fd); do_stats(open_file); fd = open(open_file, O_RDWR); check_error(fd, "open: O_RDWR"); (void) close(fd); do_stats(open_file); error = 
unlink(open_file); check_error(error, "unlink"); error = rmdir("open"); check_error(error, "rmdir"); } char *access_file = "access/a"; void access_test() { int error; int fd; error = mkdir("access", 0755); check_error(error, "mkdir"); fd = open(access_file, O_CREAT | O_RDWR, 0644); check_error(fd, "open: O_CREAT"); (void) close(fd); do_stats(access_file); error = access(access_file, F_OK); check_error(error, "access"); do_stats(access_file); error = unlink(access_file); check_error(error, "unlink"); error = rmdir("access"); check_error(error, "rmdir"); } char *chmod_file = "chmod/a"; void chmod_test() { int error; int fd; error = mkdir("chmod", 0755); check_error(error, "mkdir"); fd = open(chmod_file, O_CREAT | O_RDWR, 0644); check_error(fd, "open: O_CREAT"); (void) close(fd); do_stats(chmod_file); error = chmod(chmod_file, 0600); check_error(error, "chmod"); do_stats(chmod_file); error = unlink(chmod_file); check_error(error, "unlink"); error = rmdir("chmod"); check_error(error, "rmdir"); } char *chown_file = "chown/a"; void chown_test() { int error; int fd; error = mkdir("chown", 0755); check_error(error, "mkdir"); fd = open(chown_file, O_CREAT | O_RDWR, 0644); check_error(fd, "open: O_CREAT"); (void) close(fd); do_stats(chown_file); error = chown(chown_file, 4597, 4597); check_error(error, "chown"); do_stats(chown_file); error = lchown(chown_file, 4596, 4596); check_error(error, "lchown"); do_stats(chown_file); error = unlink(chown_file); check_error(error, "unlink"); error = rmdir("chown"); check_error(error, "rmdir"); } char *readlink_file = "readlink/a"; void readlink_test() { int error; char buf[BUFSIZ]; error = mkdir("readlink", 0755); check_error(error, "mkdir"); error = symlink("b", readlink_file); check_error(error, "symlink"); do_stats(readlink_file); error = readlink(readlink_file, buf, sizeof(buf)); check_error(error, "readlink"); do_stats(readlink_file); error = unlink(readlink_file); check_error(error, "unlink"); error = rmdir("readlink"); 
	check_error(error, "rmdir");
}

char *utimes_file = "utimes/a";

/* utime(2) and utimes(2) on a newly created file. */
void utimes_test()
{
	int error;
	int fd;

	error = mkdir("utimes", 0755);
	check_error(error, "mkdir");

	fd = open(utimes_file, O_CREAT | O_RDWR, 0644);
	check_error(fd, "open: O_CREAT");
	(void) close(fd);

	do_stats(utimes_file);

	error = utime(utimes_file, NULL);
	check_error(error, "utime");

	do_stats(utimes_file);

	error = utimes(utimes_file, NULL);
	check_error(error, "utimes");

	do_stats(utimes_file);

	error = unlink(utimes_file);
	check_error(error, "unlink");

	error = rmdir("utimes");
	check_error(error, "rmdir");
}

char *chdir_dir = "chdir/dir";

/* chdir(2) in a child, so the parent's working directory is unchanged. */
void chdir_test()
{
	int error;
	int pid;
	int status;

	error = mkdir("chdir", 0755);
	check_error(error, "mkdir");

	pid = fork();
	if (pid == 0) {
		error = mkdir(chdir_dir, 0755);
		check_error_child(error, "mkdir");

		do_stats_child(chdir_dir);

		error = chdir(chdir_dir);
		check_error_child(error, "chdir");

		do_stats_child(chdir_dir);

		exit(0);
	}

	(void) wait(&status);

	do_stats(chdir_dir);

	error = rmdir(chdir_dir);
	check_error(error, "rmdir");

	error = rmdir("chdir");
	check_error(error, "rmdir");
}

char *chroot_dir = "chroot/dir";

/* chroot(2) in a child, so the parent's root directory is unchanged. */
void chroot_test()
{
	int error;
	int pid;
	int status;

	error = mkdir("chroot", 0755);
	check_error(error, "mkdir");

	pid = fork();
	if (pid == 0) {
		error = mkdir(chroot_dir, 0755);
		check_error_child(error, "mkdir");

		do_stats_child(chroot_dir);

		error = chroot(chroot_dir);
		check_error_child(error, "chroot");

		do_stats_child(chroot_dir);

		exit(0);
	}

	(void) wait(&status);

	do_stats(chroot_dir);

	error = rmdir(chroot_dir);
	check_error(error, "rmdir");

	error = rmdir("chroot");
	check_error(error, "rmdir");
}

char *rename_file_a = "rename/a";
char *rename_file_b = "rename/b";

/* rename(2) from a to b and back again. */
void rename_test()
{
	int error;
	int fd;

	error = mkdir("rename", 0755);
	check_error(error, "mkdir");

	fd = open(rename_file_a, O_CREAT, 0644);
	check_error(fd, "open");
	(void) close(fd);

	do_stats(rename_file_a);

	error = rename(rename_file_a, rename_file_b);
	check_error(error, "rename");

	do_stats(rename_file_a);
	do_stats(rename_file_b);

	error = rename(rename_file_b, rename_file_a);
	check_error(error, "rename");

	do_stats(rename_file_a);
	do_stats(rename_file_b);

	error = unlink(rename_file_a);
	check_error(error, "unlink");

	error = rmdir("rename");
	check_error(error, "rmdir");
}

char *exec_file = "exec/a";
char *exec_source_file = "exec_test";

/* Hard-link the exec_test binary into the test directory and exec it. */
void exec_test()
{
	int error;
	int pid;
	int status;

	error = mkdir("exec", 0755);
	check_error(error, "mkdir");

	error = link(exec_source_file, exec_file);
	check_error(error, "link");

	do_stats(exec_file);

	pid = fork();
	if (pid == 0) {
		error = execl(exec_file, exec_file, NULL);
		check_error_child(error, "execl");
		exit(1);
	}

	wait(&status);

	do_stats(exec_file);

	error = unlink(exec_file);
	check_error(error, "unlink");

	error = rmdir("exec");
	check_error(error, "rmdir");
}

char *mknod_file = "mknod/a";

/* mknod(2) of a character device node. */
void mknod_test()
{
	int error;

	error = mkdir("mknod", 0755);
	check_error(error, "mkdir");

	error = mknod(mknod_file, S_IFCHR | 0644, 0);
	check_error(error, "mknod");

	do_stats(mknod_file);

	error = unlink(mknod_file);
	check_error(error, "unlink");

	error = rmdir("mknod");
	check_error(error, "rmdir");
}

char *statfs_dir = "statfs/a";

/* statfs(2) and statfs64() on a newly created directory. */
void statfs_test()
{
	int error;
	struct statfs stbuf;
	struct statfs64 stbuf64;

	error = mkdir("statfs", 0755);
	check_error(error, "mkdir");

	do_stats("statfs");

	error = mkdir(statfs_dir, 0755);
	check_error(error, "mkdir");

	do_stats(statfs_dir);

	error = statfs(statfs_dir, &stbuf);
	check_error(error, "statfs");

	error = statfs64(statfs_dir, &stbuf64);
	check_error(error, "statfs64");

	error = rmdir(statfs_dir);
	check_error(error, "rmdir");

	error = rmdir("statfs");
	check_error(error, "rmdir");
}

char *truncate_file = "truncate/a";

/* truncate(2) of an empty file out to 1024 bytes. */
void truncate_test()
{
	int error;
	int fd;

	error = mkdir("truncate", 0755);
	check_error(error, "mkdir");

	fd = open(truncate_file, O_CREAT | O_RDWR, 0644);
	check_error(fd, "open: O_CREAT");
	(void) close(fd);

	do_stats(truncate_file);

	error = truncate(truncate_file, 1024);
	check_error(error, "truncate");

	do_stats(truncate_file);

	error = unlink(truncate_file);
	check_error(error, "unlink");

	error = rmdir("truncate");
	check_error(error, "rmdir");
}

char *xattr_file = "xattr/a";

#define ACL_USER_OBJ	(0x01)
#define ACL_USER	(0x02)
#define ACL_GROUP_OBJ	(0x04)
#define ACL_MASK	(0x10)
#define ACL_OTHER	(0x20)

struct posix_acl_xattr_entry {
	unsigned short e_tag;
	unsigned short e_perm;
	unsigned int e_id;
};

#define POSIX_ACL_XATTR_VERSION	0x0002

struct posix_acl_xattr_header {
	unsigned int a_version;
	struct posix_acl_xattr_entry a_entries[5];
};

/* get, set, list and remove the POSIX access ACL xattr, plus the l* variants. */
void xattr_test()
{
	int error;
	int fd;
	char buf[1024];
	struct posix_acl_xattr_header ents;

	error = mkdir("xattr", 0755);
	check_error(error, "mkdir");

	fd = open(xattr_file, O_CREAT | O_RDWR, 0444);
	check_error(fd, "open: O_CREAT");
	(void) close(fd);

	do_stats(xattr_file);

	error = getxattr(xattr_file, "system.posix_acl_access",
			buf, sizeof (buf));
	check_error(error, "getxattr");

	error = lgetxattr(xattr_file, "system.posix_acl_access",
			buf, sizeof (buf));
	check_error(error, "lgetxattr");

	/* Build a five-entry ACL that grants rw- (06) in every entry. */
	ents.a_version = POSIX_ACL_XATTR_VERSION;
	ents.a_entries[0].e_tag = ACL_USER_OBJ;
	ents.a_entries[0].e_perm = 06;
	ents.a_entries[0].e_id = -1;
	ents.a_entries[1].e_tag = ACL_USER;
	ents.a_entries[1].e_perm = 06;
	ents.a_entries[1].e_id = 10;
	ents.a_entries[2].e_tag = ACL_GROUP_OBJ;
	ents.a_entries[2].e_perm = 06;
	ents.a_entries[2].e_id = -1;
	ents.a_entries[3].e_tag = ACL_MASK;
	ents.a_entries[3].e_perm = 06;
	ents.a_entries[3].e_id = -1;
	ents.a_entries[4].e_tag = ACL_OTHER;
	ents.a_entries[4].e_perm = 06;
	ents.a_entries[4].e_id = -1;

	error = setxattr(xattr_file, "system.posix_acl_access",
			&ents, sizeof (ents), 0);
	check_error(error, "setxattr");

	do_stats(xattr_file);

	error = lsetxattr(xattr_file, "system.posix_acl_access",
			&ents, sizeof (ents), 0);
	check_error(error, "lsetxattr");

	do_stats(xattr_file);

	error = getxattr(xattr_file, "system.posix_acl_access",
			buf, sizeof (buf));
	check_error(error, "getxattr");

	error = lgetxattr(xattr_file, "system.posix_acl_access",
			buf, sizeof (buf));
	check_error(error, "lgetxattr");

	error = listxattr(xattr_file, buf, sizeof (buf));
	check_error(error, "listxattr");

	error = llistxattr(xattr_file, buf, sizeof (buf));
	check_error(error, "llistxattr");

	error = removexattr(xattr_file, "system.posix_acl_access");
	check_error(error, "removexattr");

	do_stats(xattr_file);

	/* Set the ACL again so that lremovexattr has something to remove. */
	error = setxattr(xattr_file, "system.posix_acl_access",
			&ents, sizeof (ents), 0);
	check_error(error, "setxattr");

	do_stats(xattr_file);

	error = lremovexattr(xattr_file, "system.posix_acl_access");
	check_error(error, "lremovexattr");

	do_stats(xattr_file);

	error = unlink(xattr_file);
	check_error(error, "unlink");

	error = rmdir("xattr");
	check_error(error, "rmdir");
}

char *inotify_file = "inotify/a";

/* inotify_add_watch(2) and inotify_rm_watch(2) on a newly created file. */
void inotify_test()
{
	int error;
	int fd;
	int wd;

	error = mkdir("inotify", 0755);
	check_error(error, "mkdir");

	fd = open(inotify_file, O_CREAT | O_RDWR, 0644);
	check_error(fd, "open: O_CREAT");
	(void) close(fd);

	do_stats(inotify_file);

	fd = inotify_init();
	check_error(fd, "inotify_init");

	do_stats(inotify_file);

	wd = inotify_add_watch(fd, inotify_file, IN_ALL_EVENTS);
	check_error(wd, "inotify_add_watch");

	do_stats(inotify_file);

	error = inotify_rm_watch(fd, wd);
	check_error(error, "inotify_rm_watch");

	(void) close(fd);

	do_stats(inotify_file);

	error = unlink(inotify_file);
	check_error(error, "unlink");

	error = rmdir("inotify");
	check_error(error, "rmdir");
}

[-- Attachment #3: exec_test.c --]
[-- Type: text/x-csrc, Size: 42 bytes --]

#include <stdlib.h>

int main()
{
	exit(0);
}
* Re: [PATCH 0/3] enhanced ESTALE error handling (v3)
  2008-03-10 20:23 ` [PATCH 0/3] enhanced ESTALE error handling (v3) Peter Staubach
@ 2008-03-10 22:42 ` Andreas Dilger
From: Andreas Dilger @ 2008-03-10 22:42 UTC
To: Peter Staubach
Cc: Linux Kernel Mailing List, linux-nfs, Andrew Morton, Trond Myklebust, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 1329 bytes --]

On Mar 10, 2008 16:23 -0400, Peter Staubach wrote:
> Here is version 3 of a patch set which modifies the system to
> enhance the ESTALE error handling for system calls which take
> pathnames as arguments. This patch set is essentially the
> same as the v2 patches, but updated to reflect the current
> state of the code around them.

[snip long discussion of ESTALE causes]

> This support was tested using the attached programs and
> running multiple copies on mounted file systems which do not
> share superblocks. When two or more copies of this program
> are running, many ESTALE errors can be seen over the network.
> Without these patches, the test program errors out almost
> immediately. With these patches, the test program runs
> for as long as one desires.

Have you tried "racer.sh"? That is a very stressful metadata tester
that does random operations on a handful of file and directory names.
It can be run on a single client, or on multiple clients, and needs no
coordination between the clients.

I guess it won't tell you if you are getting ESTALE back correctly or
not, but it can quickly find if there are any problems with the
retrying code.

I've attached an updated tarball of the original scripts here.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

[-- Attachment #2: racer-lustre.tar.gz --]
[-- Type: application/x-gzip, Size: 2003 bytes --]
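
The kind of load described above (random operations on a handful of file and directory names, with no coordination between copies) can be sketched in C. This is not the racer.sh script, whose contents are not reproduced on this page; the function names racer_run and racer_cleanup, the file names, and the mix of operations are all assumptions made for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical name set; the real scripts choose their own. */
static const char *racer_names[] = {
	"racer_a", "racer_b", "racer_c", "racer_d"
};
#define RACER_NNAMES (sizeof(racer_names) / sizeof(racer_names[0]))

/*
 * Perform `iterations` random metadata operations in the current
 * directory and return how many of them failed with ESTALE.
 * Other failures (ENOENT, EEXIST, EISDIR, ...) are expected when
 * several copies race each other, so they are ignored.
 */
int racer_run(unsigned int seed, int iterations)
{
	int estale = 0;
	int i;

	srand(seed);
	for (i = 0; i < iterations; i++) {
		const char *a = racer_names[rand() % RACER_NNAMES];
		const char *b = racer_names[rand() % RACER_NNAMES];
		struct stat st;
		int rc;

		errno = 0;
		switch (rand() % 6) {
		case 0:
			rc = open(a, O_CREAT | O_RDWR, 0644);
			if (rc >= 0) {
				(void) close(rc);
				rc = 0;
			}
			break;
		case 1:
			rc = mkdir(a, 0755);
			break;
		case 2:
			rc = unlink(a);
			break;
		case 3:
			rc = rmdir(a);
			break;
		case 4:
			rc = rename(a, b);
			break;
		default:
			rc = stat(a, &st);
			break;
		}
		if (rc < 0 && errno == ESTALE)
			estale++;
	}
	return estale;
}

/* Remove whatever a run left behind; each name is a file or an empty directory. */
void racer_cleanup(void)
{
	size_t i;

	for (i = 0; i < RACER_NNAMES; i++) {
		(void) unlink(racer_names[i]);
		(void) rmdir(racer_names[i]);
	}
}
```

Calling racer_run(getpid(), 10000) from several processes in the same NFS-mounted directory, followed by a single racer_cleanup(), would give a rough count of ESTALE results seen by user space. On a local file system the count is expected to stay at zero, so the sketch is only interesting across clients or mounts that do not share a superblock.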
end of thread, other threads:[~2008-03-10 22:43 UTC | newest]

Thread overview: 14+ messages
2008-01-18 15:35 [PATCH 0/3] enhanced ESTALE error handling Peter Staubach
2008-01-18 15:46 ` J. Bruce Fields
2008-01-18 16:41 ` Chuck Lever
2008-01-18 16:55 ` Peter Staubach
2008-01-18 17:17 ` Chuck Lever
2008-01-18 17:30 ` Peter Staubach
2008-01-18 17:52 ` Chuck Lever
2008-01-18 18:12 ` Peter Staubach
2008-01-18 18:37 ` J. Bruce Fields
2008-01-18 19:12 ` Peter Staubach
2008-01-18 18:17 ` Chuck Lever
2008-02-01 20:57 ` [PATCH 0/3] enhanced ESTALE error handling (v2) Peter Staubach
2008-03-10 20:23 ` [PATCH 0/3] enhanced ESTALE error handling (v3) Peter Staubach
2008-03-10 22:42 ` Andreas Dilger