LKML Archive on lore.kernel.org help / color / mirror / Atom feed
* [PATCH 0/4] autofs4 - autofs needs a miscelaneous device for ioctls @ 2008-02-26 3:21 Ian Kent 2008-02-26 3:22 ` [PATCH 1/4] autofs4 - check for invalid dentry in getpath Ian Kent ` (4 more replies) 0 siblings, 5 replies; 40+ messages in thread From: Ian Kent @ 2008-02-26 3:21 UTC (permalink / raw) To: Andrew Morton; +Cc: Kernel Mailing List, autofs mailing list, linux-fsdevel Hi Andrew, There is a problem with active restarts in autofs (that is to say restarting autofs when there are busy mounts). Currently autofs uses "umount -l" to clear active mounts at restart. While using lazy umount works for most cases, anything that needs to walk back up the mount tree to construct a path, such as getcwd(2) and the proc file system /proc/<pid>/cwd, no longer works because the point from which the path is constructed has been detached from the mount tree. The actual problem with autofs is that it can't reconnect to existing mounts. Immediately one things of just adding the ability to remount autofs file systems would solve it, but alas, that can't work. This is because autofs direct mounts and the implementation of "on demand mount and expire" of nested mount trees have the file system mounted on top of the mount trigger dentry. To resolve this a miscellaneous device node for routing ioctl commands to these mount points has been implemented for the autofs4 kernel module. For those wishing to test this out an updated user space daemon is needed. Checking out and building from the git repo or applying all the current patches to the 5.0.3 tar distribution will do the trick. This is all available at the usual location on kernel.org. Ian ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH 1/4] autofs4 - check for invalid dentry in getpath 2008-02-26 3:21 [PATCH 0/4] autofs4 - autofs needs a miscelaneous device for ioctls Ian Kent @ 2008-02-26 3:22 ` Ian Kent 2008-02-26 3:23 ` [PATCH 3/4] autofs4 - track uid and gid of last mount requestor Ian Kent ` (3 subsequent siblings) 4 siblings, 0 replies; 40+ messages in thread From: Ian Kent @ 2008-02-26 3:22 UTC (permalink / raw) To: Andrew Morton; +Cc: Kernel Mailing List, autofs mailing list, linux-fsdevel Hi Andrew, Patch to catch invalid dentry when calculating it's path. Signed-off-by: Ian Kent <raven@themaw.net> Ian --- diff -up linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.getpath-check-valid-dentry linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c --- linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.getpath-check-valid-dentry 2008-02-20 12:55:39.000000000 +0900 +++ linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c 2008-02-20 12:54:46.000000000 +0900 @@ -171,7 +171,7 @@ static int autofs4_getpath(struct autofs for (tmp = dentry ; tmp != root ; tmp = tmp->d_parent) len += tmp->d_name.len + 1; - if (--len > NAME_MAX) { + if (!len || --len > NAME_MAX) { spin_unlock(&dcache_lock); return 0; } ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH 3/4] autofs4 - track uid and gid of last mount requestor 2008-02-26 3:21 [PATCH 0/4] autofs4 - autofs needs a miscelaneous device for ioctls Ian Kent 2008-02-26 3:22 ` [PATCH 1/4] autofs4 - check for invalid dentry in getpath Ian Kent @ 2008-02-26 3:23 ` Ian Kent 2008-02-26 5:14 ` [PATCH 3/4] autofs4 - track uid and gid of last mount requestor - correction Ian Kent 2008-02-28 4:45 ` [PATCH 3/4] autofs4 - track uid and gid of last mount requestor Andrew Morton 2008-02-26 3:23 ` [PATCH 4/4] autofs4 - add miscelaneous device for ioctls Ian Kent ` (2 subsequent siblings) 4 siblings, 2 replies; 40+ messages in thread From: Ian Kent @ 2008-02-26 3:23 UTC (permalink / raw) To: Andrew Morton; +Cc: Kernel Mailing List, autofs mailing list, linux-fsdevel Hi Andrew, Patch to track the uid and gid of the last process to request a mount for on an autofs dentry. Signed-off-by: Ian Kent < raven@themaw.net> Ian --- diff -up linux-2.6.25-rc2-mm1/fs/autofs4/inode.c.track-last-mount-ids linux-2.6.25-rc2-mm1/fs/autofs4/inode.c --- linux-2.6.25-rc2-mm1/fs/autofs4/inode.c.track-last-mount-ids 2008-02-20 13:11:28.000000000 +0900 +++ linux-2.6.25-rc2-mm1/fs/autofs4/inode.c 2008-02-20 13:12:23.000000000 +0900 @@ -43,6 +43,8 @@ struct autofs_info *autofs4_init_ino(str ino->flags = 0; ino->mode = mode; + ino->uid = 0; + ino->gid = 0; ino->inode = NULL; ino->dentry = NULL; ino->size = 0; diff -up linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.track-last-mount-ids linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h --- linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.track-last-mount-ids 2008-02-20 13:14:03.000000000 +0900 +++ linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h 2008-02-20 13:14:34.000000000 +0900 @@ -58,6 +58,9 @@ struct autofs_info { unsigned long last_used; atomic_t count; + uid_t uid; + gid_t gid; + mode_t mode; size_t size; diff -up linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.track-last-mount-ids linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c --- linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.track-last-mount-ids 2008-02-20 13:06:20.000000000 +0900 +++ linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c 2008-02-20 13:10:23.000000000 +0900 @@ -363,6 +363,38 @@ int autofs4_wait(struct autofs_sb_info * status = wq->status; + /* + * For direct and offset mounts we need to track the requestor + * uid and gid in the dentry info struct. This is so it can be + * supplied, on request, by the misc device ioctl interface. + * This is needed during daemon resatart when reconnecting + * to existing, active, autofs mounts. The uid and gid (and + * related string values) may be used for macro substitution + * in autofs mount maps. + */ + if (!status) { + struct dentry *de = NULL; + + /* direct mount or browsable map */ + ino = autofs4_dentry_ino(dentry); + if (!ino) { + /* If not lookup actual dentry used */ + de = d_lookup(dentry->d_parent, &dentry->d_name); + ino = autofs4_dentry_ino(de); + } + + /* Set mount requestor */ + if (ino) { + if (ino) { + ino->uid = wq->uid; + ino->gid = wq->gid; + } + } + + if (de) + dput(de); + } + /* Are we the last process to need status? */ if (atomic_dec_and_test(&wq->wait_ctr)) kfree(wq); ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 3/4] autofs4 - track uid and gid of last mount requestor - correction 2008-02-26 3:23 ` [PATCH 3/4] autofs4 - track uid and gid of last mount requestor Ian Kent @ 2008-02-26 5:14 ` Ian Kent 2008-02-28 4:45 ` [PATCH 3/4] autofs4 - track uid and gid of last mount requestor Andrew Morton 1 sibling, 0 replies; 40+ messages in thread From: Ian Kent @ 2008-02-26 5:14 UTC (permalink / raw) To: Andrew Morton; +Cc: Kernel Mailing List, autofs mailing list, linux-fsdevel On Tue, 26 Feb 2008, Ian Kent wrote: > + > + /* Set mount requestor */ > + if (ino) { > + if (ino) { > + ino->uid = wq->uid; > + ino->gid = wq->gid; > + } > + } > + As has been spotted, this is obviously wrong. And here is the correction. Signed-off-by: Ian Kent <raven@themaw.net> Ian --- diff -up linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.track-last-mount-ids-fix linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c --- linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.track-last-mount-ids-fix 2008-02-26 14:02:05.000000000 +0900 +++ linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c 2008-02-26 14:02:20.000000000 +0900 @@ -385,10 +385,8 @@ int autofs4_wait(struct autofs_sb_info * /* Set mount requestor */ if (ino) { - if (ino) { - ino->uid = wq->uid; - ino->gid = wq->gid; - } + ino->uid = wq->uid; + ino->gid = wq->gid; } if (de) ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 3/4] autofs4 - track uid and gid of last mount requestor 2008-02-26 3:23 ` [PATCH 3/4] autofs4 - track uid and gid of last mount requestor Ian Kent 2008-02-26 5:14 ` [PATCH 3/4] autofs4 - track uid and gid of last mount requestor - correction Ian Kent @ 2008-02-28 4:45 ` Andrew Morton 2008-02-28 6:22 ` Ian Kent 1 sibling, 1 reply; 40+ messages in thread From: Andrew Morton @ 2008-02-28 4:45 UTC (permalink / raw) To: Ian Kent Cc: Kernel Mailing List, autofs mailing list, linux-fsdevel, Pavel Emelyanov, Eric W. Biederman On Tue, 26 Feb 2008 12:23:20 +0900 (WST) Ian Kent <raven@themaw.net> wrote: > Hi Andrew, > > Patch to track the uid and gid of the last process to request > a mount for on an autofs dentry. > > Signed-off-by: Ian Kent < raven@themaw.net> > > Ian > > --- > diff -up linux-2.6.25-rc2-mm1/fs/autofs4/inode.c.track-last-mount-ids linux-2.6.25-rc2-mm1/fs/autofs4/inode.c > --- linux-2.6.25-rc2-mm1/fs/autofs4/inode.c.track-last-mount-ids 2008-02-20 13:11:28.000000000 +0900 > +++ linux-2.6.25-rc2-mm1/fs/autofs4/inode.c 2008-02-20 13:12:23.000000000 +0900 > @@ -43,6 +43,8 @@ struct autofs_info *autofs4_init_ino(str > > ino->flags = 0; > ino->mode = mode; > + ino->uid = 0; > + ino->gid = 0; > ino->inode = NULL; > ino->dentry = NULL; > ino->size = 0; > diff -up linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.track-last-mount-ids linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h > --- linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.track-last-mount-ids 2008-02-20 13:14:03.000000000 +0900 > +++ linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h 2008-02-20 13:14:34.000000000 +0900 > @@ -58,6 +58,9 @@ struct autofs_info { > unsigned long last_used; > atomic_t count; > > + uid_t uid; > + gid_t gid; > + > mode_t mode; > size_t size; > > diff -up linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.track-last-mount-ids linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c > --- linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.track-last-mount-ids 2008-02-20 13:06:20.000000000 +0900 > +++ linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c 2008-02-20 13:10:23.000000000 +0900 > @@ -363,6 +363,38 @@ int autofs4_wait(struct autofs_sb_info * > > status = wq->status; > > + /* > + * For direct and offset mounts we need to track the requestor > + * uid and gid in the dentry info struct. This is so it can be > + * supplied, on request, by the misc device ioctl interface. > + * This is needed during daemon resatart when reconnecting > + * to existing, active, autofs mounts. The uid and gid (and > + * related string values) may be used for macro substitution > + * in autofs mount maps. > + */ > + if (!status) { > + struct dentry *de = NULL; > + > + /* direct mount or browsable map */ > + ino = autofs4_dentry_ino(dentry); > + if (!ino) { > + /* If not lookup actual dentry used */ > + de = d_lookup(dentry->d_parent, &dentry->d_name); > + ino = autofs4_dentry_ino(de); > + } > + > + /* Set mount requestor */ > + if (ino) { > + if (ino) { > + ino->uid = wq->uid; > + ino->gid = wq->gid; > + } > + } > + > + if (de) > + dput(de); > + } > + But uids and gids are no longer system-wide-unique. Two different users can have the same identifiers in different namespaces. What happens then? ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 3/4] autofs4 - track uid and gid of last mount requestor 2008-02-28 4:45 ` [PATCH 3/4] autofs4 - track uid and gid of last mount requestor Andrew Morton @ 2008-02-28 6:22 ` Ian Kent 2008-02-28 6:37 ` Andrew Morton 0 siblings, 1 reply; 40+ messages in thread From: Ian Kent @ 2008-02-28 6:22 UTC (permalink / raw) To: Andrew Morton Cc: Kernel Mailing List, autofs mailing list, linux-fsdevel, Pavel Emelyanov, Eric W. Biederman On Wed, 2008-02-27 at 20:45 -0800, Andrew Morton wrote: > On Tue, 26 Feb 2008 12:23:20 +0900 (WST) Ian Kent <raven@themaw.net> wrote: > > > Hi Andrew, > > > > Patch to track the uid and gid of the last process to request > > a mount for on an autofs dentry. > > > > Signed-off-by: Ian Kent < raven@themaw.net> > > > > Ian > > > > --- > > diff -up linux-2.6.25-rc2-mm1/fs/autofs4/inode.c.track-last-mount-ids linux-2.6.25-rc2-mm1/fs/autofs4/inode.c > > --- linux-2.6.25-rc2-mm1/fs/autofs4/inode.c.track-last-mount-ids 2008-02-20 13:11:28.000000000 +0900 > > +++ linux-2.6.25-rc2-mm1/fs/autofs4/inode.c 2008-02-20 13:12:23.000000000 +0900 > > @@ -43,6 +43,8 @@ struct autofs_info *autofs4_init_ino(str > > > > ino->flags = 0; > > ino->mode = mode; > > + ino->uid = 0; > > + ino->gid = 0; > > ino->inode = NULL; > > ino->dentry = NULL; > > ino->size = 0; > > diff -up linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.track-last-mount-ids linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h > > --- linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.track-last-mount-ids 2008-02-20 13:14:03.000000000 +0900 > > +++ linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h 2008-02-20 13:14:34.000000000 +0900 > > @@ -58,6 +58,9 @@ struct autofs_info { > > unsigned long last_used; > > atomic_t count; > > > > + uid_t uid; > > + gid_t gid; > > + > > mode_t mode; > > size_t size; > > > > diff -up linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.track-last-mount-ids linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c > > --- linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c.track-last-mount-ids 2008-02-20 13:06:20.000000000 +0900 > > +++ linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c 2008-02-20 13:10:23.000000000 +0900 > > @@ -363,6 +363,38 @@ int autofs4_wait(struct autofs_sb_info * > > > > status = wq->status; > > > > + /* > > + * For direct and offset mounts we need to track the requestor > > + * uid and gid in the dentry info struct. This is so it can be > > + * supplied, on request, by the misc device ioctl interface. > > + * This is needed during daemon resatart when reconnecting > > + * to existing, active, autofs mounts. The uid and gid (and > > + * related string values) may be used for macro substitution > > + * in autofs mount maps. > > + */ > > + if (!status) { > > + struct dentry *de = NULL; > > + > > + /* direct mount or browsable map */ > > + ino = autofs4_dentry_ino(dentry); > > + if (!ino) { > > + /* If not lookup actual dentry used */ > > + de = d_lookup(dentry->d_parent, &dentry->d_name); > > + ino = autofs4_dentry_ino(de); > > + } > > + > > + /* Set mount requestor */ > > + if (ino) { > > + if (ino) { > > + ino->uid = wq->uid; > > + ino->gid = wq->gid; > > + } > > + } > > + > > + if (de) > > + dput(de); > > + } > > + > > But uids and gids are no longer system-wide-unique. Two different users > can have the same identifiers in different namespaces. What happens then? That's a tricky question. Presumably, the process requesting the mount has the user space daemon running in the namespace within which the uid and gid are to be looked up, by the daemon. Am I missing something? Ian ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 3/4] autofs4 - track uid and gid of last mount requestor 2008-02-28 6:22 ` Ian Kent @ 2008-02-28 6:37 ` Andrew Morton 2008-02-28 7:08 ` Ian Kent 0 siblings, 1 reply; 40+ messages in thread From: Andrew Morton @ 2008-02-28 6:37 UTC (permalink / raw) To: Ian Kent Cc: Kernel Mailing List, autofs mailing list, linux-fsdevel, Pavel Emelyanov, Eric W. Biederman On Thu, 28 Feb 2008 15:22:27 +0900 Ian Kent <raven@themaw.net> wrote: > > > > +++ linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c 2008-02-20 13:10:23.000000000 +0900 > > > @@ -363,6 +363,38 @@ int autofs4_wait(struct autofs_sb_info * > > > > > > status = wq->status; > > > > > > + /* > > > + * For direct and offset mounts we need to track the requestor > > > + * uid and gid in the dentry info struct. This is so it can be > > > + * supplied, on request, by the misc device ioctl interface. > > > + * This is needed during daemon resatart when reconnecting > > > + * to existing, active, autofs mounts. The uid and gid (and > > > + * related string values) may be used for macro substitution > > > + * in autofs mount maps. > > > + */ > > > + if (!status) { > > > + struct dentry *de = NULL; > > > + > > > + /* direct mount or browsable map */ > > > + ino = autofs4_dentry_ino(dentry); > > > + if (!ino) { > > > + /* If not lookup actual dentry used */ > > > + de = d_lookup(dentry->d_parent, &dentry->d_name); > > > + ino = autofs4_dentry_ino(de); > > > + } > > > + > > > + /* Set mount requestor */ > > > + if (ino) { > > > + if (ino) { > > > + ino->uid = wq->uid; > > > + ino->gid = wq->gid; > > > + } > > > + } > > > + > > > + if (de) > > > + dput(de); > > > + } > > > + > > > > But uids and gids are no longer system-wide-unique. Two different users > > can have the same identifiers in different namespaces. What happens then? > > That's a tricky question. > > Presumably, the process requesting the mount has the user space daemon > running in the namespace within which the uid and gid are to be looked > up, by the daemon. > > Am I missing something? > err, you assume more knowledge at this end about what you're trying to do than actually exists :) You seem to imply that if a machine is running 100 user namespaces then it needs to run 100 mount daemons. Doesn't seem good. What problem are you actually trying to solve here? ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 3/4] autofs4 - track uid and gid of last mount requestor 2008-02-28 6:37 ` Andrew Morton @ 2008-02-28 7:08 ` Ian Kent 2008-02-28 7:23 ` Andrew Morton 2008-02-28 7:51 ` Pavel Emelyanov 0 siblings, 2 replies; 40+ messages in thread From: Ian Kent @ 2008-02-28 7:08 UTC (permalink / raw) To: Andrew Morton Cc: Kernel Mailing List, autofs mailing list, linux-fsdevel, Pavel Emelyanov, Eric W. Biederman On Wed, 2008-02-27 at 22:37 -0800, Andrew Morton wrote: > On Thu, 28 Feb 2008 15:22:27 +0900 Ian Kent <raven@themaw.net> wrote: > > > > > > > +++ linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c 2008-02-20 13:10:23.000000000 +0900 > > > > @@ -363,6 +363,38 @@ int autofs4_wait(struct autofs_sb_info * > > > > > > > > status = wq->status; > > > > > > > > + /* > > > > + * For direct and offset mounts we need to track the requestor > > > > + * uid and gid in the dentry info struct. This is so it can be > > > > + * supplied, on request, by the misc device ioctl interface. > > > > + * This is needed during daemon resatart when reconnecting > > > > + * to existing, active, autofs mounts. The uid and gid (and > > > > + * related string values) may be used for macro substitution > > > > + * in autofs mount maps. > > > > + */ > > > > + if (!status) { > > > > + struct dentry *de = NULL; > > > > + > > > > + /* direct mount or browsable map */ > > > > + ino = autofs4_dentry_ino(dentry); > > > > + if (!ino) { > > > > + /* If not lookup actual dentry used */ > > > > + de = d_lookup(dentry->d_parent, &dentry->d_name); > > > > + ino = autofs4_dentry_ino(de); > > > > + } > > > > + > > > > + /* Set mount requestor */ > > > > + if (ino) { > > > > + if (ino) { > > > > + ino->uid = wq->uid; > > > > + ino->gid = wq->gid; > > > > + } > > > > + } > > > > + > > > > + if (de) > > > > + dput(de); > > > > + } > > > > + > > > > > > But uids and gids are no longer system-wide-unique. Two different users > > > can have the same identifiers in different namespaces. What happens then? > > > > That's a tricky question. > > > > Presumably, the process requesting the mount has the user space daemon > > running in the namespace within which the uid and gid are to be looked > > up, by the daemon. > > > > Am I missing something? > > > > err, you assume more knowledge at this end about what you're trying to do > than actually exists :) > > You seem to imply that if a machine is running 100 user namespaces then it > needs to run 100 mount daemons. Doesn't seem good. More likely my lack of understanding of how namespaces are meant to work. > > What problem are you actually trying to solve here? The basic problem arises only when we want to restart the user space daemon and there are active autofs managed mounts in place at exit (ie. autofs mounts that have busy user mounts). They are left mounted and processes using them continue to function. But then, when we startup autofs we need to reconnect to these autofs mounts, some of which can covered the by mounted file systems, and hence the need for another way to open an ioctl descriptor to them. It may have been overkill to re-implement all the current ioctls (and add a couple of other much needed ones) but I though it sensible for completeness, and we get to identify any possible problems the current ioctls might have had due to the use of the BKL (by the VFS when calling the ioctls). So, why do we need the uid and gid? When someone walks over an autofs dentry that is meant to cause a mount we send a request packet to the daemon via a pipe which includes the process uid and gid, and as part of the lookup we set macros for several mount map substitution variables, derived from the uid and gid of the process requesting the mount and they can be used within autofs maps. This is all fine as long as we don't need to re-connect to these mounts when starting up, since we don't get kernel requests for the mounts, we need to obtain that information from the active mount itself. Ian ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 3/4] autofs4 - track uid and gid of last mount requestor 2008-02-28 7:08 ` Ian Kent @ 2008-02-28 7:23 ` Andrew Morton 2008-02-28 8:00 ` Ian Kent 2008-02-28 7:51 ` Pavel Emelyanov 1 sibling, 1 reply; 40+ messages in thread From: Andrew Morton @ 2008-02-28 7:23 UTC (permalink / raw) To: Ian Kent Cc: Kernel Mailing List, autofs mailing list, linux-fsdevel, Pavel Emelyanov, Eric W. Biederman On Thu, 28 Feb 2008 16:08:20 +0900 Ian Kent <raven@themaw.net> wrote: > > > > > What problem are you actually trying to solve here? > > The basic problem arises only when we want to restart the user space > daemon and there are active autofs managed mounts in place at exit (ie. > autofs mounts that have busy user mounts). They are left mounted and > processes using them continue to function. But then, when we startup > autofs we need to reconnect to these autofs mounts, some of which can > covered the by mounted file systems, and hence the need for another way > to open an ioctl descriptor to them. So we want to store persistant state in the kernel across userspace process invokations. That's normally done with a thing called a "file" ;) Could we stick all the necessary state into files in a pseudo-fs and have the daemon go and open and read them all when it restarts? > It may have been overkill to re-implement all the current ioctls (and > add a couple of other much needed ones) I don't understand that bit. But then, I don't have a clue how autofs works. > but I though it sensible for > completeness, and we get to identify any possible problems the current > ioctls might have had due to the use of the BKL (by the VFS when calling > the ioctls). It isn't a good idea to wait for races to reveal themselves. It will take years, especially with a system which has as low a call frequency as autofs mounting. And once a bug _does_ reveal itself, by then we'll have tens of millions of machines out there running that bug. > So, why do we need the uid and gid? When someone walks over an autofs > dentry that is meant to cause a mount we send a request packet to the > daemon via a pipe connector or genetlink would be more fashionable transports. > which includes the process uid and gid, and as part of > the lookup we set macros for several mount map substitution variables, > derived from the uid and gid of the process requesting the mount and > they can be used within autofs maps. yeah, could be a problem. Hopefully the namespace people can advise. Perhaps we need a concept of an exportable-to-userspace namespace-id+uid, namespace-id+gid, namespace-id+pid, etc for this sort of thing. It has come up before. Recently, but I forget what the context was. > This is all fine as long as we don't need to re-connect to these mounts > when starting up, since we don't get kernel requests for the mounts, we > need to obtain that information from the active mount itself. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 3/4] autofs4 - track uid and gid of last mount requestor 2008-02-28 7:23 ` Andrew Morton @ 2008-02-28 8:00 ` Ian Kent 2008-02-28 17:13 ` Jeff Moyer 0 siblings, 1 reply; 40+ messages in thread From: Ian Kent @ 2008-02-28 8:00 UTC (permalink / raw) To: Andrew Morton Cc: Kernel Mailing List, autofs mailing list, linux-fsdevel, Pavel Emelyanov, Eric W. Biederman On Wed, 2008-02-27 at 23:23 -0800, Andrew Morton wrote: > On Thu, 28 Feb 2008 16:08:20 +0900 Ian Kent <raven@themaw.net> wrote: > > > > > > > > > What problem are you actually trying to solve here? > > > > The basic problem arises only when we want to restart the user space > > daemon and there are active autofs managed mounts in place at exit (ie. > > autofs mounts that have busy user mounts). They are left mounted and > > processes using them continue to function. But then, when we startup > > autofs we need to reconnect to these autofs mounts, some of which can > > covered the by mounted file systems, and hence the need for another way > > to open an ioctl descriptor to them. > > So we want to store persistant state in the kernel across userspace process > invokations. That's normally done with a thing called a "file" ;) Could we > stick all the necessary state into files in a pseudo-fs and have the daemon > go and open and read them all when it restarts? Nearly sent of a reply without thinking about this and it sounds like a good idea initially. But, if we have a large number of autofs file systems mounted (thousands?), duplicating the information already present in the autofs file system seems untidy and unnecessary. I thought of exposing this information in in sysfs, but that would make the autofs module part of sysfs have many files, and there is the problem that the same path could have more than one autofs file system stacked on top of it, so isn't unique. The other obvious place is in the mount options but that is one of the reasons that only the device number is exposed their. If we have 5000 - 10000 mounts then scanning /proc/mounts becomes a big problem from a CPU usage perspective (and is already a big problem for me, which is partly addressed by the new bits in the implementation here). If I could think of another way to expose the device number as well I would use it, even if it was an additional ioctl, to avoid scanning /proc/mounts. > > > It may have been overkill to re-implement all the current ioctls (and > > add a couple of other much needed ones) > > I don't understand that bit. But then, I don't have a clue how autofs > works. > > > but I though it sensible for > > completeness, and we get to identify any possible problems the current > > ioctls might have had due to the use of the BKL (by the VFS when calling > > the ioctls). > > It isn't a good idea to wait for races to reveal themselves. It will take > years, especially with a system which has as low a call frequency as autofs > mounting. And once a bug _does_ reveal itself, by then we'll have tens of > millions of machines out there running that bug. Not sure I agree about the low call frequency. It's quite normal for smaller sites to have hundreds of entries in maps and equally common for them to do stupid things like run scripts that scan the file systems, mounting every mount. It's not quite the same order of magnitude, but I regularly use the autofs connectathon suite for testing. It only ends up mounting several hundred mounts but I can get mounting and expiring happening at the same time so it's not an unreasonable way to test. > > > So, why do we need the uid and gid? When someone walks over an autofs > > dentry that is meant to cause a mount we send a request packet to the > > daemon via a pipe > > connector or genetlink would be more fashionable transports. Right, I'll see what I can find on those topics. My concern is that this will require considerable work in the daemon which would be fine for version 5.1 but I need to resolve this problem for the existing 5.0 implementation. > > > which includes the process uid and gid, and as part of > > the lookup we set macros for several mount map substitution variables, > > derived from the uid and gid of the process requesting the mount and > > they can be used within autofs maps. > > yeah, could be a problem. Hopefully the namespace people can advise. > Perhaps we need a concept of an exportable-to-userspace namespace-id+uid, > namespace-id+gid, namespace-id+pid, etc for this sort of thing. It has > come up before. Recently, but I forget what the context was. I'm all ears to any feedback from others on this, please. > > > This is all fine as long as we don't need to re-connect to these mounts > > when starting up, since we don't get kernel requests for the mounts, we > > need to obtain that information from the active mount itself. > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 3/4] autofs4 - track uid and gid of last mount requestor 2008-02-28 8:00 ` Ian Kent @ 2008-02-28 17:13 ` Jeff Moyer 2008-02-28 19:51 ` Serge E. Hallyn 0 siblings, 1 reply; 40+ messages in thread From: Jeff Moyer @ 2008-02-28 17:13 UTC (permalink / raw) To: Ian Kent Cc: Andrew Morton, Kernel Mailing List, autofs mailing list, linux-fsdevel, Pavel Emelyanov, Eric W. Biederman Ian Kent <raven@themaw.net> writes: > On Wed, 2008-02-27 at 23:23 -0800, Andrew Morton wrote: >> On Thu, 28 Feb 2008 16:08:20 +0900 Ian Kent <raven@themaw.net> wrote: >> > which includes the process uid and gid, and as part of >> > the lookup we set macros for several mount map substitution variables, >> > derived from the uid and gid of the process requesting the mount and >> > they can be used within autofs maps. >> >> yeah, could be a problem. Hopefully the namespace people can advise. >> Perhaps we need a concept of an exportable-to-userspace namespace-id+uid, >> namespace-id+gid, namespace-id+pid, etc for this sort of thing. It has >> come up before. Recently, but I forget what the context was. > > I'm all ears to any feedback from others on this, please. I think there is some confusion surrounding what the UID and GID are used for in this context. I'll try to explain it as best I can. When the automount daemon parses a map entry, it will do some amount of variable substitution. So, let's say you're running on an i386 box, and you want to mount a library directory from a server. You might have a map entry that looks like this: lib server:/export/$ARCH/lib In this case, the automount daemon will replace $ARCH with i386, and will try the following mount command: mount -t nfs server:/export/i386/lib /automountdir/lib There are cases where it would be helpful to use the requesting process's UID in such a variable substitution. Consider the case of a CIFS share, where the automount daemon runs as user root, but we want to mount the share using the credentials of the requesting user. In this case, the UID and GID can be helpful in formatting the mount options for mounting the share. So, the UID and GID are used only for map substitutions. Now, having said all of that, I'll have to look more closely at why we even need to keep track of it, given that it only needs to be used when performing the lookup, and at that time we have information on the requesting UID and GID. Cheers, Jeff ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 3/4] autofs4 - track uid and gid of last mount requestor 2008-02-28 17:13 ` Jeff Moyer @ 2008-02-28 19:51 ` Serge E. Hallyn 2008-02-29 3:32 ` Ian Kent 0 siblings, 1 reply; 40+ messages in thread From: Serge E. Hallyn @ 2008-02-28 19:51 UTC (permalink / raw) To: Jeff Moyer Cc: Ian Kent, Andrew Morton, Kernel Mailing List, autofs mailing list, linux-fsdevel, Pavel Emelyanov, Eric W. Biederman Quoting Jeff Moyer (jmoyer@redhat.com): > Ian Kent <raven@themaw.net> writes: > > > On Wed, 2008-02-27 at 23:23 -0800, Andrew Morton wrote: > >> On Thu, 28 Feb 2008 16:08:20 +0900 Ian Kent <raven@themaw.net> wrote: > >> > which includes the process uid and gid, and as part of > >> > the lookup we set macros for several mount map substitution variables, > >> > derived from the uid and gid of the process requesting the mount and > >> > they can be used within autofs maps. > >> > >> yeah, could be a problem. Hopefully the namespace people can advise. > >> Perhaps we need a concept of an exportable-to-userspace namespace-id+uid, > >> namespace-id+gid, namespace-id+pid, etc for this sort of thing. It has > >> come up before. Recently, but I forget what the context was. > > > > I'm all ears to any feedback from others on this, please. > > I think there is some confusion surrounding what the UID and GID are > used for in this context. I'll try to explain it as best I can. > > When the automount daemon parses a map entry, it will do some amount of > variable substitution. So, let's say you're running on an i386 box, and > you want to mount a library directory from a server. You might have a > map entry that looks like this: > > lib server:/export/$ARCH/lib > > In this case, the automount daemon will replace $ARCH with i386, and > will try the following mount command: > > mount -t nfs server:/export/i386/lib /automountdir/lib > > There are cases where it would be helpful to use the requesting > process's UID in such a variable substitution. Consider the case of a > CIFS share, where the automount daemon runs as user root, but we want to > mount the share using the credentials of the requesting user. In this > case, the UID and GID can be helpful in formatting the mount options for > mounting the share. > > So, the UID and GID are used only for map substitutions. Now, having > said all of that, I'll have to look more closely at why we even need to > keep track of it, given that it only needs to be used when performing > the lookup, and at that time we have information on the requesting UID > and GID. Thanks Jeff. If that's the case then user namespaces don't affect this at all. (Still trying to follow the rest of the thread bc i definately feel like I'm missing something. I swear I understood autofs 10+ years ago :) thanks, -serge ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 3/4] autofs4 - track uid and gid of last mount requestor 2008-02-28 19:51 ` Serge E. Hallyn @ 2008-02-29 3:32 ` Ian Kent 2008-02-29 16:09 ` Serge E. Hallyn 0 siblings, 1 reply; 40+ messages in thread From: Ian Kent @ 2008-02-29 3:32 UTC (permalink / raw) To: Serge E. Hallyn Cc: Jeff Moyer, Andrew Morton, Kernel Mailing List, autofs mailing list, linux-fsdevel, Pavel Emelyanov, Eric W. Biederman On Thu, 2008-02-28 at 13:51 -0600, Serge E. Hallyn wrote: > Quoting Jeff Moyer (jmoyer@redhat.com): > > Ian Kent <raven@themaw.net> writes: > > > > > On Wed, 2008-02-27 at 23:23 -0800, Andrew Morton wrote: > > >> On Thu, 28 Feb 2008 16:08:20 +0900 Ian Kent <raven@themaw.net> wrote: > > >> > which includes the process uid and gid, and as part of > > >> > the lookup we set macros for several mount map substitution variables, > > >> > derived from the uid and gid of the process requesting the mount and > > >> > they can be used within autofs maps. > > >> > > >> yeah, could be a problem. Hopefully the namespace people can advise. > > >> Perhaps we need a concept of an exportable-to-userspace namespace-id+uid, > > >> namespace-id+gid, namespace-id+pid, etc for this sort of thing. It has > > >> come up before. Recently, but I forget what the context was. > > > > > > I'm all ears to any feedback from others on this, please. > > > > I think there is some confusion surrounding what the UID and GID are > > used for in this context. I'll try to explain it as best I can. > > > > When the automount daemon parses a map entry, it will do some amount of > > variable substitution. So, let's say you're running on an i386 box, and > > you want to mount a library directory from a server. You might have a > > map entry that looks like this: > > > > lib server:/export/$ARCH/lib > > > > In this case, the automount daemon will replace $ARCH with i386, and > > will try the following mount command: > > > > mount -t nfs server:/export/i386/lib /automountdir/lib > > > > There are cases where it would be helpful to use the requesting > > process's UID in such a variable substitution. Consider the case of a > > CIFS share, where the automount daemon runs as user root, but we want to > > mount the share using the credentials of the requesting user. In this > > case, the UID and GID can be helpful in formatting the mount options for > > mounting the share. > > > > So, the UID and GID are used only for map substitutions. Now, having > > said all of that, I'll have to look more closely at why we even need to > > keep track of it, given that it only needs to be used when performing > > the lookup, and at that time we have information on the requesting UID > > and GID. > > Thanks Jeff. If that's the case then user namespaces don't affect this > at all. Yep, that's precisely the way this is used, by autofs anyway. I guess the problem we face is that since this is a public interface other applications could use this in a different way and we can't control that. I think I need more information so I can document the defined usage in my revised patch set. In version 5 I set $UID, $GID, $USER, $GROUP and $HOME in addition to the standard autofs macros, $ARCH, $CPU, $HOST, $OSNAME, $OSREL and $OSVERS, and then expand the map entry. The question that Jeff is asking himself is, why do we need this information when we re-connect at startup, since the mounts are already present. The answer isn't easy to explain and will be lengthy, sorry, but let me try anyway. There are two types on maps, direct (in the module source you will see a third type called an offset, which is just a direct mount in disguise) and indirect. For example, here is master map with direct and indirect map entries: /- /etc/auto.direct /test /etc/auto.indirect /etc/auto.direct: /automount/dparse/g6 budgie:/autofs/export1 /automount/dparse/g1 shark:/autofs/export1 and so on. /etc/auto.indirect: g1 shark:/autofs/export1 g6 budgie:/autofs/export1 and so on. For the above indirect map an autofs file system is mounted on /test and mounts are triggered by the inode lookup operation. So we see a mount of shark:/autofs/export1 on /test/g1, for example. The way that direct mounts are handled is by makeing an autofs mount on each full path, such as /automount/dparse/g1, and using it as a mount trigger. So when we walk on the path we mount shark:/autofs/export1 on top of this mount point, for example. Since these are always a directories we can use the follow_link inode operation to trigger the mount. But, each entry in direct and indirect maps can have offsets (often called multi-mount map entries). For example, a direct mount map entry could also be: /automount/dparse/g1 \ / shark:/autofs/export5/testing/test \ /s1 shark:/autofs/export/testing/test/s1 \ /s2 shark:/autofs/export5/testing/test/s2 \ /s1/ss1 shark:/autofs/export2 \ /s2/ss2 shark:/autofs/export2 and a similar indirect mount: g1 \ / shark:/autofs/export5/testing/test \ /s1 shark:/autofs/export/testing/test/s1 \ /s2 shark:/autofs/export5/testing/test/s2 \ /s1/ss1 shark:/autofs/export1 \ /s2/ss2 shark:/autofs/export2 One of the issues with version 4 of autofs was that, when mounting an entry with a large number of offsets, possibly with nesting, we needed to mount and umount all of them as a single unit. Not really a problem, except for people with a large number of offsets in map entries. This mechanism is used for the well known "hosts" map and we have seen cases (in 2.4) where the available number of mounts are exhausted or where the number of privileged ports available is exhausted. In version 5 we mount only as we go down the tree of offsets and similarly for expiring them which resolves the above problem. There is somewhat more detail to the implementation but it isn't needed for the sake of the explanation. The one important detail is that these offsets are implemented using the same mechanism as the direct mounts above and so can also be covered by another mount. To be able to do this I need to maintain context. In the daemon I use a list to represent these offsets and use it to manage the mounting and expiration of the mount tree, something which can only be discovered from the original map entry. So, to rebuild this context at startup for existing mounts I need to do the lookup to get the map entry as part of the process of re-connecting to the mounts. Hence the need to get the uid and gid of the original requester. Few, that was hard work, just for those last couple of sentences. Please, lets not go into the issue that the maps can change during a restart, that's an issue for another time and isn't a kernel issue anyway. > > (Still trying to follow the rest of the thread bc i definately feel like > I'm missing something. I swear I understood autofs 10+ years ago :) Me too, but now I've change it so much even I'm confused most of the time, ;) Hopefully the above explanation is useful in some small way. Ian ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 3/4] autofs4 - track uid and gid of last mount requestor 2008-02-29 3:32 ` Ian Kent @ 2008-02-29 16:09 ` Serge E. Hallyn 2008-02-29 16:20 ` Pavel Emelyanov ` (2 more replies) 0 siblings, 3 replies; 40+ messages in thread From: Serge E. Hallyn @ 2008-02-29 16:09 UTC (permalink / raw) To: Ian Kent Cc: Serge E. Hallyn, Jeff Moyer, Andrew Morton, Kernel Mailing List, autofs mailing list, linux-fsdevel, Pavel Emelyanov, Eric W. Biederman Quoting Ian Kent (raven@themaw.net): > > On Thu, 2008-02-28 at 13:51 -0600, Serge E. Hallyn wrote: > > Quoting Jeff Moyer (jmoyer@redhat.com): > > > Ian Kent <raven@themaw.net> writes: > > > > > > > On Wed, 2008-02-27 at 23:23 -0800, Andrew Morton wrote: > > > >> On Thu, 28 Feb 2008 16:08:20 +0900 Ian Kent <raven@themaw.net> wrote: > > > >> > which includes the process uid and gid, and as part of > > > >> > the lookup we set macros for several mount map substitution variables, > > > >> > derived from the uid and gid of the process requesting the mount and > > > >> > they can be used within autofs maps. > > > >> > > > >> yeah, could be a problem. Hopefully the namespace people can advise. > > > >> Perhaps we need a concept of an exportable-to-userspace namespace-id+uid, > > > >> namespace-id+gid, namespace-id+pid, etc for this sort of thing. It has > > > >> come up before. Recently, but I forget what the context was. > > > > > > > > I'm all ears to any feedback from others on this, please. > > > > > > I think there is some confusion surrounding what the UID and GID are > > > used for in this context. I'll try to explain it as best I can. > > > > > > When the automount daemon parses a map entry, it will do some amount of > > > variable substitution. So, let's say you're running on an i386 box, and > > > you want to mount a library directory from a server. You might have a > > > map entry that looks like this: > > > > > > lib server:/export/$ARCH/lib > > > > > > In this case, the automount daemon will replace $ARCH with i386, and > > > will try the following mount command: > > > > > > mount -t nfs server:/export/i386/lib /automountdir/lib > > > > > > There are cases where it would be helpful to use the requesting > > > process's UID in such a variable substitution. Consider the case of a > > > CIFS share, where the automount daemon runs as user root, but we want to > > > mount the share using the credentials of the requesting user. In this > > > case, the UID and GID can be helpful in formatting the mount options for > > > mounting the share. > > > > > > So, the UID and GID are used only for map substitutions. Now, having > > > said all of that, I'll have to look more closely at why we even need to > > > keep track of it, given that it only needs to be used when performing > > > the lookup, and at that time we have information on the requesting UID > > > and GID. > > > > Thanks Jeff. If that's the case then user namespaces don't affect this > > at all. > > Yep, that's precisely the way this is used, by autofs anyway. > > I guess the problem we face is that since this is a public interface > other applications could use this in a different way and we can't > control that. I think I need more information so I can document the > defined usage in my revised patch set. > > In version 5 I set $UID, $GID, $USER, $GROUP and $HOME in addition to > the standard autofs macros, $ARCH, $CPU, $HOST, $OSNAME, $OSREL and > $OSVERS, and then expand the map entry. > > The question that Jeff is asking himself is, why do we need this > information when we re-connect at startup, since the mounts are already > present. > > The answer isn't easy to explain and will be lengthy, sorry, but let me > try anyway. > > There are two types on maps, direct (in the module source you will see a > third type called an offset, which is just a direct mount in disguise) > and indirect. > > For example, here is master map with direct and indirect map entries: > > /- /etc/auto.direct > /test /etc/auto.indirect > > /etc/auto.direct: > > /automount/dparse/g6 budgie:/autofs/export1 > /automount/dparse/g1 shark:/autofs/export1 > and so on. > > /etc/auto.indirect: > > g1 shark:/autofs/export1 > g6 budgie:/autofs/export1 > and so on. > > For the above indirect map an autofs file system is mounted on /test and > mounts are triggered by the inode lookup operation. So we see a mount of > shark:/autofs/export1 on /test/g1, for example. > > The way that direct mounts are handled is by makeing an autofs mount on > each full path, such as /automount/dparse/g1, and using it as a mount > trigger. So when we walk on the path we mount shark:/autofs/export1 on > top of this mount point, for example. Since these are always a > directories we can use the follow_link inode operation to trigger the > mount. > > But, each entry in direct and indirect maps can have offsets (often > called multi-mount map entries). > > For example, > > a direct mount map entry could also be: > > /automount/dparse/g1 \ > / shark:/autofs/export5/testing/test \ > /s1 shark:/autofs/export/testing/test/s1 \ > /s2 shark:/autofs/export5/testing/test/s2 \ > /s1/ss1 shark:/autofs/export2 \ > /s2/ss2 shark:/autofs/export2 > > and a similar indirect mount: > > g1 \ > / shark:/autofs/export5/testing/test \ > /s1 shark:/autofs/export/testing/test/s1 \ > /s2 shark:/autofs/export5/testing/test/s2 \ > /s1/ss1 shark:/autofs/export1 \ > /s2/ss2 shark:/autofs/export2 > > One of the issues with version 4 of autofs was that, when mounting an > entry with a large number of offsets, possibly with nesting, we needed > to mount and umount all of them as a single unit. Not really a problem, > except for people with a large number of offsets in map entries. This > mechanism is used for the well known "hosts" map and we have seen cases > (in 2.4) where the available number of mounts are exhausted or where the > number of privileged ports available is exhausted. > > In version 5 we mount only as we go down the tree of offsets and > similarly for expiring them which resolves the above problem. There is > somewhat more detail to the implementation but it isn't needed for the > sake of the explanation. The one important detail is that these offsets > are implemented using the same mechanism as the direct mounts above and > so can also be covered by another mount. > > To be able to do this I need to maintain context. In the daemon I use a > list to represent these offsets and use it to manage the mounting and > expiration of the mount tree, something which can only be discovered > from the original map entry. So, to rebuild this context at startup for > existing mounts I need to do the lookup to get the map entry as part of > the process of re-connecting to the mounts. Hence the need to get the > uid and gid of the original requester. The way the user namespaces work right now is similar to say the IPC namespace - a task belongs to one user, that user belongs to precisely one user namespace. Even in my additional userns patches, I was changing uid to store the (uid, userns) so a struct user still belonged to just one user namespace. In contrast, with pid namespaces a task is associated with a 'struct pid' which links it to multiple process ids, one in each pid namespace to which it belongs. Perhaps we should be treating user namespaces like pid namespaces? For autofs this would mean that when autofs wants a uid for some task, it would be given the uid in the user namespace which autofs 'knows'. It would also help me fix the siginfo problems I haven't solved yet - rather than having to worry about user namespace lifetimes with siginfos (which last a little while but have no clearly defined lifespan) we could send the uid in an init user namespace or the uid in the target uid namespace, or just a lightweight user struct proxy akin to 'struct pid'. And it also obviates the need for any sort of delegation. So if I'm user 500 in what I think is the initial user namespace, I can create a container with a new user namespace, the init task of which is both uid 0 in the child userns, and uid 500 in the higher level, automatically giving the container access to any files I own. Eric, when you get a chance (I know you're overloaded atm) I'd love to hear your thoughts on this... > Few, that was hard work, just for those last couple of sentences. > > Please, lets not go into the issue that the maps can change during a > restart, that's an issue for another time and isn't a kernel issue > anyway. > > > > > (Still trying to follow the rest of the thread bc i definately feel like > > I'm missing something. I swear I understood autofs 10+ years ago :) > > Me too, but now I've change it so much even I'm confused most of the > time, ;) > > Hopefully the above explanation is useful in some small way. > > Ian > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 3/4] autofs4 - track uid and gid of last mount requestor 2008-02-29 16:09 ` Serge E. Hallyn @ 2008-02-29 16:20 ` Pavel Emelyanov 2008-02-29 17:42 ` Serge E. Hallyn 2008-03-02 0:49 ` Eric W. Biederman 2008-03-02 1:13 ` Eric W. Biederman 2 siblings, 1 reply; 40+ messages in thread From: Pavel Emelyanov @ 2008-02-29 16:20 UTC (permalink / raw) To: Serge E. Hallyn Cc: Ian Kent, Jeff Moyer, Andrew Morton, Kernel Mailing List, autofs mailing list, linux-fsdevel, Eric W. Biederman > The way the user namespaces work right now is similar to say the IPC > namespace - a task belongs to one user, that user belongs to precisely > one user namespace. > > Even in my additional userns patches, I was changing uid to store the > (uid, userns) so a struct user still belonged to just one user > namespace. > > In contrast, with pid namespaces a task is associated with a 'struct > pid' which links it to multiple process ids, one in each pid namespace > to which it belongs. > > Perhaps we should be treating user namespaces like pid namespaces? I'm afraid, that I'm just starting a new thread of discussion in a wrong place, but I can't refrain from asking "what for?" > So if I'm user 500 in what I think is the initial user namespace, I can > create a container with a new user namespace, the init task of which is > both uid 0 in the child userns, and uid 500 in the higher level, > automatically giving the container access to any files I own. So do you mean that I can become a root, by calling clone()? Thanks, Pavel ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 3/4] autofs4 - track uid and gid of last mount requestor 2008-02-29 16:20 ` Pavel Emelyanov @ 2008-02-29 17:42 ` Serge E. Hallyn 0 siblings, 0 replies; 40+ messages in thread From: Serge E. Hallyn @ 2008-02-29 17:42 UTC (permalink / raw) To: Pavel Emelyanov Cc: Serge E. Hallyn, Ian Kent, Jeff Moyer, Andrew Morton, Kernel Mailing List, autofs mailing list, linux-fsdevel, Eric W. Biederman Quoting Pavel Emelyanov (xemul@openvz.org): > > The way the user namespaces work right now is similar to say the IPC > > namespace - a task belongs to one user, that user belongs to precisely > > one user namespace. > > > > Even in my additional userns patches, I was changing uid to store the > > (uid, userns) so a struct user still belonged to just one user > > namespace. > > > > In contrast, with pid namespaces a task is associated with a 'struct > > pid' which links it to multiple process ids, one in each pid namespace > > to which it belongs. > > > > Perhaps we should be treating user namespaces like pid namespaces? > > I'm afraid, that I'm just starting a new thread of discussion in a > wrong place, but I can't refrain from asking "what for?" For the reasons I listed there :) > > So if I'm user 500 in what I think is the initial user namespace, I can > > create a container with a new user namespace, the init task of which is > > both uid 0 in the child userns, and uid 500 in the higher level, > > automatically giving the container access to any files I own. > > So do you mean that I can become a root, by calling clone()? You can become root in the new container. Your capabilities are meaningful only to targets (users, files) which exist in the user namespace in which you are root. It becomes more precise than the CAP_NS_OVERRIDE approach in my last patchset. > Thanks, > Pavel ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 3/4] autofs4 - track uid and gid of last mount requestor 2008-02-29 16:09 ` Serge E. Hallyn 2008-02-29 16:20 ` Pavel Emelyanov @ 2008-03-02 0:49 ` Eric W. Biederman 2008-03-02 1:13 ` Eric W. Biederman 2 siblings, 0 replies; 40+ messages in thread From: Eric W. Biederman @ 2008-03-02 0:49 UTC (permalink / raw) To: Serge E. Hallyn Cc: Ian Kent, Jeff Moyer, Andrew Morton, Kernel Mailing List, autofs mailing list, linux-fsdevel, Pavel Emelyanov "Serge E. Hallyn" <serue@us.ibm.com> writes: > The way the user namespaces work right now is similar to say the IPC > namespace - a task belongs to one user, that user belongs to precisely > one user namespace. > > Even in my additional userns patches, I was changing uid to store the > (uid, userns) so a struct user still belonged to just one user > namespace. > > In contrast, with pid namespaces a task is associated with a 'struct > pid' which links it to multiple process ids, one in each pid namespace > to which it belongs. > > Perhaps we should be treating user namespaces like pid namespaces? I definitely think we should have some support like that. We already have code in nfsv4 and p9fs if I remember correctly to make between the user namespace of the server (which is string based) and the uids of the local machine. We also have a similar issue with security keys. I don't know if the strict hierarchical nature we have makes a lot of sense. One of the things that we should account for is that frequently user namespaces are kept in sync between multiple machines by system administrators. So in the real world user namespaces are a system administrator boundary. > For autofs this would mean that when autofs wants a uid for some task, > it would be given the uid in the user namespace which autofs > 'knows'. Yes. The user namespace of the process that opened the pipe I believe is the right choice there. > It would also help me fix the siginfo problems I haven't solved yet - > rather than having to worry about user namespace lifetimes with siginfos > (which last a little while but have no clearly defined lifespan) we > could send the uid in an init user namespace or the uid in the target > uid namespace, or just a lightweight user struct proxy akin to 'struct > pid'. Yes. For the pids we have been looking at sending the pid in the target namespace and sending the uid in the target namespace should be no more difficult. > And it also obviates the need for any sort of delegation. > > So if I'm user 500 in what I think is the initial user namespace, I can > create a container with a new user namespace, the init task of which is > both uid 0 in the child userns, and uid 500 in the higher level, > automatically giving the container access to any files I own. Right. Long term we want to look at making this an unprivileged operation. Allowing a user to run less privileged processes. My impression has always been that going from comparing (userns, uid) to a more sophisticated mapping approach was a compatible extension. However if it looks like we need user namespace mapping support up front to get the semantics clean I have no problem with that. > Eric, when you get a chance (I know you're overloaded atm) I'd love to > hear your thoughts on this... I think you are on the right track. In a lot of ways the user namespace is the trickiest, as this is where we change the security rules. If it is only at the level of who is who. Since we already have user namespace mapping infrastructure in the kernel and ways to call back to user space to ask what the mapping should be, I feel performing mapping in the user namespace and generalizing the existing capability is a good idea. We still want to tell users if you can get away with it synchronize your user namespaces across file systems, and kernels, and machines. If they can't having good general tools in the kernel that you only need to learn once and not once per instance sounds good. Eric ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 3/4] autofs4 - track uid and gid of last mount requestor 2008-02-29 16:09 ` Serge E. Hallyn 2008-02-29 16:20 ` Pavel Emelyanov 2008-03-02 0:49 ` Eric W. Biederman @ 2008-03-02 1:13 ` Eric W. Biederman 2008-03-03 15:28 ` Serge E. Hallyn 2 siblings, 1 reply; 40+ messages in thread From: Eric W. Biederman @ 2008-03-02 1:13 UTC (permalink / raw) To: Serge E. Hallyn Cc: Ian Kent, Jeff Moyer, Andrew Morton, Kernel Mailing List, autofs mailing list, linux-fsdevel, Pavel Emelyanov, Eric W. Biederman "Serge E. Hallyn" <serue@us.ibm.com> writes: > > The way the user namespaces work right now is similar to say the IPC > namespace - a task belongs to one user, that user belongs to precisely > one user namespace. > > Even in my additional userns patches, I was changing uid to store the > (uid, userns) so a struct user still belonged to just one user > namespace. > > In contrast, with pid namespaces a task is associated with a 'struct > pid' which links it to multiple process ids, one in each pid namespace > to which it belongs. > > Perhaps we should be treating user namespaces like pid namespaces? > > For autofs this would mean that when autofs wants a uid for some task, > it would be given the uid in the user namespace which autofs 'knows'. > > It would also help me fix the siginfo problems I haven't solved yet - > rather than having to worry about user namespace lifetimes with siginfos > (which last a little while but have no clearly defined lifespan) we > could send the uid in an init user namespace or the uid in the target > uid namespace, or just a lightweight user struct proxy akin to 'struct > pid'. > > And it also obviates the need for any sort of delegation. > > So if I'm user 500 in what I think is the initial user namespace, I can > create a container with a new user namespace, the init task of which is > both uid 0 in the child userns, and uid 500 in the higher level, > automatically giving the container access to any files I own. > > Eric, when you get a chance (I know you're overloaded atm) I'd love to > hear your thoughts on this... Succinctly. I think the concept of mapping uids between user namespaces is fundamental to properly describing and thinking about the semantics of user namespaces correct. We don't have to start out anything except handling the case when no mapping exists, but asking the question how does this uid map between from one namespace to another is fundamental. Eric ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 3/4] autofs4 - track uid and gid of last mount requestor 2008-03-02 1:13 ` Eric W. Biederman @ 2008-03-03 15:28 ` Serge E. Hallyn 2008-03-04 22:16 ` Eric W. Biederman 0 siblings, 1 reply; 40+ messages in thread From: Serge E. Hallyn @ 2008-03-03 15:28 UTC (permalink / raw) To: Eric W. Biederman Cc: Serge E. Hallyn, Ian Kent, Jeff Moyer, Andrew Morton, Kernel Mailing List, autofs mailing list, linux-fsdevel, Pavel Emelyanov Quoting Eric W. Biederman (ebiederm@xmission.com): > "Serge E. Hallyn" <serue@us.ibm.com> writes: > > > > The way the user namespaces work right now is similar to say the IPC > > namespace - a task belongs to one user, that user belongs to precisely > > one user namespace. > > > > Even in my additional userns patches, I was changing uid to store the > > (uid, userns) so a struct user still belonged to just one user > > namespace. > > > > In contrast, with pid namespaces a task is associated with a 'struct > > pid' which links it to multiple process ids, one in each pid namespace > > to which it belongs. > > > > Perhaps we should be treating user namespaces like pid namespaces? > > > > For autofs this would mean that when autofs wants a uid for some task, > > it would be given the uid in the user namespace which autofs 'knows'. > > > > It would also help me fix the siginfo problems I haven't solved yet - > > rather than having to worry about user namespace lifetimes with siginfos > > (which last a little while but have no clearly defined lifespan) we > > could send the uid in an init user namespace or the uid in the target > > uid namespace, or just a lightweight user struct proxy akin to 'struct > > pid'. > > > > And it also obviates the need for any sort of delegation. > > > > So if I'm user 500 in what I think is the initial user namespace, I can > > create a container with a new user namespace, the init task of which is > > both uid 0 in the child userns, and uid 500 in the higher level, > > automatically giving the container access to any files I own. > > > > Eric, when you get a chance (I know you're overloaded atm) I'd love to > > hear your thoughts on this... > > Succinctly. > > I think the concept of mapping uids between user namespaces is > fundamental to properly describing and thinking about the semantics of > user namespaces correct. Earlier I had thought this could just be done using a special keyring, but atm I'm thinking that would be far uglier than just having a struct pid-like credential proxy in the kernel to pass around in place of uids. > We don't have to start out anything except handling the case when > no mapping exists, but asking the question how does this uid map > between from one namespace to another is fundamental. True. But in any case I'm happy letting other things like netns and related sys be completed before prototyping this. thanks, -serge ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 3/4] autofs4 - track uid and gid of last mount requestor 2008-03-03 15:28 ` Serge E. Hallyn @ 2008-03-04 22:16 ` Eric W. Biederman 0 siblings, 0 replies; 40+ messages in thread From: Eric W. Biederman @ 2008-03-04 22:16 UTC (permalink / raw) To: Serge E. Hallyn Cc: Ian Kent, Jeff Moyer, Andrew Morton, Kernel Mailing List, autofs mailing list, linux-fsdevel, Pavel Emelyanov "Serge E. Hallyn" <serue@us.ibm.com> writes: > Earlier I had thought this could just be done using a special keyring, > but atm I'm thinking that would be far uglier than just having a > struct pid-like credential proxy in the kernel to pass around in place > of uids. I have not looked at many of the implementation possibilities so unfortunately I don't know what makes for a good implementation. What I do know is that uids are serialized in filesystems, and their mapping between namespaces is defined by system administrators. Both of those properties are different from struct pid. Which means a generalized struct user in the kernel can at best hold a cache of the mappings. My preliminary investigations suggested that for the kernel filesystem boundary generating a struct user or a struct group just to use for a permission check and then to throw it away was wasteful. However for inkernel entities a struct user sounds practical. All of which is to say that we can learn lessons from the implementation of struct pid but that we also have different requirements so we can only use those lessons in a limited fashion. Eric ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 3/4] autofs4 - track uid and gid of last mount requestor 2008-02-28 7:08 ` Ian Kent 2008-02-28 7:23 ` Andrew Morton @ 2008-02-28 7:51 ` Pavel Emelyanov 2008-02-28 7:59 ` Andrew Morton 2008-02-28 20:33 ` Eric W. Biederman 1 sibling, 2 replies; 40+ messages in thread From: Pavel Emelyanov @ 2008-02-28 7:51 UTC (permalink / raw) To: Ian Kent Cc: Andrew Morton, Kernel Mailing List, autofs mailing list, linux-fsdevel, Eric W. Biederman Ian Kent wrote: > On Wed, 2008-02-27 at 22:37 -0800, Andrew Morton wrote: >> On Thu, 28 Feb 2008 15:22:27 +0900 Ian Kent <raven@themaw.net> wrote: >> >>>>> +++ linux-2.6.25-rc2-mm1/fs/autofs4/waitq.c 2008-02-20 13:10:23.000000000 +0900 >>>>> @@ -363,6 +363,38 @@ int autofs4_wait(struct autofs_sb_info * >>>>> >>>>> status = wq->status; >>>>> >>>>> + /* >>>>> + * For direct and offset mounts we need to track the requestor >>>>> + * uid and gid in the dentry info struct. This is so it can be >>>>> + * supplied, on request, by the misc device ioctl interface. >>>>> + * This is needed during daemon resatart when reconnecting >>>>> + * to existing, active, autofs mounts. The uid and gid (and >>>>> + * related string values) may be used for macro substitution >>>>> + * in autofs mount maps. >>>>> + */ >>>>> + if (!status) { >>>>> + struct dentry *de = NULL; >>>>> + >>>>> + /* direct mount or browsable map */ >>>>> + ino = autofs4_dentry_ino(dentry); >>>>> + if (!ino) { >>>>> + /* If not lookup actual dentry used */ >>>>> + de = d_lookup(dentry->d_parent, &dentry->d_name); >>>>> + ino = autofs4_dentry_ino(de); >>>>> + } >>>>> + >>>>> + /* Set mount requestor */ >>>>> + if (ino) { >>>>> + if (ino) { >>>>> + ino->uid = wq->uid; >>>>> + ino->gid = wq->gid; >>>>> + } >>>>> + } >>>>> + >>>>> + if (de) >>>>> + dput(de); >>>>> + } >>>>> + >>>> But uids and gids are no longer system-wide-unique. Two different users >>>> can have the same identifiers in different namespaces. What happens then? >>> That's a tricky question. >>> >>> Presumably, the process requesting the mount has the user space daemon >>> running in the namespace within which the uid and gid are to be looked >>> up, by the daemon. >>> >>> Am I missing something? >>> >> err, you assume more knowledge at this end about what you're trying to do >> than actually exists :) >> >> You seem to imply that if a machine is running 100 user namespaces then it >> needs to run 100 mount daemons. Doesn't seem good. > > More likely my lack of understanding of how namespaces are meant to > work. > >> What problem are you actually trying to solve here? > > The basic problem arises only when we want to restart the user space > daemon and there are active autofs managed mounts in place at exit (ie. > autofs mounts that have busy user mounts). They are left mounted and > processes using them continue to function. But then, when we startup > autofs we need to reconnect to these autofs mounts, some of which can > covered the by mounted file systems, and hence the need for another way > to open an ioctl descriptor to them. > > It may have been overkill to re-implement all the current ioctls (and > add a couple of other much needed ones) but I though it sensible for > completeness, and we get to identify any possible problems the current > ioctls might have had due to the use of the BKL (by the VFS when calling > the ioctls). > > So, why do we need the uid and gid? When someone walks over an autofs > dentry that is meant to cause a mount we send a request packet to the > daemon via a pipe which includes the process uid and gid, and as part of > the lookup we set macros for several mount map substitution variables, > derived from the uid and gid of the process requesting the mount and > they can be used within autofs maps. Why do we need the uid then? Is just pid not enough to uniquely identify a task? Assuming we can get by with a pid only, this problem can be solved by sending a pid_nr() of a task, i.e. the pid by which this task is seen from an initial namespace. This pid is unique across the system even when pid namespaces are created. But this ... trick is only valid if the daemon, that receives the pid doesn't try to communicate with this task (e.g. send him a signal), but just uses this as a key to lookup in some hash. This is not about security - even having someone's global pid task can do nothing useful with it - this is about the consistency - when sending a signal to a task, giving its _global_ pid to sys_kill() the signal may arrive to a wrong task if the sender lives in a sub-namespace. > This is all fine as long as we don't need to re-connect to these mounts > when starting up, since we don't get kernel requests for the mounts, we > need to obtain that information from the active mount itself. > > Ian > > > > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 3/4] autofs4 - track uid and gid of last mount requestor 2008-02-28 7:51 ` Pavel Emelyanov @ 2008-02-28 7:59 ` Andrew Morton 2008-02-28 8:06 ` Ian Kent 2008-02-28 20:33 ` Eric W. Biederman 1 sibling, 1 reply; 40+ messages in thread From: Andrew Morton @ 2008-02-28 7:59 UTC (permalink / raw) To: Pavel Emelyanov Cc: Ian Kent, Kernel Mailing List, autofs mailing list, linux-fsdevel, Eric W. Biederman On Thu, 28 Feb 2008 10:51:08 +0300 Pavel Emelyanov <xemul@openvz.org> wrote: > > So, why do we need the uid and gid? When someone walks over an autofs > > dentry that is meant to cause a mount we send a request packet to the > > daemon via a pipe which includes the process uid and gid, and as part of > > the lookup we set macros for several mount map substitution variables, > > derived from the uid and gid of the process requesting the mount and > > they can be used within autofs maps. > > Why do we need the uid then? Is just pid not enough to uniquely > identify a task? The problem is that the userspace daemon is restarted. ie: it exits and is re-run. It then needs to pick up various state from its previous run. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 3/4] autofs4 - track uid and gid of last mount requestor 2008-02-28 7:59 ` Andrew Morton @ 2008-02-28 8:06 ` Ian Kent 2008-02-28 12:31 ` [autofs] " Fabio Olive Leite 0 siblings, 1 reply; 40+ messages in thread From: Ian Kent @ 2008-02-28 8:06 UTC (permalink / raw) To: Andrew Morton Cc: Pavel Emelyanov, Kernel Mailing List, autofs mailing list, linux-fsdevel, Eric W. Biederman On Wed, 2008-02-27 at 23:59 -0800, Andrew Morton wrote: > On Thu, 28 Feb 2008 10:51:08 +0300 Pavel Emelyanov <xemul@openvz.org> wrote: > > > > So, why do we need the uid and gid? When someone walks over an autofs > > > dentry that is meant to cause a mount we send a request packet to the > > > daemon via a pipe which includes the process uid and gid, and as part of > > > the lookup we set macros for several mount map substitution variables, > > > derived from the uid and gid of the process requesting the mount and > > > they can be used within autofs maps. > > > > Why do we need the uid then? Is just pid not enough to uniquely > > identify a task? > > The problem is that the userspace daemon is restarted. ie: it exits > and is re-run. It then needs to pick up various state from its previous > run. Dumb old me, I really only need the uid. The gid can come from the get user info functions of glibc. DOH! ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [autofs] [PATCH 3/4] autofs4 - track uid and gid of last mount requestor 2008-02-28 8:06 ` Ian Kent @ 2008-02-28 12:31 ` Fabio Olive Leite 0 siblings, 0 replies; 40+ messages in thread From: Fabio Olive Leite @ 2008-02-28 12:31 UTC (permalink / raw) To: Ian Kent Cc: Andrew Morton, autofs mailing list, linux-fsdevel, Eric W. Biederman, Kernel Mailing List, Pavel Emelyanov On Thu, Feb 28, 2008 at 05:06:11PM +0900, Ian Kent wrote: > > Dumb old me, I really only need the uid. > The gid can come from the get user info functions of glibc. In case the process was executed from a setgid executable, you might have a different gid from what the user has. In fact, I don't know why you may need more than the pid, since with the pid you can get to the task's effective uid/gid and maybe other such information you need. Cheers, Fábio Olivé -- ex sed lex awk yacc, e pluribus unix, amem ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 3/4] autofs4 - track uid and gid of last mount requestor 2008-02-28 7:51 ` Pavel Emelyanov 2008-02-28 7:59 ` Andrew Morton @ 2008-02-28 20:33 ` Eric W. Biederman 1 sibling, 0 replies; 40+ messages in thread From: Eric W. Biederman @ 2008-02-28 20:33 UTC (permalink / raw) To: Pavel Emelyanov Cc: Ian Kent, Andrew Morton, Kernel Mailing List, autofs mailing list, linux-fsdevel Pavel Emelyanov <xemul@openvz.org> writes: > Why do we need the uid then? Is just pid not enough to uniquely > identify a task? > > Assuming we can get by with a pid only, this problem can be solved > by sending a pid_nr() of a task, i.e. the pid by which this task is > seen from an initial namespace. This pid is unique across the system > even when pid namespaces are created. Pavel it is never correct to use a global pid when talking to user space. In fact the concept is just a bit dubious. We must always translate the pid into the pid namespace of the task we are talking to, or at least into the pid namespace of the process that opened the file handle, (essentially the same, but does not have races in the corner cases). Even in the kernel using global ids is dubious. When dealing with user space it is just wrong. Speaking of. I think we still need work on autofs in this regard. I know last I looked we had some outstanding issues there. Eric ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH 4/4] autofs4 - add miscelaneous device for ioctls 2008-02-26 3:21 [PATCH 0/4] autofs4 - autofs needs a miscelaneous device for ioctls Ian Kent 2008-02-26 3:22 ` [PATCH 1/4] autofs4 - check for invalid dentry in getpath Ian Kent 2008-02-26 3:23 ` [PATCH 3/4] autofs4 - track uid and gid of last mount requestor Ian Kent @ 2008-02-26 3:23 ` Ian Kent 2008-02-28 5:17 ` Andrew Morton 2008-02-26 4:29 ` [PATCH 2/4] autofs4 - add mount option to display mount device Ian Kent 2008-02-28 4:40 ` [PATCH 0/4] autofs4 - autofs needs a miscelaneous device for ioctls Andrew Morton 4 siblings, 1 reply; 40+ messages in thread From: Ian Kent @ 2008-02-26 3:23 UTC (permalink / raw) To: Andrew Morton; +Cc: Kernel Mailing List, autofs mailing list, linux-fsdevel Hi Andrew, Patch to add miscellaneous device to autofs4 module for ioctls. Signed-off-by: Ian Kent < raven@themaw.net> Ian --- diff -up linux-2.6.25-rc2-mm1/fs/autofs4/expire.c.device-node-ioctl linux-2.6.25-rc2-mm1/fs/autofs4/expire.c --- linux-2.6.25-rc2-mm1/fs/autofs4/expire.c.device-node-ioctl 2008-01-25 07:58:37.000000000 +0900 +++ linux-2.6.25-rc2-mm1/fs/autofs4/expire.c 2008-02-22 11:51:41.000000000 +0900 @@ -244,10 +244,10 @@ cont: } /* Check if we can expire a direct mount (possibly a tree) */ -static struct dentry *autofs4_expire_direct(struct super_block *sb, - struct vfsmount *mnt, - struct autofs_sb_info *sbi, - int how) +struct dentry *autofs4_expire_direct(struct super_block *sb, + struct vfsmount *mnt, + struct autofs_sb_info *sbi, + int how) { unsigned long timeout; struct dentry *root = dget(sb->s_root); @@ -281,10 +281,10 @@ static struct dentry *autofs4_expire_dir * - it is unused by any user process * - it has been unused for exp_timeout time */ -static struct dentry *autofs4_expire_indirect(struct super_block *sb, - struct vfsmount *mnt, - struct autofs_sb_info *sbi, - int how) +struct dentry *autofs4_expire_indirect(struct super_block *sb, + struct vfsmount *mnt, + struct autofs_sb_info *sbi, + int how) { unsigned long timeout; struct dentry *root = sb->s_root; diff -up linux-2.6.25-rc2-mm1/fs/autofs4/init.c.device-node-ioctl linux-2.6.25-rc2-mm1/fs/autofs4/init.c --- linux-2.6.25-rc2-mm1/fs/autofs4/init.c.device-node-ioctl 2008-01-25 07:58:37.000000000 +0900 +++ linux-2.6.25-rc2-mm1/fs/autofs4/init.c 2008-02-22 11:51:41.000000000 +0900 @@ -29,11 +29,20 @@ static struct file_system_type autofs_fs static int __init init_autofs4_fs(void) { - return register_filesystem(&autofs_fs_type); + int err; + + err = register_filesystem(&autofs_fs_type); + if (err) + return err; + + err = autofs_dev_ioctl_init(); + + return err; } static void __exit exit_autofs4_fs(void) { + autofs_dev_ioctl_exit(); unregister_filesystem(&autofs_fs_type); } diff -up linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.device-node-ioctl linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h --- linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.device-node-ioctl 2008-02-22 11:51:41.000000000 +0900 +++ linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h 2008-02-22 11:51:41.000000000 +0900 @@ -14,6 +14,7 @@ /* Internal header file for autofs */ #include <linux/auto_fs4.h> +#include <linux/auto_dev-ioctl.h> #include <linux/mutex.h> #include <linux/list.h> @@ -40,6 +41,16 @@ #define DPRINTK(fmt,args...) do {} while(0) #endif +#define WARN(fmt,args...) \ +do { \ + printk("KERN_WARNING pid %d: %s: " fmt "\n" , current->pid , __FUNCTION__ , ##args); \ +} while(0) + +#define ERROR(fmt,args...) \ +do { \ + printk("KERN_ERR pid %d: %s: " fmt "\n" , current->pid , __FUNCTION__ , ##args); \ +} while(0) + /* Unified info structure. This is pointed to by both the dentry and inode structures. Each file in the filesystem has an instance of this structure. It holds a reference to the dentry, so dentries are never @@ -172,6 +183,17 @@ int autofs4_expire_run(struct super_bloc struct autofs_packet_expire __user *); int autofs4_expire_multi(struct super_block *, struct vfsmount *, struct autofs_sb_info *, int __user *); +struct dentry *autofs4_expire_direct(struct super_block *sb, + struct vfsmount *mnt, + struct autofs_sb_info *sbi, int how); +struct dentry *autofs4_expire_indirect(struct super_block *sb, + struct vfsmount *mnt, + struct autofs_sb_info *sbi, int how); + +/* Device node initialization */ + +int autofs_dev_ioctl_init(void); +void autofs_dev_ioctl_exit(void); /* Operations structures */ diff -up linux-2.6.25-rc2-mm1/fs/autofs4/Makefile.device-node-ioctl linux-2.6.25-rc2-mm1/fs/autofs4/Makefile --- linux-2.6.25-rc2-mm1/fs/autofs4/Makefile.device-node-ioctl 2008-01-25 07:58:37.000000000 +0900 +++ linux-2.6.25-rc2-mm1/fs/autofs4/Makefile 2008-02-22 11:51:41.000000000 +0900 @@ -4,4 +4,4 @@ obj-$(CONFIG_AUTOFS4_FS) += autofs4.o -autofs4-objs := init.o inode.o root.o symlink.o waitq.o expire.o +autofs4-objs := init.o inode.o root.o symlink.o waitq.o expire.o dev-ioctl.o diff -up /dev/null linux-2.6.25-rc2-mm1/fs/autofs4/dev-ioctl.c --- /dev/null 2008-02-22 18:55:57.149000956 +0900 +++ linux-2.6.25-rc2-mm1/fs/autofs4/dev-ioctl.c 2008-02-22 11:53:33.000000000 +0900 @@ -0,0 +1,788 @@ +/* + * linux/fs/autofs4/dev-ioctl.c + * + * Copyright 2008 Red Hat, Inc. All rights reserved. + * Copyright 2008 Ian Kent <raven@themaw.net> + * + * This file is part of the Linux kernel and is made available under + * the terms of the GNU General Public License, version 2, or at your + * option, any later version, incorporated herein by reference. + */ + +#include <linux/module.h> +#include <linux/vmalloc.h> +#include <linux/miscdevice.h> +#include <linux/init.h> +#include <linux/wait.h> +#include <linux/namei.h> +#include <linux/fcntl.h> +#include <linux/file.h> +#include <linux/sched.h> +#include <linux/compat.h> +#include <linux/syscalls.h> +#include <linux/smp_lock.h> +#include <linux/magic.h> +#include <linux/dcache.h> +#include <asm/uaccess.h> + +#include "autofs_i.h" + +/* + * This module implements an interface for routing autofs ioctl control + * commands via a miscellaneous device file. + * + * The alternate interface is needed because we need to be able open + * an ioctl file descriptor on an autofs mount that may be covered by + * another mount. This situation arises when starting automount(8) + * or other user space daemon which uses direct mounts or offset + * mounts (used for autofs lazy mount/umount of nested mount trees), + * which have been left busy at at service shutdown. + */ + +#define AUTOFS_DEVICE_NAME "autofs" + +#define AUTOFS_DEV_IOCTL_IOC_FIRST AUTOFS_DEV_IOCTL_VERSION +#define AUTOFS_DEV_IOCTL_IOC_COUNT AUTOFS_IOC_COUNT - 11 + +#define AUTOFS_DEV_IOCTL_SIZE sizeof(struct autofs_dev_ioctl) + +typedef int (*ioctl_fn)(struct file *, struct autofs_sb_info *, struct autofs_dev_ioctl *); + +static int check_name(const char *name) +{ + if (!strchr(name, '/')) + return -EINVAL; + return 0; +} + +/* + * Check a string doesn't overrun the chunk of + * memory we copied from user land. + */ +static int invalid_str(char *str, void *end) +{ + while ((void *) str <= end) + if (!*str++) + return 0; + return -EINVAL; +} + +/* + * Check that the user compiled against correct version of autofs + * misc device code. + * + * As well as checking the version compatibility this always copies + * the kernel interface version out. + */ +static int check_dev_ioctl_version(int cmd, struct autofs_dev_ioctl *param) +{ + int err = 0; + + if ((AUTOFS_DEV_IOCTL_VERSION_MAJOR != param->ver_major) || + (AUTOFS_DEV_IOCTL_VERSION_MINOR < param->ver_minor)) { + WARN("ioctl control interface version mismatch: " + "kernel(%u.%u), user(%u.%u), cmd(%d)", + AUTOFS_DEV_IOCTL_VERSION_MAJOR, + AUTOFS_DEV_IOCTL_VERSION_MINOR, + param->ver_major, param->ver_minor, cmd); + err = -EINVAL; + } + + /* Fill in the kernel version. */ + param->ver_major = AUTOFS_DEV_IOCTL_VERSION_MAJOR; + param->ver_minor = AUTOFS_DEV_IOCTL_VERSION_MINOR; + + return err; +} + +/* + * Copy parameter control struct, including a possible path allocated + * at the end of the struct. + */ +static struct autofs_dev_ioctl *copy_dev_ioctl(struct autofs_dev_ioctl __user *in) +{ + struct autofs_dev_ioctl tmp, *ads; + + if (copy_from_user(&tmp, in, sizeof(tmp))) + return ERR_PTR(-EFAULT); + + if (tmp.size < sizeof(tmp)) + return ERR_PTR(-EINVAL); + + ads = kmalloc(tmp.size, GFP_KERNEL); + if (!ads) + return ERR_PTR(-ENOMEM); + + if (copy_from_user(ads, in, tmp.size)) { + vfree(ads); + return ERR_PTR(-EFAULT); + } + + return ads; +} + +static inline void free_dev_ioctl(struct autofs_dev_ioctl *param) +{ + kfree(param); + return; +} + +/* + * Check sanity of parameter control fields and if a path is present + * check that it has a "/" and is terminated. + */ +static int validate_dev_ioctl(int cmd, struct autofs_dev_ioctl *param) +{ + int err = -EINVAL; + + if (check_dev_ioctl_version(cmd, param)) { + WARN("invalid device control module version " + "supplied for cmd(0x%08x)", cmd); + goto out; + } + + if (param->size > sizeof(*param)) { + err = check_name(param->path); + if (err) { + WARN("invalid path supplied for cmd(0x%08x)", cmd); + goto out; + } + + err = invalid_str(param->path, + (void *) ((size_t) param + param->size)); + if (err) { + WARN("invalid path supplied for cmd(0x%08x)", cmd); + goto out; + } + } + + err = 0; +out: + return err; +} + +/* + * Get the autofs super block info struct from the file opened on + * the autofs mount point. + */ +static struct autofs_sb_info *autofs_dev_ioctl_sbi(struct file *f) +{ + struct autofs_sb_info *sbi = NULL; + struct inode *inode; + + if (f) { + inode = f->f_path.dentry->d_inode; + sbi = autofs4_sbi(inode->i_sb); + } + + return sbi; +} + +/* Return autofs module protocol version */ +static inline int autofs_dev_ioctl_protover(struct file *fp, + struct autofs_sb_info *sbi, + struct autofs_dev_ioctl *param) +{ + param->arg1 = sbi->version; + return 0; +} + +/* Return autofs module protocol sub version */ +static inline int autofs_dev_ioctl_protosubver(struct file *fp, + struct autofs_sb_info *sbi, + struct autofs_dev_ioctl *param) +{ + param->arg1 = sbi->sub_version; + return 0; +} + +/* + * Walk down the mount stack looking for an autofs mount that + * has the requested device number (aka. new_encode_dev(sb->s_dev). + */ +static int autofs_dev_ioctl_find_super(struct nameidata *nd, dev_t devno) +{ + struct dentry *dentry; + struct inode *inode; + struct super_block *sb; + dev_t s_dev; + unsigned int err; + + err = -ENOENT; + + /* Lookup the dentry name at the base of our mount point */ + dentry = d_lookup(nd->path.dentry, &nd->last); + if (!dentry) + goto out; + + dput(nd->path.dentry); + nd->path.dentry = dentry; + + /* And follow the mount stack looking for our autofs mount */ + while (1) { + inode = nd->path.dentry->d_inode; + if (!inode) + continue; + + sb = inode->i_sb; + s_dev = new_encode_dev(sb->s_dev); + if (devno == s_dev) { + if (sb->s_magic == AUTOFS_SUPER_MAGIC) { + err = 0; + break; + } + } + + if (!follow_down(&nd->path.mnt, &nd->path.dentry)) + goto out; + } + +out: + return err; +} + +/* + * Install the file opened for autofs mount point control functions + * and set close on exec. + */ +static void autofs_dev_ioctl_fd_install(unsigned int fd, struct file *file) +{ + struct files_struct *files = current->files; + struct fdtable *fdt; + + spin_lock(&files->file_lock); + fdt = files_fdtable(files); + BUG_ON(fdt->fd[fd] != NULL); + rcu_assign_pointer(fdt->fd[fd], file); + FD_SET(fd, fdt->close_on_exec); + spin_unlock(&files->file_lock); +} + +/* + * Open a file descriptor on the autofs mount point corresponding + * to the given path and device number (aka. new_encode_dev(sb->s_dev)). + */ +static int autofs_dev_ioctl_open_mountpoint(const char *path, dev_t devid) +{ + struct file *filp; + struct nameidata nd; + int err, fd; + + fd = get_unused_fd(); + if (fd >= 0) { + /* Get nameidata of the parent directory */ + err = path_lookup(path, LOOKUP_PARENT, &nd); + if (err) + goto out; + + /* + * Search down, within the parent, looking for an + * autofs super block that has the device number + * corresponding to the autofs fs we want to open. + */ + err = autofs_dev_ioctl_find_super(&nd, devid); + if (err) + goto out_put; + + filp = dentry_open(nd.path.dentry, nd.path.mnt, O_RDONLY); + if (IS_ERR(filp)) { + err = PTR_ERR(filp); + goto out_put; + } + + autofs_dev_ioctl_fd_install(fd, filp); + } + + return fd; + +out_put: + path_put(&nd.path); +out: + put_unused_fd(fd); + return err; +} + +/* Open a file descriptor on an autofs mount point */ +static int autofs_dev_ioctl_openmount(struct file *fp, + struct autofs_sb_info *sbi, + struct autofs_dev_ioctl *param) +{ + const char *path; + dev_t devid; + int err, fd; + + /* param->path has already been checked */ + if (!param->arg1) + return -EINVAL; + + err = 0; + + devid = param->arg1; + path = param->path; + + fd = autofs_dev_ioctl_open_mountpoint(path, devid); + if (fd < 0) { + err = fd; + goto out; + } + + param->ioctlfd = fd; +out: + return err; +} + +/* Close file descriptor allocated above (user can also use close(2)). */ +static inline int autofs_dev_ioctl_closemount(struct file *fp, + struct autofs_sb_info *sbi, + struct autofs_dev_ioctl *param) +{ + return sys_close(param->ioctlfd); +} + +/* + * Send "ready" status for an existing wait (either a mount or an expire + * request). + */ +static inline int autofs_dev_ioctl_ready(struct file *fp, + struct autofs_sb_info *sbi, + struct autofs_dev_ioctl *param) +{ + autofs_wqt_t token; + + token = (autofs_wqt_t) param->arg1; + return autofs4_wait_release(sbi, token, 0); +} + +/* + * Send "fail" status for an existing wait (either a mount or an expire + * request). + */ +static inline int autofs_dev_ioctl_fail(struct file *fp, + struct autofs_sb_info *sbi, + struct autofs_dev_ioctl *param) +{ + autofs_wqt_t token; + int status; + + token = (autofs_wqt_t) param->arg1; + status = param->arg2 ? param->arg2 : -ENOENT; + return autofs4_wait_release(sbi, token, status); +} + +/* + * Set the pipe fd for kernel communication to the daemon. + * + * Normally this is set at mount using an option but if we + * are reconnecting to a busy mount then we need to use this + * to tell the autofs mount about the new kernel pipe fd. In + * order to protect mounts against incorrectly setting the + * pipefd we also require that the autofs mount be catatonic. + * + * This also sets the process group id used to identify the + * controlling process (eg. the owning automount(8) daemon). + */ +static int autofs_dev_ioctl_setpipefd(struct file *fp, + struct autofs_sb_info *sbi, + struct autofs_dev_ioctl *param) +{ + int pipefd; + int err = 0; + + if (param->arg1 == -1) + return -EINVAL; + + pipefd = param->arg1; + + if (!sbi->catatonic) + return -EBUSY; + else { + struct file *pipe = fget(pipefd); + if (!pipe->f_op || !pipe->f_op->write) { + err = -EPIPE; + fput(pipe); + goto out; + } + sbi->oz_pgrp = task_pgrp_nr(current); + sbi->pipefd = pipefd; + sbi->pipe = pipe; + sbi->catatonic = 0; + } +out: + return err; +} + +/* + * Make the autofs mount point catatonic, no longer responsive to + * mount requests. Also closes the kernel pipe file descriptor. + */ +static inline int autofs_dev_ioctl_catatonic(struct file *fp, + struct autofs_sb_info *sbi, + struct autofs_dev_ioctl *param) +{ + if (!sbi->catatonic) + autofs4_catatonic_mode(sbi); + return 0; +} + +/* Set the autofs mount timeout */ +static inline int autofs_dev_ioctl_timeout(struct file *fp, + struct autofs_sb_info *sbi, + struct autofs_dev_ioctl *param) +{ + unsigned long timeout; + + timeout = param->arg1; + param->arg1 = sbi->exp_timeout / HZ; + sbi->exp_timeout = timeout * HZ; + return 0; +} + +/* + * Return the uid and gid of the last request for the mount + * + * When reconstructing an autofs mount tree with active mounts + * we need to re-connect to mounts that may have used the original + * process uid and gid (or string variations of them) for mount + * lookups within the map entry. + */ +static inline int autofs_dev_ioctl_requestor(struct file *fp, + struct autofs_sb_info *sbi, + struct autofs_dev_ioctl *param) +{ + struct autofs_info *ino; + struct nameidata nd; + const char *path; + dev_t devid; + int err = -ENOENT; + + if (param->size <= sizeof(*param)) { + err = -EINVAL; + goto out; + } + + path = param->path; + devid = sbi->sb->s_dev; + + param->arg1 = param->arg2 = -1; + + /* Get nameidata of the parent directory */ + err = path_lookup(path, LOOKUP_PARENT, &nd); + if (err) + goto out; + + err = autofs_dev_ioctl_find_super(&nd, devid); + if (err) + goto out_release; + + ino = autofs4_dentry_ino(nd.path.dentry); + if (ino) { + err = 0; + param->arg1 = ino->uid; + param->arg2 = ino->gid; + } + +out_release: + path_put(&nd.path); +out: + return err; +} + +/* + * Call repeatedly until it returns -EAGAIN, meaning there's nothing + * more that can be done. + */ +static int autofs_dev_ioctl_expire(struct file *fp, + struct autofs_sb_info *sbi, + struct autofs_dev_ioctl *param) +{ + struct dentry *dentry; + struct vfsmount *mnt; + int err = -EAGAIN; + int when; + + when = param->arg1; + mnt = fp->f_path.mnt; + + if (sbi->type & AUTOFS_TYPE_DIRECT) + dentry = autofs4_expire_direct(sbi->sb, mnt, sbi, when); + else + dentry = autofs4_expire_indirect(sbi->sb, mnt, sbi, when); + + if (dentry) { + struct autofs_info *ino = autofs4_dentry_ino(dentry); + + /* + * This is synchronous because it makes the daemon a + * little easier + */ + ino->flags |= AUTOFS_INF_EXPIRING; + err = autofs4_wait(sbi, dentry, NFY_EXPIRE); + ino->flags &= ~AUTOFS_INF_EXPIRING; + dput(dentry); + } + + return err; +} + +/* Check if autofs mount point is in use */ +static inline int autofs_dev_ioctl_askumount(struct file *fp, + struct autofs_sb_info *sbi, + struct autofs_dev_ioctl *param) +{ + param->arg1 = 0; + if (may_umount(fp->f_path.mnt)) + param->arg1 = 1; + return 0; +} + +/* + * Check if the given path is a mountpoint. + * + * If we are supplied with the file descriptor of the autofs + * mount we're looking for a specific mount. In this case + * the path is considered a mountpoint if it is itself a + * mountpoint or contains a mount, such as a multi-mount + * without a root mount. + * + * If we aren't supplied with a file descriptor then we lookup + * the nameidata of the path and check if is a mounted autofs + * filesystem or a filesystem mounted within an autofs filesystem. + * + * In both cases we return an indication of whether the path + * is a mount point and the super magic of the covering mount. + */ +static inline int autofs_dev_ioctl_ismountpoint(struct file *fp, + struct autofs_sb_info *sbi, + struct autofs_dev_ioctl *param) +{ + struct nameidata nd; + const char *path; + int err = -ENOENT; + + if (param->size <= sizeof(*param)) { + err = -EINVAL; + goto out; + } + + path = param->path; + + param->arg1 = param->arg2 = 0; + + if (param->ioctlfd == -1) { + unsigned long magic; + + err = path_lookup(path, LOOKUP_FOLLOW, &nd); + if (err) + goto out; + + magic = nd.path.dentry->d_inode->i_sb->s_magic; + + if (follow_up(&nd.path.mnt, &nd.path.dentry)) { + struct inode *inode = nd.path.dentry->d_inode; + if (magic == AUTOFS_SUPER_MAGIC || + inode->i_sb->s_magic == AUTOFS_SUPER_MAGIC) { + param->arg1 = 1; + param->arg2 = magic; + } + } + } else { + dev_t devid = sbi->sb->s_dev; + + err = path_lookup(path, LOOKUP_PARENT, &nd); + if (err) + goto out; + + err = autofs_dev_ioctl_find_super(&nd, devid); + if (err) + goto out_release; + + param->arg1 = have_submounts(nd.path.dentry); + + if (d_mountpoint(nd.path.dentry)) { + if (follow_down(&nd.path.mnt, &nd.path.dentry)) { + struct inode *inode = nd.path.dentry->d_inode; + param->arg2 = inode->i_sb->s_magic; + } + } + } + +out_release: + path_put(&nd.path); +out: + return err; +} + +/* + * Our range of ioctl numbers isn't 0 based so we need to shift + * the array index by _IOC_NR(AUTOFS_CTL_IOC_FIRST) for the table + * lookup. + */ +#define cmd_idx(cmd) (cmd - _IOC_NR(AUTOFS_DEV_IOCTL_IOC_FIRST)) + +static ioctl_fn lookup_dev_ioctl(unsigned int cmd) +{ + static struct { + int cmd; + ioctl_fn fn; + } _ioctls[] = { + {cmd_idx(AUTOFS_DEV_IOCTL_VERSION_CMD), NULL}, + {cmd_idx(AUTOFS_DEV_IOCTL_PROTOVER_CMD), autofs_dev_ioctl_protover}, + {cmd_idx(AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD), autofs_dev_ioctl_protosubver}, + {cmd_idx(AUTOFS_DEV_IOCTL_OPENMOUNT_CMD), autofs_dev_ioctl_openmount}, + {cmd_idx(AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD), autofs_dev_ioctl_closemount}, + {cmd_idx(AUTOFS_DEV_IOCTL_READY_CMD), autofs_dev_ioctl_ready}, + {cmd_idx(AUTOFS_DEV_IOCTL_FAIL_CMD), autofs_dev_ioctl_fail}, + {cmd_idx(AUTOFS_DEV_IOCTL_SETPIPEFD_CMD), autofs_dev_ioctl_setpipefd}, + {cmd_idx(AUTOFS_DEV_IOCTL_CATATONIC_CMD), autofs_dev_ioctl_catatonic}, + {cmd_idx(AUTOFS_DEV_IOCTL_TIMEOUT_CMD), autofs_dev_ioctl_timeout}, + {cmd_idx(AUTOFS_DEV_IOCTL_REQUESTOR_CMD), autofs_dev_ioctl_requestor}, + {cmd_idx(AUTOFS_DEV_IOCTL_EXPIRE_CMD), autofs_dev_ioctl_expire}, + {cmd_idx(AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD), autofs_dev_ioctl_askumount}, + {cmd_idx(AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD), autofs_dev_ioctl_ismountpoint} + }; + int idx = cmd_idx(cmd); + + return (idx >= ARRAY_SIZE(_ioctls)) ? NULL : _ioctls[idx].fn; +} + +/* ioctl dispatcher */ +static int _autofs_dev_ioctl(unsigned int command, struct autofs_dev_ioctl __user *user) +{ + struct autofs_dev_ioctl *param; + struct file *fp; + struct autofs_sb_info *sbi; + unsigned int cmd; + ioctl_fn fn = NULL; + int err = 0; + + /* only root can play with this */ + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + cmd = _IOC_NR(command); + + if (_IOC_TYPE(command) != _IOC_TYPE(AUTOFS_DEV_IOCTL_IOC_FIRST) || + cmd - _IOC_NR(AUTOFS_DEV_IOCTL_IOC_FIRST) >= AUTOFS_DEV_IOCTL_IOC_COUNT) { + return -ENOTTY; + } + + fn = lookup_dev_ioctl(cmd); + if (!fn) { + WARN("unknown command 0x%08x", command); + return -ENOTTY; + } + + /* + * Trying to avoid low memory issues when a device is + * suspended. + */ + current->flags |= PF_MEMALLOC; + + /* Copy the parameters into kernel space. */ + param = copy_dev_ioctl(user); + + current->flags &= ~PF_MEMALLOC; + + if (IS_ERR(param)) + return PTR_ERR(param); + + err = validate_dev_ioctl(command, param); + if (err) + goto out; + + /* The validate routine above always sets the version */ + if (cmd == AUTOFS_DEV_IOCTL_VERSION_CMD) + goto done; + + fp = NULL; + sbi = NULL; + + /* + * For obvious reasons the openmount can't have a file + * descriptor yet. We don't take a reference to the + * file during close to allow for immediate release. + */ + if (cmd != AUTOFS_DEV_IOCTL_OPENMOUNT_CMD && + cmd != AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD) { + fp = fget(param->ioctlfd); + if (!fp) { + if (cmd == AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD) + goto cont; + err = -EBADF; + goto out; + } + + if (!fp->f_op) { + err = -ENOTTY; + fput(fp); + goto out; + } + + sbi = autofs_dev_ioctl_sbi(fp); + if (!sbi) { + err = -EINVAL; + fput(fp); + goto out; + } + } +cont: + err = fn(fp, sbi, param); + + if (fp) + fput(fp); +done: + if (!err && copy_to_user(user, param, AUTOFS_DEV_IOCTL_SIZE)) + err = -EFAULT; +out: + free_dev_ioctl(param); + return err; +} + +static long autofs_dev_ioctl(struct file *file, uint command, ulong u) +{ + return (long) _autofs_dev_ioctl(command, (struct autofs_dev_ioctl __user *) u); +} + +#ifdef CONFIG_COMPAT +static long autofs_dev_ioctl_compat(struct file *file, uint command, ulong u) +{ + return (long) autofs_dev_ioctl(file, command, (ulong) compat_ptr(u)); +} +#else +#define autofs_dev_ioctl_compat NULL +#endif + +static const struct file_operations _dev_ioctl_fops = { + .unlocked_ioctl = autofs_dev_ioctl, + .compat_ioctl = autofs_dev_ioctl_compat, + .owner = THIS_MODULE, +}; + +static struct miscdevice _autofs_dev_ioctl_misc = { + .minor = MISC_DYNAMIC_MINOR, + .name = AUTOFS_DEVICE_NAME, + .fops = &_dev_ioctl_fops +}; + +/* Register/deregister misc character device */ +int autofs_dev_ioctl_init(void) +{ + int r; + + r = misc_register(&_autofs_dev_ioctl_misc); + if (r) { + ERROR("misc_register failed for control device"); + return r; + } + + return 0; +} + +void autofs_dev_ioctl_exit(void) +{ + if (misc_deregister(&_autofs_dev_ioctl_misc) < 0) + ERROR("misc_deregister failed for control device"); + + return; +} + diff -up linux-2.6.25-rc2-mm1/include/linux/auto_fs4.h.device-node-ioctl linux-2.6.25-rc2-mm1/include/linux/auto_fs4.h --- linux-2.6.25-rc2-mm1/include/linux/auto_fs4.h.device-node-ioctl 2008-01-25 07:58:37.000000000 +0900 +++ linux-2.6.25-rc2-mm1/include/linux/auto_fs4.h 2008-02-22 11:51:41.000000000 +0900 @@ -23,7 +23,7 @@ #define AUTOFS_MIN_PROTO_VERSION 3 #define AUTOFS_MAX_PROTO_VERSION 5 -#define AUTOFS_PROTO_SUBVERSION 0 +#define AUTOFS_PROTO_SUBVERSION 1 /* Mask for expire behaviour */ #define AUTOFS_EXP_IMMEDIATE 1 diff -up /dev/null linux-2.6.25-rc2-mm1/include/linux/auto_dev-ioctl.h --- /dev/null 2008-02-22 18:55:57.149000956 +0900 +++ linux-2.6.25-rc2-mm1/include/linux/auto_dev-ioctl.h 2008-02-22 11:51:41.000000000 +0900 @@ -0,0 +1,114 @@ +/* + * linux/include/linux/auto_dev-ioctl.h + * + * Copyright 2008 Red Hat, Inc. All rights reserved. + * Copyright 2008 Ian Kent <raven@themaw.net> + * + * This file is part of the Linux kernel and is made available under + * the terms of the GNU General Public License, version 2, or at your + * option, any later version, incorporated herein by reference. + */ + +#ifndef _LINUX_AUTO_DEV_IOCTL_H +#define _LINUX_AUTO_DEV_IOCTL_H + +#include <linux/types.h> + +#define AUTOFS_DEV_IOCTL_VERSION_MAJOR 1 +#define AUTOFS_DEV_IOCTL_VERSION_MINOR 0 + +#define AUTOFS_DEVID_LEN 16 + +/* + * An ioctl interface for autofs mount point control. + */ + +/* + * All the ioctls use this structure. + * When sending a path size must account for the total length + * of the chunk of memory otherwise is is the size of the + * structure. + */ + +struct autofs_dev_ioctl { + __u32 ver_major; + __u32 ver_minor; + __u32 size; /* total size of data passed in + * including this struct */ + __s32 ioctlfd; /* automount command fd */ + + __u32 arg1; /* Command parameters */ + __u32 arg2; + + char path[0]; +}; + +static inline void init_autofs_dev_ioctl(struct autofs_dev_ioctl *in) +{ + in->ver_major = AUTOFS_DEV_IOCTL_VERSION_MAJOR; + in->ver_minor = AUTOFS_DEV_IOCTL_VERSION_MINOR; + in->size = sizeof(struct autofs_dev_ioctl); + in->ioctlfd = -1; + in->arg1 = 0; + in->arg2 = 0; + return; +} + +/* + * If you change this make sure you make the corresponding change + * to autofs-dev-ioctl.c:lookup_ioctl() + */ +enum { + /* Get various version info */ + AUTOFS_DEV_IOCTL_VERSION_CMD = 0x71, + AUTOFS_DEV_IOCTL_PROTOVER_CMD, + AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD, + + /* Open mount ioctl fd */ + AUTOFS_DEV_IOCTL_OPENMOUNT_CMD, + + /* Close mount ioctl fd */ + AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD, + + /* Mount/expire status returns */ + AUTOFS_DEV_IOCTL_READY_CMD, + AUTOFS_DEV_IOCTL_FAIL_CMD, + + /* Activate/deactivate autofs mount */ + AUTOFS_DEV_IOCTL_SETPIPEFD_CMD, + AUTOFS_DEV_IOCTL_CATATONIC_CMD, + + /* Expiry timeout */ + AUTOFS_DEV_IOCTL_TIMEOUT_CMD, + + /* Get mount last requesting uid and gid */ + AUTOFS_DEV_IOCTL_REQUESTOR_CMD, + + /* Check for eligible expire candidates */ + AUTOFS_DEV_IOCTL_EXPIRE_CMD, + + /* Request busy status */ + AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD, + + /* Check if path is a mountpoint */ + AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD, +}; + +#define AUTOFS_IOCTL 0x93 + +#define AUTOFS_DEV_IOCTL_VERSION _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_VERSION_CMD, struct autofs_dev_ioctl) +#define AUTOFS_DEV_IOCTL_PROTOVER _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_PROTOVER_CMD, struct autofs_dev_ioctl) +#define AUTOFS_DEV_IOCTL_PROTOSUBVER _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD, struct autofs_dev_ioctl) +#define AUTOFS_DEV_IOCTL_OPENMOUNT _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_OPENMOUNT_CMD, struct autofs_dev_ioctl) +#define AUTOFS_DEV_IOCTL_CLOSEMOUNT _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD, struct autofs_dev_ioctl) +#define AUTOFS_DEV_IOCTL_READY _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_READY_CMD, struct autofs_dev_ioctl) +#define AUTOFS_DEV_IOCTL_FAIL _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_FAIL_CMD, struct autofs_dev_ioctl) +#define AUTOFS_DEV_IOCTL_SETPIPEFD _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_SETPIPEFD_CMD, struct autofs_dev_ioctl) +#define AUTOFS_DEV_IOCTL_CATATONIC _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_CATATONIC_CMD, struct autofs_dev_ioctl) +#define AUTOFS_DEV_IOCTL_TIMEOUT _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_TIMEOUT_CMD, struct autofs_dev_ioctl) +#define AUTOFS_DEV_IOCTL_REQUESTOR _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_REQUESTOR_CMD, struct autofs_dev_ioctl) +#define AUTOFS_DEV_IOCTL_EXPIRE _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_EXPIRE_CMD, struct autofs_dev_ioctl) +#define AUTOFS_DEV_IOCTL_ASKUMOUNT _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD, struct autofs_dev_ioctl) +#define AUTOFS_DEV_IOCTL_ISMOUNTPOINT _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD, struct autofs_dev_ioctl) + +#endif /* _LINUX_AUTO_DEV_IOCTL_H */ ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 4/4] autofs4 - add miscelaneous device for ioctls 2008-02-26 3:23 ` [PATCH 4/4] autofs4 - add miscelaneous device for ioctls Ian Kent @ 2008-02-28 5:17 ` Andrew Morton 2008-02-28 6:18 ` Ian Kent 2008-02-29 16:24 ` Ian Kent 0 siblings, 2 replies; 40+ messages in thread From: Andrew Morton @ 2008-02-28 5:17 UTC (permalink / raw) To: Ian Kent Cc: Kernel Mailing List, autofs mailing list, linux-fsdevel, Christoph Hellwig, Al Viro On Tue, 26 Feb 2008 12:23:55 +0900 (WST) Ian Kent <raven@themaw.net> wrote: > Hi Andrew, > > Patch to add miscellaneous device to autofs4 module for > ioctls. Could you please document the new kernel interface which you're proposing? In Docmentation/ or in the changelog? We seem to be passing some string into a miscdevice ioctl and getting some results back. Be aware that this won't be a terribly popular proposal, so I'd suggest that you fully describe the problem which it's trying to solve, and how it solves it, and why the various alternatives (sysfs, netlink, mount options, etc) were judged unsuitable. > ... > > --- linux-2.6.25-rc2-mm1/fs/autofs4/init.c.device-node-ioctl 2008-01-25 07:58:37.000000000 +0900 > +++ linux-2.6.25-rc2-mm1/fs/autofs4/init.c 2008-02-22 11:51:41.000000000 +0900 > @@ -29,11 +29,20 @@ static struct file_system_type autofs_fs > > static int __init init_autofs4_fs(void) > { > - return register_filesystem(&autofs_fs_type); > + int err; > + > + err = register_filesystem(&autofs_fs_type); > + if (err) > + return err; > + > + err = autofs_dev_ioctl_init(); > + > + return err; > } We should run unregister_filesystem() if autofs_dev_ioctl_init() fails. > static void __exit exit_autofs4_fs(void) > { > + autofs_dev_ioctl_exit(); > unregister_filesystem(&autofs_fs_type); > } > > diff -up linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.device-node-ioctl linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h > --- linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.device-node-ioctl 2008-02-22 11:51:41.000000000 +0900 > +++ linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h 2008-02-22 11:51:41.000000000 +0900 > @@ -14,6 +14,7 @@ > /* Internal header file for autofs */ > > #include <linux/auto_fs4.h> > +#include <linux/auto_dev-ioctl.h> > #include <linux/mutex.h> > #include <linux/list.h> > > @@ -40,6 +41,16 @@ > #define DPRINTK(fmt,args...) do {} while(0) > #endif > > +#define WARN(fmt,args...) \ > +do { \ > + printk("KERN_WARNING pid %d: %s: " fmt "\n" , current->pid , __FUNCTION__ , ##args); \ > +} while(0) > + > +#define ERROR(fmt,args...) \ > +do { \ > + printk("KERN_ERR pid %d: %s: " fmt "\n" , current->pid , __FUNCTION__ , ##args); \ > +} while(0) Please always pass all diffs through scripts/checkpatch.pl. > /* Unified info structure. This is pointed to by both the dentry and > inode structures. Each file in the filesystem has an instance of this > structure. It holds a reference to the dentry, so dentries are never > @@ -172,6 +183,17 @@ int autofs4_expire_run(struct super_bloc > struct autofs_packet_expire __user *); > int autofs4_expire_multi(struct super_block *, struct vfsmount *, > struct autofs_sb_info *, int __user *); > +struct dentry *autofs4_expire_direct(struct super_block *sb, > + struct vfsmount *mnt, > + struct autofs_sb_info *sbi, int how); > +struct dentry *autofs4_expire_indirect(struct super_block *sb, > + struct vfsmount *mnt, > + struct autofs_sb_info *sbi, int how); > + > +/* Device node initialization */ > + > +int autofs_dev_ioctl_init(void); > +void autofs_dev_ioctl_exit(void); > > ... > > @@ -0,0 +1,788 @@ > +/* > + * linux/fs/autofs4/dev-ioctl.c We prefer not to bother with the filename-in-the-file thing. You know what file you're reading, and these things tend to not get updated across renames. > + * Copyright 2008 Red Hat, Inc. All rights reserved. > + * Copyright 2008 Ian Kent <raven@themaw.net> > + * > + * This file is part of the Linux kernel and is made available under > + * the terms of the GNU General Public License, version 2, or at your > + * option, any later version, incorporated herein by reference. > + */ > + > +#include <linux/module.h> > +#include <linux/vmalloc.h> > +#include <linux/miscdevice.h> > +#include <linux/init.h> > +#include <linux/wait.h> > +#include <linux/namei.h> > +#include <linux/fcntl.h> > +#include <linux/file.h> > +#include <linux/sched.h> > +#include <linux/compat.h> > +#include <linux/syscalls.h> > +#include <linux/smp_lock.h> > +#include <linux/magic.h> > +#include <linux/dcache.h> > +#include <asm/uaccess.h> Please include <linux/foo.h> rather than <asm/foo.h> if the former exists. > +#include "autofs_i.h" > + > +/* > + * This module implements an interface for routing autofs ioctl control > + * commands via a miscellaneous device file. > + * > + * The alternate interface is needed because we need to be able open > + * an ioctl file descriptor on an autofs mount that may be covered by > + * another mount. This situation arises when starting automount(8) > + * or other user space daemon which uses direct mounts or offset > + * mounts (used for autofs lazy mount/umount of nested mount trees), > + * which have been left busy at at service shutdown. > + */ > + > +#define AUTOFS_DEVICE_NAME "autofs" > + > +#define AUTOFS_DEV_IOCTL_IOC_FIRST AUTOFS_DEV_IOCTL_VERSION > +#define AUTOFS_DEV_IOCTL_IOC_COUNT AUTOFS_IOC_COUNT - 11 This needs parentheses. Shouldn't these be in a header file, exported to userspace builds? > +#define AUTOFS_DEV_IOCTL_SIZE sizeof(struct autofs_dev_ioctl) > + > +typedef int (*ioctl_fn)(struct file *, struct autofs_sb_info *, struct autofs_dev_ioctl *); > + > +static int check_name(const char *name) > +{ > + if (!strchr(name, '/')) > + return -EINVAL; > + return 0; > +} > + > +/* > + * Check a string doesn't overrun the chunk of > + * memory we copied from user land. > + */ > +static int invalid_str(char *str, void *end) > +{ > + while ((void *) str <= end) > + if (!*str++) > + return 0; > + return -EINVAL; > +} > > ... > > +/* > + * Get the autofs super block info struct from the file opened on > + * the autofs mount point. > + */ > +static struct autofs_sb_info *autofs_dev_ioctl_sbi(struct file *f) > +{ > + struct autofs_sb_info *sbi = NULL; > + struct inode *inode; > + > + if (f) { `f' cannot be NULL here. > + inode = f->f_path.dentry->d_inode; > + sbi = autofs4_sbi(inode->i_sb); > + } > + > + return sbi; > +} > + > +/* Return autofs module protocol version */ > +static inline int autofs_dev_ioctl_protover(struct file *fp, > + struct autofs_sb_info *sbi, > + struct autofs_dev_ioctl *param) > +{ > + param->arg1 = sbi->version; > + return 0; > +} > + > +/* Return autofs module protocol sub version */ > +static inline int autofs_dev_ioctl_protosubver(struct file *fp, > + struct autofs_sb_info *sbi, > + struct autofs_dev_ioctl *param) > +{ > + param->arg1 = sbi->sub_version; > + return 0; > +} Don't bother inlining things - the compiler will do it. > +/* > + * Walk down the mount stack looking for an autofs mount that > + * has the requested device number (aka. new_encode_dev(sb->s_dev). > + */ > +static int autofs_dev_ioctl_find_super(struct nameidata *nd, dev_t devno) > +{ > + struct dentry *dentry; > + struct inode *inode; > + struct super_block *sb; > + dev_t s_dev; > + unsigned int err; > + > + err = -ENOENT; > + > + /* Lookup the dentry name at the base of our mount point */ > + dentry = d_lookup(nd->path.dentry, &nd->last); > + if (!dentry) > + goto out; > + > + dput(nd->path.dentry); > + nd->path.dentry = dentry; > + > + /* And follow the mount stack looking for our autofs mount */ > + while (1) { > + inode = nd->path.dentry->d_inode; > + if (!inode) > + continue; > + > + sb = inode->i_sb; > + s_dev = new_encode_dev(sb->s_dev); > + if (devno == s_dev) { > + if (sb->s_magic == AUTOFS_SUPER_MAGIC) { > + err = 0; > + break; > + } > + } > + > + if (!follow_down(&nd->path.mnt, &nd->path.dentry)) > + goto out; > + } > + > +out: > + return err; > +} hm. possibly-interested parties cc'ed. > +/* > + * Install the file opened for autofs mount point control functions > + * and set close on exec. > + */ > +static void autofs_dev_ioctl_fd_install(unsigned int fd, struct file *file) > +{ > + struct files_struct *files = current->files; > + struct fdtable *fdt; > + > + spin_lock(&files->file_lock); > + fdt = files_fdtable(files); > + BUG_ON(fdt->fd[fd] != NULL); > + rcu_assign_pointer(fdt->fd[fd], file); > + FD_SET(fd, fdt->close_on_exec); > + spin_unlock(&files->file_lock); > +} That's fd_install() plus an add-on. It's not autofs-specific. Should be in fs/open.c, methinks? > > ... > > +/* Close file descriptor allocated above (user can also use close(2)). */ > +static inline int autofs_dev_ioctl_closemount(struct file *fp, > + struct autofs_sb_info *sbi, > + struct autofs_dev_ioctl *param) > +{ > + return sys_close(param->ioctlfd); > +} hm. > > ... > > +/* > + * Set the pipe fd for kernel communication to the daemon. > + * > + * Normally this is set at mount using an option but if we > + * are reconnecting to a busy mount then we need to use this > + * to tell the autofs mount about the new kernel pipe fd. In > + * order to protect mounts against incorrectly setting the > + * pipefd we also require that the autofs mount be catatonic. > + * > + * This also sets the process group id used to identify the > + * controlling process (eg. the owning automount(8) daemon). > + */ > +static int autofs_dev_ioctl_setpipefd(struct file *fp, > + struct autofs_sb_info *sbi, > + struct autofs_dev_ioctl *param) > +{ > + int pipefd; > + int err = 0; > + > + if (param->arg1 == -1) > + return -EINVAL; > + > + pipefd = param->arg1; > + > + if (!sbi->catatonic) > + return -EBUSY; > + else { > + struct file *pipe = fget(pipefd); > + if (!pipe->f_op || !pipe->f_op->write) { > + err = -EPIPE; > + fput(pipe); > + goto out; > + } > + sbi->oz_pgrp = task_pgrp_nr(current); > + sbi->pipefd = pipefd; > + sbi->pipe = pipe; > + sbi->catatonic = 0; > + } > +out: > + return err; > +} We have a new filesystem type, a misc device with a mysterious ioctl, hand-rolled mountpoint chasing, hand-rolled fd installation and now pipes too. This is a complex interface. We really need to see the overall problem statement, design and interface description, please. > +/* > + * Make the autofs mount point catatonic, no longer responsive to > + * mount requests. Also closes the kernel pipe file descriptor. > + */ > +static inline int autofs_dev_ioctl_catatonic(struct file *fp, > + struct autofs_sb_info *sbi, > + struct autofs_dev_ioctl *param) > +{ > + if (!sbi->catatonic) > + autofs4_catatonic_mode(sbi); > + return 0; > +} > + > +/* Set the autofs mount timeout */ > +static inline int autofs_dev_ioctl_timeout(struct file *fp, > + struct autofs_sb_info *sbi, > + struct autofs_dev_ioctl *param) > +{ > + unsigned long timeout; > + > + timeout = param->arg1; > + param->arg1 = sbi->exp_timeout / HZ; > + sbi->exp_timeout = timeout * HZ; > + return 0; > +} uninline everything... > +/* > + * Return the uid and gid of the last request for the mount > + * > + * When reconstructing an autofs mount tree with active mounts > + * we need to re-connect to mounts that may have used the original > + * process uid and gid (or string variations of them) for mount > + * lookups within the map entry. > + */ > +static inline int autofs_dev_ioctl_requestor(struct file *fp, > + struct autofs_sb_info *sbi, > + struct autofs_dev_ioctl *param) especially that - it's only ever called indirectly anwyay! > + struct autofs_info *ino; > + struct nameidata nd; > + const char *path; > + dev_t devid; > + int err = -ENOENT; > + > + if (param->size <= sizeof(*param)) { > + err = -EINVAL; > + goto out; > + } > + > + path = param->path; > + devid = sbi->sb->s_dev; > + > + param->arg1 = param->arg2 = -1; > + > + /* Get nameidata of the parent directory */ > + err = path_lookup(path, LOOKUP_PARENT, &nd); > + if (err) > + goto out; > + > + err = autofs_dev_ioctl_find_super(&nd, devid); > + if (err) > + goto out_release; > + > + ino = autofs4_dentry_ino(nd.path.dentry); > + if (ino) { > + err = 0; > + param->arg1 = ino->uid; > + param->arg2 = ino->gid; > + } > + > +out_release: > + path_put(&nd.path); > +out: > + return err; > +} > + > +/* > + * Call repeatedly until it returns -EAGAIN, meaning there's nothing > + * more that can be done. > + */ > +static int autofs_dev_ioctl_expire(struct file *fp, > + struct autofs_sb_info *sbi, > + struct autofs_dev_ioctl *param) > +{ > + struct dentry *dentry; > + struct vfsmount *mnt; > + int err = -EAGAIN; > + int when; > + > + when = param->arg1; > + mnt = fp->f_path.mnt; > + > + if (sbi->type & AUTOFS_TYPE_DIRECT) > + dentry = autofs4_expire_direct(sbi->sb, mnt, sbi, when); > + else > + dentry = autofs4_expire_indirect(sbi->sb, mnt, sbi, when); > + > + if (dentry) { > + struct autofs_info *ino = autofs4_dentry_ino(dentry); > + > + /* > + * This is synchronous because it makes the daemon a > + * little easier > + */ > + ino->flags |= AUTOFS_INF_EXPIRING; > + err = autofs4_wait(sbi, dentry, NFY_EXPIRE); > + ino->flags &= ~AUTOFS_INF_EXPIRING; Are there races around the modification of ino->flags here? > + dput(dentry); > + } > + > + return err; > +} > + > +/* Check if autofs mount point is in use */ > +static inline int autofs_dev_ioctl_askumount(struct file *fp, > + struct autofs_sb_info *sbi, > + struct autofs_dev_ioctl *param) > +{ > + param->arg1 = 0; > + if (may_umount(fp->f_path.mnt)) > + param->arg1 = 1; > + return 0; > +} > + > +/* > + * Check if the given path is a mountpoint. > + * > + * If we are supplied with the file descriptor of the autofs > + * mount we're looking for a specific mount. In this case > + * the path is considered a mountpoint if it is itself a > + * mountpoint or contains a mount, such as a multi-mount > + * without a root mount. > + * > + * If we aren't supplied with a file descriptor then we lookup > + * the nameidata of the path and check if is a mounted autofs > + * filesystem or a filesystem mounted within an autofs filesystem. > + * > + * In both cases we return an indication of whether the path > + * is a mount point and the super magic of the covering mount. > + */ > +static inline int autofs_dev_ioctl_ismountpoint(struct file *fp, > + struct autofs_sb_info *sbi, > + struct autofs_dev_ioctl *param) uninline.. > + struct nameidata nd; > + const char *path; > + int err = -ENOENT; > + > + if (param->size <= sizeof(*param)) { > + err = -EINVAL; > + goto out; > + } > + > + path = param->path; > + > + param->arg1 = param->arg2 = 0; > + > + if (param->ioctlfd == -1) { > + unsigned long magic; > + > + err = path_lookup(path, LOOKUP_FOLLOW, &nd); > + if (err) > + goto out; > + > + magic = nd.path.dentry->d_inode->i_sb->s_magic; > + > + if (follow_up(&nd.path.mnt, &nd.path.dentry)) { > + struct inode *inode = nd.path.dentry->d_inode; > + if (magic == AUTOFS_SUPER_MAGIC || > + inode->i_sb->s_magic == AUTOFS_SUPER_MAGIC) { > + param->arg1 = 1; > + param->arg2 = magic; > + } > + } > + } else { > + dev_t devid = sbi->sb->s_dev; > + > + err = path_lookup(path, LOOKUP_PARENT, &nd); > + if (err) > + goto out; > + > + err = autofs_dev_ioctl_find_super(&nd, devid); > + if (err) > + goto out_release; > + > + param->arg1 = have_submounts(nd.path.dentry); > + > + if (d_mountpoint(nd.path.dentry)) { > + if (follow_down(&nd.path.mnt, &nd.path.dentry)) { > + struct inode *inode = nd.path.dentry->d_inode; > + param->arg2 = inode->i_sb->s_magic; > + } > + } > + } > + > +out_release: > + path_put(&nd.path); > +out: > + return err; > +} Have you really carefully reviewed and tested what happens when non-autofs fds are fed into all the ioctl modes? I hope all these ioctl entrypoints are root-only. What determines that? The miscdevice permissions? > +/* > + * Our range of ioctl numbers isn't 0 based so we need to shift > + * the array index by _IOC_NR(AUTOFS_CTL_IOC_FIRST) for the table > + * lookup. > + */ > +#define cmd_idx(cmd) (cmd - _IOC_NR(AUTOFS_DEV_IOCTL_IOC_FIRST)) > + > +static ioctl_fn lookup_dev_ioctl(unsigned int cmd) > +{ > + static struct { > + int cmd; > + ioctl_fn fn; > + } _ioctls[] = { > + {cmd_idx(AUTOFS_DEV_IOCTL_VERSION_CMD), NULL}, > + {cmd_idx(AUTOFS_DEV_IOCTL_PROTOVER_CMD), autofs_dev_ioctl_protover}, > + {cmd_idx(AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD), autofs_dev_ioctl_protosubver}, > + {cmd_idx(AUTOFS_DEV_IOCTL_OPENMOUNT_CMD), autofs_dev_ioctl_openmount}, > + {cmd_idx(AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD), autofs_dev_ioctl_closemount}, > + {cmd_idx(AUTOFS_DEV_IOCTL_READY_CMD), autofs_dev_ioctl_ready}, > + {cmd_idx(AUTOFS_DEV_IOCTL_FAIL_CMD), autofs_dev_ioctl_fail}, > + {cmd_idx(AUTOFS_DEV_IOCTL_SETPIPEFD_CMD), autofs_dev_ioctl_setpipefd}, > + {cmd_idx(AUTOFS_DEV_IOCTL_CATATONIC_CMD), autofs_dev_ioctl_catatonic}, > + {cmd_idx(AUTOFS_DEV_IOCTL_TIMEOUT_CMD), autofs_dev_ioctl_timeout}, > + {cmd_idx(AUTOFS_DEV_IOCTL_REQUESTOR_CMD), autofs_dev_ioctl_requestor}, > + {cmd_idx(AUTOFS_DEV_IOCTL_EXPIRE_CMD), autofs_dev_ioctl_expire}, > + {cmd_idx(AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD), autofs_dev_ioctl_askumount}, > + {cmd_idx(AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD), autofs_dev_ioctl_ismountpoint} > + }; > + int idx = cmd_idx(cmd); `idx' should have unsigned type. > + return (idx >= ARRAY_SIZE(_ioctls)) ? NULL : _ioctls[idx].fn; > +} > + > +/* ioctl dispatcher */ > +static int _autofs_dev_ioctl(unsigned int command, struct autofs_dev_ioctl __user *user) > +{ > + struct autofs_dev_ioctl *param; > + struct file *fp; > + struct autofs_sb_info *sbi; > + unsigned int cmd; > + ioctl_fn fn = NULL; > + int err = 0; > + > + /* only root can play with this */ > + if (!capable(CAP_SYS_ADMIN)) > + return -EPERM; OK, I guess that answers my above question. > + cmd = _IOC_NR(command); > + > + if (_IOC_TYPE(command) != _IOC_TYPE(AUTOFS_DEV_IOCTL_IOC_FIRST) || > + cmd - _IOC_NR(AUTOFS_DEV_IOCTL_IOC_FIRST) >= AUTOFS_DEV_IOCTL_IOC_COUNT) { > + return -ENOTTY; > + } > + > + fn = lookup_dev_ioctl(cmd); > + if (!fn) { > + WARN("unknown command 0x%08x", command); > + return -ENOTTY; > + } > + > + /* > + * Trying to avoid low memory issues when a device is > + * suspended. > + */ > + current->flags |= PF_MEMALLOC; whoa, what's this doing here? > + /* Copy the parameters into kernel space. */ > + param = copy_dev_ioctl(user); > + > + current->flags &= ~PF_MEMALLOC; > + > + if (IS_ERR(param)) > + return PTR_ERR(param); > + > + err = validate_dev_ioctl(command, param); > + if (err) > + goto out; > + > + /* The validate routine above always sets the version */ > + if (cmd == AUTOFS_DEV_IOCTL_VERSION_CMD) > + goto done; > + > + fp = NULL; > + sbi = NULL; > + > + /* > + * For obvious reasons the openmount can't have a file > + * descriptor yet. We don't take a reference to the > + * file during close to allow for immediate release. > + */ > + if (cmd != AUTOFS_DEV_IOCTL_OPENMOUNT_CMD && > + cmd != AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD) { > + fp = fget(param->ioctlfd); > + if (!fp) { > + if (cmd == AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD) > + goto cont; > + err = -EBADF; > + goto out; > + } > + > + if (!fp->f_op) { > + err = -ENOTTY; > + fput(fp); > + goto out; > + } > + > + sbi = autofs_dev_ioctl_sbi(fp); > + if (!sbi) { > + err = -EINVAL; > + fput(fp); > + goto out; > + } > + } > +cont: > + err = fn(fp, sbi, param); > + > + if (fp) > + fput(fp); > +done: > + if (!err && copy_to_user(user, param, AUTOFS_DEV_IOCTL_SIZE)) > + err = -EFAULT; > +out: > + free_dev_ioctl(param); > + return err; > +} > + > > ... > > diff -up /dev/null linux-2.6.25-rc2-mm1/include/linux/auto_dev-ioctl.h > --- /dev/null 2008-02-22 18:55:57.149000956 +0900 > +++ linux-2.6.25-rc2-mm1/include/linux/auto_dev-ioctl.h 2008-02-22 11:51:41.000000000 +0900 > @@ -0,0 +1,114 @@ > +/* > + * linux/include/linux/auto_dev-ioctl.h > + * > + * Copyright 2008 Red Hat, Inc. All rights reserved. > + * Copyright 2008 Ian Kent <raven@themaw.net> > + * > + * This file is part of the Linux kernel and is made available under > + * the terms of the GNU General Public License, version 2, or at your > + * option, any later version, incorporated herein by reference. > + */ > + > +#ifndef _LINUX_AUTO_DEV_IOCTL_H > +#define _LINUX_AUTO_DEV_IOCTL_H > + > +#include <linux/types.h> > + > +#define AUTOFS_DEV_IOCTL_VERSION_MAJOR 1 > +#define AUTOFS_DEV_IOCTL_VERSION_MINOR 0 > + > +#define AUTOFS_DEVID_LEN 16 > + > +/* > + * An ioctl interface for autofs mount point control. > + */ > + > +/* > + * All the ioctls use this structure. > + * When sending a path size must account for the total length > + * of the chunk of memory otherwise is is the size of the > + * structure. > + */ > + > +struct autofs_dev_ioctl { > + __u32 ver_major; > + __u32 ver_minor; > + __u32 size; /* total size of data passed in > + * including this struct */ > + __s32 ioctlfd; /* automount command fd */ > + > + __u32 arg1; /* Command parameters */ > + __u32 arg2; > + > + char path[0]; > +}; > + > +static inline void init_autofs_dev_ioctl(struct autofs_dev_ioctl *in) > +{ > + in->ver_major = AUTOFS_DEV_IOCTL_VERSION_MAJOR; > + in->ver_minor = AUTOFS_DEV_IOCTL_VERSION_MINOR; > + in->size = sizeof(struct autofs_dev_ioctl); > + in->ioctlfd = -1; > + in->arg1 = 0; > + in->arg2 = 0; > + return; > +} uninline.. > +/* > + * If you change this make sure you make the corresponding change > + * to autofs-dev-ioctl.c:lookup_ioctl() Can we do this automatically via preprocessor tricks, or whatever? > + */ > +enum { > + /* Get various version info */ > + AUTOFS_DEV_IOCTL_VERSION_CMD = 0x71, > + AUTOFS_DEV_IOCTL_PROTOVER_CMD, > + AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD, > + > + /* Open mount ioctl fd */ > + AUTOFS_DEV_IOCTL_OPENMOUNT_CMD, > + > + /* Close mount ioctl fd */ > + AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD, > + > + /* Mount/expire status returns */ > + AUTOFS_DEV_IOCTL_READY_CMD, > + AUTOFS_DEV_IOCTL_FAIL_CMD, > + > + /* Activate/deactivate autofs mount */ > + AUTOFS_DEV_IOCTL_SETPIPEFD_CMD, > + AUTOFS_DEV_IOCTL_CATATONIC_CMD, > + > + /* Expiry timeout */ > + AUTOFS_DEV_IOCTL_TIMEOUT_CMD, > + > + /* Get mount last requesting uid and gid */ > + AUTOFS_DEV_IOCTL_REQUESTOR_CMD, > + > + /* Check for eligible expire candidates */ > + AUTOFS_DEV_IOCTL_EXPIRE_CMD, > + > + /* Request busy status */ > + AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD, > + > + /* Check if path is a mountpoint */ > + AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD, > +}; > + ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 4/4] autofs4 - add miscelaneous device for ioctls 2008-02-28 5:17 ` Andrew Morton @ 2008-02-28 6:18 ` Ian Kent 2008-03-13 7:00 ` [RFC] " Ian Kent 2008-02-29 16:24 ` Ian Kent 1 sibling, 1 reply; 40+ messages in thread From: Ian Kent @ 2008-02-28 6:18 UTC (permalink / raw) To: Andrew Morton Cc: Kernel Mailing List, autofs mailing list, linux-fsdevel, Christoph Hellwig, Al Viro On Wed, 2008-02-27 at 21:17 -0800, Andrew Morton wrote: > On Tue, 26 Feb 2008 12:23:55 +0900 (WST) Ian Kent <raven@themaw.net> wrote: > > > Hi Andrew, > > > > Patch to add miscellaneous device to autofs4 module for > > ioctls. > > Could you please document the new kernel interface which you're proposing? > In Docmentation/ or in the changelog? Also a good suggestion. I'll include that too. > > We seem to be passing some string into a miscdevice ioctl and getting some > results back. Be aware that this won't be a terribly popular proposal, so > I'd suggest that you fully describe the problem which it's trying to solve, > and how it solves it, and why the various alternatives (sysfs, netlink, > mount options, etc) were judged unsuitable. Yes, as I said above. I don't expect that people that aren't close to the development of autofs will "get" the problem description in the leading post but I will try and expand on it as best I can. As for the possible alternatives, it sounds like I have some more work to do on that. Mount options can't be used as I described in the lead in post and, as far as my understanding of sysfs goes, I don't think it's appropriate. But, I'm not aware of what the netlink interface may be able to do for me so I will need to check on that. > > > > ... > > > > --- linux-2.6.25-rc2-mm1/fs/autofs4/init.c.device-node-ioctl 2008-01-25 07:58:37.000000000 +0900 > > +++ linux-2.6.25-rc2-mm1/fs/autofs4/init.c 2008-02-22 11:51:41.000000000 +0900 > > @@ -29,11 +29,20 @@ static struct file_system_type autofs_fs > > > > static int __init init_autofs4_fs(void) > > { > > - return register_filesystem(&autofs_fs_type); > > + int err; > > + > > + err = register_filesystem(&autofs_fs_type); > > + if (err) > > + return err; > > + > > + err = autofs_dev_ioctl_init(); > > + > > + return err; > > } > > We should run unregister_filesystem() if autofs_dev_ioctl_init() fails. > > > static void __exit exit_autofs4_fs(void) > > { > > + autofs_dev_ioctl_exit(); > > unregister_filesystem(&autofs_fs_type); > > } > > > > diff -up linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.device-node-ioctl linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h > > --- linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h.device-node-ioctl 2008-02-22 11:51:41.000000000 +0900 > > +++ linux-2.6.25-rc2-mm1/fs/autofs4/autofs_i.h 2008-02-22 11:51:41.000000000 +0900 > > @@ -14,6 +14,7 @@ > > /* Internal header file for autofs */ > > > > #include <linux/auto_fs4.h> > > +#include <linux/auto_dev-ioctl.h> > > #include <linux/mutex.h> > > #include <linux/list.h> > > > > @@ -40,6 +41,16 @@ > > #define DPRINTK(fmt,args...) do {} while(0) > > #endif > > > > +#define WARN(fmt,args...) \ > > +do { \ > > + printk("KERN_WARNING pid %d: %s: " fmt "\n" , current->pid , __FUNCTION__ , ##args); \ > > +} while(0) > > + > > +#define ERROR(fmt,args...) \ > > +do { \ > > + printk("KERN_ERR pid %d: %s: " fmt "\n" , current->pid , __FUNCTION__ , ##args); \ > > +} while(0) > > Please always pass all diffs through scripts/checkpatch.pl. > > > /* Unified info structure. This is pointed to by both the dentry and > > inode structures. Each file in the filesystem has an instance of this > > structure. It holds a reference to the dentry, so dentries are never > > @@ -172,6 +183,17 @@ int autofs4_expire_run(struct super_bloc > > struct autofs_packet_expire __user *); > > int autofs4_expire_multi(struct super_block *, struct vfsmount *, > > struct autofs_sb_info *, int __user *); > > +struct dentry *autofs4_expire_direct(struct super_block *sb, > > + struct vfsmount *mnt, > > + struct autofs_sb_info *sbi, int how); > > +struct dentry *autofs4_expire_indirect(struct super_block *sb, > > + struct vfsmount *mnt, > > + struct autofs_sb_info *sbi, int how); > > + > > +/* Device node initialization */ > > + > > +int autofs_dev_ioctl_init(void); > > +void autofs_dev_ioctl_exit(void); > > > > ... > > > > @@ -0,0 +1,788 @@ > > +/* > > + * linux/fs/autofs4/dev-ioctl.c > > We prefer not to bother with the filename-in-the-file thing. You know what > file you're reading, and these things tend to not get updated across > renames. > > > + * Copyright 2008 Red Hat, Inc. All rights reserved. > > + * Copyright 2008 Ian Kent <raven@themaw.net> > > + * > > + * This file is part of the Linux kernel and is made available under > > + * the terms of the GNU General Public License, version 2, or at your > > + * option, any later version, incorporated herein by reference. > > + */ > > + > > +#include <linux/module.h> > > +#include <linux/vmalloc.h> > > +#include <linux/miscdevice.h> > > +#include <linux/init.h> > > +#include <linux/wait.h> > > +#include <linux/namei.h> > > +#include <linux/fcntl.h> > > +#include <linux/file.h> > > +#include <linux/sched.h> > > +#include <linux/compat.h> > > +#include <linux/syscalls.h> > > +#include <linux/smp_lock.h> > > +#include <linux/magic.h> > > +#include <linux/dcache.h> > > +#include <asm/uaccess.h> > > Please include <linux/foo.h> rather than <asm/foo.h> if the former exists. > > > +#include "autofs_i.h" > > + > > +/* > > + * This module implements an interface for routing autofs ioctl control > > + * commands via a miscellaneous device file. > > + * > > + * The alternate interface is needed because we need to be able open > > + * an ioctl file descriptor on an autofs mount that may be covered by > > + * another mount. This situation arises when starting automount(8) > > + * or other user space daemon which uses direct mounts or offset > > + * mounts (used for autofs lazy mount/umount of nested mount trees), > > + * which have been left busy at at service shutdown. > > + */ > > + > > +#define AUTOFS_DEVICE_NAME "autofs" > > + > > +#define AUTOFS_DEV_IOCTL_IOC_FIRST AUTOFS_DEV_IOCTL_VERSION > > +#define AUTOFS_DEV_IOCTL_IOC_COUNT AUTOFS_IOC_COUNT - 11 > > This needs parentheses. > > Shouldn't these be in a header file, exported to userspace builds? > > > +#define AUTOFS_DEV_IOCTL_SIZE sizeof(struct autofs_dev_ioctl) > > + > > +typedef int (*ioctl_fn)(struct file *, struct autofs_sb_info *, struct autofs_dev_ioctl *); > > + > > +static int check_name(const char *name) > > +{ > > + if (!strchr(name, '/')) > > + return -EINVAL; > > + return 0; > > +} > > + > > +/* > > + * Check a string doesn't overrun the chunk of > > + * memory we copied from user land. > > + */ > > +static int invalid_str(char *str, void *end) > > +{ > > + while ((void *) str <= end) > > + if (!*str++) > > + return 0; > > + return -EINVAL; > > +} > > > > ... > > > > +/* > > + * Get the autofs super block info struct from the file opened on > > + * the autofs mount point. > > + */ > > +static struct autofs_sb_info *autofs_dev_ioctl_sbi(struct file *f) > > +{ > > + struct autofs_sb_info *sbi = NULL; > > + struct inode *inode; > > + > > + if (f) { > > `f' cannot be NULL here. > > > + inode = f->f_path.dentry->d_inode; > > + sbi = autofs4_sbi(inode->i_sb); > > + } > > + > > + return sbi; > > +} > > + > > +/* Return autofs module protocol version */ > > +static inline int autofs_dev_ioctl_protover(struct file *fp, > > + struct autofs_sb_info *sbi, > > + struct autofs_dev_ioctl *param) > > +{ > > + param->arg1 = sbi->version; > > + return 0; > > +} > > + > > +/* Return autofs module protocol sub version */ > > +static inline int autofs_dev_ioctl_protosubver(struct file *fp, > > + struct autofs_sb_info *sbi, > > + struct autofs_dev_ioctl *param) > > +{ > > + param->arg1 = sbi->sub_version; > > + return 0; > > +} > > Don't bother inlining things - the compiler will do it. > > > +/* > > + * Walk down the mount stack looking for an autofs mount that > > + * has the requested device number (aka. new_encode_dev(sb->s_dev). > > + */ > > +static int autofs_dev_ioctl_find_super(struct nameidata *nd, dev_t devno) > > +{ > > + struct dentry *dentry; > > + struct inode *inode; > > + struct super_block *sb; > > + dev_t s_dev; > > + unsigned int err; > > + > > + err = -ENOENT; > > + > > + /* Lookup the dentry name at the base of our mount point */ > > + dentry = d_lookup(nd->path.dentry, &nd->last); > > + if (!dentry) > > + goto out; > > + > > + dput(nd->path.dentry); > > + nd->path.dentry = dentry; > > + > > + /* And follow the mount stack looking for our autofs mount */ > > + while (1) { > > + inode = nd->path.dentry->d_inode; > > + if (!inode) > > + continue; > > + > > + sb = inode->i_sb; > > + s_dev = new_encode_dev(sb->s_dev); > > + if (devno == s_dev) { > > + if (sb->s_magic == AUTOFS_SUPER_MAGIC) { > > + err = 0; > > + break; > > + } > > + } > > + > > + if (!follow_down(&nd->path.mnt, &nd->path.dentry)) > > + goto out; > > + } > > + > > +out: > > + return err; > > +} > > hm. possibly-interested parties cc'ed. > > > +/* > > + * Install the file opened for autofs mount point control functions > > + * and set close on exec. > > + */ > > +static void autofs_dev_ioctl_fd_install(unsigned int fd, struct file *file) > > +{ > > + struct files_struct *files = current->files; > > + struct fdtable *fdt; > > + > > + spin_lock(&files->file_lock); > > + fdt = files_fdtable(files); > > + BUG_ON(fdt->fd[fd] != NULL); > > + rcu_assign_pointer(fdt->fd[fd], file); > > + FD_SET(fd, fdt->close_on_exec); > > + spin_unlock(&files->file_lock); > > +} > > That's fd_install() plus an add-on. It's not autofs-specific. Should be > in fs/open.c, methinks? > > > > > ... > > > > +/* Close file descriptor allocated above (user can also use close(2)). */ > > +static inline int autofs_dev_ioctl_closemount(struct file *fp, > > + struct autofs_sb_info *sbi, > > + struct autofs_dev_ioctl *param) > > +{ > > + return sys_close(param->ioctlfd); > > +} > > hm. > > > > > ... > > > > +/* > > + * Set the pipe fd for kernel communication to the daemon. > > + * > > + * Normally this is set at mount using an option but if we > > + * are reconnecting to a busy mount then we need to use this > > + * to tell the autofs mount about the new kernel pipe fd. In > > + * order to protect mounts against incorrectly setting the > > + * pipefd we also require that the autofs mount be catatonic. > > + * > > + * This also sets the process group id used to identify the > > + * controlling process (eg. the owning automount(8) daemon). > > + */ > > +static int autofs_dev_ioctl_setpipefd(struct file *fp, > > + struct autofs_sb_info *sbi, > > + struct autofs_dev_ioctl *param) > > +{ > > + int pipefd; > > + int err = 0; > > + > > + if (param->arg1 == -1) > > + return -EINVAL; > > + > > + pipefd = param->arg1; > > + > > + if (!sbi->catatonic) > > + return -EBUSY; > > + else { > > + struct file *pipe = fget(pipefd); > > + if (!pipe->f_op || !pipe->f_op->write) { > > + err = -EPIPE; > > + fput(pipe); > > + goto out; > > + } > > + sbi->oz_pgrp = task_pgrp_nr(current); > > + sbi->pipefd = pipefd; > > + sbi->pipe = pipe; > > + sbi->catatonic = 0; > > + } > > +out: > > + return err; > > +} > > We have a new filesystem type, a misc device with a mysterious ioctl, > hand-rolled mountpoint chasing, hand-rolled fd installation and now pipes > too. > > This is a complex interface. We really need to see the overall problem > statement, design and interface description, please. > > > +/* > > + * Make the autofs mount point catatonic, no longer responsive to > > + * mount requests. Also closes the kernel pipe file descriptor. > > + */ > > +static inline int autofs_dev_ioctl_catatonic(struct file *fp, > > + struct autofs_sb_info *sbi, > > + struct autofs_dev_ioctl *param) > > +{ > > + if (!sbi->catatonic) > > + autofs4_catatonic_mode(sbi); > > + return 0; > > +} > > + > > +/* Set the autofs mount timeout */ > > +static inline int autofs_dev_ioctl_timeout(struct file *fp, > > + struct autofs_sb_info *sbi, > > + struct autofs_dev_ioctl *param) > > +{ > > + unsigned long timeout; > > + > > + timeout = param->arg1; > > + param->arg1 = sbi->exp_timeout / HZ; > > + sbi->exp_timeout = timeout * HZ; > > + return 0; > > +} > > uninline everything... > > > +/* > > + * Return the uid and gid of the last request for the mount > > + * > > + * When reconstructing an autofs mount tree with active mounts > > + * we need to re-connect to mounts that may have used the original > > + * process uid and gid (or string variations of them) for mount > > + * lookups within the map entry. > > + */ > > +static inline int autofs_dev_ioctl_requestor(struct file *fp, > > + struct autofs_sb_info *sbi, > > + struct autofs_dev_ioctl *param) > > especially that - it's only ever called indirectly anwyay! > > > + struct autofs_info *ino; > > + struct nameidata nd; > > + const char *path; > > + dev_t devid; > > + int err = -ENOENT; > > + > > + if (param->size <= sizeof(*param)) { > > + err = -EINVAL; > > + goto out; > > + } > > + > > + path = param->path; > > + devid = sbi->sb->s_dev; > > + > > + param->arg1 = param->arg2 = -1; > > + > > + /* Get nameidata of the parent directory */ > > + err = path_lookup(path, LOOKUP_PARENT, &nd); > > + if (err) > > + goto out; > > + > > + err = autofs_dev_ioctl_find_super(&nd, devid); > > + if (err) > > + goto out_release; > > + > > + ino = autofs4_dentry_ino(nd.path.dentry); > > + if (ino) { > > + err = 0; > > + param->arg1 = ino->uid; > > + param->arg2 = ino->gid; > > + } > > + > > +out_release: > > + path_put(&nd.path); > > +out: > > + return err; > > +} > > + > > +/* > > + * Call repeatedly until it returns -EAGAIN, meaning there's nothing > > + * more that can be done. > > + */ > > +static int autofs_dev_ioctl_expire(struct file *fp, > > + struct autofs_sb_info *sbi, > > + struct autofs_dev_ioctl *param) > > +{ > > + struct dentry *dentry; > > + struct vfsmount *mnt; > > + int err = -EAGAIN; > > + int when; > > + > > + when = param->arg1; > > + mnt = fp->f_path.mnt; > > + > > + if (sbi->type & AUTOFS_TYPE_DIRECT) > > + dentry = autofs4_expire_direct(sbi->sb, mnt, sbi, when); > > + else > > + dentry = autofs4_expire_indirect(sbi->sb, mnt, sbi, when); > > + > > + if (dentry) { > > + struct autofs_info *ino = autofs4_dentry_ino(dentry); > > + > > + /* > > + * This is synchronous because it makes the daemon a > > + * little easier > > + */ > > + ino->flags |= AUTOFS_INF_EXPIRING; > > + err = autofs4_wait(sbi, dentry, NFY_EXPIRE); > > + ino->flags &= ~AUTOFS_INF_EXPIRING; > > Are there races around the modification of ino->flags here? > > > + dput(dentry); > > + } > > + > > + return err; > > +} > > + > > +/* Check if autofs mount point is in use */ > > +static inline int autofs_dev_ioctl_askumount(struct file *fp, > > + struct autofs_sb_info *sbi, > > + struct autofs_dev_ioctl *param) > > +{ > > + param->arg1 = 0; > > + if (may_umount(fp->f_path.mnt)) > > + param->arg1 = 1; > > + return 0; > > +} > > + > > +/* > > + * Check if the given path is a mountpoint. > > + * > > + * If we are supplied with the file descriptor of the autofs > > + * mount we're looking for a specific mount. In this case > > + * the path is considered a mountpoint if it is itself a > > + * mountpoint or contains a mount, such as a multi-mount > > + * without a root mount. > > + * > > + * If we aren't supplied with a file descriptor then we lookup > > + * the nameidata of the path and check if is a mounted autofs > > + * filesystem or a filesystem mounted within an autofs filesystem. > > + * > > + * In both cases we return an indication of whether the path > > + * is a mount point and the super magic of the covering mount. > > + */ > > +static inline int autofs_dev_ioctl_ismountpoint(struct file *fp, > > + struct autofs_sb_info *sbi, > > + struct autofs_dev_ioctl *param) > > uninline.. > > > + struct nameidata nd; > > + const char *path; > > + int err = -ENOENT; > > + > > + if (param->size <= sizeof(*param)) { > > + err = -EINVAL; > > + goto out; > > + } > > + > > + path = param->path; > > + > > + param->arg1 = param->arg2 = 0; > > + > > + if (param->ioctlfd == -1) { > > + unsigned long magic; > > + > > + err = path_lookup(path, LOOKUP_FOLLOW, &nd); > > + if (err) > > + goto out; > > + > > + magic = nd.path.dentry->d_inode->i_sb->s_magic; > > + > > + if (follow_up(&nd.path.mnt, &nd.path.dentry)) { > > + struct inode *inode = nd.path.dentry->d_inode; > > + if (magic == AUTOFS_SUPER_MAGIC || > > + inode->i_sb->s_magic == AUTOFS_SUPER_MAGIC) { > > + param->arg1 = 1; > > + param->arg2 = magic; > > + } > > + } > > + } else { > > + dev_t devid = sbi->sb->s_dev; > > + > > + err = path_lookup(path, LOOKUP_PARENT, &nd); > > + if (err) > > + goto out; > > + > > + err = autofs_dev_ioctl_find_super(&nd, devid); > > + if (err) > > + goto out_release; > > + > > + param->arg1 = have_submounts(nd.path.dentry); > > + > > + if (d_mountpoint(nd.path.dentry)) { > > + if (follow_down(&nd.path.mnt, &nd.path.dentry)) { > > + struct inode *inode = nd.path.dentry->d_inode; > > + param->arg2 = inode->i_sb->s_magic; > > + } > > + } > > + } > > + > > +out_release: > > + path_put(&nd.path); > > +out: > > + return err; > > +} > > Have you really carefully reviewed and tested what happens when non-autofs > fds are fed into all the ioctl modes? > > I hope all these ioctl entrypoints are root-only. What determines that? > The miscdevice permissions? > > > +/* > > + * Our range of ioctl numbers isn't 0 based so we need to shift > > + * the array index by _IOC_NR(AUTOFS_CTL_IOC_FIRST) for the table > > + * lookup. > > + */ > > +#define cmd_idx(cmd) (cmd - _IOC_NR(AUTOFS_DEV_IOCTL_IOC_FIRST)) > > + > > +static ioctl_fn lookup_dev_ioctl(unsigned int cmd) > > +{ > > + static struct { > > + int cmd; > > + ioctl_fn fn; > > + } _ioctls[] = { > > + {cmd_idx(AUTOFS_DEV_IOCTL_VERSION_CMD), NULL}, > > + {cmd_idx(AUTOFS_DEV_IOCTL_PROTOVER_CMD), autofs_dev_ioctl_protover}, > > + {cmd_idx(AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD), autofs_dev_ioctl_protosubver}, > > + {cmd_idx(AUTOFS_DEV_IOCTL_OPENMOUNT_CMD), autofs_dev_ioctl_openmount}, > > + {cmd_idx(AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD), autofs_dev_ioctl_closemount}, > > + {cmd_idx(AUTOFS_DEV_IOCTL_READY_CMD), autofs_dev_ioctl_ready}, > > + {cmd_idx(AUTOFS_DEV_IOCTL_FAIL_CMD), autofs_dev_ioctl_fail}, > > + {cmd_idx(AUTOFS_DEV_IOCTL_SETPIPEFD_CMD), autofs_dev_ioctl_setpipefd}, > > + {cmd_idx(AUTOFS_DEV_IOCTL_CATATONIC_CMD), autofs_dev_ioctl_catatonic}, > > + {cmd_idx(AUTOFS_DEV_IOCTL_TIMEOUT_CMD), autofs_dev_ioctl_timeout}, > > + {cmd_idx(AUTOFS_DEV_IOCTL_REQUESTOR_CMD), autofs_dev_ioctl_requestor}, > > + {cmd_idx(AUTOFS_DEV_IOCTL_EXPIRE_CMD), autofs_dev_ioctl_expire}, > > + {cmd_idx(AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD), autofs_dev_ioctl_askumount}, > > + {cmd_idx(AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD), autofs_dev_ioctl_ismountpoint} > > + }; > > + int idx = cmd_idx(cmd); > > `idx' should have unsigned type. > > > + return (idx >= ARRAY_SIZE(_ioctls)) ? NULL : _ioctls[idx].fn; > > +} > > + > > +/* ioctl dispatcher */ > > +static int _autofs_dev_ioctl(unsigned int command, struct autofs_dev_ioctl __user *user) > > +{ > > + struct autofs_dev_ioctl *param; > > + struct file *fp; > > + struct autofs_sb_info *sbi; > > + unsigned int cmd; > > + ioctl_fn fn = NULL; > > + int err = 0; > > + > > + /* only root can play with this */ > > + if (!capable(CAP_SYS_ADMIN)) > > + return -EPERM; > > OK, I guess that answers my above question. > > > + cmd = _IOC_NR(command); > > + > > + if (_IOC_TYPE(command) != _IOC_TYPE(AUTOFS_DEV_IOCTL_IOC_FIRST) || > > + cmd - _IOC_NR(AUTOFS_DEV_IOCTL_IOC_FIRST) >= AUTOFS_DEV_IOCTL_IOC_COUNT) { > > + return -ENOTTY; > > + } > > + > > + fn = lookup_dev_ioctl(cmd); > > + if (!fn) { > > + WARN("unknown command 0x%08x", command); > > + return -ENOTTY; > > + } > > + > > + /* > > + * Trying to avoid low memory issues when a device is > > + * suspended. > > + */ > > + current->flags |= PF_MEMALLOC; > > whoa, what's this doing here? > > > + /* Copy the parameters into kernel space. */ > > + param = copy_dev_ioctl(user); > > + > > + current->flags &= ~PF_MEMALLOC; > > + > > + if (IS_ERR(param)) > > + return PTR_ERR(param); > > + > > + err = validate_dev_ioctl(command, param); > > + if (err) > > + goto out; > > + > > + /* The validate routine above always sets the version */ > > + if (cmd == AUTOFS_DEV_IOCTL_VERSION_CMD) > > + goto done; > > + > > + fp = NULL; > > + sbi = NULL; > > + > > + /* > > + * For obvious reasons the openmount can't have a file > > + * descriptor yet. We don't take a reference to the > > + * file during close to allow for immediate release. > > + */ > > + if (cmd != AUTOFS_DEV_IOCTL_OPENMOUNT_CMD && > > + cmd != AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD) { > > + fp = fget(param->ioctlfd); > > + if (!fp) { > > + if (cmd == AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD) > > + goto cont; > > + err = -EBADF; > > + goto out; > > + } > > + > > + if (!fp->f_op) { > > + err = -ENOTTY; > > + fput(fp); > > + goto out; > > + } > > + > > + sbi = autofs_dev_ioctl_sbi(fp); > > + if (!sbi) { > > + err = -EINVAL; > > + fput(fp); > > + goto out; > > + } > > + } > > +cont: > > + err = fn(fp, sbi, param); > > + > > + if (fp) > > + fput(fp); > > +done: > > + if (!err && copy_to_user(user, param, AUTOFS_DEV_IOCTL_SIZE)) > > + err = -EFAULT; > > +out: > > + free_dev_ioctl(param); > > + return err; > > +} > > + > > > > ... > > > > diff -up /dev/null linux-2.6.25-rc2-mm1/include/linux/auto_dev-ioctl.h > > --- /dev/null 2008-02-22 18:55:57.149000956 +0900 > > +++ linux-2.6.25-rc2-mm1/include/linux/auto_dev-ioctl.h 2008-02-22 11:51:41.000000000 +0900 > > @@ -0,0 +1,114 @@ > > +/* > > + * linux/include/linux/auto_dev-ioctl.h > > + * > > + * Copyright 2008 Red Hat, Inc. All rights reserved. > > + * Copyright 2008 Ian Kent <raven@themaw.net> > > + * > > + * This file is part of the Linux kernel and is made available under > > + * the terms of the GNU General Public License, version 2, or at your > > + * option, any later version, incorporated herein by reference. > > + */ > > + > > +#ifndef _LINUX_AUTO_DEV_IOCTL_H > > +#define _LINUX_AUTO_DEV_IOCTL_H > > + > > +#include <linux/types.h> > > + > > +#define AUTOFS_DEV_IOCTL_VERSION_MAJOR 1 > > +#define AUTOFS_DEV_IOCTL_VERSION_MINOR 0 > > + > > +#define AUTOFS_DEVID_LEN 16 > > + > > +/* > > + * An ioctl interface for autofs mount point control. > > + */ > > + > > +/* > > + * All the ioctls use this structure. > > + * When sending a path size must account for the total length > > + * of the chunk of memory otherwise is is the size of the > > + * structure. > > + */ > > + > > +struct autofs_dev_ioctl { > > + __u32 ver_major; > > + __u32 ver_minor; > > + __u32 size; /* total size of data passed in > > + * including this struct */ > > + __s32 ioctlfd; /* automount command fd */ > > + > > + __u32 arg1; /* Command parameters */ > > + __u32 arg2; > > + > > + char path[0]; > > +}; > > + > > +static inline void init_autofs_dev_ioctl(struct autofs_dev_ioctl *in) > > +{ > > + in->ver_major = AUTOFS_DEV_IOCTL_VERSION_MAJOR; > > + in->ver_minor = AUTOFS_DEV_IOCTL_VERSION_MINOR; > > + in->size = sizeof(struct autofs_dev_ioctl); > > + in->ioctlfd = -1; > > + in->arg1 = 0; > > + in->arg2 = 0; > > + return; > > +} > > uninline.. > > > +/* > > + * If you change this make sure you make the corresponding change > > + * to autofs-dev-ioctl.c:lookup_ioctl() > > Can we do this automatically via preprocessor tricks, or whatever? > > > + */ > > +enum { > > + /* Get various version info */ > > + AUTOFS_DEV_IOCTL_VERSION_CMD = 0x71, > > + AUTOFS_DEV_IOCTL_PROTOVER_CMD, > > + AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD, > > + > > + /* Open mount ioctl fd */ > > + AUTOFS_DEV_IOCTL_OPENMOUNT_CMD, > > + > > + /* Close mount ioctl fd */ > > + AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD, > > + > > + /* Mount/expire status returns */ > > + AUTOFS_DEV_IOCTL_READY_CMD, > > + AUTOFS_DEV_IOCTL_FAIL_CMD, > > + > > + /* Activate/deactivate autofs mount */ > > + AUTOFS_DEV_IOCTL_SETPIPEFD_CMD, > > + AUTOFS_DEV_IOCTL_CATATONIC_CMD, > > + > > + /* Expiry timeout */ > > + AUTOFS_DEV_IOCTL_TIMEOUT_CMD, > > + > > + /* Get mount last requesting uid and gid */ > > + AUTOFS_DEV_IOCTL_REQUESTOR_CMD, > > + > > + /* Check for eligible expire candidates */ > > + AUTOFS_DEV_IOCTL_EXPIRE_CMD, > > + > > + /* Request busy status */ > > + AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD, > > + > > + /* Check if path is a mountpoint */ > > + AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD, > > +}; > > + > ^ permalink raw reply [flat|nested] 40+ messages in thread
* [RFC] Re: [PATCH 4/4] autofs4 - add miscelaneous device for ioctls 2008-02-28 6:18 ` Ian Kent @ 2008-03-13 7:00 ` Ian Kent 2008-03-14 2:45 ` Ian Kent 2008-03-14 12:45 ` Thomas Graf 0 siblings, 2 replies; 40+ messages in thread From: Ian Kent @ 2008-03-13 7:00 UTC (permalink / raw) To: Kernel Mailing List Cc: Andrew Morton, autofs mailing list, linux-fsdevel, Christoph Hellwig, Al Viro On Thu, 28 Feb 2008, Ian Kent wrote: > > > > We seem to be passing some string into a misc-device ioctl and getting some > > results back. Be aware that this won't be a terribly popular proposal, so > > I'd suggest that you fully describe the problem which it's trying to solve, > > and how it solves it, and why the various alternatives (sysfs, netlink, > > mount options, etc) were judged unsuitable. > > Yes, as I said above. > > I don't expect that people that aren't close to the development of > autofs will "get" the problem description in the leading post but I will > try and expand on it as best I can. > > As for the possible alternatives, it sounds like I have some more work > to do on that. Mount options can't be used as I described in the lead in > post and, as far as my understanding of sysfs goes, I don't think it's > appropriate. But, I'm not aware of what the netlink interface may be > able to do for me so I will need to check on that. > I've attempted an implementation of this using the Generic Netlink interface and I've struck a difficulty. I would like to use current recommended development policy but I'm not sure how far I should go with this in order to avoid using an ioctl implementation so I'm after some advice and suggestions. So, onto the description. Sorry about the length of the post. Autofs user space uses a number of ioctls for mount control. I have a problem that can only be resolved by adding an additional control function that doesn't need to open the autofs mount point path directly to issue ioctl commands. So I decided to re-implement the the control interface, hopefully, in a cleaner way and solve a couple of other problems at the same time (by adding two additional control functions giving a total of three new functions) and improving one of the existing functions. My initial proposal used a miscellaneous device to route ioctl commands to autofs mounts and the question of why current recommended alternatives were not suitable was asked. The only alternative that may be suitable is the Netlink interface. I won't go into the details of the new functions now but focus on the difficulty I have found implementing one of the existing functions using the Netlink interface. There are some restrictions on the scope of the change. The scope is only the ioctl interface. I don't want change too many things at once, in particular things that are currently working OK. And I'd like to retain the existing semantic behavior of the interface. The function that is a problem is the sending of expire requests. In the current implementation this function is synchronous. An ioctl is used to ask the kernel module (autofs4) to check for mounts that can be expired and, if a candidate is found the module sends a request to the user space daemon asking it to try and umount the select mount after which the daemon sends a success or fail status back to the module which marks the completion of the original ioctl expire request. The Generic Netlink interface won't allow this because only one concurrent send request can be active for "all" Generic Netlink Families in use, since the socket receive function is bracketed by a single mutex. So I would need to use a workqueue to queue the request but that has it's own set of problems. A workqueue can have only one active request going at a time but I may have a number of concurrent expires happening at any one time and the request could block for a significant amount of time so I would have to use multiple queues. Also there isn't a one-to-one correspondence between autofs super blocks and the stream of expire requests so the number of needed workqueues isn't known. Hence, a workqueue would need to be created, used and destroyed for every umount request or some other elaborate mechanism developed to re-use workqueues. Ideas? The next issue is that in order to keep track of multiple in flight requests a separate Netlink socket would need to be opened for every expire request in order to ensure that the Netlink completion reply makes it back to the original requesting thread (Is that actually correct?). Not really such a big problem but it defeats another aim of the re-implementation, which is to reduce the selinux user space exposure to file descriptors that are open but don't yet have close-on-exec flag set when a mount or umount is spawned by the automount daemon. This can obviously be resolved by adding a mutex around the fork/exec code but isn't a popular idea due to added performance overhead. That's about it for a Generic Netlink solution. It looks like it may be possible to avoid the need to use workqueues by adding a new netlink protocol. I didn't notice any mutual exclusion issues in the code but I may not have looked far enough (comments please?). This approach would still have the issue of needing a new socket opened for each expire for the tracking multiple requests and a considerable increase in complexity in the autofs4 module for netlink communication. I'm concerned about the effort I would need to devote to make this work and the increase in complexity that may be needed, especially if the implementation is ultimately not satisfactory for inclusion in the kernel. Please help! Ian ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC] Re: [PATCH 4/4] autofs4 - add miscelaneous device for ioctls 2008-03-13 7:00 ` [RFC] " Ian Kent @ 2008-03-14 2:45 ` Ian Kent 2008-03-14 12:45 ` Thomas Graf 1 sibling, 0 replies; 40+ messages in thread From: Ian Kent @ 2008-03-14 2:45 UTC (permalink / raw) To: Thomas Graf Cc: Kernel Mailing List, Andrew Morton, autofs mailing list, linux-fsdevel On Thu, 13 Mar 2008, Ian Kent wrote: Thomas, could you comment on the Netlink related questions I have posed blow please. > On Thu, 28 Feb 2008, Ian Kent wrote: > > > > > > > We seem to be passing some string into a misc-device ioctl and getting some > > > results back. Be aware that this won't be a terribly popular proposal, so > > > I'd suggest that you fully describe the problem which it's trying to solve, > > > and how it solves it, and why the various alternatives (sysfs, netlink, > > > mount options, etc) were judged unsuitable. > > > > Yes, as I said above. > > > > I don't expect that people that aren't close to the development of > > autofs will "get" the problem description in the leading post but I will > > try and expand on it as best I can. > > > > As for the possible alternatives, it sounds like I have some more work > > to do on that. Mount options can't be used as I described in the lead in > > post and, as far as my understanding of sysfs goes, I don't think it's > > appropriate. But, I'm not aware of what the netlink interface may be > > able to do for me so I will need to check on that. > > > > I've attempted an implementation of this using the Generic Netlink > interface and I've struck a difficulty. I would like to use current > recommended development policy but I'm not sure how far I should go with > this in order to avoid using an ioctl implementation so I'm after some > advice and suggestions. > > So, onto the description. > Sorry about the length of the post. > > Autofs user space uses a number of ioctls for mount control. > > I have a problem that can only be resolved by adding an additional control > function that doesn't need to open the autofs mount point path directly > to issue ioctl commands. So I decided to re-implement the the control > interface, hopefully, in a cleaner way and solve a couple of other > problems at the same time (by adding two additional control functions > giving a total of three new functions) and improving one of the existing > functions. > > My initial proposal used a miscellaneous device to route ioctl commands to > autofs mounts and the question of why current recommended alternatives > were not suitable was asked. The only alternative that may be suitable is > the Netlink interface. > > I won't go into the details of the new functions now but focus on the > difficulty I have found implementing one of the existing functions using > the Netlink interface. > > There are some restrictions on the scope of the change. > The scope is only the ioctl interface. I don't want change too many things > at once, in particular things that are currently working OK. And I'd like > to retain the existing semantic behavior of the interface. > > The function that is a problem is the sending of expire requests. In the > current implementation this function is synchronous. An ioctl is used to > ask the kernel module (autofs4) to check for mounts that can be expired > and, if a candidate is found the module sends a request to the user space > daemon asking it to try and umount the select mount after which the daemon > sends a success or fail status back to the module which marks the > completion of the original ioctl expire request. > > The Generic Netlink interface won't allow this because only one concurrent > send request can be active for "all" Generic Netlink Families in use, > since the socket receive function is bracketed by a single mutex. So I would > need to use a workqueue to queue the request but that has it's own set of > problems. > > A workqueue can have only one active request going at a time but I may > have a number of concurrent expires happening at any one time and the > request could block for a significant amount of time so I would have to > use multiple queues. Also there isn't a one-to-one correspondence > between autofs super blocks and the stream of expire requests so the > number of needed workqueues isn't known. Hence, a workqueue would need to > be created, used and destroyed for every umount request or some other > elaborate mechanism developed to re-use workqueues. Ideas? > > The next issue is that in order to keep track of multiple in flight > requests a separate Netlink socket would need to be opened for every > expire request in order to ensure that the Netlink completion reply makes > it back to the original requesting thread (Is that actually correct?). Not > really such a big problem but it defeats another aim of the > re-implementation, which is to reduce the selinux user space exposure to > file descriptors that are open but don't yet have close-on-exec flag set > when a mount or umount is spawned by the automount daemon. This can > obviously be resolved by adding a mutex around the fork/exec code but > isn't a popular idea due to added performance overhead. > > That's about it for a Generic Netlink solution. > > It looks like it may be possible to avoid the need to use workqueues by > adding a new netlink protocol. I didn't notice any mutual exclusion > issues in the code but I may not have looked far enough (comments > please?). This approach would still have the issue of needing a new socket > opened for each expire for the tracking multiple requests and a > considerable increase in complexity in the autofs4 module for netlink > communication. > > I'm concerned about the effort I would need to devote to make this work > and the increase in complexity that may be needed, especially if the > implementation is ultimately not satisfactory for inclusion in the kernel. > > Please help! > Ian > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC] Re: [PATCH 4/4] autofs4 - add miscelaneous device for ioctls 2008-03-13 7:00 ` [RFC] " Ian Kent 2008-03-14 2:45 ` Ian Kent @ 2008-03-14 12:45 ` Thomas Graf 2008-03-14 14:10 ` Ian Kent 1 sibling, 1 reply; 40+ messages in thread From: Thomas Graf @ 2008-03-14 12:45 UTC (permalink / raw) To: Ian Kent Cc: Kernel Mailing List, Andrew Morton, autofs mailing list, linux-fsdevel, Christoph Hellwig, Al Viro * Ian Kent <raven@themaw.net> 2008-03-13 16:00 > The function that is a problem is the sending of expire requests. In the > current implementation this function is synchronous. An ioctl is used to > ask the kernel module (autofs4) to check for mounts that can be expired > and, if a candidate is found the module sends a request to the user space > daemon asking it to try and umount the select mount after which the daemon > sends a success or fail status back to the module which marks the > completion of the original ioctl expire request. > > The Generic Netlink interface won't allow this because only one concurrent > send request can be active for "all" Generic Netlink Families in use, > since the socket receive function is bracketed by a single mutex. So I would > need to use a workqueue to queue the request but that has it's own set of > problems. > > The next issue is that in order to keep track of multiple in flight > requests a separate Netlink socket would need to be opened for every > expire request in order to ensure that the Netlink completion reply makes > it back to the original requesting thread (Is that actually correct?). Not > really such a big problem but it defeats another aim of the > re-implementation, which is to reduce the selinux user space exposure to > file descriptors that are open but don't yet have close-on-exec flag set > when a mount or umount is spawned by the automount daemon. This can > obviously be resolved by adding a mutex around the fork/exec code but > isn't a popular idea due to added performance overhead. Netlink is a messaging protocol, synchronization is up to the user. I suggest that you send a netlink notification to a multicast group for every expire candiate when an expire request is received. Unmount daemons simply subscribe to the group and wait for work to do. Put the request onto a list including the netlink pid and sequence number so you can address the orignal source of the request later on. Exit the netlink receive function and wait for the userspace daemon to get back to you. The userspace daemon notifies you of successful or unsuccesful unmount attempts by sending notifications. Update your list entry accordingly and once the request is fullfilled send a notification to the original source of the request by using the stored pid and sequence number. The userspace application requesting the expire can simply block on the receival of this notification in order to make the whole operation synchronous. Sounds acceptable? ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [RFC] Re: [PATCH 4/4] autofs4 - add miscelaneous device for ioctls 2008-03-14 12:45 ` Thomas Graf @ 2008-03-14 14:10 ` Ian Kent 0 siblings, 0 replies; 40+ messages in thread From: Ian Kent @ 2008-03-14 14:10 UTC (permalink / raw) To: Thomas Graf Cc: Kernel Mailing List, Andrew Morton, autofs mailing list, linux-fsdevel, Christoph Hellwig, Al Viro On Fri, 2008-03-14 at 13:45 +0100, Thomas Graf wrote: > * Ian Kent <raven@themaw.net> 2008-03-13 16:00 > > The function that is a problem is the sending of expire requests. In the > > current implementation this function is synchronous. An ioctl is used to > > ask the kernel module (autofs4) to check for mounts that can be expired > > and, if a candidate is found the module sends a request to the user space > > daemon asking it to try and umount the select mount after which the daemon > > sends a success or fail status back to the module which marks the > > completion of the original ioctl expire request. > > > > The Generic Netlink interface won't allow this because only one concurrent > > send request can be active for "all" Generic Netlink Families in use, > > since the socket receive function is bracketed by a single mutex. So I would > > need to use a workqueue to queue the request but that has it's own set of > > problems. > > > > The next issue is that in order to keep track of multiple in flight > > requests a separate Netlink socket would need to be opened for every > > expire request in order to ensure that the Netlink completion reply makes > > it back to the original requesting thread (Is that actually correct?). Not > > really such a big problem but it defeats another aim of the > > re-implementation, which is to reduce the selinux user space exposure to > > file descriptors that are open but don't yet have close-on-exec flag set > > when a mount or umount is spawned by the automount daemon. This can > > obviously be resolved by adding a mutex around the fork/exec code but > > isn't a popular idea due to added performance overhead. > > Netlink is a messaging protocol, synchronization is up to the user. Yes, I realize that, but what I'm curious about are the options that I have within the messaging system to control delivery of message replies, other than using separate sockets. Can this be achieved by using the pid set in the originating message? > > I suggest that you send a netlink notification to a multicast group for > every expire candiate when an expire request is received. Unmount > daemons simply subscribe to the group and wait for work to do. Put the > request onto a list including the netlink pid and sequence number so you > can address the orignal source of the request later on. Exit the netlink > receive function and wait for the userspace daemon to get back to you. I'll have to think about what you've said here to relate it to the situation I have. I don't actually have umount daemons, at the moment I request an expire and the daemon creates a thread to do the umount and sends a status message to the kernel module. But that may not matter, see below. > > The userspace daemon notifies you of successful or unsuccesful unmount > attempts by sending notifications. Update your list entry accordingly > and once the request is fullfilled send a notification to the original > source of the request by using the stored pid and sequence number. > > The userspace application requesting the expire can simply block on the > receival of this notification in order to make the whole operation > synchronous. Actually, I've progressed on this since posting. I've implemented the first steps toward using a workqueue to perform the expire and, to my surprise, my code worked for a simple test case. Basically, a thread in the daemon issues the expire, the kernel module queues the work and replies. The expire workqueue task does the expire check and if no candidates are found it sends an expire complete notification, or it sends a umount request to the daemon and waits for the status result, then returns that result as the expire compete notification. Seems to work quite well. I expect this is possibly the method you're suggesting above anyway. Unfortunately, having now stepped up intensity of the testing, I'm getting a hard hang on my system. I've setup to reduce the message functions used to only two simple notification messages to the kernel module to ensure it isn't the expire implementation causing the problem. It's hard to see where I could have messed these two functions as they are essentially re-entrant but there is fairly heavy mount activity of about 10-15 mounts a second. Such is life! Any ideas on what might be causing this? Ian ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 4/4] autofs4 - add miscelaneous device for ioctls 2008-02-28 5:17 ` Andrew Morton 2008-02-28 6:18 ` Ian Kent @ 2008-02-29 16:24 ` Ian Kent 2008-04-11 7:02 ` Ian Kent 1 sibling, 1 reply; 40+ messages in thread From: Ian Kent @ 2008-02-29 16:24 UTC (permalink / raw) To: Andrew Morton Cc: Kernel Mailing List, autofs mailing list, linux-fsdevel, Christoph Hellwig, Al Viro On Wed, 2008-02-27 at 21:17 -0800, Andrew Morton wrote: > On Tue, 26 Feb 2008 12:23:55 +0900 (WST) Ian Kent <raven@themaw.net> wrote: > > > Hi Andrew, > > > > Patch to add miscellaneous device to autofs4 module for > > ioctls. > > Could you please document the new kernel interface which you're proposing? > In Docmentation/ or in the changelog? > > We seem to be passing some string into a miscdevice ioctl and getting some > results back. Be aware that this won't be a terribly popular proposal, so > I'd suggest that you fully describe the problem which it's trying to solve, > and how it solves it, and why the various alternatives (sysfs, netlink, > mount options, etc) were judged unsuitable. It appears I could do this with the generic netlink subsystem. I'll have a go at it. > > > > > > ... > > > > +/* > > + * Get the autofs super block info struct from the file opened on > > + * the autofs mount point. > > + */ > > +static struct autofs_sb_info *autofs_dev_ioctl_sbi(struct file *f) > > +{ > > + struct autofs_sb_info *sbi = NULL; > > + struct inode *inode; > > + > > + if (f) { > > `f' cannot be NULL here. And, will still be in the netlink implementation and will still return an appropriate error. > > > + inode = f->f_path.dentry->d_inode; > > + sbi = autofs4_sbi(inode->i_sb); > > + } > > + > > + return sbi; > > +} > > + > > +/* Return autofs module protocol version */ > > +static inline int autofs_dev_ioctl_protover(struct file *fp, > > + struct autofs_sb_info *sbi, > > + struct autofs_dev_ioctl *param) > > +{ > > + param->arg1 = sbi->version; > > + return 0; > > +} > > + > > +/* Return autofs module protocol sub version */ > > +static inline int autofs_dev_ioctl_protosubver(struct file *fp, > > + struct autofs_sb_info *sbi, > > + struct autofs_dev_ioctl *param) > > +{ > > + param->arg1 = sbi->sub_version; > > + return 0; > > +} > > Don't bother inlining things - the compiler will do it. > > > +/* > > + * Walk down the mount stack looking for an autofs mount that > > + * has the requested device number (aka. new_encode_dev(sb->s_dev). > > + */ > > +static int autofs_dev_ioctl_find_super(struct nameidata *nd, dev_t devno) > > +{ > > + struct dentry *dentry; > > + struct inode *inode; > > + struct super_block *sb; > > + dev_t s_dev; > > + unsigned int err; > > + > > + err = -ENOENT; > > + > > + /* Lookup the dentry name at the base of our mount point */ > > + dentry = d_lookup(nd->path.dentry, &nd->last); > > + if (!dentry) > > + goto out; > > + > > + dput(nd->path.dentry); > > + nd->path.dentry = dentry; > > + > > + /* And follow the mount stack looking for our autofs mount */ > > + while (1) { > > + inode = nd->path.dentry->d_inode; > > + if (!inode) > > + continue; > > + > > + sb = inode->i_sb; > > + s_dev = new_encode_dev(sb->s_dev); > > + if (devno == s_dev) { > > + if (sb->s_magic == AUTOFS_SUPER_MAGIC) { > > + err = 0; > > + break; > > + } > > + } > > + > > + if (!follow_down(&nd->path.mnt, &nd->path.dentry)) > > + goto out; > > + } > > + > > +out: > > + return err; > > +} > > hm. possibly-interested parties cc'ed. Agian, will still be in the netlink implementation. Speak up if it's not right. > > > +/* > > + * Install the file opened for autofs mount point control functions > > + * and set close on exec. > > + */ > > +static void autofs_dev_ioctl_fd_install(unsigned int fd, struct file *file) > > +{ > > + struct files_struct *files = current->files; > > + struct fdtable *fdt; > > + > > + spin_lock(&files->file_lock); > > + fdt = files_fdtable(files); > > + BUG_ON(fdt->fd[fd] != NULL); > > + rcu_assign_pointer(fdt->fd[fd], file); > > + FD_SET(fd, fdt->close_on_exec); > > + spin_unlock(&files->file_lock); > > +} > > That's fd_install() plus an add-on. It's not autofs-specific. Should be > in fs/open.c, methinks? OK, I'll do that if someone speaks up. > > > > > ... > > > > +/* Close file descriptor allocated above (user can also use close(2)). */ > > +static inline int autofs_dev_ioctl_closemount(struct file *fp, > > + struct autofs_sb_info *sbi, > > + struct autofs_dev_ioctl *param) > > +{ > > + return sys_close(param->ioctlfd); > > +} > > hm. Also, still in the netlink implementation, with a comment a bit more informative than the one above. > > > > > ... > > > > +/* > > + * Set the pipe fd for kernel communication to the daemon. > > + * > > + * Normally this is set at mount using an option but if we > > + * are reconnecting to a busy mount then we need to use this > > + * to tell the autofs mount about the new kernel pipe fd. In > > + * order to protect mounts against incorrectly setting the > > + * pipefd we also require that the autofs mount be catatonic. > > + * > > + * This also sets the process group id used to identify the > > + * controlling process (eg. the owning automount(8) daemon). > > + */ > > +static int autofs_dev_ioctl_setpipefd(struct file *fp, > > + struct autofs_sb_info *sbi, > > + struct autofs_dev_ioctl *param) > > +{ > > + int pipefd; > > + int err = 0; > > + > > + if (param->arg1 == -1) > > + return -EINVAL; > > + > > + pipefd = param->arg1; > > + > > + if (!sbi->catatonic) > > + return -EBUSY; > > + else { > > + struct file *pipe = fget(pipefd); > > + if (!pipe->f_op || !pipe->f_op->write) { > > + err = -EPIPE; > > + fput(pipe); > > + goto out; > > + } > > + sbi->oz_pgrp = task_pgrp_nr(current); > > + sbi->pipefd = pipefd; > > + sbi->pipe = pipe; > > + sbi->catatonic = 0; > > + } > > +out: > > + return err; > > +} > > We have a new filesystem type, a misc device with a mysterious ioctl, > hand-rolled mountpoint chasing, hand-rolled fd installation and now pipes > too. That's not going to change. There's nothing new here at all. This is merely an re-implementation of the existing autofs ioctl interface with two additional calls, nothing more. > > This is a complex interface. We really need to see the overall > problem > statement, design and interface description, please. I'll add a document describing this, as previously agreed. snip ... > > + > > +/* > > + * Call repeatedly until it returns -EAGAIN, meaning there's nothing > > + * more that can be done. > > + */ > > +static int autofs_dev_ioctl_expire(struct file *fp, > > + struct autofs_sb_info *sbi, > > + struct autofs_dev_ioctl *param) > > +{ > > + struct dentry *dentry; > > + struct vfsmount *mnt; > > + int err = -EAGAIN; > > + int when; > > + > > + when = param->arg1; > > + mnt = fp->f_path.mnt; > > + > > + if (sbi->type & AUTOFS_TYPE_DIRECT) > > + dentry = autofs4_expire_direct(sbi->sb, mnt, sbi, when); > > + else > > + dentry = autofs4_expire_indirect(sbi->sb, mnt, sbi, when); > > + > > + if (dentry) { > > + struct autofs_info *ino = autofs4_dentry_ino(dentry); > > + > > + /* > > + * This is synchronous because it makes the daemon a > > + * little easier > > + */ > > + ino->flags |= AUTOFS_INF_EXPIRING; > > + err = autofs4_wait(sbi, dentry, NFY_EXPIRE); > > + ino->flags &= ~AUTOFS_INF_EXPIRING; > > Are there races around the modification of ino->flags here? I haven't had any problems with this over time. I've always thought that this was because the flag is only set during an expire, of which there is only ever one going for a given mount, is synchronous, and mount requests only read the flag to check status. But I could be wrong since it may have been OK because the existing autofs ioctl holds the BKL for its operations. I'll think about it. snip .... ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 4/4] autofs4 - add miscelaneous device for ioctls 2008-02-29 16:24 ` Ian Kent @ 2008-04-11 7:02 ` Ian Kent 2008-04-12 4:03 ` Andrew Morton 0 siblings, 1 reply; 40+ messages in thread From: Ian Kent @ 2008-04-11 7:02 UTC (permalink / raw) To: Andrew Morton Cc: Kernel Mailing List, autofs mailing list, linux-fsdevel, Christoph Hellwig, Al Viro, Thomas Graf On Sat, 1 Mar 2008, Ian Kent wrote: > > On Wed, 2008-02-27 at 21:17 -0800, Andrew Morton wrote: > > On Tue, 26 Feb 2008 12:23:55 +0900 (WST) Ian Kent <raven@themaw.net> wrote: > > > > > Hi Andrew, > > > > > > Patch to add miscellaneous device to autofs4 module for > > > ioctls. > > > > Could you please document the new kernel interface which you're proposing? > > In Docmentation/ or in the changelog? > > > > We seem to be passing some string into a miscdevice ioctl and getting some > > results back. Be aware that this won't be a terribly popular proposal, so > > I'd suggest that you fully describe the problem which it's trying to solve, > > and how it solves it, and why the various alternatives (sysfs, netlink, > > mount options, etc) were judged unsuitable. > > It appears I could do this with the generic netlink subsystem. > I'll have a go at it. > I've spent several weeks on this now and I'm having considerable difficulty with the expire function. First, I think using a raw netlink implementation defeats the point of using this approach at all due to increased complexity. So I've used the generic netlink facility and the libnl library for user space. While the complexity on the kernel side is acceptable that isn't the case in user space, the code for the library to issue mount point control commands has more than doubled in size and is still not working for mount point expiration. This has been made more difficult because libnl isn't thread safe, but I have overcome this limitation for everything but the expire function, I now can't determine whether the problem I have with receiving multicast messages, possibly out of order, on individual netlink sockets opened specifically for this purpose, is due to this or is something I'm doing wrong. The generic netlink implementation allows only one message to be in flight at a time. But my expire selects an expire candidate (if possible), sends a request to the daemon to do the umount, obtains the result status and returns this as the result to the original expire request. Consequently, I need to spawn a kernel thread to do this and return, then listen for the matching multicast message containing the result. I don't particularly like spawning a thread to do this because it opens the possibility of orphaned threads which introduces other difficulties cleaning them up if the user space application goes away or misbehaves. But I'm also having problems catching the multicast messages. This works fine in normal operation but fails badly when I have multiple concurrent expires happening, such as when shutting down the daemon with several hundred active mounts. I can't avoid the fact that netlink doesn't provide the same functionality as the ioctl interface and clearly isn't meant to. So, the question is, what are the criteria to use for deciding that a netlink based implementation isn't appropriate because I think I'm well past it now? Comments please. Ian ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 4/4] autofs4 - add miscelaneous device for ioctls 2008-04-11 7:02 ` Ian Kent @ 2008-04-12 4:03 ` Andrew Morton 2008-04-14 4:45 ` Ian Kent 0 siblings, 1 reply; 40+ messages in thread From: Andrew Morton @ 2008-04-12 4:03 UTC (permalink / raw) To: Ian Kent Cc: Kernel Mailing List, autofs mailing list, linux-fsdevel, Christoph Hellwig, Al Viro, Thomas Graf, netdev On Fri, 11 Apr 2008 15:02:39 +0800 (WST) Ian Kent <raven@themaw.net> wrote: > On Sat, 1 Mar 2008, Ian Kent wrote: > > > > > On Wed, 2008-02-27 at 21:17 -0800, Andrew Morton wrote: > > > On Tue, 26 Feb 2008 12:23:55 +0900 (WST) Ian Kent <raven@themaw.net> wrote: > > > > > > > Hi Andrew, > > > > > > > > Patch to add miscellaneous device to autofs4 module for > > > > ioctls. > > > > > > Could you please document the new kernel interface which you're proposing? > > > In Docmentation/ or in the changelog? > > > > > > We seem to be passing some string into a miscdevice ioctl and getting some > > > results back. Be aware that this won't be a terribly popular proposal, so > > > I'd suggest that you fully describe the problem which it's trying to solve, > > > and how it solves it, and why the various alternatives (sysfs, netlink, > > > mount options, etc) were judged unsuitable. > > > > It appears I could do this with the generic netlink subsystem. > > I'll have a go at it. > > > > I've spent several weeks on this now and I'm having considerable > difficulty with the expire function. > > First, I think using a raw netlink implementation defeats the point of > using this approach at all due to increased complexity. So I've used the > generic netlink facility and the libnl library for user space. While the > complexity on the kernel side is acceptable that isn't the case in user > space, the code for the library to issue mount point control commands has > more than doubled in size and is still not working for mount point > expiration. This has been made more difficult because libnl isn't > thread safe, but I have overcome this limitation for everything but > the expire function, I now can't determine whether the problem I have with > receiving multicast messages, possibly out of order, on individual > netlink sockets opened specifically for this purpose, is due to this or is > something I'm doing wrong. > > The generic netlink implementation allows only one message to be in flight > at a time. But my expire selects an expire candidate (if possible), sends > a request to the daemon to do the umount, obtains the result status and > returns this as the result to the original expire request. Consequently, I > need to spawn a kernel thread to do this and return, then listen for the > matching multicast message containing the result. I don't particularly > like spawning a thread to do this because it opens the possibility of > orphaned threads which introduces other difficulties cleaning them up if > the user space application goes away or misbehaves. But I'm also having > problems catching the multicast messages. This works fine in normal > operation but fails badly when I have multiple concurrent expires > happening, such as when shutting down the daemon with several hundred > active mounts. I can't avoid the fact that netlink doesn't provide the > same functionality as the ioctl interface and clearly isn't meant to. Gee, it sounds like you went above and beyond the call there. The one-message-in-flight limitation of genetlink is suprising - one would expect a kernel subsystem (especially a networking one) to support queueing. I guess it was expedient and the need had not arisen. > So, the question is, what are the criteria to use for deciding that a > netlink based implementation isn't appropriate because I think I'm well > past it now? > > Comments please. Do I recall correctly in remembering that your original design didn't really add any _new_ concepts to autofs interfacing? That inasmuch as the patch sinned, it was repeating already-committed sins? And: you know more about this than anyone else, and you are (now) unbiased by the presence of existing code. What's your opinion? ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 4/4] autofs4 - add miscelaneous device for ioctls 2008-04-12 4:03 ` Andrew Morton @ 2008-04-14 4:45 ` Ian Kent 0 siblings, 0 replies; 40+ messages in thread From: Ian Kent @ 2008-04-14 4:45 UTC (permalink / raw) To: Andrew Morton Cc: Kernel Mailing List, autofs mailing list, linux-fsdevel, Christoph Hellwig, Al Viro, Thomas Graf, netdev On Fri, 2008-04-11 at 21:03 -0700, Andrew Morton wrote: > On Fri, 11 Apr 2008 15:02:39 +0800 (WST) Ian Kent <raven@themaw.net> wrote: > > > On Sat, 1 Mar 2008, Ian Kent wrote: > > > > > > > > On Wed, 2008-02-27 at 21:17 -0800, Andrew Morton wrote: > > > > On Tue, 26 Feb 2008 12:23:55 +0900 (WST) Ian Kent <raven@themaw.net> wrote: > > > > > > > > > Hi Andrew, > > > > > > > > > > Patch to add miscellaneous device to autofs4 module for > > > > > ioctls. > > > > > > > > Could you please document the new kernel interface which you're proposing? > > > > In Docmentation/ or in the changelog? > > > > > > > > We seem to be passing some string into a miscdevice ioctl and getting some > > > > results back. Be aware that this won't be a terribly popular proposal, so > > > > I'd suggest that you fully describe the problem which it's trying to solve, > > > > and how it solves it, and why the various alternatives (sysfs, netlink, > > > > mount options, etc) were judged unsuitable. > > > > > > It appears I could do this with the generic netlink subsystem. > > > I'll have a go at it. > > > > > > > I've spent several weeks on this now and I'm having considerable > > difficulty with the expire function. > > > > First, I think using a raw netlink implementation defeats the point of > > using this approach at all due to increased complexity. So I've used the > > generic netlink facility and the libnl library for user space. While the > > complexity on the kernel side is acceptable that isn't the case in user > > space, the code for the library to issue mount point control commands has > > more than doubled in size and is still not working for mount point > > expiration. This has been made more difficult because libnl isn't > > thread safe, but I have overcome this limitation for everything but > > the expire function, I now can't determine whether the problem I have with > > receiving multicast messages, possibly out of order, on individual > > netlink sockets opened specifically for this purpose, is due to this or is > > something I'm doing wrong. > > > > The generic netlink implementation allows only one message to be in flight > > at a time. But my expire selects an expire candidate (if possible), sends > > a request to the daemon to do the umount, obtains the result status and > > returns this as the result to the original expire request. Consequently, I > > need to spawn a kernel thread to do this and return, then listen for the > > matching multicast message containing the result. I don't particularly > > like spawning a thread to do this because it opens the possibility of > > orphaned threads which introduces other difficulties cleaning them up if > > the user space application goes away or misbehaves. But I'm also having > > problems catching the multicast messages. This works fine in normal > > operation but fails badly when I have multiple concurrent expires > > happening, such as when shutting down the daemon with several hundred > > active mounts. I can't avoid the fact that netlink doesn't provide the > > same functionality as the ioctl interface and clearly isn't meant to. > > Gee, it sounds like you went above and beyond the call there. Hahaha, maybe, but I have to be sure it's not just my own lack of understanding giving me grief! > > The one-message-in-flight limitation of genetlink is suprising - one would > expect a kernel subsystem (especially a networking one) to support > queueing. I guess it was expedient and the need had not arisen. I'm not sure but I think there are some special requirements for such a message bus architecture. I've only skimmed the code but it looked like a mutex for each genetlink family or, ideally, for each socket should be possible. We also need to face the fact that this isn't designed to be a drop in replacement for ioctls as it can't provide (and probably can never provide) the not often used independently re-entrant function call like semantic of the ioctl interface. > > > So, the question is, what are the criteria to use for deciding that a > > netlink based implementation isn't appropriate because I think I'm well > > past it now? > > > > Comments please. > > Do I recall correctly in remembering that your original design didn't > really add any _new_ concepts to autofs interfacing? That inasmuch as > the patch sinned, it was repeating already-committed sins? Almost, it is a re-implementation of the existing ioctl interface. It has an extra entry point so we can obtain a file handle to an autofs mount that has been over mounted and another to get owner info for mount re-construction on daemon restart. Which is what we need to be able to solve the active restart problem. I also made a couple of improvements, namely, allow actual failure status to be returned from the daemon to the kernel rather than always using ENOENT (long overdue, still need to update the daemon though) and added an additional entry point to check if a path is a mount point so we can eliminate some of the high overhead mount table scanning in the daemon. > > And: you know more about this than anyone else, and you are (now) unbiased > by the presence of existing code. What's your opinion? There's no question that genetlink is an elegant solution for common case ioctl functions but, as I say, it's not a complete replacement probably because it's primary purpose in life is to be a message bus implementation rather than specifically an ioctl replacement. As is often the case after posting a "please help" message it occurred to me that there is another way I might be able to do this. Instead of sending an independent umount request I could check, and if a candidate is found, set the expiring flag and return a "yes" status to the daemon and have the same function do the umount, then clear the when returning the status. That would eliminate the ugliness in the daemon and the need to use kernel threads but would open the possibility of the "expiring flag" remaining set if the daemon went away. That would prevent lookups (since we wait for expires to complete) and so prevent manual umount cleanup so I'm not sure yet what the implications of this really are. There is one other concern, the expire in the daemon has become far to complex because I enumerate umount candidates, almost for no other reason than to "count" the number of times I need to call the expire ioctl. This involves scanning the mount table which has proven to be a bit of a killer for large maps. I think the best way to improve this is try and get back to the way the expire was done long ago. That is, when an expire comes in for a mount (file handle) continually call back to the daemon until we can't umount any more mounts, then return the appropriate status to the daemon (now we just expire one mount at a time). This changed because we were getting infinite loops on umount fails in some cases but I think my approach to fixing that was just plain wrong. A genetlink implementation will exclude this possibility and due to the requirements of the message bus architecture I don't think that is likely to change any time soon, if ever. All things considered I'm leaning toward using a misc device to route the ioctls. Ian ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH 2/4] autofs4 - add mount option to display mount device 2008-02-26 3:21 [PATCH 0/4] autofs4 - autofs needs a miscelaneous device for ioctls Ian Kent ` (2 preceding siblings ...) 2008-02-26 3:23 ` [PATCH 4/4] autofs4 - add miscelaneous device for ioctls Ian Kent @ 2008-02-26 4:29 ` Ian Kent 2008-02-28 5:17 ` Andrew Morton 2008-02-28 4:40 ` [PATCH 0/4] autofs4 - autofs needs a miscelaneous device for ioctls Andrew Morton 4 siblings, 1 reply; 40+ messages in thread From: Ian Kent @ 2008-02-26 4:29 UTC (permalink / raw) To: Andrew Morton; +Cc: Kernel Mailing List, autofs mailing list, linux-fsdevel Hi Andrew, Patch to add a display mount option to show the device number of the autofs mount super block. Signed-off-by: Ian Kent < raven@themaw.net> Ian --- diff -up linux-2.6.25-rc2-mm1/fs/autofs4/inode.c.add-mount-device-display-option linux-2.6.25-rc2-mm1/fs/autofs4/inode.c --- linux-2.6.25-rc2-mm1/fs/autofs4/inode.c.add-mount-device-display-option 2008-02-20 13:01:06.000000000 +0900 +++ linux-2.6.25-rc2-mm1/fs/autofs4/inode.c 2008-02-20 13:03:45.000000000 +0900 @@ -190,6 +190,7 @@ static int autofs4_show_options(struct s seq_printf(m, ",timeout=%lu", sbi->exp_timeout/HZ); seq_printf(m, ",minproto=%d", sbi->min_proto); seq_printf(m, ",maxproto=%d", sbi->max_proto); + seq_printf(m, ",dev=%d", autofs4_get_dev(sbi)); if (sbi->type & AUTOFS_TYPE_OFFSET) seq_printf(m, ",offset"); @@ -332,7 +333,7 @@ int autofs4_fill_super(struct super_bloc sbi->sb = s; sbi->version = 0; sbi->sub_version = 0; - sbi->type = 0; + sbi->type = AUTOFS_TYPE_INDIRECT; sbi->min_proto = 0; sbi->max_proto = 0; mutex_init(&sbi->wq_mutex); ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 2/4] autofs4 - add mount option to display mount device 2008-02-26 4:29 ` [PATCH 2/4] autofs4 - add mount option to display mount device Ian Kent @ 2008-02-28 5:17 ` Andrew Morton 0 siblings, 0 replies; 40+ messages in thread From: Andrew Morton @ 2008-02-28 5:17 UTC (permalink / raw) To: Ian Kent; +Cc: Kernel Mailing List, autofs mailing list, linux-fsdevel On Tue, 26 Feb 2008 13:29:07 +0900 (WST) Ian Kent <raven@themaw.net> wrote: > --- linux-2.6.25-rc2-mm1/fs/autofs4/inode.c.add-mount-device-display-option 2008-02-20 13:01:06.000000000 +0900 > +++ linux-2.6.25-rc2-mm1/fs/autofs4/inode.c 2008-02-20 13:03:45.000000000 +0900 > @@ -190,6 +190,7 @@ static int autofs4_show_options(struct s > seq_printf(m, ",timeout=%lu", sbi->exp_timeout/HZ); > seq_printf(m, ",minproto=%d", sbi->min_proto); > seq_printf(m, ",maxproto=%d", sbi->max_proto); > + seq_printf(m, ",dev=%d", autofs4_get_dev(sbi)); %u would be more appropriate here. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 0/4] autofs4 - autofs needs a miscelaneous device for ioctls 2008-02-26 3:21 [PATCH 0/4] autofs4 - autofs needs a miscelaneous device for ioctls Ian Kent ` (3 preceding siblings ...) 2008-02-26 4:29 ` [PATCH 2/4] autofs4 - add mount option to display mount device Ian Kent @ 2008-02-28 4:40 ` Andrew Morton 2008-02-28 6:07 ` Ian Kent 4 siblings, 1 reply; 40+ messages in thread From: Andrew Morton @ 2008-02-28 4:40 UTC (permalink / raw) To: Ian Kent; +Cc: Kernel Mailing List, autofs mailing list, linux-fsdevel On Tue, 26 Feb 2008 12:21:58 +0900 (WST) Ian Kent <raven@themaw.net> wrote: > > Hi Andrew, > > There is a problem with active restarts in autofs (that is to > say restarting autofs when there are busy mounts). > > Currently autofs uses "umount -l" to clear active mounts at > restart. While using lazy umount works for most cases, anything > that needs to walk back up the mount tree to construct a path, > such as getcwd(2) and the proc file system /proc/<pid>/cwd, no > longer works because the point from which the path is constructed > has been detached from the mount tree. > > The actual problem with autofs is that it can't reconnect to > existing mounts. Immediately one things of just adding the > ability to remount autofs file systems would solve it, but > alas, that can't work. This is because autofs direct mounts > and the implementation of "on demand mount and expire" of > nested mount trees have the file system mounted on top of > the mount trigger dentry. > > To resolve this a miscellaneous device node for routing ioctl > commands to these mount points has been implemented for the > autofs4 kernel module. > > For those wishing to test this out an updated user space daemon > is needed. Checking out and building from the git repo or > applying all the current patches to the 5.0.3 tar distribution > will do the trick. This is all available at the usual location > on kernel.org. > Could we please be a bit more specific than "the usual location"? Should autofs userspace have an entry in Documentation/Changes? ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 0/4] autofs4 - autofs needs a miscelaneous device for ioctls 2008-02-28 4:40 ` [PATCH 0/4] autofs4 - autofs needs a miscelaneous device for ioctls Andrew Morton @ 2008-02-28 6:07 ` Ian Kent 0 siblings, 0 replies; 40+ messages in thread From: Ian Kent @ 2008-02-28 6:07 UTC (permalink / raw) To: Andrew Morton; +Cc: Kernel Mailing List, autofs mailing list, linux-fsdevel On Wed, 2008-02-27 at 20:40 -0800, Andrew Morton wrote: > On Tue, 26 Feb 2008 12:21:58 +0900 (WST) Ian Kent <raven@themaw.net> wrote: > > > > > Hi Andrew, > > > > There is a problem with active restarts in autofs (that is to > > say restarting autofs when there are busy mounts). > > > > Currently autofs uses "umount -l" to clear active mounts at > > restart. While using lazy umount works for most cases, anything > > that needs to walk back up the mount tree to construct a path, > > such as getcwd(2) and the proc file system /proc/<pid>/cwd, no > > longer works because the point from which the path is constructed > > has been detached from the mount tree. > > > > The actual problem with autofs is that it can't reconnect to > > existing mounts. Immediately one things of just adding the > > ability to remount autofs file systems would solve it, but > > alas, that can't work. This is because autofs direct mounts > > and the implementation of "on demand mount and expire" of > > nested mount trees have the file system mounted on top of > > the mount trigger dentry. > > > > To resolve this a miscellaneous device node for routing ioctl > > commands to these mount points has been implemented for the > > autofs4 kernel module. > > > > For those wishing to test this out an updated user space daemon > > is needed. Checking out and building from the git repo or > > applying all the current patches to the 5.0.3 tar distribution > > will do the trick. This is all available at the usual location > > on kernel.org. > > > > Could we please be a bit more specific than "the usual location"? Yes, I should have been more specific. I'll fix that. > > Should autofs userspace have an entry in Documentation/Changes? Sound like a sensible thing to do. I'll include a patch for that when I re-post the patch set. Ian ^ permalink raw reply [flat|nested] 40+ messages in thread
end of thread, other threads:[~2008-04-14 4:51 UTC | newest] Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2008-02-26 3:21 [PATCH 0/4] autofs4 - autofs needs a miscelaneous device for ioctls Ian Kent 2008-02-26 3:22 ` [PATCH 1/4] autofs4 - check for invalid dentry in getpath Ian Kent 2008-02-26 3:23 ` [PATCH 3/4] autofs4 - track uid and gid of last mount requestor Ian Kent 2008-02-26 5:14 ` [PATCH 3/4] autofs4 - track uid and gid of last mount requestor - correction Ian Kent 2008-02-28 4:45 ` [PATCH 3/4] autofs4 - track uid and gid of last mount requestor Andrew Morton 2008-02-28 6:22 ` Ian Kent 2008-02-28 6:37 ` Andrew Morton 2008-02-28 7:08 ` Ian Kent 2008-02-28 7:23 ` Andrew Morton 2008-02-28 8:00 ` Ian Kent 2008-02-28 17:13 ` Jeff Moyer 2008-02-28 19:51 ` Serge E. Hallyn 2008-02-29 3:32 ` Ian Kent 2008-02-29 16:09 ` Serge E. Hallyn 2008-02-29 16:20 ` Pavel Emelyanov 2008-02-29 17:42 ` Serge E. Hallyn 2008-03-02 0:49 ` Eric W. Biederman 2008-03-02 1:13 ` Eric W. Biederman 2008-03-03 15:28 ` Serge E. Hallyn 2008-03-04 22:16 ` Eric W. Biederman 2008-02-28 7:51 ` Pavel Emelyanov 2008-02-28 7:59 ` Andrew Morton 2008-02-28 8:06 ` Ian Kent 2008-02-28 12:31 ` [autofs] " Fabio Olive Leite 2008-02-28 20:33 ` Eric W. Biederman 2008-02-26 3:23 ` [PATCH 4/4] autofs4 - add miscelaneous device for ioctls Ian Kent 2008-02-28 5:17 ` Andrew Morton 2008-02-28 6:18 ` Ian Kent 2008-03-13 7:00 ` [RFC] " Ian Kent 2008-03-14 2:45 ` Ian Kent 2008-03-14 12:45 ` Thomas Graf 2008-03-14 14:10 ` Ian Kent 2008-02-29 16:24 ` Ian Kent 2008-04-11 7:02 ` Ian Kent 2008-04-12 4:03 ` Andrew Morton 2008-04-14 4:45 ` Ian Kent 2008-02-26 4:29 ` [PATCH 2/4] autofs4 - add mount option to display mount device Ian Kent 2008-02-28 5:17 ` Andrew Morton 2008-02-28 4:40 ` [PATCH 0/4] autofs4 - autofs needs a miscelaneous device for ioctls Andrew Morton 2008-02-28 6:07 ` Ian Kent
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).