LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Oren Laadan <orenl@cs.columbia.edu>
Cc: torvalds@linux-foundation.org,
containers@lists.linux-foundation.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-api@vger.kernel.org, tglx@linutronix.de, serue@us.ibm.com,
dave@linux.vnet.ibm.com, mingo@elte.hu, hpa@zytor.com,
viro@zeniv.linux.org.uk, orenl@cs.columbia.edu
Subject: Re: [RFC v7][PATCH 2/9] General infrastructure for checkpoint restart
Date: Tue, 21 Oct 2008 12:41:30 -0700 [thread overview]
Message-ID: <20081021124130.a002e838.akpm@linux-foundation.org> (raw)
In-Reply-To: <1224481237-4892-3-git-send-email-orenl@cs.columbia.edu>
On Mon, 20 Oct 2008 01:40:30 -0400
Oren Laadan <orenl@cs.columbia.edu> wrote:
> Add those interfaces, as well as helpers needed to easily manage the
> file format. The code is roughly broken out as follows:
>
> checkpoint/sys.c - user/kernel data transfer, as well as setup of the
> checkpoint/restart context (a per-checkpoint data structure for
> housekeeping)
>
> checkpoint/checkpoint.c - output wrappers and basic checkpoint handling
>
> checkpoint/restart.c - input wrappers and basic restart handling
>
> Patches to add the per-architecture support as well as the actual
> work to do the memory checkpoint follow in subsequent patches.
>
>
> ...
>
> +int cr_kwrite(struct cr_ctx *ctx, void *buf, int count)
> +{
> + mm_segment_t oldfs;
> + int ret;
> +
> + oldfs = get_fs();
> + set_fs(KERNEL_DS);
> + ret = cr_uwrite(ctx, buf, count);
> + set_fs(oldfs);
> +
> + return ret;
> +}
The decision to write files direct from within the kernel is a bit
unusual and needs discussion and justification in the changelog,
please.
Other schemes would be to make the data available to userspace via a
pseudo-fs file, netlink, a pipe, blah, blah.
>
> ...
>
> +/*
> + * During checkpoint and restart the code writes outs/reads in data
> + * to/from the chekcpoint image from/to a temporary buffer (ctx->hbuf).
Yuo cnat tpye.
> + * Because operations can be nested, one should call cr_hbuf_get() to
> + * reserve space in the buffer, and then cr_hbuf_put() when no longer
> + * needs that space.
Mangled grammar.
> + */
> +
> +/*
> + * ctx->hbuf is used to hold headers and data of known (or bound),
> + * static sizes. In some cases, multiple headers may be allocated in
> + * a nested manner. The size should accommodate all headers, nested
> + * or not, on all archs.
> + */
> +#define CR_HBUF_TOTAL (8 * 4096)
> +
>
> ...
>
> +/*
> + * helpers to manage CR contexts: allocated for each checkpoint and/or
> + * restart operation, and persists until the operation is completed.
> + */
> +
> +/* unique checkpoint identifier (FIXME: should be per-container) */
> +static atomic_t cr_ctx_count;
This never gets initialised. Use ATOMIC_INIT() here. (It doesn't
matter, but one day it might!)
>
> ...
>
> asmlinkage long sys_checkpoint(pid_t pid, int fd, unsigned long flags)
> {
> - pr_debug("sys_checkpoint not implemented yet\n");
> - return -ENOSYS;
> + struct cr_ctx *ctx;
> + int ret;
> +
> + /* no flags for now */
> + if (flags)
> + return -EINVAL;
> +
> + ctx = cr_ctx_alloc(pid, fd, flags | CR_CTX_CKPT);
> + if (IS_ERR(ctx))
> + return PTR_ERR(ctx);
> +
> + ret = do_checkpoint(ctx);
> +
> + if (!ret)
> + ret = ctx->crid;
> +
> + cr_ctx_free(ctx);
> + return ret;
> }
Is it appropriate that this be an unprivileged operation?
What happens if I pass it a pid which isn't system-wide unique?
What happens if I pass it a pid of a process which I don't own? This
is super security-sensitive and we need to go over the permission
checking with a toothcomb. It needs to be exhaustively described in
the changelog. It might have security/selinux implications - I don't
know, I didn't look, but lights are flashing and bells are ringing over
here.
What happens if I pass it a pid of a process which I _do_ own, but it
does not refer to a container's init process?
If `pid' must refer to a container's init process, isn't it always
equal to 1??
> /**
> * sys_restart - restart a container
> * @crid: checkpoint image identifier
> @@ -36,6 +234,19 @@ asmlinkage long sys_checkpoint(pid_t pid, int fd, unsigned long flags)
> */
> asmlinkage long sys_restart(int crid, int fd, unsigned long flags)
> {
> - pr_debug("sys_restart not implemented yet\n");
> - return -ENOSYS;
> + struct cr_ctx *ctx;
> + int ret;
> +
> + /* no flags for now */
> + if (flags)
> + return -EINVAL;
> +
> + ctx = cr_ctx_alloc(crid, fd, flags | CR_CTX_RSTR);
> + if (IS_ERR(ctx))
> + return PTR_ERR(ctx);
> +
> + ret = do_restart(ctx);
> +
> + cr_ctx_free(ctx);
> + return ret;
> }
Again, this is scary stuff. We're allowing unprivileged userspace to
feed random numbers into kernel data structures.
I'd like to see the security guys take a real close look at all of
this, and for them to do that effectively they should be provided with
a full description of the security design of this feature.
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 9ba495d..e2deded 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -324,12 +324,12 @@ ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_
>
> EXPORT_SYMBOL(vfs_write);
>
> -static inline loff_t file_pos_read(struct file *file)
> +inline loff_t file_pos_read(struct file *file)
> {
> return file->f_pos;
> }
>
> -static inline void file_pos_write(struct file *file, loff_t pos)
> +inline void file_pos_write(struct file *file, loff_t pos)
> {
> file->f_pos = pos;
> }
Might as well move these to a header and inline them everywhere.
That'd be a separate leadin patch.
next prev parent reply other threads:[~2008-10-21 19:42 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-10-20 5:40 [RFC v7][PATCH 0/9] Kernel based checkpoint/restart Oren Laadan
2008-10-20 5:40 ` [RFC v7][PATCH 1/9] Create syscalls: sys_checkpoint, sys_restart Oren Laadan
2008-10-20 5:40 ` [RFC v7][PATCH 2/9] General infrastructure for checkpoint restart Oren Laadan
2008-10-21 19:41 ` Andrew Morton [this message]
2008-10-21 20:24 ` Serge E. Hallyn
2008-10-21 20:41 ` Andrew Morton
2008-10-22 1:33 ` Oren Laadan
2008-10-22 2:55 ` Daniel Jacobowitz
2008-10-22 3:02 ` Dave Hansen
2008-10-22 14:29 ` Daniel Jacobowitz
2008-10-22 15:28 ` Serge E. Hallyn
2008-10-22 16:02 ` Oren Laadan
2008-10-22 17:03 ` Serge E. Hallyn
2008-10-22 18:32 ` Oren Laadan
2008-10-27 8:27 ` Peter Chubb
2008-10-27 11:03 ` Oren Laadan
2008-10-27 16:42 ` Dave Hansen
2008-10-27 17:11 ` Oren Laadan
2008-10-27 20:51 ` Matt Helsley
2008-10-27 21:20 ` Serge E. Hallyn
2008-10-27 21:51 ` Oren Laadan
2008-10-27 22:09 ` Dave Hansen
2008-10-28 18:33 ` Serge E. Hallyn
2008-10-20 5:40 ` [RFC v7][PATCH 3/9] x86 support for checkpoint/restart Oren Laadan
2008-10-20 5:40 ` [RFC v7][PATCH 4/9] Dump memory address space Oren Laadan
2008-10-20 5:40 ` [RFC v7][PATCH 5/9] Restore " Oren Laadan
2008-10-20 5:40 ` [RFC v7][PATCH 6/9] Checkpoint/restart: initial documentation Oren Laadan
2008-10-28 16:48 ` Michael Kerrisk
2008-10-20 5:40 ` [RFC v7][PATCH 7/9] Infrastructure for shared objects Oren Laadan
2008-10-20 5:40 ` [RFC v7][PATCH 8/9] Dump open file descriptors Oren Laadan
2008-10-20 5:40 ` [RFC v7][PATCH 9/9] Restore open file descriprtors Oren Laadan
2008-10-21 19:21 ` [RFC v7][PATCH 0/9] Kernel based checkpoint/restart Andrew Morton
2008-10-21 20:41 ` Dave Hansen
2008-10-22 9:20 ` Ingo Molnar
2008-10-22 11:51 ` Daniel Lezcano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20081021124130.a002e838.akpm@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=containers@lists.linux-foundation.org \
--cc=dave@linux.vnet.ibm.com \
--cc=hpa@zytor.com \
--cc=linux-api@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mingo@elte.hu \
--cc=orenl@cs.columbia.edu \
--cc=serue@us.ibm.com \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
--subject='Re: [RFC v7][PATCH 2/9] General infrastructure for checkpoint restart' \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).