LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Oren Laadan <orenl@cs.columbia.edu>
Cc: torvalds@linux-foundation.org,
	containers@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-api@vger.kernel.org, tglx@linutronix.de, serue@us.ibm.com,
	dave@linux.vnet.ibm.com, mingo@elte.hu, hpa@zytor.com,
	viro@zeniv.linux.org.uk, orenl@cs.columbia.edu
Subject: Re: [RFC v7][PATCH 2/9] General infrastructure for checkpoint restart
Date: Tue, 21 Oct 2008 12:41:30 -0700	[thread overview]
Message-ID: <20081021124130.a002e838.akpm@linux-foundation.org> (raw)
In-Reply-To: <1224481237-4892-3-git-send-email-orenl@cs.columbia.edu>

On Mon, 20 Oct 2008 01:40:30 -0400
Oren Laadan <orenl@cs.columbia.edu> wrote:

> Add those interfaces, as well as helpers needed to easily manage the
> file format. The code is roughly broken out as follows:
> 
> checkpoint/sys.c - user/kernel data transfer, as well as setup of the
> checkpoint/restart context (a per-checkpoint data structure for
> housekeeping)
> 
> checkpoint/checkpoint.c - output wrappers and basic checkpoint handling
> 
> checkpoint/restart.c - input wrappers and basic restart handling
> 
> Patches to add the per-architecture support as well as the actual
> work to do the memory checkpoint follow in subsequent patches.
> 
>
> ...
>
> +int cr_kwrite(struct cr_ctx *ctx, void *buf, int count)
> +{
> +	mm_segment_t oldfs;
> +	int ret;
> +
> +	oldfs = get_fs();
> +	set_fs(KERNEL_DS);
> +	ret = cr_uwrite(ctx, buf, count);
> +	set_fs(oldfs);
> +
> +	return ret;
> +}

The decision to write files direct from within the kernel is a bit
unusual and needs discussion and justification in the changelog,
please.

Other schemes would be to make the data available to userspace via a
pseudo-fs file, netlink, a pipe, blah, blah.

>
> ...
>
> +/*
> + * During checkpoint and restart the code writes outs/reads in data
> + * to/from the chekcpoint image from/to a temporary buffer (ctx->hbuf).

Yuo cnat tpye.

> + * Because operations can be nested, one should call cr_hbuf_get() to
> + * reserve space in the buffer, and then cr_hbuf_put() when no longer
> + * needs that space.

Mangled grammar.

> + */
> +
> +/*
> + * ctx->hbuf is used to hold headers and data of known (or bound),
> + * static sizes. In some cases, multiple headers may be allocated in
> + * a nested manner. The size should accommodate all headers, nested
> + * or not, on all archs.
> + */
> +#define CR_HBUF_TOTAL  (8 * 4096)
> +
>
> ...
>
> +/*
> + * helpers to manage CR contexts: allocated for each checkpoint and/or
> + * restart operation, and persists until the operation is completed.
> + */
> +
> +/* unique checkpoint identifier (FIXME: should be per-container) */
> +static atomic_t cr_ctx_count;

This never gets initialised.  Use ATOMIC_INIT() here.  (It doesn't
matter, but one day it might!)

>
> ...
>
>  asmlinkage long sys_checkpoint(pid_t pid, int fd, unsigned long flags)
>  {
> -	pr_debug("sys_checkpoint not implemented yet\n");
> -	return -ENOSYS;
> +	struct cr_ctx *ctx;
> +	int ret;
> +
> +	/* no flags for now */
> +	if (flags)
> +		return -EINVAL;
> +
> +	ctx = cr_ctx_alloc(pid, fd, flags | CR_CTX_CKPT);
> +	if (IS_ERR(ctx))
> +		return PTR_ERR(ctx);
> +
> +	ret = do_checkpoint(ctx);
> +
> +	if (!ret)
> +		ret = ctx->crid;
> +
> +	cr_ctx_free(ctx);
> +	return ret;
>  }

Is it appropriate that this be an unprivileged operation?

What happens if I pass it a pid which isn't system-wide unique?

What happens if I pass it a pid of a process which I don't own?  This
is super security-sensitive and we need to go over the permission
checking with a toothcomb.  It needs to be exhaustively described in
the changelog.  It might have security/selinux implications - I don't
know, I didn't look, but lights are flashing and bells are ringing over
here.

What happens if I pass it a pid of a process which I _do_ own, but it
does not refer to a container's init process?

If `pid' must refer to a container's init process, isn't it always
equal to 1??

>  /**
>   * sys_restart - restart a container
>   * @crid: checkpoint image identifier
> @@ -36,6 +234,19 @@ asmlinkage long sys_checkpoint(pid_t pid, int fd, unsigned long flags)
>   */
>  asmlinkage long sys_restart(int crid, int fd, unsigned long flags)
>  {
> -	pr_debug("sys_restart not implemented yet\n");
> -	return -ENOSYS;
> +	struct cr_ctx *ctx;
> +	int ret;
> +
> +	/* no flags for now */
> +	if (flags)
> +		return -EINVAL;
> +
> +	ctx = cr_ctx_alloc(crid, fd, flags | CR_CTX_RSTR);
> +	if (IS_ERR(ctx))
> +		return PTR_ERR(ctx);
> +
> +	ret = do_restart(ctx);
> +
> +	cr_ctx_free(ctx);
> +	return ret;
>  }

Again, this is scary stuff.  We're allowing unprivileged userspace to
feed random numbers into kernel data structures.

I'd like to see the security guys take a real close look at all of
this, and for them to do that effectively they should be provided with
a full description of the security design of this feature.

> diff --git a/fs/read_write.c b/fs/read_write.c
> index 9ba495d..e2deded 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -324,12 +324,12 @@ ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_
>  
>  EXPORT_SYMBOL(vfs_write);
>  
> -static inline loff_t file_pos_read(struct file *file)
> +inline loff_t file_pos_read(struct file *file)
>  {
>  	return file->f_pos;
>  }
>  
> -static inline void file_pos_write(struct file *file, loff_t pos)
> +inline void file_pos_write(struct file *file, loff_t pos)
>  {
>  	file->f_pos = pos;
>  }

Might as well move these to a header and inline them everywhere. 
That'd be a separate leadin patch.



  reply	other threads:[~2008-10-21 19:42 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-20  5:40 [RFC v7][PATCH 0/9] Kernel based checkpoint/restart Oren Laadan
2008-10-20  5:40 ` [RFC v7][PATCH 1/9] Create syscalls: sys_checkpoint, sys_restart Oren Laadan
2008-10-20  5:40 ` [RFC v7][PATCH 2/9] General infrastructure for checkpoint restart Oren Laadan
2008-10-21 19:41   ` Andrew Morton [this message]
2008-10-21 20:24     ` Serge E. Hallyn
2008-10-21 20:41       ` Andrew Morton
2008-10-22  1:33       ` Oren Laadan
2008-10-22  2:55         ` Daniel Jacobowitz
2008-10-22  3:02           ` Dave Hansen
2008-10-22 14:29             ` Daniel Jacobowitz
2008-10-22 15:28         ` Serge E. Hallyn
2008-10-22 16:02           ` Oren Laadan
2008-10-22 17:03             ` Serge E. Hallyn
2008-10-22 18:32               ` Oren Laadan
2008-10-27  8:27             ` Peter Chubb
2008-10-27 11:03               ` Oren Laadan
2008-10-27 16:42                 ` Dave Hansen
2008-10-27 17:11                   ` Oren Laadan
2008-10-27 20:51                     ` Matt Helsley
2008-10-27 21:20                       ` Serge E. Hallyn
2008-10-27 21:51                       ` Oren Laadan
2008-10-27 22:09                         ` Dave Hansen
2008-10-28 18:33                           ` Serge E. Hallyn
2008-10-20  5:40 ` [RFC v7][PATCH 3/9] x86 support for checkpoint/restart Oren Laadan
2008-10-20  5:40 ` [RFC v7][PATCH 4/9] Dump memory address space Oren Laadan
2008-10-20  5:40 ` [RFC v7][PATCH 5/9] Restore " Oren Laadan
2008-10-20  5:40 ` [RFC v7][PATCH 6/9] Checkpoint/restart: initial documentation Oren Laadan
2008-10-28 16:48   ` Michael Kerrisk
2008-10-20  5:40 ` [RFC v7][PATCH 7/9] Infrastructure for shared objects Oren Laadan
2008-10-20  5:40 ` [RFC v7][PATCH 8/9] Dump open file descriptors Oren Laadan
2008-10-20  5:40 ` [RFC v7][PATCH 9/9] Restore open file descriprtors Oren Laadan
2008-10-21 19:21 ` [RFC v7][PATCH 0/9] Kernel based checkpoint/restart Andrew Morton
2008-10-21 20:41   ` Dave Hansen
2008-10-22  9:20     ` Ingo Molnar
2008-10-22 11:51       ` Daniel Lezcano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081021124130.a002e838.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=containers@lists.linux-foundation.org \
    --cc=dave@linux.vnet.ibm.com \
    --cc=hpa@zytor.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@elte.hu \
    --cc=orenl@cs.columbia.edu \
    --cc=serue@us.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --subject='Re: [RFC v7][PATCH 2/9] General infrastructure for checkpoint restart' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).