LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 0/9] OpenVZ kernel based checkpointing/restart
@ 2008-09-03 10:57 Andrey Mirkin
2008-09-03 10:57 ` [PATCH 1/9] Introduce trivial sys_checkpoint and sys_restore system calls Andrey Mirkin
` (5 more replies)
0 siblings, 6 replies; 69+ messages in thread
From: Andrey Mirkin @ 2008-09-03 10:57 UTC (permalink / raw)
To: linux-kernel; +Cc: containers, Andrey Mirkin
This patchset introduces kernel based checkpointing/restart as it is
implemented in OpenVZ project. This patchset has limited functionality and
are able to checkpoint/restart only single process. Recently Oren Laaden
sent another kernel based implementation of checkpoint/restart. The main
differences between this patchset and Oren's patchset are:
* In this patchset checkpointing initiated not from the process
(right now we do not have a container, only namespaces), Oren's patchset
performs checkpointing from the process context.
* Restart in this patchset is initiated from process, which restarts a new
process (in new namespaces) with saved state. Oren's patchset uses the same
process from which restart was initiated and restore saved state over it.
* Checkpoint/restart functionality in this patchset is implemented as a kernel
module
As checkpointing is initiated not from the process which state should be saved
we should freeze a process before saving its state. Right now Container Freezer
from Matt Helsley can be used for this.
This patchset introduce only a concept how kernel based checkpointing/restart
can be implemented and are able to checkpoint/restart only a single process
with simple VMAs.
I've tried to split my patchset in small patches to make review more easier.
^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH 1/9] Introduce trivial sys_checkpoint and sys_restore system calls
2008-09-03 10:57 [PATCH 0/9] OpenVZ kernel based checkpointing/restart Andrey Mirkin
@ 2008-09-03 10:57 ` Andrey Mirkin
2008-09-03 10:57 ` [PATCH 2/9] Make checkpoint/restart functionality modular Andrey Mirkin
2008-09-03 11:44 ` [PATCH 1/9] Introduce trivial sys_checkpoint and sys_restore system calls Cedric Le Goater
2008-09-03 12:28 ` [PATCH 0/9] OpenVZ kernel based checkpointing/restart Cedric Le Goater
` (4 subsequent siblings)
5 siblings, 2 replies; 69+ messages in thread
From: Andrey Mirkin @ 2008-09-03 10:57 UTC (permalink / raw)
To: linux-kernel; +Cc: containers, Andrey Mirkin
Right now they just return -ENOSYS. Later they will provide functionality
to checkpoint and restart a container.
Both syscalls take as arguments a file descriptor and flags.
Also sys_checkpoint take as the first argument a PID of container's init
(later it will be container ID); sys_restart takes as the first argument
a container ID (right now it will not be used).
Signed-off-by: Andrey Mirkin <major@openvz.org>
---
Makefile | 2 +-
arch/x86/kernel/syscall_table_32.S | 2 +
cpt/Makefile | 1 +
cpt/sys_core.c | 38 ++++++++++++++++++++++++++++++++++++
include/asm-x86/unistd_32.h | 2 +
5 files changed, 44 insertions(+), 1 deletions(-)
create mode 100644 cpt/Makefile
create mode 100644 cpt/sys_core.c
diff --git a/Makefile b/Makefile
index ea413fa..1dee5c0 100644
--- a/Makefile
+++ b/Makefile
@@ -619,7 +619,7 @@ export mod_strip_cmd
ifeq ($(KBUILD_EXTMOD),)
-core-y += kernel/ mm/ fs/ ipc/ security/ crypto/ block/
+core-y += kernel/ mm/ fs/ ipc/ security/ crypto/ block/ cpt/
vmlinux-dirs := $(patsubst %/,%,$(filter %/, $(init-y) $(init-m) \
$(core-y) $(core-m) $(drivers-y) $(drivers-m) \
diff --git a/arch/x86/kernel/syscall_table_32.S b/arch/x86/kernel/syscall_table_32.S
index fd9d4f4..4a0d7fb 100644
--- a/arch/x86/kernel/syscall_table_32.S
+++ b/arch/x86/kernel/syscall_table_32.S
@@ -333,3 +333,5 @@ ENTRY(sys_call_table)
.long sys_pipe2
.long sys_inotify_init1
.long sys_hijack
+ .long sys_checkpoint
+ .long sys_restart /* 335 */
diff --git a/cpt/Makefile b/cpt/Makefile
new file mode 100644
index 0000000..2276fb1
--- /dev/null
+++ b/cpt/Makefile
@@ -0,0 +1 @@
+obj-y += sys_core.o
diff --git a/cpt/sys_core.c b/cpt/sys_core.c
new file mode 100644
index 0000000..1a97fb6
--- /dev/null
+++ b/cpt/sys_core.c
@@ -0,0 +1,38 @@
+/*
+ * Copyright (C) 2008 Parallels, Inc.
+ *
+ * Author: Andrey Mirkin <major@openvz.org>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ *
+ */
+
+#include <linux/sched.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+
+/**
+ * sys_checkpoint - checkpoint a container from outside
+ * @pid: pid of the container init(1) process
+ * TODO: should switch to container id later
+ * @fd: file to which save the checkpoint image
+ * @flags: checkpoint operation flags
+ */
+asmlinkage long sys_checkpoint(pid_t pid, int fd, unsigned long flags)
+{
+ return -ENOSYS;
+}
+
+/**
+ * sys_restart - restart a container
+ * @ctid: container id which should be used to restart a container
+ * @fd: file from which read the checkpoint image
+ * @flags: restart operation flags
+ */
+asmlinkage long sys_restart(int ctid, int fd, unsigned long flags)
+{
+ return -ENOSYS;
+}
diff --git a/include/asm-x86/unistd_32.h b/include/asm-x86/unistd_32.h
index 70280da..1a09604 100644
--- a/include/asm-x86/unistd_32.h
+++ b/include/asm-x86/unistd_32.h
@@ -339,6 +339,8 @@
#define __NR_pipe2 331
#define __NR_inotify_init1 332
#define __NR_hijack 333
+#define __NR_checkpoint 334
+#define __NR_restart 335
#ifdef __KERNEL__
--
1.5.6
^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH 2/9] Make checkpoint/restart functionality modular
2008-09-03 10:57 ` [PATCH 1/9] Introduce trivial sys_checkpoint and sys_restore system calls Andrey Mirkin
@ 2008-09-03 10:57 ` Andrey Mirkin
2008-09-03 10:57 ` [PATCH 3/9] Introduce context structure needed during checkpointing/restart Andrey Mirkin
2008-09-03 14:27 ` [PATCH 2/9] Make checkpoint/restart functionality modular Serge E. Hallyn
2008-09-03 11:44 ` [PATCH 1/9] Introduce trivial sys_checkpoint and sys_restore system calls Cedric Le Goater
1 sibling, 2 replies; 69+ messages in thread
From: Andrey Mirkin @ 2008-09-03 10:57 UTC (permalink / raw)
To: linux-kernel; +Cc: containers, Andrey Mirkin
A config option CONFIG_CHECKPOINT is introduced.
New structure cpt_operations is introduced to store pointers to
checkpoint/restart functions from module.
Signed-off-by: Andrey Mirkin <major@openvz.org>
---
cpt/Kconfig | 7 +++++++
cpt/Makefile | 4 ++++
cpt/cpt.h | 19 +++++++++++++++++++
cpt/sys.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
cpt/sys_core.c | 29 +++++++++++++++++++++++++++--
init/Kconfig | 2 ++
6 files changed, 107 insertions(+), 2 deletions(-)
create mode 100644 cpt/Kconfig
create mode 100644 cpt/cpt.h
create mode 100644 cpt/sys.c
diff --git a/cpt/Kconfig b/cpt/Kconfig
new file mode 100644
index 0000000..b9bc72d
--- /dev/null
+++ b/cpt/Kconfig
@@ -0,0 +1,7 @@
+config CHECKPOINT
+ tristate "Checkpoint & restart for containers"
+ depends on EXPERIMENTAL
+ default n
+ help
+ This option adds module "cptrst", which allow to save a running
+ container to a file and restart it later using this image file.
diff --git a/cpt/Makefile b/cpt/Makefile
index 2276fb1..bfe75d5 100644
--- a/cpt/Makefile
+++ b/cpt/Makefile
@@ -1 +1,5 @@
obj-y += sys_core.o
+
+obj-$(CONFIG_CHECKPOINT) += cptrst.o
+
+cptrst-objs := sys.o
diff --git a/cpt/cpt.h b/cpt/cpt.h
new file mode 100644
index 0000000..381a9bf
--- /dev/null
+++ b/cpt/cpt.h
@@ -0,0 +1,19 @@
+/*
+ * Copyright (C) 2008 Parallels, Inc.
+ *
+ * Author: Andrey Mirkin <major@openvz.org>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ *
+ */
+
+struct cpt_operations
+{
+ struct module * owner;
+ int (*checkpoint)(pid_t pid, int fd, unsigned long flags);
+ int (*restart)(int ctid, int fd, unsigned long flags);
+};
+extern struct cpt_operations cpt_ops;
diff --git a/cpt/sys.c b/cpt/sys.c
new file mode 100644
index 0000000..4051286
--- /dev/null
+++ b/cpt/sys.c
@@ -0,0 +1,48 @@
+/*
+ * Copyright (C) 2008 Parallels, Inc.
+ *
+ * Author: Andrey Mirkin <major@openvz.org>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ *
+ */
+
+#include <linux/sched.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/notifier.h>
+#include <linux/module.h>
+
+#include "cpt.h"
+
+MODULE_LICENSE("GPL");
+
+static int checkpoint(pid_t pid, int fd, unsigned long flags)
+{
+ return -ENOSYS;
+}
+
+static int restart(int ctid, int fd, unsigned long flags)
+{
+ return -ENOSYS;
+}
+
+static int __init init_cptrst(void)
+{
+ cpt_ops.owner = THIS_MODULE;
+ cpt_ops.checkpoint = checkpoint;
+ cpt_ops.restart = restart;
+ return 0;
+}
+module_init(init_cptrst);
+
+static void __exit exit_cptrst(void)
+{
+ cpt_ops.checkpoint = NULL;
+ cpt_ops.restart = NULL;
+ cpt_ops.owner = NULL;
+}
+module_exit(exit_cptrst);
diff --git a/cpt/sys_core.c b/cpt/sys_core.c
index 1a97fb6..5dd1191 100644
--- a/cpt/sys_core.c
+++ b/cpt/sys_core.c
@@ -13,6 +13,13 @@
#include <linux/sched.h>
#include <linux/fs.h>
#include <linux/file.h>
+#include <linux/notifier.h>
+#include <linux/module.h>
+
+#include "cpt.h"
+
+struct cpt_operations cpt_ops = { NULL, NULL, NULL };
+EXPORT_SYMBOL(cpt_ops);
/**
* sys_checkpoint - checkpoint a container from outside
@@ -23,7 +30,16 @@
*/
asmlinkage long sys_checkpoint(pid_t pid, int fd, unsigned long flags)
{
- return -ENOSYS;
+ int ret;
+
+ ret = -ENOSYS;
+
+ if (try_module_get(cpt_ops.owner)) {
+ if (cpt_ops.checkpoint)
+ ret = cpt_ops.checkpoint(pid, fd, flags);
+ module_put(cpt_ops.owner);
+ }
+ return ret;
}
/**
@@ -34,5 +50,14 @@ asmlinkage long sys_checkpoint(pid_t pid, int fd, unsigned long flags)
*/
asmlinkage long sys_restart(int ctid, int fd, unsigned long flags)
{
- return -ENOSYS;
+ int ret;
+
+ ret = -ENOSYS;
+
+ if (try_module_get(cpt_ops.owner)) {
+ if (cpt_ops.restart)
+ ret = cpt_ops.restart(ctid, fd, flags);
+ module_put(cpt_ops.owner);
+ }
+ return ret;
}
diff --git a/init/Kconfig b/init/Kconfig
index 4bd4b0c..d29ed21 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -344,6 +344,8 @@ config CGROUP_FREEZER
Provides a way to freeze and unfreeze all tasks in a
cgroup
+source "cpt/Kconfig"
+
config FAIR_GROUP_SCHED
bool "Group scheduling for SCHED_OTHER"
depends on GROUP_SCHED
--
1.5.6
^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH 3/9] Introduce context structure needed during checkpointing/restart
2008-09-03 10:57 ` [PATCH 2/9] Make checkpoint/restart functionality modular Andrey Mirkin
@ 2008-09-03 10:57 ` Andrey Mirkin
2008-09-03 10:57 ` [PATCH 4/9] Introduce container dump function Andrey Mirkin
` (3 more replies)
2008-09-03 14:27 ` [PATCH 2/9] Make checkpoint/restart functionality modular Serge E. Hallyn
1 sibling, 4 replies; 69+ messages in thread
From: Andrey Mirkin @ 2008-09-03 10:57 UTC (permalink / raw)
To: linux-kernel; +Cc: containers, Andrey Mirkin
Add functions for context allocation/destroy.
Introduce functions to read/write image.
Introduce image header and object header.
Signed-off-by: Andrey Mirkin <major@openvz.org>
---
cpt/cpt.h | 37 ++++++++++++++++
cpt/cpt_image.h | 63 +++++++++++++++++++++++++++
cpt/sys.c | 127 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 224 insertions(+), 3 deletions(-)
create mode 100644 cpt/cpt_image.h
diff --git a/cpt/cpt.h b/cpt/cpt.h
index 381a9bf..607ac1b 100644
--- a/cpt/cpt.h
+++ b/cpt/cpt.h
@@ -10,6 +10,8 @@
*
*/
+#include "cpt_image.h"
+
struct cpt_operations
{
struct module * owner;
@@ -17,3 +19,38 @@ struct cpt_operations
int (*restart)(int ctid, int fd, unsigned long flags);
};
extern struct cpt_operations cpt_ops;
+
+#define CPT_CTX_ERROR -1
+#define CPT_CTX_IDLE 0
+#define CPT_CTX_DUMPING 1
+#define CPT_CTX_UNDUMPING 2
+
+typedef struct cpt_context
+{
+ pid_t pid; /* should be changed to ctid later */
+ int ctx_id; /* context id */
+ struct list_head ctx_list;
+ int refcount;
+ int ctx_state;
+ struct semaphore main_sem;
+
+ int errno;
+
+ struct file *file;
+ loff_t current_object;
+
+ struct list_head object_array[CPT_OBJ_MAX];
+
+ int (*write)(const void *addr, size_t count, struct cpt_context *ctx);
+ int (*read)(void *addr, size_t count, struct cpt_context *ctx);
+} cpt_context_t;
+
+extern int debug_level;
+
+#define cpt_printk(lvl, fmt, args...) do { \
+ if (lvl <= debug_level) \
+ printk(fmt, ##args); \
+ } while (0)
+
+#define eprintk(a...) cpt_printk(1, "CPT ERR: " a)
+#define dprintk(a...) cpt_printk(1, "CPT DBG: " a)
diff --git a/cpt/cpt_image.h b/cpt/cpt_image.h
new file mode 100644
index 0000000..3d26229
--- /dev/null
+++ b/cpt/cpt_image.h
@@ -0,0 +1,63 @@
+/*
+ * Copyright (C) 2008 Parallels, Inc.
+ *
+ * Author: Andrey Mirkin <major@openvz.org>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ *
+ */
+
+#ifndef __CPT_IMAGE_H_
+#define __CPT_IMAGE_H_ 1
+
+enum _cpt_object_type
+{
+ CPT_OBJ_TASK = 0,
+ CPT_OBJ_MAX,
+ /* The objects above are stored in memory while checkpointing */
+
+ CPT_OBJ_HEAD = 1024,
+};
+
+enum _cpt_content_type {
+ CPT_CONTENT_VOID,
+ CPT_CONTENT_ARRAY,
+ CPT_CONTENT_DATA,
+ CPT_CONTENT_NAME,
+ CPT_CONTENT_REF,
+ CPT_CONTENT_MAX
+};
+
+#define CPT_SIGNATURE0 0x79
+#define CPT_SIGNATURE1 0x1c
+#define CPT_SIGNATURE2 0x01
+#define CPT_SIGNATURE3 0x63
+
+struct cpt_head
+{
+ __u8 cpt_signature[4]; /* Magic number */
+ __u32 cpt_hdrlen; /* Header length */
+ __u16 cpt_image_major; /* Format of this file */
+ __u16 cpt_image_minor; /* Format of this file */
+ __u16 cpt_image_sublevel; /* Format of this file */
+ __u16 cpt_image_extra; /* Format of this file */
+ __u16 cpt_arch; /* Architecture */
+ __u16 cpt_pad1;
+ __u32 cpt_pad2;
+#define CPT_ARCH_I386 0
+ __u64 cpt_time; /* Time */
+} __attribute__ ((aligned (8)));
+
+/* Common object header. */
+struct cpt_object_hdr
+{
+ __u64 cpt_len; /* Size of current chunk of data */
+ __u16 cpt_type; /* Type of object */
+ __u32 cpt_hdrlen; /* Size of header */
+ __u16 cpt_content; /* Content type: array, reference... */
+} __attribute__ ((aligned (8)));
+
+#endif /* __CPT_IMAGE_H_ */
diff --git a/cpt/sys.c b/cpt/sys.c
index 4051286..8334c4c 100644
--- a/cpt/sys.c
+++ b/cpt/sys.c
@@ -13,21 +13,142 @@
#include <linux/sched.h>
#include <linux/fs.h>
#include <linux/file.h>
-#include <linux/notifier.h>
+#include <linux/uaccess.h>
#include <linux/module.h>
#include "cpt.h"
+#include "cpt_image.h"
MODULE_LICENSE("GPL");
+/* Debug level, constant for now */
+int debug_level = 1;
+
+static int file_write(const void *addr, size_t count, struct cpt_context *ctx)
+{
+ mm_segment_t oldfs;
+ ssize_t err = -EBADF;
+ struct file *file = ctx->file;
+
+ oldfs = get_fs(); set_fs(KERNEL_DS);
+ if (file)
+ err = file->f_op->write(file, addr, count, &file->f_pos);
+ set_fs(oldfs);
+ if (err != count)
+ return err >= 0 ? -EIO : err;
+ return 0;
+}
+
+static int file_read(void *addr, size_t count, struct cpt_context *ctx)
+{
+ mm_segment_t oldfs;
+ ssize_t err = -EBADF;
+ struct file *file = ctx->file;
+
+ oldfs = get_fs(); set_fs(KERNEL_DS);
+ if (file)
+ err = file->f_op->read(file, addr, count, &file->f_pos);
+ set_fs(oldfs);
+ if (err != count)
+ return err >= 0 ? -EIO : err;
+ return 0;
+}
+
+struct cpt_context * context_alloc(void)
+{
+ struct cpt_context *ctx;
+ int i;
+
+ ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+ if (!ctx)
+ return NULL;
+
+ init_MUTEX(&ctx->main_sem);
+ ctx->refcount = 1;
+
+ ctx->current_object = -1;
+ ctx->write = file_write;
+ ctx->read = file_read;
+ for (i = 0; i < CPT_OBJ_MAX; i++) {
+ INIT_LIST_HEAD(&ctx->object_array[i]);
+ }
+
+ return ctx;
+}
+
+void context_release(struct cpt_context *ctx)
+{
+ ctx->ctx_state = CPT_CTX_ERROR;
+
+ if (ctx->file)
+ fput(ctx->file);
+ kfree(ctx);
+}
+
+static void context_put(struct cpt_context *ctx)
+{
+ if (!--ctx->refcount)
+ context_release(ctx);
+}
+
static int checkpoint(pid_t pid, int fd, unsigned long flags)
{
- return -ENOSYS;
+ struct file *file;
+ struct cpt_context *ctx;
+ int err;
+
+ err = -EBADF;
+ file = fget(fd);
+ if (!file)
+ goto out;
+
+ err = -ENOMEM;
+ ctx = context_alloc();
+ if (!ctx)
+ goto out_file;
+
+ ctx->file = file;
+ ctx->ctx_state = CPT_CTX_DUMPING;
+
+ /* checkpoint */
+ err = -ENOSYS;
+
+ context_put(ctx);
+
+out_file:
+ fput(file);
+out:
+ return err;
}
static int restart(int ctid, int fd, unsigned long flags)
{
- return -ENOSYS;
+ struct file *file;
+ struct cpt_context *ctx;
+ int err;
+
+ err = -EBADF;
+ file = fget(fd);
+ if (!file)
+ goto out;
+
+ err = -ENOMEM;
+ ctx = context_alloc();
+ if (!ctx)
+ goto out_file;
+
+ ctx->file = file;
+ ctx->ctx_state = CPT_CTX_UNDUMPING;
+
+ /* restart */
+ err = -ENOSYS;
+
+ context_put(ctx);
+
+out_file:
+ fput(file);
+out:
+ return err;
}
static int __init init_cptrst(void)
--
1.5.6
^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH 4/9] Introduce container dump function
2008-09-03 10:57 ` [PATCH 3/9] Introduce context structure needed during checkpointing/restart Andrey Mirkin
@ 2008-09-03 10:57 ` Andrey Mirkin
2008-09-03 10:57 ` [PATCH 5/9] Introduce function to dump process Andrey Mirkin
2008-09-03 14:23 ` [PATCH 4/9] Introduce container dump function Serge E. Hallyn
2008-09-03 12:29 ` [PATCH 3/9] Introduce context structure needed during checkpointing/restart Matthieu Fertré
` (2 subsequent siblings)
3 siblings, 2 replies; 69+ messages in thread
From: Andrey Mirkin @ 2008-09-03 10:57 UTC (permalink / raw)
To: linux-kernel; +Cc: containers, Andrey Mirkin
Actually right now we are going to dump only one process.
Function for dumping head of image file are added.
Signed-off-by: Andrey Mirkin <major@openvz.org>
---
cpt/Makefile | 2 +-
cpt/checkpoint.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
cpt/cpt.h | 3 ++
cpt/sys.c | 3 +-
kernel/fork.c | 2 +
5 files changed, 82 insertions(+), 2 deletions(-)
create mode 100644 cpt/checkpoint.c
diff --git a/cpt/Makefile b/cpt/Makefile
index bfe75d5..173346b 100644
--- a/cpt/Makefile
+++ b/cpt/Makefile
@@ -2,4 +2,4 @@ obj-y += sys_core.o
obj-$(CONFIG_CHECKPOINT) += cptrst.o
-cptrst-objs := sys.o
+cptrst-objs := sys.o checkpoint.o
diff --git a/cpt/checkpoint.c b/cpt/checkpoint.c
new file mode 100644
index 0000000..b4d9686
--- /dev/null
+++ b/cpt/checkpoint.c
@@ -0,0 +1,74 @@
+/*
+ * Copyright (C) 2008 Parallels, Inc.
+ *
+ * Author: Andrey Mirkin <major@openvz.org>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ *
+ */
+
+#include <linux/sched.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/version.h>
+
+#include "cpt.h"
+
+static int cpt_write_head(struct cpt_context *ctx)
+{
+ struct cpt_head hdr;
+
+ memset(&hdr, 0, sizeof(hdr));
+ hdr.cpt_signature[0] = CPT_SIGNATURE0;
+ hdr.cpt_signature[1] = CPT_SIGNATURE1;
+ hdr.cpt_signature[2] = CPT_SIGNATURE2;
+ hdr.cpt_signature[3] = CPT_SIGNATURE3;
+ hdr.cpt_hdrlen = sizeof(hdr);
+ hdr.cpt_image_major = (LINUX_VERSION_CODE >> 16) & 0xff;
+ hdr.cpt_image_minor = (LINUX_VERSION_CODE >> 8) & 0xff;
+ hdr.cpt_image_sublevel = (LINUX_VERSION_CODE) & 0xff;
+ hdr.cpt_image_extra = 0;
+#if defined(CONFIG_X86_32)
+ hdr.cpt_arch = CPT_ARCH_I386;
+#else
+#error Arch is not supported
+#endif
+ return ctx->write(&hdr, sizeof(hdr), ctx);
+}
+
+int dump_container(struct cpt_context *ctx)
+{
+ int err;
+ struct task_struct *root;
+
+ read_lock(&tasklist_lock);
+ root = find_task_by_vpid(ctx->pid);
+ if (root)
+ get_task_struct(root);
+ read_unlock(&tasklist_lock);
+
+ err = -ESRCH;
+ if (!root) {
+ eprintk("can not find root task\n");
+ return err;
+ }
+ ctx->nsproxy = root->nsproxy;
+ if (!ctx->nsproxy) {
+ eprintk("nsproxy is null\n");
+ goto out;
+ }
+
+ err = cpt_write_head(ctx);
+
+ /* Dump task here */
+ if (!err)
+ err = -ENOSYS;
+
+out:
+ ctx->nsproxy = NULL;
+ put_task_struct(root);
+ return err;
+}
diff --git a/cpt/cpt.h b/cpt/cpt.h
index 607ac1b..b421a11 100644
--- a/cpt/cpt.h
+++ b/cpt/cpt.h
@@ -33,6 +33,7 @@ typedef struct cpt_context
int refcount;
int ctx_state;
struct semaphore main_sem;
+ struct nsproxy *nsproxy;
int errno;
@@ -54,3 +55,5 @@ extern int debug_level;
#define eprintk(a...) cpt_printk(1, "CPT ERR: " a)
#define dprintk(a...) cpt_printk(1, "CPT DBG: " a)
+
+int dump_container(struct cpt_context *ctx);
diff --git a/cpt/sys.c b/cpt/sys.c
index 8334c4c..6801c22 100644
--- a/cpt/sys.c
+++ b/cpt/sys.c
@@ -109,9 +109,10 @@ static int checkpoint(pid_t pid, int fd, unsigned long flags)
ctx->file = file;
ctx->ctx_state = CPT_CTX_DUMPING;
+ ctx->pid = pid;
/* checkpoint */
- err = -ENOSYS;
+ err = dump_container(ctx);
context_put(ctx);
diff --git a/kernel/fork.c b/kernel/fork.c
index 52b5037..f38b43d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -77,6 +77,7 @@ int max_threads; /* tunable limit on nr_threads */
DEFINE_PER_CPU(unsigned long, process_counts) = 0;
__cacheline_aligned DEFINE_RWLOCK(tasklist_lock); /* outer */
+EXPORT_SYMBOL(tasklist_lock);
int nr_processes(void)
{
@@ -153,6 +154,7 @@ void __put_task_struct(struct task_struct *tsk)
if (!profile_handoff_task(tsk))
free_task(tsk);
}
+EXPORT_SYMBOL(__put_task_struct);
/*
* macro override instead of weak attribute alias, to workaround
--
1.5.6
^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH 5/9] Introduce function to dump process
2008-09-03 10:57 ` [PATCH 4/9] Introduce container dump function Andrey Mirkin
@ 2008-09-03 10:57 ` Andrey Mirkin
2008-09-03 10:57 ` [PATCH 6/9] Introduce functions to dump mm Andrey Mirkin
2008-09-03 14:23 ` [PATCH 4/9] Introduce container dump function Serge E. Hallyn
1 sibling, 1 reply; 69+ messages in thread
From: Andrey Mirkin @ 2008-09-03 10:57 UTC (permalink / raw)
To: linux-kernel; +Cc: containers, Andrey Mirkin
Functions to dump task struct, fpu state and registers are added.
All IDs are saved from the POV of process (container) namespace.
Signed-off-by: Andrey Mirkin <major@openvz.org>
---
cpt/Makefile | 2 +-
cpt/checkpoint.c | 2 +-
cpt/cpt.h | 1 +
cpt/cpt_image.h | 123 +++++++++++++++++++++++++++
cpt/cpt_process.c | 236 +++++++++++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 362 insertions(+), 2 deletions(-)
create mode 100644 cpt/cpt_process.c
diff --git a/cpt/Makefile b/cpt/Makefile
index 173346b..457cc96 100644
--- a/cpt/Makefile
+++ b/cpt/Makefile
@@ -2,4 +2,4 @@ obj-y += sys_core.o
obj-$(CONFIG_CHECKPOINT) += cptrst.o
-cptrst-objs := sys.o checkpoint.o
+cptrst-objs := sys.o checkpoint.o cpt_process.o
diff --git a/cpt/checkpoint.c b/cpt/checkpoint.c
index b4d9686..41c3523 100644
--- a/cpt/checkpoint.c
+++ b/cpt/checkpoint.c
@@ -65,7 +65,7 @@ int dump_container(struct cpt_context *ctx)
/* Dump task here */
if (!err)
- err = -ENOSYS;
+ err = cpt_dump_task(root, ctx);
out:
ctx->nsproxy = NULL;
diff --git a/cpt/cpt.h b/cpt/cpt.h
index b421a11..1bb483d 100644
--- a/cpt/cpt.h
+++ b/cpt/cpt.h
@@ -57,3 +57,4 @@ extern int debug_level;
#define dprintk(a...) cpt_printk(1, "CPT DBG: " a)
int dump_container(struct cpt_context *ctx);
+int cpt_dump_task(struct task_struct *tsk, struct cpt_context *ctx);
diff --git a/cpt/cpt_image.h b/cpt/cpt_image.h
index 3d26229..b7b68e1 100644
--- a/cpt/cpt_image.h
+++ b/cpt/cpt_image.h
@@ -13,6 +13,9 @@
#ifndef __CPT_IMAGE_H_
#define __CPT_IMAGE_H_ 1
+#include <linux/sched.h>
+#include <asm/segment.h>
+
enum _cpt_object_type
{
CPT_OBJ_TASK = 0,
@@ -20,6 +23,8 @@ enum _cpt_object_type
/* The objects above are stored in memory while checkpointing */
CPT_OBJ_HEAD = 1024,
+ CPT_OBJ_X86_REGS,
+ CPT_OBJ_BITS,
};
enum _cpt_content_type {
@@ -28,6 +33,8 @@ enum _cpt_content_type {
CPT_CONTENT_DATA,
CPT_CONTENT_NAME,
CPT_CONTENT_REF,
+ CPT_CONTENT_X86_FPUSTATE,
+ CPT_CONTENT_X86_FPUSTATE_OLD,
CPT_CONTENT_MAX
};
@@ -60,4 +67,120 @@ struct cpt_object_hdr
__u16 cpt_content; /* Content type: array, reference... */
} __attribute__ ((aligned (8)));
+struct cpt_task_image {
+ __u64 cpt_len;
+ __u16 cpt_type;
+ __u32 cpt_hdrlen;
+ __u16 cpt_content;
+
+ __u64 cpt_state;
+ __u64 cpt_flags;
+#define CPT_PF_EXITING 0
+#define CPT_PF_FORKNOEXEC 1
+#define CPT_PF_SUPERPRIV 2
+#define CPT_PF_DUMPCORE 3
+#define CPT_PF_SIGNALED 4
+#define CPT_PF_USED_MATH 5
+
+ __u64 cpt_thrflags;
+ __u64 cpt_thrstatus;
+ __u32 cpt_pid;
+ __u32 cpt_tgid;
+ __u32 cpt_ppid;
+ __u32 cpt_rppid;
+ __u32 cpt_pgrp;
+ __u32 cpt_session;
+ __u32 cpt_old_pgrp;
+ __u32 cpt_leader;
+ __u64 cpt_set_tid;
+ __u64 cpt_clear_tid;
+ __u32 cpt_exit_code;
+ __u32 cpt_exit_signal;
+ __u32 cpt_pdeath_signal;
+ __u32 cpt_user;
+ __u32 cpt_uid;
+ __u32 cpt_euid;
+ __u32 cpt_suid;
+ __u32 cpt_fsuid;
+ __u32 cpt_gid;
+ __u32 cpt_egid;
+ __u32 cpt_sgid;
+ __u32 cpt_fsgid;
+ __u8 cpt_comm[TASK_COMM_LEN];
+ __u64 cpt_tls[GDT_ENTRY_TLS_ENTRIES];
+ __u64 cpt_utime;
+ __u64 cpt_stime;
+ __u64 cpt_utimescaled;
+ __u64 cpt_stimescaled;
+ __u64 cpt_gtime;
+ __u64 cpt_prev_utime;
+ __u64 cpt_prev_stime;
+ __u64 cpt_start_time;
+ __u64 cpt_real_start_time;
+ __u64 cpt_nvcsw;
+ __u64 cpt_nivcsw;
+ __u64 cpt_min_flt;
+ __u64 cpt_maj_flt;
+} __attribute__ ((aligned (8)));
+
+struct cpt_obj_bits
+{
+ __u64 cpt_len;
+ __u16 cpt_type;
+ __u32 cpt_hdrlen;
+ __u16 cpt_content;
+
+ __u32 cpt_size;
+ __u32 __cpt_pad1;
+} __attribute__ ((aligned (8)));
+
+#define CPT_SEG_ZERO 0
+#define CPT_SEG_TLS1 1
+#define CPT_SEG_TLS2 2
+#define CPT_SEG_TLS3 3
+#define CPT_SEG_USER32_DS 4
+#define CPT_SEG_USER32_CS 5
+#define CPT_SEG_USER64_DS 6
+#define CPT_SEG_USER64_CS 7
+#define CPT_SEG_LDT 256
+
+struct cpt_x86_regs
+{
+ __u64 cpt_len;
+ __u16 cpt_type;
+ __u32 cpt_hdrlen;
+ __u16 cpt_content;
+
+ __u32 cpt_debugreg[8];
+ __u32 cpt_gs;
+
+ __u32 cpt_bx;
+ __u32 cpt_cx;
+ __u32 cpt_dx;
+ __u32 cpt_si;
+ __u32 cpt_di;
+ __u32 cpt_bp;
+ __u32 cpt_ax;
+ __u32 cpt_ds;
+ __u32 cpt_es;
+ __u32 cpt_fs;
+ __u32 cpt_orig_ax;
+ __u32 cpt_ip;
+ __u32 cpt_cs;
+ __u32 cpt_flags;
+ __u32 cpt_sp;
+ __u32 cpt_ss;
+} __attribute__ ((aligned (8)));
+
+static inline __u64 cpt_timespec_export(struct timespec *tv)
+{
+ return (((u64)tv->tv_sec) << 32) + tv->tv_nsec;
+}
+
+static inline void cpt_timespec_import(struct timespec *tv, __u64 val)
+{
+ tv->tv_sec = val >> 32;
+ tv->tv_nsec = (val & 0xFFFFFFFF);
+}
+
#endif /* __CPT_IMAGE_H_ */
diff --git a/cpt/cpt_process.c b/cpt/cpt_process.c
new file mode 100644
index 0000000..af4f319
--- /dev/null
+++ b/cpt/cpt_process.c
@@ -0,0 +1,236 @@
+/*
+ * Copyright (C) 2008 Parallels, Inc.
+ *
+ * Author: Andrey Mirkin <major@openvz.org>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ *
+ */
+
+#include <linux/sched.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/version.h>
+#include <linux/nsproxy.h>
+
+#include "cpt.h"
+#include "cpt_image.h"
+
+static unsigned int encode_task_flags(unsigned int task_flags)
+{
+ unsigned int flags = 0;
+
+ if (task_flags & PF_EXITING)
+ flags |= (1 << CPT_PF_EXITING);
+ if (task_flags & PF_FORKNOEXEC)
+ flags |= (1 << CPT_PF_FORKNOEXEC);
+ if (task_flags & PF_SUPERPRIV)
+ flags |= (1 << CPT_PF_SUPERPRIV);
+ if (task_flags & PF_DUMPCORE)
+ flags |= (1 << CPT_PF_DUMPCORE);
+ if (task_flags & PF_SIGNALED)
+ flags |= (1 << CPT_PF_SIGNALED);
+ if (task_flags & PF_USED_MATH)
+ flags |= (1 << CPT_PF_USED_MATH);
+
+ return flags;
+
+}
+
+int cpt_dump_task_struct(struct task_struct *tsk, struct cpt_context *ctx)
+{
+ struct cpt_task_image *t;
+ int i;
+ int err;
+
+ t = kzalloc(sizeof(*t), GFP_KERNEL);
+ if (!t)
+ return -ENOMEM;
+
+ t->cpt_len = sizeof(*t);
+ t->cpt_type = CPT_OBJ_TASK;
+ t->cpt_hdrlen = sizeof(*t);
+ t->cpt_content = CPT_CONTENT_ARRAY;
+
+ t->cpt_state = tsk->state;
+ t->cpt_flags = encode_task_flags(tsk->flags);
+ t->cpt_exit_code = tsk->exit_code;
+ t->cpt_exit_signal = tsk->exit_signal;
+ t->cpt_pdeath_signal = tsk->pdeath_signal;
+ t->cpt_pid = task_pid_nr_ns(tsk, ctx->nsproxy->pid_ns);
+ t->cpt_tgid = task_tgid_nr_ns(tsk, ctx->nsproxy->pid_ns);
+ t->cpt_ppid = tsk->parent ?
+ task_pid_nr_ns(tsk->parent, ctx->nsproxy->pid_ns) : 0;
+ t->cpt_rppid = tsk->real_parent ?
+ task_pid_nr_ns(tsk->real_parent, ctx->nsproxy->pid_ns) : 0;
+ t->cpt_pgrp = task_pgrp_nr_ns(tsk, ctx->nsproxy->pid_ns);
+ t->cpt_session = task_session_nr_ns(tsk, ctx->nsproxy->pid_ns);
+ t->cpt_old_pgrp = 0;
+ if (tsk->signal->tty_old_pgrp)
+ t->cpt_old_pgrp = pid_vnr(tsk->signal->tty_old_pgrp);
+ t->cpt_leader = tsk->group_leader ? task_pid_vnr(tsk->group_leader) : 0;
+ t->cpt_utime = tsk->utime;
+ t->cpt_stime = tsk->stime;
+ t->cpt_utimescaled = tsk->utimescaled;
+ t->cpt_stimescaled = tsk->stimescaled;
+ t->cpt_gtime = tsk->gtime;
+ t->cpt_prev_utime = tsk->prev_utime;
+ t->cpt_prev_stime = tsk->prev_stime;
+ t->cpt_nvcsw = tsk->nvcsw;
+ t->cpt_nivcsw = tsk->nivcsw;
+ t->cpt_start_time = cpt_timespec_export(&tsk->start_time);
+ t->cpt_real_start_time = cpt_timespec_export(&tsk->real_start_time);
+ t->cpt_min_flt = tsk->min_flt;
+ t->cpt_maj_flt = tsk->maj_flt;
+ memcpy(t->cpt_comm, tsk->comm, TASK_COMM_LEN);
+ for (i = 0; i < GDT_ENTRY_TLS_ENTRIES; i++) {
+ t->cpt_tls[i] = (((u64)tsk->thread.tls_array[i].b) << 32) +
+ tsk->thread.tls_array[i].a;
+ }
+ /* TODO: encode thread flags and status like task flags */
+ t->cpt_thrflags = task_thread_info(tsk)->flags & ~(1<<TIF_FREEZE);
+ t->cpt_thrstatus = task_thread_info(tsk)->status;
+ t->cpt_user = tsk->user->uid;
+ t->cpt_uid = tsk->uid;
+ t->cpt_euid = tsk->euid;
+ t->cpt_suid = tsk->suid;
+ t->cpt_fsuid = tsk->fsuid;
+ t->cpt_gid = tsk->gid;
+ t->cpt_egid = tsk->egid;
+ t->cpt_sgid = tsk->sgid;
+ t->cpt_fsgid = tsk->fsgid;
+
+ err = ctx->write(t, sizeof(*t), ctx);
+
+ kfree(t);
+ return err;
+}
+
+static int cpt_dump_fpustate(struct task_struct *tsk, struct cpt_context *ctx)
+{
+ struct cpt_obj_bits hdr;
+ int err;
+ int content;
+ unsigned long size;
+
+ content = CPT_CONTENT_X86_FPUSTATE;
+ size = sizeof(struct i387_fxsave_struct);
+#ifndef CONFIG_X86_64
+ if (!cpu_has_fxsr) {
+ size = sizeof(struct i387_fsave_struct);
+ content = CPT_CONTENT_X86_FPUSTATE_OLD;
+ }
+#endif
+
+ hdr.cpt_len = sizeof(hdr) + size;
+ hdr.cpt_type = CPT_OBJ_BITS;
+ hdr.cpt_hdrlen = sizeof(hdr);
+ hdr.cpt_content = content;
+ hdr.cpt_size = size;
+ err = ctx->write(&hdr, sizeof(hdr), ctx);
+ if (!err)
+ ctx->write(tsk->thread.xstate, size, ctx);
+ return err;
+}
+
+static u32 encode_segment(u32 segreg)
+{
+ segreg &= 0xFFFF;
+
+ if (segreg == 0)
+ return CPT_SEG_ZERO;
+ if ((segreg & 3) != 3) {
+ eprintk("Invalid RPL of a segment reg %x\n", segreg);
+ return CPT_SEG_ZERO;
+ }
+
+ /* LDT descriptor, it is just an index to LDT array */
+ if (segreg & 4)
+ return CPT_SEG_LDT + (segreg >> 3);
+
+ /* TLS descriptor. */
+ if ((segreg >> 3) >= GDT_ENTRY_TLS_MIN &&
+ (segreg >> 3) <= GDT_ENTRY_TLS_MAX)
+ return CPT_SEG_TLS1 + ((segreg>>3) - GDT_ENTRY_TLS_MIN);
+
+ /* One of standard desriptors */
+#ifdef CONFIG_X86_64
+ if (segreg == __USER32_DS)
+ return CPT_SEG_USER32_DS;
+ if (segreg == __USER32_CS)
+ return CPT_SEG_USER32_CS;
+ if (segreg == __USER_DS)
+ return CPT_SEG_USER64_DS;
+ if (segreg == __USER_CS)
+ return CPT_SEG_USER64_CS;
+#else
+ if (segreg == __USER_DS)
+ return CPT_SEG_USER32_DS;
+ if (segreg == __USER_CS)
+ return CPT_SEG_USER32_CS;
+#endif
+ eprintk("Invalid segment reg %x\n", segreg);
+ return CPT_SEG_ZERO;
+}
+
+static int cpt_dump_registers(struct task_struct *tsk, struct cpt_context *ctx)
+{
+ struct cpt_x86_regs ri;
+ struct pt_regs *pt_regs;
+
+ ri.cpt_len = sizeof(ri);
+ ri.cpt_type = CPT_OBJ_X86_REGS;
+ ri.cpt_hdrlen = sizeof(ri);
+ ri.cpt_content = CPT_CONTENT_VOID;
+
+ ri.cpt_debugreg[0] = tsk->thread.debugreg0;
+ ri.cpt_debugreg[1] = tsk->thread.debugreg1;
+ ri.cpt_debugreg[2] = tsk->thread.debugreg2;
+ ri.cpt_debugreg[3] = tsk->thread.debugreg3;
+ ri.cpt_debugreg[4] = 0;
+ ri.cpt_debugreg[5] = 0;
+ ri.cpt_debugreg[6] = tsk->thread.debugreg6;
+ ri.cpt_debugreg[7] = tsk->thread.debugreg7;
+
+ pt_regs = task_pt_regs(tsk);
+
+ ri.cpt_fs = encode_segment(pt_regs->fs);
+ ri.cpt_gs = encode_segment(tsk->thread.gs);
+
+ ri.cpt_bx = pt_regs->bx;
+ ri.cpt_cx = pt_regs->cx;
+ ri.cpt_dx = pt_regs->dx;
+ ri.cpt_si = pt_regs->si;
+ ri.cpt_di = pt_regs->di;
+ ri.cpt_bp = pt_regs->bp;
+ ri.cpt_ax = pt_regs->ax;
+ ri.cpt_ds = encode_segment(pt_regs->ds);
+ ri.cpt_es = encode_segment(pt_regs->es);
+ ri.cpt_orig_ax = pt_regs->orig_ax;
+ ri.cpt_ip = pt_regs->ip;
+ ri.cpt_cs = encode_segment(pt_regs->cs);
+ ri.cpt_flags = pt_regs->flags;
+ ri.cpt_sp = pt_regs->sp;
+ ri.cpt_ss = encode_segment(pt_regs->ss);
+
+ return ctx->write(&ri, sizeof(ri), ctx);
+}
+
+int cpt_dump_task(struct task_struct *tsk, struct cpt_context *ctx)
+{
+ int err;
+
+ err = cpt_dump_task_struct(tsk, ctx);
+
+ /* Dump task mm */
+
+ if (!err)
+ cpt_dump_fpustate(tsk, ctx);
+ if (!err)
+ cpt_dump_registers(tsk, ctx);
+
+ return err;
+}
--
1.5.6
^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH 6/9] Introduce functions to dump mm
2008-09-03 10:57 ` [PATCH 5/9] Introduce function to dump process Andrey Mirkin
@ 2008-09-03 10:57 ` Andrey Mirkin
2008-09-03 10:57 ` [PATCH 7/9] Introduce function for restarting a container Andrey Mirkin
2008-09-03 14:17 ` [PATCH 6/9] Introduce functions to dump mm Louis Rilling
0 siblings, 2 replies; 69+ messages in thread
From: Andrey Mirkin @ 2008-09-03 10:57 UTC (permalink / raw)
To: linux-kernel; +Cc: containers, Andrey Mirkin
Functions to dump mm struct, VMAs and mm context are added.
Signed-off-by: Andrey Mirkin <major@openvz.org>
---
arch/x86/mm/hugetlbpage.c | 2 +
cpt/Makefile | 2 +-
cpt/cpt.h | 1 +
cpt/cpt_image.h | 61 +++++++
cpt/cpt_mm.c | 431 +++++++++++++++++++++++++++++++++++++++++++++
cpt/cpt_process.c | 8 +-
mm/memory.c | 1 +
7 files changed, 497 insertions(+), 5 deletions(-)
create mode 100644 cpt/cpt_mm.c
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 8f307d9..63028e7 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -12,6 +12,7 @@
#include <linux/slab.h>
#include <linux/err.h>
#include <linux/sysctl.h>
+#include <linux/module.h>
#include <asm/mman.h>
#include <asm/tlb.h>
#include <asm/tlbflush.h>
@@ -221,6 +222,7 @@ int pmd_huge(pmd_t pmd)
{
return !!(pmd_val(pmd) & _PAGE_PSE);
}
+EXPORT_SYMBOL(pmd_huge);
int pud_huge(pud_t pud)
{
diff --git a/cpt/Makefile b/cpt/Makefile
index 457cc96..bbb0e37 100644
--- a/cpt/Makefile
+++ b/cpt/Makefile
@@ -2,4 +2,4 @@ obj-y += sys_core.o
obj-$(CONFIG_CHECKPOINT) += cptrst.o
-cptrst-objs := sys.o checkpoint.o cpt_process.o
+cptrst-objs := sys.o checkpoint.o cpt_process.o cpt_mm.o
diff --git a/cpt/cpt.h b/cpt/cpt.h
index 1bb483d..73ae296 100644
--- a/cpt/cpt.h
+++ b/cpt/cpt.h
@@ -58,3 +58,4 @@ extern int debug_level;
int dump_container(struct cpt_context *ctx);
int cpt_dump_task(struct task_struct *tsk, struct cpt_context *ctx);
+int cpt_dump_mm(struct task_struct *tsk, struct cpt_context *ctx);
diff --git a/cpt/cpt_image.h b/cpt/cpt_image.h
index b7b68e1..ae019e7 100644
--- a/cpt/cpt_image.h
+++ b/cpt/cpt_image.h
@@ -16,13 +16,19 @@
#include <linux/sched.h>
#include <asm/segment.h>
+#define CPT_NULL (~0ULL)
+
enum _cpt_object_type
{
CPT_OBJ_TASK = 0,
+ CPT_OBJ_MM,
CPT_OBJ_MAX,
/* The objects above are stored in memory while checkpointing */
CPT_OBJ_HEAD = 1024,
+ CPT_OBJ_VMA,
+ CPT_OBJ_PAGES,
+ CPT_OBJ_NAME,
CPT_OBJ_X86_REGS,
CPT_OBJ_BITS,
};
@@ -35,6 +41,7 @@ enum _cpt_content_type {
CPT_CONTENT_REF,
CPT_CONTENT_X86_FPUSTATE,
CPT_CONTENT_X86_FPUSTATE_OLD,
+ CPT_CONTENT_MM_CONTEXT,
CPT_CONTENT_MAX
};
@@ -123,6 +130,60 @@ struct cpt_task_image {
__u64 cpt_maj_flt;
} __attribute__ ((aligned (8)));
+struct cpt_mm_image {
+ __u64 cpt_len;
+ __u16 cpt_type;
+ __u32 cpt_hdrlen;
+ __u16 cpt_content;
+
+ __u64 cpt_start_code;
+ __u64 cpt_end_code;
+ __u64 cpt_start_data;
+ __u64 cpt_end_data;
+ __u64 cpt_start_brk;
+ __u64 cpt_brk;
+ __u64 cpt_start_stack;
+ __u64 cpt_start_arg;
+ __u64 cpt_end_arg;
+ __u64 cpt_start_env;
+ __u64 cpt_end_env;
+ __u64 cpt_def_flags;
+ __u64 cpt_flags;
+ __u64 cpt_map_count;
+} __attribute__ ((aligned (8)));
+
+struct cpt_vma_image
+{
+ __u64 cpt_len;
+ __u16 cpt_type;
+ __u32 cpt_hdrlen;
+ __u16 cpt_content;
+
+ __u64 cpt_file;
+ __u32 cpt_vma_type;
+#define CPT_VMA_TYPE_0 0
+#define CPT_VMA_FILE 1
+ __u32 cpt_pad;
+
+ __u64 cpt_start;
+ __u64 cpt_end;
+ __u64 cpt_flags;
+ __u64 cpt_pgprot;
+ __u64 cpt_pgoff;
+ __u64 cpt_page_num;
+} __attribute__ ((aligned (8)));
+
+struct cpt_page_block
+{
+ __u64 cpt_len;
+ __u16 cpt_type;
+ __u32 cpt_hdrlen;
+ __u16 cpt_content;
+
+ __u64 cpt_start;
+ __u64 cpt_end;
+} __attribute__ ((aligned (8)));
+
struct cpt_obj_bits
{
__u64 cpt_len;
diff --git a/cpt/cpt_mm.c b/cpt/cpt_mm.c
new file mode 100644
index 0000000..6e025cc
--- /dev/null
+++ b/cpt/cpt_mm.c
@@ -0,0 +1,431 @@
+/*
+ * Copyright (C) 2008 Parallels, Inc.
+ *
+ * Authors: Andrey Mirkin <major@openvz.org>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ *
+ */
+
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/file.h>
+#include <linux/mm.h>
+#include <linux/errno.h>
+#include <linux/major.h>
+#include <linux/mman.h>
+#include <linux/mnt_namespace.h>
+#include <linux/mount.h>
+#include <linux/namei.h>
+#include <linux/pagemap.h>
+#include <linux/hugetlb.h>
+#include <asm/ldt.h>
+
+#include "cpt.h"
+#include "cpt_image.h"
+
+struct page_area
+{
+ int type;
+ unsigned long start;
+ unsigned long end;
+ pgoff_t pgoff;
+ loff_t mm;
+ __u64 list[16];
+};
+
+struct page_desc
+{
+ int type;
+ pgoff_t index;
+ loff_t mm;
+ int shared;
+};
+
+enum {
+ PD_ABSENT,
+ PD_COPY,
+ PD_FUNKEY,
+};
+
+/* 0: page can be obtained from backstore, or still not mapped anonymous page,
+ or something else, which does not requre copy.
+ 1: page requires copy
+ 2: page requres copy but its content is zero. Quite useless.
+ 3: wp page is shared after fork(). It is to be COWed when modified.
+ 4: page is something unsupported... We copy it right now.
+ */
+
+static void page_get_desc(struct vm_area_struct *vma, unsigned long addr,
+ struct page_desc *pdesc, cpt_context_t * ctx)
+{
+ struct mm_struct *mm = vma->vm_mm;
+ pgd_t *pgd;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *ptep, pte;
+ spinlock_t *ptl;
+ struct page *pg = NULL;
+ pgoff_t linear_index = (addr - vma->vm_start)/PAGE_SIZE + vma->vm_pgoff;
+
+ pdesc->index = linear_index;
+ pdesc->shared = 0;
+ pdesc->mm = CPT_NULL;
+
+ if (vma->vm_flags & VM_IO) {
+ pdesc->type = PD_ABSENT;
+ return;
+ }
+
+ pgd = pgd_offset(mm, addr);
+ if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
+ goto out_absent;
+ pud = pud_offset(pgd, addr);
+ if (pud_none(*pud) || unlikely(pud_bad(*pud)))
+ goto out_absent;
+ pmd = pmd_offset(pud, addr);
+ if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
+ goto out_absent;
+#ifdef CONFIG_X86
+ if (pmd_huge(*pmd)) {
+ eprintk("page_huge\n");
+ goto out_unsupported;
+ }
+#endif
+ ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
+ pte = *ptep;
+ pte_unmap(ptep);
+
+ if (pte_none(pte))
+ goto out_absent_unlock;
+
+ if ((pg = vm_normal_page(vma, addr, pte)) == NULL) {
+ pdesc->type = PD_COPY;
+ goto out_unlock;
+ }
+
+ get_page(pg);
+ spin_unlock(ptl);
+
+ if (pg->mapping && !PageAnon(pg)) {
+ if (vma->vm_file == NULL) {
+ eprintk("pg->mapping!=NULL for fileless vma: %08lx\n", addr);
+ goto out_unsupported;
+ }
+ if (vma->vm_file->f_mapping != pg->mapping) {
+ eprintk("pg->mapping!=f_mapping: %08lx %p %p\n",
+ addr, vma->vm_file->f_mapping, pg->mapping);
+ goto out_unsupported;
+ }
+ pdesc->index = (pg->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT));
+ /* Page is in backstore. For us it is like
+ * it is not present.
+ */
+ goto out_absent;
+ }
+
+ if (PageReserved(pg)) {
+ /* Special case: ZERO_PAGE is used, when an
+ * anonymous page is accessed but not written. */
+ if (pg == ZERO_PAGE(addr)) {
+ if (pte_write(pte)) {
+ eprintk("not funny already, writable ZERO_PAGE\n");
+ goto out_unsupported;
+ }
+ goto out_absent;
+ }
+ eprintk("reserved page %lu at %08lx\n", pg->index, addr);
+ goto out_unsupported;
+ }
+
+ if (!pg->mapping) {
+ eprintk("page without mapping at %08lx\n", addr);
+ goto out_unsupported;
+ }
+
+ pdesc->type = PD_COPY;
+
+out_put:
+ if (pg)
+ put_page(pg);
+ return;
+
+out_unlock:
+ spin_unlock(ptl);
+ goto out_put;
+
+out_absent_unlock:
+ spin_unlock(ptl);
+
+out_absent:
+ pdesc->type = PD_ABSENT;
+ goto out_put;
+
+out_unsupported:
+ pdesc->type = PD_FUNKEY;
+ goto out_put;
+}
+
+static int count_vma_pages(struct vm_area_struct *vma, struct cpt_context *ctx)
+{
+ unsigned long addr;
+ int page_num = 0;
+
+ for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
+ struct page_desc pd;
+
+ page_get_desc(vma, addr, &pd, ctx);
+
+ if (pd.type != PD_COPY) {
+ return -EINVAL;
+ } else {
+ page_num += 1;
+ }
+
+ }
+ return page_num;
+}
+
+/* ATTN: We give "current" to get_user_pages(). This is wrong, but get_user_pages()
+ * does not really need this thing. It just stores some page fault stats there.
+ *
+ * BUG: some archs (f.e. sparc64, but not Intel*) require flush cache pages
+ * before accessing vma.
+ */
+static int dump_pages(struct vm_area_struct *vma, unsigned long start,
+ unsigned long end, struct cpt_context *ctx)
+{
+#define MAX_PAGE_BATCH 16
+ struct page *pg[MAX_PAGE_BATCH];
+ int npages = (end - start)/PAGE_SIZE;
+ int count = 0;
+
+ while (count < npages) {
+ int copy = npages - count;
+ int n;
+
+ if (copy > MAX_PAGE_BATCH)
+ copy = MAX_PAGE_BATCH;
+ n = get_user_pages(current, vma->vm_mm, start, copy,
+ 0, 1, pg, NULL);
+ if (n == copy) {
+ int i;
+ for (i=0; i<n; i++) {
+ char *maddr = kmap(pg[i]);
+ ctx->write(maddr, PAGE_SIZE, ctx);
+ kunmap(pg[i]);
+ }
+ } else {
+ eprintk("get_user_pages fault");
+ for ( ; n > 0; n--)
+ page_cache_release(pg[n-1]);
+ return -EFAULT;
+ }
+ start += n*PAGE_SIZE;
+ count += n;
+ for ( ; n > 0; n--)
+ page_cache_release(pg[n-1]);
+ }
+ return 0;
+}
+
+static int dump_page_block(struct vm_area_struct *vma,
+ struct cpt_page_block *pgb,
+ struct cpt_context *ctx)
+{
+ int err;
+ pgb->cpt_len = sizeof(*pgb) + pgb->cpt_end - pgb->cpt_start;
+ pgb->cpt_type = CPT_OBJ_PAGES;
+ pgb->cpt_hdrlen = sizeof(*pgb);
+ pgb->cpt_content = CPT_CONTENT_DATA;
+
+ err = ctx->write(pgb, sizeof(*pgb), ctx);
+ if (!err)
+ err = dump_pages(vma, pgb->cpt_start, pgb->cpt_end, ctx);
+
+ return err;
+}
+
+static int cpt_dump_dentry(struct path *p, cpt_context_t *ctx)
+{
+ int len;
+ char *path;
+ char *buf;
+ struct cpt_object_hdr o;
+
+ buf = (char *)__get_free_page(GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ path = d_path(p, buf, PAGE_SIZE);
+
+ if (IS_ERR(path)) {
+ free_page((unsigned long)buf);
+ return PTR_ERR(path);
+ }
+
+ len = buf + PAGE_SIZE - 1 - path;
+ o.cpt_len = sizeof(o) + len + 1;
+ o.cpt_type = CPT_OBJ_NAME;
+ o.cpt_hdrlen = sizeof(o);
+ o.cpt_content = CPT_CONTENT_NAME;
+ path[len] = 0;
+
+ ctx->write(&o, sizeof(o), ctx);
+ ctx->write(path, len + 1, ctx);
+ free_page((unsigned long)buf);
+
+ return 0;
+}
+
+static int dump_one_vma(struct mm_struct *mm,
+ struct vm_area_struct *vma, struct cpt_context *ctx)
+{
+ struct cpt_vma_image *v;
+ unsigned long addr;
+ int page_num;
+ int err;
+
+ v = kzalloc(sizeof(*v), GFP_KERNEL);
+ if (!v)
+ return -ENOMEM;
+
+ v->cpt_len = sizeof(*v);
+ v->cpt_type = CPT_OBJ_VMA;
+ v->cpt_hdrlen = sizeof(*v);
+ v->cpt_content = CPT_CONTENT_ARRAY;
+
+ v->cpt_start = vma->vm_start;
+ v->cpt_end = vma->vm_end;
+ v->cpt_flags = vma->vm_flags;
+ if (vma->vm_flags & VM_HUGETLB) {
+ eprintk("huge TLB VMAs are still not supported\n");
+ kfree(v);
+ return -EINVAL;
+ }
+ v->cpt_pgprot = vma->vm_page_prot.pgprot;
+ v->cpt_pgoff = vma->vm_pgoff;
+ v->cpt_file = CPT_NULL;
+ v->cpt_vma_type = CPT_VMA_TYPE_0;
+
+ page_num = count_vma_pages(vma, ctx);
+ if (page_num < 0) {
+ kfree(v);
+ return -EINVAL;
+ }
+ v->cpt_page_num = page_num;
+
+ if (vma->vm_file) {
+ v->cpt_file = 0;
+ v->cpt_vma_type = CPT_VMA_FILE;
+ }
+
+ ctx->write(v, sizeof(*v), ctx);
+ kfree(v);
+
+ if (vma->vm_file) {
+ err = cpt_dump_dentry(&vma->vm_file->f_path, ctx);
+ if (err < 0)
+ return err;
+ }
+
+ for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
+ struct page_desc pd;
+ struct cpt_page_block pgb;
+
+ page_get_desc(vma, addr, &pd, ctx);
+
+ if (pd.type == PD_FUNKEY || pd.type == PD_ABSENT) {
+ eprintk("dump_one_vma: funkey page\n");
+ return -EINVAL;
+ }
+
+ pgb.cpt_start = addr;
+ pgb.cpt_end = addr + PAGE_SIZE;
+ dump_page_block(vma, &pgb, ctx);
+ }
+
+ return 0;
+}
+
+static int cpt_dump_mm_context(struct mm_struct *mm, struct cpt_context *ctx)
+{
+#ifdef CONFIG_X86
+ if (mm->context.size) {
+ struct cpt_obj_bits b;
+ int size;
+
+ mutex_lock(&mm->context.lock);
+
+ b.cpt_type = CPT_OBJ_BITS;
+ b.cpt_len = sizeof(b);
+ b.cpt_content = CPT_CONTENT_MM_CONTEXT;
+ b.cpt_size = mm->context.size * LDT_ENTRY_SIZE;
+
+ ctx->write(&b, sizeof(b), ctx);
+
+ size = mm->context.size * LDT_ENTRY_SIZE;
+
+ ctx->write(mm->context.ldt, size, ctx);
+
+ mutex_unlock(&mm->context.lock);
+ }
+#endif
+ return 0;
+}
+
+int cpt_dump_mm(struct task_struct *tsk, struct cpt_context *ctx)
+{
+ struct mm_struct *mm = tsk->mm;
+ struct cpt_mm_image *v;
+ struct vm_area_struct *vma;
+ int err;
+
+ v = kzalloc(sizeof(*v), GFP_KERNEL);
+ if (!v)
+ return -ENOMEM;
+
+ v->cpt_len = sizeof(*v);
+ v->cpt_type = CPT_OBJ_MM;
+ v->cpt_hdrlen = sizeof(*v);
+ v->cpt_content = CPT_CONTENT_ARRAY;
+
+ v->cpt_start_code = mm->start_code;
+ v->cpt_end_code = mm->end_code;
+ v->cpt_start_data = mm->start_data;
+ v->cpt_end_data = mm->end_data;
+ v->cpt_start_brk = mm->start_brk;
+ v->cpt_brk = mm->brk;
+ v->cpt_start_stack = mm->start_stack;
+ v->cpt_start_arg = mm->arg_start;
+ v->cpt_end_arg = mm->arg_end;
+ v->cpt_start_env = mm->env_start;
+ v->cpt_end_env = mm->env_end;
+ v->cpt_def_flags = mm->def_flags;
+ v->cpt_flags = mm->flags;
+ v->cpt_map_count = mm->map_count;
+
+ err = ctx->write(v, sizeof(*v), ctx);
+ kfree(v);
+
+ if (err) {
+ eprintk("error during writing mm\n");
+ return err;
+ }
+
+ for (vma = mm->mmap; vma; vma = vma->vm_next) {
+ int err;
+
+ if ((err = dump_one_vma(mm, vma, ctx)) != 0)
+ return err;
+ }
+
+ if (!err)
+ err = cpt_dump_mm_context(mm, ctx);
+
+ return err;
+}
+
diff --git a/cpt/cpt_process.c b/cpt/cpt_process.c
index af4f319..7c4b981 100644
--- a/cpt/cpt_process.c
+++ b/cpt/cpt_process.c
@@ -225,12 +225,12 @@ int cpt_dump_task(struct task_struct *tsk, struct cpt_context *ctx)
err = cpt_dump_task_struct(tsk, ctx);
- /* Dump task mm */
-
if (!err)
- cpt_dump_fpustate(tsk, ctx);
+ err = cpt_dump_mm(tsk, ctx);
+ if (!err)
+ err = cpt_dump_fpustate(tsk, ctx);
if (!err)
- cpt_dump_registers(tsk, ctx);
+ err = cpt_dump_registers(tsk, ctx);
return err;
}
diff --git a/mm/memory.c b/mm/memory.c
index 1002f47..479a294 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -481,6 +481,7 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
out:
return pfn_to_page(pfn);
}
+EXPORT_SYMBOL(vm_normal_page);
/*
* copy one vm_area from one task to the other. Assumes the page tables
--
1.5.6
^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH 7/9] Introduce function for restarting a container
2008-09-03 10:57 ` [PATCH 6/9] Introduce functions to dump mm Andrey Mirkin
@ 2008-09-03 10:57 ` Andrey Mirkin
2008-09-03 10:57 ` [PATCH 8/9] Introduce functions to restart a process Andrey Mirkin
2008-09-03 14:17 ` [PATCH 6/9] Introduce functions to dump mm Louis Rilling
1 sibling, 1 reply; 69+ messages in thread
From: Andrey Mirkin @ 2008-09-03 10:57 UTC (permalink / raw)
To: linux-kernel; +Cc: containers, Andrey Mirkin
Actually, right now this function will restart only one process.
Function to read head of dump file is introduced.
Signed-off-by: Andrey Mirkin <major@openvz.org>
---
cpt/Makefile | 2 +-
cpt/cpt.h | 1 +
cpt/restart.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
cpt/sys.c | 2 +-
4 files changed, 90 insertions(+), 2 deletions(-)
create mode 100644 cpt/restart.c
diff --git a/cpt/Makefile b/cpt/Makefile
index bbb0e37..47c7852 100644
--- a/cpt/Makefile
+++ b/cpt/Makefile
@@ -2,4 +2,4 @@ obj-y += sys_core.o
obj-$(CONFIG_CHECKPOINT) += cptrst.o
-cptrst-objs := sys.o checkpoint.o cpt_process.o cpt_mm.o
+cptrst-objs := sys.o checkpoint.o cpt_process.o cpt_mm.o restart.o
diff --git a/cpt/cpt.h b/cpt/cpt.h
index 73ae296..6471246 100644
--- a/cpt/cpt.h
+++ b/cpt/cpt.h
@@ -59,3 +59,4 @@ extern int debug_level;
int dump_container(struct cpt_context *ctx);
int cpt_dump_task(struct task_struct *tsk, struct cpt_context *ctx);
int cpt_dump_mm(struct task_struct *tsk, struct cpt_context *ctx);
+int restart_container(struct cpt_context *ctx);
diff --git a/cpt/restart.c b/cpt/restart.c
new file mode 100644
index 0000000..5770985
--- /dev/null
+++ b/cpt/restart.c
@@ -0,0 +1,87 @@
+/*
+ * Copyright (C) 2008 Parallels, Inc.
+ *
+ * Author: Andrey Mirkin <major@openvz.org>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ *
+ */
+
+#include <linux/sched.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/version.h>
+
+#include "cpt.h"
+#include "cpt_image.h"
+
+int rst_get_object(int type, void *tmp, int size, struct cpt_context *ctx)
+{
+ int err;
+ struct cpt_object_hdr *hdr = tmp;
+ err = ctx->read(hdr, sizeof(struct cpt_object_hdr), ctx);
+ if (err)
+ return err;
+ if (type > 0 && type != hdr->cpt_type)
+ return -EINVAL;
+ if (hdr->cpt_hdrlen < sizeof(struct cpt_object_hdr))
+ return -EINVAL;
+ if (size < sizeof(struct cpt_object_hdr))
+ return -EINVAL;
+ if (hdr->cpt_len < hdr->cpt_hdrlen)
+ return -EINVAL;
+ if (size > hdr->cpt_hdrlen)
+ size = hdr->cpt_hdrlen;
+ if (size > sizeof(*hdr))
+ err = ctx->read(hdr + 1, size - sizeof(*hdr), ctx);
+ return err;
+}
+
+static int rst_read_head(struct cpt_context *ctx)
+{
+ struct cpt_head hdr;
+ int err;
+
+ err = -EBADF;
+ if (!ctx->file)
+ return err;
+
+ err = ctx->read(&hdr, sizeof(hdr), ctx);
+ if (err < 0)
+ return err;
+
+ if (hdr.cpt_signature[0] != CPT_SIGNATURE0 ||
+ hdr.cpt_signature[1] != CPT_SIGNATURE1 ||
+ hdr.cpt_signature[2] != CPT_SIGNATURE2 ||
+ hdr.cpt_signature[3] != CPT_SIGNATURE3) {
+ return -EINVAL;
+ }
+ if (KERNEL_VERSION(hdr.cpt_image_major, hdr.cpt_image_minor,
+ hdr.cpt_image_sublevel) != LINUX_VERSION_CODE)
+ return -EINVAL;
+
+#if defined(CONFIG_X86_32)
+ if (hdr.cpt_arch != CPT_ARCH_I386)
+ return -ENOSYS;
+#else
+#error Arch is not supported
+#endif
+
+ return 0;
+}
+
+int restart_container(struct cpt_context *ctx)
+{
+ int err;
+
+ err = rst_read_head(ctx);
+
+ /* Restart process */
+ if (!err)
+ err = -ENOSYS;
+
+ return err;
+}
diff --git a/cpt/sys.c b/cpt/sys.c
index 6801c22..5b453f2 100644
--- a/cpt/sys.c
+++ b/cpt/sys.c
@@ -142,7 +142,7 @@ static int restart(int ctid, int fd, unsigned long flags)
ctx->ctx_state = CPT_CTX_UNDUMPING;
/* restart */
- err = -ENOSYS;
+ err = restart_container(ctx);
context_put(ctx);
--
1.5.6
^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH 8/9] Introduce functions to restart a process
2008-09-03 10:57 ` [PATCH 7/9] Introduce function for restarting a container Andrey Mirkin
@ 2008-09-03 10:57 ` Andrey Mirkin
2008-09-03 10:57 ` [PATCH 9/9] Introduce functions to restore mm Andrey Mirkin
2008-09-03 14:32 ` [PATCH 8/9] Introduce functions to restart a process Louis Rilling
0 siblings, 2 replies; 69+ messages in thread
From: Andrey Mirkin @ 2008-09-03 10:57 UTC (permalink / raw)
To: linux-kernel; +Cc: containers, Andrey Mirkin
Functions to restart process, restore its state, fpu and registers are added.
Signed-off-by: Andrey Mirkin <major@openvz.org>
---
arch/x86/kernel/entry_32.S | 21 +++
arch/x86/kernel/process_32.c | 3 +
cpt/Makefile | 3 +-
cpt/cpt.h | 2 +
cpt/restart.c | 2 +-
cpt/rst_process.c | 277 ++++++++++++++++++++++++++++++++++++++++++
kernel/sched.c | 1 +
7 files changed, 307 insertions(+), 2 deletions(-)
create mode 100644 cpt/rst_process.c
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index 109792b..a4848a3 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -225,6 +225,7 @@ ENTRY(ret_from_fork)
GET_THREAD_INFO(%ebp)
popl %eax
CFI_ADJUST_CFA_OFFSET -4
+ret_from_fork_tail:
pushl $0x0202 # Reset kernel eflags
CFI_ADJUST_CFA_OFFSET 4
popfl
@@ -233,6 +234,26 @@ ENTRY(ret_from_fork)
CFI_ENDPROC
END(ret_from_fork)
+ENTRY(i386_ret_from_resume)
+ CFI_STARTPROC
+ pushl %eax
+ CFI_ADJUST_CFA_OFFSET 4
+ call schedule_tail
+ GET_THREAD_INFO(%ebp)
+ popl %eax
+ CFI_ADJUST_CFA_OFFSET -4
+ movl (%esp), %eax
+ testl %eax, %eax
+ jz 1f
+ pushl %esp
+ call *%eax
+ addl $4, %esp
+1:
+ addl $256, %esp
+ jmp ret_from_fork_tail
+ CFI_ENDPROC
+END(i386_ret_from_resume)
+
/*
* Return to user mode is not as complex as all this looks,
* but we want the default path for a system call return to
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 4711eed..1bdec02 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -58,6 +58,9 @@
#include <asm/kdebug.h>
asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
+EXPORT_SYMBOL(ret_from_fork);
+asmlinkage void i386_ret_from_resume(void) __asm__("i386_ret_from_resume");
+EXPORT_SYMBOL(i386_ret_from_resume);
DEFINE_PER_CPU(struct task_struct *, current_task) = &init_task;
EXPORT_PER_CPU_SYMBOL(current_task);
diff --git a/cpt/Makefile b/cpt/Makefile
index 47c7852..689a0eb 100644
--- a/cpt/Makefile
+++ b/cpt/Makefile
@@ -2,4 +2,5 @@ obj-y += sys_core.o
obj-$(CONFIG_CHECKPOINT) += cptrst.o
-cptrst-objs := sys.o checkpoint.o cpt_process.o cpt_mm.o restart.o
+cptrst-objs := sys.o checkpoint.o cpt_process.o cpt_mm.o restart.o \
+ rst_process.o
diff --git a/cpt/cpt.h b/cpt/cpt.h
index 6471246..d59255f 100644
--- a/cpt/cpt.h
+++ b/cpt/cpt.h
@@ -60,3 +60,5 @@ int dump_container(struct cpt_context *ctx);
int cpt_dump_task(struct task_struct *tsk, struct cpt_context *ctx);
int cpt_dump_mm(struct task_struct *tsk, struct cpt_context *ctx);
int restart_container(struct cpt_context *ctx);
+int rst_get_object(int type, void *tmp, int size, struct cpt_context *ctx);
+int rst_restart_process(struct cpt_context *ctx);
diff --git a/cpt/restart.c b/cpt/restart.c
index 5770985..e856955 100644
--- a/cpt/restart.c
+++ b/cpt/restart.c
@@ -81,7 +81,7 @@ int restart_container(struct cpt_context *ctx)
/* Restart process */
if (!err)
- err = -ENOSYS;
+ err = rst_restart_process(ctx);
return err;
}
diff --git a/cpt/rst_process.c b/cpt/rst_process.c
new file mode 100644
index 0000000..6d47f3c
--- /dev/null
+++ b/cpt/rst_process.c
@@ -0,0 +1,277 @@
+/*
+ * Copyright (C) 2008 Parallels, Inc.
+ *
+ * Author: Andrey Mirkin <major@openvz.org>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ *
+ */
+
+#include <linux/sched.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/version.h>
+#include <linux/module.h>
+
+#include "cpt.h"
+#include "cpt_image.h"
+
+#define HOOK_RESERVE 256
+
+struct thr_context {
+ struct completion complete;
+ int error;
+ struct cpt_context *ctx;
+ struct task_struct *tsk;
+};
+
+int local_kernel_thread(int (*fn)(void *), void * arg, unsigned long flags, pid_t pid)
+{
+ pid_t ret;
+
+ if (current->fs == NULL) {
+ /* do_fork_pid() hates processes without fs, oopses. */
+ eprintk("local_kernel_thread: current->fs==NULL\n");
+ return -EINVAL;
+ }
+ if (!try_module_get(THIS_MODULE))
+ return -EBUSY;
+ ret = kernel_thread(fn, arg, flags);
+ if (ret < 0)
+ module_put(THIS_MODULE);
+ return ret;
+}
+
+static unsigned int decode_task_flags(unsigned int task_flags)
+{
+ unsigned int flags = 0;
+
+ if (task_flags & (1 << CPT_PF_EXITING))
+ flags |= PF_EXITING;
+ if (task_flags & (1 << CPT_PF_FORKNOEXEC))
+ flags |= PF_FORKNOEXEC;
+ if (task_flags & (1 << CPT_PF_SUPERPRIV))
+ flags |= PF_SUPERPRIV;
+ if (task_flags & (1 << CPT_PF_DUMPCORE))
+ flags |= PF_DUMPCORE;
+ if (task_flags & (1 << CPT_PF_SIGNALED))
+ flags |= PF_SIGNALED;
+
+ return flags;
+
+}
+
+int rst_restore_task_struct(struct task_struct *tsk, struct cpt_task_image *ti,
+ struct cpt_context *ctx)
+{
+ int i;
+
+ /* Restore only saved flags, comm and tls for now */
+ tsk->flags = decode_task_flags(ti->cpt_flags);
+ clear_tsk_thread_flag(tsk, TIF_FREEZE);
+ memcpy(tsk->comm, ti->cpt_comm, TASK_COMM_LEN);
+ for (i = 0; i < GDT_ENTRY_TLS_ENTRIES; i++) {
+ tsk->thread.tls_array[i].a = ti->cpt_tls[i] & 0xFFFFFFFF;
+ tsk->thread.tls_array[i].b = ti->cpt_tls[i] >> 32;
+ }
+
+ return 0;
+}
+
+static int rst_restore_fpustate(struct task_struct *tsk, struct cpt_task_image *ti,
+ struct cpt_context *ctx)
+{
+ struct cpt_obj_bits hdr;
+ int err;
+ char *buf;
+
+ clear_stopped_child_used_math(tsk);
+
+ err = rst_get_object(CPT_OBJ_BITS, &hdr, sizeof(hdr), ctx);
+ if (err < 0)
+ return err;
+
+ buf = kmalloc(hdr.cpt_size, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ err = ctx->read(buf, hdr.cpt_size, ctx);
+ if (err)
+ goto out;
+
+ if (hdr.cpt_content == CPT_CONTENT_X86_FPUSTATE && cpu_has_fxsr) {
+ memcpy(&tsk->thread.xstate, buf,
+ sizeof(struct i387_fxsave_struct));
+ if (ti->cpt_flags & CPT_PF_USED_MATH)
+ set_stopped_child_used_math(tsk);
+ }
+#ifndef CONFIG_X86_64
+ else if (hdr.cpt_content == CPT_CONTENT_X86_FPUSTATE_OLD &&
+ !cpu_has_fxsr) {
+ memcpy(&tsk->thread.xstate, buf,
+ sizeof(struct i387_fsave_struct));
+ if (ti->cpt_flags & CPT_PF_USED_MATH)
+ set_stopped_child_used_math(tsk);
+ }
+#endif
+
+out:
+ kfree(buf);
+ return err;
+}
+
+static u32 decode_segment(u32 segid)
+{
+ if (segid == CPT_SEG_ZERO)
+ return 0;
+
+ /* TLS descriptors */
+ if (segid <= CPT_SEG_TLS3)
+ return ((GDT_ENTRY_TLS_MIN + segid - CPT_SEG_TLS1) << 3) + 3;
+
+ /* LDT descriptor, it is just an index to LDT array */
+ if (segid >= CPT_SEG_LDT)
+ return ((segid - CPT_SEG_LDT) << 3) | 7;
+
+ /* Check for one of standard descriptors */
+ if (segid == CPT_SEG_USER32_DS)
+ return __USER_DS;
+ if (segid == CPT_SEG_USER32_CS)
+ return __USER_CS;
+
+ eprintk("Invalid segment reg %d\n", segid);
+ return 0;
+}
+
+static int rst_restore_registers(struct task_struct *tsk, struct cpt_context *ctx)
+{
+ struct cpt_x86_regs ri;
+ struct pt_regs *regs = task_pt_regs(tsk);
+ extern char i386_ret_from_resume;
+ int err;
+
+ err = rst_get_object(CPT_OBJ_X86_REGS, &ri, sizeof(ri), ctx);
+ if (err < 0)
+ return err;
+
+ tsk->thread.sp = (unsigned long) regs;
+ tsk->thread.sp0 = (unsigned long) (regs+1);
+ tsk->thread.ip = (unsigned long) &i386_ret_from_resume;
+
+ tsk->thread.gs = decode_segment(ri.cpt_gs);
+ tsk->thread.debugreg0 = ri.cpt_debugreg[0];
+ tsk->thread.debugreg1 = ri.cpt_debugreg[1];
+ tsk->thread.debugreg2 = ri.cpt_debugreg[2];
+ tsk->thread.debugreg3 = ri.cpt_debugreg[3];
+ tsk->thread.debugreg6 = ri.cpt_debugreg[6];
+ tsk->thread.debugreg7 = ri.cpt_debugreg[7];
+
+ regs->bx = ri.cpt_bx;
+ regs->cx = ri.cpt_cx;
+ regs->dx = ri.cpt_dx;
+ regs->si = ri.cpt_si;
+ regs->di = ri.cpt_di;
+ regs->bp = ri.cpt_bp;
+ regs->ax = ri.cpt_ax;
+ regs->orig_ax = ri.cpt_orig_ax;
+ regs->ip = ri.cpt_ip;
+ regs->flags = ri.cpt_flags;
+ regs->sp = ri.cpt_sp;
+
+ regs->cs = decode_segment(ri.cpt_cs);
+ regs->ss = decode_segment(ri.cpt_ss);
+ regs->ds = decode_segment(ri.cpt_ds);
+ regs->es = decode_segment(ri.cpt_es);
+ regs->fs = decode_segment(ri.cpt_fs);
+
+ tsk->thread.sp -= HOOK_RESERVE;
+ memset((void*)tsk->thread.sp, 0, HOOK_RESERVE);
+
+ return 0;
+}
+
+static int restart_thread(void *arg)
+{
+ struct thr_context *thr_ctx = arg;
+ struct cpt_context *ctx;
+ struct cpt_task_image *ti;
+ int err;
+
+ current->state = TASK_UNINTERRUPTIBLE;
+
+ ctx = thr_ctx->ctx;
+ ti = kmalloc(sizeof(*ti), GFP_KERNEL);
+ if (!ti)
+ return -ENOMEM;
+
+ err = rst_get_object(CPT_OBJ_TASK, ti, sizeof(*ti), ctx);
+ if (!err)
+ err = rst_restore_task_struct(current, ti, ctx);
+ /* Restore mm here */
+ if (!err)
+ err = rst_restore_fpustate(current, ti, ctx);
+ if (!err)
+ err = rst_restore_registers(current, ctx);
+
+ thr_ctx->error = err;
+ complete(&thr_ctx->complete);
+
+ if (!err && (ti->cpt_state & (EXIT_ZOMBIE|EXIT_DEAD))) {
+ do_exit(ti->cpt_exit_code);
+ } else {
+ __set_current_state(TASK_UNINTERRUPTIBLE);
+ }
+
+ kfree(ti);
+ schedule();
+
+ eprintk("leaked %d/%d %p\n", task_pid_nr(current), task_pid_vnr(current), current->mm);
+
+ module_put(THIS_MODULE);
+ complete_and_exit(NULL, 0);
+ return 0;
+}
+static int create_root_task(struct cpt_context *ctx,
+ struct thr_context *thr_ctx)
+{
+ struct task_struct *tsk;
+ int pid;
+
+ thr_ctx->ctx = ctx;
+ thr_ctx->error = 0;
+ init_completion(&thr_ctx->complete);
+
+ /* We should also create container here */
+ pid = local_kernel_thread(restart_thread, thr_ctx,
+ CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
+ CLONE_NEWUSER | CLONE_NEWPID | CLONE_NEWNET, 0);
+ if (pid < 0)
+ return pid;
+ read_lock(&tasklist_lock);
+ tsk = find_task_by_vpid(pid);
+ if (tsk)
+ get_task_struct(tsk);
+ read_unlock(&tasklist_lock);
+ if (tsk == NULL)
+ return -ESRCH;
+ thr_ctx->tsk = tsk;
+ return 0;
+}
+
+int rst_restart_process(struct cpt_context *ctx)
+{
+ struct thr_context thr_ctx_root;
+ int err;
+
+ err = create_root_task(ctx, &thr_ctx_root);
+ if (err)
+ return err;
+
+ wait_for_completion(&thr_ctx_root.complete);
+ wait_task_inactive(thr_ctx_root.tsk, 0);
+
+ return err;
+}
diff --git a/kernel/sched.c b/kernel/sched.c
index 04160d2..94a23e5 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1970,6 +1970,7 @@ unsigned long wait_task_inactive(struct task_struct *p, long match_state)
return ncsw;
}
+EXPORT_SYMBOL(wait_task_inactive);
/***
* kick_process - kick a running thread to enter/exit the kernel
--
1.5.6
^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH 9/9] Introduce functions to restore mm
2008-09-03 10:57 ` [PATCH 8/9] Introduce functions to restart a process Andrey Mirkin
@ 2008-09-03 10:57 ` Andrey Mirkin
2008-09-03 14:32 ` [PATCH 8/9] Introduce functions to restart a process Louis Rilling
1 sibling, 0 replies; 69+ messages in thread
From: Andrey Mirkin @ 2008-09-03 10:57 UTC (permalink / raw)
To: linux-kernel; +Cc: containers, Andrey Mirkin
Functions to restore mm, VMAs and mm context are added.
Signed-off-by: Andrey Mirkin <major@openvz.org>
---
cpt/Makefile | 2 +-
cpt/cpt.h | 1 +
cpt/cpt_image.h | 5 +
cpt/rst_mm.c | 320 +++++++++++++++++++++++++++++++++++++++++++++++++++++
cpt/rst_process.c | 3 +-
mm/mmap.c | 1 +
mm/mprotect.c | 2 +
8 files changed, 336 insertions(+), 2 deletions(-)
create mode 100644 cpt/rst_mm.c
diff --git a/cpt/Makefile b/cpt/Makefile
index 689a0eb..19ca732 100644
--- a/cpt/Makefile
+++ b/cpt/Makefile
@@ -3,4 +3,4 @@ obj-y += sys_core.o
obj-$(CONFIG_CHECKPOINT) += cptrst.o
cptrst-objs := sys.o checkpoint.o cpt_process.o cpt_mm.o restart.o \
- rst_process.o
+ rst_process.o rst_mm.o
diff --git a/cpt/cpt.h b/cpt/cpt.h
index d59255f..f2a1b28 100644
--- a/cpt/cpt.h
+++ b/cpt/cpt.h
@@ -62,3 +62,4 @@ int cpt_dump_mm(struct task_struct *tsk, struct cpt_context *ctx);
int restart_container(struct cpt_context *ctx);
int rst_get_object(int type, void *tmp, int size, struct cpt_context *ctx);
int rst_restart_process(struct cpt_context *ctx);
+int rst_restore_mm(struct cpt_context *ctx);
diff --git a/cpt/cpt_image.h b/cpt/cpt_image.h
index ae019e7..3b5b418 100644
--- a/cpt/cpt_image.h
+++ b/cpt/cpt_image.h
@@ -233,6 +233,11 @@ struct cpt_x86_regs
__u32 cpt_ss;
} __attribute__ ((aligned (8)));
+static inline void __user * cpt_ptr_import(__u64 ptr)
+{
+ return (void*)(unsigned long)ptr;
+}
+
static inline __u64 cpt_timespec_export(struct timespec *tv)
{
return (((u64)tv->tv_sec) << 32) + tv->tv_nsec;
diff --git a/cpt/rst_mm.c b/cpt/rst_mm.c
new file mode 100644
index 0000000..cccbff6
--- /dev/null
+++ b/cpt/rst_mm.c
@@ -0,0 +1,320 @@
+/*
+ * Copyright (C) 2008 Parallels, Inc.
+ *
+ * Author: Andrey Mirkin <major@openvz.org>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation, version 2 of the
+ * License.
+ *
+ */
+
+#include <linux/sched.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/version.h>
+#include <linux/module.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/highmem.h>
+#include <linux/pagemap.h>
+#include <linux/vmalloc.h>
+#include <linux/syscalls.h>
+
+#include "cpt.h"
+#include "cpt_image.h"
+
+static unsigned long make_prot(struct cpt_vma_image *vmai)
+{
+ unsigned long prot = 0;
+
+ if (vmai->cpt_flags & VM_READ)
+ prot |= PROT_READ;
+ if (vmai->cpt_flags & VM_WRITE)
+ prot |= PROT_WRITE;
+ if (vmai->cpt_flags & VM_EXEC)
+ prot |= PROT_EXEC;
+ if (vmai->cpt_flags & VM_GROWSDOWN)
+ prot |= PROT_GROWSDOWN;
+ if (vmai->cpt_flags & VM_GROWSUP)
+ prot |= PROT_GROWSUP;
+ return prot;
+}
+
+static unsigned long make_flags(struct cpt_vma_image *vmai)
+{
+ unsigned long flags = MAP_FIXED;
+
+ if (vmai->cpt_flags&(VM_SHARED|VM_MAYSHARE))
+ flags |= MAP_SHARED;
+ else
+ flags |= MAP_PRIVATE;
+
+ if (vmai->cpt_file == CPT_NULL)
+ flags |= MAP_ANONYMOUS;
+ if (vmai->cpt_flags & VM_GROWSDOWN)
+ flags |= MAP_GROWSDOWN;
+#ifdef MAP_GROWSUP
+ if (vmai->cpt_flags & VM_GROWSUP)
+ flags |= MAP_GROWSUP;
+#endif
+ if (vmai->cpt_flags & VM_DENYWRITE)
+ flags |= MAP_DENYWRITE;
+ if (vmai->cpt_flags & VM_EXECUTABLE)
+ flags |= MAP_EXECUTABLE;
+ if (!(vmai->cpt_flags & VM_ACCOUNT))
+ flags |= MAP_NORESERVE;
+ return flags;
+}
+
+static int rst_restore_one_vma(struct cpt_context *ctx)
+{
+ int err;
+ int i;
+ unsigned long addr;
+ struct mm_struct *mm = current->mm;
+ struct cpt_vma_image vmai;
+ struct vm_area_struct *vma;
+ struct file *file = NULL;
+ unsigned long prot;
+
+ err = rst_get_object(CPT_OBJ_VMA, &vmai, sizeof(vmai), ctx);
+ if (err)
+ return err;
+
+ prot = make_prot(&vmai);
+
+ if (vmai.cpt_vma_type == CPT_VMA_FILE) {
+ struct cpt_object_hdr h;
+ int len;
+ char *path;
+
+ err = rst_get_object(CPT_OBJ_NAME, &h, sizeof(h), ctx);
+ if (err)
+ goto out;
+ len = h.cpt_len - sizeof(h);
+ if (len < 0) {
+ err = -EINVAL;
+ goto out;
+ }
+ path = kmalloc(len, GFP_KERNEL);
+ if (!path) {
+ err = -ENOMEM;
+ goto out;
+ }
+ err = ctx->read(path, len, ctx);
+ if (err) {
+ kfree(path);
+ goto out;
+ }
+
+ /* Just open file
+ TODO: open with correct flags */
+ file = filp_open(path, O_RDONLY, 0);
+ kfree(path);
+ if (IS_ERR(file)) {
+ err = PTR_ERR(file);
+ goto out;
+ }
+ }
+
+ down_write(&mm->mmap_sem);
+ addr = do_mmap_pgoff(file, vmai.cpt_start,
+ vmai.cpt_end - vmai.cpt_start,
+ prot, make_flags(&vmai),
+ vmai.cpt_pgoff);
+
+ if (addr != vmai.cpt_start) {
+ up_write(&mm->mmap_sem);
+
+ err = -EINVAL;
+ if (IS_ERR((void*)addr))
+ err = addr;
+ goto out;
+ }
+
+ vma = find_vma(mm, vmai.cpt_start);
+ if (vma == NULL) {
+ up_write(&mm->mmap_sem);
+ eprintk("cannot find mmapped vma\n");
+ err = -ESRCH;
+ goto out;
+ }
+
+ /* do_mmap_pgoff() can merge new area to previous one (not to the next,
+ * we mmap in order, the rest of mm is still unmapped). This can happen
+ * f.e. if flags are to be adjusted later, or if we had different
+ * anon_vma on two adjacent regions. Split it by brute force. */
+ if (vma->vm_start != vmai.cpt_start) {
+ err = split_vma(mm, vma, (unsigned long)vmai.cpt_start, 0);
+ if (err) {
+ up_write(&mm->mmap_sem);
+ eprintk("cannot split vma\n");
+ goto out;
+ }
+ }
+ up_write(&mm->mmap_sem);
+
+ for (i = 0; i < vmai.cpt_page_num; i++) {
+ struct cpt_page_block pb;
+
+ err = rst_get_object(CPT_OBJ_PAGES, &pb, sizeof(pb), ctx);
+ if (err)
+ goto out;
+ if (!(vmai.cpt_flags & VM_ACCOUNT) && !(prot & PROT_WRITE)) {
+ /* I guess this is get_user_pages() messed things,
+ * this happens f.e. when gdb inserts breakpoints.
+ */
+ int j;
+ for (j = 0; j < (pb.cpt_end-pb.cpt_start)/PAGE_SIZE; j++) {
+ struct page *page;
+ void *maddr;
+ err = get_user_pages(current, current->mm,
+ (unsigned long)pb.cpt_start +
+ j * PAGE_SIZE,
+ 1, 1, 1, &page, NULL);
+ if (err == 0)
+ err = -EFAULT;
+ if (err < 0) {
+ eprintk("get_user_pages: %d\n", err);
+ goto out;
+ }
+ err = 0;
+ maddr = kmap(page);
+ if (pb.cpt_content == CPT_CONTENT_VOID) {
+ memset(maddr, 0, PAGE_SIZE);
+ } else if (pb.cpt_content == CPT_CONTENT_DATA) {
+ err = ctx->read(maddr, PAGE_SIZE, ctx);
+ if (err) {
+ kunmap(page);
+ goto out;
+ }
+ } else {
+ err = -EINVAL;
+ kunmap(page);
+ goto out;
+ }
+ set_page_dirty_lock(page);
+ kunmap(page);
+ page_cache_release(page);
+ }
+ } else {
+ if (!(prot & PROT_WRITE))
+ sys_mprotect(vmai.cpt_start,
+ vmai.cpt_end - vmai.cpt_start,
+ prot | PROT_WRITE);
+ if (pb.cpt_content == CPT_CONTENT_VOID) {
+ int j;
+ for (j=0; j<(pb.cpt_end-pb.cpt_start)/sizeof(unsigned long); j++) {
+ err = __put_user(0UL, ((unsigned long __user*)(unsigned long)pb.cpt_start) + j);
+ if (err) {
+ eprintk("__put_user 2 %d\n", err);
+ goto out;
+ }
+ }
+ } else if (pb.cpt_content == CPT_CONTENT_DATA) {
+ err = ctx->read(cpt_ptr_import(pb.cpt_start),
+ pb.cpt_end - pb.cpt_start,
+ ctx);
+ if (err)
+ goto out;
+ } else {
+ err = -EINVAL;
+ goto out;
+ }
+ if (!(prot & PROT_WRITE))
+ sys_mprotect(vmai.cpt_start,
+ vmai.cpt_end - vmai.cpt_start,
+ prot);
+ }
+ }
+
+out:
+ if (file)
+ fput(file);
+ return err;
+}
+
+static int rst_restore_mm_context(struct cpt_context *ctx)
+{
+ struct cpt_obj_bits b;
+ struct mm_struct *mm = current->mm;
+ int oldsize = mm->context.size;
+ int err;
+ void *oldldt;
+ void *newldt;
+
+ err = rst_get_object(CPT_OBJ_BITS, &b, sizeof(b), ctx);
+ if (err)
+ return err;
+
+ if (b.cpt_size > PAGE_SIZE)
+ newldt = vmalloc(b.cpt_size);
+ else
+ newldt = kmalloc(b.cpt_size, GFP_KERNEL);
+
+ if (!newldt)
+ return -ENOMEM;
+
+ err = ctx->read(newldt, b.cpt_size, ctx);
+ if (err)
+ return err;
+
+ oldldt = mm->context.ldt;
+ mm->context.ldt = newldt;
+ mm->context.size = b.cpt_size / LDT_ENTRY_SIZE;
+
+ load_LDT(&mm->context);
+
+ if (oldsize) {
+ if (oldsize * LDT_ENTRY_SIZE > PAGE_SIZE)
+ vfree(oldldt);
+ else
+ kfree(oldldt);
+ }
+
+ return 0;
+}
+
+int rst_restore_mm(struct cpt_context *ctx)
+{
+ int err;
+ int i;
+ struct mm_struct *mm = current->mm;
+ struct cpt_mm_image m;
+
+ err = rst_get_object(CPT_OBJ_MM, &m, sizeof(m), ctx);
+ if (err)
+ return err;
+
+ down_write(&mm->mmap_sem);
+ do_munmap(mm, 0, TASK_SIZE);
+
+ mm->start_code = m.cpt_start_code;
+ mm->end_code = m.cpt_end_code;
+ mm->start_data = m.cpt_start_data;
+ mm->end_data = m.cpt_end_data;
+ mm->start_brk = m.cpt_start_brk;
+ mm->brk = m.cpt_brk;
+ mm->start_stack = m.cpt_start_stack;
+ mm->arg_start = m.cpt_start_arg;
+ mm->arg_end = m.cpt_end_arg;
+ mm->env_start = m.cpt_start_env;
+ mm->env_end = m.cpt_end_env;
+ mm->def_flags = m.cpt_def_flags;
+ mm->flags = m.cpt_flags;
+
+ up_write(&mm->mmap_sem);
+
+ for (i = 0; i < m.cpt_map_count; i++) {
+ err = rst_restore_one_vma(ctx);
+ if (err < 0)
+ goto out;
+ }
+
+ err = rst_restore_mm_context(ctx);
+out:
+ return err;
+}
+
diff --git a/cpt/rst_process.c b/cpt/rst_process.c
index 6d47f3c..84be46e 100644
--- a/cpt/rst_process.c
+++ b/cpt/rst_process.c
@@ -210,7 +210,8 @@ static int restart_thread(void *arg)
err = rst_get_object(CPT_OBJ_TASK, ti, sizeof(*ti), ctx);
if (!err)
err = rst_restore_task_struct(current, ti, ctx);
- /* Restore mm here */
+ if (!err)
+ err = rst_restore_mm(ctx);
if (!err)
err = rst_restore_fpustate(current, ti, ctx);
if (!err)
diff --git a/mm/mmap.c b/mm/mmap.c
index 971d0ed..98d1ba9 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1858,6 +1858,7 @@ int split_vma(struct mm_struct * mm, struct vm_area_struct * vma,
return 0;
}
+EXPORT_SYMBOL(split_vma);
/* Munmap is split into 2 main parts -- this part which finds
* what needs doing, and the areas themselves, which do the
diff --git a/mm/mprotect.c b/mm/mprotect.c
index fded06f..47c7d75 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -22,6 +22,7 @@
#include <linux/swap.h>
#include <linux/swapops.h>
#include <linux/mmu_notifier.h>
+#include <linux/module.h>
#include <asm/uaccess.h>
#include <asm/pgtable.h>
#include <asm/cacheflush.h>
@@ -317,3 +318,4 @@ out:
up_write(¤t->mm->mmap_sem);
return error;
}
+EXPORT_SYMBOL(sys_mprotect);
--
1.5.6
^ permalink raw reply related [flat|nested] 69+ messages in thread
* Re: [PATCH 1/9] Introduce trivial sys_checkpoint and sys_restore system calls
2008-09-03 10:57 ` [PATCH 1/9] Introduce trivial sys_checkpoint and sys_restore system calls Andrey Mirkin
2008-09-03 10:57 ` [PATCH 2/9] Make checkpoint/restart functionality modular Andrey Mirkin
@ 2008-09-03 11:44 ` Cedric Le Goater
2008-09-03 13:05 ` [Devel] " Andrey Mirkin
1 sibling, 1 reply; 69+ messages in thread
From: Cedric Le Goater @ 2008-09-03 11:44 UTC (permalink / raw)
To: Andrey Mirkin; +Cc: linux-kernel, containers
Andrey Mirkin wrote:
> Right now they just return -ENOSYS. Later they will provide functionality
> to checkpoint and restart a container.
>
> Both syscalls take as arguments a file descriptor and flags.
> Also sys_checkpoint take as the first argument a PID of container's init
> (later it will be container ID); sys_restart takes as the first argument
> a container ID (right now it will not be used).
>
> Signed-off-by: Andrey Mirkin <major@openvz.org>
> ---
> Makefile | 2 +-
> arch/x86/kernel/syscall_table_32.S | 2 +
> cpt/Makefile | 1 +
> cpt/sys_core.c | 38 ++++++++++++++++++++++++++++++++++++
> include/asm-x86/unistd_32.h | 2 +
> 5 files changed, 44 insertions(+), 1 deletions(-)
> create mode 100644 cpt/Makefile
> create mode 100644 cpt/sys_core.c
>
> diff --git a/Makefile b/Makefile
> index ea413fa..1dee5c0 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -619,7 +619,7 @@ export mod_strip_cmd
>
>
> ifeq ($(KBUILD_EXTMOD),)
> -core-y += kernel/ mm/ fs/ ipc/ security/ crypto/ block/
> +core-y += kernel/ mm/ fs/ ipc/ security/ crypto/ block/ cpt/
>
> vmlinux-dirs := $(patsubst %/,%,$(filter %/, $(init-y) $(init-m) \
> $(core-y) $(core-m) $(drivers-y) $(drivers-m) \
> diff --git a/arch/x86/kernel/syscall_table_32.S b/arch/x86/kernel/syscall_table_32.S
> index fd9d4f4..4a0d7fb 100644
> --- a/arch/x86/kernel/syscall_table_32.S
> +++ b/arch/x86/kernel/syscall_table_32.S
> @@ -333,3 +333,5 @@ ENTRY(sys_call_table)
> .long sys_pipe2
> .long sys_inotify_init1
> .long sys_hijack
> + .long sys_checkpoint
> + .long sys_restart /* 335 */
this patchset is based on top of :
git://git.kernel.org/pub/scm/linux/kernel/git/daveh/linux-2.6-lxc.git
right ?
C.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-09-03 10:57 [PATCH 0/9] OpenVZ kernel based checkpointing/restart Andrey Mirkin
2008-09-03 10:57 ` [PATCH 1/9] Introduce trivial sys_checkpoint and sys_restore system calls Andrey Mirkin
@ 2008-09-03 12:28 ` Cedric Le Goater
2008-09-03 13:59 ` [Devel] " Andrey Mirkin
2008-09-03 14:18 ` Serge E. Hallyn
2008-09-03 13:49 ` Louis Rilling
` (3 subsequent siblings)
5 siblings, 2 replies; 69+ messages in thread
From: Cedric Le Goater @ 2008-09-03 12:28 UTC (permalink / raw)
To: Andrey Mirkin; +Cc: linux-kernel, containers
Andrey Mirkin wrote:
> This patchset introduces kernel based checkpointing/restart as it is
> implemented in OpenVZ project. This patchset has limited functionality and
> are able to checkpoint/restart only single process. Recently Oren Laaden
> sent another kernel based implementation of checkpoint/restart. The main
> differences between this patchset and Oren's patchset are:
>
> * In this patchset checkpointing initiated not from the process
> (right now we do not have a container, only namespaces), Oren's patchset
> performs checkpointing from the process context.
>
> * Restart in this patchset is initiated from process, which restarts a new
> process (in new namespaces) with saved state. Oren's patchset uses the same
> process from which restart was initiated and restore saved state over it.
>
> * Checkpoint/restart functionality in this patchset is implemented as a kernel
> module
why ? Do we really think that C/R implementations will be so different that
we will need C/R ops to support all of them ? I imagine that there could be
different models :
1. brute force : dump it all and kill
2. incremental
3. live migration
...
But I see all of them really tied to the kernel internals.
The first issues I see with this direction are some EXPORT_SYMBOL() that would
be useless without a module.
> As checkpointing is initiated not from the process which state should be saved
> we should freeze a process before saving its state. Right now Container Freezer
> from Matt Helsley can be used for this.
OK that's integrated and Daniel's tools :
http://lxc.cvs.sourceforge.net/lxc/
one more reason to work on integration :)
C.
> This patchset introduce only a concept how kernel based checkpointing/restart
> can be implemented and are able to checkpoint/restart only a single process
> with simple VMAs.
>
> I've tried to split my patchset in small patches to make review more easier.
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/containers
>
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 3/9] Introduce context structure needed during checkpointing/restart
2008-09-03 10:57 ` [PATCH 3/9] Introduce context structure needed during checkpointing/restart Andrey Mirkin
2008-09-03 10:57 ` [PATCH 4/9] Introduce container dump function Andrey Mirkin
@ 2008-09-03 12:29 ` Matthieu Fertré
2008-09-03 14:11 ` Andrey Mirkin
2008-09-03 13:56 ` Louis Rilling
2008-09-03 14:13 ` Cedric Le Goater
3 siblings, 1 reply; 69+ messages in thread
From: Matthieu Fertré @ 2008-09-03 12:29 UTC (permalink / raw)
To: Andrey Mirkin; +Cc: linux-kernel, containers
Andrey Mirkin a écrit :
> Add functions for context allocation/destroy.
> Introduce functions to read/write image.
> Introduce image header and object header.
>
> Signed-off-by: Andrey Mirkin <major@openvz.org>
> ---
> cpt/cpt.h | 37 ++++++++++++++++
> cpt/cpt_image.h | 63 +++++++++++++++++++++++++++
> cpt/sys.c | 127 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
> 3 files changed, 224 insertions(+), 3 deletions(-)
> create mode 100644 cpt/cpt_image.h
>
> diff --git a/cpt/cpt.h b/cpt/cpt.h
> index 381a9bf..607ac1b 100644
> --- a/cpt/cpt.h
> +++ b/cpt/cpt.h
> @@ -10,6 +10,8 @@
> *
> */
>
> +#include "cpt_image.h"
> +
> struct cpt_operations
> {
> struct module * owner;
> @@ -17,3 +19,38 @@ struct cpt_operations
> int (*restart)(int ctid, int fd, unsigned long flags);
> };
> extern struct cpt_operations cpt_ops;
> +
> +#define CPT_CTX_ERROR -1
> +#define CPT_CTX_IDLE 0
> +#define CPT_CTX_DUMPING 1
> +#define CPT_CTX_UNDUMPING 2
>
Maybe you can define an enum here instead.
> +
> +typedef struct cpt_context
> +{
> + pid_t pid; /* should be changed to ctid later */
> + int ctx_id; /* context id */
> + struct list_head ctx_list;
> + int refcount;
> + int ctx_state;
>
It will be more clear that ctx_state refers to it.
Thanks,
Matthieu
> + struct semaphore main_sem;
> +
> + int errno;
> +
> + struct file *file;
> + loff_t current_object;
> +
> + struct list_head object_array[CPT_OBJ_MAX];
> +
> + int (*write)(const void *addr, size_t count, struct cpt_context *ctx);
> + int (*read)(void *addr, size_t count, struct cpt_context *ctx);
> +} cpt_context_t;
> +
> +extern int debug_level;
> +
> +#define cpt_printk(lvl, fmt, args...) do { \
> + if (lvl <= debug_level) \
> + printk(fmt, ##args); \
> + } while (0)
> +
> +#define eprintk(a...) cpt_printk(1, "CPT ERR: " a)
> +#define dprintk(a...) cpt_printk(1, "CPT DBG: " a)
> diff --git a/cpt/cpt_image.h b/cpt/cpt_image.h
> new file mode 100644
> index 0000000..3d26229
> --- /dev/null
> +++ b/cpt/cpt_image.h
> @@ -0,0 +1,63 @@
> +/*
> + * Copyright (C) 2008 Parallels, Inc.
> + *
> + * Author: Andrey Mirkin <major@openvz.org>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation, version 2 of the
> + * License.
> + *
> + */
> +
> +#ifndef __CPT_IMAGE_H_
> +#define __CPT_IMAGE_H_ 1
> +
> +enum _cpt_object_type
> +{
> + CPT_OBJ_TASK = 0,
> + CPT_OBJ_MAX,
> + /* The objects above are stored in memory while checkpointing */
> +
> + CPT_OBJ_HEAD = 1024,
> +};
> +
> +enum _cpt_content_type {
> + CPT_CONTENT_VOID,
> + CPT_CONTENT_ARRAY,
> + CPT_CONTENT_DATA,
> + CPT_CONTENT_NAME,
> + CPT_CONTENT_REF,
> + CPT_CONTENT_MAX
> +};
> +
> +#define CPT_SIGNATURE0 0x79
> +#define CPT_SIGNATURE1 0x1c
> +#define CPT_SIGNATURE2 0x01
> +#define CPT_SIGNATURE3 0x63
> +
> +struct cpt_head
> +{
> + __u8 cpt_signature[4]; /* Magic number */
> + __u32 cpt_hdrlen; /* Header length */
> + __u16 cpt_image_major; /* Format of this file */
> + __u16 cpt_image_minor; /* Format of this file */
> + __u16 cpt_image_sublevel; /* Format of this file */
> + __u16 cpt_image_extra; /* Format of this file */
> + __u16 cpt_arch; /* Architecture */
> + __u16 cpt_pad1;
> + __u32 cpt_pad2;
> +#define CPT_ARCH_I386 0
> + __u64 cpt_time; /* Time */
> +} __attribute__ ((aligned (8)));
> +
> +/* Common object header. */
> +struct cpt_object_hdr
> +{
> + __u64 cpt_len; /* Size of current chunk of data */
> + __u16 cpt_type; /* Type of object */
> + __u32 cpt_hdrlen; /* Size of header */
> + __u16 cpt_content; /* Content type: array, reference... */
> +} __attribute__ ((aligned (8)));
> +
> +#endif /* __CPT_IMAGE_H_ */
> diff --git a/cpt/sys.c b/cpt/sys.c
> index 4051286..8334c4c 100644
> --- a/cpt/sys.c
> +++ b/cpt/sys.c
> @@ -13,21 +13,142 @@
> #include <linux/sched.h>
> #include <linux/fs.h>
> #include <linux/file.h>
> -#include <linux/notifier.h>
> +#include <linux/uaccess.h>
> #include <linux/module.h>
>
> #include "cpt.h"
> +#include "cpt_image.h"
>
> MODULE_LICENSE("GPL");
>
> +/* Debug level, constant for now */
> +int debug_level = 1;
> +
> +static int file_write(const void *addr, size_t count, struct cpt_context *ctx)
> +{
> + mm_segment_t oldfs;
> + ssize_t err = -EBADF;
> + struct file *file = ctx->file;
> +
> + oldfs = get_fs(); set_fs(KERNEL_DS);
> + if (file)
> + err = file->f_op->write(file, addr, count, &file->f_pos);
> + set_fs(oldfs);
> + if (err != count)
> + return err >= 0 ? -EIO : err;
> + return 0;
> +}
> +
> +static int file_read(void *addr, size_t count, struct cpt_context *ctx)
> +{
> + mm_segment_t oldfs;
> + ssize_t err = -EBADF;
> + struct file *file = ctx->file;
> +
> + oldfs = get_fs(); set_fs(KERNEL_DS);
> + if (file)
> + err = file->f_op->read(file, addr, count, &file->f_pos);
> + set_fs(oldfs);
> + if (err != count)
> + return err >= 0 ? -EIO : err;
> + return 0;
> +}
> +
> +struct cpt_context * context_alloc(void)
> +{
> + struct cpt_context *ctx;
> + int i;
> +
> + ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> + if (!ctx)
> + return NULL;
> +
> + init_MUTEX(&ctx->main_sem);
> + ctx->refcount = 1;
> +
> + ctx->current_object = -1;
> + ctx->write = file_write;
> + ctx->read = file_read;
> + for (i = 0; i < CPT_OBJ_MAX; i++) {
> + INIT_LIST_HEAD(&ctx->object_array[i]);
> + }
> +
> + return ctx;
> +}
> +
> +void context_release(struct cpt_context *ctx)
> +{
> + ctx->ctx_state = CPT_CTX_ERROR;
> +
> + if (ctx->file)
> + fput(ctx->file);
> + kfree(ctx);
> +}
> +
> +static void context_put(struct cpt_context *ctx)
> +{
> + if (!--ctx->refcount)
> + context_release(ctx);
> +}
> +
> static int checkpoint(pid_t pid, int fd, unsigned long flags)
> {
> - return -ENOSYS;
> + struct file *file;
> + struct cpt_context *ctx;
> + int err;
> +
> + err = -EBADF;
> + file = fget(fd);
> + if (!file)
> + goto out;
> +
> + err = -ENOMEM;
> + ctx = context_alloc();
> + if (!ctx)
> + goto out_file;
> +
> + ctx->file = file;
> + ctx->ctx_state = CPT_CTX_DUMPING;
> +
> + /* checkpoint */
> + err = -ENOSYS;
> +
> + context_put(ctx);
> +
> +out_file:
> + fput(file);
> +out:
> + return err;
> }
>
> static int restart(int ctid, int fd, unsigned long flags)
> {
> - return -ENOSYS;
> + struct file *file;
> + struct cpt_context *ctx;
> + int err;
> +
> + err = -EBADF;
> + file = fget(fd);
> + if (!file)
> + goto out;
> +
> + err = -ENOMEM;
> + ctx = context_alloc();
> + if (!ctx)
> + goto out_file;
> +
> + ctx->file = file;
> + ctx->ctx_state = CPT_CTX_UNDUMPING;
> +
> + /* restart */
> + err = -ENOSYS;
> +
> + context_put(ctx);
> +
> +out_file:
> + fput(file);
> +out:
> + return err;
> }
>
> static int __init init_cptrst(void)
>
--
Matthieu Fertré
IRISA, Campus universitaire de Beaulieu - 35 042 Rennes, France
tel: +33 2 99 84 22 74
fax: +33 2 99 84 71 71
mail: mfertre@irisa.fr
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Devel] Re: [PATCH 1/9] Introduce trivial sys_checkpoint and sys_restore system calls
2008-09-03 11:44 ` [PATCH 1/9] Introduce trivial sys_checkpoint and sys_restore system calls Cedric Le Goater
@ 2008-09-03 13:05 ` Andrey Mirkin
0 siblings, 0 replies; 69+ messages in thread
From: Andrey Mirkin @ 2008-09-03 13:05 UTC (permalink / raw)
To: devel; +Cc: Cedric Le Goater, Andrey Mirkin, containers, linux-kernel
On Wednesday 03 September 2008 15:44 Cedric Le Goater wrote:
> Andrey Mirkin wrote:
> > Right now they just return -ENOSYS. Later they will provide functionality
> > to checkpoint and restart a container.
> >
> > Both syscalls take as arguments a file descriptor and flags.
> > Also sys_checkpoint take as the first argument a PID of container's init
> > (later it will be container ID); sys_restart takes as the first argument
> > a container ID (right now it will not be used).
> >
> > Signed-off-by: Andrey Mirkin <major@openvz.org>
> > ---
> > Makefile | 2 +-
> > arch/x86/kernel/syscall_table_32.S | 2 +
> > cpt/Makefile | 1 +
> > cpt/sys_core.c | 38
> > ++++++++++++++++++++++++++++++++++++ include/asm-x86/unistd_32.h |
> > 2 +
> > 5 files changed, 44 insertions(+), 1 deletions(-)
> > create mode 100644 cpt/Makefile
> > create mode 100644 cpt/sys_core.c
> >
> > diff --git a/Makefile b/Makefile
> > index ea413fa..1dee5c0 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -619,7 +619,7 @@ export mod_strip_cmd
> >
> >
> > ifeq ($(KBUILD_EXTMOD),)
> > -core-y += kernel/ mm/ fs/ ipc/ security/ crypto/ block/
> > +core-y += kernel/ mm/ fs/ ipc/ security/ crypto/ block/ cpt/
> >
> > vmlinux-dirs := $(patsubst %/,%,$(filter %/, $(init-y) $(init-m) \
> > $(core-y) $(core-m) $(drivers-y) $(drivers-m) \
> > diff --git a/arch/x86/kernel/syscall_table_32.S
> > b/arch/x86/kernel/syscall_table_32.S index fd9d4f4..4a0d7fb 100644
> > --- a/arch/x86/kernel/syscall_table_32.S
> > +++ b/arch/x86/kernel/syscall_table_32.S
> > @@ -333,3 +333,5 @@ ENTRY(sys_call_table)
> > .long sys_pipe2
> > .long sys_inotify_init1
> > .long sys_hijack
> > + .long sys_checkpoint
> > + .long sys_restart /* 335 */
>
> this patchset is based on top of :
>
> git://git.kernel.org/pub/scm/linux/kernel/git/daveh/linux-2.6-lxc.git
>
> right ?
Yes, it is based on Dave's git tree.
Regards,
Andrey
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-09-03 10:57 [PATCH 0/9] OpenVZ kernel based checkpointing/restart Andrey Mirkin
2008-09-03 10:57 ` [PATCH 1/9] Introduce trivial sys_checkpoint and sys_restore system calls Andrey Mirkin
2008-09-03 12:28 ` [PATCH 0/9] OpenVZ kernel based checkpointing/restart Cedric Le Goater
@ 2008-09-03 13:49 ` Louis Rilling
2008-09-03 14:06 ` Louis Rilling
2008-09-04 8:14 ` Oren Laadan
` (2 subsequent siblings)
5 siblings, 1 reply; 69+ messages in thread
From: Louis Rilling @ 2008-09-03 13:49 UTC (permalink / raw)
To: Andrey Mirkin; +Cc: linux-kernel, containers
[-- Attachment #1: Type: text/plain, Size: 1792 bytes --]
On Wed, Sep 03, 2008 at 02:57:47PM +0400, Andrey Mirkin wrote:
> This patchset introduces kernel based checkpointing/restart as it is
> implemented in OpenVZ project. This patchset has limited functionality and
> are able to checkpoint/restart only single process. Recently Oren Laaden
> sent another kernel based implementation of checkpoint/restart. The main
> differences between this patchset and Oren's patchset are:
>
> * In this patchset checkpointing initiated not from the process
> (right now we do not have a container, only namespaces), Oren's patchset
> performs checkpointing from the process context.
>
> * Restart in this patchset is initiated from process, which restarts a new
> process (in new namespaces) with saved state. Oren's patchset uses the same
> process from which restart was initiated and restore saved state over it.
>
> * Checkpoint/restart functionality in this patchset is implemented as a kernel
> module
>
>
> As checkpointing is initiated not from the process which state should be saved
> we should freeze a process before saving its state. Right now Container Freezer
> from Matt Helsley can be used for this.
>
> This patchset introduce only a concept how kernel based checkpointing/restart
> can be implemented and are able to checkpoint/restart only a single process
> with simple VMAs.
>
> I've tried to split my patchset in small patches to make review more easier.
Thank you Andrey for having highlighted the differences with Oren's approach,
and for having split this patchset. Few remarks in reply to the patches.
Louis
--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 3/9] Introduce context structure needed during checkpointing/restart
2008-09-03 10:57 ` [PATCH 3/9] Introduce context structure needed during checkpointing/restart Andrey Mirkin
2008-09-03 10:57 ` [PATCH 4/9] Introduce container dump function Andrey Mirkin
2008-09-03 12:29 ` [PATCH 3/9] Introduce context structure needed during checkpointing/restart Matthieu Fertré
@ 2008-09-03 13:56 ` Louis Rilling
2008-09-03 14:07 ` Andrey Mirkin
2008-09-03 14:13 ` Cedric Le Goater
3 siblings, 1 reply; 69+ messages in thread
From: Louis Rilling @ 2008-09-03 13:56 UTC (permalink / raw)
To: Andrey Mirkin; +Cc: linux-kernel, containers
[-- Attachment #1: Type: text/plain, Size: 2452 bytes --]
On Wed, Sep 03, 2008 at 02:57:50PM +0400, Andrey Mirkin wrote:
> Add functions for context allocation/destroy.
> Introduce functions to read/write image.
> Introduce image header and object header.
>
[...]
> diff --git a/cpt/cpt_image.h b/cpt/cpt_image.h
> new file mode 100644
> index 0000000..3d26229
> --- /dev/null
> +++ b/cpt/cpt_image.h
> @@ -0,0 +1,63 @@
> +/*
> + * Copyright (C) 2008 Parallels, Inc.
> + *
> + * Author: Andrey Mirkin <major@openvz.org>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation, version 2 of the
> + * License.
> + *
> + */
> +
> +#ifndef __CPT_IMAGE_H_
> +#define __CPT_IMAGE_H_ 1
> +
> +enum _cpt_object_type
> +{
> + CPT_OBJ_TASK = 0,
> + CPT_OBJ_MAX,
> + /* The objects above are stored in memory while checkpointing */
> +
> + CPT_OBJ_HEAD = 1024,
> +};
> +
> +enum _cpt_content_type {
> + CPT_CONTENT_VOID,
> + CPT_CONTENT_ARRAY,
> + CPT_CONTENT_DATA,
> + CPT_CONTENT_NAME,
> + CPT_CONTENT_REF,
> + CPT_CONTENT_MAX
> +};
> +
> +#define CPT_SIGNATURE0 0x79
> +#define CPT_SIGNATURE1 0x1c
> +#define CPT_SIGNATURE2 0x01
> +#define CPT_SIGNATURE3 0x63
> +
> +struct cpt_head
> +{
> + __u8 cpt_signature[4]; /* Magic number */
> + __u32 cpt_hdrlen; /* Header length */
> + __u16 cpt_image_major; /* Format of this file */
> + __u16 cpt_image_minor; /* Format of this file */
> + __u16 cpt_image_sublevel; /* Format of this file */
> + __u16 cpt_image_extra; /* Format of this file */
> + __u16 cpt_arch; /* Architecture */
> + __u16 cpt_pad1;
> + __u32 cpt_pad2;
> +#define CPT_ARCH_I386 0
Why is this constant precisely defined after the padding?
> + __u64 cpt_time; /* Time */
> +} __attribute__ ((aligned (8)));
> +
> +/* Common object header. */
> +struct cpt_object_hdr
> +{
> + __u64 cpt_len; /* Size of current chunk of data */
> + __u16 cpt_type; /* Type of object */
> + __u32 cpt_hdrlen; /* Size of header */
> + __u16 cpt_content; /* Content type: array, reference... */
This layout looks a bit awkward for 32bits/64bits compatibility. Maybe put
cpt_hdrlen before cpt_type?
Louis
--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-09-03 12:28 ` [PATCH 0/9] OpenVZ kernel based checkpointing/restart Cedric Le Goater
@ 2008-09-03 13:59 ` Andrey Mirkin
2008-09-04 22:55 ` Dave Hansen
2008-09-03 14:18 ` Serge E. Hallyn
1 sibling, 1 reply; 69+ messages in thread
From: Andrey Mirkin @ 2008-09-03 13:59 UTC (permalink / raw)
To: devel; +Cc: Cedric Le Goater, Andrey Mirkin, containers, linux-kernel
On Wednesday 03 September 2008 16:28 Cedric Le Goater wrote:
> Andrey Mirkin wrote:
> > This patchset introduces kernel based checkpointing/restart as it is
> > implemented in OpenVZ project. This patchset has limited functionality
> > and are able to checkpoint/restart only single process. Recently Oren
> > Laaden sent another kernel based implementation of checkpoint/restart.
> > The main differences between this patchset and Oren's patchset are:
> >
> > * In this patchset checkpointing initiated not from the process
> > (right now we do not have a container, only namespaces), Oren's patchset
> > performs checkpointing from the process context.
> >
> > * Restart in this patchset is initiated from process, which restarts a
> > new process (in new namespaces) with saved state. Oren's patchset uses
> > the same process from which restart was initiated and restore saved state
> > over it.
> >
> > * Checkpoint/restart functionality in this patchset is implemented as a
> > kernel module
>
> why ? Do we really think that C/R implementations will be so different that
> we will need C/R ops to support all of them ? I imagine that there could be
> different models :
>
> 1. brute force : dump it all and kill
> 2. incremental
> 3. live migration
> ...
>
> But I see all of them really tied to the kernel internals.
>
> The first issues I see with this direction are some EXPORT_SYMBOL() that
> would be useless without a module.
Checkpoint/restart functionality is implemented as a kernel module to provide
more flexibility during development process - you can easily unload module,
fix what you need and load module again, you do not need to reboot you server
in this case. We have discussed this question during Container's mini summit
on OLS-2008 and we have found out that it will be convenient to have such an
ability.
>
> > As checkpointing is initiated not from the process which state should be
> > saved we should freeze a process before saving its state. Right now
> > Container Freezer from Matt Helsley can be used for this.
>
> OK that's integrated and Daniel's tools :
>
> http://lxc.cvs.sourceforge.net/lxc/
>
> one more reason to work on integration :)
Yes, you're right :)
Regards,
Andrey
>
> > This patchset introduce only a concept how kernel based
> > checkpointing/restart can be implemented and are able to
> > checkpoint/restart only a single process with simple VMAs.
> >
> > I've tried to split my patchset in small patches to make review more
> > easier. _______________________________________________
> > Containers mailing list
> > Containers@lists.linux-foundation.org
> > https://lists.linux-foundation.org/mailman/listinfo/containers
>
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/containers
>
> _______________________________________________
> Devel mailing list
> Devel@openvz.org
> https://openvz.org/mailman/listinfo/devel
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-09-03 13:49 ` Louis Rilling
@ 2008-09-03 14:06 ` Louis Rilling
2008-09-03 14:19 ` Andrey Mirkin
2008-09-03 14:26 ` Cedric Le Goater
0 siblings, 2 replies; 69+ messages in thread
From: Louis Rilling @ 2008-09-03 14:06 UTC (permalink / raw)
To: Andrey Mirkin; +Cc: containers, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 2507 bytes --]
On Wed, Sep 03, 2008 at 03:49:51PM +0200, Louis Rilling wrote:
> On Wed, Sep 03, 2008 at 02:57:47PM +0400, Andrey Mirkin wrote:
> > This patchset introduces kernel based checkpointing/restart as it is
> > implemented in OpenVZ project. This patchset has limited functionality and
> > are able to checkpoint/restart only single process. Recently Oren Laaden
> > sent another kernel based implementation of checkpoint/restart. The main
> > differences between this patchset and Oren's patchset are:
> >
> > * In this patchset checkpointing initiated not from the process
> > (right now we do not have a container, only namespaces), Oren's patchset
> > performs checkpointing from the process context.
> >
> > * Restart in this patchset is initiated from process, which restarts a new
> > process (in new namespaces) with saved state. Oren's patchset uses the same
> > process from which restart was initiated and restore saved state over it.
> >
> > * Checkpoint/restart functionality in this patchset is implemented as a kernel
> > module
> >
> >
> > As checkpointing is initiated not from the process which state should be saved
> > we should freeze a process before saving its state. Right now Container Freezer
> > from Matt Helsley can be used for this.
> >
> > This patchset introduce only a concept how kernel based checkpointing/restart
> > can be implemented and are able to checkpoint/restart only a single process
> > with simple VMAs.
> >
> > I've tried to split my patchset in small patches to make review more easier.
>
> Thank you Andrey for having highlighted the differences with Oren's approach,
> and for having split this patchset. Few remarks in reply to the patches.
Forgot a global comment: you will probably get the same (rather pointless for a
proof of concept, IMHO) requests as Oren got, to
1) improve coding style:
a) especially avoid error handling like:
err = foo();
if (!err)
err = bar();
if (!err)
err = baz();
and prefer
err = foo();
if (err)
goto foo_err;
err = bar();
...
b) do not write conditions on a single line, like
if (foo) bar;
2) put arch-dependent code in arch/ subdirs.
3) I probably forgot other ones.
I obviously do not personally request you to take these requests into account ;)
Louis
--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 3/9] Introduce context structure needed during checkpointing/restart
2008-09-03 13:56 ` Louis Rilling
@ 2008-09-03 14:07 ` Andrey Mirkin
0 siblings, 0 replies; 69+ messages in thread
From: Andrey Mirkin @ 2008-09-03 14:07 UTC (permalink / raw)
To: Louis.Rilling; +Cc: linux-kernel, containers
On Wednesday 03 September 2008 17:56 Louis Rilling wrote:
> On Wed, Sep 03, 2008 at 02:57:50PM +0400, Andrey Mirkin wrote:
> > Add functions for context allocation/destroy.
> > Introduce functions to read/write image.
> > Introduce image header and object header.
>
> [...]
>
> > diff --git a/cpt/cpt_image.h b/cpt/cpt_image.h
> > new file mode 100644
> > index 0000000..3d26229
> > --- /dev/null
> > +++ b/cpt/cpt_image.h
> > @@ -0,0 +1,63 @@
> > +/*
> > + * Copyright (C) 2008 Parallels, Inc.
> > + *
> > + * Author: Andrey Mirkin <major@openvz.org>
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License as
> > + * published by the Free Software Foundation, version 2 of the
> > + * License.
> > + *
> > + */
> > +
> > +#ifndef __CPT_IMAGE_H_
> > +#define __CPT_IMAGE_H_ 1
> > +
> > +enum _cpt_object_type
> > +{
> > + CPT_OBJ_TASK = 0,
> > + CPT_OBJ_MAX,
> > + /* The objects above are stored in memory while checkpointing */
> > +
> > + CPT_OBJ_HEAD = 1024,
> > +};
> > +
> > +enum _cpt_content_type {
> > + CPT_CONTENT_VOID,
> > + CPT_CONTENT_ARRAY,
> > + CPT_CONTENT_DATA,
> > + CPT_CONTENT_NAME,
> > + CPT_CONTENT_REF,
> > + CPT_CONTENT_MAX
> > +};
> > +
> > +#define CPT_SIGNATURE0 0x79
> > +#define CPT_SIGNATURE1 0x1c
> > +#define CPT_SIGNATURE2 0x01
> > +#define CPT_SIGNATURE3 0x63
> > +
> > +struct cpt_head
> > +{
> > + __u8 cpt_signature[4]; /* Magic number */
> > + __u32 cpt_hdrlen; /* Header length */
> > + __u16 cpt_image_major; /* Format of this file */
> > + __u16 cpt_image_minor; /* Format of this file */
> > + __u16 cpt_image_sublevel; /* Format of this file */
> > + __u16 cpt_image_extra; /* Format of this file */
> > + __u16 cpt_arch; /* Architecture */
> > + __u16 cpt_pad1;
> > + __u32 cpt_pad2;
> > +#define CPT_ARCH_I386 0
>
> Why is this constant precisely defined after the padding?
I will move it right after cpt_arch. Thanks.
>
> > + __u64 cpt_time; /* Time */
> > +} __attribute__ ((aligned (8)));
> > +
> > +/* Common object header. */
> > +struct cpt_object_hdr
> > +{
> > + __u64 cpt_len; /* Size of current chunk of data */
> > + __u16 cpt_type; /* Type of object */
> > + __u32 cpt_hdrlen; /* Size of header */
> > + __u16 cpt_content; /* Content type: array, reference... */
>
> This layout looks a bit awkward for 32bits/64bits compatibility. Maybe put
> cpt_hdrlen before cpt_type?
You right here. Thanks for the catch. I will change that in next version.
Thanks,
Andrey
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 3/9] Introduce context structure needed during checkpointing/restart
2008-09-03 12:29 ` [PATCH 3/9] Introduce context structure needed during checkpointing/restart Matthieu Fertré
@ 2008-09-03 14:11 ` Andrey Mirkin
0 siblings, 0 replies; 69+ messages in thread
From: Andrey Mirkin @ 2008-09-03 14:11 UTC (permalink / raw)
To: Matthieu Fertré; +Cc: linux-kernel, containers
On Wednesday 03 September 2008 16:29 Matthieu Fertré wrote:
> Andrey Mirkin a écrit :
> > Add functions for context allocation/destroy.
> > Introduce functions to read/write image.
> > Introduce image header and object header.
> >
> > Signed-off-by: Andrey Mirkin <major@openvz.org>
> > ---
> > cpt/cpt.h | 37 ++++++++++++++++
> > cpt/cpt_image.h | 63 +++++++++++++++++++++++++++
> > cpt/sys.c | 127
> > +++++++++++++++++++++++++++++++++++++++++++++++++++++- 3 files changed,
> > 224 insertions(+), 3 deletions(-)
> > create mode 100644 cpt/cpt_image.h
> >
> > diff --git a/cpt/cpt.h b/cpt/cpt.h
> > index 381a9bf..607ac1b 100644
> > --- a/cpt/cpt.h
> > +++ b/cpt/cpt.h
> > @@ -10,6 +10,8 @@
> > *
> > */
> >
> > +#include "cpt_image.h"
> > +
> > struct cpt_operations
> > {
> > struct module * owner;
> > @@ -17,3 +19,38 @@ struct cpt_operations
> > int (*restart)(int ctid, int fd, unsigned long flags);
> > };
> > extern struct cpt_operations cpt_ops;
> > +
> > +#define CPT_CTX_ERROR -1
> > +#define CPT_CTX_IDLE 0
> > +#define CPT_CTX_DUMPING 1
> > +#define CPT_CTX_UNDUMPING 2
>
> Maybe you can define an enum here instead.
>
> > +
> > +typedef struct cpt_context
> > +{
> > + pid_t pid; /* should be changed to ctid later */
> > + int ctx_id; /* context id */
> > + struct list_head ctx_list;
> > + int refcount;
> > + int ctx_state;
>
> It will be more clear that ctx_state refers to it.
Thanks for the idea with enum, I'll fix it in next version.
Regards,
Andrey
>
> > + struct semaphore main_sem;
> > +
> > + int errno;
> > +
> > + struct file *file;
> > + loff_t current_object;
> > +
> > + struct list_head object_array[CPT_OBJ_MAX];
> > +
> > + int (*write)(const void *addr, size_t count, struct cpt_context *ctx);
> > + int (*read)(void *addr, size_t count, struct cpt_context *ctx);
> > +} cpt_context_t;
> > +
> > +extern int debug_level;
> > +
> > +#define cpt_printk(lvl, fmt, args...) do { \
> > + if (lvl <= debug_level) \
> > + printk(fmt, ##args); \
> > + } while (0)
> > +
> > +#define eprintk(a...) cpt_printk(1, "CPT ERR: " a)
> > +#define dprintk(a...) cpt_printk(1, "CPT DBG: " a)
> > diff --git a/cpt/cpt_image.h b/cpt/cpt_image.h
> > new file mode 100644
> > index 0000000..3d26229
> > --- /dev/null
> > +++ b/cpt/cpt_image.h
> > @@ -0,0 +1,63 @@
> > +/*
> > + * Copyright (C) 2008 Parallels, Inc.
> > + *
> > + * Author: Andrey Mirkin <major@openvz.org>
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License as
> > + * published by the Free Software Foundation, version 2 of the
> > + * License.
> > + *
> > + */
> > +
> > +#ifndef __CPT_IMAGE_H_
> > +#define __CPT_IMAGE_H_ 1
> > +
> > +enum _cpt_object_type
> > +{
> > + CPT_OBJ_TASK = 0,
> > + CPT_OBJ_MAX,
> > + /* The objects above are stored in memory while checkpointing */
> > +
> > + CPT_OBJ_HEAD = 1024,
> > +};
> > +
> > +enum _cpt_content_type {
> > + CPT_CONTENT_VOID,
> > + CPT_CONTENT_ARRAY,
> > + CPT_CONTENT_DATA,
> > + CPT_CONTENT_NAME,
> > + CPT_CONTENT_REF,
> > + CPT_CONTENT_MAX
> > +};
> > +
> > +#define CPT_SIGNATURE0 0x79
> > +#define CPT_SIGNATURE1 0x1c
> > +#define CPT_SIGNATURE2 0x01
> > +#define CPT_SIGNATURE3 0x63
> > +
> > +struct cpt_head
> > +{
> > + __u8 cpt_signature[4]; /* Magic number */
> > + __u32 cpt_hdrlen; /* Header length */
> > + __u16 cpt_image_major; /* Format of this file */
> > + __u16 cpt_image_minor; /* Format of this file */
> > + __u16 cpt_image_sublevel; /* Format of this file */
> > + __u16 cpt_image_extra; /* Format of this file */
> > + __u16 cpt_arch; /* Architecture */
> > + __u16 cpt_pad1;
> > + __u32 cpt_pad2;
> > +#define CPT_ARCH_I386 0
> > + __u64 cpt_time; /* Time */
> > +} __attribute__ ((aligned (8)));
> > +
> > +/* Common object header. */
> > +struct cpt_object_hdr
> > +{
> > + __u64 cpt_len; /* Size of current chunk of data */
> > + __u16 cpt_type; /* Type of object */
> > + __u32 cpt_hdrlen; /* Size of header */
> > + __u16 cpt_content; /* Content type: array, reference... */
> > +} __attribute__ ((aligned (8)));
> > +
> > +#endif /* __CPT_IMAGE_H_ */
> > diff --git a/cpt/sys.c b/cpt/sys.c
> > index 4051286..8334c4c 100644
> > --- a/cpt/sys.c
> > +++ b/cpt/sys.c
> > @@ -13,21 +13,142 @@
> > #include <linux/sched.h>
> > #include <linux/fs.h>
> > #include <linux/file.h>
> > -#include <linux/notifier.h>
> > +#include <linux/uaccess.h>
> > #include <linux/module.h>
> >
> > #include "cpt.h"
> > +#include "cpt_image.h"
> >
> > MODULE_LICENSE("GPL");
> >
> > +/* Debug level, constant for now */
> > +int debug_level = 1;
> > +
> > +static int file_write(const void *addr, size_t count, struct cpt_context
> > *ctx) +{
> > + mm_segment_t oldfs;
> > + ssize_t err = -EBADF;
> > + struct file *file = ctx->file;
> > +
> > + oldfs = get_fs(); set_fs(KERNEL_DS);
> > + if (file)
> > + err = file->f_op->write(file, addr, count, &file->f_pos);
> > + set_fs(oldfs);
> > + if (err != count)
> > + return err >= 0 ? -EIO : err;
> > + return 0;
> > +}
> > +
> > +static int file_read(void *addr, size_t count, struct cpt_context *ctx)
> > +{
> > + mm_segment_t oldfs;
> > + ssize_t err = -EBADF;
> > + struct file *file = ctx->file;
> > +
> > + oldfs = get_fs(); set_fs(KERNEL_DS);
> > + if (file)
> > + err = file->f_op->read(file, addr, count, &file->f_pos);
> > + set_fs(oldfs);
> > + if (err != count)
> > + return err >= 0 ? -EIO : err;
> > + return 0;
> > +}
> > +
> > +struct cpt_context * context_alloc(void)
> > +{
> > + struct cpt_context *ctx;
> > + int i;
> > +
> > + ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> > + if (!ctx)
> > + return NULL;
> > +
> > + init_MUTEX(&ctx->main_sem);
> > + ctx->refcount = 1;
> > +
> > + ctx->current_object = -1;
> > + ctx->write = file_write;
> > + ctx->read = file_read;
> > + for (i = 0; i < CPT_OBJ_MAX; i++) {
> > + INIT_LIST_HEAD(&ctx->object_array[i]);
> > + }
> > +
> > + return ctx;
> > +}
> > +
> > +void context_release(struct cpt_context *ctx)
> > +{
> > + ctx->ctx_state = CPT_CTX_ERROR;
> > +
> > + if (ctx->file)
> > + fput(ctx->file);
> > + kfree(ctx);
> > +}
> > +
> > +static void context_put(struct cpt_context *ctx)
> > +{
> > + if (!--ctx->refcount)
> > + context_release(ctx);
> > +}
> > +
> > static int checkpoint(pid_t pid, int fd, unsigned long flags)
> > {
> > - return -ENOSYS;
> > + struct file *file;
> > + struct cpt_context *ctx;
> > + int err;
> > +
> > + err = -EBADF;
> > + file = fget(fd);
> > + if (!file)
> > + goto out;
> > +
> > + err = -ENOMEM;
> > + ctx = context_alloc();
> > + if (!ctx)
> > + goto out_file;
> > +
> > + ctx->file = file;
> > + ctx->ctx_state = CPT_CTX_DUMPING;
> > +
> > + /* checkpoint */
> > + err = -ENOSYS;
> > +
> > + context_put(ctx);
> > +
> > +out_file:
> > + fput(file);
> > +out:
> > + return err;
> > }
> >
> > static int restart(int ctid, int fd, unsigned long flags)
> > {
> > - return -ENOSYS;
> > + struct file *file;
> > + struct cpt_context *ctx;
> > + int err;
> > +
> > + err = -EBADF;
> > + file = fget(fd);
> > + if (!file)
> > + goto out;
> > +
> > + err = -ENOMEM;
> > + ctx = context_alloc();
> > + if (!ctx)
> > + goto out_file;
> > +
> > + ctx->file = file;
> > + ctx->ctx_state = CPT_CTX_UNDUMPING;
> > +
> > + /* restart */
> > + err = -ENOSYS;
> > +
> > + context_put(ctx);
> > +
> > +out_file:
> > + fput(file);
> > +out:
> > + return err;
> > }
> >
> > static int __init init_cptrst(void)
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 3/9] Introduce context structure needed during checkpointing/restart
2008-09-03 10:57 ` [PATCH 3/9] Introduce context structure needed during checkpointing/restart Andrey Mirkin
` (2 preceding siblings ...)
2008-09-03 13:56 ` Louis Rilling
@ 2008-09-03 14:13 ` Cedric Le Goater
2008-09-03 14:29 ` Andrey Mirkin
3 siblings, 1 reply; 69+ messages in thread
From: Cedric Le Goater @ 2008-09-03 14:13 UTC (permalink / raw)
To: Andrey Mirkin; +Cc: linux-kernel, containers
> +void context_release(struct cpt_context *ctx)
> +{
> + ctx->ctx_state = CPT_CTX_ERROR;
> +
> + if (ctx->file)
> + fput(ctx->file);
> + kfree(ctx);
> +}
> +
> +static void context_put(struct cpt_context *ctx)
> +{
> + if (!--ctx->refcount)
> + context_release(ctx);
> +}
> +
> static int checkpoint(pid_t pid, int fd, unsigned long flags)
> {
> - return -ENOSYS;
> + struct file *file;
> + struct cpt_context *ctx;
> + int err;
> +
> + err = -EBADF;
> + file = fget(fd);
> + if (!file)
> + goto out;
> +
> + err = -ENOMEM;
> + ctx = context_alloc();
> + if (!ctx)
> + goto out_file;
> +
> + ctx->file = file;
> + ctx->ctx_state = CPT_CTX_DUMPING;
> +
> + /* checkpoint */
> + err = -ENOSYS;
> +
> + context_put(ctx);
> +
> +out_file:
> + fput(file);
> +out:
> + return err;
> }
it looks like fput(file) is done twice in checkpoint() and context_release() ?
C.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 6/9] Introduce functions to dump mm
2008-09-03 10:57 ` [PATCH 6/9] Introduce functions to dump mm Andrey Mirkin
2008-09-03 10:57 ` [PATCH 7/9] Introduce function for restarting a container Andrey Mirkin
@ 2008-09-03 14:17 ` Louis Rilling
1 sibling, 0 replies; 69+ messages in thread
From: Louis Rilling @ 2008-09-03 14:17 UTC (permalink / raw)
To: Andrey Mirkin; +Cc: linux-kernel, containers
[-- Attachment #1: Type: text/plain, Size: 11734 bytes --]
On Wed, Sep 03, 2008 at 02:57:53PM +0400, Andrey Mirkin wrote:
> Functions to dump mm struct, VMAs and mm context are added.
>
[...]
> diff --git a/cpt/cpt_mm.c b/cpt/cpt_mm.c
> new file mode 100644
> index 0000000..6e025cc
> --- /dev/null
> +++ b/cpt/cpt_mm.c
> @@ -0,0 +1,431 @@
> +/*
> + * Copyright (C) 2008 Parallels, Inc.
> + *
> + * Authors: Andrey Mirkin <major@openvz.org>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation, version 2 of the
> + * License.
> + *
> + */
> +
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/file.h>
> +#include <linux/mm.h>
> +#include <linux/errno.h>
> +#include <linux/major.h>
> +#include <linux/mman.h>
> +#include <linux/mnt_namespace.h>
> +#include <linux/mount.h>
> +#include <linux/namei.h>
> +#include <linux/pagemap.h>
> +#include <linux/hugetlb.h>
> +#include <asm/ldt.h>
> +
> +#include "cpt.h"
> +#include "cpt_image.h"
> +
> +struct page_area
> +{
> + int type;
> + unsigned long start;
> + unsigned long end;
> + pgoff_t pgoff;
> + loff_t mm;
> + __u64 list[16];
> +};
> +
> +struct page_desc
> +{
> + int type;
> + pgoff_t index;
> + loff_t mm;
> + int shared;
> +};
> +
> +enum {
> + PD_ABSENT,
> + PD_COPY,
> + PD_FUNKEY,
> +};
> +
> +/* 0: page can be obtained from backstore, or still not mapped anonymous page,
> + or something else, which does not requre copy.
> + 1: page requires copy
> + 2: page requres copy but its content is zero. Quite useless.
> + 3: wp page is shared after fork(). It is to be COWed when modified.
> + 4: page is something unsupported... We copy it right now.
> + */
> +
> +static void page_get_desc(struct vm_area_struct *vma, unsigned long addr,
> + struct page_desc *pdesc, cpt_context_t * ctx)
> +{
> + struct mm_struct *mm = vma->vm_mm;
> + pgd_t *pgd;
> + pud_t *pud;
> + pmd_t *pmd;
> + pte_t *ptep, pte;
> + spinlock_t *ptl;
> + struct page *pg = NULL;
> + pgoff_t linear_index = (addr - vma->vm_start)/PAGE_SIZE + vma->vm_pgoff;
> +
> + pdesc->index = linear_index;
> + pdesc->shared = 0;
> + pdesc->mm = CPT_NULL;
> +
> + if (vma->vm_flags & VM_IO) {
> + pdesc->type = PD_ABSENT;
> + return;
> + }
> +
> + pgd = pgd_offset(mm, addr);
> + if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
> + goto out_absent;
> + pud = pud_offset(pgd, addr);
> + if (pud_none(*pud) || unlikely(pud_bad(*pud)))
> + goto out_absent;
> + pmd = pmd_offset(pud, addr);
> + if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
> + goto out_absent;
> +#ifdef CONFIG_X86
> + if (pmd_huge(*pmd)) {
> + eprintk("page_huge\n");
> + goto out_unsupported;
> + }
> +#endif
> + ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
> + pte = *ptep;
> + pte_unmap(ptep);
> +
> + if (pte_none(pte))
> + goto out_absent_unlock;
> +
> + if ((pg = vm_normal_page(vma, addr, pte)) == NULL) {
> + pdesc->type = PD_COPY;
> + goto out_unlock;
> + }
> +
> + get_page(pg);
> + spin_unlock(ptl);
> +
> + if (pg->mapping && !PageAnon(pg)) {
> + if (vma->vm_file == NULL) {
> + eprintk("pg->mapping!=NULL for fileless vma: %08lx\n", addr);
> + goto out_unsupported;
> + }
> + if (vma->vm_file->f_mapping != pg->mapping) {
> + eprintk("pg->mapping!=f_mapping: %08lx %p %p\n",
> + addr, vma->vm_file->f_mapping, pg->mapping);
> + goto out_unsupported;
> + }
> + pdesc->index = (pg->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT));
> + /* Page is in backstore. For us it is like
> + * it is not present.
> + */
> + goto out_absent;
> + }
> +
> + if (PageReserved(pg)) {
> + /* Special case: ZERO_PAGE is used, when an
> + * anonymous page is accessed but not written. */
> + if (pg == ZERO_PAGE(addr)) {
> + if (pte_write(pte)) {
> + eprintk("not funny already, writable ZERO_PAGE\n");
> + goto out_unsupported;
> + }
> + goto out_absent;
> + }
> + eprintk("reserved page %lu at %08lx\n", pg->index, addr);
> + goto out_unsupported;
> + }
> +
> + if (!pg->mapping) {
> + eprintk("page without mapping at %08lx\n", addr);
> + goto out_unsupported;
> + }
> +
> + pdesc->type = PD_COPY;
> +
> +out_put:
> + if (pg)
> + put_page(pg);
> + return;
> +
> +out_unlock:
> + spin_unlock(ptl);
> + goto out_put;
> +
> +out_absent_unlock:
> + spin_unlock(ptl);
> +
> +out_absent:
> + pdesc->type = PD_ABSENT;
> + goto out_put;
> +
> +out_unsupported:
> + pdesc->type = PD_FUNKEY;
> + goto out_put;
> +}
> +
> +static int count_vma_pages(struct vm_area_struct *vma, struct cpt_context *ctx)
> +{
> + unsigned long addr;
> + int page_num = 0;
> +
> + for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
> + struct page_desc pd;
> +
> + page_get_desc(vma, addr, &pd, ctx);
> +
> + if (pd.type != PD_COPY) {
> + return -EINVAL;
> + } else {
> + page_num += 1;
> + }
> +
> + }
> + return page_num;
> +}
> +
> +/* ATTN: We give "current" to get_user_pages(). This is wrong, but get_user_pages()
> + * does not really need this thing. It just stores some page fault stats there.
> + *
> + * BUG: some archs (f.e. sparc64, but not Intel*) require flush cache pages
> + * before accessing vma.
> + */
> +static int dump_pages(struct vm_area_struct *vma, unsigned long start,
> + unsigned long end, struct cpt_context *ctx)
> +{
> +#define MAX_PAGE_BATCH 16
> + struct page *pg[MAX_PAGE_BATCH];
> + int npages = (end - start)/PAGE_SIZE;
> + int count = 0;
> +
> + while (count < npages) {
> + int copy = npages - count;
> + int n;
> +
> + if (copy > MAX_PAGE_BATCH)
> + copy = MAX_PAGE_BATCH;
> + n = get_user_pages(current, vma->vm_mm, start, copy,
> + 0, 1, pg, NULL);
> + if (n == copy) {
> + int i;
> + for (i=0; i<n; i++) {
> + char *maddr = kmap(pg[i]);
> + ctx->write(maddr, PAGE_SIZE, ctx);
> + kunmap(pg[i]);
> + }
> + } else {
> + eprintk("get_user_pages fault");
> + for ( ; n > 0; n--)
> + page_cache_release(pg[n-1]);
> + return -EFAULT;
> + }
> + start += n*PAGE_SIZE;
> + count += n;
> + for ( ; n > 0; n--)
> + page_cache_release(pg[n-1]);
> + }
> + return 0;
> +}
> +
> +static int dump_page_block(struct vm_area_struct *vma,
> + struct cpt_page_block *pgb,
> + struct cpt_context *ctx)
> +{
> + int err;
> + pgb->cpt_len = sizeof(*pgb) + pgb->cpt_end - pgb->cpt_start;
> + pgb->cpt_type = CPT_OBJ_PAGES;
> + pgb->cpt_hdrlen = sizeof(*pgb);
> + pgb->cpt_content = CPT_CONTENT_DATA;
> +
> + err = ctx->write(pgb, sizeof(*pgb), ctx);
> + if (!err)
> + err = dump_pages(vma, pgb->cpt_start, pgb->cpt_end, ctx);
> +
> + return err;
> +}
> +
> +static int cpt_dump_dentry(struct path *p, cpt_context_t *ctx)
> +{
> + int len;
> + char *path;
> + char *buf;
> + struct cpt_object_hdr o;
> +
> + buf = (char *)__get_free_page(GFP_KERNEL);
> + if (!buf)
> + return -ENOMEM;
> +
> + path = d_path(p, buf, PAGE_SIZE);
> +
> + if (IS_ERR(path)) {
> + free_page((unsigned long)buf);
> + return PTR_ERR(path);
> + }
> +
> + len = buf + PAGE_SIZE - 1 - path;
> + o.cpt_len = sizeof(o) + len + 1;
> + o.cpt_type = CPT_OBJ_NAME;
> + o.cpt_hdrlen = sizeof(o);
> + o.cpt_content = CPT_CONTENT_NAME;
> + path[len] = 0;
> +
> + ctx->write(&o, sizeof(o), ctx);
> + ctx->write(path, len + 1, ctx);
> + free_page((unsigned long)buf);
> +
> + return 0;
> +}
> +
> +static int dump_one_vma(struct mm_struct *mm,
> + struct vm_area_struct *vma, struct cpt_context *ctx)
> +{
> + struct cpt_vma_image *v;
> + unsigned long addr;
> + int page_num;
> + int err;
> +
> + v = kzalloc(sizeof(*v), GFP_KERNEL);
> + if (!v)
> + return -ENOMEM;
> +
> + v->cpt_len = sizeof(*v);
> + v->cpt_type = CPT_OBJ_VMA;
> + v->cpt_hdrlen = sizeof(*v);
> + v->cpt_content = CPT_CONTENT_ARRAY;
> +
> + v->cpt_start = vma->vm_start;
> + v->cpt_end = vma->vm_end;
> + v->cpt_flags = vma->vm_flags;
> + if (vma->vm_flags & VM_HUGETLB) {
> + eprintk("huge TLB VMAs are still not supported\n");
> + kfree(v);
> + return -EINVAL;
> + }
> + v->cpt_pgprot = vma->vm_page_prot.pgprot;
> + v->cpt_pgoff = vma->vm_pgoff;
> + v->cpt_file = CPT_NULL;
> + v->cpt_vma_type = CPT_VMA_TYPE_0;
> +
> + page_num = count_vma_pages(vma, ctx);
> + if (page_num < 0) {
> + kfree(v);
> + return -EINVAL;
> + }
> + v->cpt_page_num = page_num;
> +
> + if (vma->vm_file) {
> + v->cpt_file = 0;
> + v->cpt_vma_type = CPT_VMA_FILE;
> + }
> +
> + ctx->write(v, sizeof(*v), ctx);
> + kfree(v);
> +
> + if (vma->vm_file) {
> + err = cpt_dump_dentry(&vma->vm_file->f_path, ctx);
> + if (err < 0)
> + return err;
> + }
> +
> + for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
> + struct page_desc pd;
> + struct cpt_page_block pgb;
> +
> + page_get_desc(vma, addr, &pd, ctx);
> +
> + if (pd.type == PD_FUNKEY || pd.type == PD_ABSENT) {
> + eprintk("dump_one_vma: funkey page\n");
> + return -EINVAL;
IIUC, as soon as ZERO_PAGE is hit, checkpoint aborts. I doubt that many
applications will be checkpointable with this limitation.
> + }
> +
> + pgb.cpt_start = addr;
> + pgb.cpt_end = addr + PAGE_SIZE;
> + dump_page_block(vma, &pgb, ctx);
> + }
> +
> + return 0;
> +}
> +
> +static int cpt_dump_mm_context(struct mm_struct *mm, struct cpt_context *ctx)
> +{
> +#ifdef CONFIG_X86
> + if (mm->context.size) {
> + struct cpt_obj_bits b;
> + int size;
> +
> + mutex_lock(&mm->context.lock);
> +
> + b.cpt_type = CPT_OBJ_BITS;
> + b.cpt_len = sizeof(b);
> + b.cpt_content = CPT_CONTENT_MM_CONTEXT;
> + b.cpt_size = mm->context.size * LDT_ENTRY_SIZE;
> +
> + ctx->write(&b, sizeof(b), ctx);
> +
> + size = mm->context.size * LDT_ENTRY_SIZE;
> +
> + ctx->write(mm->context.ldt, size, ctx);
> +
> + mutex_unlock(&mm->context.lock);
> + }
> +#endif
> + return 0;
> +}
> +
> +int cpt_dump_mm(struct task_struct *tsk, struct cpt_context *ctx)
> +{
> + struct mm_struct *mm = tsk->mm;
> + struct cpt_mm_image *v;
> + struct vm_area_struct *vma;
> + int err;
> +
> + v = kzalloc(sizeof(*v), GFP_KERNEL);
> + if (!v)
> + return -ENOMEM;
> +
> + v->cpt_len = sizeof(*v);
> + v->cpt_type = CPT_OBJ_MM;
> + v->cpt_hdrlen = sizeof(*v);
> + v->cpt_content = CPT_CONTENT_ARRAY;
> +
> + v->cpt_start_code = mm->start_code;
> + v->cpt_end_code = mm->end_code;
> + v->cpt_start_data = mm->start_data;
> + v->cpt_end_data = mm->end_data;
> + v->cpt_start_brk = mm->start_brk;
> + v->cpt_brk = mm->brk;
> + v->cpt_start_stack = mm->start_stack;
> + v->cpt_start_arg = mm->arg_start;
> + v->cpt_end_arg = mm->arg_end;
> + v->cpt_start_env = mm->env_start;
> + v->cpt_end_env = mm->env_end;
> + v->cpt_def_flags = mm->def_flags;
> + v->cpt_flags = mm->flags;
> + v->cpt_map_count = mm->map_count;
> +
> + err = ctx->write(v, sizeof(*v), ctx);
> + kfree(v);
> +
> + if (err) {
> + eprintk("error during writing mm\n");
> + return err;
> + }
> +
> + for (vma = mm->mmap; vma; vma = vma->vm_next) {
> + int err;
> +
> + if ((err = dump_one_vma(mm, vma, ctx)) != 0)
> + return err;
> + }
Why not down_reading mm->mmap_sem here, while it is properly down_written during
restart? Is it because we can rely on the tasks being frozen during checkpoint,
and not during restart?
Thanks,
Louis
--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-09-03 12:28 ` [PATCH 0/9] OpenVZ kernel based checkpointing/restart Cedric Le Goater
2008-09-03 13:59 ` [Devel] " Andrey Mirkin
@ 2008-09-03 14:18 ` Serge E. Hallyn
1 sibling, 0 replies; 69+ messages in thread
From: Serge E. Hallyn @ 2008-09-03 14:18 UTC (permalink / raw)
To: Cedric Le Goater; +Cc: Andrey Mirkin, containers, linux-kernel
Quoting Cedric Le Goater (clg@fr.ibm.com):
> Andrey Mirkin wrote:
> > This patchset introduces kernel based checkpointing/restart as it is
> > implemented in OpenVZ project. This patchset has limited functionality and
> > are able to checkpoint/restart only single process. Recently Oren Laaden
> > sent another kernel based implementation of checkpoint/restart. The main
> > differences between this patchset and Oren's patchset are:
> >
> > * In this patchset checkpointing initiated not from the process
> > (right now we do not have a container, only namespaces), Oren's patchset
> > performs checkpointing from the process context.
> >
> > * Restart in this patchset is initiated from process, which restarts a new
> > process (in new namespaces) with saved state. Oren's patchset uses the same
> > process from which restart was initiated and restore saved state over it.
> >
> > * Checkpoint/restart functionality in this patchset is implemented as a kernel
> > module
>
> why ? Do we really think that C/R implementations will be so different that
> we will need C/R ops to support all of them ? I imagine that there could be
> different models :
At the mini-summit two reasons were brought up to make it a module:
1. So sysadmins worried about security implications can
completely unload the module
2. So developers can unload and reload the module while
testing.
> 1. brute force : dump it all and kill
> 2. incremental
> 3. live migration
> ...
Actually I don't think we expected to use different implementations for
those.
> But I see all of them really tied to the kernel internals.
>
> The first issues I see with this direction are some EXPORT_SYMBOL() that would
> be useless without a module.
>
> > As checkpointing is initiated not from the process which state should be saved
> > we should freeze a process before saving its state. Right now Container Freezer
> > from Matt Helsley can be used for this.
>
> OK that's integrated and Daniel's tools :
>
> http://lxc.cvs.sourceforge.net/lxc/
>
> one more reason to work on integration :)
>
> C.
>
>
> > This patchset introduce only a concept how kernel based checkpointing/restart
> > can be implemented and are able to checkpoint/restart only a single process
> > with simple VMAs.
> >
> > I've tried to split my patchset in small patches to make review more easier.
> > _______________________________________________
> > Containers mailing list
> > Containers@lists.linux-foundation.org
> > https://lists.linux-foundation.org/mailman/listinfo/containers
> >
>
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/containers
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-09-03 14:06 ` Louis Rilling
@ 2008-09-03 14:19 ` Andrey Mirkin
2008-09-03 14:26 ` Cedric Le Goater
1 sibling, 0 replies; 69+ messages in thread
From: Andrey Mirkin @ 2008-09-03 14:19 UTC (permalink / raw)
To: Louis.Rilling; +Cc: containers, linux-kernel
On Wednesday 03 September 2008 18:06 Louis Rilling wrote:
> On Wed, Sep 03, 2008 at 03:49:51PM +0200, Louis Rilling wrote:
> > On Wed, Sep 03, 2008 at 02:57:47PM +0400, Andrey Mirkin wrote:
> > > This patchset introduces kernel based checkpointing/restart as it is
> > > implemented in OpenVZ project. This patchset has limited functionality
> > > and are able to checkpoint/restart only single process. Recently Oren
> > > Laaden sent another kernel based implementation of checkpoint/restart.
> > > The main differences between this patchset and Oren's patchset are:
> > >
> > > * In this patchset checkpointing initiated not from the process
> > > (right now we do not have a container, only namespaces), Oren's
> > > patchset performs checkpointing from the process context.
> > >
> > > * Restart in this patchset is initiated from process, which restarts a
> > > new process (in new namespaces) with saved state. Oren's patchset uses
> > > the same process from which restart was initiated and restore saved
> > > state over it.
> > >
> > > * Checkpoint/restart functionality in this patchset is implemented as a
> > > kernel module
> > >
> > >
> > > As checkpointing is initiated not from the process which state should
> > > be saved we should freeze a process before saving its state. Right now
> > > Container Freezer from Matt Helsley can be used for this.
> > >
> > > This patchset introduce only a concept how kernel based
> > > checkpointing/restart can be implemented and are able to
> > > checkpoint/restart only a single process with simple VMAs.
> > >
> > > I've tried to split my patchset in small patches to make review more
> > > easier.
> >
> > Thank you Andrey for having highlighted the differences with Oren's
> > approach, and for having split this patchset. Few remarks in reply to the
> > patches.
>
> Forgot a global comment: you will probably get the same (rather pointless
> for a proof of concept, IMHO) requests as Oren got, to
> 1) improve coding style:
> a) especially avoid error handling like:
> err = foo();
> if (!err)
> err = bar();
> if (!err)
> err = baz();
>
> and prefer
> err = foo();
> if (err)
> goto foo_err;
> err = bar();
> ...
>
> b) do not write conditions on a single line, like
> if (foo) bar;
>
> 2) put arch-dependent code in arch/ subdirs.
> 3) I probably forgot other ones.
>
> I obviously do not personally request you to take these requests into
> account ;)
Thanks for comments.
Unfortunately I have missed 2-3 weeks of discussion of Oren's patches.
Right now I'm reading all threads devoted to checkpointing.
I'll take into account you comments and I'll try to fix all comments in next
version.
Thanks,
Andrey
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 4/9] Introduce container dump function
2008-09-03 10:57 ` [PATCH 4/9] Introduce container dump function Andrey Mirkin
2008-09-03 10:57 ` [PATCH 5/9] Introduce function to dump process Andrey Mirkin
@ 2008-09-03 14:23 ` Serge E. Hallyn
2008-09-03 14:45 ` Andrey Mirkin
1 sibling, 1 reply; 69+ messages in thread
From: Serge E. Hallyn @ 2008-09-03 14:23 UTC (permalink / raw)
To: Andrey Mirkin; +Cc: linux-kernel, containers
Quoting Andrey Mirkin (major@openvz.org):
> Actually right now we are going to dump only one process.
> Function for dumping head of image file are added.
>
> Signed-off-by: Andrey Mirkin <major@openvz.org>
> ---
> cpt/Makefile | 2 +-
> cpt/checkpoint.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> cpt/cpt.h | 3 ++
> cpt/sys.c | 3 +-
> kernel/fork.c | 2 +
> 5 files changed, 82 insertions(+), 2 deletions(-)
> create mode 100644 cpt/checkpoint.c
>
> diff --git a/cpt/Makefile b/cpt/Makefile
> index bfe75d5..173346b 100644
> --- a/cpt/Makefile
> +++ b/cpt/Makefile
> @@ -2,4 +2,4 @@ obj-y += sys_core.o
>
> obj-$(CONFIG_CHECKPOINT) += cptrst.o
>
> -cptrst-objs := sys.o
> +cptrst-objs := sys.o checkpoint.o
> diff --git a/cpt/checkpoint.c b/cpt/checkpoint.c
> new file mode 100644
> index 0000000..b4d9686
> --- /dev/null
> +++ b/cpt/checkpoint.c
> @@ -0,0 +1,74 @@
> +/*
> + * Copyright (C) 2008 Parallels, Inc.
> + *
> + * Author: Andrey Mirkin <major@openvz.org>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation, version 2 of the
> + * License.
> + *
> + */
> +
> +#include <linux/sched.h>
> +#include <linux/fs.h>
> +#include <linux/file.h>
> +#include <linux/version.h>
> +
> +#include "cpt.h"
> +
> +static int cpt_write_head(struct cpt_context *ctx)
> +{
> + struct cpt_head hdr;
> +
> + memset(&hdr, 0, sizeof(hdr));
> + hdr.cpt_signature[0] = CPT_SIGNATURE0;
> + hdr.cpt_signature[1] = CPT_SIGNATURE1;
> + hdr.cpt_signature[2] = CPT_SIGNATURE2;
> + hdr.cpt_signature[3] = CPT_SIGNATURE3;
> + hdr.cpt_hdrlen = sizeof(hdr);
> + hdr.cpt_image_major = (LINUX_VERSION_CODE >> 16) & 0xff;
> + hdr.cpt_image_minor = (LINUX_VERSION_CODE >> 8) & 0xff;
> + hdr.cpt_image_sublevel = (LINUX_VERSION_CODE) & 0xff;
> + hdr.cpt_image_extra = 0;
> +#if defined(CONFIG_X86_32)
> + hdr.cpt_arch = CPT_ARCH_I386;
> +#else
> +#error Arch is not supported
> +#endif
> + return ctx->write(&hdr, sizeof(hdr), ctx);
> +}
> +
> +int dump_container(struct cpt_context *ctx)
> +{
> + int err;
> + struct task_struct *root;
> +
> + read_lock(&tasklist_lock);
> + root = find_task_by_vpid(ctx->pid);
> + if (root)
> + get_task_struct(root);
> + read_unlock(&tasklist_lock);
> +
> + err = -ESRCH;
> + if (!root) {
> + eprintk("can not find root task\n");
> + return err;
> + }
> + ctx->nsproxy = root->nsproxy;
> + if (!ctx->nsproxy) {
> + eprintk("nsproxy is null\n");
> + goto out;
> + }
The get_task_struct() above won't pin the tsk->nsproxy
though, will it? So should you be doing a
rcu_read_lock();
nsproxy = get_task_nsproxy(root);
rcu_read_unlock();
to make sure the nsproxy doesn't disappear out from under
you?
(Maybe you do it in a later patch...)
thanks,
-serge
> + err = cpt_write_head(ctx);
> +
> + /* Dump task here */
> + if (!err)
> + err = -ENOSYS;
> +
> +out:
> + ctx->nsproxy = NULL;
> + put_task_struct(root);
> + return err;
> +}
> diff --git a/cpt/cpt.h b/cpt/cpt.h
> index 607ac1b..b421a11 100644
> --- a/cpt/cpt.h
> +++ b/cpt/cpt.h
> @@ -33,6 +33,7 @@ typedef struct cpt_context
> int refcount;
> int ctx_state;
> struct semaphore main_sem;
> + struct nsproxy *nsproxy;
>
> int errno;
>
> @@ -54,3 +55,5 @@ extern int debug_level;
>
> #define eprintk(a...) cpt_printk(1, "CPT ERR: " a)
> #define dprintk(a...) cpt_printk(1, "CPT DBG: " a)
> +
> +int dump_container(struct cpt_context *ctx);
> diff --git a/cpt/sys.c b/cpt/sys.c
> index 8334c4c..6801c22 100644
> --- a/cpt/sys.c
> +++ b/cpt/sys.c
> @@ -109,9 +109,10 @@ static int checkpoint(pid_t pid, int fd, unsigned long flags)
>
> ctx->file = file;
> ctx->ctx_state = CPT_CTX_DUMPING;
> + ctx->pid = pid;
>
> /* checkpoint */
> - err = -ENOSYS;
> + err = dump_container(ctx);
>
> context_put(ctx);
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 52b5037..f38b43d 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -77,6 +77,7 @@ int max_threads; /* tunable limit on nr_threads */
> DEFINE_PER_CPU(unsigned long, process_counts) = 0;
>
> __cacheline_aligned DEFINE_RWLOCK(tasklist_lock); /* outer */
> +EXPORT_SYMBOL(tasklist_lock);
>
> int nr_processes(void)
> {
> @@ -153,6 +154,7 @@ void __put_task_struct(struct task_struct *tsk)
> if (!profile_handoff_task(tsk))
> free_task(tsk);
> }
> +EXPORT_SYMBOL(__put_task_struct);
>
> /*
> * macro override instead of weak attribute alias, to workaround
> --
> 1.5.6
>
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/containers
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-09-03 14:06 ` Louis Rilling
2008-09-03 14:19 ` Andrey Mirkin
@ 2008-09-03 14:26 ` Cedric Le Goater
2008-09-03 14:53 ` Andrey Mirkin
1 sibling, 1 reply; 69+ messages in thread
From: Cedric Le Goater @ 2008-09-03 14:26 UTC (permalink / raw)
To: Louis.Rilling; +Cc: Andrey Mirkin, containers, linux-kernel
> Forgot a global comment: you will probably get the same (rather pointless for a
> proof of concept, IMHO) requests as Oren got, to
> 1) improve coding style:
> a) especially avoid error handling like:
> err = foo();
> if (!err)
> err = bar();
> if (!err)
> err = baz();
>
> and prefer
> err = foo();
> if (err)
> goto foo_err;
> err = bar();
> ...
>
> b) do not write conditions on a single line, like
> if (foo) bar;
>
> 2) put arch-dependent code in arch/ subdirs.
> 3) I probably forgot other ones.
./scripts/checkpatch.pl catches most of it (and it does catch a few on your
patches)
but Dave is even better ! :)
Cheers,
C.
> I obviously do not personally request you to take these requests into account ;)
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2/9] Make checkpoint/restart functionality modular
2008-09-03 10:57 ` [PATCH 2/9] Make checkpoint/restart functionality modular Andrey Mirkin
2008-09-03 10:57 ` [PATCH 3/9] Introduce context structure needed during checkpointing/restart Andrey Mirkin
@ 2008-09-03 14:27 ` Serge E. Hallyn
2008-09-03 14:51 ` Andrey Mirkin
1 sibling, 1 reply; 69+ messages in thread
From: Serge E. Hallyn @ 2008-09-03 14:27 UTC (permalink / raw)
To: Andrey Mirkin; +Cc: linux-kernel, containers
Quoting Andrey Mirkin (major@openvz.org):
> A config option CONFIG_CHECKPOINT is introduced.
> New structure cpt_operations is introduced to store pointers to
> checkpoint/restart functions from module.
Thanks Andrey. The structure of this patchset is very nice, and I'm
finding it very easy to read, at least the first half.
The differences (in yours and Oren's) between checkpointing and
restarting yourself or another task should probably not be important
as people will want to be able to do both in the end, and I'm sure
both patchsets can do both. And you have things like your own
printk helpers in patch 3 that, just as in Oren's set, will have to
go away. But I'm not yet able to see what the real differences are.
Keeping on looking...
> Signed-off-by: Andrey Mirkin <major@openvz.org>
> ---
> cpt/Kconfig | 7 +++++++
> cpt/Makefile | 4 ++++
> cpt/cpt.h | 19 +++++++++++++++++++
> cpt/sys.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
> cpt/sys_core.c | 29 +++++++++++++++++++++++++++--
> init/Kconfig | 2 ++
> 6 files changed, 107 insertions(+), 2 deletions(-)
> create mode 100644 cpt/Kconfig
> create mode 100644 cpt/cpt.h
> create mode 100644 cpt/sys.c
>
> diff --git a/cpt/Kconfig b/cpt/Kconfig
> new file mode 100644
> index 0000000..b9bc72d
> --- /dev/null
> +++ b/cpt/Kconfig
> @@ -0,0 +1,7 @@
> +config CHECKPOINT
> + tristate "Checkpoint & restart for containers"
> + depends on EXPERIMENTAL
> + default n
> + help
> + This option adds module "cptrst", which allow to save a running
> + container to a file and restart it later using this image file.
> diff --git a/cpt/Makefile b/cpt/Makefile
> index 2276fb1..bfe75d5 100644
> --- a/cpt/Makefile
> +++ b/cpt/Makefile
> @@ -1 +1,5 @@
> obj-y += sys_core.o
> +
> +obj-$(CONFIG_CHECKPOINT) += cptrst.o
> +
> +cptrst-objs := sys.o
> diff --git a/cpt/cpt.h b/cpt/cpt.h
> new file mode 100644
> index 0000000..381a9bf
> --- /dev/null
> +++ b/cpt/cpt.h
> @@ -0,0 +1,19 @@
> +/*
> + * Copyright (C) 2008 Parallels, Inc.
> + *
> + * Author: Andrey Mirkin <major@openvz.org>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation, version 2 of the
> + * License.
> + *
> + */
> +
> +struct cpt_operations
> +{
> + struct module * owner;
> + int (*checkpoint)(pid_t pid, int fd, unsigned long flags);
> + int (*restart)(int ctid, int fd, unsigned long flags);
> +};
> +extern struct cpt_operations cpt_ops;
> diff --git a/cpt/sys.c b/cpt/sys.c
> new file mode 100644
> index 0000000..4051286
> --- /dev/null
> +++ b/cpt/sys.c
> @@ -0,0 +1,48 @@
> +/*
> + * Copyright (C) 2008 Parallels, Inc.
> + *
> + * Author: Andrey Mirkin <major@openvz.org>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation, version 2 of the
> + * License.
> + *
> + */
> +
> +#include <linux/sched.h>
> +#include <linux/fs.h>
> +#include <linux/file.h>
> +#include <linux/notifier.h>
> +#include <linux/module.h>
> +
> +#include "cpt.h"
> +
> +MODULE_LICENSE("GPL");
> +
> +static int checkpoint(pid_t pid, int fd, unsigned long flags)
> +{
> + return -ENOSYS;
> +}
> +
> +static int restart(int ctid, int fd, unsigned long flags)
> +{
> + return -ENOSYS;
> +}
> +
> +static int __init init_cptrst(void)
> +{
> + cpt_ops.owner = THIS_MODULE;
> + cpt_ops.checkpoint = checkpoint;
> + cpt_ops.restart = restart;
> + return 0;
> +}
> +module_init(init_cptrst);
> +
> +static void __exit exit_cptrst(void)
> +{
> + cpt_ops.checkpoint = NULL;
> + cpt_ops.restart = NULL;
> + cpt_ops.owner = NULL;
> +}
> +module_exit(exit_cptrst);
> diff --git a/cpt/sys_core.c b/cpt/sys_core.c
> index 1a97fb6..5dd1191 100644
> --- a/cpt/sys_core.c
> +++ b/cpt/sys_core.c
> @@ -13,6 +13,13 @@
> #include <linux/sched.h>
> #include <linux/fs.h>
> #include <linux/file.h>
> +#include <linux/notifier.h>
> +#include <linux/module.h>
> +
> +#include "cpt.h"
> +
> +struct cpt_operations cpt_ops = { NULL, NULL, NULL };
> +EXPORT_SYMBOL(cpt_ops);
>
> /**
> * sys_checkpoint - checkpoint a container from outside
> @@ -23,7 +30,16 @@
> */
> asmlinkage long sys_checkpoint(pid_t pid, int fd, unsigned long flags)
> {
> - return -ENOSYS;
> + int ret;
> +
> + ret = -ENOSYS;
> +
> + if (try_module_get(cpt_ops.owner)) {
> + if (cpt_ops.checkpoint)
> + ret = cpt_ops.checkpoint(pid, fd, flags);
> + module_put(cpt_ops.owner);
> + }
> + return ret;
> }
>
> /**
> @@ -34,5 +50,14 @@ asmlinkage long sys_checkpoint(pid_t pid, int fd, unsigned long flags)
> */
> asmlinkage long sys_restart(int ctid, int fd, unsigned long flags)
> {
> - return -ENOSYS;
> + int ret;
> +
> + ret = -ENOSYS;
> +
> + if (try_module_get(cpt_ops.owner)) {
> + if (cpt_ops.restart)
> + ret = cpt_ops.restart(ctid, fd, flags);
> + module_put(cpt_ops.owner);
> + }
> + return ret;
> }
> diff --git a/init/Kconfig b/init/Kconfig
> index 4bd4b0c..d29ed21 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -344,6 +344,8 @@ config CGROUP_FREEZER
> Provides a way to freeze and unfreeze all tasks in a
> cgroup
>
> +source "cpt/Kconfig"
> +
> config FAIR_GROUP_SCHED
> bool "Group scheduling for SCHED_OTHER"
> depends on GROUP_SCHED
> --
> 1.5.6
>
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/containers
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 3/9] Introduce context structure needed during checkpointing/restart
2008-09-03 14:13 ` Cedric Le Goater
@ 2008-09-03 14:29 ` Andrey Mirkin
0 siblings, 0 replies; 69+ messages in thread
From: Andrey Mirkin @ 2008-09-03 14:29 UTC (permalink / raw)
To: Cedric Le Goater; +Cc: linux-kernel, containers
On Wednesday 03 September 2008 18:13 Cedric Le Goater wrote:
> > +void context_release(struct cpt_context *ctx)
> > +{
> > + ctx->ctx_state = CPT_CTX_ERROR;
> > +
> > + if (ctx->file)
> > + fput(ctx->file);
> > + kfree(ctx);
> > +}
> > +
> > +static void context_put(struct cpt_context *ctx)
> > +{
> > + if (!--ctx->refcount)
> > + context_release(ctx);
> > +}
> > +
> > static int checkpoint(pid_t pid, int fd, unsigned long flags)
> > {
> > - return -ENOSYS;
> > + struct file *file;
> > + struct cpt_context *ctx;
> > + int err;
> > +
> > + err = -EBADF;
> > + file = fget(fd);
> > + if (!file)
> > + goto out;
> > +
> > + err = -ENOMEM;
> > + ctx = context_alloc();
> > + if (!ctx)
> > + goto out_file;
> > +
> > + ctx->file = file;
> > + ctx->ctx_state = CPT_CTX_DUMPING;
> > +
> > + /* checkpoint */
> > + err = -ENOSYS;
> > +
> > + context_put(ctx);
> > +
> > +out_file:
> > + fput(file);
> > +out:
> > + return err;
> > }
>
> it looks like fput(file) is done twice in checkpoint() and
> context_release() ?
Oh, you are right. I'm sorry, I was in rush and sent an outdated version.
Will resend correct patch shortly.
Thanks,
Andrey
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 8/9] Introduce functions to restart a process
2008-09-03 10:57 ` [PATCH 8/9] Introduce functions to restart a process Andrey Mirkin
2008-09-03 10:57 ` [PATCH 9/9] Introduce functions to restore mm Andrey Mirkin
@ 2008-09-03 14:32 ` Louis Rilling
2008-09-13 17:34 ` Pavel Machek
1 sibling, 1 reply; 69+ messages in thread
From: Louis Rilling @ 2008-09-03 14:32 UTC (permalink / raw)
To: Andrey Mirkin; +Cc: linux-kernel, containers
[-- Attachment #1: Type: text/plain, Size: 10644 bytes --]
On Wed, Sep 03, 2008 at 02:57:55PM +0400, Andrey Mirkin wrote:
> Functions to restart process, restore its state, fpu and registers are added.
>
> Signed-off-by: Andrey Mirkin <major@openvz.org>
> ---
> arch/x86/kernel/entry_32.S | 21 +++
> arch/x86/kernel/process_32.c | 3 +
> cpt/Makefile | 3 +-
> cpt/cpt.h | 2 +
> cpt/restart.c | 2 +-
> cpt/rst_process.c | 277 ++++++++++++++++++++++++++++++++++++++++++
> kernel/sched.c | 1 +
> 7 files changed, 307 insertions(+), 2 deletions(-)
> create mode 100644 cpt/rst_process.c
>
> diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
> index 109792b..a4848a3 100644
> --- a/arch/x86/kernel/entry_32.S
> +++ b/arch/x86/kernel/entry_32.S
> @@ -225,6 +225,7 @@ ENTRY(ret_from_fork)
> GET_THREAD_INFO(%ebp)
> popl %eax
> CFI_ADJUST_CFA_OFFSET -4
> +ret_from_fork_tail:
> pushl $0x0202 # Reset kernel eflags
> CFI_ADJUST_CFA_OFFSET 4
> popfl
> @@ -233,6 +234,26 @@ ENTRY(ret_from_fork)
> CFI_ENDPROC
> END(ret_from_fork)
>
> +ENTRY(i386_ret_from_resume)
> + CFI_STARTPROC
> + pushl %eax
> + CFI_ADJUST_CFA_OFFSET 4
> + call schedule_tail
> + GET_THREAD_INFO(%ebp)
> + popl %eax
> + CFI_ADJUST_CFA_OFFSET -4
> + movl (%esp), %eax
> + testl %eax, %eax
> + jz 1f
> + pushl %esp
> + call *%eax
> + addl $4, %esp
> +1:
> + addl $256, %esp
This space reserved for hooks deserves some explanations IMO. I partly know the
reasons, since I have looked at OpenVZ code with you, but I'm sure that almost
nobody knows...
Btw, the constant HOOK_RESERVE should be defined in a common place for assembly
and C as well.
> + jmp ret_from_fork_tail
> + CFI_ENDPROC
> +END(i386_ret_from_resume)
> +
> /*
> * Return to user mode is not as complex as all this looks,
> * but we want the default path for a system call return to
[...]
> diff --git a/cpt/rst_process.c b/cpt/rst_process.c
> new file mode 100644
> index 0000000..6d47f3c
> --- /dev/null
> +++ b/cpt/rst_process.c
> @@ -0,0 +1,277 @@
> +/*
> + * Copyright (C) 2008 Parallels, Inc.
> + *
> + * Author: Andrey Mirkin <major@openvz.org>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation, version 2 of the
> + * License.
> + *
> + */
> +
> +#include <linux/sched.h>
> +#include <linux/fs.h>
> +#include <linux/file.h>
> +#include <linux/version.h>
> +#include <linux/module.h>
> +
> +#include "cpt.h"
> +#include "cpt_image.h"
> +
> +#define HOOK_RESERVE 256
> +
> +struct thr_context {
> + struct completion complete;
> + int error;
> + struct cpt_context *ctx;
> + struct task_struct *tsk;
> +};
> +
> +int local_kernel_thread(int (*fn)(void *), void * arg, unsigned long flags, pid_t pid)
> +{
> + pid_t ret;
> +
> + if (current->fs == NULL) {
> + /* do_fork_pid() hates processes without fs, oopses. */
> + eprintk("local_kernel_thread: current->fs==NULL\n");
> + return -EINVAL;
> + }
> + if (!try_module_get(THIS_MODULE))
> + return -EBUSY;
Doing try_module_get(THIS_MODULE) and checking its result is almost always the
symptom of a BUG. I explain myself:
If it is possible that THIS_MODULE is currently unloading, its memory
(especially its text section) can be freed at any time during the call to
local_kernel_thread(), which leads to a beautiful OOPS.
So, either the code is buggy, or this check is useless. I agree however that the
module ref count must be incremented, since the new thread will run code from
it.
> + ret = kernel_thread(fn, arg, flags);
> + if (ret < 0)
> + module_put(THIS_MODULE);
> + return ret;
> +}
> +
> +static unsigned int decode_task_flags(unsigned int task_flags)
> +{
> + unsigned int flags = 0;
> +
> + if (task_flags & (1 << CPT_PF_EXITING))
> + flags |= PF_EXITING;
> + if (task_flags & (1 << CPT_PF_FORKNOEXEC))
> + flags |= PF_FORKNOEXEC;
> + if (task_flags & (1 << CPT_PF_SUPERPRIV))
> + flags |= PF_SUPERPRIV;
> + if (task_flags & (1 << CPT_PF_DUMPCORE))
> + flags |= PF_DUMPCORE;
> + if (task_flags & (1 << CPT_PF_SIGNALED))
> + flags |= PF_SIGNALED;
> +
> + return flags;
> +
> +}
> +
> +int rst_restore_task_struct(struct task_struct *tsk, struct cpt_task_image *ti,
> + struct cpt_context *ctx)
> +{
> + int i;
> +
> + /* Restore only saved flags, comm and tls for now */
> + tsk->flags = decode_task_flags(ti->cpt_flags);
> + clear_tsk_thread_flag(tsk, TIF_FREEZE);
> + memcpy(tsk->comm, ti->cpt_comm, TASK_COMM_LEN);
> + for (i = 0; i < GDT_ENTRY_TLS_ENTRIES; i++) {
> + tsk->thread.tls_array[i].a = ti->cpt_tls[i] & 0xFFFFFFFF;
> + tsk->thread.tls_array[i].b = ti->cpt_tls[i] >> 32;
> + }
> +
> + return 0;
> +}
> +
> +static int rst_restore_fpustate(struct task_struct *tsk, struct cpt_task_image *ti,
> + struct cpt_context *ctx)
> +{
> + struct cpt_obj_bits hdr;
> + int err;
> + char *buf;
> +
> + clear_stopped_child_used_math(tsk);
> +
> + err = rst_get_object(CPT_OBJ_BITS, &hdr, sizeof(hdr), ctx);
> + if (err < 0)
> + return err;
> +
> + buf = kmalloc(hdr.cpt_size, GFP_KERNEL);
> + if (!buf)
> + return -ENOMEM;
> +
> + err = ctx->read(buf, hdr.cpt_size, ctx);
> + if (err)
> + goto out;
> +
> + if (hdr.cpt_content == CPT_CONTENT_X86_FPUSTATE && cpu_has_fxsr) {
> + memcpy(&tsk->thread.xstate, buf,
> + sizeof(struct i387_fxsave_struct));
> + if (ti->cpt_flags & CPT_PF_USED_MATH)
> + set_stopped_child_used_math(tsk);
> + }
> +#ifndef CONFIG_X86_64
> + else if (hdr.cpt_content == CPT_CONTENT_X86_FPUSTATE_OLD &&
> + !cpu_has_fxsr) {
> + memcpy(&tsk->thread.xstate, buf,
> + sizeof(struct i387_fsave_struct));
> + if (ti->cpt_flags & CPT_PF_USED_MATH)
> + set_stopped_child_used_math(tsk);
> + }
> +#endif
> +
> +out:
> + kfree(buf);
> + return err;
> +}
> +
> +static u32 decode_segment(u32 segid)
> +{
> + if (segid == CPT_SEG_ZERO)
> + return 0;
> +
> + /* TLS descriptors */
> + if (segid <= CPT_SEG_TLS3)
> + return ((GDT_ENTRY_TLS_MIN + segid - CPT_SEG_TLS1) << 3) + 3;
> +
> + /* LDT descriptor, it is just an index to LDT array */
> + if (segid >= CPT_SEG_LDT)
> + return ((segid - CPT_SEG_LDT) << 3) | 7;
> +
> + /* Check for one of standard descriptors */
> + if (segid == CPT_SEG_USER32_DS)
> + return __USER_DS;
> + if (segid == CPT_SEG_USER32_CS)
> + return __USER_CS;
> +
> + eprintk("Invalid segment reg %d\n", segid);
> + return 0;
> +}
> +
> +static int rst_restore_registers(struct task_struct *tsk, struct cpt_context *ctx)
> +{
> + struct cpt_x86_regs ri;
> + struct pt_regs *regs = task_pt_regs(tsk);
> + extern char i386_ret_from_resume;
> + int err;
> +
> + err = rst_get_object(CPT_OBJ_X86_REGS, &ri, sizeof(ri), ctx);
> + if (err < 0)
> + return err;
> +
> + tsk->thread.sp = (unsigned long) regs;
> + tsk->thread.sp0 = (unsigned long) (regs+1);
> + tsk->thread.ip = (unsigned long) &i386_ret_from_resume;
> +
> + tsk->thread.gs = decode_segment(ri.cpt_gs);
> + tsk->thread.debugreg0 = ri.cpt_debugreg[0];
> + tsk->thread.debugreg1 = ri.cpt_debugreg[1];
> + tsk->thread.debugreg2 = ri.cpt_debugreg[2];
> + tsk->thread.debugreg3 = ri.cpt_debugreg[3];
> + tsk->thread.debugreg6 = ri.cpt_debugreg[6];
> + tsk->thread.debugreg7 = ri.cpt_debugreg[7];
> +
> + regs->bx = ri.cpt_bx;
> + regs->cx = ri.cpt_cx;
> + regs->dx = ri.cpt_dx;
> + regs->si = ri.cpt_si;
> + regs->di = ri.cpt_di;
> + regs->bp = ri.cpt_bp;
> + regs->ax = ri.cpt_ax;
> + regs->orig_ax = ri.cpt_orig_ax;
> + regs->ip = ri.cpt_ip;
> + regs->flags = ri.cpt_flags;
> + regs->sp = ri.cpt_sp;
> +
> + regs->cs = decode_segment(ri.cpt_cs);
> + regs->ss = decode_segment(ri.cpt_ss);
> + regs->ds = decode_segment(ri.cpt_ds);
> + regs->es = decode_segment(ri.cpt_es);
> + regs->fs = decode_segment(ri.cpt_fs);
> +
> + tsk->thread.sp -= HOOK_RESERVE;
> + memset((void*)tsk->thread.sp, 0, HOOK_RESERVE);
> +
> + return 0;
> +}
> +
> +static int restart_thread(void *arg)
> +{
> + struct thr_context *thr_ctx = arg;
> + struct cpt_context *ctx;
> + struct cpt_task_image *ti;
> + int err;
> +
> + current->state = TASK_UNINTERRUPTIBLE;
> +
> + ctx = thr_ctx->ctx;
> + ti = kmalloc(sizeof(*ti), GFP_KERNEL);
> + if (!ti)
> + return -ENOMEM;
> +
> + err = rst_get_object(CPT_OBJ_TASK, ti, sizeof(*ti), ctx);
> + if (!err)
> + err = rst_restore_task_struct(current, ti, ctx);
> + /* Restore mm here */
> + if (!err)
> + err = rst_restore_fpustate(current, ti, ctx);
> + if (!err)
> + err = rst_restore_registers(current, ctx);
> +
> + thr_ctx->error = err;
> + complete(&thr_ctx->complete);
> +
> + if (!err && (ti->cpt_state & (EXIT_ZOMBIE|EXIT_DEAD))) {
> + do_exit(ti->cpt_exit_code);
> + } else {
> + __set_current_state(TASK_UNINTERRUPTIBLE);
> + }
> +
> + kfree(ti);
> + schedule();
> +
> + eprintk("leaked %d/%d %p\n", task_pid_nr(current), task_pid_vnr(current), current->mm);
> +
> + module_put(THIS_MODULE);
Where is the module ref count dropped in case of success?
> + complete_and_exit(NULL, 0);
> + return 0;
> +}
> +static int create_root_task(struct cpt_context *ctx,
> + struct thr_context *thr_ctx)
> +{
> + struct task_struct *tsk;
> + int pid;
> +
> + thr_ctx->ctx = ctx;
> + thr_ctx->error = 0;
> + init_completion(&thr_ctx->complete);
> +
> + /* We should also create container here */
> + pid = local_kernel_thread(restart_thread, thr_ctx,
> + CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
> + CLONE_NEWUSER | CLONE_NEWPID | CLONE_NEWNET, 0);
> + if (pid < 0)
> + return pid;
> + read_lock(&tasklist_lock);
> + tsk = find_task_by_vpid(pid);
> + if (tsk)
> + get_task_struct(tsk);
> + read_unlock(&tasklist_lock);
> + if (tsk == NULL)
> + return -ESRCH;
> + thr_ctx->tsk = tsk;
> + return 0;
> +}
> +
> +int rst_restart_process(struct cpt_context *ctx)
> +{
> + struct thr_context thr_ctx_root;
> + int err;
> +
> + err = create_root_task(ctx, &thr_ctx_root);
> + if (err)
> + return err;
> +
> + wait_for_completion(&thr_ctx_root.complete);
> + wait_task_inactive(thr_ctx_root.tsk, 0);
I don't see the point of this two-steps completion. Could you explain?
Thanks,
Louis
--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 4/9] Introduce container dump function
2008-09-03 14:23 ` [PATCH 4/9] Introduce container dump function Serge E. Hallyn
@ 2008-09-03 14:45 ` Andrey Mirkin
0 siblings, 0 replies; 69+ messages in thread
From: Andrey Mirkin @ 2008-09-03 14:45 UTC (permalink / raw)
To: Serge E. Hallyn; +Cc: linux-kernel, containers
On Wednesday 03 September 2008 18:23 Serge E. Hallyn wrote:
> Quoting Andrey Mirkin (major@openvz.org):
> > Actually right now we are going to dump only one process.
> > Function for dumping head of image file are added.
> >
> > Signed-off-by: Andrey Mirkin <major@openvz.org>
> > ---
> > cpt/Makefile | 2 +-
> > cpt/checkpoint.c | 74
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++ cpt/cpt.h |
> > 3 ++
> > cpt/sys.c | 3 +-
> > kernel/fork.c | 2 +
> > 5 files changed, 82 insertions(+), 2 deletions(-)
> > create mode 100644 cpt/checkpoint.c
> >
> > diff --git a/cpt/Makefile b/cpt/Makefile
> > index bfe75d5..173346b 100644
> > --- a/cpt/Makefile
> > +++ b/cpt/Makefile
> > @@ -2,4 +2,4 @@ obj-y += sys_core.o
> >
> > obj-$(CONFIG_CHECKPOINT) += cptrst.o
> >
> > -cptrst-objs := sys.o
> > +cptrst-objs := sys.o checkpoint.o
> > diff --git a/cpt/checkpoint.c b/cpt/checkpoint.c
> > new file mode 100644
> > index 0000000..b4d9686
> > --- /dev/null
> > +++ b/cpt/checkpoint.c
> > @@ -0,0 +1,74 @@
> > +/*
> > + * Copyright (C) 2008 Parallels, Inc.
> > + *
> > + * Author: Andrey Mirkin <major@openvz.org>
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License as
> > + * published by the Free Software Foundation, version 2 of the
> > + * License.
> > + *
> > + */
> > +
> > +#include <linux/sched.h>
> > +#include <linux/fs.h>
> > +#include <linux/file.h>
> > +#include <linux/version.h>
> > +
> > +#include "cpt.h"
> > +
> > +static int cpt_write_head(struct cpt_context *ctx)
> > +{
> > + struct cpt_head hdr;
> > +
> > + memset(&hdr, 0, sizeof(hdr));
> > + hdr.cpt_signature[0] = CPT_SIGNATURE0;
> > + hdr.cpt_signature[1] = CPT_SIGNATURE1;
> > + hdr.cpt_signature[2] = CPT_SIGNATURE2;
> > + hdr.cpt_signature[3] = CPT_SIGNATURE3;
> > + hdr.cpt_hdrlen = sizeof(hdr);
> > + hdr.cpt_image_major = (LINUX_VERSION_CODE >> 16) & 0xff;
> > + hdr.cpt_image_minor = (LINUX_VERSION_CODE >> 8) & 0xff;
> > + hdr.cpt_image_sublevel = (LINUX_VERSION_CODE) & 0xff;
> > + hdr.cpt_image_extra = 0;
> > +#if defined(CONFIG_X86_32)
> > + hdr.cpt_arch = CPT_ARCH_I386;
> > +#else
> > +#error Arch is not supported
> > +#endif
> > + return ctx->write(&hdr, sizeof(hdr), ctx);
> > +}
> > +
> > +int dump_container(struct cpt_context *ctx)
> > +{
> > + int err;
> > + struct task_struct *root;
> > +
> > + read_lock(&tasklist_lock);
> > + root = find_task_by_vpid(ctx->pid);
> > + if (root)
> > + get_task_struct(root);
> > + read_unlock(&tasklist_lock);
> > +
> > + err = -ESRCH;
> > + if (!root) {
> > + eprintk("can not find root task\n");
> > + return err;
> > + }
> > + ctx->nsproxy = root->nsproxy;
> > + if (!ctx->nsproxy) {
> > + eprintk("nsproxy is null\n");
> > + goto out;
> > + }
>
> The get_task_struct() above won't pin the tsk->nsproxy
> though, will it? So should you be doing a
> rcu_read_lock();
> nsproxy = get_task_nsproxy(root);
> rcu_read_unlock();
> to make sure the nsproxy doesn't disappear out from under
> you?
>
You right here, will fix it in next version.
Thanks,
Andrey
>
> > + err = cpt_write_head(ctx);
> > +
> > + /* Dump task here */
> > + if (!err)
> > + err = -ENOSYS;
> > +
> > +out:
> > + ctx->nsproxy = NULL;
> > + put_task_struct(root);
> > + return err;
> > +}
> > diff --git a/cpt/cpt.h b/cpt/cpt.h
> > index 607ac1b..b421a11 100644
> > --- a/cpt/cpt.h
> > +++ b/cpt/cpt.h
> > @@ -33,6 +33,7 @@ typedef struct cpt_context
> > int refcount;
> > int ctx_state;
> > struct semaphore main_sem;
> > + struct nsproxy *nsproxy;
> >
> > int errno;
> >
> > @@ -54,3 +55,5 @@ extern int debug_level;
> >
> > #define eprintk(a...) cpt_printk(1, "CPT ERR: " a)
> > #define dprintk(a...) cpt_printk(1, "CPT DBG: " a)
> > +
> > +int dump_container(struct cpt_context *ctx);
> > diff --git a/cpt/sys.c b/cpt/sys.c
> > index 8334c4c..6801c22 100644
> > --- a/cpt/sys.c
> > +++ b/cpt/sys.c
> > @@ -109,9 +109,10 @@ static int checkpoint(pid_t pid, int fd, unsigned
> > long flags)
> >
> > ctx->file = file;
> > ctx->ctx_state = CPT_CTX_DUMPING;
> > + ctx->pid = pid;
> >
> > /* checkpoint */
> > - err = -ENOSYS;
> > + err = dump_container(ctx);
> >
> > context_put(ctx);
> >
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 52b5037..f38b43d 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -77,6 +77,7 @@ int max_threads; /* tunable limit on nr_threads */
> > DEFINE_PER_CPU(unsigned long, process_counts) = 0;
> >
> > __cacheline_aligned DEFINE_RWLOCK(tasklist_lock); /* outer */
> > +EXPORT_SYMBOL(tasklist_lock);
> >
> > int nr_processes(void)
> > {
> > @@ -153,6 +154,7 @@ void __put_task_struct(struct task_struct *tsk)
> > if (!profile_handoff_task(tsk))
> > free_task(tsk);
> > }
> > +EXPORT_SYMBOL(__put_task_struct);
> >
> > /*
> > * macro override instead of weak attribute alias, to workaround
> > --
> > 1.5.6
> >
> > _______________________________________________
> > Containers mailing list
> > Containers@lists.linux-foundation.org
> > https://lists.linux-foundation.org/mailman/listinfo/containers
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 2/9] Make checkpoint/restart functionality modular
2008-09-03 14:27 ` [PATCH 2/9] Make checkpoint/restart functionality modular Serge E. Hallyn
@ 2008-09-03 14:51 ` Andrey Mirkin
0 siblings, 0 replies; 69+ messages in thread
From: Andrey Mirkin @ 2008-09-03 14:51 UTC (permalink / raw)
To: Serge E. Hallyn; +Cc: linux-kernel, containers
On Wednesday 03 September 2008 18:27 Serge E. Hallyn wrote:
> Quoting Andrey Mirkin (major@openvz.org):
> > A config option CONFIG_CHECKPOINT is introduced.
> > New structure cpt_operations is introduced to store pointers to
> > checkpoint/restart functions from module.
>
> Thanks Andrey. The structure of this patchset is very nice, and I'm
> finding it very easy to read, at least the first half.
>
> The differences (in yours and Oren's) between checkpointing and
> restarting yourself or another task should probably not be important
> as people will want to be able to do both in the end, and I'm sure
> both patchsets can do both. And you have things like your own
> printk helpers in patch 3 that, just as in Oren's set, will have to
> go away. But I'm not yet able to see what the real differences are.
>
> Keeping on looking...
Thanks for reviewing. My owen printk is just for debug purpose, I will change
them to something already existing.
I'm waiting for more comments from you.
Thanks,
Andrey
>
> > Signed-off-by: Andrey Mirkin <major@openvz.org>
> > ---
> > cpt/Kconfig | 7 +++++++
> > cpt/Makefile | 4 ++++
> > cpt/cpt.h | 19 +++++++++++++++++++
> > cpt/sys.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
> > cpt/sys_core.c | 29 +++++++++++++++++++++++++++--
> > init/Kconfig | 2 ++
> > 6 files changed, 107 insertions(+), 2 deletions(-)
> > create mode 100644 cpt/Kconfig
> > create mode 100644 cpt/cpt.h
> > create mode 100644 cpt/sys.c
> >
> > diff --git a/cpt/Kconfig b/cpt/Kconfig
> > new file mode 100644
> > index 0000000..b9bc72d
> > --- /dev/null
> > +++ b/cpt/Kconfig
> > @@ -0,0 +1,7 @@
> > +config CHECKPOINT
> > + tristate "Checkpoint & restart for containers"
> > + depends on EXPERIMENTAL
> > + default n
> > + help
> > + This option adds module "cptrst", which allow to save a running
> > + container to a file and restart it later using this image file.
> > diff --git a/cpt/Makefile b/cpt/Makefile
> > index 2276fb1..bfe75d5 100644
> > --- a/cpt/Makefile
> > +++ b/cpt/Makefile
> > @@ -1 +1,5 @@
> > obj-y += sys_core.o
> > +
> > +obj-$(CONFIG_CHECKPOINT) += cptrst.o
> > +
> > +cptrst-objs := sys.o
> > diff --git a/cpt/cpt.h b/cpt/cpt.h
> > new file mode 100644
> > index 0000000..381a9bf
> > --- /dev/null
> > +++ b/cpt/cpt.h
> > @@ -0,0 +1,19 @@
> > +/*
> > + * Copyright (C) 2008 Parallels, Inc.
> > + *
> > + * Author: Andrey Mirkin <major@openvz.org>
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License as
> > + * published by the Free Software Foundation, version 2 of the
> > + * License.
> > + *
> > + */
> > +
> > +struct cpt_operations
> > +{
> > + struct module * owner;
> > + int (*checkpoint)(pid_t pid, int fd, unsigned long flags);
> > + int (*restart)(int ctid, int fd, unsigned long flags);
> > +};
> > +extern struct cpt_operations cpt_ops;
> > diff --git a/cpt/sys.c b/cpt/sys.c
> > new file mode 100644
> > index 0000000..4051286
> > --- /dev/null
> > +++ b/cpt/sys.c
> > @@ -0,0 +1,48 @@
> > +/*
> > + * Copyright (C) 2008 Parallels, Inc.
> > + *
> > + * Author: Andrey Mirkin <major@openvz.org>
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License as
> > + * published by the Free Software Foundation, version 2 of the
> > + * License.
> > + *
> > + */
> > +
> > +#include <linux/sched.h>
> > +#include <linux/fs.h>
> > +#include <linux/file.h>
> > +#include <linux/notifier.h>
> > +#include <linux/module.h>
> > +
> > +#include "cpt.h"
> > +
> > +MODULE_LICENSE("GPL");
> > +
> > +static int checkpoint(pid_t pid, int fd, unsigned long flags)
> > +{
> > + return -ENOSYS;
> > +}
> > +
> > +static int restart(int ctid, int fd, unsigned long flags)
> > +{
> > + return -ENOSYS;
> > +}
> > +
> > +static int __init init_cptrst(void)
> > +{
> > + cpt_ops.owner = THIS_MODULE;
> > + cpt_ops.checkpoint = checkpoint;
> > + cpt_ops.restart = restart;
> > + return 0;
> > +}
> > +module_init(init_cptrst);
> > +
> > +static void __exit exit_cptrst(void)
> > +{
> > + cpt_ops.checkpoint = NULL;
> > + cpt_ops.restart = NULL;
> > + cpt_ops.owner = NULL;
> > +}
> > +module_exit(exit_cptrst);
> > diff --git a/cpt/sys_core.c b/cpt/sys_core.c
> > index 1a97fb6..5dd1191 100644
> > --- a/cpt/sys_core.c
> > +++ b/cpt/sys_core.c
> > @@ -13,6 +13,13 @@
> > #include <linux/sched.h>
> > #include <linux/fs.h>
> > #include <linux/file.h>
> > +#include <linux/notifier.h>
> > +#include <linux/module.h>
> > +
> > +#include "cpt.h"
> > +
> > +struct cpt_operations cpt_ops = { NULL, NULL, NULL };
> > +EXPORT_SYMBOL(cpt_ops);
> >
> > /**
> > * sys_checkpoint - checkpoint a container from outside
> > @@ -23,7 +30,16 @@
> > */
> > asmlinkage long sys_checkpoint(pid_t pid, int fd, unsigned long flags)
> > {
> > - return -ENOSYS;
> > + int ret;
> > +
> > + ret = -ENOSYS;
> > +
> > + if (try_module_get(cpt_ops.owner)) {
> > + if (cpt_ops.checkpoint)
> > + ret = cpt_ops.checkpoint(pid, fd, flags);
> > + module_put(cpt_ops.owner);
> > + }
> > + return ret;
> > }
> >
> > /**
> > @@ -34,5 +50,14 @@ asmlinkage long sys_checkpoint(pid_t pid, int fd,
> > unsigned long flags) */
> > asmlinkage long sys_restart(int ctid, int fd, unsigned long flags)
> > {
> > - return -ENOSYS;
> > + int ret;
> > +
> > + ret = -ENOSYS;
> > +
> > + if (try_module_get(cpt_ops.owner)) {
> > + if (cpt_ops.restart)
> > + ret = cpt_ops.restart(ctid, fd, flags);
> > + module_put(cpt_ops.owner);
> > + }
> > + return ret;
> > }
> > diff --git a/init/Kconfig b/init/Kconfig
> > index 4bd4b0c..d29ed21 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -344,6 +344,8 @@ config CGROUP_FREEZER
> > Provides a way to freeze and unfreeze all tasks in a
> > cgroup
> >
> > +source "cpt/Kconfig"
> > +
> > config FAIR_GROUP_SCHED
> > bool "Group scheduling for SCHED_OTHER"
> > depends on GROUP_SCHED
> > --
> > 1.5.6
> >
> > _______________________________________________
> > Containers mailing list
> > Containers@lists.linux-foundation.org
> > https://lists.linux-foundation.org/mailman/listinfo/containers
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-09-03 14:26 ` Cedric Le Goater
@ 2008-09-03 14:53 ` Andrey Mirkin
0 siblings, 0 replies; 69+ messages in thread
From: Andrey Mirkin @ 2008-09-03 14:53 UTC (permalink / raw)
To: Cedric Le Goater; +Cc: Louis.Rilling, containers, linux-kernel
On Wednesday 03 September 2008 18:26 Cedric Le Goater wrote:
> > Forgot a global comment: you will probably get the same (rather pointless
> > for a proof of concept, IMHO) requests as Oren got, to
> > 1) improve coding style:
> > a) especially avoid error handling like:
> > err = foo();
> > if (!err)
> > err = bar();
> > if (!err)
> > err = baz();
> >
> > and prefer
> > err = foo();
> > if (err)
> > goto foo_err;
> > err = bar();
> > ...
> >
> > b) do not write conditions on a single line, like
> > if (foo) bar;
> >
> > 2) put arch-dependent code in arch/ subdirs.
> > 3) I probably forgot other ones.
>
> ./scripts/checkpatch.pl catches most of it (and it does catch a few on your
> patches)
Thanks, I will use this script to brush up the code.
>
> but Dave is even better ! :)
I'm waiting for his comments :)
Regards,
Andrey
>
> > I obviously do not personally request you to take these requests into
> > account ;)
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-09-03 10:57 [PATCH 0/9] OpenVZ kernel based checkpointing/restart Andrey Mirkin
` (2 preceding siblings ...)
2008-09-03 13:49 ` Louis Rilling
@ 2008-09-04 8:14 ` Oren Laadan
2008-09-04 14:05 ` Dave Hansen
2008-10-17 23:33 ` Dave Hansen
5 siblings, 0 replies; 69+ messages in thread
From: Oren Laadan @ 2008-09-04 8:14 UTC (permalink / raw)
To: Andrey Mirkin; +Cc: linux-kernel, containers
Andrey Mirkin wrote:
> This patchset introduces kernel based checkpointing/restart as it is
> implemented in OpenVZ project. This patchset has limited functionality and
> are able to checkpoint/restart only single process. Recently Oren Laaden
> sent another kernel based implementation of checkpoint/restart. The main
> differences between this patchset and Oren's patchset are:
Thanks Andrey. I'll take a close look at it by the weekend. Just a couple
of comments after a very brief look at the code:
>
> * In this patchset checkpointing initiated not from the process
> (right now we do not have a container, only namespaces), Oren's patchset
> performs checkpointing from the process context.
Actually, my patch is written to checkpoint from outside the context of the
target task; for simplicity only, it is applied to the current process.
>
> * Restart in this patchset is initiated from process, which restarts a new
> process (in new namespaces) with saved state. Oren's patchset uses the same
> process from which restart was initiated and restore saved state over it.
Here, too, handling of namespaces was left out in order to provide a simple
proof of concept. In fact, spawning a new process and entering a new namespace
can be easily done in user-space.
Oren.
>
> * Checkpoint/restart functionality in this patchset is implemented as a kernel
> module
>
>
> As checkpointing is initiated not from the process which state should be saved
> we should freeze a process before saving its state. Right now Container Freezer
> from Matt Helsley can be used for this.
>
> This patchset introduce only a concept how kernel based checkpointing/restart
> can be implemented and are able to checkpoint/restart only a single process
> with simple VMAs.
>
> I've tried to split my patchset in small patches to make review more easier.
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/containers
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-09-03 10:57 [PATCH 0/9] OpenVZ kernel based checkpointing/restart Andrey Mirkin
` (3 preceding siblings ...)
2008-09-04 8:14 ` Oren Laadan
@ 2008-09-04 14:05 ` Dave Hansen
2008-10-17 23:33 ` Dave Hansen
5 siblings, 0 replies; 69+ messages in thread
From: Dave Hansen @ 2008-09-04 14:05 UTC (permalink / raw)
To: Andrey Mirkin; +Cc: linux-kernel, containers
So, just like the I/O controller patches, we surely can't just throw
patch sets back and forth at each other. We're also sure to wear out
any potential reviewers, especially on LKML.
The differences you've described between this and Oren's patches are
pretty small, all things considered. Would you be willing to simply
help with Oren's patches or port things over from this set instead of
working on a completely disjoint set?
-- Dave
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-09-03 13:59 ` [Devel] " Andrey Mirkin
@ 2008-09-04 22:55 ` Dave Hansen
0 siblings, 0 replies; 69+ messages in thread
From: Dave Hansen @ 2008-09-04 22:55 UTC (permalink / raw)
To: Andrey Mirkin; +Cc: devel, containers, Cedric Le Goater, linux-kernel
On Wed, 2008-09-03 at 17:59 +0400, Andrey Mirkin wrote:
> > The first issues I see with this direction are some EXPORT_SYMBOL() that
> > would be useless without a module.
>
> Checkpoint/restart functionality is implemented as a kernel module to provide
> more flexibility during development process - you can easily unload module,
> fix what you need and load module again, you do not need to reboot you server
> in this case. We have discussed this question during Container's mini summit
> on OLS-2008 and we have found out that it will be convenient to have such an
> ability.
It would probably be best to separate out the bits that are needed to
make this feature build as a module. Since, as you say, this is only
useful during development, we'll have to rip these bits out eventually
to get it merged. Might as well do that from the start.
-- Dave
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 8/9] Introduce functions to restart a process
2008-09-03 14:32 ` [PATCH 8/9] Introduce functions to restart a process Louis Rilling
@ 2008-09-13 17:34 ` Pavel Machek
0 siblings, 0 replies; 69+ messages in thread
From: Pavel Machek @ 2008-09-13 17:34 UTC (permalink / raw)
To: Louis Rilling; +Cc: Andrey Mirkin, linux-kernel, containers
Hi!
> > @@ -233,6 +234,26 @@ ENTRY(ret_from_fork)
> > CFI_ENDPROC
> > END(ret_from_fork)
> >
> > +ENTRY(i386_ret_from_resume)
> > + CFI_STARTPROC
> > + pushl %eax
> > + CFI_ADJUST_CFA_OFFSET 4
> > + call schedule_tail
> > + GET_THREAD_INFO(%ebp)
> > + popl %eax
> > + CFI_ADJUST_CFA_OFFSET -4
> > + movl (%esp), %eax
> > + testl %eax, %eax
> > + jz 1f
> > + pushl %esp
> > + call *%eax
> > + addl $4, %esp
> > +1:
> > + addl $256, %esp
Can we do something with naming here? resume usually means 'resume
from s2ram'...
> > diff --git a/cpt/rst_process.c b/cpt/rst_process.c
> > new file mode 100644
> > index 0000000..6d47f3c
> > --- /dev/null
> > +++ b/cpt/rst_process.c
> > @@ -0,0 +1,277 @@
....and you are not even consistent. Plus, rst_ will probably be
understood as a reset by most people.
Hmmm... your syscalls get fd. What happens if I pass something like
/proc/self/maps to them?
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-09-03 10:57 [PATCH 0/9] OpenVZ kernel based checkpointing/restart Andrey Mirkin
` (4 preceding siblings ...)
2008-09-04 14:05 ` Dave Hansen
@ 2008-10-17 23:33 ` Dave Hansen
2008-10-20 11:10 ` Louis Rilling
2008-10-20 12:14 ` [Devel] " Andrey Mirkin
5 siblings, 2 replies; 69+ messages in thread
From: Dave Hansen @ 2008-10-17 23:33 UTC (permalink / raw)
To: Andrey Mirkin; +Cc: linux-kernel, containers
On Wed, 2008-09-03 at 14:57 +0400, Andrey Mirkin wrote:
> This patchset introduces kernel based checkpointing/restart as it is
> implemented in OpenVZ project. This patchset has limited functionality and
> are able to checkpoint/restart only single process. Recently Oren Laaden
> sent another kernel based implementation of checkpoint/restart. The main
> differences between this patchset and Oren's patchset are:
Hi Andrey,
I'm curious what you want to happen with this patch set. Is there
something specific in Oren's set that deficient which you need
implemented? Are there some technical reasons you prefer this code?
-- Dave
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-17 23:33 ` Dave Hansen
@ 2008-10-20 11:10 ` Louis Rilling
2008-10-20 13:25 ` Daniel Lezcano
2008-10-20 16:36 ` Dave Hansen
2008-10-20 12:14 ` [Devel] " Andrey Mirkin
1 sibling, 2 replies; 69+ messages in thread
From: Louis Rilling @ 2008-10-20 11:10 UTC (permalink / raw)
To: Dave Hansen; +Cc: Andrey Mirkin, linux-kernel, containers, Oren Laadan
[-- Attachment #1: Type: text/plain, Size: 1159 bytes --]
On Fri, Oct 17, 2008 at 04:33:03PM -0700, Dave Hansen wrote:
> On Wed, 2008-09-03 at 14:57 +0400, Andrey Mirkin wrote:
> > This patchset introduces kernel based checkpointing/restart as it is
> > implemented in OpenVZ project. This patchset has limited functionality and
> > are able to checkpoint/restart only single process. Recently Oren Laaden
> > sent another kernel based implementation of checkpoint/restart. The main
> > differences between this patchset and Oren's patchset are:
>
> Hi Andrey,
>
> I'm curious what you want to happen with this patch set. Is there
> something specific in Oren's set that deficient which you need
> implemented? Are there some technical reasons you prefer this code?
To be fair, and since (IIRC) the initial intent was to start with OpenVZ's
approach, shouldn't Oren answer the same questions with respect to Andrey's
patchset?
I'm afraid that we are forgetting to take the best from both approaches...
Louis
--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-17 23:33 ` Dave Hansen
2008-10-20 11:10 ` Louis Rilling
@ 2008-10-20 12:14 ` Andrey Mirkin
2008-10-20 15:55 ` Dave Hansen
2008-10-20 17:17 ` Oren Laadan
1 sibling, 2 replies; 69+ messages in thread
From: Andrey Mirkin @ 2008-10-20 12:14 UTC (permalink / raw)
To: devel; +Cc: Dave Hansen, Pavel Emelyanov, containers, linux-kernel
On Saturday 18 October 2008 03:33 Dave Hansen wrote:
> On Wed, 2008-09-03 at 14:57 +0400, Andrey Mirkin wrote:
> > This patchset introduces kernel based checkpointing/restart as it is
> > implemented in OpenVZ project. This patchset has limited functionality
> > and are able to checkpoint/restart only single process. Recently Oren
> > Laaden sent another kernel based implementation of checkpoint/restart.
> > The main differences between this patchset and Oren's patchset are:
>
> Hi Andrey,
>
> I'm curious what you want to happen with this patch set. Is there
> something specific in Oren's set that deficient which you need
> implemented? Are there some technical reasons you prefer this code?
Hi Dave,
Right now my patchset (v2) provides an ability to checkpoint and restart a
group of processes. The process of checkpointing and restart can be initiated
from external process (not from the process which should be checkpointed).
Also I think that all the restart job (including process forking) should be
done in kernel, as in this case we will not depend on user space and will be
more secure. This is also implemented in my patchset.
Andrey
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-20 11:10 ` Louis Rilling
@ 2008-10-20 13:25 ` Daniel Lezcano
2008-10-20 13:48 ` Cedric Le Goater
2008-10-20 15:53 ` Oren Laadan
2008-10-20 16:36 ` Dave Hansen
1 sibling, 2 replies; 69+ messages in thread
From: Daniel Lezcano @ 2008-10-20 13:25 UTC (permalink / raw)
To: Louis.Rilling; +Cc: Dave Hansen, containers, Andrey Mirkin, linux-kernel
Louis Rilling wrote:
> On Fri, Oct 17, 2008 at 04:33:03PM -0700, Dave Hansen wrote:
>> On Wed, 2008-09-03 at 14:57 +0400, Andrey Mirkin wrote:
>>> This patchset introduces kernel based checkpointing/restart as it is
>>> implemented in OpenVZ project. This patchset has limited functionality and
>>> are able to checkpoint/restart only single process. Recently Oren Laaden
>>> sent another kernel based implementation of checkpoint/restart. The main
>>> differences between this patchset and Oren's patchset are:
>> Hi Andrey,
>>
>> I'm curious what you want to happen with this patch set. Is there
>> something specific in Oren's set that deficient which you need
>> implemented? Are there some technical reasons you prefer this code?
>
> To be fair, and since (IIRC) the initial intent was to start with OpenVZ's
> approach, shouldn't Oren answer the same questions with respect to Andrey's
> patchset?
>
> I'm afraid that we are forgetting to take the best from both approaches...
I agree with Louis.
I played with Oren's patchset and tryed to port it on x86_64. I was able
to sys_checkpoint/sys_restart but if you remove the restoring of the
general registers, the restart still works. I am not an expert on asm,
but my hypothesis is when we call sys_checkpoint the registers are saved
on the stack by the syscall and when we restore the memory of the
process, we restore the stack and the stacked registers are restored
when exiting the sys_restart. That make me feel there is an important
gap between external checkpoint and internal checkpoint.
Dmitry's patchset is nice too, but IMO, it goes too far from what we
decided to do at the container mini-summit. I think there are a lot of
design questions to be solved before going further.
IMHO we should look at Dmitry patchset and merge the external checkpoint
code to Oren's patchset in order to checkpoint *one* process and have
the process to restart itself. At this point, we can begin to talk about
the restart itself, shall we have the kernel to fork the processes to be
restarted ? shall we fork from userspace and implement some mechanism to
have each processes to restart themselves ? etc...
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-20 13:25 ` Daniel Lezcano
@ 2008-10-20 13:48 ` Cedric Le Goater
2008-10-20 13:49 ` Daniel Lezcano
2008-10-20 15:53 ` Oren Laadan
1 sibling, 1 reply; 69+ messages in thread
From: Cedric Le Goater @ 2008-10-20 13:48 UTC (permalink / raw)
To: Daniel Lezcano
Cc: Louis.Rilling, Dave Hansen, containers, Andrey Mirkin, linux-kernel
>> I'm afraid that we are forgetting to take the best from both
>> approaches...
>
> I agree with Louis.
>
> I played with Oren's patchset and tryed to port it on x86_64. I was able
> to sys_checkpoint/sys_restart but if you remove the restoring of the
> general registers, the restart still works. I am not an expert on asm,
> but my hypothesis is when we call sys_checkpoint the registers are saved
> on the stack by the syscall and when we restore the memory of the
> process, we restore the stack and the stacked registers are restored
> when exiting the sys_restart. That make me feel there is an important
> gap between external checkpoint and internal checkpoint.
>
> Dmitry's patchset is nice too, but IMO, it goes too far from what we
I think you are talking about Andrey.
C.
> decided to do at the container mini-summit. I think there are a lot of
> design questions to be solved before going further.
>
> IMHO we should look at Dmitry patchset and merge the external checkpoint
> code to Oren's patchset in order to checkpoint *one* process and have
> the process to restart itself. At this point, we can begin to talk about
> the restart itself, shall we have the kernel to fork the processes to be
> restarted ? shall we fork from userspace and implement some mechanism to
> have each processes to restart themselves ? etc...
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-20 13:48 ` Cedric Le Goater
@ 2008-10-20 13:49 ` Daniel Lezcano
0 siblings, 0 replies; 69+ messages in thread
From: Daniel Lezcano @ 2008-10-20 13:49 UTC (permalink / raw)
To: Cedric Le Goater
Cc: Louis.Rilling, Dave Hansen, containers, Andrey Mirkin, linux-kernel
Cedric Le Goater wrote:
>>> I'm afraid that we are forgetting to take the best from both
>>> approaches...
>> I agree with Louis.
>>
>> I played with Oren's patchset and tryed to port it on x86_64. I was able
>> to sys_checkpoint/sys_restart but if you remove the restoring of the
>> general registers, the restart still works. I am not an expert on asm,
>> but my hypothesis is when we call sys_checkpoint the registers are saved
>> on the stack by the syscall and when we restore the memory of the
>> process, we restore the stack and the stacked registers are restored
>> when exiting the sys_restart. That make me feel there is an important
>> gap between external checkpoint and internal checkpoint.
>>
>> Dmitry's patchset is nice too, but IMO, it goes too far from what we
>
> I think you are talking about Andrey.
Yes :)
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-20 13:25 ` Daniel Lezcano
2008-10-20 13:48 ` Cedric Le Goater
@ 2008-10-20 15:53 ` Oren Laadan
2008-10-20 16:37 ` Daniel Lezcano
` (2 more replies)
1 sibling, 3 replies; 69+ messages in thread
From: Oren Laadan @ 2008-10-20 15:53 UTC (permalink / raw)
To: Daniel Lezcano
Cc: Louis.Rilling, linux-kernel, containers, Andrey Mirkin, Dave Hansen
Daniel Lezcano wrote:
> Louis Rilling wrote:
>> On Fri, Oct 17, 2008 at 04:33:03PM -0700, Dave Hansen wrote:
>>> On Wed, 2008-09-03 at 14:57 +0400, Andrey Mirkin wrote:
>>>> This patchset introduces kernel based checkpointing/restart as it is
>>>> implemented in OpenVZ project. This patchset has limited functionality and
>>>> are able to checkpoint/restart only single process. Recently Oren Laaden
>>>> sent another kernel based implementation of checkpoint/restart. The main
>>>> differences between this patchset and Oren's patchset are:
>>> Hi Andrey,
>>>
>>> I'm curious what you want to happen with this patch set. Is there
>>> something specific in Oren's set that deficient which you need
>>> implemented? Are there some technical reasons you prefer this code?
>> To be fair, and since (IIRC) the initial intent was to start with OpenVZ's
>> approach, shouldn't Oren answer the same questions with respect to Andrey's
>> patchset?
>>
>> I'm afraid that we are forgetting to take the best from both approaches...
>
> I agree with Louis.
>
> I played with Oren's patchset and tryed to port it on x86_64. I was able
> to sys_checkpoint/sys_restart but if you remove the restoring of the
> general registers, the restart still works. I am not an expert on asm,
> but my hypothesis is when we call sys_checkpoint the registers are saved
> on the stack by the syscall and when we restore the memory of the
> process, we restore the stack and the stacked registers are restored
> when exiting the sys_restart. That make me feel there is an important
> gap between external checkpoint and internal checkpoint.
This is a misconception: my patches are not "internal checkpoint". My
patches are basically "external checkpoint" by design, which *also*
accommodates self-checkpointing (aka internal). The same holds for the
restart. The implementation is demonstrated with "self-checkpoint" to
avoid complicating things at this early stage of proof-of-concept.
For multiple processes all that is needed is a container and a loop
on the checkpoint side, and a method to recreate processes on the
restart side. Andrew suggests to do it in kernel space, I still have
doubts.
While I held out the multi-process part of the patch so far because I
was explicitly asked to do it, it seems like this would be a good time
to push it out and get feedback.
>
> Dmitry's patchset is nice too, but IMO, it goes too far from what we
> decided to do at the container mini-summit. I think there are a lot of
> design questions to be solved before going further.
>
> IMHO we should look at Dmitry patchset and merge the external checkpoint
> code to Oren's patchset in order to checkpoint *one* process and have
> the process to restart itself. At this point, we can begin to talk about
> the restart itself, shall we have the kernel to fork the processes to be
> restarted ? shall we fork from userspace and implement some mechanism to
> have each processes to restart themselves ? etc...
>
In both approaches, processes restart themselves, in the sense that a
process to be restarted eventually calls "do_restart()" (or equivalent).
The only question is how processes are created. Andrew's patch creates
everything inside the kernel. I would like to still give it a try outside
the kernel. Everything is ready, except that we need a way to pre-select
a PID for the new child... we never agreed on that one, did we ?
If we go ahead with the kernel-based process creation, it's easy to merge
it to the current patch-set.
Oren.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-20 12:14 ` [Devel] " Andrey Mirkin
@ 2008-10-20 15:55 ` Dave Hansen
2008-10-27 14:07 ` Andrey Mirkin
2008-10-20 17:17 ` Oren Laadan
1 sibling, 1 reply; 69+ messages in thread
From: Dave Hansen @ 2008-10-20 15:55 UTC (permalink / raw)
To: Andrey Mirkin; +Cc: devel, containers, Pavel Emelyanov, linux-kernel
On Mon, 2008-10-20 at 16:14 +0400, Andrey Mirkin wrote:
> Right now my patchset (v2) provides an ability to checkpoint and restart a
> group of processes. The process of checkpointing and restart can be initiated
> from external process (not from the process which should be checkpointed).
Absolutely. Oren's code does it this way to make for a smaller patch at
first. The syscall takes a pid argument so it is surely expected to be
expanded upon later.
> Also I think that all the restart job (including process forking) should be
> done in kernel, as in this case we will not depend on user space and will be
> more secure. This is also implemented in my patchset.
Do you think that this is an approach that Oren's patches are married
to, or is this a "feature" we can add on later?
I don't care which patch set we end up sticking in the kernel. I'm
trying to figure out which code we can more easily build upon in the
future. The fact that Oren's or yours can't do certain little things
right now does not bother me.
Honestly, I'm a little more confident that everyone can work with Oren
since he managed to get 7 revisions of his patch out and make some
pretty large changes while in the same time the OpenVZ patch was only
released twice. I'm not sure what has changed in the OpenVZ patch
between releases, either.
Are there any reasons that you absolutely can not use the code Oren
posted? Will it not fulfill your needs somehow? If so, could you
please elaborate on how?
-- Dave
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-20 11:10 ` Louis Rilling
2008-10-20 13:25 ` Daniel Lezcano
@ 2008-10-20 16:36 ` Dave Hansen
1 sibling, 0 replies; 69+ messages in thread
From: Dave Hansen @ 2008-10-20 16:36 UTC (permalink / raw)
To: Louis.Rilling; +Cc: containers, Andrey Mirkin, linux-kernel
On Mon, 2008-10-20 at 13:10 +0200, Louis Rilling wrote:
> To be fair, and since (IIRC) the initial intent was to start with OpenVZ's
> approach, shouldn't Oren answer the same questions with respect to Andrey's
> patchset?
I'm only really "supporting" Oren's patch set because he got it out way
before the OpenVZ one showed up, and has kept integrating improvements
into it to keep me interested. The OpenVZ approach does a few things
that Oren's does not, and it *is* a bit more mature.
I'm quite willing to jump camps if there's something compelling in what
Andrey has posted, even though I've put quite a bit of time in to
reviewing and improving Oren's. I'm looking for that "something". :)
-- Dave
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-20 15:53 ` Oren Laadan
@ 2008-10-20 16:37 ` Daniel Lezcano
2008-10-20 17:23 ` Serge E. Hallyn
2008-10-27 14:45 ` [Devel] " Andrey Mirkin
2008-10-20 16:51 ` Serge E. Hallyn
2008-10-21 9:36 ` Cedric Le Goater
2 siblings, 2 replies; 69+ messages in thread
From: Daniel Lezcano @ 2008-10-20 16:37 UTC (permalink / raw)
To: Oren Laadan
Cc: Louis.Rilling, linux-kernel, containers, Andrey Mirkin, Dave Hansen
Oren Laadan wrote:
>
> Daniel Lezcano wrote:
>> Louis Rilling wrote:
>>> On Fri, Oct 17, 2008 at 04:33:03PM -0700, Dave Hansen wrote:
>>>> On Wed, 2008-09-03 at 14:57 +0400, Andrey Mirkin wrote:
>>>>> This patchset introduces kernel based checkpointing/restart as it is
>>>>> implemented in OpenVZ project. This patchset has limited functionality and
>>>>> are able to checkpoint/restart only single process. Recently Oren Laaden
>>>>> sent another kernel based implementation of checkpoint/restart. The main
>>>>> differences between this patchset and Oren's patchset are:
>>>> Hi Andrey,
>>>>
>>>> I'm curious what you want to happen with this patch set. Is there
>>>> something specific in Oren's set that deficient which you need
>>>> implemented? Are there some technical reasons you prefer this code?
>>> To be fair, and since (IIRC) the initial intent was to start with OpenVZ's
>>> approach, shouldn't Oren answer the same questions with respect to Andrey's
>>> patchset?
>>>
>>> I'm afraid that we are forgetting to take the best from both approaches...
>> I agree with Louis.
>>
>> I played with Oren's patchset and tryed to port it on x86_64. I was able
>> to sys_checkpoint/sys_restart but if you remove the restoring of the
>> general registers, the restart still works. I am not an expert on asm,
>> but my hypothesis is when we call sys_checkpoint the registers are saved
>> on the stack by the syscall and when we restore the memory of the
>> process, we restore the stack and the stacked registers are restored
>> when exiting the sys_restart. That make me feel there is an important
>> gap between external checkpoint and internal checkpoint.
>
> This is a misconception: my patches are not "internal checkpoint". My
> patches are basically "external checkpoint" by design, which *also*
> accommodates self-checkpointing (aka internal). The same holds for the
> restart. The implementation is demonstrated with "self-checkpoint" to
> avoid complicating things at this early stage of proof-of-concept.
Yep, I read your patchset :)
I just want to clarify what we want to demonstrate with this patchset
for the proof-of-concept ? A self CR does not show what are the
complicate parts of the CR, we are just showing we can dump the memory
from the kernel and do setcontext/getcontext.
We state at the container mini-summit on an approach:
1. Pre-dump
2. Freeze the container
3. Dump
4. Thaw/Kill the container
5. Post-dump
We already have the freezer, and we can forget for now pre-dump and
post-dump.
IMHO, for the proof-of-concept we should do a minimal CR (like you did),
but conforming with these 5 points, but that means we have to do an
external checkpoint.
If the POC conforms with that, the patchset will be a little different
and that will show what are the difficult part for restarting a process,
especially to restart it at the frozen state :) and that will give an
idea from 10000 feets of the big picture.
> For multiple processes all that is needed is a container and a loop
> on the checkpoint side, and a method to recreate processes on the
> restart side. Andrew suggests to do it in kernel space, I still have
> doubts.
A question to Andrey, do you, in OpenVZ, restart "externally" or it is
the first process of the pid namespace which calls sys_restart and then
populates the pid namespace ?
> While I held out the multi-process part of the patch so far because I
> was explicitly asked to do it, it seems like this would be a good time
> to push it out and get feedback.
IMHO it is too soon...
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-20 15:53 ` Oren Laadan
2008-10-20 16:37 ` Daniel Lezcano
@ 2008-10-20 16:51 ` Serge E. Hallyn
2008-10-21 9:36 ` Cedric Le Goater
2 siblings, 0 replies; 69+ messages in thread
From: Serge E. Hallyn @ 2008-10-20 16:51 UTC (permalink / raw)
To: Oren Laadan
Cc: Daniel Lezcano, Louis.Rilling, containers, Dave Hansen,
linux-kernel, Andrey Mirkin
Quoting Oren Laadan (orenl@cs.columbia.edu):
> This is a misconception: my patches are not "internal checkpoint". My
> patches are basically "external checkpoint" by design, which *also*
> accommodates self-checkpointing (aka internal). The same holds for the
> restart. The implementation is demonstrated with "self-checkpoint" to
> avoid complicating things at this early stage of proof-of-concept.
>
> For multiple processes all that is needed is a container and a loop
> on the checkpoint side, and a method to recreate processes on the
> restart side. Andrew suggests to do it in kernel space, I still have
> doubts.
Yes I still prefer in-kernel. Can you elaborate on advantages of doing
more work in userspace?
> While I held out the multi-process part of the patch so far because I
Yup, and i appreciate your restraint until now :) It made your patchset
much easier to review.
> was explicitly asked to do it, it seems like this would be a good time
> to push it out and get feedback.
Can you send that out as a patch(set) on top of your v7? I'd love to
see (and test) it.
thanks,
-serge
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-20 12:14 ` [Devel] " Andrey Mirkin
2008-10-20 15:55 ` Dave Hansen
@ 2008-10-20 17:17 ` Oren Laadan
2008-10-27 14:38 ` Andrey Mirkin
1 sibling, 1 reply; 69+ messages in thread
From: Oren Laadan @ 2008-10-20 17:17 UTC (permalink / raw)
To: Andrey Mirkin
Cc: devel, containers, Pavel Emelyanov, linux-kernel, Dave Hansen
Andrey Mirkin wrote:
> On Saturday 18 October 2008 03:33 Dave Hansen wrote:
>> On Wed, 2008-09-03 at 14:57 +0400, Andrey Mirkin wrote:
>>> This patchset introduces kernel based checkpointing/restart as it is
>>> implemented in OpenVZ project. This patchset has limited functionality
>>> and are able to checkpoint/restart only single process. Recently Oren
>>> Laaden sent another kernel based implementation of checkpoint/restart.
>>> The main differences between this patchset and Oren's patchset are:
>> Hi Andrey,
>>
>> I'm curious what you want to happen with this patch set. Is there
>> something specific in Oren's set that deficient which you need
>> implemented? Are there some technical reasons you prefer this code?
>
> Hi Dave,
>
> Right now my patchset (v2) provides an ability to checkpoint and restart a
> group of processes. The process of checkpointing and restart can be initiated
> from external process (not from the process which should be checkpointed).
Both patchsets share the same design, namely be able to checkpoint and
restart multiple processes, with the operation initiated by an external
processes.
I deliberately left out the part that handles multiple processes to
keeps things simple for initial review, and until we decide on the
question of kernel- or user- based process creation on restart.
> Also I think that all the restart job (including process forking) should be
> done in kernel, as in this case we will not depend on user space and will be
> more secure. This is also implemented in my patchset.
I'm not convinced that creating the processes in the kernel makes it
more secure. Can you elaborate ? for the discussion, let's compare
these two basic scenarios:
1) container and processes are created in user space; each process
calls "sys_restart()" which eventually calls "do_restart()" that
does kernel-based restart.
2) container and processes are created in kernel space; each process
calls "do_restart()" to do kernel-based restart.
In fact, creating in user based makes it easier to enforce capabilities
and limits of the user. It also simplifies the debugging significantly,
and allows us to delegate the entire issue of containers and namespace
management back to user space, where it probably belongs.
On the other hand, doing it in kernel space likely to produce simpler
code for the creation of the processes.
Thanks,
Oren.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-20 16:37 ` Daniel Lezcano
@ 2008-10-20 17:23 ` Serge E. Hallyn
2008-10-21 0:18 ` Oren Laadan
2008-10-27 14:45 ` [Devel] " Andrey Mirkin
1 sibling, 1 reply; 69+ messages in thread
From: Serge E. Hallyn @ 2008-10-20 17:23 UTC (permalink / raw)
To: Daniel Lezcano
Cc: Oren Laadan, Louis.Rilling, containers, Dave Hansen,
linux-kernel, Andrey Mirkin
Quoting Daniel Lezcano (dlezcano@fr.ibm.com):
> Oren Laadan wrote:
> >
> > Daniel Lezcano wrote:
> >> Louis Rilling wrote:
> >>> On Fri, Oct 17, 2008 at 04:33:03PM -0700, Dave Hansen wrote:
> >>>> On Wed, 2008-09-03 at 14:57 +0400, Andrey Mirkin wrote:
> >>>>> This patchset introduces kernel based checkpointing/restart as it is
> >>>>> implemented in OpenVZ project. This patchset has limited functionality and
> >>>>> are able to checkpoint/restart only single process. Recently Oren Laaden
> >>>>> sent another kernel based implementation of checkpoint/restart. The main
> >>>>> differences between this patchset and Oren's patchset are:
> >>>> Hi Andrey,
> >>>>
> >>>> I'm curious what you want to happen with this patch set. Is there
> >>>> something specific in Oren's set that deficient which you need
> >>>> implemented? Are there some technical reasons you prefer this code?
> >>> To be fair, and since (IIRC) the initial intent was to start with OpenVZ's
> >>> approach, shouldn't Oren answer the same questions with respect to Andrey's
> >>> patchset?
> >>>
> >>> I'm afraid that we are forgetting to take the best from both approaches...
> >> I agree with Louis.
> >>
> >> I played with Oren's patchset and tryed to port it on x86_64. I was able
> >> to sys_checkpoint/sys_restart but if you remove the restoring of the
> >> general registers, the restart still works. I am not an expert on asm,
> >> but my hypothesis is when we call sys_checkpoint the registers are saved
> >> on the stack by the syscall and when we restore the memory of the
> >> process, we restore the stack and the stacked registers are restored
> >> when exiting the sys_restart. That make me feel there is an important
> >> gap between external checkpoint and internal checkpoint.
> >
> > This is a misconception: my patches are not "internal checkpoint". My
> > patches are basically "external checkpoint" by design, which *also*
> > accommodates self-checkpointing (aka internal). The same holds for the
> > restart. The implementation is demonstrated with "self-checkpoint" to
> > avoid complicating things at this early stage of proof-of-concept.
>
> Yep, I read your patchset :)
>
> I just want to clarify what we want to demonstrate with this patchset
> for the proof-of-concept ? A self CR does not show what are the
> complicate parts of the CR, we are just showing we can dump the memory
> from the kernel and do setcontext/getcontext.
>
> We state at the container mini-summit on an approach:
>
> 1. Pre-dump
> 2. Freeze the container
> 3. Dump
> 4. Thaw/Kill the container
> 5. Post-dump
>
> We already have the freezer, and we can forget for now pre-dump and
> post-dump.
>
> IMHO, for the proof-of-concept we should do a minimal CR (like you did),
> but conforming with these 5 points, but that means we have to do an
> external checkpoint.
Right, Oren, iiuc you are insisting that 'external checkpoint' and
'multiple task checkpoint' are the same thing. But they aren't.
Rather, I think that what we say is 'multiple tasks c/r' is what you say
should be done from user-space :)
So particularly given that your patchset seems to be in good shape,
I'd like to see external checkpoint explicitly supported. Please
just call me a dunce if v7 already works for that.
thanks,
-serge
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-20 17:23 ` Serge E. Hallyn
@ 2008-10-21 0:18 ` Oren Laadan
2008-10-21 0:58 ` Serge E. Hallyn
2008-10-21 13:24 ` Daniel Lezcano
0 siblings, 2 replies; 69+ messages in thread
From: Oren Laadan @ 2008-10-21 0:18 UTC (permalink / raw)
To: Serge E. Hallyn
Cc: Daniel Lezcano, Louis.Rilling, containers, Dave Hansen,
linux-kernel, Andrey Mirkin
Serge E. Hallyn wrote:
> Quoting Daniel Lezcano (dlezcano@fr.ibm.com):
>> Oren Laadan wrote:
>>> Daniel Lezcano wrote:
>>>> Louis Rilling wrote:
>>>>> On Fri, Oct 17, 2008 at 04:33:03PM -0700, Dave Hansen wrote:
>>>>>> On Wed, 2008-09-03 at 14:57 +0400, Andrey Mirkin wrote:
>>>>>>> This patchset introduces kernel based checkpointing/restart as it is
>>>>>>> implemented in OpenVZ project. This patchset has limited functionality and
>>>>>>> are able to checkpoint/restart only single process. Recently Oren Laaden
>>>>>>> sent another kernel based implementation of checkpoint/restart. The main
>>>>>>> differences between this patchset and Oren's patchset are:
>>>>>> Hi Andrey,
>>>>>>
>>>>>> I'm curious what you want to happen with this patch set. Is there
>>>>>> something specific in Oren's set that deficient which you need
>>>>>> implemented? Are there some technical reasons you prefer this code?
>>>>> To be fair, and since (IIRC) the initial intent was to start with OpenVZ's
>>>>> approach, shouldn't Oren answer the same questions with respect to Andrey's
>>>>> patchset?
>>>>>
>>>>> I'm afraid that we are forgetting to take the best from both approaches...
>>>> I agree with Louis.
>>>>
>>>> I played with Oren's patchset and tryed to port it on x86_64. I was able
>>>> to sys_checkpoint/sys_restart but if you remove the restoring of the
>>>> general registers, the restart still works. I am not an expert on asm,
>>>> but my hypothesis is when we call sys_checkpoint the registers are saved
>>>> on the stack by the syscall and when we restore the memory of the
>>>> process, we restore the stack and the stacked registers are restored
>>>> when exiting the sys_restart. That make me feel there is an important
>>>> gap between external checkpoint and internal checkpoint.
>>> This is a misconception: my patches are not "internal checkpoint". My
>>> patches are basically "external checkpoint" by design, which *also*
>>> accommodates self-checkpointing (aka internal). The same holds for the
>>> restart. The implementation is demonstrated with "self-checkpoint" to
>>> avoid complicating things at this early stage of proof-of-concept.
>> Yep, I read your patchset :)
>>
>> I just want to clarify what we want to demonstrate with this patchset
>> for the proof-of-concept ? A self CR does not show what are the
>> complicate parts of the CR, we are just showing we can dump the memory
>> from the kernel and do setcontext/getcontext.
>>
>> We state at the container mini-summit on an approach:
>>
>> 1. Pre-dump
>> 2. Freeze the container
>> 3. Dump
>> 4. Thaw/Kill the container
>> 5. Post-dump
>>
>> We already have the freezer, and we can forget for now pre-dump and
>> post-dump.
>>
>> IMHO, for the proof-of-concept we should do a minimal CR (like you did),
>> but conforming with these 5 points, but that means we have to do an
>> external checkpoint.
>
> Right, Oren, iiuc you are insisting that 'external checkpoint' and
> 'multiple task checkpoint' are the same thing. But they aren't.
> Rather, I think that what we say is 'multiple tasks c/r' is what you say
> should be done from user-space :)
Then I don't explain myself clearly :)
The only thing I consider doing in user space is the creation of
the container, the namespaces and the processes.
I argue that "external checkpoint of a single process" is very few
lines of code away from "self checkpoint" that is in v7.
I'm not sure how you define "external restart" ? eventually, the
processes restart themselves. It is a question of how the processes
are created to begin with.
>
> So particularly given that your patchset seems to be in good shape,
> I'd like to see external checkpoint explicitly supported. Please
> just call me a dunce if v7 already works for that.
>
It seems like you want a single process to checkpoint a single (other)
process, and then a single process to start a single (other) process.
I tried to explicitly avoid dealing with the container (user space ?
kernel space ?) and with creating new processes (user space ? kernel
space ?).
Nevertheless, it's the _same_ code. All that is needed is to make the
checkpoint syscall refer to the other task instead of self, and the
restart should create a container and fork there, then call sys_restart.
I guess instead of repeating this argument over, I will go ahead and
post a patch on top of v7 to demonstrate this (without a container,
therefore without preserving the original pid).
Oren.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-21 0:18 ` Oren Laadan
@ 2008-10-21 0:58 ` Serge E. Hallyn
2008-10-21 13:24 ` Daniel Lezcano
1 sibling, 0 replies; 69+ messages in thread
From: Serge E. Hallyn @ 2008-10-21 0:58 UTC (permalink / raw)
To: Oren Laadan
Cc: Daniel Lezcano, Louis.Rilling, containers, Dave Hansen,
linux-kernel, Andrey Mirkin
Quoting Oren Laadan (orenl@cs.columbia.edu):
>
>
> Serge E. Hallyn wrote:
> > Quoting Daniel Lezcano (dlezcano@fr.ibm.com):
> >> Oren Laadan wrote:
> >>> Daniel Lezcano wrote:
> >>>> Louis Rilling wrote:
> >>>>> On Fri, Oct 17, 2008 at 04:33:03PM -0700, Dave Hansen wrote:
> >>>>>> On Wed, 2008-09-03 at 14:57 +0400, Andrey Mirkin wrote:
> >>>>>>> This patchset introduces kernel based checkpointing/restart as it is
> >>>>>>> implemented in OpenVZ project. This patchset has limited functionality and
> >>>>>>> are able to checkpoint/restart only single process. Recently Oren Laaden
> >>>>>>> sent another kernel based implementation of checkpoint/restart. The main
> >>>>>>> differences between this patchset and Oren's patchset are:
> >>>>>> Hi Andrey,
> >>>>>>
> >>>>>> I'm curious what you want to happen with this patch set. Is there
> >>>>>> something specific in Oren's set that deficient which you need
> >>>>>> implemented? Are there some technical reasons you prefer this code?
> >>>>> To be fair, and since (IIRC) the initial intent was to start with OpenVZ's
> >>>>> approach, shouldn't Oren answer the same questions with respect to Andrey's
> >>>>> patchset?
> >>>>>
> >>>>> I'm afraid that we are forgetting to take the best from both approaches...
> >>>> I agree with Louis.
> >>>>
> >>>> I played with Oren's patchset and tryed to port it on x86_64. I was able
> >>>> to sys_checkpoint/sys_restart but if you remove the restoring of the
> >>>> general registers, the restart still works. I am not an expert on asm,
> >>>> but my hypothesis is when we call sys_checkpoint the registers are saved
> >>>> on the stack by the syscall and when we restore the memory of the
> >>>> process, we restore the stack and the stacked registers are restored
> >>>> when exiting the sys_restart. That make me feel there is an important
> >>>> gap between external checkpoint and internal checkpoint.
> >>> This is a misconception: my patches are not "internal checkpoint". My
> >>> patches are basically "external checkpoint" by design, which *also*
> >>> accommodates self-checkpointing (aka internal). The same holds for the
> >>> restart. The implementation is demonstrated with "self-checkpoint" to
> >>> avoid complicating things at this early stage of proof-of-concept.
> >> Yep, I read your patchset :)
> >>
> >> I just want to clarify what we want to demonstrate with this patchset
> >> for the proof-of-concept ? A self CR does not show what are the
> >> complicate parts of the CR, we are just showing we can dump the memory
> >> from the kernel and do setcontext/getcontext.
> >>
> >> We state at the container mini-summit on an approach:
> >>
> >> 1. Pre-dump
> >> 2. Freeze the container
> >> 3. Dump
> >> 4. Thaw/Kill the container
> >> 5. Post-dump
> >>
> >> We already have the freezer, and we can forget for now pre-dump and
> >> post-dump.
> >>
> >> IMHO, for the proof-of-concept we should do a minimal CR (like you did),
> >> but conforming with these 5 points, but that means we have to do an
> >> external checkpoint.
> >
> > Right, Oren, iiuc you are insisting that 'external checkpoint' and
> > 'multiple task checkpoint' are the same thing. But they aren't.
> > Rather, I think that what we say is 'multiple tasks c/r' is what you say
> > should be done from user-space :)
>
> Then I don't explain myself clearly :)
>
> The only thing I consider doing in user space is the creation of
> the container, the namespaces and the processes.
That I understand.
> I argue that "external checkpoint of a single process" is very few
> lines of code away from "self checkpoint" that is in v7.
>
> I'm not sure how you define "external restart" ? eventually, the
If I ever said external restart, I actually meant external checkpoint.
I understand that a task should call sys_restart() itself.
> processes restart themselves. It is a question of how the processes
> are created to begin with.
>
> >
> > So particularly given that your patchset seems to be in good shape,
> > I'd like to see external checkpoint explicitly supported. Please
> > just call me a dunce if v7 already works for that.
> >
>
> It seems like you want a single process to checkpoint a single (other)
> process, and then a single process to start a single (other) process.
Yup.
> I tried to explicitly avoid dealing with the container (user space ?
> kernel space ?) and with creating new processes (user space ? kernel
> space ?).
And that's the right thing to do. But:
> Nevertheless, it's the _same_ code. All that is needed is to make the
I was under the impression that sys_checkpoint() on some other task's
pid and then restarting with that image would fail right now.
> checkpoint syscall refer to the other task instead of self, and the
> restart should create a container and fork there, then call sys_restart.
>
> I guess instead of repeating this argument over, I will go ahead and
> post a patch on top of v7 to demonstrate this (without a container,
Cool, thanks!
> therefore without preserving the original pid).
Yes, as i believe you said in another email earlier today, we have not
decided about how to restore the pid. Eric continues to argue for
playing games with /proc/sys/kernel/pid_max.
-serge
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-20 15:53 ` Oren Laadan
2008-10-20 16:37 ` Daniel Lezcano
2008-10-20 16:51 ` Serge E. Hallyn
@ 2008-10-21 9:36 ` Cedric Le Goater
2 siblings, 0 replies; 69+ messages in thread
From: Cedric Le Goater @ 2008-10-21 9:36 UTC (permalink / raw)
To: Oren Laadan
Cc: Daniel Lezcano, Louis.Rilling, containers, Dave Hansen,
linux-kernel, Andrey Mirkin
>> IMHO we should look at Dmitry patchset and merge the external checkpoint
>> code to Oren's patchset in order to checkpoint *one* process and have
>> the process to restart itself. At this point, we can begin to talk about
>> the restart itself, shall we have the kernel to fork the processes to be
>> restarted ? shall we fork from userspace and implement some mechanism to
>> have each processes to restart themselves ? etc...
>>
>
> In both approaches, processes restart themselves, in the sense that a
> process to be restarted eventually calls "do_restart()" (or equivalent).
>
> The only question is how processes are created. Andrew's patch creates
> everything inside the kernel. I would like to still give it a try outside
> the kernel. Everything is ready, except that we need a way to pre-select
> a PID for the new child... we never agreed on that one, did we ?
what do you mean ? like a clone_with_pid() routine ?
> If we go ahead with the kernel-based process creation, it's easy to merge
> it to the current patch-set.
Both solution are valid. Nevertheless, I would choose the solution
sharing existing code and being arch independent.
Now, we can start by duplicating code and see later how we could
share. But for mainline inclusion, I think that code reuse is a
faster path.
C.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-21 0:18 ` Oren Laadan
2008-10-21 0:58 ` Serge E. Hallyn
@ 2008-10-21 13:24 ` Daniel Lezcano
1 sibling, 0 replies; 69+ messages in thread
From: Daniel Lezcano @ 2008-10-21 13:24 UTC (permalink / raw)
To: Oren Laadan
Cc: Serge E. Hallyn, Louis.Rilling, containers, Dave Hansen,
linux-kernel, Andrey Mirkin
Oren Laadan wrote:
>
> Serge E. Hallyn wrote:
>> Quoting Daniel Lezcano (dlezcano@fr.ibm.com):
>>> Oren Laadan wrote:
>>>> Daniel Lezcano wrote:
>>>>> Louis Rilling wrote:
>>>>>> On Fri, Oct 17, 2008 at 04:33:03PM -0700, Dave Hansen wrote:
>>>>>>> On Wed, 2008-09-03 at 14:57 +0400, Andrey Mirkin wrote:
>>>>>>>> This patchset introduces kernel based checkpointing/restart as it is
>>>>>>>> implemented in OpenVZ project. This patchset has limited functionality and
>>>>>>>> are able to checkpoint/restart only single process. Recently Oren Laaden
>>>>>>>> sent another kernel based implementation of checkpoint/restart. The main
>>>>>>>> differences between this patchset and Oren's patchset are:
>>>>>>> Hi Andrey,
>>>>>>>
>>>>>>> I'm curious what you want to happen with this patch set. Is there
>>>>>>> something specific in Oren's set that deficient which you need
>>>>>>> implemented? Are there some technical reasons you prefer this code?
>>>>>> To be fair, and since (IIRC) the initial intent was to start with OpenVZ's
>>>>>> approach, shouldn't Oren answer the same questions with respect to Andrey's
>>>>>> patchset?
>>>>>>
>>>>>> I'm afraid that we are forgetting to take the best from both approaches...
>>>>> I agree with Louis.
>>>>>
>>>>> I played with Oren's patchset and tryed to port it on x86_64. I was able
>>>>> to sys_checkpoint/sys_restart but if you remove the restoring of the
>>>>> general registers, the restart still works. I am not an expert on asm,
>>>>> but my hypothesis is when we call sys_checkpoint the registers are saved
>>>>> on the stack by the syscall and when we restore the memory of the
>>>>> process, we restore the stack and the stacked registers are restored
>>>>> when exiting the sys_restart. That make me feel there is an important
>>>>> gap between external checkpoint and internal checkpoint.
>>>> This is a misconception: my patches are not "internal checkpoint". My
>>>> patches are basically "external checkpoint" by design, which *also*
>>>> accommodates self-checkpointing (aka internal). The same holds for the
>>>> restart. The implementation is demonstrated with "self-checkpoint" to
>>>> avoid complicating things at this early stage of proof-of-concept.
>>> Yep, I read your patchset :)
>>>
>>> I just want to clarify what we want to demonstrate with this patchset
>>> for the proof-of-concept ? A self CR does not show what are the
>>> complicate parts of the CR, we are just showing we can dump the memory
>>> from the kernel and do setcontext/getcontext.
>>>
>>> We state at the container mini-summit on an approach:
>>>
>>> 1. Pre-dump
>>> 2. Freeze the container
>>> 3. Dump
>>> 4. Thaw/Kill the container
>>> 5. Post-dump
>>>
>>> We already have the freezer, and we can forget for now pre-dump and
>>> post-dump.
>>>
>>> IMHO, for the proof-of-concept we should do a minimal CR (like you did),
>>> but conforming with these 5 points, but that means we have to do an
>>> external checkpoint.
>> Right, Oren, iiuc you are insisting that 'external checkpoint' and
>> 'multiple task checkpoint' are the same thing. But they aren't.
>> Rather, I think that what we say is 'multiple tasks c/r' is what you say
>> should be done from user-space :)
>
> Then I don't explain myself clearly :)
>
> The only thing I consider doing in user space is the creation of
> the container, the namespaces and the processes.
>
> I argue that "external checkpoint of a single process" is very few
> lines of code away from "self checkpoint" that is in v7.
>
> I'm not sure how you define "external restart" ? eventually, the
> processes restart themselves. It is a question of how the processes
> are created to begin with.
>
>> So particularly given that your patchset seems to be in good shape,
>> I'd like to see external checkpoint explicitly supported. Please
>> just call me a dunce if v7 already works for that.
>>
>
> It seems like you want a single process to checkpoint a single (other)
> process, and then a single process to start a single (other) process.
>
> I tried to explicitly avoid dealing with the container (user space ?
> kernel space ?) and with creating new processes (user space ? kernel
> space ?).
>
> Nevertheless, it's the _same_ code. All that is needed is to make the
> checkpoint syscall refer to the other task instead of self, and the
> restart should create a container and fork there, then call sys_restart.
>
> I guess instead of repeating this argument over, I will go ahead and
> post a patch on top of v7 to demonstrate this (without a container,
> therefore without preserving the original pid).
Cedric made a patch for the external checkpoint:
http://lxc.sourceforge.net/patches/2.6.27/2.6.27-rc8-lxc1/0035-enable-external-checkpoint.patch
The main difference is you will need to freeze the process because it
will not block itself via a syscall (there is the freezer patchset).
For the restart, perhaps you can just do a process calling sys_restart
and so we delay the fork from user/kernel discussion, no ?
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-20 15:55 ` Dave Hansen
@ 2008-10-27 14:07 ` Andrey Mirkin
2008-10-27 14:39 ` Oren Laadan
2008-11-03 19:35 ` Oren Laadan
0 siblings, 2 replies; 69+ messages in thread
From: Andrey Mirkin @ 2008-10-27 14:07 UTC (permalink / raw)
To: Dave Hansen; +Cc: devel, containers, linux-kernel
On Monday 20 October 2008 19:55 Dave Hansen wrote:
> On Mon, 2008-10-20 at 16:14 +0400, Andrey Mirkin wrote:
> > Right now my patchset (v2) provides an ability to checkpoint and restart
> > a group of processes. The process of checkpointing and restart can be
> > initiated from external process (not from the process which should be
> > checkpointed).
>
> Absolutely. Oren's code does it this way to make for a smaller patch at
> first. The syscall takes a pid argument so it is surely expected to be
> expanded upon later.
>
> > Also I think that all the restart job (including process forking) should
> > be done in kernel, as in this case we will not depend on user space and
> > will be more secure. This is also implemented in my patchset.
>
> Do you think that this is an approach that Oren's patches are married
> to, or is this a "feature" we can add on later?
Well, AFAICS from Oren's patch set his approach is oriented on process
creation in user space. I think we should choose right now what approach will
be used for process creation.
We have two options here: fork processes in kernel or fork them in user space.
If process will be forked in user space, then there will be a gap when process
will be in user space and can be killed with received signal before entering
kernel. Also we will need a functionolity to create processes with predefined
PID. I think it is not very good to provide such ability to user space. That
is why we prefer in OpenVZ to do all the job in kernel.
> I don't care which patch set we end up sticking in the kernel. I'm
> trying to figure out which code we can more easily build upon in the
> future. The fact that Oren's or yours can't do certain little things
> right now does not bother me.
>
> Honestly, I'm a little more confident that everyone can work with Oren
> since he managed to get 7 revisions of his patch out and make some
> pretty large changes while in the same time the OpenVZ patch was only
> released twice. I'm not sure what has changed in the OpenVZ patch
> between releases, either.
That is my fault. I am working right now on my Ph.D, that is why my activity
is not very high. But now I hope I will have more time for that.
> Are there any reasons that you absolutely can not use the code Oren
> posted? Will it not fulfill your needs somehow? If so, could you
> please elaborate on how?
We have one major difference with Oren's code - how processes are created
during restr.
Right now I'm trying to port kernel process creation on top of Oren's patches.
I agree that working in collaboration will speed up merging of checkpointing
to mainstream.
Andrey
P.S.: Sorry for late reply, my mailer attached your e-mail to wrong thread.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-20 17:17 ` Oren Laadan
@ 2008-10-27 14:38 ` Andrey Mirkin
0 siblings, 0 replies; 69+ messages in thread
From: Andrey Mirkin @ 2008-10-27 14:38 UTC (permalink / raw)
To: Oren Laadan; +Cc: devel, containers, linux-kernel, Dave Hansen
On Monday 20 October 2008 21:17 Oren Laadan wrote:
> Andrey Mirkin wrote:
> > On Saturday 18 October 2008 03:33 Dave Hansen wrote:
> >> On Wed, 2008-09-03 at 14:57 +0400, Andrey Mirkin wrote:
> >>> This patchset introduces kernel based checkpointing/restart as it is
> >>> implemented in OpenVZ project. This patchset has limited functionality
> >>> and are able to checkpoint/restart only single process. Recently Oren
> >>> Laaden sent another kernel based implementation of checkpoint/restart.
> >>> The main differences between this patchset and Oren's patchset are:
> >>
> >> Hi Andrey,
> >>
> >> I'm curious what you want to happen with this patch set. Is there
> >> something specific in Oren's set that deficient which you need
> >> implemented? Are there some technical reasons you prefer this code?
> >
> > Hi Dave,
> >
> > Right now my patchset (v2) provides an ability to checkpoint and restart
> > a group of processes. The process of checkpointing and restart can be
> > initiated from external process (not from the process which should be
> > checkpointed).
>
> Both patchsets share the same design, namely be able to checkpoint and
> restart multiple processes, with the operation initiated by an external
> processes.
>
> I deliberately left out the part that handles multiple processes to
> keeps things simple for initial review, and until we decide on the
> question of kernel- or user- based process creation on restart.
I agree that multiple process handling is not needed for initial review. But I
believe that the question with process creation should be discussed right
now.
> > Also I think that all the restart job (including process forking) should
> > be done in kernel, as in this case we will not depend on user space and
> > will be more secure. This is also implemented in my patchset.
>
> I'm not convinced that creating the processes in the kernel makes it
> more secure. Can you elaborate ? for the discussion, let's compare
> these two basic scenarios:
>
> 1) container and processes are created in user space; each process
> calls "sys_restart()" which eventually calls "do_restart()" that
> does kernel-based restart.
Well, in this case there will be a gap after process is returned from fork but
before entering kernel. During that time process can be killed by delivered
signal. Another drawback of this approach is that we will need to provide an
ability for user to create processes with predefined PID.
> 2) container and processes are created in kernel space; each process
> calls "do_restart()" to do kernel-based restart.
>
> In fact, creating in user based makes it easier to enforce capabilities
> and limits of the user. It also simplifies the debugging significantly,
> and allows us to delegate the entire issue of containers and namespace
> management back to user space, where it probably belongs.
>
> On the other hand, doing it in kernel space likely to produce simpler
> code for the creation of the processes.
You right here. Both approaches have pros and cons, but I think that kernel
approach has more advantages
Andrey
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-27 14:07 ` Andrey Mirkin
@ 2008-10-27 14:39 ` Oren Laadan
2008-10-30 6:02 ` Andrey Mirkin
2008-11-03 19:35 ` Oren Laadan
1 sibling, 1 reply; 69+ messages in thread
From: Oren Laadan @ 2008-10-27 14:39 UTC (permalink / raw)
To: Andrey Mirkin; +Cc: Dave Hansen, containers, linux-kernel
Andrey Mirkin wrote:
> On Monday 20 October 2008 19:55 Dave Hansen wrote:
>> On Mon, 2008-10-20 at 16:14 +0400, Andrey Mirkin wrote:
>>> Right now my patchset (v2) provides an ability to checkpoint and restart
>>> a group of processes. The process of checkpointing and restart can be
>>> initiated from external process (not from the process which should be
>>> checkpointed).
>> Absolutely. Oren's code does it this way to make for a smaller patch at
>> first. The syscall takes a pid argument so it is surely expected to be
>> expanded upon later.
>>
>>> Also I think that all the restart job (including process forking) should
>>> be done in kernel, as in this case we will not depend on user space and
>>> will be more secure. This is also implemented in my patchset.
>> Do you think that this is an approach that Oren's patches are married
>> to, or is this a "feature" we can add on later?
>
> Well, AFAICS from Oren's patch set his approach is oriented on process
> creation in user space. I think we should choose right now what approach will
> be used for process creation.
This is inaccurate.
I intentionally did not address how processes will be created, by
simply allowing either way to be added to the patch.
I do agree that we probably want to decide how to do it. However,
there is also room to allow for both approaches, in a compatible
way, should we wish to explore both.
> We have two options here: fork processes in kernel or fork them in user space.
> If process will be forked in user space, then there will be a gap when process
> will be in user space and can be killed with received signal before entering
Why do we care about it ?
Why is there a difference if it is killed before or after entering
the kernel (e.g. user aborted restart, or kernel OOM kicked in) ?
> kernel. Also we will need a functionolity to create processes with predefined
> PID. I think it is not very good to provide such ability to user space. That
> is why we prefer in OpenVZ to do all the job in kernel.
This is the weak side of creating the processes in user space -
that we need such an interface. Note, however, that we can
easily "hide" it inside the interface of the sys_restart() call,
and restrict how it may be used.
Oren.
>
>> I don't care which patch set we end up sticking in the kernel. I'm
>> trying to figure out which code we can more easily build upon in the
>> future. The fact that Oren's or yours can't do certain little things
>> right now does not bother me.
>>
>> Honestly, I'm a little more confident that everyone can work with Oren
>> since he managed to get 7 revisions of his patch out and make some
>> pretty large changes while in the same time the OpenVZ patch was only
>> released twice. I'm not sure what has changed in the OpenVZ patch
>> between releases, either.
>
> That is my fault. I am working right now on my Ph.D, that is why my activity
> is not very high. But now I hope I will have more time for that.
>
>> Are there any reasons that you absolutely can not use the code Oren
>> posted? Will it not fulfill your needs somehow? If so, could you
>> please elaborate on how?
>
> We have one major difference with Oren's code - how processes are created
> during restr.
> Right now I'm trying to port kernel process creation on top of Oren's patches.
> I agree that working in collaboration will speed up merging of checkpointing
> to mainstream.
>
> Andrey
>
> P.S.: Sorry for late reply, my mailer attached your e-mail to wrong thread.
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/containers
>
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-20 16:37 ` Daniel Lezcano
2008-10-20 17:23 ` Serge E. Hallyn
@ 2008-10-27 14:45 ` Andrey Mirkin
1 sibling, 0 replies; 69+ messages in thread
From: Andrey Mirkin @ 2008-10-27 14:45 UTC (permalink / raw)
To: devel
Cc: Daniel Lezcano, Oren Laadan, Louis.Rilling, containers,
linux-kernel, Dave Hansen
On Monday 20 October 2008 20:37 Daniel Lezcano wrote:
> Oren Laadan wrote:
> > Daniel Lezcano wrote:
> >> Louis Rilling wrote:
> >>> On Fri, Oct 17, 2008 at 04:33:03PM -0700, Dave Hansen wrote:
> >>>> On Wed, 2008-09-03 at 14:57 +0400, Andrey Mirkin wrote:
> >>>>> This patchset introduces kernel based checkpointing/restart as it is
> >>>>> implemented in OpenVZ project. This patchset has limited
> >>>>> functionality and are able to checkpoint/restart only single process.
> >>>>> Recently Oren Laaden sent another kernel based implementation of
> >>>>> checkpoint/restart. The main differences between this patchset and
> >>>>> Oren's patchset are:
> >>>>
> >>>> Hi Andrey,
> >>>>
> >>>> I'm curious what you want to happen with this patch set. Is there
> >>>> something specific in Oren's set that deficient which you need
> >>>> implemented? Are there some technical reasons you prefer this code?
> >>>
> >>> To be fair, and since (IIRC) the initial intent was to start with
> >>> OpenVZ's approach, shouldn't Oren answer the same questions with
> >>> respect to Andrey's patchset?
> >>>
> >>> I'm afraid that we are forgetting to take the best from both
> >>> approaches...
> >>
> >> I agree with Louis.
> >>
> >> I played with Oren's patchset and tryed to port it on x86_64. I was able
> >> to sys_checkpoint/sys_restart but if you remove the restoring of the
> >> general registers, the restart still works. I am not an expert on asm,
> >> but my hypothesis is when we call sys_checkpoint the registers are saved
> >> on the stack by the syscall and when we restore the memory of the
> >> process, we restore the stack and the stacked registers are restored
> >> when exiting the sys_restart. That make me feel there is an important
> >> gap between external checkpoint and internal checkpoint.
> >
> > This is a misconception: my patches are not "internal checkpoint". My
> > patches are basically "external checkpoint" by design, which *also*
> > accommodates self-checkpointing (aka internal). The same holds for the
> > restart. The implementation is demonstrated with "self-checkpoint" to
> > avoid complicating things at this early stage of proof-of-concept.
>
> Yep, I read your patchset :)
>
> I just want to clarify what we want to demonstrate with this patchset
> for the proof-of-concept ? A self CR does not show what are the
> complicate parts of the CR, we are just showing we can dump the memory
> from the kernel and do setcontext/getcontext.
>
> We state at the container mini-summit on an approach:
>
> 1. Pre-dump
> 2. Freeze the container
> 3. Dump
> 4. Thaw/Kill the container
> 5. Post-dump
>
> We already have the freezer, and we can forget for now pre-dump and
> post-dump.
>
> IMHO, for the proof-of-concept we should do a minimal CR (like you did),
> but conforming with these 5 points, but that means we have to do an
> external checkpoint.
>
> If the POC conforms with that, the patchset will be a little different
> and that will show what are the difficult part for restarting a process,
> especially to restart it at the frozen state :) and that will give an
> idea from 10000 feets of the big picture.
>
> > For multiple processes all that is needed is a container and a loop
> > on the checkpoint side, and a method to recreate processes on the
> > restart side. Andrew suggests to do it in kernel space, I still have
> > doubts.
>
> A question to Andrey, do you, in OpenVZ, restart "externally" or it is
> the first process of the pid namespace which calls sys_restart and then
> populates the pid namespace ?
In OpenVZ we are creating first task and namespaces from sys_restart.
Andrey
>
> > While I held out the multi-process part of the patch so far because I
> > was explicitly asked to do it, it seems like this would be a good time
> > to push it out and get feedback.
>
> IMHO it is too soon...
>
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/containers
>
> _______________________________________________
> Devel mailing list
> Devel@openvz.org
> https://openvz.org/mailman/listinfo/devel
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-27 14:39 ` Oren Laadan
@ 2008-10-30 6:02 ` Andrey Mirkin
2008-10-30 11:47 ` Louis Rilling
` (2 more replies)
0 siblings, 3 replies; 69+ messages in thread
From: Andrey Mirkin @ 2008-10-30 6:02 UTC (permalink / raw)
To: Oren Laadan, Dave Hansen, Serge E. Hallyn, Cedric Le Goater,
Daniel Lezcano, Louis Rilling
Cc: containers, linux-kernel
On Monday 27 October 2008 17:39 Oren Laadan wrote:
> Andrey Mirkin wrote:
> > On Monday 20 October 2008 19:55 Dave Hansen wrote:
> >> On Mon, 2008-10-20 at 16:14 +0400, Andrey Mirkin wrote:
> >>> Right now my patchset (v2) provides an ability to checkpoint and
> >>> restart a group of processes. The process of checkpointing and restart
> >>> can be initiated from external process (not from the process which
> >>> should be checkpointed).
> >>
> >> Absolutely. Oren's code does it this way to make for a smaller patch at
> >> first. The syscall takes a pid argument so it is surely expected to be
> >> expanded upon later.
> >>
> >>> Also I think that all the restart job (including process forking)
> >>> should be done in kernel, as in this case we will not depend on user
> >>> space and will be more secure. This is also implemented in my patchset.
> >>
> >> Do you think that this is an approach that Oren's patches are married
> >> to, or is this a "feature" we can add on later?
> >
> > Well, AFAICS from Oren's patch set his approach is oriented on process
> > creation in user space. I think we should choose right now what approach
> > will be used for process creation.
>
> This is inaccurate.
>
> I intentionally did not address how processes will be created, by
> simply allowing either way to be added to the patch.
Yes, you right. Either way is possible with your patchset. But as I understand
in ZAP you are using user space process creation. No?
That is why I think that your design is more convenient for user process
creation.
> I do agree that we probably want to decide how to do it. However,
> there is also room to allow for both approaches, in a compatible
> way, should we wish to explore both.
Yes, we can implement both approaches. Do you think we really need this?
> > We have two options here: fork processes in kernel or fork them in user
> > space. If process will be forked in user space, then there will be a gap
> > when process will be in user space and can be killed with received signal
> > before entering
>
> Why do we care about it ?
> Why is there a difference if it is killed before or after entering
> the kernel (e.g. user aborted restart, or kernel OOM kicked in) ?
If one process is killed during restart then you can even do not notice that
(if processes are created from user space and then call sys_restart). And you
will get not the same state as before C/R.
> > kernel. Also we will need a functionolity to create processes with
> > predefined PID. I think it is not very good to provide such ability to
> > user space. That is why we prefer in OpenVZ to do all the job in kernel.
>
> This is the weak side of creating the processes in user space -
> that we need such an interface. Note, however, that we can
> easily "hide" it inside the interface of the sys_restart() call,
> and restrict how it may be used.
Of course we can "hide" it somehow, but anyway we will have a hole and that is
not good.
Anyway we should ask everyone what they think about user- and kernel- based
process creation.
Dave, Serge, Cedric, Daniel, Louis what do you think about that?
Andrey
> >> I don't care which patch set we end up sticking in the kernel. I'm
> >> trying to figure out which code we can more easily build upon in the
> >> future. The fact that Oren's or yours can't do certain little things
> >> right now does not bother me.
> >>
> >> Honestly, I'm a little more confident that everyone can work with Oren
> >> since he managed to get 7 revisions of his patch out and make some
> >> pretty large changes while in the same time the OpenVZ patch was only
> >> released twice. I'm not sure what has changed in the OpenVZ patch
> >> between releases, either.
> >
> > That is my fault. I am working right now on my Ph.D, that is why my
> > activity is not very high. But now I hope I will have more time for that.
> >
> >> Are there any reasons that you absolutely can not use the code Oren
> >> posted? Will it not fulfill your needs somehow? If so, could you
> >> please elaborate on how?
> >
> > We have one major difference with Oren's code - how processes are created
> > during restr.
> > Right now I'm trying to port kernel process creation on top of Oren's
> > patches. I agree that working in collaboration will speed up merging of
> > checkpointing to mainstream.
> >
> > Andrey
> >
> > P.S.: Sorry for late reply, my mailer attached your e-mail to wrong
> > thread. _______________________________________________
> > Containers mailing list
> > Containers@lists.linux-foundation.org
> > https://lists.linux-foundation.org/mailman/listinfo/containers
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-30 6:02 ` Andrey Mirkin
@ 2008-10-30 11:47 ` Louis Rilling
2008-10-30 17:08 ` Dave Hansen
2008-10-30 17:45 ` Oren Laadan
2008-10-30 14:08 ` Serge E. Hallyn
2008-10-30 17:03 ` Dave Hansen
2 siblings, 2 replies; 69+ messages in thread
From: Louis Rilling @ 2008-10-30 11:47 UTC (permalink / raw)
To: Andrey Mirkin
Cc: Oren Laadan, Dave Hansen, Serge E. Hallyn, Cedric Le Goater,
Daniel Lezcano, containers, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 2328 bytes --]
On Thu, Oct 30, 2008 at 10:02:44AM +0400, Andrey Mirkin wrote:
> > > kernel. Also we will need a functionolity to create processes with
> > > predefined PID. I think it is not very good to provide such ability to
> > > user space. That is why we prefer in OpenVZ to do all the job in kernel.
> >
> > This is the weak side of creating the processes in user space -
> > that we need such an interface. Note, however, that we can
> > easily "hide" it inside the interface of the sys_restart() call,
> > and restrict how it may be used.
>
> Of course we can "hide" it somehow, but anyway we will have a hole and that is
> not good.
>
> Anyway we should ask everyone what they think about user- and kernel- based
> process creation.
> Dave, Serge, Cedric, Daniel, Louis what do you think about that?
Frankly, I'm not convinced (yet) that one approach is better than the other one.
I only *tend* to prefer kernel-based, for the reasons explained below. I know
that there are arguments in favor of userspace (I've at least seen
security-related ones), but I let their authors detail them (again).
In Kerrighed this is kernel-based, and will remain kernel-based because we
checkpoint a distributed task tree, and want to restart it as mush as possible
with the same distribution. The distributed protocol used for restart is
currently too fragile and complex to rely on customized user-space
implementations. That said, if someone brings very good arguments in favor of
userspace implementations, we might consider changing this.
Without taking distributed restart into account, I also tend to prefer
kernel-based, mainly for two (not so strong) reasons:
1) this prevents userspace from doing weird things, like changing the task tree
and let the kernel detect it and deal with the mess this creates (think about
two threads being restarted in separate processes that do not even share their
parents). But one can argue that userspace can change the checkpoint image as
well, so that the kernel must check for such weird things anyway.
2) restart will be more efficient with respect to shared objects.
Louis
--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-30 6:02 ` Andrey Mirkin
2008-10-30 11:47 ` Louis Rilling
@ 2008-10-30 14:08 ` Serge E. Hallyn
2008-10-30 17:03 ` Dave Hansen
2 siblings, 0 replies; 69+ messages in thread
From: Serge E. Hallyn @ 2008-10-30 14:08 UTC (permalink / raw)
To: Andrey Mirkin
Cc: Oren Laadan, Dave Hansen, Cedric Le Goater, Daniel Lezcano,
Louis Rilling, containers, linux-kernel
Quoting Andrey Mirkin (major@openvz.org):
> Anyway we should ask everyone what they think about user- and kernel- based
> process creation.
> Dave, Serge, Cedric, Daniel, Louis what do you think about that?
I prefer kernel. Pretty sure Dave prefers user-space.
I'd say the thing to do is push the core API that supports single-thread
c/r. Then try to push the patch to do recreate processes from the
kernel, and let the arguments for and against that stand on their own.
-serge
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-30 6:02 ` Andrey Mirkin
2008-10-30 11:47 ` Louis Rilling
2008-10-30 14:08 ` Serge E. Hallyn
@ 2008-10-30 17:03 ` Dave Hansen
2 siblings, 0 replies; 69+ messages in thread
From: Dave Hansen @ 2008-10-30 17:03 UTC (permalink / raw)
To: Andrey Mirkin
Cc: Oren Laadan, Serge E. Hallyn, Cedric Le Goater, Daniel Lezcano,
Louis Rilling, containers, linux-kernel
On Thu, 2008-10-30 at 10:02 +0400, Andrey Mirkin wrote:
> Anyway we should ask everyone what they think about user- and kernel- based
> process creation.
> Dave, Serge, Cedric, Daniel, Louis what do you think about that?
My worry is where a single sys_restart() plus in-kernel process creation
takes us.
In practice, what do we do? Do we single-thread the entire restore
process? Or, do we do in-kernel process creation and have multiple
kernel threads trying to read out of different points in the checkpoint
file, trying to restore all their own states in parallel? Does that
mean that we can't in practice restore from a fd like a pipe or a
network socket?
In the same way, if we *do* create the processes in userspace, how do we
do _that_? Do we just fork() and sleep() until the kernel comes along
and blows our state away? How does the kernel process doing the
restoring tell userspace how many things to fork? How do we match these
new userspace processes up with the ones coming out of the checkpoint
process?
To me, it's just way too early to talk about this stuff. Both
approaches have their issues, and I'm yet to see the differences
manifested in code so I can really sink my teeth into them
-- Dave
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-30 11:47 ` Louis Rilling
@ 2008-10-30 17:08 ` Dave Hansen
2008-10-30 18:01 ` Louis Rilling
2008-10-30 17:45 ` Oren Laadan
1 sibling, 1 reply; 69+ messages in thread
From: Dave Hansen @ 2008-10-30 17:08 UTC (permalink / raw)
To: Louis.Rilling
Cc: Andrey Mirkin, containers, linux-kernel, Cedric Le Goater,
Daniel Lezcano
On Thu, 2008-10-30 at 12:47 +0100, Louis Rilling wrote:
> 1) this prevents userspace from doing weird things, like changing the task tree
> and let the kernel detect it and deal with the mess this creates (think about
> two threads being restarted in separate processes that do not even share their
> parents). But one can argue that userspace can change the checkpoint image as
> well, so that the kernel must check for such weird things anyway.
To me, this is one of the strongest arguments out there for doing
restart as much as possible with existing user<->kernel APIs. Having
the kernel detect and clean up userspace's messes is not going to work.
We might as well just do things in the kernel rather than do that.
What we *should* do is leverage all of the existing APIs that we already
have instead of creating completely new code paths into which my butter
fingers can introduce new kernel bugs.
> 2) restart will be more efficient with respect to shared objects.
Can you quantify this? Which objects? How much more efficient?
-- Dave
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-30 11:47 ` Louis Rilling
2008-10-30 17:08 ` Dave Hansen
@ 2008-10-30 17:45 ` Oren Laadan
2008-10-30 18:14 ` Louis Rilling
1 sibling, 1 reply; 69+ messages in thread
From: Oren Laadan @ 2008-10-30 17:45 UTC (permalink / raw)
To: Louis.Rilling
Cc: Andrey Mirkin, Dave Hansen, Serge E. Hallyn, Cedric Le Goater,
Daniel Lezcano, containers, linux-kernel
Louis Rilling wrote:
> On Thu, Oct 30, 2008 at 10:02:44AM +0400, Andrey Mirkin wrote:
>>>> kernel. Also we will need a functionolity to create processes with
>>>> predefined PID. I think it is not very good to provide such ability to
>>>> user space. That is why we prefer in OpenVZ to do all the job in kernel.
>>> This is the weak side of creating the processes in user space -
>>> that we need such an interface. Note, however, that we can
>>> easily "hide" it inside the interface of the sys_restart() call,
>>> and restrict how it may be used.
>> Of course we can "hide" it somehow, but anyway we will have a hole and that is
>> not good.
>>
>> Anyway we should ask everyone what they think about user- and kernel- based
>> process creation.
>> Dave, Serge, Cedric, Daniel, Louis what do you think about that?
>
> Frankly, I'm not convinced (yet) that one approach is better than the other one.
> I only *tend* to prefer kernel-based, for the reasons explained below. I know
> that there are arguments in favor of userspace (I've at least seen
> security-related ones), but I let their authors detail them (again).
I'm not convinced either. I think both implementation can eventually
work well.
>
> In Kerrighed this is kernel-based, and will remain kernel-based because we
> checkpoint a distributed task tree, and want to restart it as mush as possible
> with the same distribution. The distributed protocol used for restart is
> currently too fragile and complex to rely on customized user-space
> implementations. That said, if someone brings very good arguments in favor of
> userspace implementations, we might consider changing this.
Zap also has distributed checkpoint which does not require strict
kernel-side ordering. Do you need that because you do SSI ?
>
> Without taking distributed restart into account, I also tend to prefer
> kernel-based, mainly for two (not so strong) reasons:
> 1) this prevents userspace from doing weird things, like changing the task tree
> and let the kernel detect it and deal with the mess this creates (think about
> two threads being restarted in separate processes that do not even share their
> parents). But one can argue that userspace can change the checkpoint image as
> well, so that the kernel must check for such weird things anyway.
I don't really buy this argument. First, as you say, user can change
the checkpoint image file. Second, you can verify in the kernel that
the real relationships of the processes match those specified (and
expected from) the image file. That's pretty straightforward.
> 2) restart will be more efficient with respect to shared objects.
Can you elaborate on this ? In what sense "more efficient" ?
Note that the topic in question is not whether to do the entire restart
from user space (and I argue that most work should be done in the kernel),
but rather whether process creation (and only that) should be done in
kernel or user space.
Quick thoughts of pros/cons of each approach are:
user space:
+ re-use existing api (fork)
+ easier to debug
+ will allow 'handmade' resources restart: it was mentioned before that
one may want to reattach stdout to a different place after restart; a
user based restart of processes can make this much easier: e.g. the
user process can create the alternative resources, give them to the
kernel and only then call sys_restart)
+ arch-independent code
- a bit slower than in kernel space
- requires a clone-with-specific-pid syscall or interface
kernel space:
+ a bit easier to control everything
+ a bit faster than user space
+ no need for user-visible interface for clone-with-...
- arch-dependent code
- needs special code to fight 'fork-bomb'
So, I'm not convinced, and I even think there may be room to both, for
the time being. I volunteer to support the user-space alternative while
we make up our minds.
Oren.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-30 17:08 ` Dave Hansen
@ 2008-10-30 18:01 ` Louis Rilling
2008-10-30 18:28 ` Oren Laadan
0 siblings, 1 reply; 69+ messages in thread
From: Louis Rilling @ 2008-10-30 18:01 UTC (permalink / raw)
To: Dave Hansen
Cc: Daniel Lezcano, containers, Cedric Le Goater, Andrey Mirkin,
linux-kernel
[-- Attachment #1: Type: text/plain, Size: 3039 bytes --]
On Thu, Oct 30, 2008 at 10:08:44AM -0700, Dave Hansen wrote:
> On Thu, 2008-10-30 at 12:47 +0100, Louis Rilling wrote:
> > 1) this prevents userspace from doing weird things, like changing the task tree
> > and let the kernel detect it and deal with the mess this creates (think about
> > two threads being restarted in separate processes that do not even share their
> > parents). But one can argue that userspace can change the checkpoint image as
> > well, so that the kernel must check for such weird things anyway.
>
> To me, this is one of the strongest arguments out there for doing
> restart as much as possible with existing user<->kernel APIs. Having
> the kernel detect and clean up userspace's messes is not going to work.
> We might as well just do things in the kernel rather than do that.
>
> What we *should* do is leverage all of the existing APIs that we already
> have instead of creating completely new code paths into which my butter
> fingers can introduce new kernel bugs.
>
> > 2) restart will be more efficient with respect to shared objects.
>
> Can you quantify this? Which objects? How much more efficient?
Quantify? No. I expect that investigating both approaches will show us numbers.
Unless Oren already has some?
Which objects? I think that two kinds will especially matter: objects usually
shared only inside a thread group (mm_struct, fs_struct, files_struct,
signal_struct and sighand_struct), and individual file descriptors. The point is
to avoid creating new structures before destroying them because the restarted
task shares them with a previously restarted one.
Concerning individual file descriptors, limiting the number of open files before
calling sys_restart() may avoid these useless creations/destructions (actually
the "useless" work mainly consists in managing ref counts since file descriptors
are shared after fork()).
Concerning thread-shared structures, it is probably easy for userspace to guess
which clone flags to use when restarting threads, but
1) kernel-space will have to check that the sharing is correct anyway, and
2) kernel-space will have to fix it anyway if structures are not shared in an
obvious manner between tasks (think about A creating B with shared files_struct,
B creating C with shared files_struct, B unsharing its files_struct, and then
checkpoint).
So, with a userspace implementation, useless structures will be created anyway,
and optimizing the common cases (regular threads) just duplicates kernel's work
of checking which shared structure to use for each task to restart.
With a kernel-space implementation, all useless creations can be avoided, and no
duplicate work is needed.
That said, numbers may show us that useless creations are not so
time-consuming, but we won't know before seeing them...
Louis
--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-30 17:45 ` Oren Laadan
@ 2008-10-30 18:14 ` Louis Rilling
2008-10-30 18:32 ` Oren Laadan
0 siblings, 1 reply; 69+ messages in thread
From: Louis Rilling @ 2008-10-30 18:14 UTC (permalink / raw)
To: Oren Laadan
Cc: Andrey Mirkin, Dave Hansen, Serge E. Hallyn, Cedric Le Goater,
Daniel Lezcano, containers, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 3710 bytes --]
On Thu, Oct 30, 2008 at 01:45:25PM -0400, Oren Laadan wrote:
>
>
> Louis Rilling wrote:
> > In Kerrighed this is kernel-based, and will remain kernel-based because we
> > checkpoint a distributed task tree, and want to restart it as mush as possible
> > with the same distribution. The distributed protocol used for restart is
> > currently too fragile and complex to rely on customized user-space
> > implementations. That said, if someone brings very good arguments in favor of
> > userspace implementations, we might consider changing this.
>
> Zap also has distributed checkpoint which does not require strict
> kernel-side ordering. Do you need that because you do SSI ?
Yes. Tasks from different nodes have parent-children, session leader, etc.
relationships, and the distributed management of struct pid lifecycle is a bit
touchy too. By the way, splitting the checkpoint image in one file for each task
helps us a lot to make restart parallel, because it is more efficient for the file
system to handle parallel reads of different files from different nodes than
parallel reads on a single file descriptor from different nodes.
>
> >
> > Without taking distributed restart into account, I also tend to prefer
> > kernel-based, mainly for two (not so strong) reasons:
> > 1) this prevents userspace from doing weird things, like changing the task tree
> > and let the kernel detect it and deal with the mess this creates (think about
> > two threads being restarted in separate processes that do not even share their
> > parents). But one can argue that userspace can change the checkpoint image as
> > well, so that the kernel must check for such weird things anyway.
>
> I don't really buy this argument. First, as you say, user can change
> the checkpoint image file. Second, you can verify in the kernel that
> the real relationships of the processes match those specified (and
> expected from) the image file. That's pretty straightforward.
>
> > 2) restart will be more efficient with respect to shared objects.
>
> Can you elaborate on this ? In what sense "more efficient" ?
>
> Note that the topic in question is not whether to do the entire restart
> from user space (and I argue that most work should be done in the kernel),
> but rather whether process creation (and only that) should be done in
> kernel or user space.
See my answer to Dave.
>
> Quick thoughts of pros/cons of each approach are:
>
> user space:
>
> + re-use existing api (fork)
> + easier to debug
> + will allow 'handmade' resources restart: it was mentioned before that
> one may want to reattach stdout to a different place after restart; a
> user based restart of processes can make this much easier: e.g. the
> user process can create the alternative resources, give them to the
> kernel and only then call sys_restart)
> + arch-independent code
>
> - a bit slower than in kernel space
> - requires a clone-with-specific-pid syscall or interface
>
> kernel space:
>
> + a bit easier to control everything
> + a bit faster than user space
> + no need for user-visible interface for clone-with-...
>
> - arch-dependent code
> - needs special code to fight 'fork-bomb'
>
> So, I'm not convinced, and I even think there may be room to both, for
> the time being. I volunteer to support the user-space alternative while
> we make up our minds.
Yes, I hope that investigating both approaches will give us stronger arguments.
Louis
--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-30 18:01 ` Louis Rilling
@ 2008-10-30 18:28 ` Oren Laadan
0 siblings, 0 replies; 69+ messages in thread
From: Oren Laadan @ 2008-10-30 18:28 UTC (permalink / raw)
To: Louis.Rilling
Cc: Dave Hansen, linux-kernel, Daniel Lezcano, Cedric Le Goater,
containers, Andrey Mirkin
Louis Rilling wrote:
> On Thu, Oct 30, 2008 at 10:08:44AM -0700, Dave Hansen wrote:
>> On Thu, 2008-10-30 at 12:47 +0100, Louis Rilling wrote:
>>> 1) this prevents userspace from doing weird things, like changing the task tree
>>> and let the kernel detect it and deal with the mess this creates (think about
>>> two threads being restarted in separate processes that do not even share their
>>> parents). But one can argue that userspace can change the checkpoint image as
>>> well, so that the kernel must check for such weird things anyway.
>> To me, this is one of the strongest arguments out there for doing
>> restart as much as possible with existing user<->kernel APIs. Having
>> the kernel detect and clean up userspace's messes is not going to work.
>> We might as well just do things in the kernel rather than do that.
>>
>> What we *should* do is leverage all of the existing APIs that we already
>> have instead of creating completely new code paths into which my butter
>> fingers can introduce new kernel bugs.
>>
>>> 2) restart will be more efficient with respect to shared objects.
>> Can you quantify this? Which objects? How much more efficient?
>
> Quantify? No. I expect that investigating both approaches will show us numbers.
> Unless Oren already has some?
I do have some. it's pretty quick :) see the usenix 2007 paper...
the new implementation will be faster, though.
>
> Which objects? I think that two kinds will especially matter: objects usually
> shared only inside a thread group (mm_struct, fs_struct, files_struct,
> signal_struct and sighand_struct), and individual file descriptors. The point is
> to avoid creating new structures before destroying them because the restarted
> task shares them with a previously restarted one.
all the forks in the user space will be done with CLONE_VM etc, to avoid
exactly that sort of overhead.
in any event, my experience is that this is not the dominant factor in the
restart time.
>
> Concerning individual file descriptors, limiting the number of open files before
> calling sys_restart() may avoid these useless creations/destructions (actually
> the "useless" work mainly consists in managing ref counts since file descriptors
> are shared after fork()).
>
> Concerning thread-shared structures, it is probably easy for userspace to guess
> which clone flags to use when restarting threads, but
> 1) kernel-space will have to check that the sharing is correct anyway, and
ok. that's not a lot of work :p
(see more below)
> 2) kernel-space will have to fix it anyway if structures are not shared in an
> obvious manner between tasks (think about A creating B with shared files_struct,
> B creating C with shared files_struct, B unsharing its files_struct, and then
> checkpoint).
>
> So, with a userspace implementation, useless structures will be created anyway,
> and optimizing the common cases (regular threads) just duplicates kernel's work
> of checking which shared structure to use for each task to restart.
> With a kernel-space implementation, all useless creations can be avoided, and no
> duplicate work is needed.
they can also be avoided in user space - you "optimistically" create everything
shared to begin with, and in the kernel (inside sys_restart) you "unshare" and
create the necessary resources on demand - just like you would do with kernel
based process creation.
in this case, the extra work is only ref-counting, and then sys_restart will
unconditionally attach the right shared resource to the restarting process
(the "right" shared resource will be found, of course, in the shared pool).
this way, you don't even need to check what the user gave you - you simply
ignore overwrite it.
>
> That said, numbers may show us that useless creations are not so
> time-consuming, but we won't know before seeing them...
yes, odds are that you are right.
Oren
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-30 18:14 ` Louis Rilling
@ 2008-10-30 18:32 ` Oren Laadan
2008-10-31 10:37 ` Louis Rilling
0 siblings, 1 reply; 69+ messages in thread
From: Oren Laadan @ 2008-10-30 18:32 UTC (permalink / raw)
To: Louis.Rilling
Cc: Andrey Mirkin, Dave Hansen, Serge E. Hallyn, Cedric Le Goater,
Daniel Lezcano, containers, linux-kernel
Louis Rilling wrote:
> On Thu, Oct 30, 2008 at 01:45:25PM -0400, Oren Laadan wrote:
>>
>> Louis Rilling wrote:
>>> In Kerrighed this is kernel-based, and will remain kernel-based because we
>>> checkpoint a distributed task tree, and want to restart it as mush as possible
>>> with the same distribution. The distributed protocol used for restart is
>>> currently too fragile and complex to rely on customized user-space
>>> implementations. That said, if someone brings very good arguments in favor of
>>> userspace implementations, we might consider changing this.
>> Zap also has distributed checkpoint which does not require strict
>> kernel-side ordering. Do you need that because you do SSI ?
>
> Yes. Tasks from different nodes have parent-children, session leader, etc.
> relationships, and the distributed management of struct pid lifecycle is a bit
> touchy too. By the way, splitting the checkpoint image in one file for each task
> helps us a lot to make restart parallel, because it is more efficient for the file
> system to handle parallel reads of different files from different nodes than
> parallel reads on a single file descriptor from different nodes.
You can also make parallel restart work with the single stream, without
much effort. Particularly if you store everything on the file system.
In both cases, the limiting factor is shared resources - where one task
cannot proceed with checkpoint because it waits for another task to first
(re)create that resource.
[...]
Oren.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-30 18:32 ` Oren Laadan
@ 2008-10-31 10:37 ` Louis Rilling
0 siblings, 0 replies; 69+ messages in thread
From: Louis Rilling @ 2008-10-31 10:37 UTC (permalink / raw)
To: Oren Laadan
Cc: Andrey Mirkin, Dave Hansen, Serge E. Hallyn, Cedric Le Goater,
Daniel Lezcano, containers, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 2354 bytes --]
On Thu, Oct 30, 2008 at 02:32:51PM -0400, Oren Laadan wrote:
>
>
> Louis Rilling wrote:
> > On Thu, Oct 30, 2008 at 01:45:25PM -0400, Oren Laadan wrote:
> >>
> >> Louis Rilling wrote:
> >>> In Kerrighed this is kernel-based, and will remain kernel-based because we
> >>> checkpoint a distributed task tree, and want to restart it as mush as possible
> >>> with the same distribution. The distributed protocol used for restart is
> >>> currently too fragile and complex to rely on customized user-space
> >>> implementations. That said, if someone brings very good arguments in favor of
> >>> userspace implementations, we might consider changing this.
> >> Zap also has distributed checkpoint which does not require strict
> >> kernel-side ordering. Do you need that because you do SSI ?
> >
> > Yes. Tasks from different nodes have parent-children, session leader, etc.
> > relationships, and the distributed management of struct pid lifecycle is a bit
> > touchy too. By the way, splitting the checkpoint image in one file for each task
> > helps us a lot to make restart parallel, because it is more efficient for the file
> > system to handle parallel reads of different files from different nodes than
> > parallel reads on a single file descriptor from different nodes.
>
> You can also make parallel restart work with the single stream, without
> much effort. Particularly if you store everything on the file system.
Sure we can use a single stream, since we already share file descriptors accross
nodes. But the distributed synchronization of the file pointer is costly
compared to having each node access different files. This way we push the
parallelization bottelneck down to the file system rather than in the
distributed VFS layer.
>
> In both cases, the limiting factor is shared resources - where one task
> cannot proceed with checkpoint because it waits for another task to first
> (re)create that resource.
We just try to avoid other bottlenecks :) And besides file descriptors, shared
resources are as common as multi-threaded programs, which are not the majority
of the workloads we can address.
Louis
--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [Devel] Re: [PATCH 0/9] OpenVZ kernel based checkpointing/restart
2008-10-27 14:07 ` Andrey Mirkin
2008-10-27 14:39 ` Oren Laadan
@ 2008-11-03 19:35 ` Oren Laadan
1 sibling, 0 replies; 69+ messages in thread
From: Oren Laadan @ 2008-11-03 19:35 UTC (permalink / raw)
To: Andrey Mirkin; +Cc: Dave Hansen, containers, linux-kernel
Andrey Mirkin wrote:
> On Monday 20 October 2008 19:55 Dave Hansen wrote:
>> On Mon, 2008-10-20 at 16:14 +0400, Andrey Mirkin wrote:
>>> Right now my patchset (v2) provides an ability to checkpoint and restart
>>> a group of processes. The process of checkpointing and restart can be
>>> initiated from external process (not from the process which should be
>>> checkpointed).
>> Absolutely. Oren's code does it this way to make for a smaller patch at
>> first. The syscall takes a pid argument so it is surely expected to be
>> expanded upon later.
>>
>>> Also I think that all the restart job (including process forking) should
>>> be done in kernel, as in this case we will not depend on user space and
>>> will be more secure. This is also implemented in my patchset.
>> Do you think that this is an approach that Oren's patches are married
>> to, or is this a "feature" we can add on later?
>
> Well, AFAICS from Oren's patch set his approach is oriented on process
> creation in user space. I think we should choose right now what approach will
> be used for process creation.
> We have two options here: fork processes in kernel or fork them in user space.
> If process will be forked in user space, then there will be a gap when process
> will be in user space and can be killed with received signal before entering
> kernel. Also we will need a functionolity to create processes with predefined
> PID. I think it is not very good to provide such ability to user space. That
Rethinking this -- if the user wishes she can construct a suitable
checkpoint image that would do exactly that. It takes more effort than
using a system call, but the result is similar.
What I had in mind for that special clone-with-pid is to restrict when
it can be used (e.g. when the container is in a "restarting" state or
something like that.
[...]
Oren.
^ permalink raw reply [flat|nested] 69+ messages in thread
end of thread, other threads:[~2008-11-03 19:36 UTC | newest]
Thread overview: 69+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-09-03 10:57 [PATCH 0/9] OpenVZ kernel based checkpointing/restart Andrey Mirkin
2008-09-03 10:57 ` [PATCH 1/9] Introduce trivial sys_checkpoint and sys_restore system calls Andrey Mirkin
2008-09-03 10:57 ` [PATCH 2/9] Make checkpoint/restart functionality modular Andrey Mirkin
2008-09-03 10:57 ` [PATCH 3/9] Introduce context structure needed during checkpointing/restart Andrey Mirkin
2008-09-03 10:57 ` [PATCH 4/9] Introduce container dump function Andrey Mirkin
2008-09-03 10:57 ` [PATCH 5/9] Introduce function to dump process Andrey Mirkin
2008-09-03 10:57 ` [PATCH 6/9] Introduce functions to dump mm Andrey Mirkin
2008-09-03 10:57 ` [PATCH 7/9] Introduce function for restarting a container Andrey Mirkin
2008-09-03 10:57 ` [PATCH 8/9] Introduce functions to restart a process Andrey Mirkin
2008-09-03 10:57 ` [PATCH 9/9] Introduce functions to restore mm Andrey Mirkin
2008-09-03 14:32 ` [PATCH 8/9] Introduce functions to restart a process Louis Rilling
2008-09-13 17:34 ` Pavel Machek
2008-09-03 14:17 ` [PATCH 6/9] Introduce functions to dump mm Louis Rilling
2008-09-03 14:23 ` [PATCH 4/9] Introduce container dump function Serge E. Hallyn
2008-09-03 14:45 ` Andrey Mirkin
2008-09-03 12:29 ` [PATCH 3/9] Introduce context structure needed during checkpointing/restart Matthieu Fertré
2008-09-03 14:11 ` Andrey Mirkin
2008-09-03 13:56 ` Louis Rilling
2008-09-03 14:07 ` Andrey Mirkin
2008-09-03 14:13 ` Cedric Le Goater
2008-09-03 14:29 ` Andrey Mirkin
2008-09-03 14:27 ` [PATCH 2/9] Make checkpoint/restart functionality modular Serge E. Hallyn
2008-09-03 14:51 ` Andrey Mirkin
2008-09-03 11:44 ` [PATCH 1/9] Introduce trivial sys_checkpoint and sys_restore system calls Cedric Le Goater
2008-09-03 13:05 ` [Devel] " Andrey Mirkin
2008-09-03 12:28 ` [PATCH 0/9] OpenVZ kernel based checkpointing/restart Cedric Le Goater
2008-09-03 13:59 ` [Devel] " Andrey Mirkin
2008-09-04 22:55 ` Dave Hansen
2008-09-03 14:18 ` Serge E. Hallyn
2008-09-03 13:49 ` Louis Rilling
2008-09-03 14:06 ` Louis Rilling
2008-09-03 14:19 ` Andrey Mirkin
2008-09-03 14:26 ` Cedric Le Goater
2008-09-03 14:53 ` Andrey Mirkin
2008-09-04 8:14 ` Oren Laadan
2008-09-04 14:05 ` Dave Hansen
2008-10-17 23:33 ` Dave Hansen
2008-10-20 11:10 ` Louis Rilling
2008-10-20 13:25 ` Daniel Lezcano
2008-10-20 13:48 ` Cedric Le Goater
2008-10-20 13:49 ` Daniel Lezcano
2008-10-20 15:53 ` Oren Laadan
2008-10-20 16:37 ` Daniel Lezcano
2008-10-20 17:23 ` Serge E. Hallyn
2008-10-21 0:18 ` Oren Laadan
2008-10-21 0:58 ` Serge E. Hallyn
2008-10-21 13:24 ` Daniel Lezcano
2008-10-27 14:45 ` [Devel] " Andrey Mirkin
2008-10-20 16:51 ` Serge E. Hallyn
2008-10-21 9:36 ` Cedric Le Goater
2008-10-20 16:36 ` Dave Hansen
2008-10-20 12:14 ` [Devel] " Andrey Mirkin
2008-10-20 15:55 ` Dave Hansen
2008-10-27 14:07 ` Andrey Mirkin
2008-10-27 14:39 ` Oren Laadan
2008-10-30 6:02 ` Andrey Mirkin
2008-10-30 11:47 ` Louis Rilling
2008-10-30 17:08 ` Dave Hansen
2008-10-30 18:01 ` Louis Rilling
2008-10-30 18:28 ` Oren Laadan
2008-10-30 17:45 ` Oren Laadan
2008-10-30 18:14 ` Louis Rilling
2008-10-30 18:32 ` Oren Laadan
2008-10-31 10:37 ` Louis Rilling
2008-10-30 14:08 ` Serge E. Hallyn
2008-10-30 17:03 ` Dave Hansen
2008-11-03 19:35 ` Oren Laadan
2008-10-20 17:17 ` Oren Laadan
2008-10-27 14:38 ` Andrey Mirkin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).