LKML Archive on lore.kernel.org help / color / mirror / Atom feed
* [PATCH 0/4] coredump: core dump masking support v4 @ 2007-03-02 4:41 Kawai, Hidehiro 2007-03-02 4:47 ` [PATCH 1/4] coredump: add an interface to control the core dump routine Kawai, Hidehiro ` (4 more replies) 0 siblings, 5 replies; 20+ messages in thread From: Kawai, Hidehiro @ 2007-03-02 4:41 UTC (permalink / raw) To: Andrew Morton, kernel list Cc: Pavel Machek, Robin Holt, David Howells, Alan Cox, Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI Hi, This patch series is version 4 of the core dump masking feature, which provides a per-process flag not to dump anonymous shared memory segments. In the previous version, the flag value was passed around the core dump functions as an argument to use the same setting while dumping. In this version, instead of doing that, a r/w semaphore prevents the setting from being changed while dumping. This patch series can be applied against 2.6.20-mm2. The supported core file formats are ELF and ELF-FDPIC. ELF has been tested, but ELF-FDPIC has not been built and tested because I don't have the test environment. Background: Some software programs share huge memory among hundreds of processes. If a failure occurs on one of these processes, they can be signaled by a monitoring process to generate core files and restart the service. However, it can develop into a system-wide failure such as system slow down for a long time and disk space shortage because the total size of the core files is very huge! To avoid the above situation we can limit the core file size by setrlimit(2) or ulimit(1). But this method can lose important data such as stack because core dumping is terminated halfway. So I suggest keeping shared memory segments from being dumped for particular processes. Because the shared memory attached to processes is common in them, we don't need to dump the shared memory every time. Usage: Get all shared memory segments of pid 1234 not to dump: $ echo 1 > /proc/1234/coredump_omit_anonymous_shared When a new process is created, the process inherits the flag status from its parent. It is useful to set the core dump flags before the program runs. For example: $ echo 1 > /proc/self/coredump_omit_anonymous_shared $ ./some_program ChangeLog: v4: - in maydump(), retrieve the core dump setting from mm_struct directly, instead of its additional argument - writing to /proc/<pid>/coredump_omit_anonymous_shared returns EBUSY while core dumping. v3: http://groups.google.com/group/linux.kernel/browse_frm/thread/706d2ae41c1cb2de/ - remove `/proc/<pid>/core_flags' proc entry - add `/proc/<pid>/coredump_anonymous_shared' as a named flag - remove kernel.core_flags_enable sysctl parameter v2: http://groups.google.com/group/linux.kernel/browse_frm/thread/cb254465971d4a42/ http://groups.google.com/group/linux.kernel/browse_frm/thread/da78f2702e06fa11/ - rename `coremask' to `core_flags' - change `core_flags' member in mm_struct to a bit field next to `dumpable' - introduce a global spin lock to protect adjacent two bit fields (core_flags and dumpable) from race condition - fix a bug that the generated core file can be corrupted when core dumping and updating core_flags occur concurrently - add kernel.core_flags_enable sysctl parameter to enable/disable flags in /proc/<pid>/core_flags - support ELF-FDPIC binary format, but not tested v1: http://groups.google.com/group/linux.kernel/browse_frm/thread/1381fc54d716e3e6/ -- Hidehiro Kawai Hitachi, Ltd., Systems Development Laboratory E-mail: hidehiro.kawai.ez@hitachi.com ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 1/4] coredump: add an interface to control the core dump routine 2007-03-02 4:41 [PATCH 0/4] coredump: core dump masking support v4 Kawai, Hidehiro @ 2007-03-02 4:47 ` Kawai, Hidehiro 2007-03-02 9:34 ` Pavel Machek 2007-03-02 4:49 ` [PATCH 2/4] coredump: ELF: enable to omit anonymous shared memory Kawai, Hidehiro ` (3 subsequent siblings) 4 siblings, 1 reply; 20+ messages in thread From: Kawai, Hidehiro @ 2007-03-02 4:47 UTC (permalink / raw) To: Andrew Morton, kernel list Cc: Kawai, Hidehiro, Pavel Machek, Robin Holt, David Howells, Alan Cox, Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI This patch adds an interface to set/reset a flag which determines anonymous shared memory segments should be dumped or not when a core file is generated. /proc/<pid>/coredump_omit_anonymous_shared file is provided to access the flag. You can change the flag status for a particular process by writing to or reading from the file. The flag status is inherited to the child process when it is created. The flag is stored into coredump_omit_anon_shared member of mm_struct, which shares bytes with dumpable member because these two are adjacent bit fields. In order to avoid write-write race between the two, we use a global spin lock. smp_wmb() at updating dumpable is removed because set_dumpable() includes a pair of spin lock and unlock which has the effect of memory barrier. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> --- fs/exec.c | 12 ++-- fs/proc/base.c | 103 ++++++++++++++++++++++++++++++++++++++ include/linux/binfmts.h | 4 + include/linux/sched.h | 33 ++++++++++++ kernel/fork.c | 3 + kernel/sys.c | 62 +++++++--------------- security/commoncap.c | 2 security/dummy.c | 2 8 files changed, 174 insertions(+), 47 deletions(-) Index: linux-2.6.20-mm2/fs/proc/base.c =================================================================== --- linux-2.6.20-mm2.orig/fs/proc/base.c +++ linux-2.6.20-mm2/fs/proc/base.c @@ -74,6 +74,7 @@ #include <linux/poll.h> #include <linux/nsproxy.h> #include <linux/oom.h> +#include <linux/elf.h> #include "internal.h" /* NOTE: @@ -1753,6 +1754,104 @@ static const struct inode_operations pro #endif +#if defined(USE_ELF_CORE_DUMP) && defined(CONFIG_ELF_CORE) +static ssize_t proc_coredump_omit_anon_shared_read(struct file *file, + char __user *buf, + size_t count, + loff_t *ppos) +{ + struct task_struct *task = get_proc_task(file->f_dentry->d_inode); + struct mm_struct *mm; + char buffer[PROC_NUMBUF]; + size_t len; + loff_t __ppos = *ppos; + int ret; + + ret = -ESRCH; + if (!task) + goto out_no_task; + + ret = 0; + mm = get_task_mm(task); + if (!mm) + goto out_no_mm; + + len = snprintf(buffer, sizeof(buffer), "%u\n", + mm->coredump_omit_anon_shared); + if (__ppos >= len) + goto out; + if (count > len - __ppos) + count = len - __ppos; + + ret = -EFAULT; + if (copy_to_user(buf, buffer + __ppos, count)) + goto out; + + ret = count; + *ppos = __ppos + count; + + out: + mmput(mm); + out_no_mm: + put_task_struct(task); + out_no_task: + return ret; +} + +static ssize_t proc_coredump_omit_anon_shared_write(struct file *file, + const char __user *buf, + size_t count, + loff_t *ppos) +{ + struct task_struct *task; + struct mm_struct *mm; + char buffer[PROC_NUMBUF], *end; + unsigned int val; + int ret; + + ret = -EFAULT; + memset(buffer, 0, sizeof(buffer)); + if (count > sizeof(buffer) - 1) + count = sizeof(buffer) - 1; + if (copy_from_user(buffer, buf, count)) + goto out_no_task; + + ret = -EINVAL; + val = (unsigned int)simple_strtoul(buffer, &end, 0); + if (*end == '\n') + end++; + if (end - buffer == 0) + goto out_no_task; + + ret = -ESRCH; + task = get_proc_task(file->f_dentry->d_inode); + if (!task) + goto out_no_task; + + ret = end - buffer; + mm = get_task_mm(task); + if (!mm) + goto out_no_mm; + + if (down_write_trylock(&coredump_settings_sem)) { + set_coredump_omit_anon_shared(mm, (val != 0)); + up_write(&coredump_settings_sem); + } else + ret = -EBUSY; + + mmput(mm); + out_no_mm: + put_task_struct(task); + out_no_task: + return ret; +} + +static struct file_operations proc_coredump_omit_anon_shared_operations = { + .read = proc_coredump_omit_anon_shared_read, + .write = proc_coredump_omit_anon_shared_write, +}; +#endif + /* * /proc/self: */ @@ -1972,6 +2071,10 @@ static struct pid_entry tgid_base_stuff[ #ifdef CONFIG_FAULT_INJECTION REG("make-it-fail", S_IRUGO|S_IWUSR, fault_inject), #endif +#if defined(USE_ELF_CORE_DUMP) && defined(CONFIG_ELF_CORE) + REG("coredump_omit_anonymous_shared", S_IRUGO|S_IWUSR, + coredump_omit_anon_shared), +#endif #ifdef CONFIG_TASK_IO_ACCOUNTING INF("io", S_IRUGO, pid_io_accounting), #endif Index: linux-2.6.20-mm2/include/linux/sched.h =================================================================== --- linux-2.6.20-mm2.orig/include/linux/sched.h +++ linux-2.6.20-mm2/include/linux/sched.h @@ -366,7 +366,13 @@ struct mm_struct { unsigned int token_priority; unsigned int last_interval; + /* + * Writing to these bit fields can cause race condition. To avoid + * the race, use dump_bits_lock. You may also use set_dumpable() or + * set_coredump_*() macros to set a value. + */ unsigned char dumpable:2; + unsigned char coredump_omit_anon_shared:1; /* don't dump anon shm */ /* coredumping support */ int core_waiters; @@ -1727,6 +1733,33 @@ static inline void inc_syscw(struct task } #endif +#include <linux/elf.h> +/* + * These macros are used to protect dumpable and coredump_omit_anon_shared bit + * fields in mm_struct from write race between the two. + */ +extern spinlock_t dump_bits_lock; +#define __set_dump_bits(dest, val) \ + do { \ + spin_lock(&dump_bits_lock); \ + (dest) = (val); \ + spin_unlock(&dump_bits_lock); \ + } while (0) + +#if defined(USE_ELF_CORE_DUMP) && defined(CONFIG_ELF_CORE) +# define set_dumpable(mm, val) \ + __set_dump_bits((mm)->dumpable, val) +# define set_coredump_omit_anon_shared(mm, val) \ + __set_dump_bits((mm)->coredump_omit_anon_shared, val) +#else +# define set_dumpable(mm, val) \ + do { \ + (mm)->dumpable = (val); \ + smp_wmb(); \ + } while (0) +# define set_coredump_omit_anon_shared(mm, val) +#endif + #endif /* __KERNEL__ */ #endif Index: linux-2.6.20-mm2/fs/exec.c =================================================================== --- linux-2.6.20-mm2.orig/fs/exec.c +++ linux-2.6.20-mm2/fs/exec.c @@ -61,6 +61,8 @@ int core_uses_pid; char core_pattern[128] = "core"; int suid_dumpable = 0; +DEFINE_SPINLOCK(dump_bits_lock); +DECLARE_RWSEM(coredump_settings_sem); EXPORT_SYMBOL(suid_dumpable); /* The maximal length of core_pattern is also specified in sysctl.c */ @@ -853,9 +855,9 @@ int flush_old_exec(struct linux_binprm * current->sas_ss_sp = current->sas_ss_size = 0; if (current->euid == current->uid && current->egid == current->gid) - current->mm->dumpable = 1; + set_dumpable(current->mm, 1); else - current->mm->dumpable = suid_dumpable; + set_dumpable(current->mm, suid_dumpable); name = bprm->filename; @@ -883,7 +885,7 @@ int flush_old_exec(struct linux_binprm * file_permission(bprm->file, MAY_READ) || (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP)) { suid_keys(current); - current->mm->dumpable = suid_dumpable; + set_dumpable(current->mm, suid_dumpable); } /* An exec changes our domain. We are no longer part of the thread @@ -1477,7 +1479,7 @@ int do_coredump(long signr, int exit_cod flag = O_EXCL; /* Stop rewrite attacks */ current->fsuid = 0; /* Dump root private */ } - mm->dumpable = 0; + set_dumpable(mm, 0); retval = coredump_wait(exit_code); if (retval < 0) @@ -1530,7 +1532,9 @@ int do_coredump(long signr, int exit_cod if (!ispipe && do_truncate(file->f_path.dentry, 0, 0, file) != 0) goto close_fail; + down_read(&coredump_settings_sem); retval = binfmt->core_dump(signr, regs, file); + up_read(&coredump_settings_sem); if (retval) current->signal->group_exit_code |= 0x80; Index: linux-2.6.20-mm2/kernel/fork.c =================================================================== --- linux-2.6.20-mm2.orig/kernel/fork.c +++ linux-2.6.20-mm2/kernel/fork.c @@ -334,6 +334,9 @@ static struct mm_struct * mm_init(struct atomic_set(&mm->mm_count, 1); init_rwsem(&mm->mmap_sem); INIT_LIST_HEAD(&mm->mmlist); + /* don't need to use set_coredump_omit_anon_shared() */ + mm->coredump_omit_anon_shared = + (current->mm) ? current->mm->coredump_omit_anon_shared : 0; mm->core_waiters = 0; mm->nr_ptes = 0; set_mm_counter(mm, file_rss, 0); Index: linux-2.6.20-mm2/kernel/sys.c =================================================================== --- linux-2.6.20-mm2.orig/kernel/sys.c +++ linux-2.6.20-mm2/kernel/sys.c @@ -1022,10 +1022,8 @@ asmlinkage long sys_setregid(gid_t rgid, else return -EPERM; } - if (new_egid != old_egid) { - current->mm->dumpable = suid_dumpable; - smp_wmb(); - } + if (new_egid != old_egid) + set_dumpable(current->mm, suid_dumpable); if (rgid != (gid_t) -1 || (egid != (gid_t) -1 && egid != old_rgid)) current->sgid = new_egid; @@ -1052,16 +1050,12 @@ asmlinkage long sys_setgid(gid_t gid) return retval; if (capable(CAP_SETGID)) { - if (old_egid != gid) { - current->mm->dumpable = suid_dumpable; - smp_wmb(); - } + if (old_egid != gid) + set_dumpable(current->mm, suid_dumpable); current->gid = current->egid = current->sgid = current->fsgid = gid; } else if ((gid == current->gid) || (gid == current->sgid)) { - if (old_egid != gid) { - current->mm->dumpable = suid_dumpable; - smp_wmb(); - } + if (old_egid != gid) + set_dumpable(current->mm, suid_dumpable); current->egid = current->fsgid = gid; } else @@ -1089,10 +1083,8 @@ static int set_user(uid_t new_ruid, int switch_uid(new_user); - if (dumpclear) { - current->mm->dumpable = suid_dumpable; - smp_wmb(); - } + if (dumpclear) + set_dumpable(current->mm, suid_dumpable); current->uid = new_ruid; return 0; } @@ -1145,10 +1137,8 @@ asmlinkage long sys_setreuid(uid_t ruid, if (new_ruid != old_ruid && set_user(new_ruid, new_euid != old_euid) < 0) return -EAGAIN; - if (new_euid != old_euid) { - current->mm->dumpable = suid_dumpable; - smp_wmb(); - } + if (new_euid != old_euid) + set_dumpable(current->mm, suid_dumpable); current->fsuid = current->euid = new_euid; if (ruid != (uid_t) -1 || (euid != (uid_t) -1 && euid != old_ruid)) @@ -1195,10 +1185,8 @@ asmlinkage long sys_setuid(uid_t uid) } else if ((uid != current->uid) && (uid != new_suid)) return -EPERM; - if (old_euid != uid) { - current->mm->dumpable = suid_dumpable; - smp_wmb(); - } + if (old_euid != uid) + set_dumpable(current->mm, suid_dumpable); current->fsuid = current->euid = uid; current->suid = new_suid; @@ -1240,10 +1228,8 @@ asmlinkage long sys_setresuid(uid_t ruid return -EAGAIN; } if (euid != (uid_t) -1) { - if (euid != current->euid) { - current->mm->dumpable = suid_dumpable; - smp_wmb(); - } + if (euid != current->euid) + set_dumpable(current->mm, suid_dumpable); current->euid = euid; } current->fsuid = current->euid; @@ -1290,10 +1276,8 @@ asmlinkage long sys_setresgid(gid_t rgid return -EPERM; } if (egid != (gid_t) -1) { - if (egid != current->egid) { - current->mm->dumpable = suid_dumpable; - smp_wmb(); - } + if (egid != current->egid) + set_dumpable(current->mm, suid_dumpable); current->egid = egid; } current->fsgid = current->egid; @@ -1336,10 +1320,8 @@ asmlinkage long sys_setfsuid(uid_t uid) if (uid == current->uid || uid == current->euid || uid == current->suid || uid == current->fsuid || capable(CAP_SETUID)) { - if (uid != old_fsuid) { - current->mm->dumpable = suid_dumpable; - smp_wmb(); - } + if (uid != old_fsuid) + set_dumpable(current->mm, suid_dumpable); current->fsuid = uid; } @@ -1365,10 +1347,8 @@ asmlinkage long sys_setfsgid(gid_t gid) if (gid == current->gid || gid == current->egid || gid == current->sgid || gid == current->fsgid || capable(CAP_SETGID)) { - if (gid != old_fsgid) { - current->mm->dumpable = suid_dumpable; - smp_wmb(); - } + if (gid != old_fsgid) + set_dumpable(current->mm, suid_dumpable); current->fsgid = gid; key_fsgid_changed(current); proc_id_connector(current, PROC_EVENT_GID); @@ -2163,7 +2143,7 @@ asmlinkage long sys_prctl(int option, un error = -EINVAL; break; } - current->mm->dumpable = arg2; + set_dumpable(current->mm, arg2); break; case PR_SET_UNALIGN: Index: linux-2.6.20-mm2/security/commoncap.c =================================================================== --- linux-2.6.20-mm2.orig/security/commoncap.c +++ linux-2.6.20-mm2/security/commoncap.c @@ -244,7 +244,7 @@ void cap_bprm_apply_creds (struct linux_ if (bprm->e_uid != current->uid || bprm->e_gid != current->gid || !cap_issubset (new_permitted, current->cap_permitted)) { - current->mm->dumpable = suid_dumpable; + set_dumpable(current->mm, suid_dumpable); if (unsafe & ~LSM_UNSAFE_PTRACE_CAP) { if (!capable(CAP_SETUID)) { Index: linux-2.6.20-mm2/security/dummy.c =================================================================== --- linux-2.6.20-mm2.orig/security/dummy.c +++ linux-2.6.20-mm2/security/dummy.c @@ -130,7 +130,7 @@ static void dummy_bprm_free_security (st static void dummy_bprm_apply_creds (struct linux_binprm *bprm, int unsafe) { if (bprm->e_uid != current->uid || bprm->e_gid != current->gid) { - current->mm->dumpable = suid_dumpable; + set_dumpable(current->mm, suid_dumpable); if ((unsafe & ~LSM_UNSAFE_PTRACE_CAP) && !capable(CAP_SETUID)) { bprm->e_uid = current->uid; Index: linux-2.6.20-mm2/include/linux/binfmts.h =================================================================== --- linux-2.6.20-mm2.orig/include/linux/binfmts.h +++ linux-2.6.20-mm2/include/linux/binfmts.h @@ -17,6 +17,8 @@ struct pt_regs; #ifdef __KERNEL__ +#include <linux/rwsem.h> + /* * This structure is used to hold the arguments that are used when loading binaries. */ @@ -75,6 +77,8 @@ extern int suid_dumpable; #define SUID_DUMP_USER 1 /* Dump as user of process */ #define SUID_DUMP_ROOT 2 /* Dump as root */ +extern struct rw_semaphore coredump_settings_sem; + /* Stack area protections */ #define EXSTACK_DEFAULT 0 /* Whatever the arch defaults to */ #define EXSTACK_DISABLE_X 1 /* Disable executable stacks */ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/4] coredump: add an interface to control the core dump routine 2007-03-02 4:47 ` [PATCH 1/4] coredump: add an interface to control the core dump routine Kawai, Hidehiro @ 2007-03-02 9:34 ` Pavel Machek 2007-03-26 13:02 ` Kawai, Hidehiro 0 siblings, 1 reply; 20+ messages in thread From: Pavel Machek @ 2007-03-02 9:34 UTC (permalink / raw) To: Kawai, Hidehiro Cc: Andrew Morton, kernel list, Robin Holt, David Howells, Alan Cox, Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI Hi! > This patch adds an interface to set/reset a flag which determines > anonymous shared memory segments should be dumped or not when a core > file is generated. > > /proc/<pid>/coredump_omit_anonymous_shared file is provided to access > the flag. You can change the flag status for a particular process by > writing to or reading from the file. Yes. So, you used very different interface interface from ulimit, which means locking is hard. Plus, what you are doing can be done in userspace using google coredumper. > + if (down_write_trylock(&coredump_settings_sem)) { > + set_coredump_omit_anon_shared(mm, (val != 0)); > + up_write(&coredump_settings_sem); > + } else > + ret = -EBUSY; Now this is an ugly interface. "If user tries to write to /proc file while it is locked, return him spurious error. > @@ -75,6 +77,8 @@ extern int suid_dumpable; > #define SUID_DUMP_USER 1 /* Dump as user of process */ > #define SUID_DUMP_ROOT 2 /* Dump as root */ > > +extern struct rw_semaphore coredump_settings_sem; > + > /* Stack area protections */ > #define EXSTACK_DEFAULT 0 /* Whatever the arch defaults to */ > #define EXSTACK_DISABLE_X 1 /* Disable executable stacks */ Yep, very nice, if you used interface suited for the task (ulimit), you'd not have to invent new locking like this. Anyway, I don't care, and you don't listen. You got the design wrong, and your code can be done in userspace, anyway. This is "NAK" from me, but unfortunately I'm not in power to decide this. So at least drop me from Cc in future submissions. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/4] coredump: add an interface to control the core dump routine 2007-03-02 9:34 ` Pavel Machek @ 2007-03-26 13:02 ` Kawai, Hidehiro 2007-03-29 10:49 ` Pavel Machek 2007-03-29 19:16 ` David Howells 0 siblings, 2 replies; 20+ messages in thread From: Kawai, Hidehiro @ 2007-03-26 13:02 UTC (permalink / raw) To: Pavel Machek Cc: Andrew Morton, kernel list, Robin Holt, David Howells, Alan Cox, Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI Hi Pavel, Thank you for your reply. I'm sorry for my late reply. I have discussed with my colleagues why you say "ugly" against my procfs interface, then I noticed I may have misunderstood what you said. Is the reason for saying "ugly" two interfaces, i.e. preexisting ulimit (get/setrlimit) and my proc entry, exist to control core file size? If so, I'm sorry for taking your precious time by proceeding to the discussion without enough understanding. Assuming my presumption is true, I don't think it's so ugly because there are other parameters to control core dumping such as dumpable (controled by prctl(2)) and suid_dumpable (controled via fs.suid_dumpable sysctl). What would you think about that? Anyway, here are the answers to what you pointed out. Pavel Machek wrote: >>This patch adds an interface to set/reset a flag which determines >>anonymous shared memory segments should be dumped or not when a core >>file is generated. >> >>/proc/<pid>/coredump_omit_anonymous_shared file is provided to access >>the flag. You can change the flag status for a particular process by >>writing to or reading from the file. > > Yes. So, you used very different interface interface from ulimit, > which means locking is hard. I'd like to allow users to change the flag from other process. So I have to do locking even if it is hard. You said this flexibility was not an advantage before, but in some cases, it is needed. Please assume the case where a process forks many children and they share a huge shared memory. Sometimes an end user wants to set coredump_omit_anon_shared flag to those processes except for a particular child process. With ulimit (setrlimit) interface, the user can't do such setup without modifying the application program. But normal end user will not be able to modify the program. > Plus, what you are doing can be done in userspace using google > coredumper. I think that the needs differ between userland core dumper user and in-kernel core dumper user. Pros and cons also differ. Some of people (such as system admins, distro vendors, etc) need highly reliable core dumper because they don't want to experience same failures again and they don't hope that another failure is caused by core dumping. Userland core dumper is useful because it is relatively easy to be customized, but its reliability highly depends on the application programs. If the stack for signal handlers is not set up carefully, if the data used by userland core dumper has been destroyed, if coredump_omit_anon_shared flag has been overwritten by bad data, or if the address of functions have been destroyed, the userland core dumper may fail to dump. So in-kernel solutoin is required by enterprise users. >>+ if (down_write_trylock(&coredump_settings_sem)) { >>+ set_coredump_omit_anon_shared(mm, (val != 0)); >>+ up_write(&coredump_settings_sem); >>+ } else >>+ ret = -EBUSY; > > Now this is an ugly interface. "If user tries to write to /proc file > while it is locked, return him spurious error. I'm considering using the previous argument passing approach (preserves the setting value into a local variable and then passes it to core dump routines) or another approach which introduce a per-process flag to indicate that core dump is ongoing. Both of these approach never produce spurious errors. >>@@ -75,6 +77,8 @@ extern int suid_dumpable; >> #define SUID_DUMP_USER 1 /* Dump as user of process */ >> #define SUID_DUMP_ROOT 2 /* Dump as root */ >> >>+extern struct rw_semaphore coredump_settings_sem; >>+ >> /* Stack area protections */ >> #define EXSTACK_DEFAULT 0 /* Whatever the arch defaults to */ >> #define EXSTACK_DISABLE_X 1 /* Disable executable stacks */ > > Yep, very nice, if you used interface suited for the task (ulimit), > you'd not have to invent new locking like this. Using the above-stated approach, this semaphore becomes unnecessary. Thanks, -- Hidehiro Kawai Hitachi, Ltd., Systems Development Laboratory ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/4] coredump: add an interface to control the core dump routine 2007-03-26 13:02 ` Kawai, Hidehiro @ 2007-03-29 10:49 ` Pavel Machek 2007-03-29 19:16 ` David Howells 1 sibling, 0 replies; 20+ messages in thread From: Pavel Machek @ 2007-03-29 10:49 UTC (permalink / raw) To: Kawai, Hidehiro Cc: Andrew Morton, kernel list, Robin Holt, David Howells, Alan Cox, Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI Hi! > I have discussed with my colleagues why you say "ugly" against my > procfs interface, then I noticed I may have misunderstood what you said. > Is the reason for saying "ugly" two interfaces, i.e. preexisting ulimit > (get/setrlimit) and my proc entry, exist to control core file size? Yes. > > Plus, what you are doing can be done in userspace using google > > coredumper. > > I think that the needs differ between userland core dumper user and > in-kernel core dumper user. Pros and cons also differ. > > Some of people (such as system admins, distro vendors, etc) need > highly reliable core dumper because they don't want to experience > same failures again and they don't hope that another failure is > caused by core dumping. Userland core dumper is useful because > it is relatively easy to be customized, but its reliability highly > depends on the application programs. Fix userland core dumper to be reliable, then. > If the stack for signal handlers is not set up carefully, if the > data used by userland core dumper has been destroyed, if > coredump_omit_anon_shared flag has been overwritten by bad data, > or if the address of functions have been destroyed, the userland > core dumper may fail to dump. So in-kernel solutoin is required > by enterprise users. It should be possible to dump from separate process. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/4] coredump: add an interface to control the core dump routine 2007-03-26 13:02 ` Kawai, Hidehiro 2007-03-29 10:49 ` Pavel Machek @ 2007-03-29 19:16 ` David Howells 2007-03-29 21:17 ` Andrew Morton 1 sibling, 1 reply; 20+ messages in thread From: David Howells @ 2007-03-29 19:16 UTC (permalink / raw) To: Pavel Machek Cc: Kawai, Hidehiro, Andrew Morton, kernel list, Robin Holt, Alan Cox, Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI, dhowells Pavel Machek <pavel@ucw.cz> wrote: > > Userland core dumper is useful because it is relatively easy to be > > customized, but its reliability highly depends on the application > > programs. > > Fix userland core dumper to be reliable, then. I don't think it's that easy. The userland core dumper, as I understand it, has to work *within* an application program (it's a library), thus the application program my scotch the core dumper in a couple of ways: (1) by trying to claim the same services (such as signal handlers). (2) by destroying the coredumper should the application run amok and splat it (by munmapping it, mmapping over it, writing over it, etc.). Plus there are cases that an in-application userspace coredumper can't catch - namely double/triple faults. David ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/4] coredump: add an interface to control the core dump routine 2007-03-29 19:16 ` David Howells @ 2007-03-29 21:17 ` Andrew Morton 2007-03-30 10:29 ` Kawai, Hidehiro 0 siblings, 1 reply; 20+ messages in thread From: Andrew Morton @ 2007-03-29 21:17 UTC (permalink / raw) To: David Howells Cc: Pavel Machek, Kawai, Hidehiro, kernel list, Robin Holt, Alan Cox, Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI On Thu, 29 Mar 2007 20:16:59 +0100 David Howells <dhowells@redhat.com> wrote: > Pavel Machek <pavel@ucw.cz> wrote: > > > > Userland core dumper is useful because it is relatively easy to be > > > customized, but its reliability highly depends on the application > > > programs. > > > > Fix userland core dumper to be reliable, then. > > I don't think it's that easy. The userland core dumper, as I understand it, > has to work *within* an application program (it's a library), thus the > application program my scotch the core dumper in a couple of ways: That's no longer necessarily true with the recently-added dump-to-an-application feature: core_pattern: ... . If the first character of the pattern is a '|', the kernel will treat the rest of the pattern as a command to run. The core dump will be written to the standard input of that program instead of to a file. That's new in 2.6.20 (maybe .19?) so people probably don't know about it. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/4] coredump: add an interface to control the core dump routine 2007-03-29 21:17 ` Andrew Morton @ 2007-03-30 10:29 ` Kawai, Hidehiro 2007-03-30 16:10 ` Andrew Morton 2007-03-31 13:03 ` David Howells 0 siblings, 2 replies; 20+ messages in thread From: Kawai, Hidehiro @ 2007-03-30 10:29 UTC (permalink / raw) To: Andrew Morton Cc: David Howells, Pavel Machek, kernel list, Robin Holt, Alan Cox, Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI Hi, Andrew Morton wrote: > On Thu, 29 Mar 2007 20:16:59 +0100 > David Howells <dhowells@redhat.com> wrote: > >>Pavel Machek <pavel@ucw.cz> wrote: >> >>>>Userland core dumper is useful because it is relatively easy to be >>>>customized, but its reliability highly depends on the application >>>>programs. >>> >>>Fix userland core dumper to be reliable, then. >> >>I don't think it's that easy. The userland core dumper, as I understand it, >>has to work *within* an application program (it's a library), thus the >>application program my scotch the core dumper in a couple of ways: > > That's no longer necessarily true with the recently-added > dump-to-an-application feature: > > core_pattern: > ... > . If the first character of the pattern is a '|', the kernel will treat > the rest of the pattern as a command to run. The core dump will be > written to the standard input of that program instead of to a file. I think dumping core over a pipe is almost good. Filtering and writing out a core by a separete userspace program can be reliable because it is independent of the failed user process. But I have one concern; data transfer over a pipe is slow. It took 7 seconds to transfer 2GB anonymous shared memory (detailed at the last of this mail). In the case of dumping hundreds processes which share giga bytes memory, it will take a few minutes even if the huge shared memory is not written to a disk. If a user wants to restart the failed application as soon as possible to keep downtime to a minimum, this extra transfer time will be a barrier. So I think in-kernel filtering is still needed. I performed a simple experiment to confirm the transfer speed: 1. run a program which allocates 2GB anonymous shared memory 2. the program receives SIGSEGV 3. core image is generated and passed to tp_pipe(*) over a pipe 4. tp_pipe reads the pipe and revokes the received data while data arrives 5. tp_pipe logs time of completing the data transfer 6. repeat 1 to 5 ten times and calculate the average and standard deviation (*) test program which I prepared for this experiment Here is the result: Average: 7.041 sec Standard deviation: 0.903 And cpuinfo and meminfo of my box: $ cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 15 model : 4 model name : Intel(R) Xeon(TM) CPU 3.20GHz stepping : 1 cpu MHz : 3200.343 cache size : 1024 KB fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl cid cx16 xtpr bogomips : 6404.82 clflush size : 64 processor : 1 vendor_id : GenuineIntel cpu family : 15 model : 4 model name : Intel(R) Xeon(TM) CPU 3.20GHz stepping : 1 cpu MHz : 3200.343 cache size : 1024 KB fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl cid cx16 xtpr bogomips : 6400.75 clflush size : 64 $ cat /proc/meminfo MemTotal: 8309240 kB MemFree: 8216036 kB Buffers: 11888 kB Cached: 45200 kB SwapCached: 0 kB Active: 42324 kB Inactive: 26008 kB HighTotal: 7470896 kB HighFree: 7405504 kB LowTotal: 838344 kB LowFree: 810532 kB SwapTotal: 16386260 kB SwapFree: 16386260 kB Dirty: 4 kB Writeback: 0 kB AnonPages: 11232 kB Mapped: 5828 kB Slab: 11916 kB SReclaimable: 5604 kB SUnreclaim: 6312 kB PageTables: 896 kB NFS_Unstable: 0 kB Bounce: 0 kB CommitLimit: 20540880 kB Committed_AS: 33540 kB VmallocTotal: 118776 kB VmallocUsed: 8324 kB VmallocChunk: 110408 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 Hugepagesize: 2048 kB Thanks, -- Hidehiro Kawai Hitachi, Ltd., Systems Development Laboratory ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/4] coredump: add an interface to control the core dump routine 2007-03-30 10:29 ` Kawai, Hidehiro @ 2007-03-30 16:10 ` Andrew Morton 2007-03-31 13:03 ` David Howells 1 sibling, 0 replies; 20+ messages in thread From: Andrew Morton @ 2007-03-30 16:10 UTC (permalink / raw) To: Kawai, Hidehiro Cc: David Howells, Pavel Machek, kernel list, Robin Holt, Alan Cox, Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI On Fri, 30 Mar 2007 19:29:23 +0900 "Kawai, Hidehiro" <hidehiro.kawai.ez@hitachi.com> wrote: > > core_pattern: > > ... > > . If the first character of the pattern is a '|', the kernel will treat > > the rest of the pattern as a command to run. The core dump will be > > written to the standard input of that program instead of to a file. > > I think dumping core over a pipe is almost good. Filtering and writing > out a core by a separete userspace program can be reliable because it is > independent of the failed user process. But I have one concern; data > transfer over a pipe is slow. It took 7 seconds to transfer 2GB > anonymous shared memory (detailed at the last of this mail). > > In the case of dumping hundreds processes which share giga bytes memory, > it will take a few minutes even if the huge shared memory is not written > to a disk. If a user wants to restart the failed application as soon > as possible to keep downtime to a minimum, this extra transfer time will > be a barrier. So I think in-kernel filtering is still needed. Yes, I agree - I don't think we presently have a way of avoiding having to send all of that uninteresting data down the pipe. One may, however, be able to play tricks with /proc/<pid>/mem from within the corefile-generating program: select the vmas which are to be dumped, read only those ones. I don't know how practical that would be. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/4] coredump: add an interface to control the core dump routine 2007-03-30 10:29 ` Kawai, Hidehiro 2007-03-30 16:10 ` Andrew Morton @ 2007-03-31 13:03 ` David Howells 1 sibling, 0 replies; 20+ messages in thread From: David Howells @ 2007-03-31 13:03 UTC (permalink / raw) To: Andrew Morton Cc: Kawai, Hidehiro, Pavel Machek, kernel list, Robin Holt, Alan Cox, Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI Andrew Morton <akpm@linux-foundation.org> wrote: > Yes, I agree - I don't think we presently have a way of avoiding having > to send all of that uninteresting data down the pipe. Have the kernel fork and exec a debug program "just in time" with the dying process ptrace-attached in advance. That program could then use ptrace() or whatever to do the coredump. I think Windows does something like that. David ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 2/4] coredump: ELF: enable to omit anonymous shared memory 2007-03-02 4:41 [PATCH 0/4] coredump: core dump masking support v4 Kawai, Hidehiro 2007-03-02 4:47 ` [PATCH 1/4] coredump: add an interface to control the core dump routine Kawai, Hidehiro @ 2007-03-02 4:49 ` Kawai, Hidehiro 2007-03-02 4:50 ` [PATCH 3/4] coredump: ELF-FDPIC: " Kawai, Hidehiro ` (2 subsequent siblings) 4 siblings, 0 replies; 20+ messages in thread From: Kawai, Hidehiro @ 2007-03-02 4:49 UTC (permalink / raw) To: Andrew Morton, kernel list Cc: Kawai, Hidehiro, Pavel Machek, Robin Holt, David Howells, Alan Cox, Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI This patch enables to omit anonymous shared memory from an ELF formatted core file when it is generated. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> --- fs/binfmt_elf.c | 12 +++++++++--- 1 files changed, 9 insertions(+), 3 deletions(-) Index: linux-2.6.20-mm2/fs/binfmt_elf.c =================================================================== --- linux-2.6.20-mm2.orig/fs/binfmt_elf.c +++ linux-2.6.20-mm2/fs/binfmt_elf.c @@ -1191,9 +1191,15 @@ static int maydump(struct vm_area_struct if (vma->vm_flags & (VM_IO | VM_RESERVED)) return 0; - /* Dump shared memory only if mapped from an anonymous file. */ - if (vma->vm_flags & VM_SHARED) - return vma->vm_file->f_path.dentry->d_inode->i_nlink == 0; + /* + * Dump shared memory only if mapped from an anonymous file and + * /proc/<pid>/coredump_omit_anonymous_shared flag is not set. + */ + if (vma->vm_flags & VM_SHARED) { + if (vma->vm_file->f_path.dentry->d_inode->i_nlink) + return 0; + return vma->vm_mm->coredump_omit_anon_shared == 0; + } /* If it hasn't been written to, don't write it out */ if (!vma->anon_vma) ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 3/4] coredump: ELF-FDPIC: enable to omit anonymous shared memory 2007-03-02 4:41 [PATCH 0/4] coredump: core dump masking support v4 Kawai, Hidehiro 2007-03-02 4:47 ` [PATCH 1/4] coredump: add an interface to control the core dump routine Kawai, Hidehiro 2007-03-02 4:49 ` [PATCH 2/4] coredump: ELF: enable to omit anonymous shared memory Kawai, Hidehiro @ 2007-03-02 4:50 ` Kawai, Hidehiro 2007-03-02 4:51 ` [PATCH 4/4] coredump: documentation for proc entry Kawai, Hidehiro 2007-03-15 20:37 ` [PATCH 0/4] coredump: core dump masking support v4 Andrew Morton 4 siblings, 0 replies; 20+ messages in thread From: Kawai, Hidehiro @ 2007-03-02 4:50 UTC (permalink / raw) To: Andrew Morton, kernel list Cc: Kawai, Hidehiro, Pavel Machek, Robin Holt, David Howells, Alan Cox, Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI This patch enables to omit anonymous shared memory from an ELF-FDPIC formatted core file when it is generated. The debug messages from maydump() in fs/binfmt_elf_fdpic.c are changed appropriately so that we can know what kind of memory segments are dumped or not. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> --- fs/binfmt_elf_fdpic.c | 25 ++++++++++++++++--------- 1 files changed, 16 insertions(+), 9 deletions(-) Index: linux-2.6.20-mm2/fs/binfmt_elf_fdpic.c =================================================================== --- linux-2.6.20-mm2.orig/fs/binfmt_elf_fdpic.c +++ linux-2.6.20-mm2/fs/binfmt_elf_fdpic.c @@ -1168,7 +1168,7 @@ static int dump_seek(struct file *file, * * I think we should skip something. But I am not sure how. H.J. */ -static int maydump(struct vm_area_struct *vma) +static int maydump(struct vm_area_struct *vma, struct mm_struct *mm) { /* Do not dump I/O mapped devices or special mappings */ if (vma->vm_flags & (VM_IO | VM_RESERVED)) { @@ -1184,15 +1184,22 @@ static int maydump(struct vm_area_struct return 0; } - /* Dump shared memory only if mapped from an anonymous file. */ + /* + * Dump shared memory only if mapped from an anonymous file and + * /proc/<pid>/coredump_omit_anonymous_shared flag is not set. + */ if (vma->vm_flags & VM_SHARED) { - if (vma->vm_file->f_path.dentry->d_inode->i_nlink == 0) { + if (vma->vm_file->f_path.dentry->d_inode->i_nlink) { kdcore("%08lx: %08lx: no (share)", vma->vm_start, vma->vm_flags); + return 0; + } + if (mm->coredump_omit_anon_shared) { + kdcore("%08lx: %08lx: no (anon-share)", vma->vm_start, vma->vm_flags); + return 0; + } else { + kdcore("%08lx: %08lx: yes (anon-share)", vma->vm_start, vma->vm_flags); return 1; } - - kdcore("%08lx: %08lx: no (share)", vma->vm_start, vma->vm_flags); - return 0; } #ifdef CONFIG_MMU @@ -1451,7 +1458,7 @@ static int elf_fdpic_dump_segments(struc for (vma = current->mm->mmap; vma; vma = vma->vm_next) { unsigned long addr; - if (!maydump(vma)) + if (!maydump(vma, mm)) continue; for (addr = vma->vm_start; @@ -1506,7 +1513,7 @@ static int elf_fdpic_dump_segments(struc for (vml = current->mm->context.vmlist; vml; vml = vml->next) { struct vm_area_struct *vma = vml->vma; - if (!maydump(vma)) + if (!maydump(vma, mm)) continue; if ((*size += PAGE_SIZE) > *limit) @@ -1715,7 +1722,7 @@ static int elf_fdpic_core_dump(long sign phdr.p_offset = offset; phdr.p_vaddr = vma->vm_start; phdr.p_paddr = 0; - phdr.p_filesz = maydump(vma) ? sz : 0; + phdr.p_filesz = maydump(vma, current->mm) ? sz : 0; phdr.p_memsz = sz; offset += phdr.p_filesz; phdr.p_flags = vma->vm_flags & VM_READ ? PF_R : 0; ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 4/4] coredump: documentation for proc entry 2007-03-02 4:41 [PATCH 0/4] coredump: core dump masking support v4 Kawai, Hidehiro ` (2 preceding siblings ...) 2007-03-02 4:50 ` [PATCH 3/4] coredump: ELF-FDPIC: " Kawai, Hidehiro @ 2007-03-02 4:51 ` Kawai, Hidehiro 2007-03-02 9:35 ` Pavel Machek 2007-03-15 20:37 ` [PATCH 0/4] coredump: core dump masking support v4 Andrew Morton 4 siblings, 1 reply; 20+ messages in thread From: Kawai, Hidehiro @ 2007-03-02 4:51 UTC (permalink / raw) To: Andrew Morton, kernel list Cc: Kawai, Hidehiro, Pavel Machek, Robin Holt, David Howells, Alan Cox, Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI This patch adds the documentation for /proc/<pid>/coredump_omit_anonymous_shared. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> --- Documentation/filesystems/proc.txt | 38 +++++++++++++++++++++++++++ 1 files changed, 38 insertions(+) Index: linux-2.6.20-mm2/Documentation/filesystems/proc.txt =================================================================== --- linux-2.6.20-mm2.orig/Documentation/filesystems/proc.txt +++ linux-2.6.20-mm2/Documentation/filesystems/proc.txt @@ -41,6 +41,7 @@ Table of Contents 2.11 /proc/sys/fs/mqueue - POSIX message queues filesystem 2.12 /proc/<pid>/oom_adj - Adjust the oom-killer score 2.13 /proc/<pid>/oom_score - Display current oom-killer score + 2.14 /proc/<pid>/coredump_omit_anonymous_shared - Core dump coordinator ------------------------------------------------------------------------------ Preface @@ -1982,6 +1983,43 @@ This file can be used to check the curre any given <pid>. Use it together with /proc/<pid>/oom_adj to tune which process should be killed in an out-of-memory situation. +2.14 /proc/<pid>/coredump_omit_anonymous_shared - Core dump coordinator +--------------------------------------------------------------------- +When a process is dumped, all anonymous memory is written to a core file as +long as the size of the core file isn't limited. But sometimes we don't want +to dump some memory segments, for example, huge shared memory. + +The /proc/<pid>/coredump_omit_anonymous_shared is a flag which enables you to +omit anonymous shared memory segments from a core file when it is generated. +When the <pid> process is dumped, the core dump routine decides whether a +given memory segment should be dumped into a core file or not based on the +type of the memory segment and the flag. + +If you have written a non-zero value to this proc file, anonymous shared +memory segments are not dumped. There are three types of anonymous shared +memory: + + - IPC shared memory + - the memory segments created by mmap(2) with MAP_ANONYMOUS and MAP_SHARED + flags + - the memory segments created by mmap(2) with MAP_SHARED flag, and the + mapped file has already been unlinked + +Because current core dump routine doesn't distinguish these segments, you can +only choose either dumping all anonymous shared memory segments or not. + +If you don't want to dump all shared memory segments attached to pid 1234, +write 0 to the process's proc file. + + $ echo 1 > /proc/1234/coredump_omit_anonymous_shared + +When a new process is created, the process inherits the flag status from its +parent. It is useful to set the flag before the program runs. +For example: + + $ echo 1 > /proc/self/coredump_omit_anonymous_shared + $ ./some_program + ------------------------------------------------------------------------------ Summary ------------------------------------------------------------------------------ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 4/4] coredump: documentation for proc entry 2007-03-02 4:51 ` [PATCH 4/4] coredump: documentation for proc entry Kawai, Hidehiro @ 2007-03-02 9:35 ` Pavel Machek 2007-03-20 11:11 ` Kawai, Hidehiro 0 siblings, 1 reply; 20+ messages in thread From: Pavel Machek @ 2007-03-02 9:35 UTC (permalink / raw) To: Kawai, Hidehiro Cc: Andrew Morton, kernel list, Robin Holt, David Howells, Alan Cox, Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI Hi! > +If you don't want to dump all shared memory segments attached to pid 1234, > +write 0 to the process's proc file. > + > + $ echo 1 > /proc/1234/coredump_omit_anonymous_shared Write 0? > +When a new process is created, the process inherits the flag status from its > +parent. It is useful to set the flag before the program runs. > +For example: > + > + $ echo 1 > /proc/self/coredump_omit_anonymous_shared > + $ ./some_program > + Notice that this docs is wrong. You have to retry until kernel stops producing spurious errors. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 4/4] coredump: documentation for proc entry 2007-03-02 9:35 ` Pavel Machek @ 2007-03-20 11:11 ` Kawai, Hidehiro 0 siblings, 0 replies; 20+ messages in thread From: Kawai, Hidehiro @ 2007-03-20 11:11 UTC (permalink / raw) To: Pavel Machek Cc: Andrew Morton, kernel list, Robin Holt, David Howells, Alan Cox, Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI Hi Pavel, I'm sorry for my late reply. Pavel Machek wrote: > Hi! > >>+If you don't want to dump all shared memory segments attached to pid 1234, >>+write 0 to the process's proc file. >>+ >>+ $ echo 1 > /proc/1234/coredump_omit_anonymous_shared > > Write 0? Thank you for pointing out. It seems I mistook when I changed the documents. `write 1' is correct. >>+When a new process is created, the process inherits the flag status from its >>+parent. It is useful to set the flag before the program runs. >>+For example: >>+ >>+ $ echo 1 > /proc/self/coredump_omit_anonymous_shared >>+ $ ./some_program >>+ > > Notice that this docs is wrong. You have to retry until kernel stops > producing spurious errors. > Pavel I'll fix the patchset so that kernel doesn't produce the spurious error. For answers to your another mail, please wait a few days. I'm still considering the answer partly. Thanks, -- Hidehiro Kawai Hitachi, Ltd., Systems Development Laboratory ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 0/4] coredump: core dump masking support v4 2007-03-02 4:41 [PATCH 0/4] coredump: core dump masking support v4 Kawai, Hidehiro ` (3 preceding siblings ...) 2007-03-02 4:51 ` [PATCH 4/4] coredump: documentation for proc entry Kawai, Hidehiro @ 2007-03-15 20:37 ` Andrew Morton 2007-03-23 13:13 ` Kawai, Hidehiro 4 siblings, 1 reply; 20+ messages in thread From: Andrew Morton @ 2007-03-15 20:37 UTC (permalink / raw) To: Kawai, Hidehiro Cc: kernel list, Pavel Machek, Robin Holt, David Howells, Alan Cox, Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI On Fri, 02 Mar 2007 13:41:30 +0900 "Kawai, Hidehiro" <hidehiro.kawai.ez@hitachi.com> wrote: > This patch series is version 4 of the core dump masking feature, > which provides a per-process flag not to dump anonymous shared > memory segments. First up, please convince us that this problem cannot be solved in userspace. Note that we now support dumping core over a pipe to a userspace application which can perform filtering such as this (see Documentation/sysctl/kernel.txt:core_pattern). Assuming that your argument is successful... - The unpleasing trylock in proc_coredump_omit_anon_shared_write() is there, I believe, to handle the case where a coredump is presently in progress. We don't want to change the filtering rule while the dump is happening. What I suggest you do instead is to take a copy of mm->coredump_omit_anon_shared into a local variable with one single read per coredump. Pass that local down into all the callees which need to see it. That way, no locking is needed. - These games we're playing with the atomicity of the bitfields in the mm_struct need to go away. First up, please prepare a standalone patch which removes mm_struct.dumpable and adds `unsigned long mm_struct.flags'. Include a comment telling people that they must use atomic bitops (set_bit, clear_bit) on mm_struct.flags. Reimplement the current three-value dumpable silliness using two or three separate flags in mm_struct.flags. Of course, this design means that there will be tiny timing windows where the value of these two or three flags have intermediate, invalid states. Please take care of those little windows and document how you did so. I expect a suitable approach would be to set and clear the flags in a suitable order, so that if a race _does_ happen, the results are benign. - Once that is done, you're ready to think about your new functionality. Start out with #define MM_FLAG_COREDUMP_OMIT_ANON_SHARED (1 << 3) or whatever, and it all becomes easy. - Finally, the code as you have it here is very specific to your specific requirement: don't dump shared memory segments. But if we're going to implement in-kernel core-dump filtering of this nature, we should design it extensibly, even if we don't actually implement those extensions at this time. Because other people might (reasonably) wish to omit anonymous memory, or private mappings, or file-backed VMAs, or whatever. So maybe /proc/pid/coredump_omit_anon_shared should become /proc/pid/core_dumpfilter, which is a carefully documented bitmask. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 0/4] coredump: core dump masking support v4 2007-03-15 20:37 ` [PATCH 0/4] coredump: core dump masking support v4 Andrew Morton @ 2007-03-23 13:13 ` Kawai, Hidehiro 2007-03-28 12:37 ` Kawai, Hidehiro 0 siblings, 1 reply; 20+ messages in thread From: Kawai, Hidehiro @ 2007-03-23 13:13 UTC (permalink / raw) To: Andrew Morton Cc: kernel list, Pavel Machek, Robin Holt, David Howells, Alan Cox, Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI Hi, Thank you for your kind comments. I'm still discussing the answer with my senior colleagues, so please wait a few days. I think I can reply at the beginning of next week. Best regards, -- Hidehiro Kawai Hitachi, Ltd., Systems Development Laboratory Andrew Morton wrote: > On Fri, 02 Mar 2007 13:41:30 +0900 > "Kawai, Hidehiro" <hidehiro.kawai.ez@hitachi.com> wrote: > > >>This patch series is version 4 of the core dump masking feature, >>which provides a per-process flag not to dump anonymous shared >>memory segments. > > > First up, please convince us that this problem cannot be solved in > userspace. Note that we now support dumping core over a pipe to a > userspace application which can perform filtering such as this (see > Documentation/sysctl/kernel.txt:core_pattern). > > > Assuming that your argument is successful... > > - The unpleasing trylock in proc_coredump_omit_anon_shared_write() is > there, I believe, to handle the case where a coredump is presently in > progress. We don't want to change the filtering rule while the dump is > happening. > > What I suggest you do instead is to take a copy of > mm->coredump_omit_anon_shared into a local variable with one single read > per coredump. Pass that local down into all the callees which need to > see it. That way, no locking is needed. > > - These games we're playing with the atomicity of the bitfields in the > mm_struct need to go away. > > First up, please prepare a standalone patch which removes > mm_struct.dumpable and adds `unsigned long mm_struct.flags'. Include a > comment telling people that they must use atomic bitops (set_bit, clear_bit) on > mm_struct.flags. > > Reimplement the current three-value dumpable silliness using two or > three separate flags in mm_struct.flags. Of course, this design means > that there will be tiny timing windows where the value of these two or > three flags have intermediate, invalid states. Please take care of those > little windows and document how you did so. I expect a suitable approach > would be to set and clear the flags in a suitable order, so that if a > race _does_ happen, the results are benign. > > - Once that is done, you're ready to think about your new functionality. > Start out with > > #define MM_FLAG_COREDUMP_OMIT_ANON_SHARED (1 << 3) > > or whatever, and it all becomes easy. > > - Finally, the code as you have it here is very specific to your specific > requirement: don't dump shared memory segments. > > But if we're going to implement in-kernel core-dump filtering of this > nature, we should design it extensibly, even if we don't actually > implement those extensions at this time. > > Because other people might (reasonably) wish to omit anonymous memory, > or private mappings, or file-backed VMAs, or whatever. > > So maybe /proc/pid/coredump_omit_anon_shared should become > /proc/pid/core_dumpfilter, which is a carefully documented bitmask. > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 0/4] coredump: core dump masking support v4 2007-03-23 13:13 ` Kawai, Hidehiro @ 2007-03-28 12:37 ` Kawai, Hidehiro 2007-03-28 17:32 ` Andrew Morton 0 siblings, 1 reply; 20+ messages in thread From: Kawai, Hidehiro @ 2007-03-28 12:37 UTC (permalink / raw) To: Andrew Morton Cc: Kawai, Hidehiro, kernel list, Pavel Machek, Robin Holt, David Howells, Alan Cox, Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI Hi, Thank you for your kind comments. I'm sorry for my late reply. Andrew Morton wrote: > On Fri, 02 Mar 2007 13:41:30 +0900 > "Kawai, Hidehiro" <hidehiro.kawai.ez@hitachi.com> wrote: > >>This patch series is version 4 of the core dump masking feature, >>which provides a per-process flag not to dump anonymous shared >>memory segments. > > First up, please convince us that this problem cannot be solved in > userspace. > Note that we now support dumping core over a pipe to a > userspace application which can perform filtering such as this (see > Documentation/sysctl/kernel.txt:core_pattern). I understand. Thank you for your suggestion. I'll reply about it in another mail, but it may take a few days. > Assuming that your argument is successful... > > - The unpleasing trylock in proc_coredump_omit_anon_shared_write() is > there, I believe, to handle the case where a coredump is presently in > progress. We don't want to change the filtering rule while the dump is > happening. > > What I suggest you do instead is to take a copy of > mm->coredump_omit_anon_shared into a local variable with one single read > per coredump. Pass that local down into all the callees which need to > see it. That way, no locking is needed. Previous v3 patchset does what you suggest, and here are links to the patches: [PATCH 2/4] coredump: ELF: enable to omit anonymous shared memory http://lkml.org/lkml/2007/2/16/156 [PATCH 3/4] coredump: ELF-FDPIC: enable to omit anonymous shared memory http://lkml.org/lkml/2007/2/16/157 However, there was an opposite opinion. To pass the flag status, I added omit_anon_shared argument to elf_fdpic_dump_segments(). Then, David pointed that the argument was unncecessary, because the function also receives mm_struct *mm which includes coredump_omit_anon_shared. But mm->coredump_omit_anon_shared can be changed while core dumping, and it may causes the core file to be corrupted. So in v4 patchset I used r/w semaphore to prevent mm->coredump_omit_anon_shared from being changed. If I add an addtional argument to elf_fdpic_dump_segments() again, I have to explain it to David. I'll tell him that removing mm argument from the function will be a solution since it refers current->mm directly and the mm argument is never used. > - These games we're playing with the atomicity of the bitfields in the > mm_struct need to go away. > > First up, please prepare a standalone patch which removes > mm_struct.dumpable and adds `unsigned long mm_struct.flags'. Include a > comment telling people that they must use atomic bitops (set_bit, clear_bit) on > mm_struct.flags. OK. I'll do it in the next version. > - Finally, the code as you have it here is very specific to your specific > requirement: don't dump shared memory segments. > > But if we're going to implement in-kernel core-dump filtering of this > nature, we should design it extensibly, even if we don't actually > implement those extensions at this time. I understood. Since I had done so initially, I'll revert it to. > Because other people might (reasonably) wish to omit anonymous memory, > or private mappings, or file-backed VMAs, or whatever. > > So maybe /proc/pid/coredump_omit_anon_shared should become > /proc/pid/core_dumpfilter, which is a carefully documented bitmask. There are people who wish to dump VMAs which are not dumped by default. Taking this into account, some bits of core_dumpfilter will be set by default. This means users have to be aware of the default bitmask when they change the bitmask. Perhaps changing the bitmask requires 3 steps: 1. read the default bitmask 2. change bits of the mask 3. write it to the proc entry So I think it is better if we provide /proc/pid/core_flags (default: all bits are 0) instead of core_dumpfilter. With this interface, users who use only one bit of the bitmask (this will be a common case) just have to write 2^n to the proc entry. It takes only one step: 1. write a value to the proc entry If we can implement at the same cost, core_flags will be better because it is useful for users. What would you think about that? By the way, Robin Holt wrote as follows: > Can you make this a little more transparent? Having a magic bitmask does > not seem like the best way to do stuff. Could you maybe make a core_flags > directory with a seperate file for each flag. It could still map to a > single field in the mm, but be broken out for the proc filesystem. Do you think Robin's suggestion is acceptable? Best regards, -- Hidehiro Kawai Hitachi, Ltd., Systems Development Laboratory ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 0/4] coredump: core dump masking support v4 2007-03-28 12:37 ` Kawai, Hidehiro @ 2007-03-28 17:32 ` Andrew Morton 0 siblings, 0 replies; 20+ messages in thread From: Andrew Morton @ 2007-03-28 17:32 UTC (permalink / raw) To: Kawai, Hidehiro Cc: kernel list, Pavel Machek, Robin Holt, David Howells, Alan Cox, Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI On Wed, 28 Mar 2007 21:37:07 +0900 "Kawai, Hidehiro" <hidehiro.kawai.ez@hitachi.com> wrote: > > Because other people might (reasonably) wish to omit anonymous memory, > > or private mappings, or file-backed VMAs, or whatever. > > > > So maybe /proc/pid/coredump_omit_anon_shared should become > > /proc/pid/core_dumpfilter, which is a carefully documented bitmask. > > There are people who wish to dump VMAs which are not dumped by default. > Taking this into account, some bits of core_dumpfilter will be set by > default. This means users have to be aware of the default bitmask > when they change the bitmask. Perhaps changing the bitmask requires > 3 steps: > > 1. read the default bitmask > 2. change bits of the mask > 3. write it to the proc entry > > So I think it is better if we provide /proc/pid/core_flags (default: > all bits are 0) instead of core_dumpfilter. With this interface, > users who use only one bit of the bitmask (this will be a common case) > just have to write 2^n to the proc entry. It takes only one step: > > 1. write a value to the proc entry > > If we can implement at the same cost, core_flags will be better > because it is useful for users. What would you think about that? > It sounds unnecessarily complex, and unnecessarily different from our normal expectations of /proc files. And the value we read differs from the value we wrote... I think having a non-zero default will be fine. > > By the way, Robin Holt wrote as follows: > > > Can you make this a little more transparent? Having a magic bitmask does > > not seem like the best way to do stuff. Could you maybe make a core_flags > > directory with a seperate file for each flag. It could still map to a > > single field in the mm, but be broken out for the proc filesystem. > > Do you think Robin's suggestion is acceptable? Marginal, I think. This is not likely to be a field which a lot of people modify a lot of times. Those few people who need to work with this can afford to look the values up in the documentation while writing their script. And it requires a distressingly large amount of code to implement a /proc file. Perhaps in this situation the code can be shared. otoh, why is it a /proc thing at all? unsigned long sys_set_corefile_filter(unsigned long enable_mask); unsigned long sys_clear_corefile_filter(unsigned long enable_mask); would be better? ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 0/4] coredump: core dump masking support v3 @ 2007-02-16 13:34 Kawai, Hidehiro 2007-02-16 13:42 ` [PATCH 4/4] coredump: documentation for proc entry Kawai, Hidehiro 0 siblings, 1 reply; 20+ messages in thread From: Kawai, Hidehiro @ 2007-02-16 13:34 UTC (permalink / raw) To: Andrew Morton, kernel list Cc: Pavel Machek, Robin Holt, dhowells, Alan Cox, Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI@redhat Hi, This patch series is version 3 of the core dump masking feature, which provides a per-process flag not to dump anonymous shared memory segments. In this version, /proc/<pid>/coredump_omit_anonymous_shared file is provided as an interface instead of the previous /proc/<pid>/core_flags. If you have written a non-zero value to the file, anonymous shared memory segments of the process not to be dumped. Another change of this version is removal of kernel.core_flags_enables sysctl which enables/disables core dump flags globally. The purpose of this sysctl is for the system administrator to force all anonymous memory to be dumped. But ulimit -c 0 breaks it. So this sysctl is not helpful for the current linux. This patch series can be applied against 2.6.20-mm1. The supported core file formats are ELF and ELF-FDPIC. ELF has been tested, but ELF-FDPIC has not been build and tested because I don't have the test environment. Background: Some software programs share huge memory among hundreds of processes. If a failure occurs on one of these processes, they can be signaled by a monitoring process to generate core files and restart the service. However, it can develop into a system-wide failure such as system slow down for a long time and disk space shortage because the total size of the core files is very huge! To avoid the above situation we can limit the core file size by setrlimit(2) or ulimit(1). But this method can lose important data such as stack because core dumping is terminated halfway. So I suggest keeping shared memory segments from being dumped for particular processes. Because the shared memory attached to processes is common in them, we don't need to dump the shared memory every time. Usage: Get all shared memory segments of pid 1234 not to dump: $ echo 1 > /proc/1234/coredump_omit_anonymous_shared When a new process is created, the process inherits the flag status from its parent. It is useful to set the core dump flags before the program runs. For example: $ echo 1 > /proc/self/coredump_omit_anonymous_shared $ ./some_program ChangeLog: v3: - remove `/proc/<pid>/core_flags' proc entry - add `/proc/<pid>/coredump_anonymous_shared' as a named flag - remove kernel.core_flags_enable sysctl parameter v2: http://groups.google.com/group/linux.kernel/browse_frm/thread/cb254465971d4a42/ http://groups.google.com/group/linux.kernel/browse_frm/thread/da78f2702e06fa11/ - rename `coremask' to `core_flags' - change `core_flags' member in mm_struct to a bit field next to `dumpable' - introduce a global spin lock to protect adjacent two bit fields (core_flags and dumpable) from race condition - fix a bug that the generated core file can be corrupted when core dumping and updating core_flags occur concurrently - add kernel.core_flags_enable sysctl parameter to enable/disable flags in /proc/<pid>/core_flags - support ELF-FDPIC binary format, but not tested v1: http://groups.google.com/group/linux.kernel/browse_frm/thread/1381fc54d716e3e6/ -- Hidehiro Kawai Hitachi, Ltd., Systems Development Laboratory E-mail: hidehiro.kawai.ez@hitachi.com ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 4/4] coredump: documentation for proc entry 2007-02-16 13:34 [PATCH 0/4] coredump: core dump masking support v3 Kawai, Hidehiro @ 2007-02-16 13:42 ` Kawai, Hidehiro 0 siblings, 0 replies; 20+ messages in thread From: Kawai, Hidehiro @ 2007-02-16 13:42 UTC (permalink / raw) To: Andrew Morton, kernel list Cc: Kawai, Hidehiro, Pavel Machek, Robin Holt, dhowells, Alan Cox, Masami Hiramatsu, sugita, Satoshi OSHIMA, Hideo AOKI@redhat This patch adds the documentation for /proc/<pid>/coredump_omit_anonymous_shared. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> --- Documentation/filesystems/proc.txt | 38 +++++++++++++++++++++++++++ 1 files changed, 38 insertions(+) Index: linux-2.6.20-mm1/Documentation/filesystems/proc.txt =================================================================== --- linux-2.6.20-mm1.orig/Documentation/filesystems/proc.txt +++ linux-2.6.20-mm1/Documentation/filesystems/proc.txt @@ -41,6 +41,7 @@ Table of Contents 2.11 /proc/sys/fs/mqueue - POSIX message queues filesystem 2.12 /proc/<pid>/oom_adj - Adjust the oom-killer score 2.13 /proc/<pid>/oom_score - Display current oom-killer score + 2.14 /proc/<pid>/coredump_omit_anonymous_shared - Core dump coordinator ------------------------------------------------------------------------------ Preface @@ -1982,6 +1983,43 @@ This file can be used to check the curre any given <pid>. Use it together with /proc/<pid>/oom_adj to tune which process should be killed in an out-of-memory situation. +2.14 /proc/<pid>/coredump_omit_anonymous_shared - Core dump coordinator +--------------------------------------------------------------------- +When a process is dumped, all anonymous memory is written to a core file as +long as the size of the core file isn't limited. But sometimes we don't want +to dump some memory segments, for example, huge shared memory. + +The /proc/<pid>/coredump_omit_anonymous_shared is a flag which enables you to +omit anonymous shared memory segments from a core file when it is generated. +When the <pid> process is dumped, the core dump routine decides whether a +given memory segment should be dumped into a core file or not based on the +type of the memory segment and the flag. + +If you have written a non-zero value to this proc file, anonymous shared +memory segments are not dumped. There are three types of anonymous shared +memory: + + - IPC shared memory + - the memory segments created by mmap(2) with MAP_ANONYMOUS and MAP_SHARED + flags + - the memory segments created by mmap(2) with MAP_SHARED flag, and the + mapped file has already been unlinked + +Because current core dump routine doesn't distinguish these segments, you can +only choose either dumping all anonymous shared memory segments or not. + +If you don't want to dump all shared memory segments attached to pid 1234, +write 0 to the process's proc file. + + $ echo 1 > /proc/1234/coredump_omit_anonymous_shared + +When a new process is created, the process inherits the flag status from its +parent. It is useful to set the flag before the program runs. +For example: + + $ echo 1 > /proc/self/coredump_omit_anonymous_shared + $ ./some_program + ------------------------------------------------------------------------------ Summary ------------------------------------------------------------------------------ ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2007-03-31 13:03 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2007-03-02 4:41 [PATCH 0/4] coredump: core dump masking support v4 Kawai, Hidehiro 2007-03-02 4:47 ` [PATCH 1/4] coredump: add an interface to control the core dump routine Kawai, Hidehiro 2007-03-02 9:34 ` Pavel Machek 2007-03-26 13:02 ` Kawai, Hidehiro 2007-03-29 10:49 ` Pavel Machek 2007-03-29 19:16 ` David Howells 2007-03-29 21:17 ` Andrew Morton 2007-03-30 10:29 ` Kawai, Hidehiro 2007-03-30 16:10 ` Andrew Morton 2007-03-31 13:03 ` David Howells 2007-03-02 4:49 ` [PATCH 2/4] coredump: ELF: enable to omit anonymous shared memory Kawai, Hidehiro 2007-03-02 4:50 ` [PATCH 3/4] coredump: ELF-FDPIC: " Kawai, Hidehiro 2007-03-02 4:51 ` [PATCH 4/4] coredump: documentation for proc entry Kawai, Hidehiro 2007-03-02 9:35 ` Pavel Machek 2007-03-20 11:11 ` Kawai, Hidehiro 2007-03-15 20:37 ` [PATCH 0/4] coredump: core dump masking support v4 Andrew Morton 2007-03-23 13:13 ` Kawai, Hidehiro 2007-03-28 12:37 ` Kawai, Hidehiro 2007-03-28 17:32 ` Andrew Morton -- strict thread matches above, loose matches on Subject: below -- 2007-02-16 13:34 [PATCH 0/4] coredump: core dump masking support v3 Kawai, Hidehiro 2007-02-16 13:42 ` [PATCH 4/4] coredump: documentation for proc entry Kawai, Hidehiro
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).