LKML Archive on lore.kernel.org
* [PATCH 4/5] don't panic if /sbin/init exits or killed
@ 2008-03-16 15:54 Oleg Nesterov
  2008-03-16 22:19 ` Roland McGrath
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Oleg Nesterov @ 2008-03-16 15:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Davide Libenzi, Eric W. Biederman, Ingo Molnar, Laurent Riffard,
	Pavel Emelyanov, Roland McGrath, linux-kernel

If the buggy init exits, the kernel panics. I see no point for this. It is very
possible that the system is still usable enough, at least to read the logs and
prepare the bug report.

Change exit_child_reaper() to do BUG() instead of panic().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>

--- 25/kernel/exit.c~4_EXIT_DONT_PANIC	2008-03-16 16:49:30.000000000 +0300
+++ 25/kernel/exit.c	2008-03-16 16:49:35.000000000 +0300
@@ -861,8 +861,10 @@ static inline void exit_child_reaper(str
 	if (likely(tsk->group_leader != task_child_reaper(tsk)))
 		return;
 
-	if (tsk->nsproxy->pid_ns == &init_pid_ns)
-		panic("Attempted to kill init!");
+	if (unlikely(tsk->nsproxy->pid_ns == &init_pid_ns)) {
+		printk(KERN_EMERG "Kernel panic - init is killed!\n");
+		BUG();
+	}
 
 	/*
 	 * @tsk is the last thread in the 'cgroup-init' and is exiting.



* Re: [PATCH 4/5] don't panic if /sbin/init exits or killed
  2008-03-16 15:54 [PATCH 4/5] don't panic if /sbin/init exits or killed Oleg Nesterov
@ 2008-03-16 22:19 ` Roland McGrath
  2008-03-16 22:54   ` Krzysztof Halasa
  2008-03-16 23:03   ` Oleg Nesterov
       [not found] ` <1205850955.6466.61.camel@moss-spartans.epoch.ncsc.mil>
  2008-03-29  5:47 ` H. Peter Anvin
  2 siblings, 2 replies; 13+ messages in thread
From: Roland McGrath @ 2008-03-16 22:19 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Davide Libenzi, Eric W. Biederman, Ingo Molnar,
	Laurent Riffard, Pavel Emelyanov, linux-kernel

BUG() does not seem right to me.  This does not diagnose any kernel bug.
The kernel source location and backtrace are not useful.  In fact, they
are likely to mislead the user into reporting the bug to the wrong place
(because it will look like a kernel bug).

I gather your motivation is to get something "recoverable" rather than
always rebooting.  This might be useful for developers like you and me.
I suspect that conservative administrators of production systems prefer
the current behavior.  If the boot init dies, that is reasonably likely
to be a "catastrophic" failure of the system as a whole as far as the
proprietor of a production system is concerned.  That is, the system may
no longer behave as expected in ways essential for its normal operation.
If it sticks around in that condition, appearing to be available but not
doing everything it should, that is usually worse than a quick and
orderly crash (which the installation's procedures and monitoring
infrastructure are often prepared to handle).

panic is a bit extreme for the situation, where we have no reason yet to
think kernel data structures are inconsistent.  A sync+reboot or sync+crash
without bust_spinlocks et al might be better.

For letting init die and calling it recoverable for hacking purposes, a
sysctl to disable the panic/crash makes sense.  But I don't think we
should change the default setting.

Have you tested how recoverable it really is?  I wonder what happens
with init having exited when things get reparented to it.  Don't the
zombies just pile up?


Thanks,
Roland
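
The zombie question above can be tried out in userspace. What follows
is only a sketch, not kernel code: an ordinary parent stands in for an
init that has stopped reaping, and the helper names
(spawn_unreaped_child, wait_for_exit_without_reaping, is_unreaped) are
made up for the illustration. It relies on the POSIX waitid() call with
WNOWAIT, which reports an exited child without collecting it.

```c
#define _XOPEN_SOURCE 700 /* for waitid() and WNOWAIT */
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits at once; the caller deliberately does not reap it. */
static pid_t spawn_unreaped_child(void)
{
	pid_t pid = fork();

	if (pid == 0)
		_exit(0);
	return pid; /* -1 if fork() failed */
}

/* Block until the child has exited, but leave it as a zombie (WNOWAIT). */
static int wait_for_exit_without_reaping(pid_t pid)
{
	siginfo_t info;

	assert(pid > 0);
	return waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT);
}

/* Return 1 if the exited child is still waiting to be reaped, else 0. */
static int is_unreaped(pid_t pid)
{
	siginfo_t info;

	info.si_pid = 0; /* lets us tell "nothing waitable" from a real hit */
	if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOHANG | WNOWAIT) != 0)
		return 0; /* typically ECHILD: no such child any more */
	return info.si_pid == pid;
}
```

As long as nobody calls a reaping wait on the child, is_unreaped() keeps
returning 1, which is the userspace view of a zombie that stays around.
A plain waitpid(pid, NULL, 0) collects it, after which is_unreaped()
returns 0.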


* Re: [PATCH 4/5] don't panic if /sbin/init exits or killed
  2008-03-16 22:19 ` Roland McGrath
@ 2008-03-16 22:54   ` Krzysztof Halasa
  2008-03-16 23:03   ` Oleg Nesterov
  1 sibling, 0 replies; 13+ messages in thread
From: Krzysztof Halasa @ 2008-03-16 22:54 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Oleg Nesterov, Andrew Morton, Davide Libenzi, Eric W. Biederman,
	Ingo Molnar, Laurent Riffard, Pavel Emelyanov, linux-kernel

Roland McGrath <roland@redhat.com> writes:

> Have you tested how recoverable it really is?  I wonder what happens
> with init having exited when things get reparented to it.  Don't the
> zombies just pile up?

That's not very different from any other "important" program crash,
we don't sync+reboot when httpd or sshd is killed.
-- 
Krzysztof Halasa


* Re: [PATCH 4/5] don't panic if /sbin/init exits or killed
  2008-03-16 23:03   ` Oleg Nesterov
@ 2008-03-16 22:55     ` Alan Cox
  2008-03-16 23:32     ` Roland McGrath
  1 sibling, 0 replies; 13+ messages in thread
From: Alan Cox @ 2008-03-16 22:55 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Roland McGrath, Andrew Morton, Davide Libenzi, Eric W. Biederman,
	Ingo Molnar, Laurent Riffard, Pavel Emelyanov, linux-kernel

> Well, I think the generic "if we have a chance to survive, we should try
> to survive" rule is good.
> 
> If the boot init dies, at least the admin has a chance to figure out what
> has happened, and -o remount,ro /.

A long long time ago someone posted a patch which arranged that if the
init process died it got respawned via the same list of process names as
happens on boot.

Or for that matter we could catch the dying init and replace it with a
kernel side while(1) wait(NULL); so that we at least continue the cleanup.

Alan
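
The "while(1) wait(NULL);" idea can be illustrated with a small
userspace sketch. This is not the kernel-side replacement suggested
above, only an approximation of what such a stand-in would do: keep
collecting exited children so they do not linger. The names
spawn_children and reap_all_children are hypothetical, and unlike the
endless loop in the message this version returns once wait() reports
ECHILD, so that an ordinary program can call it.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start n children that exit immediately; returns how many were started. */
static int spawn_children(int n)
{
	int started = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		pid_t pid = fork();

		if (pid == 0)
			_exit(0);
		if (pid > 0)
			started++;
	}
	return started;
}

/*
 * Reap children until none are left and return how many were collected.
 * A real reaper would loop forever; this variant stops on ECHILD.
 */
static int reap_all_children(void)
{
	int reaped = 0;

	for (;;) {
		if (wait(NULL) > 0) {
			reaped++;
			continue;
		}
		if (errno == EINTR)
			continue; /* interrupted by a signal: retry */
		break; /* ECHILD: no children remain */
	}
	return reaped;
}
```

Calling spawn_children(3) followed by reap_all_children() should collect
all three children; a second call finds nothing left to reap and
returns 0.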


* Re: [PATCH 4/5] don't panic if /sbin/init exits or killed
  2008-03-16 22:19 ` Roland McGrath
  2008-03-16 22:54   ` Krzysztof Halasa
@ 2008-03-16 23:03   ` Oleg Nesterov
  2008-03-16 22:55     ` Alan Cox
  2008-03-16 23:32     ` Roland McGrath
  1 sibling, 2 replies; 13+ messages in thread
From: Oleg Nesterov @ 2008-03-16 23:03 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Andrew Morton, Davide Libenzi, Eric W. Biederman, Ingo Molnar,
	Laurent Riffard, Pavel Emelyanov, linux-kernel

On 03/16, Roland McGrath wrote:
>

(re-ordered)

> Have you tested how recoverable it really is?  I wonder what happens
> with init having exited when things get reparented to it.  Don't the
> zombies just pile up?

Yes sure, we leak the re-parented zombies, and nobody can take care of
/etc/inittab. As expected.

But otherwise the system runs fine.

> BUG() does not seem right to me.  This does not diagnose any kernel bug.
> The kernel source location and backtrace are not useful.  In fact, they
> are likely to mislead the user into reporting the bug to the wrong place
> (because it will look like a kernel bug).

But panic() isn't better? It doesn't provide any useful info.

> I gather your motivation is to get something "recoverable" rather than
> always rebooting.  This might be useful for developers like you and me.
> I suspect that conservative administrators of production systems prefer
> the current behavior.  If the boot init dies, that is reasonably likely
> to be a "catastrophic" failure of the system as a whole as far as the
> proprietor of a production system is concerned.  That is, the system may
> no longer behave as expected in ways essential for its normal operation.
> If it sticks around in that condition, appearing to be available but not
> doing everything it should, that is usually worse than a quick and
> orderly crash (which the installation's procedures and monitoring
> infrastructure are often prepared to handle).

Well, I think the generic "if we have a chance to survive, we should try
to survive" rule is good.

If the boot init dies, at least the admin has a chance to figure out what
has happened, and -o remount,ro /.

Every BUG/BUG_ON in fact means the system is not useable, but still it does
not panic(), but tries to proceed.

In short, I can't see why panic() is better. Except we have panic_timeout,
but we can take it into account if init exits.

> panic is a bit extreme for the situation, where we have no reason yet to
> think kernel data structures are inconsistent.  A sync+reboot or sync+crash
> without bust_spinlocks et al might be better.
> 
> For letting init die and calling it recoverable for hacking purposes, a
> sysctl to disable the panic/crash makes sense.  But I don't think we
> should change the default setting.

OK, I won't argue (not that I agree ;).

Oleg.



* Re: [PATCH 4/5] don't panic if /sbin/init exits or killed
  2008-03-16 23:03   ` Oleg Nesterov
  2008-03-16 22:55     ` Alan Cox
@ 2008-03-16 23:32     ` Roland McGrath
  2008-03-16 23:49       ` Oleg Nesterov
  1 sibling, 1 reply; 13+ messages in thread
From: Roland McGrath @ 2008-03-16 23:32 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Davide Libenzi, Eric W. Biederman, Ingo Molnar,
	Laurent Riffard, Pavel Emelyanov, linux-kernel

> But panic() isn't better? It doesn't provide any useful info.

It is not misleading in the same way.  It's clear that going to look at the
kernel source is not the place to find the root of the problem.

> Well, I think the generic "if we have a chance to survive, we should try
> to survive" rule is good.
> 
> If the boot init dies, at least the admin has a chance to figure out what
> has happened, and -o remount,ro /.

For me and you, I agree.  I think the common case is that there is no admin
prepared to do any such thing, but just someone expecting a reboot to fix
things and preferring that a failing system reboot itself in the middle of
the night rather than wedge.

> Every BUG/BUG_ON in fact means the system is not useable, but still it does
> not panic(), but tries to proceed.

Many production systems probably set panic_on_oops.  Having the init panic
behavior keyed on that seems fine to me.  I just don't like the "kernel bug
at this source line" output when it's not true.


Thanks,
Roland
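
To make the "keyed on panic_on_oops" idea concrete, here is a userspace
sketch that reads the two existing sysctls (kernel.panic_on_oops and
kernel.panic, exposed under /proc/sys/kernel/) and maps them to an
action. The policy itself is hypothetical: it only models the suggestion
above and is not something the kernel implements. The enum and the
function names are invented for the illustration.

```c
#include <assert.h>
#include <stdio.h>

/* What a hypothetical policy would do when the boot init exits. */
enum init_death_action {
	INIT_DEATH_CONTINUE,     /* log it and keep running */
	INIT_DEATH_PANIC_HANG,   /* panic and stay down */
	INIT_DEATH_PANIC_REBOOT  /* panic, then reboot */
};

/* Parse one integer from an open stream; 0 on success, -1 on failure. */
static int read_int_stream(FILE *f, long *out)
{
	long v;

	assert(f != NULL && out != NULL);
	if (fscanf(f, "%ld", &v) != 1)
		return -1;
	*out = v;
	return 0;
}

/* Read one integer from a file such as /proc/sys/kernel/panic_on_oops. */
static int read_int_file(const char *path, long *out)
{
	FILE *f = fopen(path, "r");
	int ret;

	if (!f)
		return -1;
	ret = read_int_stream(f, out);
	fclose(f);
	return ret;
}

/*
 * Hypothetical policy: panic only if panic_on_oops is set. A non-zero
 * kernel.panic value means the kernel reboots after a panic (after that
 * many seconds if positive, at once if negative); zero means it stays down.
 */
static enum init_death_action init_death_policy(long panic_on_oops,
						long panic_timeout)
{
	if (!panic_on_oops)
		return INIT_DEATH_CONTINUE;
	return panic_timeout ? INIT_DEATH_PANIC_REBOOT : INIT_DEATH_PANIC_HANG;
}
```

On a Linux box, read_int_file("/proc/sys/kernel/panic_on_oops", &v) and
read_int_file("/proc/sys/kernel/panic", &t) supply the two inputs; what
init_death_policy() returns depends on how that machine is configured.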


* Re: [PATCH 4/5] don't panic if /sbin/init exits or killed
  2008-03-16 23:32     ` Roland McGrath
@ 2008-03-16 23:49       ` Oleg Nesterov
  2008-03-16 23:59         ` Roland McGrath
  0 siblings, 1 reply; 13+ messages in thread
From: Oleg Nesterov @ 2008-03-16 23:49 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Andrew Morton, Davide Libenzi, Eric W. Biederman, Ingo Molnar,
	Laurent Riffard, Pavel Emelyanov, linux-kernel

On 03/16, Roland McGrath wrote:
>
> > But panic() isn't better? It doesn't provide any useful info.
> 
> It is not misleading in the same way.  It's clear that going to look at the
> kernel source is not the place to find the root of the problem.
> 
> > Well, I think the generic "if we have a chance to survive, we should try
> > to survive" rule is good.
> > 
> > If the boot init dies, at least the admin has a chance to figure out what
> > has happened, and -o remount,ro /.
> 
> For me and you, I agree.  I think the common case is that there is no admin
> prepared to do any such thing, but just someone expecting a reboot to fix
> things and preferring that a failing system reboot itself in the middle of
> the night rather than wedge.

Agreed,

> > Every BUG/BUG_ON in fact means the system is not useable, but still it does
> > not panic(), but tries to proceed.
> 
> Many production systems probably set panic_on_oops.  Having the init panic
> behavior keyed on that seems fine to me.  I just don't like the "kernel bug
> at this source line" output when it's not true.

Ah, OK. We can change this to dump_stack() without BUG().

(but again! panic() isn't better, it also looks like a kernel bug).

Oleg.



* Re: [PATCH 4/5] don't panic if /sbin/init exits or killed
  2008-03-16 23:49       ` Oleg Nesterov
@ 2008-03-16 23:59         ` Roland McGrath
  2008-03-17  0:05           ` Oleg Nesterov
  0 siblings, 1 reply; 13+ messages in thread
From: Roland McGrath @ 2008-03-16 23:59 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Davide Libenzi, Eric W. Biederman, Ingo Molnar,
	Laurent Riffard, Pavel Emelyanov, linux-kernel

> Ah, OK. We can change this to dump_stack() without BUG().

I'm not sure that's a whole lot better as far as confusion.  
And what use is it?  You know init exited.  Just print its exit_code 
and/or group_exit_code and you know what kind of kernel entry the exit was.


Thanks,
Roland
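
The point about the exit code can be shown from userspace, where the
same information reaches a parent as the wait(2) status. The sketch
below is an illustration under that assumption, not the proposed kernel
message: describe_wait_status and run_child are hypothetical helpers,
and the wording of the output strings is invented.

```c
#define _XOPEN_SOURCE 700 /* for strsignal() */
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Render a wait(2) status as text: normal exit versus death by signal. */
static void describe_wait_status(int status, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	if (WIFEXITED(status))
		snprintf(buf, len, "exited with status %d",
			 WEXITSTATUS(status));
	else if (WIFSIGNALED(status))
		snprintf(buf, len, "killed by signal %d (%s)",
			 WTERMSIG(status), strsignal(WTERMSIG(status)));
	else
		snprintf(buf, len, "unexpected wait status 0x%x",
			 (unsigned int)status);
}

/*
 * Run a child that kills itself with sig (if non-zero) or else calls
 * _exit(exit_code), and store its wait status. Returns 0 on success.
 */
static int run_child(int exit_code, int sig, int *status)
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		if (sig)
			kill(getpid(), sig);
		_exit(exit_code);
	}
	return waitpid(pid, status, 0) == pid ? 0 : -1;
}
```

A child started with run_child(3, 0, &status) is described as having
exited with status 3, while run_child(0, SIGKILL, &status) yields a
"killed by signal" description. That distinction, exited versus killed,
is the one the message above says the exit code already carries.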


* Re: [PATCH 4/5] don't panic if /sbin/init exits or killed
  2008-03-16 23:59         ` Roland McGrath
@ 2008-03-17  0:05           ` Oleg Nesterov
  0 siblings, 0 replies; 13+ messages in thread
From: Oleg Nesterov @ 2008-03-17  0:05 UTC (permalink / raw)
  To: Roland McGrath
  Cc: Andrew Morton, Davide Libenzi, Eric W. Biederman, Ingo Molnar,
	Laurent Riffard, Pavel Emelyanov, linux-kernel

On 03/16, Roland McGrath wrote:
>
> > Ah, OK. We can change this to dump_stack() without BUG().
> 
> I'm not sure that's a whole lot better as far as confusion.  
> And what use is it?  You know init exited.  Just print its exit_code 
> and/or group_exit_code and you know what kind of kernel entry the exit was.

Agreed. It was either killed, or exited. The exit code provides enough info.

Thanks, I'll resend this patch.

Oleg.



* Re: [PATCH 4/5] don't panic if /sbin/init exits or killed
       [not found] ` <1205850955.6466.61.camel@moss-spartans.epoch.ncsc.mil>
@ 2008-03-18 15:41   ` Oleg Nesterov
  0 siblings, 0 replies; 13+ messages in thread
From: Oleg Nesterov @ 2008-03-18 15:41 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Andrew Morton, Davide Libenzi, Eric W. Biederman, Ingo Molnar,
	Laurent Riffard, Pavel Emelyanov, Roland McGrath, linux-kernel,
	James Morris

On 03/18, Stephen Smalley wrote:
> 
> On Sun, 2008-03-16 at 18:54 +0300, Oleg Nesterov wrote:
> > If the buggy init exits, the kernel panics. I see no point for this. It is very
> > possible that the system is still usable enough, at least to read the logs and
> > prepare the bug report.
> 
> At present, init with SELinux support will exit with an error status if
> it cannot successfully load the initial policy if the system is in
> enforcing mode, with the expectation that the kernel will then panic.
> 
> We might argue about whether that is correct (having it explicitly halt
> makes more sense to me, but I didn't write the init patch), but it has
> been the behavior in Fedora and Debian for quite some time.

OK, thanks. It's a pity ;)

Please ignore this patch then.

Oleg.



* Re: [PATCH 4/5] don't panic if /sbin/init exits or killed
  2008-03-16 15:54 [PATCH 4/5] don't panic if /sbin/init exits or killed Oleg Nesterov
  2008-03-16 22:19 ` Roland McGrath
       [not found] ` <1205850955.6466.61.camel@moss-spartans.epoch.ncsc.mil>
@ 2008-03-29  5:47 ` H. Peter Anvin
  2008-03-29 10:51   ` Oleg Nesterov
  2 siblings, 1 reply; 13+ messages in thread
From: H. Peter Anvin @ 2008-03-29  5:47 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Davide Libenzi, Eric W. Biederman, Ingo Molnar,
	Laurent Riffard, Pavel Emelyanov, Roland McGrath, linux-kernel

Oleg Nesterov wrote:
> If the buggy init exits, the kernel panics. I see no point for this. It is very
> possible that the system is still usable enough, at least to read the logs and
> prepare the bug report.
> 
> Change exit_child_reaper() to do BUG() instead of panic().
> 
> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>

This would be highly undesirable in a production system, since it would 
leave the machine an unusable zombie.  In a production system, the panic 
can be made to reboot the system, bringing it back online.

	-hpa



* Re: [PATCH 4/5] don't panic if /sbin/init exits or killed
  2008-03-29  5:47 ` H. Peter Anvin
@ 2008-03-29 10:51   ` Oleg Nesterov
  0 siblings, 0 replies; 13+ messages in thread
From: Oleg Nesterov @ 2008-03-29 10:51 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andrew Morton, Davide Libenzi, Eric W. Biederman, Ingo Molnar,
	Laurent Riffard, Pavel Emelyanov, Roland McGrath, linux-kernel,
	Stephen Smalley

On 03/28, H. Peter Anvin wrote:
>
> Oleg Nesterov wrote:
> >If the buggy init exits, the kernel panics. I see no point for this. It is very
> >possible that the system is still usable enough, at least to read the logs and
> >prepare the bug report.
> >
> >Change exit_child_reaper() to do BUG() instead of panic().
> >
> >Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
> 
> This would be highly undesirable in a production system, since it would 
> leave the machine an unusable zombie.  In a production system, the panic 
> can be made to reboot the system, bringing it back online.

I can't agree. Following this logic, we should always use panic() instead
of BUG().

I think the system should try to use any chance to survive, and we have
panic_on_oops.


That said, Stephen has a good reason to nack this patch. But still I hope
it is possible to find a simple solution. Not that I think this is really
important, but this panic() is silly, imho.

Oleg.



* Re: [PATCH 4/5] don't panic if /sbin/init exits or killed
       [not found]     ` <a8aCw-2hv-31@gated-at.bofh.it>
@ 2008-03-18 17:41       ` Bodo Eggert
  0 siblings, 0 replies; 13+ messages in thread
From: Bodo Eggert @ 2008-03-18 17:41 UTC (permalink / raw)
  To: Roland McGrath, Oleg Nesterov, Andrew Morton, Davide Libenzi,
	Eric W. Biederman, Ingo Molnar, Laurent Riffard, Pavel Emelyanov,
	linux-kernel

Roland McGrath <roland@redhat.com> wrote:

>> But panic() isn't better? It doesn't provide any useful info.
> 
> It is not misleading in the same way.  It's clear that going to look at the
> kernel source is not the place to find the root of the problem.
> 
>> Well, I think the generic "if we have a chance to survive, we should try
>> to survive" rule is good.
>> 
>> If the boot init dies, at least the admin has a chance to figure out what
>> has happened, and -o remount,ro /.
> 
> For me and you, I agree.  I think the common case is that there is no admin
> prepared to do any such thing, but just someone expecting a reboot to fix
> things and preferring that a failing system reboot itself in the middle of
> the night rather than wedge.

I think the common case is repairing a system using init=/bin/sh, forgetting
not to log out, and then having to fsck the filesystem after the panic.
Rebooting on panic= is fine, but skipping the umount/sync stages is not.


