* Boot failures with net-next after rebase to v4.17.0-rc1
@ 2018-04-24 19:54 Jesper Dangaard Brouer
  2018-04-24 20:04 ` Linus Torvalds
  0 siblings, 1 reply; 3+ messages in thread
From: Jesper Dangaard Brouer @ 2018-04-24 19:54 UTC
  To: netdev
  Cc: brouer, LKML, David Miller, Toke Høiland-Jørgensen,
	Paul E. McKenney, Linus Torvalds, David Ahern

Hi all,

I'm experiencing boot failures with the net-next git tree after it got
rebased/merged with Linus's tree at v4.17.0-rc1.

The boot problem only occurs for certain kernel configs. I've bisected
the config problem down to enabling CONFIG_PREEMPT=y and its resulting
dependencies, shown in the diff below.

Is this a known problem?
Have others experienced this too?

This happens for me on two different (x86_64) testlab machines...
I also tested Linus's tree at v4.17-rc2, and the problem exists
for me there too.
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


--- config21-steps-works        2018-04-24 21:33:42.353751894 +0200
+++ config20-steps-bad  2018-04-24 21:27:19.852654328 +0200
@@ -131,7 +131,7 @@
 #
 # RCU Subsystem
 #
-CONFIG_TREE_RCU=y
+CONFIG_PREEMPT_RCU=y
 # CONFIG_RCU_EXPERT is not set
 CONFIG_SRCU=y
 CONFIG_TREE_SRCU=y
@@ -421,11 +421,7 @@
 CONFIG_BFQ_GROUP_IOSCHED=y
 CONFIG_PREEMPT_NOTIFIERS=y
 CONFIG_ASN1=y
-CONFIG_INLINE_SPIN_UNLOCK_IRQ=y
-CONFIG_INLINE_READ_UNLOCK=y
-CONFIG_INLINE_READ_UNLOCK_IRQ=y
-CONFIG_INLINE_WRITE_UNLOCK=y
-CONFIG_INLINE_WRITE_UNLOCK_IRQ=y
+CONFIG_UNINLINE_SPIN_UNLOCK=y
 CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y
 CONFIG_MUTEX_SPIN_ON_OWNER=y
 CONFIG_RWSEM_SPIN_ON_OWNER=y
@@ -497,9 +493,10 @@
 CONFIG_SCHED_SMT=y
 CONFIG_SCHED_MC=y
 CONFIG_SCHED_MC_PRIO=y
-CONFIG_PREEMPT_NONE=y
+# CONFIG_PREEMPT_NONE is not set
 # CONFIG_PREEMPT_VOLUNTARY is not set
-# CONFIG_PREEMPT is not set
+CONFIG_PREEMPT=y
+CONFIG_PREEMPT_COUNT=y
 CONFIG_X86_LOCAL_APIC=y
 CONFIG_X86_IO_APIC=y
 CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS=y
@@ -3931,6 +3928,7 @@
 # CONFIG_SCHEDSTATS is not set
 # CONFIG_SCHED_STACK_END_CHECK is not set
 # CONFIG_DEBUG_TIMEKEEPING is not set
+# CONFIG_DEBUG_PREEMPT is not set
 
 #
 # Lock Debugging (spinlocks, mutexes, etc...)
@@ -3996,6 +3994,7 @@
 CONFIG_FUNCTION_GRAPH_TRACER=y
 # CONFIG_PREEMPTIRQ_EVENTS is not set
 # CONFIG_IRQSOFF_TRACER is not set
+# CONFIG_PREEMPT_TRACER is not set
 # CONFIG_SCHED_TRACER is not set
 CONFIG_HWLAT_TRACER=y
 # CONFIG_FTRACE_SYSCALLS is not set
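
A comparison like the one above can be reproduced for any pair of config
files. The following is a minimal sketch in Python, not the tool used to
produce the diff in this message. The names parse_config and diff_configs
are hypothetical, and the sketch assumes the usual .config syntax
("CONFIG_FOO=value" and "# CONFIG_FOO is not set"). The kernel tree also
ships scripts/diffconfig for the same purpose.

```python
import re

# Assumed .config line shapes: "CONFIG_FOO=value" and "# CONFIG_FOO is not set".
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map each CONFIG_ symbol to its value; explicit "not set" becomes "n"."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_configs(good, bad):
    """Return {symbol: (good_value, bad_value)} for every symbol that differs.

    A symbol that is absent from one side is reported as None on that side.
    """
    changed = {}
    for name in sorted(set(good) | set(bad)):
        before, after = good.get(name), bad.get(name)
        if before != after:
            changed[name] = (before, after)
    return changed


if __name__ == "__main__":
    # Tiny excerpt modelled on the diff above, not the full config files.
    works = parse_config(
        "CONFIG_TREE_RCU=y\nCONFIG_PREEMPT_NONE=y\n# CONFIG_PREEMPT is not set\n"
    )
    fails = parse_config(
        "CONFIG_PREEMPT_RCU=y\n# CONFIG_PREEMPT_NONE is not set\nCONFIG_PREEMPT=y\n"
    )
    for name, (before, after) in diff_configs(works, fails).items():
        print(f"{name}: {before} -> {after}")
```

Reporting an absent symbol as None rather than "n" keeps the distinction
between an option that is switched off and one that is no longer visible
because of a dependency change, which is the case for CONFIG_TREE_RCU here.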

* Re: Boot failures with net-next after rebase to v4.17.0-rc1
  2018-04-24 19:54 Boot failures with net-next after rebase to v4.17.0-rc1 Jesper Dangaard Brouer
@ 2018-04-24 20:04 ` Linus Torvalds
  2018-04-25  7:16   ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 3+ messages in thread
From: Linus Torvalds @ 2018-04-24 20:04 UTC
  To: Jesper Dangaard Brouer
  Cc: netdev, LKML, David Miller, Toke Høiland-Jørgensen,
	Paul E. McKenney, David Ahern

On Tue, Apr 24, 2018 at 12:54 PM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
> Hi all,
>
> I'm experiencing boot failures with the net-next git tree after it got
> rebased/merged with Linus's tree at v4.17.0-rc1.

I suspect it's the global bit stuff that came in very late in the
merge window, and had been developed and tested for a while before,
but showed some problems under some configs.

The fix is currently in the x86/pti tree in -tip, see:

   x86/pti: Fix boot problems from Global-bit setting

and I expect it will percolate upstream soon.

In the meantime, it would be good to verify that merging that x86/pti
branch fixes it for you?

There is another candidate for boot problems - do you happen to have
CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled? That can under certain
circumstances get a percpu setup page fault because memory hadn't been
initialized sufficiently.

The fix there is to move the mm_init() call one step earlier in
init/main.c:start_kernel() (to before trap_init()).

And if it's neither of the above, I think you'll need to help bisect it.

               Linus

* Re: Boot failures with net-next after rebase to v4.17.0-rc1
  2018-04-24 20:04 ` Linus Torvalds
@ 2018-04-25  7:16   ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 3+ messages in thread
From: Jesper Dangaard Brouer @ 2018-04-25  7:16 UTC
  To: Linus Torvalds
  Cc: netdev, LKML, David Miller, Toke Høiland-Jørgensen,
	Paul E. McKenney, David Ahern, brouer, Ingo Molnar

On Tue, 24 Apr 2018 13:04:23 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Apr 24, 2018 at 12:54 PM, Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> > Hi all,
> >
> > I'm experiencing boot failures with the net-next git tree after it got
> > rebased/merged with Linus's tree at v4.17.0-rc1.
> 
> I suspect it's the global bit stuff that came in very late in the
> merge window, and had been developed and tested for a while before,
> but showed some problems under some configs.
> 
> The fix is currently in the x86/pti tree in -tip, see:
> 
>    x86/pti: Fix boot problems from Global-bit setting
> 
> and I expect it will percolate upstream soon.
> 
> In the meantime, it would be good to verify that merging that x86/pti
> branch fixes it for you?

Thanks for spotting this so quickly!
I have verified that this DOES solve the issue for me :-)))

If others are hit by this, and cannot wait for Linus to pull the tip
tree, this is the pull command:

 git pull git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/pti


> There is another candidate for boot problems - do you happen to have
> CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled? That can under certain
> circumstances get a percpu setup page fault because memory hadn't been
> initialized sufficiently.

CONFIG_DEFERRED_STRUCT_PAGE_INIT is not set
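
For anyone who wants to check the same option on their own machine, here
is a minimal sketch in Python. The helper names option_state and
read_running_config are hypothetical. The two file locations are
assumptions: /boot/config-<release> is a common distribution convention,
and /proc/config.gz exists only when the kernel was built with
CONFIG_IKCONFIG_PROC.

```python
import gzip
import os
import platform


def option_state(config_text, symbol):
    """Return the value of a config symbol.

    Gives the string after "=" when the symbol is set, "not set" when the
    file contains "# CONFIG_<symbol> is not set", and None when the symbol
    does not appear at all.
    """
    name = symbol if symbol.startswith("CONFIG_") else "CONFIG_" + symbol
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {name} is not set":
            return "not set"
        if line.startswith(name + "="):
            return line.split("=", 1)[1]
    return None


def read_running_config():
    """Try two common locations for the running kernel's config (assumed)."""
    try:
        boot_path = f"/boot/config-{platform.release()}"
        if os.path.exists(boot_path):
            with open(boot_path) as handle:
                return handle.read()
        if os.path.exists("/proc/config.gz"):
            with gzip.open("/proc/config.gz", "rt") as handle:
                return handle.read()
    except OSError:
        pass
    return None


if __name__ == "__main__":
    text = read_running_config()
    if text is None:
        print("no kernel config found in the assumed locations")
    else:
        print(option_state(text, "DEFERRED_STRUCT_PAGE_INIT"))
```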

> The fix there is to move the mm_init() call one step earlier in
> init/main.c:start_kernel() (to before trap_init()).
> 
> And if it's neither of the above, I think you'll need to help bisect it.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

end of thread, other threads:[~2018-04-25  7:16 UTC | newest]
