LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Maxim <maximlevitsky@gmail.com>
To: Len Brown <lenb@kernel.org>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [BUG] 2.6.21-rc1,2,3 regressions on my system that I found so far
Date: Sun, 18 Mar 2007 01:00:01 +0200 [thread overview]
Message-ID: <200703180100.01889.maximlevitsky@gmail.com> (raw)
In-Reply-To: <200703161919.44458.lenb@kernel.org>
On Saturday 17 March 2007 01:19:44 Len Brown wrote:
> On Friday 16 March 2007 06:30, Maxim Levitsky wrote:
> >
> > Good day,
> >
> > I want to report regressions I have with 2.6.21-rc3 kernel.
> > I use CONFIG_NO_HZ.
>
> Do any of these issues go away with CONFIG_NO_HZ=n (or boot with nohz=n)
> or are they all independent of it?
>
> thanks,
> -Len
>
> > 1) Both suspend to disk and suspend to RAM are completely broken:
> > On vanilla 2.6.20 suspend to disk works perfectly and suspend to ram works _almost_ perfectly (I will tell about that later).
> > On 2.6.21-rc1 and later system hangs even before suspend begins (suspend to disk hangs before image write , and after suspend to ram ,
> > some devices are powered down (disk,power leds) , and some and not(fans, power) , and system hangs).
> >
> > I did a git-bisect and I found which commit caused that:
> > e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c (breaks S3)
> > ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c (breaks swsusp, I don't use it, but I tested it)
> > 259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c (breaks uswsusp, that I use)
> >
> > I reverted those commits and now system suspends correctly to disk, but suspend to ram showed some more regressions.
> >
> >
> > 2) ) After suspend to ram I get this
> >
> > Mar 14 00:22:23 MAIN kernel: [ 2.072875] caller is check_tsc_sync_source+0x1d/0x100
> > Mar 14 00:22:23 MAIN kernel: [ 2.072878] [show_trace_log_lvl+26/48] show_trace_log_lvl+0x1a/0x30
> > Mar 14 00:22:23 MAIN kernel: [ 2.072881] [show_trace+18/32] show_trace+0x12/0x20
> > Mar 14 00:22:23 MAIN kernel: [ 2.072884] [dump_stack+22/32] dump_stack+0x16/0x20
> > Mar 14 00:22:23 MAIN kernel: [ 2.072887] [debug_smp_processor_id+173/176] debug_smp_processor_id+0xad/0xb0
> > Mar 14 00:22:23 MAIN kernel: [ 2.072891] [check_tsc_sync_source+29/256] check_tsc_sync_source+0x1d/0x100
> > Mar 14 00:22:23 MAIN kernel: [ 2.072894] [__cpu_up+80/384] __cpu_up+0x50/0x180
> > Mar 14 00:22:23 MAIN kernel: [ 2.072897] [_cpu_up+98/208] _cpu_up+0x62/0xd0
> > Mar 14 00:22:23 MAIN kernel: [ 2.072901] [cpu_up+46/80] cpu_up+0x2e/0x50
> > Mar 14 00:22:23 MAIN kernel: [ 2.072903] [enable_nonboot_cpus+110/160] enable_nonboot_cpus+0x6e/0xa0
> > Mar 14 00:22:23 MAIN kernel: [ 2.072906] [enter_state+326/496] enter_state+0x146/0x1f0
> > Mar 14 00:22:23 MAIN kernel: [ 2.072909] [state_store+174/192] state_store+0xae/0xc0
> > Mar 14 00:22:23 MAIN kernel: [ 2.072912] [subsys_attr_store+43/64] subsys_attr_store+0x2b/0x40
> > Mar 14 00:22:23 MAIN kernel: [ 2.072917] [sysfs_write_file+186/272] sysfs_write_file+0xba/0x110
> > Mar 14 00:22:23 MAIN kernel: [ 2.072920] [vfs_write+150/352] vfs_write+0x96/0x160
> > Mar 14 00:22:23 MAIN kernel: [ 2.072923] [sys_write+61/112] sys_write+0x3d/0x70
> > Mar 14 00:22:23 MAIN kernel: [ 2.072926] [sysenter_past_esp+93/153] sysenter_past_esp+0x5d/0x99
> > Mar 14 00:22:23 MAIN kernel: [ 2.072929] =======================
> > Mar 14 00:22:23 MAIN kernel: [ 2.072931] checking TSC synchronization [CPU#0 -> CPU#1]:
> > Mar 14 00:22:23 MAIN kernel: [ 2.092922] Measured 72051818872 cycles TSC warp between CPUs, turning off
> >
> > It looks clear that preempt is enabled all the way in second cpu initialization, ( I think that at least in check_tsc_sync_source, it should be disabled,
> > shouldn't it ? )
> >
> > Then I did add preempt_disable() / preempt_enable() to this function , and I still got this:
> >
> > Mar 14 00:22:23 MAIN kernel: [ 2.072931] checking TSC synchronization [CPU#0 -> CPU#1]:
> > Mar 14 00:22:23 MAIN kernel: [ 2.092922] Measured 72051818872 cycles TSC warp between CPUs, turning off
> >
> > It happens after second CPU is brought back on-line.
> >
> > Now I understand that this is TSC sync problem and I tried to do some tests:
> >
> > I tried to disable/enable second CPU by hand, eg I did number of times,
> >
> > echo "0" > /sys/devices/system/cpu/cpu1/online
> > echo "1" > /sys/devices/system/cpu/cpu1/online
> >
> > and TSC sync was ok.
> >
> > Then I disabled 2nd CPU, have suspended system to RAM , resumed it , and then enabled 2nd CPU and got same error message.
> > Then I disabled cpufreq , and did above tests, and got same results.
> > I think that maybe this error is false, that there is some difference in TSC clock, but this difference is constant, and can be fixed
> >
> > 3) Sometimes I get this (once in three boots or so)
> >
> > [ 36.217405] ENABLING IO-APIC IRQs
> > [ 36.217587] ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
> > [ 36.433917] APIC timer disabled due to verification failure.
> >
> > And NO_HZ is disabled due to that (I get 1000/s timer's interrupts)
> > I haven't investigated that yet.
> > It looks like another new test that my hardware fails to perform...
> >
> >
> > And now I want to tell you about that _almost_ working suspend to ram I got in 2.6.20:
> > To put it simply sometimes system wakes from resume, and sometimes not (about 1 in 5 times)
> > When it does it works perfectly.
> >
> > This is quite common problem but ironically my case is very different and harder to solve.
> >
> > The fact is that I found thanks to RTC tricks , similar to PM_TRACE that system hangs in exactly three places (in one of them of course)
> >
> > I put between instructions, code like that to save a position in RTC alarm which is not cleared on reboot
> > Note that this code uses ax, but I checked every time I put it that ax can be used (eg, it is loaded in next instruction)
> >
> > #define TRACE(val) \
> > movb $0x01, %al ; \
> > outb %al, $0x70 ; \
> > movb $ ## val, %al ; \
> > outb %al, $0x71
> >
> > It hangs very early in asm code, and those are places:
> >
> > 1) /home/maxim/software/kernel/linux-2.6.20-mod/arch/i386/kernel/acpi/wakeup.S:wakeup_start:
> > ljmpl $__KERNEL_CS,$wakeup_pmode_return
> > As I see that is first time wakeup low page is addressing kernel memory by jumping to it.
> >
> > 2) /home/maxim/software/kernel/linux-2.6.20-mod/arch/i386/kernel/acpi/wakeup.S:do_suspend_lowlevel
> > call restore_registers
> > It hangs exacly on that instruction, I can only see that this is first time protected stack is accessed
> >
> > 3) /home/maxim/software/kernel/linux-2.6.20-mod/arch/i386/power/cpu.c :__restore_processor_state(struct saved_context *ctxt)
> > mtrr_ap_init();
> >
> > It hangs somewhere inside this function, and because is is long I haven't found where exactly, it is easier to disable MTRR all together.
> >
> > Note that all three places have different external behavier:
> >
> > in first case, system powers on -> off -> on -> hangs
> > I did put test in RTC to see whenever BIOS call kernel twice but I know from this test for sure that it calls it only once,
> > It actualy makes sense, because exception occures before IDT is loaded, so system has no choice but to power down
> >
> > in second case I see blinking leds -> almost sure a oops
> >
> > in third case system just hangs
> >
> > That's all, I will continue to dig those problems out
> >
> > Thanks for attention,
> > Maxim Levitsky
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
> >
>
Issues that are caused by those
e3c7db621bed4afb8e231cb005057f2feb5db557 - [PATCH] [PATCH] PM: Change code ordering in main.c (breaks S3)
ed746e3b18f4df18afa3763155972c5835f284c5 - [PATCH] [PATCH] swsusp: Change code ordering in disk.c (breaks swsusp, I don't use it, but I tested it)
259130526c267550bc365d3015917d90667732f1 - [PATCH] [PATCH] swsusp: Change code ordering in user.c (breaks uswsusp, that I use)
doesn't go away, even worse, I tested latest Linus's git tree, and same hang both is suspend to disk and suspend to ram (before suspending)
>Mar 14 00:22:23 MAIN kernel: [ 2.072875] caller is check_tsc_sync_source+0x1d/0x100
>Mar 14 00:22:23 MAIN kernel: [ 2.072878] [show_trace_log_lvl+26/48] show_trace_log_lvl+0x1a/0x30
>Mar 14 00:22:23 MAIN kernel: [ 2.072881] [show_trace+18/32] show_trace+0x12/0x20
>Mar 14 00:22:23 MAIN kernel: [ 2.072884] [dump_stack+22/32] dump_stack+0x16/0x20
>Mar 14 00:22:23 MAIN kernel: [ 2.072887] [debug_smp_processor_id+173/176] debug_smp_processor_id+0xad/0x
^ This thing is fixed (sorry for bothering you)
>Mar 14 00:22:23 MAIN kernel: [ 2.072931] checking TSC synchronization [CPU#0 -> CPU#1]:
>Mar 14 00:22:23 MAIN kernel: [ 2.092922] Measured 72051818872 cycles TSC warp between CPUs, turning off
^ This one I don't think is related to NO_HZ, maybe it is hardware problem, but it exist without NO_HZ
>[ 36.217405] ENABLING IO-APIC IRQs
>[ 36.217587] ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
>[ 36.433917] APIC timer disabled due to verification failure.
This one is now discussed, I will look at it and it is not related to NO_HZ
And hangs with S3 as I said occur even in 2.6.20
And I forgot to tell about another problem with (now I know ,hi-resolution timers)
That before suspend to ram APIC timer is used and HPET is not used :
root@MAIN:/home/maxim# cat /sys/devices/system/clockevents/clockevents0/registered
lapic F:0007 M:3(periodic) C: 1
hpet F:0003 M:1(shutdown) C: 0
lapic F:0007 M:3(periodic) C: 0
root@MAIN:/home/maxim#
But after suspend to ram HPET is 'woken'
root@MAIN:/home/maxim# cat /sys/devices/system/clockevents/clockevents0/registered
lapic F:0007 M:3(one shoot) C: 1
hpet F:0003 M:3(one shoot) C: 0
lapic F:0007 M:3(one shoot) C: 0
Note that I added those (one shoot), (periodic) descriptions, would be nice to have them in kernel, can I send a patch ? ;-)
and I see average of 18 IRQs/sec on IRQ 0
Best regards,
Maxim Levitsky
next prev parent reply other threads:[~2007-03-17 23:00 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-03-16 10:30 Maxim Levitsky
2007-03-16 23:19 ` Len Brown
2007-03-17 23:00 ` Maxim [this message]
2007-03-17 23:32 ` Thomas Gleixner
2007-12-30 7:50 ` Mike Galbraith
2007-12-30 14:53 ` Ingo Molnar
2007-03-20 11:54 ` sysfs ugly timer interface (was Re: [BUG] 2.6.21-rc1,2,3 regressions on my system that I found so far) Pavel Machek
2007-03-22 15:28 ` Greg KH
2007-03-22 15:41 ` Thomas Gleixner
2007-03-23 1:24 ` sysfs q [was: sysfs ugly timer interface] Jan Engelhardt
2007-03-23 4:48 ` Greg KH
2007-03-23 6:05 ` Jan Engelhardt
2007-03-16 23:39 ` [BUG] 2.6.21-rc1,2,3 regressions on my system that I found so far Thomas Gleixner
2007-03-17 23:01 ` Maxim
2007-03-16 23:44 ` Thomas Gleixner
2007-03-17 0:04 ` [PATCH] i386: trust the PM-Timer calibration of the local APIC timer Thomas Gleixner
2007-03-17 7:22 ` Ingo Molnar
2007-03-17 13:24 ` Andi Kleen
2007-03-18 8:12 ` Andrew Morton
2007-03-18 8:45 ` Thomas Gleixner
2007-03-17 1:32 ` [BUG] 2.6.21-rc1,2,3 regressions on my system that I found so far Len Brown
2007-03-17 9:56 ` Thomas Gleixner
2007-03-17 11:05 ` Thomas Gleixner
2007-03-17 16:52 ` Thomas Gleixner
2007-03-17 10:32 ` Arjan van de Ven
2007-03-17 13:26 ` Andi Kleen
2007-03-20 4:27 ` Greg KH
2007-03-20 6:30 ` Thomas Gleixner
2007-03-20 9:14 ` Arjan van de Ven
2007-03-20 11:36 ` Andi Kleen
2007-03-20 11:41 ` Oliver Neukum
2007-03-17 22:45 ` Maxim
2007-03-20 5:04 ` Lee Revell
2007-03-20 5:36 ` Eric St-Laurent
2007-03-20 9:15 ` Arjan van de Ven
2007-03-20 18:04 ` Andy Lutomirski
2007-03-20 22:58 ` Eric St-Laurent
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200703180100.01889.maximlevitsky@gmail.com \
--to=maximlevitsky@gmail.com \
--cc=lenb@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--subject='Re: [BUG] 2.6.21-rc1,2,3 regressions on my system that I found so far' \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).