LKML Archive on lore.kernel.org
* 2.6.25-rc5-git6: Reported regressions from 2.6.24
@ 2008-03-16 23:18 Rafael J. Wysocki
  2008-03-16 23:33 ` Linus Torvalds
                   ` (4 more replies)
  0 siblings, 5 replies; 41+ messages in thread
From: Rafael J. Wysocki @ 2008-03-16 23:18 UTC (permalink / raw)
  To: LKML; +Cc: Adrian Bunk, Andrew Morton, Linus Torvalds, Natalie Protasevich

This message contains a list of some regressions from 2.6.24 reported since
2.6.25-rc1 was released, for which there are no fixes in the mainline I know
of.  If any of them have been fixed already, please let me know.

If you know of any other unresolved regressions from 2.6.24, please let me know
as well and I'll add them to the list.  Also, please let me know if any of the
entries below are invalid.


Listed regressions statistics:

  Date          Total  Pending  Unresolved
  ----------------------------------------
  2008-03-17      148       38          30
  2008-03-16      146       42          35
  2008-03-14      145       45          39
  2008-03-12      143       51          41
  2008-03-11      141       58          43
  2008-03-10      138       66          47
  2008-03-03      115       65          49
  2008-02-25       90       51          39
  2008-02-17       61       45          37
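
The three columns are related. In this report the gap between Pending and
Unresolved on 2008-03-17 (38 - 30 = 8) matches the number of entries listed
below as already having patches. A minimal Python sketch of that arithmetic
follows. Reading "Total minus Pending" as the number of closed regressions is
an assumption made for illustration; the report does not define it.

```python
# Rows copied from the statistics table: (date, total, pending, unresolved).
STATS = [
    ("2008-03-17", 148, 38, 30),
    ("2008-03-16", 146, 42, 35),
    ("2008-03-14", 145, 45, 39),
    ("2008-03-12", 143, 51, 41),
    ("2008-03-11", 141, 58, 43),
    ("2008-03-10", 138, 66, 47),
    ("2008-03-03", 115, 65, 49),
    ("2008-02-25", 90, 51, 39),
    ("2008-02-17", 61, 45, 37),
]


def breakdown(row):
    """Derive two extra counts from one table row.

    with_patches = pending - unresolved (consistent with this report).
    closed       = total - pending      (ASSUMED reading of the columns).
    """
    day, total, pending, unresolved = row
    return {
        "date": day,
        "with_patches": pending - unresolved,
        "closed": total - pending,
    }


if __name__ == "__main__":
    for row in STATS:
        b = breakdown(row)
        print(f"{b['date']}  with patches: {b['with_patches']:3d}  "
              f"closed (assumed): {b['closed']:3d}")
```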


Unresolved regressions
----------------------

Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9962
Subject		: mount: could not find filesystem
Submitter	: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Date		: 2008-02-12 14:34 (34 days old)
References	: http://lkml.org/lkml/2008/2/12/91
Handled-By	: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
		  Yinghai Lu <yhlu.kernel@gmail.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9976
Subject		: BUG: 2.6.25-rc1: iptables postrouting setup causes oops
Submitter	: Ben Nizette <bn@niasdigital.com>
Date		: 2008-02-12 12:46 (34 days old)
References	: http://lkml.org/lkml/2008/2/12/148
Handled-By	: Haavard Skinnemoen <hskinnemoen@atmel.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9978
Subject		: 2.6.25-rc1: volanoMark regression
Submitter	: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
Date		: 2008-02-13 10:30 (33 days old)
References	: http://lkml.org/lkml/2008/2/13/128
		  http://lkml.org/lkml/2008/3/12/52
Handled-By	: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
		  Balbir Singh <balbir@linux.vnet.ibm.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9980
Subject		: 2.6.25-rc1 on Sun Ultra 40- HPET clocksource which causes it to hang
Submitter	: Jasper Bryant-Greene <jasper@unix.geek.nz>
Date		: 2008-02-13 12:25 (33 days old)
References	: http://lkml.org/lkml/2008/2/13/181
Handled-By	: Yinghai Lu <Yinghai.Lu@sun.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9983
Subject		: PROBLEM: 2.6.25-rc1-git2 freezes when accessing external USB hard disk (ehci-hcd)
Submitter	: Linas Žvirblis <0x0007@gmail.com>
Date		: 2008-02-13 22:38 (33 days old)
References	: http://lkml.org/lkml/2008/2/13/566


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9995
Subject		: 2.6.25-rc1 regression - backlight controlls do not work - ThinkPad T61
Submitter	: Lukas Hejtmanek <xhejtman@fi.muni.cz>
Date		: 2008-02-15 04:51 (31 days old)
Handled-By	: Zhang Rui <rui.zhang@intel.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10011
Subject		: The computer is blocked when X is started - unless max_cstate=2 - Acer Travelmate 4001 Lmi
Submitter	: François Valenduc <francois.valenduc@tvcablenet.be>
Date		: 2008-02-17 06:28 (29 days old)
Handled-By	: Thomas Gleixner <tglx@linutronix.de>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10027
Subject		: 2.6.25-rc[12] Video4Linux Bttv Regression
Submitter	: Bongani Hlope <bonganilinux@mweb.co.za>
Date		: 2008-02-17 09:36 (29 days old)
References	: http://lkml.org/lkml/2008/2/17/55
Handled-By	: Mauro Carvalho Chehab <mchehab@infradead.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10041
Subject		: 2.6.25-rc1/2 regression: first-time login into gnome fails
Submitter	: Romano Giannetti <romanol@upcomillas.es>
Date		: 2008-02-18 11:56 (28 days old)
References	: http://lkml.org/lkml/2008/2/18/145
Handled-By	: Ray Lee <ray-lk@madrabbit.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10051
Subject		: Spurious messages at boot, eventually hangs the usb subsustem
Submitter	: Jean-Luc Coulon <jean.luc.coulon@gmail.com>
Date		: 2008-02-20 09:10 (26 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10065
Subject		: 2.6.25-rc2 regression - hang on suspend
Submitter	: Soeren Sonnenburg <kernel@nn7.de>
Date		: 2008-02-19 12:59 (27 days old)
References	: http://lkml.org/lkml/2008/2/19/165
		  http://lkml.org/lkml/2008/2/17/381
Handled-By	: Rafael J. Wysocki <rjw@sisk.pl>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10067
Subject		: TUNER_TDA8290=y, VIDEO_DEV=n build error
Submitter	: Toralf Förster <toralf.foerster@gmx.de>
Date		: 2008-02-22 10:36 (24 days old)
References	: http://lkml.org/lkml/2008/2/19/262


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10082
Subject		: 2.6.25-rc2-git4 - Kernel oops while running kernbench and tbench on powerpc
Submitter	: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Date		: 2008-02-20 16:01 (26 days old)
References	: http://lkml.org/lkml/2008/2/20/218
		  http://lkml.org/lkml/2008/1/18/71


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10086
Subject		: 2.6.25-rc2 + smartd = hang
Submitter	: Anders Eriksson <aeriksson@fastmail.fm>
Date		: 2008-02-22 17:51 (24 days old)
References	: http://lkml.org/lkml/2008/2/22/239
Handled-By	: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10093
Subject		: 2.6.25-current-git hangs on boot unless CONFIG_CPU_IDLE=n - Apple
Submitter	: Soeren Sonnenburg <kernel@nn7.de>
Date		: 2008-02-23 18:55 (23 days old)
References	: http://lkml.org/lkml/2008/2/23/263
		  http://marc.info/?l=linux-acpi&m=120387537018467&w=4
Handled-By	: Pallipadi, Venkatesh <venkatesh.pallipadi@intel.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10117
Subject		: 2.6.25-current-git hangs on boot (pci=nommconf helps)
Submitter	: Soeren Sonnenburg <kernel@nn7.de>
Date		: 2008-02-23 18:55 (23 days old)
References	: http://lkml.org/lkml/2008/2/23/263


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10133
Subject		: INFO: possible circular locking in the resume
Submitter	: Zdenek Kabelac <zdenek.kabelac@gmail.com>
Date		: 2008-02-27 (19 days old)
References	: http://lkml.org/lkml/2008/2/26/479
Handled-By	: Gautham R Shenoy <ego@in.ibm.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10146
Subject		: 2.6.25-rc: complete lockup on boot/start of X (bisected)
Submitter	: Marcin Slusarz <marcin.slusarz@gmail.com>
Date		: 2008-03-02 20:00 (15 days old)
References	: http://lkml.org/lkml/2008/3/2/91
Handled-By	: Peter Zijlstra <a.p.zijlstra@chello.nl>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10152
Subject		: Clocksource tsc is always unstable with 2.6.25-* kernels and CONFIG_NO_HZ=y on my box
Submitter	: Gabriel C <nix.or.die@googlemail.com>
Date		: 2008-02-24 01:31 (22 days old)
References	: http://lkml.org/lkml/2008/2/23/380
		  http://lkml.org/lkml/2008/2/24/281
Handled-By	: Thomas Gleixner <tglx@linutronix.de>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10156
Subject		: KVM & Qemu crashed with infinite recursive kernel loop in the guest
Submitter	: Zdenek Kabelac <zdenek.kabelac@gmail.com>
Date		: 2008-02-28 11:25 (18 days old)
References	: http://lkml.org/lkml/2008/2/28/106


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10172
Subject		: kvm: INFO: inconsistent lock state
Submitter	: Zdenek Kabelac <zdenek.kabelac@gmail.com>
Date		: 2008-03-05 03:26 (12 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10190
Subject		: [BUG] Linux-2.6.25-rc4 (and also in rc3) Compile Error
Submitter	: Tarkan Erimer <tarkan@netone.net.tr>
Date		: 2008-03-05 05:01 (12 days old)
References	: http://www.ussg.iu.edu/hypermail/linux/kernel/0803.0/1867.html


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10203
Subject		: 2.6.25 IOMMU breaks DMA for b43 on x86_64
Submitter	: Christian Casteyde <casteyde.christian@free.fr>
Date		: 2008-03-09 00:55 (8 days old)
Handled-By	: Michael Buesch <mb@bu3sch.de>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10207
Subject		: INFO: task mount:11202 blocked for more than 120 seconds
Submitter	: Christian Kujau <lists@nerdbynature.de>
Date		: 2008-03-07 21:32 (10 days old)
References	: http://lkml.org/lkml/2008/3/7/308
		  http://lkml.org/lkml/2008/3/9/186


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10211
Subject		: drivers/media/video/cx2341x.c: undefined references
Submitter	: Toralf Förster <toralf.foerster@gmx.de>
Date		: 2008-03-07 13:48 (10 days old)
References	: http://lkml.org/lkml/2008/3/7/168


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10234
Subject		: pciehp hang on hp ia64 rx6600
Submitter	: Alex Chiang <achiang@hp.com>
Date		: 2008-03-12 00:47 (5 days old)
References	: http://lkml.org/lkml/2008/3/12/31
Handled-By	: Mark Lord <mlord@pobox.com>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10235
Subject		: 2.6.25-rc5: Blank Screen with Intel 945
Submitter	: Justin Madru <jdm64@gawab.com>
Date		: 2008-03-12 12:02 (5 days old)
References	: http://lkml.org/lkml/2008/3/12/290
Handled-By	: Jesse Barnes <jbarnes@virtuousgeek.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10238
Subject		: netconsole still hangs
Submitter	: Andrew Morton <akpm@linux-foundation.org>
Date		: 2008-03-12 23:14 (5 days old)
References	: http://marc.info/?t=120536379200004&r=1&w=2
Handled-By	: David Miller <davem@davemloft.net>
		  Stephen Hemminger <shemminger@linux-foundation.org>


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10242
Subject		: rm command hangs
Submitter	: Jean-Luc Coulon <jean.luc.coulon@gmail.com>
Date		: 2008-03-14 05:47 (3 days old)


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10266
Subject		: [PATCH] i810fb: Fix console switch regression
Submitter	: Stefan Bauer <stefan.bauer@cs.tu-chemnitz.de>
Date		: 2008-03-16 19:42 (1 days old)
References	: http://lkml.org/lkml/2008/3/16/84


Regressions with patches
------------------------

Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9969
Subject		: 2.6.24-git15 Keyboard Issue?
Submitter	: Chris Holvenstot <cholvenstot@comcast.net>
Date		: 2008-02-06 14:02 (40 days old)
References	: http://lkml.org/lkml/2008/2/6/100
		  http://lkml.org/lkml/2008/2/13/82
Handled-By	: Thomas Gleixner <tglx@linutronix.de>
Patch		: http://lkml.org/lkml/2008/2/15/343


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10016
Subject		: cobalt_btns.c <-> struct platform_device compile error
Submitter	: Adrian Bunk <bunk@kernel.org>
Date		: 2008-02-17 12:12 (29 days old)
References	: http://lkml.org/lkml/2008/2/17/293
Handled-By	: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Patch		: http://lkml.org/lkml/2008/3/9/25


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10017
Subject		: cdev removal broke cobalt_btns.c compilation
Submitter	: Adrian Bunk <bunk@kernel.org>
Date		: 2008-02-17 12:14 (29 days old)
References	: http://lkml.org/lkml/2008/2/17/295
Handled-By	: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Patch		: http://lkml.org/lkml/2008/3/9/25


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10153
Subject		: (regression) kernel/timeconst.h bugs with HZ=128
Submitter	: David Brownell <david-b@pacbell.net>
Date		: 2008-02-26 19:32 (20 days old)
References	: http://lkml.org/lkml/2008/2/26/294
Handled-By	: H. Peter Anvin <hpa@zytor.com>
Patch		: http://bugzilla.kernel.org/attachment.cgi?id=15114&action=view
		  http://bugzilla.kernel.org/attachment.cgi?id=15115&action=view


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10186
Subject		: SCSI_AIC94XX must depend on SCSI
Submitter	: Toralf Förster <toralf.foerster@gmx.de>
Date		: 2008-03-06 19:09 (11 days old)
References	: http://marc.info/?l=linux-kernel&m=120483073617232&w=2
Handled-By	: Adrian Bunk <bunk@kernel.org>
Patch		: http://marc.info/?l=linux-kernel&m=120483499725928&w=2


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10210
Subject		: 2.6.25-rc4-git3: Handling of audio CDs broken on pata_ali
Submitter	: Rafael J. Wysocki <rjw@sisk.pl>
Date		: 2008-03-08 22:46 (9 days old)
References	: http://lkml.org/lkml/2008/3/8/123
Handled-By	: Tejun Heo <htejun@gmail.com>
Patch		: http://lkml.org/lkml/2008/3/10/69


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10232
Subject		: intel mtrr fixups apparently broke display and e1000 probe
Submitter	: Stephen Gran <steve@lobefin.net>
Date		: 2008-03-12 08:37 (5 days old)
Handled-By	: Yinghai Lu <yhlu.kernel@gmail.com>
Patch		: http://bugzilla.kernel.org/attachment.cgi?id=15271&action=view


Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10259
Subject		: /sys/class/hwmon/hwmon0 is missing a device link
Submitter	: Jean-Luc Coulon <jean.luc.coulon@gmail.com>
Date		: 2008-03-16 04:56 (1 days old)
Handled-By	: Jean Delvare <khali@linux-fr.org>
Patch		: http://bugzilla.kernel.org/attachment.cgi?id=15301&action=view


For details, please visit the bug entries and follow the links given in
references.

As you can see, there is a Bugzilla entry for each of the listed regressions.
There also is a Bugzilla entry used for tracking the regressions from 2.6.24,
unresolved as well as resolved, at:

http://bugzilla.kernel.org/show_bug.cgi?id=9832

Please let me know if there are any Bugzilla entries that should be added to
the list in there.

Thanks,
Rafael
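
The entries in this report follow a regular layout: "Field : value" lines,
indented continuation lines for additional references or patches, and blank
lines between entries. A minimal Python sketch of a parser for that layout is
given here for readers who want to process the list mechanically. The function
names are hypothetical, and the report date of 2008-03-17 is an assumption
taken from the first row of the statistics table.

```python
from datetime import date

# ASSUMPTION: ages in the report are counted relative to this date.
REPORT_DATE = date(2008, 3, 17)

SAMPLE = (
    "Bug-Entry\t: http://bugzilla.kernel.org/show_bug.cgi?id=9978\n"
    "Subject\t\t: 2.6.25-rc1: volanoMark regression\n"
    "Submitter\t: Zhang, Yanmin <yanmin_zhang@linux.intel.com>\n"
    "Date\t\t: 2008-02-13 10:30 (33 days old)\n"
    "References\t: http://lkml.org/lkml/2008/2/13/128\n"
    "\t\t  http://lkml.org/lkml/2008/3/12/52\n"
)


def parse_entries(text):
    """Parse blank-line-separated 'Field : value' blocks into dicts.

    Every field maps to a list of values, because References, Handled-By
    and Patch may continue on indented lines.
    """
    entries, current, last_key = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current entry, if any.
            if current:
                entries.append(current)
            current, last_key = {}, None
        elif line[0].isspace():
            # Indented line: another value for the previous field.
            if last_key is not None:
                current[last_key].append(line.strip())
        else:
            # Split on the first colon only; values may contain colons.
            key, sep, value = line.partition(":")
            if not sep:
                continue  # not a field line (e.g. a section heading)
            last_key = key.strip()
            current.setdefault(last_key, []).append(value.strip())
    if current:
        entries.append(current)
    return entries


def age_in_days(entry, report_date=REPORT_DATE):
    """Recompute the '(N days old)' figure from the entry's Date field."""
    filed = date.fromisoformat(entry["Date"][0][:10])
    return (report_date - filed).days


if __name__ == "__main__":
    for entry in parse_entries(SAMPLE):
        print(entry["Subject"][0], "-", age_in_days(entry), "days old")
```

Keeping every field as a list avoids special-casing the multi-line fields.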


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-16 23:18 2.6.25-rc5-git6: Reported regressions from 2.6.24 Rafael J. Wysocki
@ 2008-03-16 23:33 ` Linus Torvalds
  2008-03-16 23:38   ` Rafael J. Wysocki
  2008-03-17  0:20 ` Gabriel C
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 41+ messages in thread
From: Linus Torvalds @ 2008-03-16 23:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Adrian Bunk, Andrew Morton, Natalie Protasevich,
	Linas Žvirblis



On Mon, 17 Mar 2008, Rafael J. Wysocki wrote:
> 
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9983
> Subject		: PROBLEM: 2.6.25-rc1-git2 freezes when accessing external USB hard disk (ehci-hcd)
> Submitter	: Linas Žvirblis <0x0007@gmail.com>
> Date		: 2008-02-13 22:38 (33 days old)
> References	: http://lkml.org/lkml/2008/2/13/566

This is most likely already fixed by commit
e82cc1288fa57857c6af8c57f3d07096d4bcd9d9.

Unless Linas can reproduce it with a newer kernel (I'm cutting an -rc6 
right now, but any -git snapshot in the last few days should work) this 
one should be closed.  We can't keep things open just because the tester 
hasn't tested.

		Linus
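
One way to check locally whether a given tree already contains the commit
named above is `git merge-base --is-ancestor`, which exits with status 0 when
the first commit is an ancestor of the second and 1 when it is not. A minimal
Python sketch follows. The function name is hypothetical, the repository path
and tag name in the usage comment are assumptions, and the `run` parameter
exists only so the function can be exercised without a real repository.

```python
import subprocess


def commit_is_in(commit, ref, repo=".", run=subprocess.run):
    """Return True if `commit` is an ancestor of `ref` in the repo at `repo`.

    Relies on `git merge-base --is-ancestor`, which reports the answer
    through its exit status (0 = ancestor, 1 = not an ancestor).
    """
    cmd = ["git", "-C", repo, "merge-base", "--is-ancestor", commit, ref]
    result = run(cmd, check=False)
    if result.returncode not in (0, 1):
        # Any other status means git itself failed (unknown commit, bad repo).
        raise RuntimeError(f"git exited with status {result.returncode}")
    return result.returncode == 0


# Usage (path and tag name are ASSUMPTIONS, adjust to your checkout):
#   commit_is_in("e82cc1288fa57857c6af8c57f3d07096d4bcd9d9",
#                "v2.6.25-rc6", repo="/path/to/linux")
```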


* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-16 23:33 ` Linus Torvalds
@ 2008-03-16 23:38   ` Rafael J. Wysocki
  0 siblings, 0 replies; 41+ messages in thread
From: Rafael J. Wysocki @ 2008-03-16 23:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: LKML, Adrian Bunk, Andrew Morton, Natalie Protasevich,
	Linas Žvirblis

On Monday, 17 of March 2008, Linus Torvalds wrote:
> 
> On Mon, 17 Mar 2008, Rafael J. Wysocki wrote:
> > 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9983
> > Subject		: PROBLEM: 2.6.25-rc1-git2 freezes when accessing external USB hard disk (ehci-hcd)
> > Submitter	: Linas Žvirblis <0x0007@gmail.com>
> > Date		: 2008-02-13 22:38 (33 days old)
> > References	: http://lkml.org/lkml/2008/2/13/566
> 
> This is most likely already fixed by commit
> e82cc1288fa57857c6af8c57f3d07096d4bcd9d9.
> 
> Unless Linas can reproduce it with a newer kernel (I'm cutting an -rc6 
> right now, but any -git snapshot in the last few days should work) this 
> one should be closed.  We can't keep things open just because the tester 
> hasn't tested.

Sure, I'll close it if there's no response in a couple of days.

Thanks,
Rafael


* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-16 23:18 2.6.25-rc5-git6: Reported regressions from 2.6.24 Rafael J. Wysocki
  2008-03-16 23:33 ` Linus Torvalds
@ 2008-03-17  0:20 ` Gabriel C
  2008-03-17 16:17   ` Thomas Gleixner
  2008-03-17  6:47 ` Jason Wu
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 41+ messages in thread
From: Gabriel C @ 2008-03-17  0:20 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Adrian Bunk, Andrew Morton, Linus Torvalds,
	Natalie Protasevich, Thomas Gleixner

Rafael J. Wysocki wrote:

Hi,

> This message contains a list of some regressions from 2.6.24 reported since
> 2.6.25-rc1 was released, for which there are no fixes in the mainline I know
> of.  If any of them have been fixed already, please let me know.

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10152
> Subject		: Clocksource tsc is always unstable with 2.6.25-* kernels and CONFIG_NO_HZ=y on my box
> Submitter	: Gabriel C <nix.or.die@googlemail.com>
> Date		: 2008-02-24 01:31 (22 days old)
> References	: http://lkml.org/lkml/2008/2/23/380
> 		  http://lkml.org/lkml/2008/2/24/281
> Handled-By	: Thomas Gleixner <tglx@linutronix.de>
> 

Thomas do you want me to bisect ? 

Or do you have any patches I could try ( really does not matter how experimental they are ) ?

Rafael the bug report is saying x86-64 Component while my box is 32bit :) Could you please correct this ?

Best Regards 

Gabriel





* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-16 23:18 2.6.25-rc5-git6: Reported regressions from 2.6.24 Rafael J. Wysocki
  2008-03-16 23:33 ` Linus Torvalds
  2008-03-17  0:20 ` Gabriel C
@ 2008-03-17  6:47 ` Jason Wu
  2008-03-17 21:36   ` Rafael J. Wysocki
  2008-03-23 19:01 ` Christian Kujau
  2008-03-23 21:17 ` Christian Kujau
  4 siblings, 1 reply; 41+ messages in thread
From: Jason Wu @ 2008-03-17  6:47 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Adrian Bunk, Andrew Morton, Linus Torvalds, Natalie Protasevich

2008/3/17, Rafael J. Wysocki <rjw@sisk.pl>:
> [...]
> Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=10211
> Subject         : drivers/media/video/cx2341x.c: undefined references
> Submitter       : Toralf Förster <toralf.foerster@gmx.de>
> Date            : 2008-03-07 13:48 (10 days old)
> References      : http://lkml.org/lkml/2008/3/7/168
>

I think the patch from Mauro Carvalho Chehab can fix this bug:
http://linuxtv.org/hg/v4l-dvb/rev/ba1a6a7bd53b

J

> [...]

-- 
BR's
wenhsuan
http://wenhsuanhack.spaces.live.com

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-17  0:20 ` Gabriel C
@ 2008-03-17 16:17   ` Thomas Gleixner
  2008-03-17 18:20     ` Gabriel C
  0 siblings, 1 reply; 41+ messages in thread
From: Thomas Gleixner @ 2008-03-17 16:17 UTC (permalink / raw)
  To: Gabriel C
  Cc: Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich

On Mon, 17 Mar 2008, Gabriel C wrote:
> > Subject		: Clocksource tsc is always unstable with 2.6.25-* kernels and CONFIG_NO_HZ=y on my box
> > Submitter	: Gabriel C <nix.or.die@googlemail.com>
> > Date		: 2008-02-24 01:31 (22 days old)
> > References	: http://lkml.org/lkml/2008/2/23/380
> > 		  http://lkml.org/lkml/2008/2/24/281
> > Handled-By	: Thomas Gleixner <tglx@linutronix.de>
> > 
> 
> Thomas do you want me to bisect ? 

That'd be great.

> Or do you have any patches I could try ( really does not matter how experimental they are ) ?

No, I haven't the slightest clue what's going on.
 
Thanks,

	tglx

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-17 16:17   ` Thomas Gleixner
@ 2008-03-17 18:20     ` Gabriel C
  2008-03-18  4:01       ` Gabriel C
  0 siblings, 1 reply; 41+ messages in thread
From: Gabriel C @ 2008-03-17 18:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, Gabriel C

Thomas Gleixner wrote:
> On Mon, 17 Mar 2008, Gabriel C wrote:
>>> Subject		: Clocksource tsc is always unstable with 2.6.25-* kernels and CONFIG_NO_HZ=y on my box
>>> Submitter	: Gabriel C <nix.or.die@googlemail.com>
>>> Date		: 2008-02-24 01:31 (22 days old)
>>> References	: http://lkml.org/lkml/2008/2/23/380
>>> 		  http://lkml.org/lkml/2008/2/24/281
>>> Handled-By	: Thomas Gleixner <tglx@linutronix.de>
>>>
>> Thomas do you want me to bisect ? 
> 
> That'd be great.

OK, I'll start doing that later today.

> 
>> Or do you have any patches I could try ( really does not matter how experimental they are ) ?
> 
> No, I haven't the slightest clue what's going on.
>  
> Thanks,
> 
> 	tglx
> 

Gabriel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-17  6:47 ` Jason Wu
@ 2008-03-17 21:36   ` Rafael J. Wysocki
  0 siblings, 0 replies; 41+ messages in thread
From: Rafael J. Wysocki @ 2008-03-17 21:36 UTC (permalink / raw)
  To: Jason Wu
  Cc: LKML, Adrian Bunk, Andrew Morton, Linus Torvalds, Natalie Protasevich

On Monday, 17 of March 2008, Jason Wu wrote:
> 2008/3/17, Rafael J. Wysocki <rjw@sisk.pl>:
> > This message contains a list of some regressions from 2.6.24 reported since
> > 2.6.25-rc1 was released, for which there are no fixes in the mainline I know
> > of.  If any of them have been fixed already, please let me know.
> >
> > If you know of any other unresolved regressions from 2.6.24, please let me know
> > either and I'll add them to the list.  Also, please let me know if any of the
> > entries below are invalid.
> >
> >
> > Listed regressions statistics:
> >
> >  Date          Total  Pending  Unresolved
> >  ----------------------------------------
> >  2008-03-17      148       38          30
> >  2008-03-16      146       42          35
> >  2008-03-14      145       45          39
> >  2008-03-12      143       51          41
> >  2008-03-11      141       58          43
> >  2008-03-10      138       66          47
> >  2008-03-03      115       65          49
> >  2008-02-25       90       51          39
> >  2008-02-17       61       45          37
> >
> >
> > Unresolved regressions
> > ----------------------
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=9962
> > Subject         : mount: could not find filesystem
> > Submitter       : Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
> > Date            : 2008-02-12 14:34 (34 days old)
> > References      : http://lkml.org/lkml/2008/2/12/91
> > Handled-By      : Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> >                  Yinghai Lu <yhlu.kernel@gmail.com>
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=9976
> > Subject         : BUG: 2.6.25-rc1: iptables postrouting setup causes oops
> > Submitter       : Ben Nizette <bn@niasdigital.com>
> > Date            : 2008-02-12 12:46 (34 days old)
> > References      : http://lkml.org/lkml/2008/2/12/148
> > Handled-By      : Haavard Skinnemoen <hskinnemoen@atmel.com>
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=9978
> > Subject         : 2.6.25-rc1: volanoMark regression
> > Submitter       : Zhang, Yanmin <yanmin_zhang@linux.intel.com>
> > Date            : 2008-02-13 10:30 (33 days old)
> > References      : http://lkml.org/lkml/2008/2/13/128
> >                  http://lkml.org/lkml/2008/3/12/52
> > Handled-By      : Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
> >                  Balbir Singh <balbir@linux.vnet.ibm.com>
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=9980
> > Subject         : 2.6.25-rc1 on Sun Ultra 40- HPET clocksource which causes it to hang
> > Submitter       : Jasper Bryant-Greene <jasper@unix.geek.nz>
> > Date            : 2008-02-13 12:25 (33 days old)
> > References      : http://lkml.org/lkml/2008/2/13/181
> > Handled-By      : Yinghai Lu <Yinghai.Lu@sun.com>
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=9983
> > Subject         : PROBLEM: 2.6.25-rc1-git2 freezes when accessing external USB hard disk (ehci-hcd)
> > Submitter       : Linas Žvirblis <0x0007@gmail.com>
> > Date            : 2008-02-13 22:38 (33 days old)
> > References      : http://lkml.org/lkml/2008/2/13/566
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=9995
> > Subject         : 2.6.25-rc1 regression - backlight controlls do not work - ThinkPad T61
> > Submitter       : Lukas Hejtmanek <xhejtman@fi.muni.cz>
> > Date            : 2008-02-15 04:51 (31 days old)
> > Handled-By      : Zhang Rui <rui.zhang@intel.com>
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=10011
> > Subject         : The computer is blocked when X is started - unless max_cstate=2 - Acer Travelmate 4001 Lmi
> > Submitter       : François Valenduc <francois.valenduc@tvcablenet.be>
> > Date            : 2008-02-17 06:28 (29 days old)
> > Handled-By      : Thomas Gleixner <tglx@linutronix.de>
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=10027
> > Subject         : 2.6.25-rc[12] Video4Linux Bttv Regression
> > Submitter       : Bongani Hlope <bonganilinux@mweb.co.za>
> > Date            : 2008-02-17 09:36 (29 days old)
> > References      : http://lkml.org/lkml/2008/2/17/55
> > Handled-By      : Mauro Carvalho Chehab <mchehab@infradead.org>
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=10041
> > Subject         : 2.6.25-rc1/2 regression: first-time login into gnome fails
> > Submitter       : Romano Giannetti <romanol@upcomillas.es>
> > Date            : 2008-02-18 11:56 (28 days old)
> > References      : http://lkml.org/lkml/2008/2/18/145
> > Handled-By      : Ray Lee <ray-lk@madrabbit.org>
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=10051
> > Subject         : Spurious messages at boot, eventually hangs the usb subsustem
> > Submitter       : Jean-Luc Coulon <jean.luc.coulon@gmail.com>
> > Date            : 2008-02-20 09:10 (26 days old)
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=10065
> > Subject         : 2.6.25-rc2 regression - hang on suspend
> > Submitter       : Soeren Sonnenburg <kernel@nn7.de>
> > Date            : 2008-02-19 12:59 (27 days old)
> > References      : http://lkml.org/lkml/2008/2/19/165
> >                  http://lkml.org/lkml/2008/2/17/381
> > Handled-By      : Rafael J. Wysocki <rjw@sisk.pl>
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=10067
> > Subject         : TUNER_TDA8290=y, VIDEO_DEV=n build error
> > Submitter       : Toralf Förster <toralf.foerster@gmx.de>
> > Date            : 2008-02-22 10:36 (24 days old)
> > References      : http://lkml.org/lkml/2008/2/19/262
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=10082
> > Subject         : 2.6.25-rc2-git4 - Kernel oops while running kernbench and tbench on powerpc
> > Submitter       : Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
> > Date            : 2008-02-20 16:01 (26 days old)
> > References      : http://lkml.org/lkml/2008/2/20/218
> >                  http://lkml.org/lkml/2008/1/18/71
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=10086
> > Subject         : 2.6.25-rc2 + smartd = hang
> > Submitter       : Anders Eriksson <aeriksson@fastmail.fm>
> > Date            : 2008-02-22 17:51 (24 days old)
> > References      : http://lkml.org/lkml/2008/2/22/239
> > Handled-By      : Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=10093
> > Subject         : 2.6.25-current-git hangs on boot unless CONFIG_CPU_IDLE=n - Apple
> > Submitter       : Soeren Sonnenburg <kernel@nn7.de>
> > Date            : 2008-02-23 18:55 (23 days old)
> > References      : http://lkml.org/lkml/2008/2/23/263
> >                  http://marc.info/?l=linux-acpi&m=120387537018467&w=4
> > Handled-By      : Pallipadi, Venkatesh <venkatesh.pallipadi@intel.com>
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=10117
> > Subject         : 2.6.25-current-git hangs on boot (pci=nommconf helps)
> > Submitter       : Soeren Sonnenburg <kernel@nn7.de>
> > Date            : 2008-02-23 18:55 (23 days old)
> > References      : http://lkml.org/lkml/2008/2/23/263
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=10133
> > Subject         : INFO: possible circular locking in the resume
> > Submitter       : Zdenek Kabelac <zdenek.kabelac@gmail.com>
> > Date            : 2008-02-27 (19 days old)
> > References      : http://lkml.org/lkml/2008/2/26/479
> > Handled-By      : Gautham R Shenoy <ego@in.ibm.com>
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=10146
> > Subject         : 2.6.25-rc: complete lockup on boot/start of X (bisected)
> > Submitter       : Marcin Slusarz <marcin.slusarz@gmail.com>
> > Date            : 2008-03-02 20:00 (15 days old)
> > References      : http://lkml.org/lkml/2008/3/2/91
> > Handled-By      : Peter Zijlstra <a.p.zijlstra@chello.nl>
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=10152
> > Subject         : Clocksource tsc is always unstable with 2.6.25-* kernels and CONFIG_NO_HZ=y on my box
> > Submitter       : Gabriel C <nix.or.die@googlemail.com>
> > Date            : 2008-02-24 01:31 (22 days old)
> > References      : http://lkml.org/lkml/2008/2/23/380
> >                  http://lkml.org/lkml/2008/2/24/281
> > Handled-By      : Thomas Gleixner <tglx@linutronix.de>
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=10156
> > Subject         : KVM & Qemu crashed with infinite recursive kernel loop in the guest
> > Submitter       : Zdenek Kabelac <zdenek.kabelac@gmail.com>
> > Date            : 2008-02-28 11:25 (18 days old)
> > References      : http://lkml.org/lkml/2008/2/28/106
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=10172
> > Subject         : kvm: INFO: inconsistent lock state
> > Submitter       : Zdenek Kabelac <zdenek.kabelac@gmail.com>
> > Date            : 2008-03-05 03:26 (12 days old)
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=10190
> > Subject         : [BUG] Linux-2.6.25-rc4 (and also in rc3) Compile Error
> > Submitter       : Tarkan Erimer <tarkan@netone.net.tr>
> > Date            : 2008-03-05 05:01 (12 days old)
> > References      : http://www.ussg.iu.edu/hypermail/linux/kernel/0803.0/1867.html
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=10203
> > Subject         : 2.6.25 IOMMU breaks DMA for b43 on x86_64
> > Submitter       : Christian Casteyde <casteyde.christian@free.fr>
> > Date            : 2008-03-09 00:55 (8 days old)
> > Handled-By      : Michael Buesch <mb@bu3sch.de>
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=10207
> > Subject         : INFO: task mount:11202 blocked for more than 120 seconds
> > Submitter       : Christian Kujau <lists@nerdbynature.de>
> > Date            : 2008-03-07 21:32 (10 days old)
> > References      : http://lkml.org/lkml/2008/3/7/308
> >                  http://lkml.org/lkml/2008/3/9/186
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=10211
> > Subject         : drivers/media/video/cx2341x.c: undefined references
> > Submitter       : Toralf Förster <toralf.foerster@gmx.de>
> > Date            : 2008-03-07 13:48 (10 days old)
> > References      : http://lkml.org/lkml/2008/3/7/168
> >
> I think Mauro Carvalho Chehab's patch can fix this bug:
> http://linuxtv.org/hg/v4l-dvb/rev/ba1a6a7bd53b

Thanks, I updated the entry.

Rafael

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-17 18:20     ` Gabriel C
@ 2008-03-18  4:01       ` Gabriel C
  2008-03-18  4:24         ` Gabriel C
  2008-03-21 15:24         ` Gabriel C
  0 siblings, 2 replies; 41+ messages in thread
From: Gabriel C @ 2008-03-18  4:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, Gabriel C, Andi Kleen,
	Ingo Molnar

Gabriel C wrote:
> Thomas Gleixner wrote:
>> On Mon, 17 Mar 2008, Gabriel C wrote:
>>>> Subject		: Clocksource tsc is always unstable with 2.6.25-* kernels and CONFIG_NO_HZ=y on my box
>>>> Submitter	: Gabriel C <nix.or.die@googlemail.com>
>>>> Date		: 2008-02-24 01:31 (22 days old)
>>>> References	: http://lkml.org/lkml/2008/2/23/380
>>>> 		  http://lkml.org/lkml/2008/2/24/281
>>>> Handled-By	: Thomas Gleixner <tglx@linutronix.de>
>>>>
>>> Thomas do you want me to bisect ? 
>> That'd be great.
> 
> Ok I'll start doing that later on today.
> 

I managed to bisect one of the bugs. I ran into some problems and had to use skip once because a revision didn't compile,
but bisect still seems to have found the right commit. Sadly, it seems there are two different bugs.

Before starting the bisect I tested linux-next to be sure the bug(s) still exist, and since rc1 already has the problem
I bisected 2.6.24 -> 2.6.25-rc1.

cat .git/refs/bisect/bad
1ada5cba6a0318f90e45b38557e7b5206a9cba38

git show 1ada5cba6a0318f90e45b38557e7b5206a9cba38
commit 1ada5cba6a0318f90e45b38557e7b5206a9cba38
Author: Andi Kleen <ak@suse.de>
Date:   Wed Jan 30 13:30:02 2008 +0100

    clocksource: make clocksource watchdog cycle through online CPUs

    This way it checks if the clocks are synchronized between CPUs too.
    This might be able to detect slowly drifting TSCs which only
    go wrong over longer time.

    Signed-off-by: Andi Kleen <ak@suse.de>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index cabfa19..edd5ef8 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -142,8 +142,13 @@ static void clocksource_watchdog(unsigned long data)
        }

        if (!list_empty(&watchdog_list)) {
-               __mod_timer(&watchdog_timer,
-                           watchdog_timer.expires + WATCHDOG_INTERVAL);
+               /* Cycle through CPUs to check if the CPUs stay synchronized to
+                * each other. */
+               int next_cpu = next_cpu(raw_smp_processor_id(), cpu_online_map);
+               if (next_cpu >= NR_CPUS)
+                       next_cpu = first_cpu(cpu_online_map);
+               watchdog_timer.expires += WATCHDOG_INTERVAL;
+               add_timer_on(&watchdog_timer, next_cpu);
        }
        spin_unlock(&watchdog_lock);
 }
@@ -165,7 +170,7 @@ static void clocksource_check_watchdog(struct clocksource *cs)
                if (!started && watchdog) {
                        watchdog_last = watchdog->read();
                        watchdog_timer.expires = jiffies + WATCHDOG_INTERVAL;
-                       add_timer(&watchdog_timer);
+                       add_timer_on(&watchdog_timer, first_cpu(cpu_online_map));
                }
        } else {
                if (cs->flags & CLOCK_SOURCE_IS_CONTINUOUS)
@@ -186,7 +191,8 @@ static void clocksource_check_watchdog(struct clocksource *cs)
                                watchdog_last = watchdog->read();
                                watchdog_timer.expires =
                                        jiffies + WATCHDOG_INTERVAL;
-                               add_timer(&watchdog_timer);
+                               add_timer_on(&watchdog_timer,
+                                               first_cpu(cpu_online_map));
                        }
                }
        }
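
The first hunk above boils down to a round-robin over the online CPUs: take
the next online CPU after the current one, and wrap around to the first one
when the end of the map is reached. A rough illustration of just that
selection logic in shell, not kernel code; the function name and the way the
online CPUs are passed as arguments are made up for the example.

```shell
# next_online_cpu CURRENT CPU...
# Prints the lowest listed CPU id greater than CURRENT. If there is none,
# wraps around to the lowest listed id (the role first_cpu() plays in the
# patch). Hypothetical helper, for illustration only.
next_online_cpu() {
    current=$1
    shift
    first=
    next=
    for cpu in "$@"; do
        # Track the lowest id overall for the wrap-around case.
        if [ -z "$first" ] || [ "$cpu" -lt "$first" ]; then
            first=$cpu
        fi
        # Track the lowest id strictly above the current CPU.
        if [ "$cpu" -gt "$current" ]; then
            if [ -z "$next" ] || [ "$cpu" -lt "$next" ]; then
                next=$cpu
            fi
        fi
    done
    echo "${next:-$first}"
}

next_online_cpu 0 0 1 2 3   # prints 1
next_online_cpu 3 0 1 2 3   # prints 0 (wrapped around)
next_online_cpu 1 0 1 3     # prints 3 (CPU 2 is not online)
```

With the patch applied, each run of the watchdog timer is queued on the CPU
picked this way, so successive checks read the clocksource from different CPUs.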


 git bisect log
git-bisect start
# bad: [19af35546de68c872dcb687613e0902a602cb20e] Linux 2.6.25-rc1
git-bisect bad 19af35546de68c872dcb687613e0902a602cb20e
# good: [49914084e797530d9baaf51df9eda77babc98fa8] Linux 2.6.24
git-bisect good 49914084e797530d9baaf51df9eda77babc98fa8
# bad: [d2e626f45cc450c00f5f98a89b8b4c4ac3c9bf5f] x86: add PAGE_KERNEL_EXEC_NOCACHE
git-bisect bad d2e626f45cc450c00f5f98a89b8b4c4ac3c9bf5f
# good: [fb46990dba94866462e90623e183d02ec591cf8f] [NETFILTER]: nf_queue: remove unnecessary hook existance check
git-bisect good fb46990dba94866462e90623e183d02ec591cf8f
# good: [936722922f6d2366378de606a40c14f96915474d] [IPV4] fib_trie: compute size when needed
git-bisect good 936722922f6d2366378de606a40c14f96915474d
# bad: [ff14c6164bd532a6dc9025c07d3b562f839f00a9] x86: x86-64 ia32 ptrace pt_regs cleanup
git-bisect bad ff14c6164bd532a6dc9025c07d3b562f839f00a9
# good: [c087567d3ffb2c7c61e091982e6ca45478394f1a] SUNRPC: Remove the obsolete RPC_WAITQ macro
git-bisect good c087567d3ffb2c7c61e091982e6ca45478394f1a
# bad: [af7a78e9258ffcca681e080cbc857f854869144f] x86: move mce related declarations
git-bisect bad af7a78e9258ffcca681e080cbc857f854869144f
# good: [34f5b4662bf4b54f22b32ce76ce70eccd7ebc68a] SUNRPC: Don't bother changing the sigmask for asynchronous RPC calls
git-bisect good 34f5b4662bf4b54f22b32ce76ce70eccd7ebc68a
# bad: [83bd01024b1fdfc41d9b758e5669e80fca72df66] x86: protect against sigaltstack wraparound
git-bisect bad 83bd01024b1fdfc41d9b758e5669e80fca72df66
# good: [efd9ac8630e89b9ee7ce64008bd7783952374f37] time: fold __get_realtime_clock_ts() into getnstimeofday()
git-bisect good efd9ac8630e89b9ee7ce64008bd7783952374f37
# bad: [37a47db8d7f0f38dac5acf5a13abbc8f401707fa] x86: assign IRQs to HPET timers, fix
git-bisect bad 37a47db8d7f0f38dac5acf5a13abbc8f401707fa
# skip: [316da3b3fc8efa9a5d2c99e0d449f01ff38c6aba] x86: restrict PIT clocksource usage
git-bisect skip 316da3b3fc8efa9a5d2c99e0d449f01ff38c6aba
# bad: [4713e22ce81eb8b3353e16435362eb3d0ec95640] clocksource: add unregister function to disable unusable clocksources
git-bisect bad 4713e22ce81eb8b3353e16435362eb3d0ec95640
# bad: [1ada5cba6a0318f90e45b38557e7b5206a9cba38] clocksource: make clocksource watchdog cycle through online CPUs
git-bisect bad 1ada5cba6a0318f90e45b38557e7b5206a9cba38
# good: [1077f5a917b7c630231037826b344b2f7f5b903f] clocksource.c: use init_timer_deferrable for clocksource_watchdog
git-bisect good 1077f5a917b7c630231037826b344b2f7f5b903f


The revision I had to skip failed to build with:

arch/x86/kernel/i8253.c: In function 'init_pit_clocksource':
arch/x86/kernel/i8253.c:207: error: implicit declaration of function 'is_hpet_enabled'
make[1]: *** [arch/x86/kernel/i8253.o] Error 1
make: *** [arch/x86/kernel] Error 2

If you tell me how to fix that, I'll restart the bisect from there, just in case.


Reverting the commit on top of 2.6.25-rc1 fixes the 'TSC being unstable' problem, but it does not fix the hang
when I boot with clocksource=acpi_pm, so that seems to have been introduced by a different commit.

I will try to bisect this hang as well, most probably on the weekend.


I also reverted that commit on top of git head and a kernel is compiling right now. I'll let you know in a bit whether that worked out.

Please let me know if you need more information.


Best Regards,

Gabriel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-18  4:01       ` Gabriel C
@ 2008-03-18  4:24         ` Gabriel C
  2008-03-21 15:24         ` Gabriel C
  1 sibling, 0 replies; 41+ messages in thread
From: Gabriel C @ 2008-03-18  4:24 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, Gabriel C, Andi Kleen,
	Ingo Molnar

Gabriel C wrote:
> Gabriel C wrote:
>> Thomas Gleixner wrote:
>>> On Mon, 17 Mar 2008, Gabriel C wrote:
>>>>> Subject		: Clocksource tsc is always unstable with 2.6.25-* kernels and CONFIG_NO_HZ=y on my box
>>>>> Submitter	: Gabriel C <nix.or.die@googlemail.com>
>>>>> Date		: 2008-02-24 01:31 (22 days old)
>>>>> References	: http://lkml.org/lkml/2008/2/23/380
>>>>> 		  http://lkml.org/lkml/2008/2/24/281
>>>>> Handled-By	: Thomas Gleixner <tglx@linutronix.de>
>>>>>
>>>> Thomas do you want me to bisect ? 
>>> That'd be great.
>> Ok I'll start doing that later on today.
>>
> 
> I managed to bisect one of the bugs. I ran into some problems and had to use skip once because a revision didn't compile,
> but bisect still seems to have found the right commit. Sadly, it seems there are two different bugs.
> 
> Before starting the bisect I tested linux-next to be sure the bug(s) still exist, and since rc1 already has the problem
> I bisected 2.6.24 -> 2.6.25-rc1.
> 
> cat .git/refs/bisect/bad
> 1ada5cba6a0318f90e45b38557e7b5206a9cba38
> 
> git show 1ada5cba6a0318f90e45b38557e7b5206a9cba38
> commit 1ada5cba6a0318f90e45b38557e7b5206a9cba38
> Author: Andi Kleen <ak@suse.de>
> Date:   Wed Jan 30 13:30:02 2008 +0100
> 
>     clocksource: make clocksource watchdog cycle through online CPUs
> 
>     This way it checks if the clocks are synchronized between CPUs too.
>     This might be able to detect slowly drifting TSCs which only
>     go wrong over longer time.
> 
>     Signed-off-by: Andi Kleen <ak@suse.de>
>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> 
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index cabfa19..edd5ef8 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -142,8 +142,13 @@ static void clocksource_watchdog(unsigned long data)
>         }
> 
>         if (!list_empty(&watchdog_list)) {
> -               __mod_timer(&watchdog_timer,
> -                           watchdog_timer.expires + WATCHDOG_INTERVAL);
> +               /* Cycle through CPUs to check if the CPUs stay synchronized to
> +                * each other. */
> +               int next_cpu = next_cpu(raw_smp_processor_id(), cpu_online_map);
> +               if (next_cpu >= NR_CPUS)
> +                       next_cpu = first_cpu(cpu_online_map);
> +               watchdog_timer.expires += WATCHDOG_INTERVAL;
> +               add_timer_on(&watchdog_timer, next_cpu);
>         }
>         spin_unlock(&watchdog_lock);
>  }
> @@ -165,7 +170,7 @@ static void clocksource_check_watchdog(struct clocksource *cs)
>                 if (!started && watchdog) {
>                         watchdog_last = watchdog->read();
>                         watchdog_timer.expires = jiffies + WATCHDOG_INTERVAL;
> -                       add_timer(&watchdog_timer);
> +                       add_timer_on(&watchdog_timer, first_cpu(cpu_online_map));
>                 }
>         } else {
>                 if (cs->flags & CLOCK_SOURCE_IS_CONTINUOUS)
> @@ -186,7 +191,8 @@ static void clocksource_check_watchdog(struct clocksource *cs)
>                                 watchdog_last = watchdog->read();
>                                 watchdog_timer.expires =
>                                         jiffies + WATCHDOG_INTERVAL;
> -                               add_timer(&watchdog_timer);
> +                               add_timer_on(&watchdog_timer,
> +                                               first_cpu(cpu_online_map));
>                         }
>                 }
>         }
> 
> 
>  git bisect log
> git-bisect start
> # bad: [19af35546de68c872dcb687613e0902a602cb20e] Linux 2.6.25-rc1
> git-bisect bad 19af35546de68c872dcb687613e0902a602cb20e
> # good: [49914084e797530d9baaf51df9eda77babc98fa8] Linux 2.6.24
> git-bisect good 49914084e797530d9baaf51df9eda77babc98fa8
> # bad: [d2e626f45cc450c00f5f98a89b8b4c4ac3c9bf5f] x86: add PAGE_KERNEL_EXEC_NOCACHE
> git-bisect bad d2e626f45cc450c00f5f98a89b8b4c4ac3c9bf5f
> # good: [fb46990dba94866462e90623e183d02ec591cf8f] [NETFILTER]: nf_queue: remove unnecessary hook existance check
> git-bisect good fb46990dba94866462e90623e183d02ec591cf8f
> # good: [936722922f6d2366378de606a40c14f96915474d] [IPV4] fib_trie: compute size when needed
> git-bisect good 936722922f6d2366378de606a40c14f96915474d
> # bad: [ff14c6164bd532a6dc9025c07d3b562f839f00a9] x86: x86-64 ia32 ptrace pt_regs cleanup
> git-bisect bad ff14c6164bd532a6dc9025c07d3b562f839f00a9
> # good: [c087567d3ffb2c7c61e091982e6ca45478394f1a] SUNRPC: Remove the obsolete RPC_WAITQ macro
> git-bisect good c087567d3ffb2c7c61e091982e6ca45478394f1a
> # bad: [af7a78e9258ffcca681e080cbc857f854869144f] x86: move mce related declarations
> git-bisect bad af7a78e9258ffcca681e080cbc857f854869144f
> # good: [34f5b4662bf4b54f22b32ce76ce70eccd7ebc68a] SUNRPC: Don't bother changing the sigmask for asynchronous RPC calls
> git-bisect good 34f5b4662bf4b54f22b32ce76ce70eccd7ebc68a
> # bad: [83bd01024b1fdfc41d9b758e5669e80fca72df66] x86: protect against sigaltstack wraparound
> git-bisect bad 83bd01024b1fdfc41d9b758e5669e80fca72df66
> # good: [efd9ac8630e89b9ee7ce64008bd7783952374f37] time: fold __get_realtime_clock_ts() into getnstimeofday()
> git-bisect good efd9ac8630e89b9ee7ce64008bd7783952374f37
> # bad: [37a47db8d7f0f38dac5acf5a13abbc8f401707fa] x86: assign IRQs to HPET timers, fix
> git-bisect bad 37a47db8d7f0f38dac5acf5a13abbc8f401707fa
> # skip: [316da3b3fc8efa9a5d2c99e0d449f01ff38c6aba] x86: restrict PIT clocksource usage
> git-bisect skip 316da3b3fc8efa9a5d2c99e0d449f01ff38c6aba
> # bad: [4713e22ce81eb8b3353e16435362eb3d0ec95640] clocksource: add unregister function to disable unusable clocksources
> git-bisect bad 4713e22ce81eb8b3353e16435362eb3d0ec95640
> # bad: [1ada5cba6a0318f90e45b38557e7b5206a9cba38] clocksource: make clocksource watchdog cycle through online CPUs
> git-bisect bad 1ada5cba6a0318f90e45b38557e7b5206a9cba38
> # good: [1077f5a917b7c630231037826b344b2f7f5b903f] clocksource.c: use init_timer_deferrable for clocksource_watchdog
> git-bisect good 1077f5a917b7c630231037826b344b2f7f5b903f
> 
> 
> The revision I had to skip failed to build with:
> 
> arch/x86/kernel/i8253.c: In function 'init_pit_clocksource':
> arch/x86/kernel/i8253.c:207: error: implicit declaration of function 'is_hpet_enabled'
> make[1]: *** [arch/x86/kernel/i8253.o] Error 1
> make: *** [arch/x86/kernel] Error 2
> 
> If you tell me how to fix that, I'll restart the bisect from there, just in case.
> 
> 
> Reverting the commit on top of 2.6.25-rc1 fixes the 'TSC being unstable' problem, but it does not fix the hang
> when I boot with clocksource=acpi_pm, so that seems to have been introduced by a different commit.
> 
> I will try to bisect this hang as well, most probably on the weekend.
> 
> 
> I also reverted that commit on top of git head and a kernel is compiling right now. I'll let you know in a bit whether that worked out.

Worked out :)

git head with 1ada5cba6a0318f90e45b38557e7b5206a9cba38 reverted works here.

dmesg|grep clocksource
[    0.563915] Time: tsc clocksource has been installed.

uname -a
Linux lara 2.6.25-rc6-00014-gbde4f8f-dirty #2 SMP PREEMPT Tue Mar 18 04:48:53 CET 2008 i686 GNU/Linux
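
Besides grepping dmesg, the active clocksource can usually be read from sysfs
as well. A minimal sketch, assuming the usual
/sys/devices/system/clocksource/clocksource0 directory is present on the
running kernel; the function name is made up, and the directory is a parameter
only so the function can be pointed at a test fixture.

```shell
# clocksource_status [DIR]
# Prints the current and available clocksources found under DIR, which
# defaults to the usual sysfs location. Hypothetical helper for illustration.
clocksource_status() {
    dir=${1:-/sys/devices/system/clocksource/clocksource0}
    if [ -r "$dir/current_clocksource" ] && [ -r "$dir/available_clocksource" ]; then
        echo "current:   $(cat "$dir/current_clocksource")"
        echo "available: $(cat "$dir/available_clocksource")"
    else
        echo "clocksource sysfs files not found under $dir" >&2
        return 1
    fi
}

# Report the state of the running system, without failing if sysfs is absent.
clocksource_status || true
```

If the TSC has been marked unstable, the "current" line would be expected to
show a fallback such as hpet or acpi_pm instead of tsc.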


Gabriel 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-18  4:01       ` Gabriel C
  2008-03-18  4:24         ` Gabriel C
@ 2008-03-21 15:24         ` Gabriel C
  2008-03-21 16:26           ` Thomas Gleixner
  1 sibling, 1 reply; 41+ messages in thread
From: Gabriel C @ 2008-03-21 15:24 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, Gabriel C, Andi Kleen,
	Ingo Molnar

Gabriel C wrote:
> Gabriel C wrote:
>> Thomas Gleixner wrote:
>>> On Mon, 17 Mar 2008, Gabriel C wrote:
>>>>> Subject		: Clocksource tsc is always unstable with 2.6.25-* kernels and CONFIG_NO_HZ=y on my box
>>>>> Submitter	: Gabriel C <nix.or.die@googlemail.com>
>>>>> Date		: 2008-02-24 01:31 (22 days old)
>>>>> References	: http://lkml.org/lkml/2008/2/23/380
>>>>> 		  http://lkml.org/lkml/2008/2/24/281
>>>>> Handled-By	: Thomas Gleixner <tglx@linutronix.de>
>>>>>
>>>> Thomas do you want me to bisect ? 
>>> That'd be great.
>> Ok I'll start doing that later on today.
>>
> 
[ snip ]

> still hangs when I boot with clocksource=acpi_pm so that seems to be introduced in a different commit.
> 
> I will try to bisect this hang also , most probably on weekend.
> 

Correction on this one.

Current git head boots just fine with clocksource=acpi_pm here; I just don't know which commit fixed it.

Gabriel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-21 15:24         ` Gabriel C
@ 2008-03-21 16:26           ` Thomas Gleixner
  2008-03-21 16:46             ` Gabriel C
  0 siblings, 1 reply; 41+ messages in thread
From: Thomas Gleixner @ 2008-03-21 16:26 UTC (permalink / raw)
  To: Gabriel C
  Cc: Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, Gabriel C, Andi Kleen,
	Ingo Molnar

On Fri, 21 Mar 2008, Gabriel C wrote:
 
> > still hangs when I boot with clocksource=acpi_pm so that seems to
> > be introduced in a different commit.
> > 
> > I will try to bisect this hang also , most probably on weekend.
> > 
> 
> Correction on this one.
>
> Current git head boots just fine with clocksource=acpi_pm here , I
> just don't know which commit fixed it.

Hmm. Very dubious. I'm a bit afraid of self-healing problems. It would
be interesting to find the commit which unintentionally fixed the
acpi_pm timer problem.

Also, can you please reapply the reverted clocksource patch? I have
the feeling that the acpi_pm one was the real problem, which was
triggered by the modified watchdog.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-21 16:26           ` Thomas Gleixner
@ 2008-03-21 16:46             ` Gabriel C
  2008-03-21 18:11               ` Gabriel C
  0 siblings, 1 reply; 41+ messages in thread
From: Gabriel C @ 2008-03-21 16:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Gabriel C, Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, andi-bz, Ingo Molnar

Thomas Gleixner wrote:
> On Fri, 21 Mar 2008, Gabriel C wrote:
>  
>>> still hangs when I boot with clocksource=acpi_pm so that seems to
>>> be introduced in a different commit.
>>>
>>> I will try to bisect this hang also , most probably on weekend.
>>>
>> Correction on this one.
>>
>> Current git head boots just fine with clocksource=acpi_pm here , I
>> just don't know which commit fixed it.
> 
> Hmm. Very dubious. I'm a bit afraid of self healing problems. It would
> be interesting to find the commit which fixed the acpi_pm timer
> problem involuntarily.

I can try to find it.

> 
> Also, can you please reapply the reverted clocksource patch ? I have
> the feeling that the acpi_pm one was the real problem which was
> triggered by the modified watchdog.

Sure I can , will do so in some minutes and let you know.

> 
> Thanks,
> 
> 	tglx


Gabriel

* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-21 16:46             ` Gabriel C
@ 2008-03-21 18:11               ` Gabriel C
  2008-03-21 18:49                 ` Thomas Gleixner
  0 siblings, 1 reply; 41+ messages in thread
From: Gabriel C @ 2008-03-21 18:11 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Gabriel C, Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, andi-bz, Ingo Molnar

Gabriel C wrote:
> Thomas Gleixner wrote:
>> On Fri, 21 Mar 2008, Gabriel C wrote:
>>  
>>>> still hangs when I boot with clocksource=acpi_pm so that seems to
>>>> be introduced in a different commit.
>>>>
>>>> I will try to bisect this hang also , most probably on weekend.
>>>>
>>> Correction on this one.
>>>
>>> Current git head boots just fine with clocksource=acpi_pm here , I
>>> just don't know which commit fixed it.
>> Hmm. Very dubious. I'm a bit afraid of self healing problems. It would
>> be interesting to find the commit which fixed the acpi_pm timer
>> problem involuntarily.
> 
> I can try to find it.
> 
>> Also, can you please reapply the reverted clocksource patch ? I have
>> the feeling that the acpi_pm one was the real problem which was
>> triggered by the modified watchdog.
> 
> Sure I can , will do so in some minutes and let you know.

It took a bit longer, sorry, but I have more info now.

The acpi_pm issue was not related to that; I still get the problem.

Of course I can still try to find the commit which magically fixed acpi_pm if you really want.

It seems to break only when HT is enabled, and only on 2-socket motherboards
(at least the ones I own; I know it is old hardware, but it worked fine for me).

Results by configuration:

  second CPU disabled, HT enabled              : works
  both CPUs enabled, HT disabled               : works
  both CPUs enabled, HT enabled, maxcpus=2     : works
  both CPUs enabled, HT enabled                : breaks
  both CPUs enabled, HT enabled, maxcpus=3     : breaks

I also have another dual-socket (Socket 604) motherboard here with two 2.4 GHz
Xeons. Its storage controller is somewhat broken, but it is still good enough
for a quick test :) and I see the same thing.

Does that make any sense ?
 
All of that was tested on 2.6.25-rc6-00224-gae51801-dirty (dirty because I reverted the revert =) )

Gabriel

* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-21 18:11               ` Gabriel C
@ 2008-03-21 18:49                 ` Thomas Gleixner
  2008-03-21 19:23                   ` Gabriel C
  0 siblings, 1 reply; 41+ messages in thread
From: Thomas Gleixner @ 2008-03-21 18:49 UTC (permalink / raw)
  To: Gabriel C
  Cc: Gabriel C, Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, andi-bz, Ingo Molnar

On Fri, 21 Mar 2008, Gabriel C wrote:
> >> Also, can you please reapply the reverted clocksource patch ? I have
> >> the feeling that the acpi_pm one was the real problem which was
> >> triggered by the modified watchdog.
> > 
> > Sure I can , will do so in some minutes and let you know.
> 
> It took a bit longer sorry but I have more infos now.
> 
> The acpi_pm was not related to that I still get the problem.
> 
> Of course I still can try to find the commit which magically fixed acpi_pm if you really want.

Just if you are really bored. :) I would have asked if it had fixed
the TSC issue.
 
> It seems like it breaks only when you enable HT and only on 2 socket motherboards. 
> ( at least the ones I own , I know is old hardware but worked fine for me )

Hmm. I wonder why a dual socket board survives the initial sync test.
 
> Also disabling the second CPU and enabling HT works , enabling both
> CPUs and disabling HT works , booting with enabled HT and both CPUs
> but maxcpus=2 also works , booting with 2 CPUs and HT on breaks ,
> booting with both CPUs HT on but maxcpus=3 breaks also.
> 
> Also I have another dual motherboard here 604 socket with 2 2,4 GHz
> Xeon's.  The motherboard has the storage controller somewhat broken
> but for a quick test it is still good :) and I see the same thing.
> 
> Does that make any sense ?

Not really. Can you please revert the reverted revert again and run

http://people.redhat.com/mingo/time-warp-test/time-warp-test.c

on your machine with all CPUs and HT enabled ?

Thanks,
	tglx

* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-21 18:49                 ` Thomas Gleixner
@ 2008-03-21 19:23                   ` Gabriel C
  2008-03-21 20:55                     ` Gabriel C
  0 siblings, 1 reply; 41+ messages in thread
From: Gabriel C @ 2008-03-21 19:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Gabriel C, Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, andi-bz, Ingo Molnar

Thomas Gleixner wrote:
> On Fri, 21 Mar 2008, Gabriel C wrote:
>>>> Also, can you please reapply the reverted clocksource patch ? I have
>>>> the feeling that the acpi_pm one was the real problem which was
>>>> triggered by the modified watchdog.
>>> Sure I can , will do so in some minutes and let you know.
>> It took a bit longer sorry but I have more infos now.
>>
>> The acpi_pm was not related to that I still get the problem.
>>
>> Of course I still can try to find the commit which magically fixed acpi_pm if you really want.
> 
> Just if you are really bored. :) I would have asked if it had fixed
> the TSC issue.
>  
>> It seems like it breaks only when you enable HT and only on 2 socket motherboards. 
>> ( at least the ones I own , I know is old hardware but worked fine for me )
> 
> Hmm. I wonder why a dual socket board survives the initial sync test.
>  
>> Also disabling the second CPU and enabling HT works , enabling both
>> CPUs and disabling HT works , booting with enabled HT and both CPUs
>> but maxcpus=2 also works , booting with 2 CPUs and HT on breaks ,
>> booting with both CPUs HT on but maxcpus=3 breaks also.
>>
>> Also I have another dual motherboard here 604 socket with 2 2,4 GHz
>> Xeon's.  The motherboard has the storage controller somewhat broken
>> but for a quick test it is still good :) and I see the same thing.
>>
>> Does that make any sense ?
> 
> Not really. Can you please revert the reverted revert again and run
> 
> http://people.redhat.com/mingo/time-warp-test/time-warp-test.c
> 
> on your machine with all CPUs and HT enabled ?

Sure , doing so now.

> 
> Thanks,
> 	tglx


Gabriel

* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-21 19:23                   ` Gabriel C
@ 2008-03-21 20:55                     ` Gabriel C
  2008-03-21 21:15                       ` Thomas Gleixner
  0 siblings, 1 reply; 41+ messages in thread
From: Gabriel C @ 2008-03-21 20:55 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Gabriel C, Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, andi-bz, Ingo Molnar

Gabriel C wrote:
> Thomas Gleixner wrote:
>> On Fri, 21 Mar 2008, Gabriel C wrote:
>>>>> Also, can you please reapply the reverted clocksource patch ? I have
>>>>> the feeling that the acpi_pm one was the real problem which was
>>>>> triggered by the modified watchdog.
>>>> Sure I can , will do so in some minutes and let you know.
>>> It took a bit longer sorry but I have more infos now.
>>>
>>> The acpi_pm was not related to that I still get the problem.
>>>
>>> Of course I still can try to find the commit which magically fixed acpi_pm if you really want.
>> Just if you are really bored. :) I would have asked if it had fixed
>> the TSC issue.
>>  
>>> It seems like it breaks only when you enable HT and only on 2 socket motherboards. 
>>> ( at least the ones I own , I know is old hardware but worked fine for me )
>> Hmm. I wonder why a dual socket board survives the initial sync test.
>>  
>>> Also disabling the second CPU and enabling HT works , enabling both
>>> CPUs and disabling HT works , booting with enabled HT and both CPUs
>>> but maxcpus=2 also works , booting with 2 CPUs and HT on breaks ,
>>> booting with both CPUs HT on but maxcpus=3 breaks also.
>>>
>>> Also I have another dual motherboard here 604 socket with 2 2,4 GHz
>>> Xeon's.  The motherboard has the storage controller somewhat broken
>>> but for a quick test it is still good :) and I see the same thing.
>>>
>>> Does that make any sense ?
>> Not really. Can you please revert the reverted revert again and run
>>
>> http://people.redhat.com/mingo/time-warp-test/time-warp-test.c
>>
>> on your machine with all CPUs and HT enabled ?
> 
> Sure , doing so now.
> 

Here is the result on 2.6.25-rc6-00243-g028011e (it ran for 30+ minutes while I was away for food =) )

...

 4 CPUs, running 4 parallel test-tasks.
checking for time-warps via:
- read time stamp counter (RDTSC) instruction (cycle resolution)
- gettimeofday (TOD) syscall (usec resolution)
- clock_gettime(CLOCK_MONOTONIC) syscall (nsec resolution)

| 1.46 us, TSC-warps:0 | 16.01 us, TOD-warps:0 | 16.10 us, CLOCK-warps:0

...
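
A time-warp in the output above means a clock reading that is smaller than
a reading taken before it. The function below is only a sketch of that check
on an already collected sequence of samples. It is not code from
time-warp-test.c, and count_warps is a made-up name used for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count how often a sample is smaller than the sample observed just
 * before it. Equal readings are not warps: time may stand still at the
 * clock's resolution, it just must not run backwards.
 */
static size_t count_warps(const uint64_t *samples, size_t n)
{
	size_t warps = 0;

	assert(samples != NULL || n == 0);

	for (size_t i = 1; i < n; i++) {
		if (samples[i] < samples[i - 1])
			warps++;
	}
	return warps;
}
```

A run such as the one reported here would correspond to count_warps()
returning 0 for each of the three clocks.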

Gabriel

* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-21 20:55                     ` Gabriel C
@ 2008-03-21 21:15                       ` Thomas Gleixner
  2008-03-21 21:59                         ` Gabriel C
  0 siblings, 1 reply; 41+ messages in thread
From: Thomas Gleixner @ 2008-03-21 21:15 UTC (permalink / raw)
  To: Gabriel C
  Cc: Gabriel C, Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, andi-bz, Ingo Molnar

On Fri, 21 Mar 2008, Gabriel C wrote:
> >>> Does that make any sense ?
> >> Not really. Can you please revert the reverted revert again and run
> >>
> >> http://people.redhat.com/mingo/time-warp-test/time-warp-test.c
> >>
> >> on your machine with all CPUs and HT enabled ?
> > 
> > Sure , doing so now.
> > 
> 
> Here the result on 2.6.25-rc6-00243-g028011e ( it was running 30++
> minutes the time I was away for food =) )
> ...
> 
>  4 CPUs, running 4 parallel test-tasks.
> checking for time-warps via:
> - read time stamp counter (RDTSC) instruction (cycle resolution)
> - gettimeofday (TOD) syscall (usec resolution)
> - clock_gettime(CLOCK_MONOTONIC) syscall (nsec resolution)
> 
> | 1.46 us, TSC-warps:0 | 16.01 us, TOD-warps:0 | 16.10 us, CLOCK-warps:0

Amazing. I never found a multi socket box where the TSC's were in sync.

So the rotating watchdog triggers for a reason yet to be figured out.

Oh, now that the pm timer seems to work again, can you try the following:

apply the reverted patch again and let the box boot. At some point the
TSC is marked unstable and is replaced by acpi_pm clocksource.

What result does time-warp-test.c show in that situation ?

Thanks,
	tglx

* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-21 21:15                       ` Thomas Gleixner
@ 2008-03-21 21:59                         ` Gabriel C
  2008-03-21 22:09                           ` Thomas Gleixner
  0 siblings, 1 reply; 41+ messages in thread
From: Gabriel C @ 2008-03-21 21:59 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Gabriel C, Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, andi-bz, Ingo Molnar

Thomas Gleixner wrote:
> On Fri, 21 Mar 2008, Gabriel C wrote:
>>>>> Does that make any sense ?
>>>> Not really. Can you please revert the reverted revert again and run
>>>>
>>>> http://people.redhat.com/mingo/time-warp-test/time-warp-test.c
>>>>
>>>> on your machine with all CPUs and HT enabled ?
>>> Sure , doing so now.
>>>
>> Here the result on 2.6.25-rc6-00243-g028011e ( it was running 30++
>> minutes the time I was away for food =) )
>> ...
>>
>>  4 CPUs, running 4 parallel test-tasks.
>> checking for time-warps via:
>> - read time stamp counter (RDTSC) instruction (cycle resolution)
>> - gettimeofday (TOD) syscall (usec resolution)
>> - clock_gettime(CLOCK_MONOTONIC) syscall (nsec resolution)
>>
>> | 1.46 us, TSC-warps:0 | 16.01 us, TOD-warps:0 | 16.10 us, CLOCK-warps:0
> 
> Amazing. I never found a multi socket box where the TSC's were in sync.
> 
> So the rotating watchdog triggers for a yet to figure out reason.
> 
> Oh, now that the pm timer seems to work again, can you try the following:
> 
> apply the reverted patch again and let the box boot. At some point the
> TSC is marked unstable and is replaced by acpi_pm clocksource.
> 
> What result does timewarp.c show in that situation ?

Here it is , same kernel + Andi's patch :

./time-warp-test
4 CPUs, running 4 parallel test-tasks.
checking for time-warps via:
- read time stamp counter (RDTSC) instruction (cycle resolution)
- gettimeofday (TOD) syscall (usec resolution)
- clock_gettime(CLOCK_MONOTONIC) syscall (nsec resolution)

| 1.78 us, TSC-warps:0 | 19.27 us, TOD-warps:0 | 19.37 us, CLOCK-warps:0

> 
> Thanks,
> 	tglx


Gabriel

* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-21 21:59                         ` Gabriel C
@ 2008-03-21 22:09                           ` Thomas Gleixner
  2008-03-22 11:21                             ` Thomas Gleixner
  0 siblings, 1 reply; 41+ messages in thread
From: Thomas Gleixner @ 2008-03-21 22:09 UTC (permalink / raw)
  To: Gabriel C
  Cc: Gabriel C, Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, andi-bz, Ingo Molnar

On Fri, 21 Mar 2008, Gabriel C wrote:
> > So the rotating watchdog triggers for a yet to figure out reason.
> > 
> > Oh, now that the pm timer seems to work again, can you try the following:
> > 
> > apply the reverted patch again and let the box boot. At some point the
> > TSC is marked unstable and is replaced by acpi_pm clocksource.
> > 
> > What result does timewarp.c show in that situation ?
> 
> Here it is , same kernel + Andi's patch :
> 
> ./time-warp-test
> 4 CPUs, running 4 parallel test-tasks.
> checking for time-warps via:
> - read time stamp counter (RDTSC) instruction (cycle resolution)
> - gettimeofday (TOD) syscall (usec resolution)
> - clock_gettime(CLOCK_MONOTONIC) syscall (nsec resolution)
> 
> | 1.78 us, TSC-warps:0 | 19.27 us, TOD-warps:0 | 19.37 us, CLOCK-warps:0

Ok. So the watchdog trigger is a false positive. 

Thinking more about it, it looks like Andi's change triggers some
hidden bug in the combination of NO_HZ and add_timer_on(), where the
CPU on which the timer is added is likely in a long idle sleep. I will
look into this tomorrow.

Thanks for testing

       tglx

* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-21 22:09                           ` Thomas Gleixner
@ 2008-03-22 11:21                             ` Thomas Gleixner
  2008-03-22 13:34                               ` Gabriel C
  2008-03-22 14:25                               ` Andi Kleen
  0 siblings, 2 replies; 41+ messages in thread
From: Thomas Gleixner @ 2008-03-22 11:21 UTC (permalink / raw)
  To: Gabriel C
  Cc: Gabriel C, Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, andi-bz, Ingo Molnar

On Fri, 21 Mar 2008, Thomas Gleixner wrote:
> > 
> > | 1.78 us, TSC-warps:0 | 19.27 us, TOD-warps:0 | 19.37 us, CLOCK-warps:0
> 
> Ok. So the watchdog trigger is a false positive. 
> 
> Thinking more about it, it looks like Andi's change triggers some
> hidden bug in the combination of NO_HZ and add_timer_on(), where the
> CPU on which the timer is added is likely in a long idle sleep. I look
> into this tomorrow.

Ok. Here is what's happening:

CPU0 runs the watchdog timer and schedules it on CPU1.

With NO_HZ enabled CPU1 is in a long idle sleep. At this point of the
boot process there is probably no timer pending on CPU1, which means
the idle sleep is infinite.

Now some time later CPU1 gets woken by an interrupt/IPI and runs the
timer wheel. At this point the pm_timer which is the reference clock
has already wrapped around, so the watchdog thinks that there is a
huge time difference and marks the TSC unstable.

Aside from that watchdog issue this also affects the other users of
add_timer_on(): e.g. queue_delayed_work_on().
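
The wrap-around effect described above can be sketched in a few lines of
C. This is a simplified model, not the kernel's clocksource watchdog: it
assumes a 24-bit reference counter running at the ACPI PM timer frequency
of 3.579545 MHz (so it wraps roughly every 4.7 seconds), and the helper
names as well as the 62.5 ms threshold used in the note below are
illustrative assumptions:

```c
#include <assert.h>
#include <stdint.h>

#define PM_TIMER_HZ   3579545ULL   /* ACPI PM timer frequency */
#define PM_TIMER_MASK 0xFFFFFFULL  /* assumed 24-bit counter width */

/* Value the free-running counter shows elapsed_us microseconds after start. */
static uint64_t pm_read(uint64_t start, uint64_t elapsed_us)
{
	assert(start <= PM_TIMER_MASK);
	return (start + elapsed_us * PM_TIMER_HZ / 1000000ULL) & PM_TIMER_MASK;
}

/*
 * Masked delta in microseconds. This is only correct if less than one
 * full counter period passed between the two reads; anything longer is
 * silently reduced modulo the wrap-around period.
 */
static uint64_t pm_delta_us(uint64_t then, uint64_t now)
{
	return ((now - then) & PM_TIMER_MASK) * 1000000ULL / PM_TIMER_HZ;
}

/* Hypothetical comparison: do the two clocks disagree by more than the threshold? */
static int looks_unstable(uint64_t tsc_delta_us, uint64_t ref_delta_us,
			  uint64_t threshold_us)
{
	uint64_t diff = tsc_delta_us > ref_delta_us ?
			tsc_delta_us - ref_delta_us :
			ref_delta_us - tsc_delta_us;

	return diff > threshold_us;
}
```

With a 0.5 second gap between two checks, both clocks agree. With a 10
second gap (the delayed timer on the idle CPU), the masked reference delta
comes out well below one second while the TSC correctly reports 10
seconds, so looks_unstable() flags the TSC even though the TSC is the
clock that is right.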

Can you please apply the patch below and verify it with Andi's
watchdog patch applied ? 

Thanks,

	tglx

---
 include/linux/tick.h     |    4 ++++
 kernel/time/tick-sched.c |   30 ++++++++++++++++++++++++++++++
 kernel/timer.c           |   14 +++++++++++++-
 3 files changed, 47 insertions(+), 1 deletion(-)

Index: linux-2.6/include/linux/tick.h
===================================================================
--- linux-2.6.orig/include/linux/tick.h
+++ linux-2.6/include/linux/tick.h
@@ -111,6 +111,8 @@ extern void tick_nohz_update_jiffies(voi
 extern ktime_t tick_nohz_get_sleep_length(void);
 extern void tick_nohz_stop_idle(int cpu);
 extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
+extern int tick_nohz_cpu_needs_wakeup(int cpu);
+extern void tick_nohz_rescan_timers_on(int cpu);
 # else
 static inline void tick_nohz_stop_sched_tick(void) { }
 static inline void tick_nohz_restart_sched_tick(void) { }
@@ -123,6 +125,8 @@ static inline ktime_t tick_nohz_get_slee
 }
 static inline void tick_nohz_stop_idle(int cpu) { }
 static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return 0; }
+static inline int tick_nohz_cpu_needs_wakeup(int cpu) { return 0; }
+static inline void tick_nohz_rescan_timers_on(int cpu) { }
 # endif /* !NO_HZ */
 
 #endif
Index: linux-2.6/kernel/time/tick-sched.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-sched.c
+++ linux-2.6/kernel/time/tick-sched.c
@@ -183,6 +183,36 @@ u64 get_cpu_idle_time_us(int cpu, u64 *l
 }
 
 /**
+ * tick_nohz_cpu_needs_wakeup - check possible wakeup of cpu in add_timer_on()
+ *
+ * when add_timer_on() happens on a CPU which is in a long idle sleep,
+ * then we need to wake it up so the timer wheel gets reevaluated.
+ *
+ * Note: we use idle_cpu() which checks the idle state lockless, but
+ * we are ordered against the other cpu which might be on the way to
+ * idle by the timer base lock, which we hold.
+ */
+int tick_nohz_cpu_needs_wakeup(int cpu)
+{
+	return tick_nohz_enabled && idle_cpu(cpu) &&
+		(cpu != smp_processor_id());
+}
+
+/**
+ * tick_nohz_rescan_timers_on - reevaluate the idle sleep time of a CPU
+ *
+ * When a CPU is idle and a timer got added to this CPU timer wheel
+ * via add_timer_on() then we need to make sure that the CPU
+ * reevaluates the timer wheel. Otherwise the timer might be delayed
+ * for a real long time.
+ */
+void tick_nohz_rescan_timers_on(int cpu)
+{
+	if (tick_nohz_enabled && idle_cpu(cpu))
+		smp_send_reschedule(cpu);
+}
+
+/**
  * tick_nohz_stop_sched_tick - stop the idle tick from the idle task
  *
  * When the next event is more than a tick into the future, stop the idle tick
Index: linux-2.6/kernel/timer.c
===================================================================
--- linux-2.6.orig/kernel/timer.c
+++ linux-2.6/kernel/timer.c
@@ -445,15 +445,27 @@ void add_timer_on(struct timer_list *tim
 {
 	struct tvec_base *base = per_cpu(tvec_bases, cpu);
 	unsigned long flags;
+	int wakeidle;
 
 	timer_stats_timer_set_start_info(timer);
 	BUG_ON(timer_pending(timer) || !timer->function);
 	spin_lock_irqsave(&base->lock, flags);
 	timer_set_base(timer, base);
 	internal_add_timer(base, timer);
+	/*
+	 * Check whether the other CPU is idle and needs to be
+	 * triggered to reevaluate the timer wheel when nohz is
+	 * active. We are protected against the other CPU fiddling
+	 * with the timer by holding the timer base lock. This also
+	 * makes sure that a CPU on the way to idle can not evaluate
+	 * the timer wheel.
+	 */
+	wakeidle = tick_nohz_cpu_needs_wakeup(cpu);
 	spin_unlock_irqrestore(&base->lock, flags);
-}
 
+	if (wakeidle)
+		tick_nohz_rescan_timers_on(cpu);
+}
 
 /**
  * mod_timer - modify a timer's timeout

* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-22 11:21                             ` Thomas Gleixner
@ 2008-03-22 13:34                               ` Gabriel C
  2008-03-22 14:30                                 ` Thomas Gleixner
  2008-03-22 14:25                               ` Andi Kleen
  1 sibling, 1 reply; 41+ messages in thread
From: Gabriel C @ 2008-03-22 13:34 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Gabriel C, Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, andi-bz, Ingo Molnar

Thomas Gleixner wrote:
> On Fri, 21 Mar 2008, Thomas Gleixner wrote:
>>> | 1.78 us, TSC-warps:0 | 19.27 us, TOD-warps:0 | 19.37 us, CLOCK-warps:0
>> Ok. So the watchdog trigger is a false positive. 
>>
>> Thinking more about it, it looks like Andi's change triggers some
>> hidden bug in the combination of NO_HZ and add_timer_on(), where the
>> CPU on which the timer is added is likely in a long idle sleep. I look
>> into this tomorrow.
> 
> Ok. Here is what's happening:
> 
> CPU0 runs the watchdog timer and schedules it on CPU1.
> 
> With NO_HZ enabled CPU1 is in a long idle sleep. At this point of the
> boot process there is probably no timer pending on CPU1, which means
> the idle sleep is infinite.
> 
> Now some time later CPU1 gets woken by an interrupt/IPI and runs the
> timer wheel. At this point the pm_timer which is the reference clock
> has already wrapped around, so the watchdog thinks that there is a
> huge time difference and marks the TSC unstable.
> 
> Aside of that watchdog issue this also affects the other users of
> add_timer_on(): e.g. queue_delayed_work_on().
> 
> Can you please apply the patch below and verify it with Andi's
> watchdog patch applied ? 


Did that (git head + Andi's patch + your patch), but the TSC is still marked unstable.

> 
> Thanks,
> 
> 	tglx
> 


Gabriel

* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-22 11:21                             ` Thomas Gleixner
  2008-03-22 13:34                               ` Gabriel C
@ 2008-03-22 14:25                               ` Andi Kleen
  2008-03-22 14:41                                 ` Thomas Gleixner
  1 sibling, 1 reply; 41+ messages in thread
From: Andi Kleen @ 2008-03-22 14:25 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Gabriel C, Gabriel C, Rafael J. Wysocki, LKML, Adrian Bunk,
	Andrew Morton, Linus Torvalds, Natalie Protasevich, andi-bz,
	Ingo Molnar

> CPU0 runs the watchdog timer and schedules it on CPU1.
> 
> With NO_HZ enabled CPU1 is in a long idle sleep. At this point of the
> boot process there is probably no timer pending on CPU1, which means
> the idle sleep is infinite.
> 
> Now some time later CPU1 gets woken by an interrupt/IPI and runs the
> timer wheel. At this point the pm_timer which is the reference clock
> has already wrapped around, so the watchdog thinks that there is a

In my own original noidletick code I simply limited all sleeps
to less than the wrap-around period of the primary timer.  Wouldn't
something like that work?

In the case of the watchdog I guess it would need to be limited
to the wrap-around of multiple timers, at least all those that
are used by the watchdog.

I'm not sure just doing this for add_timer_on() only is correct.
After all it could affect any other code not run by add_timer_on()
couldn't it?

-Andi


* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-22 13:34                               ` Gabriel C
@ 2008-03-22 14:30                                 ` Thomas Gleixner
  2008-03-22 15:13                                   ` Gabriel C
  0 siblings, 1 reply; 41+ messages in thread
From: Thomas Gleixner @ 2008-03-22 14:30 UTC (permalink / raw)
  To: Gabriel C
  Cc: Gabriel C, Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, andi-bz, Ingo Molnar

On Sat, 22 Mar 2008, Gabriel C wrote:
> > Now some time later CPU1 gets woken by an interrupt/IPI and runs the
> > timer wheel. At this point the pm_timer which is the reference clock
> > has already wrapped around, so the watchdog thinks that there is a
> > huge time difference and marks the TSC unstable.
> > 
> > Aside of that watchdog issue this also affects the other users of
> > add_timer_on(): e.g. queue_delayed_work_on().
> > 
> > Can you please apply the patch below and verify it with Andi's
> > watchdog patch applied ? 
> 
> 
> Did that , git head , Andi's + your patch but TSC is still marked unstable.

Doh, stupid me. We do not reevaluate the timer wheel when we merely
wake up via the smp_reschedule IPI and the resched flag on the other
CPU is not set. That IPI uses a separate vector which does not go
through irq_enter() / irq_exit().

Does the patch below solve the problem ?

Thanks,

	tglx

---
 include/linux/tick.h     |    4 +++
 kernel/time/tick-sched.c |   50 +++++++++++++++++++++++++++++++++++++++++++++++
 kernel/timer.c           |   14 ++++++++++++-
 3 files changed, 67 insertions(+), 1 deletion(-)

Index: linux-2.6/include/linux/tick.h
===================================================================
--- linux-2.6.orig/include/linux/tick.h
+++ linux-2.6/include/linux/tick.h
@@ -111,6 +111,8 @@ extern void tick_nohz_update_jiffies(voi
 extern ktime_t tick_nohz_get_sleep_length(void);
 extern void tick_nohz_stop_idle(int cpu);
 extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
+extern int tick_nohz_cpu_needs_wakeup(int cpu);
+extern void tick_nohz_rescan_timers_on(int cpu);
 # else
 static inline void tick_nohz_stop_sched_tick(void) { }
 static inline void tick_nohz_restart_sched_tick(void) { }
@@ -123,6 +125,8 @@ static inline ktime_t tick_nohz_get_slee
 }
 static inline void tick_nohz_stop_idle(int cpu) { }
 static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return 0; }
+static inline int tick_nohz_cpu_needs_wakeup(int cpu) { return 0; }
+static inline void tick_nohz_rescan_timers_on(int cpu) { }
 # endif /* !NO_HZ */
 
 #endif
Index: linux-2.6/kernel/time/tick-sched.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-sched.c
+++ linux-2.6/kernel/time/tick-sched.c
@@ -183,6 +183,56 @@ u64 get_cpu_idle_time_us(int cpu, u64 *l
 }
 
 /**
+ * tick_nohz_cpu_needs_wakeup - check possible wakeup of cpu in add_timer_on()
+ *
+ * when add_timer_on() happens on a CPU which is in a long idle sleep,
+ * then we need to wake it up so the timer wheel gets reevaluated.
+ *
+ * Note: we use idle_cpu() which checks the idle state lockless, but
+ * we are ordered against the other cpu which might be on the way to
+ * idle by the timer base lock, which we hold.
+ */
+int tick_nohz_cpu_needs_wakeup(int cpu)
+{
+	return tick_nohz_enabled && idle_cpu(cpu) &&
+		(cpu != smp_processor_id());
+}
+
+/*
+ * Rescan the timer wheel, when
+ *
+ * - the CPU is idle
+ * - the CPU is not processing an interrupt
+ * - the need_resched flag is off
+ */
+static void tick_nohz_rescan_timers(void *unused)
+{
+	int cpu = smp_processor_id();
+
+	if (!idle_cpu(cpu) || in_interrupt() || need_resched())
+		return;
+
+	tick_nohz_stop_idle(cpu);
+	tick_nohz_update_jiffies();
+	tick_nohz_stop_sched_tick();
+}
+
+/**
+ * tick_nohz_rescan_timers_on - reevaluate the idle sleep time of a CPU
+ *
+ * When a CPU is idle and a timer got added to this CPU timer wheel
+ * via add_timer_on() then we need to make sure that the CPU
+ * reevaluates the timer wheel. Otherwise the timer might be delayed
+ * for a real long time.
+ */
+void tick_nohz_rescan_timers_on(int cpu)
+{
+	if (tick_nohz_enabled && idle_cpu(cpu))
+		smp_call_function_single(cpu, tick_nohz_rescan_timers, NULL,
+					 0, 0);
+}
+
+/**
  * tick_nohz_stop_sched_tick - stop the idle tick from the idle task
  *
  * When the next event is more than a tick into the future, stop the idle tick
Index: linux-2.6/kernel/timer.c
===================================================================
--- linux-2.6.orig/kernel/timer.c
+++ linux-2.6/kernel/timer.c
@@ -445,15 +445,27 @@ void add_timer_on(struct timer_list *tim
 {
 	struct tvec_base *base = per_cpu(tvec_bases, cpu);
 	unsigned long flags;
+	int wakeidle;
 
 	timer_stats_timer_set_start_info(timer);
 	BUG_ON(timer_pending(timer) || !timer->function);
 	spin_lock_irqsave(&base->lock, flags);
 	timer_set_base(timer, base);
 	internal_add_timer(base, timer);
+	/*
+	 * Check whether the other CPU is idle and needs to be
+	 * triggered to reevaluate the timer wheel when nohz is
+	 * active. We are protected against the other CPU fiddling
+	 * with the timer by holding the timer base lock. This also
+	 * makes sure that a CPU on the way to idle can not evaluate
+	 * the timer wheel.
+	 */
+	wakeidle = tick_nohz_cpu_needs_wakeup(cpu);
 	spin_unlock_irqrestore(&base->lock, flags);
-}
 
+	if (wakeidle)
+		tick_nohz_rescan_timers_on(cpu);
+}
 
 /**
  * mod_timer - modify a timer's timeout

* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-22 14:25                               ` Andi Kleen
@ 2008-03-22 14:41                                 ` Thomas Gleixner
  0 siblings, 0 replies; 41+ messages in thread
From: Thomas Gleixner @ 2008-03-22 14:41 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Gabriel C, Gabriel C, Rafael J. Wysocki, LKML, Adrian Bunk,
	Andrew Morton, Linus Torvalds, Natalie Protasevich, andi-bz,
	Ingo Molnar

On Sat, 22 Mar 2008, Andi Kleen wrote:
> > CPU0 runs the watchdog timer and schedules it on CPU1.
> > 
> > With NO_HZ enabled CPU1 is in a long idle sleep. At this point of the
> > boot process there is probably no timer pending on CPU1, which means
> > the idle sleep is infinite.
> > 
> > Now some time later CPU1 gets woken by an interrupt/IPI and runs the
> > timer wheel. At this point the pm_timer which is the reference clock
> > has already wrapped around, so the watchdog thinks that there is a
> 
> In my old original own noidletick code I simply limited all sleeps 
> to below the wrap around of the primary timer.  Wouldn't something 
> like that work?

No, it does not solve the real problem of not reevaluating the timer
wheel on the idle CPU when a timer gets added from some other CPU. We
would paper over the watchdog issue, but postponing a timer event
which was added cross-CPU to some artificial expiry time is simply
wrong.

> I'm not sure just doing this for add_timer_on() only is correct.
> After all it could affect any other code not run by add_timer_on()
> couldn't it?

No, it's limited to add_timer_on() simply because no other code can
add a new timer (timer_list or hrtimer) which modifies the next event
on another CPU. There is also the rare case, when one CPU runs the
timer callback and the other one modifies the timer, but that's not
relevant for the NOHZ problem because the CPU which runs the callback
is not idle at this point.

All other timer operations are CPU local and reevaluated before the
CPU goes idle again.
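
The difference between the CPU-local case and the add_timer_on() case
can be modelled in a few lines. This is only a toy model under stated
assumptions: struct cpu_model and the functions enter_idle(),
add_timer_remote() and kick() are hypothetical and do not correspond to
kernel interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

#define NO_TIMER UINT64_MAX	/* "sleep forever" */

/* Hypothetical per-CPU state, for illustration only. */
struct cpu_model {
	bool idle;
	uint64_t next_timer;	/* earliest expiry in this CPU's wheel */
	uint64_t programmed;	/* wakeup event chosen when going idle */
};

/* Going idle evaluates the wheel once and programs the next event. */
static void enter_idle(struct cpu_model *c)
{
	c->idle = true;
	c->programmed = c->next_timer;
}

/*
 * Another CPU enqueues a timer here. Returns true when the target is
 * idle and its programmed event is now later than the new expiry, in
 * which case the target has to be kicked.
 */
static bool add_timer_remote(struct cpu_model *c, uint64_t expires)
{
	if (expires < c->next_timer)
		c->next_timer = expires;
	return c->idle && c->next_timer < c->programmed;
}

/* The kick makes the CPU leave idle and reevaluate on the way back. */
static void kick(struct cpu_model *c)
{
	c->idle = false;
	enter_idle(c);
}
```

In this model a CPU that went idle with an empty wheel keeps
programmed == NO_TIMER after a remote add until kick() runs; a timer
added by the CPU itself never hits that window, because the CPU is not
idle while it runs the add and calls enter_idle() again afterwards.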

Thanks,

	tglx


* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-22 14:30                                 ` Thomas Gleixner
@ 2008-03-22 15:13                                   ` Gabriel C
  2008-03-22 16:32                                     ` Thomas Gleixner
  0 siblings, 1 reply; 41+ messages in thread
From: Gabriel C @ 2008-03-22 15:13 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Gabriel C, Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, andi-bz, Ingo Molnar

Thomas Gleixner wrote:
> On Sat, 22 Mar 2008, Gabriel C wrote:
>>> Now some time later CPU1 gets woken by an interrupt/IPI and runs the
>>> timer wheel. At this point the pm_timer which is the reference clock
>>> has already wrapped around, so the watchdog thinks that there is a
>>> huge time difference and marks the TSC unstable.
>>>
>>> Aside of that watchdog issue this also affects the other users of
>>> add_timer_on(): e.g. queue_delayed_work_on().
>>>
>>> Can you please apply the patch below and verify it with Andi's
>>> watchdog patch applied ? 
>>
>> Did that , git head , Andi's + your patch but TSC is still marked unstable.
> 
> Doh, stupid me. We do not reevaluate the timer wheel, when we just
> wake up via the smp_reschedule IPI when the resched flag on the other
> CPU is not set. That's a separate vector which is not going through
> irq_enter() / irq_exit(). 
> 
> Does the patch below solve the problem ?

With this one TSC is fine but now I get a warning on boot :

..

[    0.041037] ------------[ cut here ]------------
[    0.041052] WARNING: at arch/x86/kernel/smp_32.c:562 native_smp_call_function_mask+0x23/0x11e()
[    0.041074] Modules linked in:
[    0.041087] Pid: 1, comm: swapper Not tainted 2.6.25-rc6-00243-g028011e-dirty #12
[    0.041107]  [<c011b51c>] warn_on_slowpath+0x40/0x65
[    0.041128]  [<c012b543>] autoremove_wake_function+0xd/0x2d
[    0.041148]  [<c033f28b>] schedule_timeout+0x13/0x99
[    0.041167]  [<c011690d>] __wake_up+0x29/0x39
[    0.041182]  [<c011690d>] __wake_up+0x29/0x39
[    0.041197]  [<c0128769>] call_usermodehelper_exec+0x97/0xa2
[    0.041214]  [<c010dff4>] native_smp_call_function_mask+0x23/0x11e
[    0.041233]  [<c01d4a66>] kobject_uevent_env+0x346/0x368
[    0.041251]  [<c010e46d>] smp_call_function_single+0x50/0x6f
[    0.041268]  [<c01336d2>] tick_nohz_rescan_timers_on+0x27/0x2b
[    0.041287]  [<c013109f>] clocksource_register+0x162/0x174
[    0.041306]  [<c0436203>] kernel_init+0x126/0x25e
[    0.041322]  [<c011943d>] schedule_tail+0x17/0x44
[    0.041337]  [<c0103c7a>] ret_from_fork+0x6/0x1c
[    0.041353]  [<c04360dd>] kernel_init+0x0/0x25e
[    0.041367]  [<c04360dd>] kernel_init+0x0/0x25e
[    0.041381]  [<c01049ab>] kernel_thread_helper+0x7/0x10
[    0.041397]  =======================
[    0.041417] ---[ end trace ca143223eefdc828 ]---     

..

Full dmesg there ->    http://frugalware.org/~crazy/dmesg/dmesg_tsc  

                                                                                                
>
> Thanks,
> 
> 	tglx
> 

Gabriel


* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-22 15:13                                   ` Gabriel C
@ 2008-03-22 16:32                                     ` Thomas Gleixner
  2008-03-22 21:55                                       ` Thomas Gleixner
  0 siblings, 1 reply; 41+ messages in thread
From: Thomas Gleixner @ 2008-03-22 16:32 UTC (permalink / raw)
  To: Gabriel C
  Cc: Gabriel C, Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, andi-bz, Ingo Molnar

On Sat, 22 Mar 2008, Gabriel C wrote:
> Thomas Gleixner wrote:
> > On Sat, 22 Mar 2008, Gabriel C wrote:
> >>> Now some time later CPU1 gets woken by an interrupt/IPI and runs the
> >>> timer wheel. At this point the pm_timer which is the reference clock
> >>> has already wrapped around, so the watchdog thinks that there is a
> >>> huge time difference and marks the TSC unstable.
> >>>
> >>> Aside of that watchdog issue this also affects the other users of
> >>> add_timer_on(): e.g. queue_delayed_work_on().
> >>>
> >>> Can you please apply the patch below and verify it with Andi's
> >>> watchdog patch applied ? 
> >>
> >> Did that , git head , Andi's + your patch but TSC is still marked unstable.
> > 
> > Doh, stupid me. We do not reevaluate the timer wheel, when we just
> > wake up via the smp_reschedule IPI when the resched flag on the other
> > CPU is not set. That's a separate vector which is not going through
> > irq_enter() / irq_exit(). 
> > 
> > Does the patch below solve the problem ?
> 
> With this one TSC is fine but now I get a warning on boot :

Good. It confirms my assumptions about the root cause.

> [    0.041037] ------------[ cut here ]------------
> [    0.041052] WARNING: at arch/x86/kernel/smp_32.c:562 native_smp_call_function_mask+0x23/0x11e()

Grr. I'll work out a solution for that one.

Thanks,

	tglx


* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-22 16:32                                     ` Thomas Gleixner
@ 2008-03-22 21:55                                       ` Thomas Gleixner
  2008-03-22 22:41                                         ` Gabriel C
  0 siblings, 1 reply; 41+ messages in thread
From: Thomas Gleixner @ 2008-03-22 21:55 UTC (permalink / raw)
  To: Gabriel C
  Cc: Gabriel C, Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, andi-bz, Ingo Molnar

On Sat, 22 Mar 2008, Thomas Gleixner wrote:
> On Sat, 22 Mar 2008, Gabriel C wrote:
> > With this one TSC is fine but now I get a warning on boot :
> 
> Good. It confirms my assumptions about the root cause.
> 
> > [    0.041037] ------------[ cut here ]------------
> > [    0.041052] WARNING: at arch/x86/kernel/smp_32.c:562 native_smp_call_function_mask+0x23/0x11e()
> 
> Grr. I'll work out a solution for that one.

Gabriel,

I'm happy to rack your nerves some more.

After discussing the issue with Peter and Ingo the following solution
seems to be the one which is the least intrusive. 

Can you please give it a test ride ?

Thanks,

	tglx
---
 include/linux/sched.h |    6 ++++++
 kernel/sched.c        |   42 ++++++++++++++++++++++++++++++++++++++++++
 kernel/timer.c        |   10 +++++++++-
 3 files changed, 57 insertions(+), 1 deletion(-)

Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -1541,6 +1541,12 @@ static inline void idle_task_exit(void) 
 
 extern void sched_idle_next(void);
 
+#ifdef CONFIG_NO_HZ
+extern void wake_up_idle_cpu(int cpu);
+#else
+static inline void wake_up_idle_cpu(int cpu) { }
+#endif
+
 #ifdef CONFIG_SCHED_DEBUG
 extern unsigned int sysctl_sched_latency;
 extern unsigned int sysctl_sched_min_granularity;
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -848,6 +848,48 @@ static inline void resched_task(struct t
 	__resched_task(p, TIF_NEED_RESCHED);
 }
 
+#ifdef CONFIG_NO_HZ
+/*
+ * When add_timer_on() enqueues a timer into the timer wheel of an
+ * idle CPU then this timer might expire before the next timer event
+ * which is scheduled to wake up that CPU. In case of a completely
+ * idle system the next event might even be infinite time into the
+ * future. wake_up_idle_cpu() ensures that the CPU is woken up and
+ * leaves the inner idle loop so the newly added timer is taken into
+ * account when the CPU goes back to idle and evaluates the timer
+ * wheel for the next timer event.
+ */
+void wake_up_idle_cpu(int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+
+	if (cpu == smp_processor_id())
+		return;
+
+	/*
+	 * This is safe, as this function is called with the timer
+	 * wheel base lock of (cpu) held. When the CPU is on the way
+	 * to idle and has not yet set rq->curr to idle then it will
+	 * be serialized on the timer wheel base lock and take the new
+	 * timer into account automatically.
+	 */
+	if (rq->curr != rq->idle)
+		return;
+
+	/*
+	 * We can set TIF_NEED_RESCHED on the idle task of the other CPU
+	 * lockless. The worst case is that the other CPU runs the
+	 * idle task through an additional NOOP schedule()
+	 */
+	set_tsk_thread_flag(rq->idle, TIF_NEED_RESCHED);
+
+	/* NEED_RESCHED must be visible before we test polling */
+	smp_mb();
+	if (!tsk_is_polling(rq->idle))
+		smp_send_reschedule(cpu);
+}
+#endif
+
 #ifdef CONFIG_SCHED_HRTICK
 /*
  * Use HR-timers to deliver accurate preemption points.
Index: linux-2.6/kernel/timer.c
===================================================================
--- linux-2.6.orig/kernel/timer.c
+++ linux-2.6/kernel/timer.c
@@ -451,10 +451,18 @@ void add_timer_on(struct timer_list *tim
 	spin_lock_irqsave(&base->lock, flags);
 	timer_set_base(timer, base);
 	internal_add_timer(base, timer);
+	/*
+	 * Check whether the other CPU is idle and needs to be
+	 * triggered to reevaluate the timer wheel when nohz is
+	 * active. We are protected against the other CPU fiddling
+	 * with the timer by holding the timer base lock. This also
+	 * makes sure that a CPU on the way to idle can not evaluate
+	 * the timer wheel.
+	 */
+	wake_up_idle_cpu(cpu);
 	spin_unlock_irqrestore(&base->lock, flags);
 }
 
-
 /**
  * mod_timer - modify a timer's timeout
  * @timer: the timer to be modified
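
The ordering in wake_up_idle_cpu() (set the flag, full barrier, then
test polling) can be tried out in user space. The following is a rough
model using C11 atomics, not the kernel implementation: struct
idle_task_model, send_ipi_model() and the ipis_sent counter are
hypothetical stand-ins for the idle task's thread flags and
smp_send_reschedule().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the idle task's flags. */
struct idle_task_model {
	atomic_bool need_resched;	/* models TIF_NEED_RESCHED */
	atomic_bool polling;		/* models the polling-idle state */
};

static int ipis_sent;			/* counts modelled reschedule IPIs */

static void send_ipi_model(void)
{
	ipis_sent++;
}

static void wake_idle_model(struct idle_task_model *t)
{
	/* Lockless store; worst case is one extra pass through schedule(). */
	atomic_store_explicit(&t->need_resched, true, memory_order_relaxed);

	/* The flag must be visible before the polling state is tested. */
	atomic_thread_fence(memory_order_seq_cst);

	/* A polling idle loop notices the flag itself; otherwise interrupt. */
	if (!atomic_load_explicit(&t->polling, memory_order_relaxed))
		send_ipi_model();
}
```

In this model a polling idle task gets only the flag, while a
non-polling one additionally gets exactly one modelled IPI.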


* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-22 21:55                                       ` Thomas Gleixner
@ 2008-03-22 22:41                                         ` Gabriel C
  2008-03-23 11:00                                           ` Gabriel C
  0 siblings, 1 reply; 41+ messages in thread
From: Gabriel C @ 2008-03-22 22:41 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Gabriel C, Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, andi-bz, Ingo Molnar

Thomas Gleixner wrote:
> On Sat, 22 Mar 2008, Thomas Gleixner wrote:
>> On Sat, 22 Mar 2008, Gabriel C wrote:
>>> With this one TSC is fine but now I get a warning on boot :
>> Good. It confirms my assumptions about the root cause.
>>
>>> [    0.041037] ------------[ cut here ]------------
>>> [    0.041052] WARNING: at arch/x86/kernel/smp_32.c:562 native_smp_call_function_mask+0x23/0x11e()
>> Grr. I'll work out a solution for that one.
> 
> Gabriel,
> 
> I'm happy to rack your nerves some more.

No worries :) 

> 
> After discussing the issue with Peter and Ingo the following solution
> seems to be the one which is the least intrusive. 
> 
> Can you please give it a test ride ?

Done , git head + Andi's patch + this version of your patch does work here.

Also time-warp-test is just fine and everything else seems to work.


> ---
>  include/linux/sched.h |    6 ++++++
>  kernel/sched.c        |   42 ++++++++++++++++++++++++++++++++++++++++++
>  kernel/timer.c        |   10 +++++++++-
>  3 files changed, 57 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6/include/linux/sched.h
> ===================================================================
> --- linux-2.6.orig/include/linux/sched.h
> +++ linux-2.6/include/linux/sched.h
> @@ -1541,6 +1541,12 @@ static inline void idle_task_exit(void) 
>  
>  extern void sched_idle_next(void);
>  
> +#ifdef CONFIG_NO_HZ
> +extern void wake_up_idle_cpu(int cpu);
> +#else
> +static inline void wake_up_idle_cpu(int cpu) { }
> +#endif
> +
>  #ifdef CONFIG_SCHED_DEBUG
>  extern unsigned int sysctl_sched_latency;
>  extern unsigned int sysctl_sched_min_granularity;
> Index: linux-2.6/kernel/sched.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched.c
> +++ linux-2.6/kernel/sched.c
> @@ -848,6 +848,48 @@ static inline void resched_task(struct t
>  	__resched_task(p, TIF_NEED_RESCHED);
>  }
>  
> +#ifdef CONFIG_NO_HZ
> +/*
> + * When add_timer_on() enqueues a timer into the timer wheel of an
> + * idle CPU then this timer might expire before the next timer event
> + * which is scheduled to wake up that CPU. In case of a completely
> + * idle system the next event might even be infinite time into the
> + * future. wake_up_idle_cpu() ensures that the CPU is woken up and
> + * leaves the inner idle loop so the newly added timer is taken into
> + * account when the CPU goes back to idle and evaluates the timer
> + * wheel for the next timer event.
> + */
> +void wake_up_idle_cpu(int cpu)
> +{
> +	struct rq *rq = cpu_rq(cpu);
> +
> +	if (cpu == smp_processor_id())
> +		return;
> +
> +	/*
> +	 * This is safe, as this function is called with the timer
> +	 * wheel base lock of (cpu) held. When the CPU is on the way
> +	 * to idle and has not yet set rq->curr to idle then it will
> +	 * be serialized on the timer wheel base lock and take the new
> +	 * timer into account automatically.
> +	 */
> +	if (rq->curr != rq->idle)
> +		return;
> +
> +	/*
> +	 * We can set TIF_NEED_RESCHED on the idle task of the other CPU
> +	 * lockless. The worst case is that the other CPU runs the
> +	 * idle task through an additional NOOP schedule()
> +	 */
> +	set_tsk_thread_flag(rq->idle, TIF_NEED_RESCHED);
> +
> +	/* NEED_RESCHED must be visible before we test polling */
> +	smp_mb();
> +	if (!tsk_is_polling(rq->idle))
> +		smp_send_reschedule(cpu);
> +}
> +#endif
> +
>  #ifdef CONFIG_SCHED_HRTICK
>  /*
>   * Use HR-timers to deliver accurate preemption points.
> Index: linux-2.6/kernel/timer.c
> ===================================================================
> --- linux-2.6.orig/kernel/timer.c
> +++ linux-2.6/kernel/timer.c
> @@ -451,10 +451,18 @@ void add_timer_on(struct timer_list *tim
>  	spin_lock_irqsave(&base->lock, flags);
>  	timer_set_base(timer, base);
>  	internal_add_timer(base, timer);
> +	/*
> +	 * Check whether the other CPU is idle and needs to be
> +	 * triggered to reevaluate the timer wheel when nohz is
> +	 * active. We are protected against the other CPU fiddling
> +	 * with the timer by holding the timer base lock. This also
> +	 * makes sure that a CPU on the way to idle can not evaluate
> +	 * the timer wheel.
> +	 */
> +	wake_up_idle_cpu(cpu);
>  	spin_unlock_irqrestore(&base->lock, flags);
>  }
>  
> -
>  /**
>   * mod_timer - modify a timer's timeout
>   * @timer: the timer to be modified


* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-22 22:41                                         ` Gabriel C
@ 2008-03-23 11:00                                           ` Gabriel C
  2008-03-23 23:31                                             ` Gabriel C
  0 siblings, 1 reply; 41+ messages in thread
From: Gabriel C @ 2008-03-23 11:00 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Gabriel C, Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, andi-bz, Ingo Molnar

Gabriel C wrote:
> Thomas Gleixner wrote:
>> On Sat, 22 Mar 2008, Thomas Gleixner wrote:
>>> On Sat, 22 Mar 2008, Gabriel C wrote:
>>>> With this one TSC is fine but now I get a warning on boot :
>>> Good. It confirms my assumptions about the root cause.
>>>
>>>> [    0.041037] ------------[ cut here ]------------
>>>> [    0.041052] WARNING: at arch/x86/kernel/smp_32.c:562 native_smp_call_function_mask+0x23/0x11e()
>>> Grr. I'll work out a solution for that one.
>> Gabriel,
>>
>> I'm happy to rack your nerves some more.
> 
> No worries :) 
> 
>> After discussing the issue with Peter and Ingo the following solution
>> seems to be the one which is the least intrusive. 
>>
>> Can you please give it a test ride ?
> 
> Done , git head + Andi's patch + this version of your patch does work here.
> 
> Also time-warp-test is just fine and everything else seems to work.

Also, I've tested with my other motherboard and it is fine too :)

Feel free to add my Tested-by when you push this patch.

> 
> 
>> ---
>>  include/linux/sched.h |    6 ++++++
>>  kernel/sched.c        |   42 ++++++++++++++++++++++++++++++++++++++++++
>>  kernel/timer.c        |   10 +++++++++-
>>  3 files changed, 57 insertions(+), 1 deletion(-)
>>
>> Index: linux-2.6/include/linux/sched.h
>> ===================================================================
>> --- linux-2.6.orig/include/linux/sched.h
>> +++ linux-2.6/include/linux/sched.h
>> @@ -1541,6 +1541,12 @@ static inline void idle_task_exit(void) 
>>  
>>  extern void sched_idle_next(void);
>>  
>> +#ifdef CONFIG_NO_HZ
>> +extern void wake_up_idle_cpu(int cpu);
>> +#else
>> +static inline void wake_up_idle_cpu(int cpu) { }
>> +#endif
>> +
>>  #ifdef CONFIG_SCHED_DEBUG
>>  extern unsigned int sysctl_sched_latency;
>>  extern unsigned int sysctl_sched_min_granularity;
>> Index: linux-2.6/kernel/sched.c
>> ===================================================================
>> --- linux-2.6.orig/kernel/sched.c
>> +++ linux-2.6/kernel/sched.c
>> @@ -848,6 +848,48 @@ static inline void resched_task(struct t
>>  	__resched_task(p, TIF_NEED_RESCHED);
>>  }
>>  
>> +#ifdef CONFIG_NO_HZ
>> +/*
>> + * When add_timer_on() enqueues a timer into the timer wheel of an
>> + * idle CPU then this timer might expire before the next timer event
>> + * which is scheduled to wake up that CPU. In case of a completely
>> + * idle system the next event might even be infinite time into the
>> + * future. wake_up_idle_cpu() ensures that the CPU is woken up and
>> + * leaves the inner idle loop so the newly added timer is taken into
>> + * account when the CPU goes back to idle and evaluates the timer
>> + * wheel for the next timer event.
>> + */
>> +void wake_up_idle_cpu(int cpu)
>> +{
>> +	struct rq *rq = cpu_rq(cpu);
>> +
>> +	if (cpu == smp_processor_id())
>> +		return;
>> +
>> +	/*
>> +	 * This is safe, as this function is called with the timer
>> +	 * wheel base lock of (cpu) held. When the CPU is on the way
>> +	 * to idle and has not yet set rq->curr to idle then it will
>> +	 * be serialized on the timer wheel base lock and take the new
>> +	 * timer into account automatically.
>> +	 */
>> +	if (rq->curr != rq->idle)
>> +		return;
>> +
>> +	/*
>> +	 * We can set TIF_NEED_RESCHED on the idle task of the other CPU
>> +	 * lockless. The worst case is that the other CPU runs the
>> +	 * idle task through an additional NOOP schedule()
>> +	 */
>> +	set_tsk_thread_flag(rq->idle, TIF_NEED_RESCHED);
>> +
>> +	/* NEED_RESCHED must be visible before we test polling */
>> +	smp_mb();
>> +	if (!tsk_is_polling(rq->idle))
>> +		smp_send_reschedule(cpu);
>> +}
>> +#endif
>> +
>>  #ifdef CONFIG_SCHED_HRTICK
>>  /*
>>   * Use HR-timers to deliver accurate preemption points.
>> Index: linux-2.6/kernel/timer.c
>> ===================================================================
>> --- linux-2.6.orig/kernel/timer.c
>> +++ linux-2.6/kernel/timer.c
>> @@ -451,10 +451,18 @@ void add_timer_on(struct timer_list *tim
>>  	spin_lock_irqsave(&base->lock, flags);
>>  	timer_set_base(timer, base);
>>  	internal_add_timer(base, timer);
>> +	/*
>> +	 * Check whether the other CPU is idle and needs to be
>> +	 * triggered to reevaluate the timer wheel when nohz is
>> +	 * active. We are protected against the other CPU fiddling
>> +	 * with the timer by holding the timer base lock. This also
>> +	 * makes sure that a CPU on the way to idle can not evaluate
>> +	 * the timer wheel.
>> +	 */
>> +	wake_up_idle_cpu(cpu);
>>  	spin_unlock_irqrestore(&base->lock, flags);
>>  }
>>  
>> -
>>  /**
>>   * mod_timer - modify a timer's timeout
>>   * @timer: the timer to be modified
> 


* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-16 23:18 2.6.25-rc5-git6: Reported regressions from 2.6.24 Rafael J. Wysocki
                   ` (2 preceding siblings ...)
  2008-03-17  6:47 ` Jason Wu
@ 2008-03-23 19:01 ` Christian Kujau
  2008-03-23 19:06   ` Rafael J. Wysocki
  2008-03-23 21:17 ` Christian Kujau
  4 siblings, 1 reply; 41+ messages in thread
From: Christian Kujau @ 2008-03-23 19:01 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Adrian Bunk, Andrew Morton, Linus Torvalds, Natalie Protasevich

Hi Rafael,

On Mon, 17 Mar 2008, Rafael J. Wysocki wrote:
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10207
> Subject		: INFO: task mount:11202 blocked for more than 120 seconds
> Submitter	: Christian Kujau <lists@nerdbynature.de>
> Date		: 2008-03-07 21:32 (10 days old)
> References	: http://lkml.org/lkml/2008/3/7/308
> 		  http://lkml.org/lkml/2008/3/9/186
>

The other Christian reported this as fixed: http://lkml.org/lkml/2008/3/17/232
I too can confirm that the hangs are gone now: http://lkml.org/lkml/2008/3/21/532

Thanks for maintaining the regression list,
Christian.
-- 
BOFH excuse #91:

Mouse chewed through power cable


* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-23 19:01 ` Christian Kujau
@ 2008-03-23 19:06   ` Rafael J. Wysocki
  2008-03-23 19:40     ` Chr
  0 siblings, 1 reply; 41+ messages in thread
From: Rafael J. Wysocki @ 2008-03-23 19:06 UTC (permalink / raw)
  To: Christian Kujau
  Cc: LKML, Adrian Bunk, Andrew Morton, Linus Torvalds, Natalie Protasevich

On Sunday, 23 of March 2008, Christian Kujau wrote:
> Hi Rafael,
> 
> On Mon, 17 Mar 2008, Rafael J. Wysocki wrote:
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10207
> > Subject		: INFO: task mount:11202 blocked for more than 120 seconds
> > Submitter	: Christian Kujau <lists@nerdbynature.de>
> > Date		: 2008-03-07 21:32 (10 days old)
> > References	: http://lkml.org/lkml/2008/3/7/308
> > 		  http://lkml.org/lkml/2008/3/9/186
> >
> 
> The other Christian reported this as fixed: http://lkml.org/lkml/2008/3/17/232
> I too can confirm that the hangs are gone now: http://lkml.org/lkml/2008/3/21/532

Is the patch present in the mainline yet?

> Thanks for maintaining the regression list,

You're welcome. :-)

Thanks,
Rafael


* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-23 19:06   ` Rafael J. Wysocki
@ 2008-03-23 19:40     ` Chr
  0 siblings, 0 replies; 41+ messages in thread
From: Chr @ 2008-03-23 19:40 UTC (permalink / raw)
  To: Rafael J. Wysocki, Milan Broz, dm-crypt, Alasdair G Kergon
  Cc: Christian Kujau, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich

On Sunday 23 March 2008 20:06:56 Rafael J. Wysocki wrote:
> > > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10207
> > > Subject		: INFO: task mount:11202 blocked for more than 120 seconds
> > > Submitter	: Christian Kujau <lists@nerdbynature.de>
> > > Date		: 2008-03-07 21:32 (10 days old)
> > > References	: http://lkml.org/lkml/2008/3/7/308
> > > 		  http://lkml.org/lkml/2008/3/9/186
> >
> > The other Christian reported this as fixed:
> > http://lkml.org/lkml/2008/3/17/232 I too can confirm that the hangs are
> > gone now: http://lkml.org/lkml/2008/3/21/532
>
> Is the patch present in the mainline yet?
No... it isn't in the mainline?! (Or was it committed as I wrote this mail?!)
Anyway, can someone please merge the patch there?

http://lkml.org/lkml/2008/3/17/214

Regards,
	Christian


* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-16 23:18 2.6.25-rc5-git6: Reported regressions from 2.6.24 Rafael J. Wysocki
                   ` (3 preceding siblings ...)
  2008-03-23 19:01 ` Christian Kujau
@ 2008-03-23 21:17 ` Christian Kujau
  2008-03-23 21:29   ` Rafael J. Wysocki
  4 siblings, 1 reply; 41+ messages in thread
From: Christian Kujau @ 2008-03-23 21:17 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Adrian Bunk, Andrew Morton, Linus Torvalds, Natalie Protasevich

On Mon, 17 Mar 2008, Rafael J. Wysocki wrote:
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9983
> Subject		: PROBLEM: 2.6.25-rc1-git2 freezes when accessing external USB hard disk (ehci-hcd)
> Submitter	: Linas Žvirblis <0x0007@gmail.com>
> Date		: 2008-02-13 22:38 (33 days old)
> References	: http://lkml.org/lkml/2008/2/13/566

Linas did not respond any more, and you closed the bug :)

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10051
> Subject		: Spurious messages at boot, eventually hangs the usb subsustem
> Submitter	: Jean-Luc Coulon <jean.luc.coulon@gmail.com>
> Date		: 2008-02-20 09:10 (26 days old)

Hm, Jean-Luc said:
>  ------- Comment  #4 From Jean-Luc Coulon  2008-03-09 22:50:19   ----
>  BTW, I can normally boot my system since rc4

Close?

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10086
> Subject		: 2.6.25-rc2 + smartd = hang
> Submitter	: Anders Eriksson <aeriksson@fastmail.fm>
> Date		: 2008-02-22 17:51 (24 days old)
> References	: http://lkml.org/lkml/2008/2/22/239
> Handled-By	: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

http://bugzilla.kernel.org/show_bug.cgi?id=10086#c5 says it's fixed.


> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10133
> Subject		: INFO: possible circular locking in the resume
> Submitter	: Zdenek Kabelac <zdenek.kabelac@gmail.com>
> Date		: 2008-02-27 (19 days old)
> References	: http://lkml.org/lkml/2008/2/26/479
> Handled-By	: Gautham R Shenoy <ego@in.ibm.com>

Gautham said on 2008-02-28 he has a patch - but did not post it. What now?


> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10146
> Subject		: 2.6.25-rc: complete lockup on boot/start of X (bisected)
> Submitter	: Marcin Slusarz <marcin.slusarz@gmail.com>
> Date		: 2008-03-02 20:00 (15 days old)
> References	: http://lkml.org/lkml/2008/3/2/91
> Handled-By	: Peter Zijlstra <a.p.zijlstra@chello.nl>

Seems to be fixed: http://lkml.org/lkml/2008/3/23/275

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10152
> Subject		: Clocksource tsc is always unstable with 2.6.25-* kernels and CONFIG_NO_HZ=y on my box
> Submitter	: Gabriel C <nix.or.die@googlemail.com>
> Date		: 2008-02-24 01:31 (22 days old)
> References	: http://lkml.org/lkml/2008/2/23/380
> 		  http://lkml.org/lkml/2008/2/24/281
> Handled-By	: Thomas Gleixner <tglx@linutronix.de>

Seems to be fixed by: http://lkml.org/lkml/2008/3/22/66
Which introduced a WARNING, fixed by the subsequent:
http://lkml.org/lkml/2008/3/23/199

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10190
> Subject		: [BUG] Linux-2.6.25-rc4 (and also in rc3) Compile Error
> Submitter	: Tarkan Erimer <tarkan@netone.net.tr>
> Date		: 2008-03-05 05:01 (12 days old)
> References	: http://www.ussg.iu.edu/hypermail/linux/kernel/0803.0/1867.html

Bugzilla entry is closed.

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10211
> Subject		: drivers/media/video/cx2341x.c: undefined references
> Submitter	: Toralf Förster <toralf.foerster@gmx.de>
> Date		: 2008-03-07 13:48 (10 days old)
> References	: http://lkml.org/lkml/2008/3/7/168

Closed.

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10234
> Subject		: pciehp hang on hp ia64 rx6600
> Submitter	: Alex Chiang <achiang@hp.com>
> Date		: 2008-03-12 00:47 (5 days old)
> References	: http://lkml.org/lkml/2008/3/12/31
> Handled-By	: Mark Lord <mlord@pobox.com>

Closed.

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10238
> Subject		: netconsole still hangs
> Submitter	: Andrew Morton <akpm@linux-foundation.org>
> Date		: 2008-03-12 23:14 (5 days old)
> References	: http://marc.info/?t=120536379200004&r=1&w=2
> Handled-By	: David Miller <davem@davemloft.net>
> 		  Stephen Hemminger <shemminger@linux-foundation.org>

Closed.

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10242
> Subject		: rm command hangs
> Submitter	: Jean-Luc Coulon <jean.luc.coulon@gmail.com>
> Date		: 2008-03-14 05:47 (3 days old)

Maybe related to http://bugzilla.kernel.org/show_bug.cgi?id=10207, which 
is (about to be) closed?

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10266
> Subject		: [PATCH] i810fb: Fix console switch regression
> Submitter	: Stefan Bauer <stefan.bauer@cs.tu-chemnitz.de>
> Date		: 2008-03-16 19:42 (1 days old)
> References	: http://lkml.org/lkml/2008/3/16/84

Closed.

> Regressions with patches
> ------------------------
> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10016
> Subject		: cobalt_btns.c <-> struct platform_device compile error
> Submitter	: Adrian Bunk <bunk@kernel.org>
> Date		: 2008-02-17 12:12 (29 days old)
> References	: http://lkml.org/lkml/2008/2/17/293
> Handled-By	: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
> Patch		: http://lkml.org/lkml/2008/3/9/25

Closed.

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10017
> Subject		: cdev removal broke cobalt_btns.c compilation
> Submitter	: Adrian Bunk <bunk@kernel.org>
> Date		: 2008-02-17 12:14 (29 days old)
> References	: http://lkml.org/lkml/2008/2/17/295
> Handled-By	: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
> Patch		: http://lkml.org/lkml/2008/3/9/25

Closed.

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10186
> Subject		: SCSI_AIC94XX must depend on SCSI
> Submitter	: Toralf Förster <toralf.foerster@gmx.de>
> Date		: 2008-03-06 19:09 (11 days old)
> References	: http://marc.info/?l=linux-kernel&m=120483073617232&w=2
> Handled-By	: Adrian Bunk <bunk@kernel.org>
> Patch		: http://marc.info/?l=linux-kernel&m=120483499725928&w=2

Testing...

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10210
> Subject		: 2.6.25-rc4-git3: Handling of audio CDs broken on pata_ali
> Submitter	: Rafael J. Wysocki <rjw@sisk.pl>
> Date		: 2008-03-08 22:46 (9 days old)
> References	: http://lkml.org/lkml/2008/3/8/123
> Handled-By	: Tejun Heo <htejun@gmail.com>
> Patch		: http://lkml.org/lkml/2008/3/10/69

Closed.

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10232
> Subject		: intel mtrr fixups apparently broke display and e1000 probe
> Submitter	: Stephen Gran <steve@lobefin.net>
> Date		: 2008-03-12 08:37 (5 days old)
> Handled-By	: Yinghai Lu <yhlu.kernel@gmail.com>
> Patch		: http://bugzilla.kernel.org/attachment.cgi?id=15271&action=view

Closed.

> Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10259
> Subject		: /sys/class/hwmon/hwmon0 is missing a device link
> Submitter	: Jean-Luc Coulon <jean.luc.coulon@gmail.com>
> Date		: 2008-03-16 04:56 (1 days old)
> Handled-By	: Jean Delvare <khali@linux-fr.org>
> Patch		: http://bugzilla.kernel.org/attachment.cgi?id=15301&action=view

Closed.


Thanks,
Christian.
-- 
BOFH excuse #387:

Your computer's union contract is set to expire at midnight.


* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-23 21:17 ` Christian Kujau
@ 2008-03-23 21:29   ` Rafael J. Wysocki
  0 siblings, 0 replies; 41+ messages in thread
From: Rafael J. Wysocki @ 2008-03-23 21:29 UTC (permalink / raw)
  To: Christian Kujau
  Cc: LKML, Adrian Bunk, Andrew Morton, Linus Torvalds,
	Natalie Protasevich, Zdenek Kabelac

Hi,

Please have a look at the latest report:
http://lkml.org/lkml/2008/3/21/516

On Sunday, 23 of March 2008, Christian Kujau wrote:
> On Mon, 17 Mar 2008, Rafael J. Wysocki wrote:
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=9983
> > Subject		: PROBLEM: 2.6.25-rc1-git2 freezes when accessing external USB hard disk (ehci-hcd)
> > Submitter	: Linas Žvirblis <0x0007@gmail.com>
> > Date		: 2008-02-13 22:38 (33 days old)
> > References	: http://lkml.org/lkml/2008/2/13/566
> 
> Linas did not respond any more, and you closed the bug :)
> 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10051
> > Subject		: Spurious messages at boot, eventually hangs the usb subsystem
> > Submitter	: Jean-Luc Coulon <jean.luc.coulon@gmail.com>
> > Date		: 2008-02-20 09:10 (26 days old)
> 
> Hm, Jean-Luc said:
> >  ------- Comment  #4 From Jean-Luc Coulon  2008-03-09 22:50:19   ----
> >  BTW, I can normally boot my system since rc4
> 
> Close?

Yes, if he doesn't respond for a couple of days.

> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10086
> > Subject		: 2.6.25-rc2 + smartd = hang
> > Submitter	: Anders Eriksson <aeriksson@fastmail.fm>
> > Date		: 2008-02-22 17:51 (24 days old)
> > References	: http://lkml.org/lkml/2008/2/22/239
> > Handled-By	: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> 
> http://bugzilla.kernel.org/show_bug.cgi?id=10086#c5 says it's fixed.

Yes, it's closed now.
 
> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10133
> > Subject		: INFO: possible circular locking in the resume
> > Submitter	: Zdenek Kabelac <zdenek.kabelac@gmail.com>
> > Date		: 2008-02-27 (19 days old)
> > References	: http://lkml.org/lkml/2008/2/26/479
> > Handled-By	: Gautham R Shenoy <ego@in.ibm.com>
> 
> Gautham said on 2008-02-28 he has a patch - but did not post it. What now?

The reporter is unresponsive.  We're waiting for him to respond.

> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10146
> > Subject		: 2.6.25-rc: complete lockup on boot/start of X (bisected)
> > Submitter	: Marcin Slusarz <marcin.slusarz@gmail.com>
> > Date		: 2008-03-02 20:00 (15 days old)
> > References	: http://lkml.org/lkml/2008/3/2/91
> > Handled-By	: Peter Zijlstra <a.p.zijlstra@chello.nl>
> 
> Seems to be fixed: http://lkml.org/lkml/2008/3/23/275

Yes, I've already updated the entry with this patch.

> > Bug-Entry	: http://bugzilla.kernel.org/show_bug.cgi?id=10152
> > Subject		: Clocksource tsc is always unstable with 2.6.25-* kernels and CONFIG_NO_HZ=y on my box
> > Submitter	: Gabriel C <nix.or.die@googlemail.com>
> > Date		: 2008-02-24 01:31 (22 days old)
> > References	: http://lkml.org/lkml/2008/2/23/380
> > 		  http://lkml.org/lkml/2008/2/24/281
> > Handled-By	: Thomas Gleixner <tglx@linutronix.de>
> 
> Seems to be fixed by: http://lkml.org/lkml/2008/3/22/66
> Which introduced a WARNING:, fixed by the subsequent:
> http://lkml.org/lkml/2008/3/23/199

This is not on the list any more.

BTW, the reports I send reflect the state of the Bugzilla entries.  The closed
entries will not be reported next time.

Thanks,
Rafael


* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-23 11:00                                           ` Gabriel C
@ 2008-03-23 23:31                                             ` Gabriel C
  2008-03-24 10:24                                               ` Thomas Gleixner
  0 siblings, 1 reply; 41+ messages in thread
From: Gabriel C @ 2008-03-23 23:31 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Gabriel C, Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, andi-bz, Ingo Molnar

Gabriel C wrote:
> Gabriel C wrote:
>> Thomas Gleixner wrote:
>>> On Sat, 22 Mar 2008, Thomas Gleixner wrote:
>>>> On Sat, 22 Mar 2008, Gabriel C wrote:
>>>>> With this one TSC is fine but now I get a warning on boot :
>>>> Good. It confirms my assumptions about the root cause.
>>>>
>>>>> [    0.041037] ------------[ cut here ]------------
>>>>> [    0.041052] WARNING: at arch/x86/kernel/smp_32.c:562 native_smp_call_function_mask+0x23/0x11e()
>>>> Grr. I'll work out a solution for that one.
>>> Gabriel,
>>>
>>> I'm happy to rack your nerves some more.
>> No worries :) 
>>
>>> After discussing the issue with Peter and Ingo the following solution
>>> seems to be the one which is the least intrusive. 
>>>
>>> Can you please give it a test ride ?
>> Done , git head + Andi's patch + this version of your patch does work here.
>>
>> Also time-warp-test is just fine and everything else seems to work.
> 
> Also I've tested with my other motherboard and it is fine too :)
> 
> Feel free to add my Tested-by when you push this patch.

Heh :/

...

[ 5902.632878] Clocksource tsc unstable (delta = 4686687272 ns)
[ 5920.650516] Time: acpi_pm clocksource has been installed.

...

Seems like something still triggers that :/ 




* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-23 23:31                                             ` Gabriel C
@ 2008-03-24 10:24                                               ` Thomas Gleixner
  2008-03-24 22:33                                                 ` Gabriel C
  0 siblings, 1 reply; 41+ messages in thread
From: Thomas Gleixner @ 2008-03-24 10:24 UTC (permalink / raw)
  To: Gabriel C
  Cc: Gabriel C, Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, andi-bz, Ingo Molnar

On Mon, 24 Mar 2008, Gabriel C wrote:
> >>> Can you please give it a test ride ?
> >> Done , git head + Andi's patch + this version of your patch does work here.
> >>
> >> Also time-warp-test is just fine and everything else seems to work.
> > 
> > Also I've tested with my other motherboard and it is fine too :)
> > 
> > Feel free to add my Tested-by when you push this patch.
> 
> Heh :/
> 
> ...
> 
> [ 5902.632878] Clocksource tsc unstable (delta = 4686687272 ns)
> [ 5920.650516] Time: acpi_pm clocksource has been installed.
> 
> ...
> 
> Seems like something still triggers that :/ 

Hmm. Can you please apply the patch below? It adds some more info and
triggers the sysrq-q timer list printout when the watchdog
triggers. That might give us some insight into this.
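
For readers following along, the check that this patch instruments can be
sketched in user space roughly as follows. This is only an illustration:
clocksource_is_unstable and WATCHDOG_THRESHOLD_NS are hypothetical names,
the 1/16 s threshold is an assumed value that the thread does not state,
and computing delta as clocksource time minus watchdog time is likewise an
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed threshold: 1/16 s in nanoseconds. The thread does not give it. */
#define WATCHDOG_THRESHOLD_NS (1000000000LL >> 4)

/*
 * Hypothetical helper: compare the time elapsed according to the
 * clocksource under test with the time elapsed according to the
 * watchdog over the same interval. Returns true when they disagree by
 * the threshold or more in either direction.
 */
static bool clocksource_is_unstable(int64_t cs_elapsed_ns, int64_t wd_elapsed_ns)
{
	int64_t delta;

	/* Elapsed times are intervals, so they cannot be negative. */
	assert(cs_elapsed_ns >= 0 && wd_elapsed_ns >= 0);

	delta = cs_elapsed_ns - wd_elapsed_ns;

	/* Same shape as the range test in the patch context below. */
	return !(delta > -WATCHDOG_THRESHOLD_NS && delta < WATCHDOG_THRESHOLD_NS);
}
```

With the delta from the report above (4686687272 ns), this sketch flags the
clocksource as unstable for any threshold in the tens of milliseconds.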

Thanks,
	tglx

---
 kernel/time/clocksource.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6/kernel/time/clocksource.c
===================================================================
--- linux-2.6.orig/kernel/time/clocksource.c
+++ linux-2.6/kernel/time/clocksource.c
@@ -87,8 +87,10 @@ static void clocksource_ratewd(struct cl
 	if (delta > -WATCHDOG_THRESHOLD && delta < WATCHDOG_THRESHOLD)
 		return;
 
-	printk(KERN_WARNING "Clocksource %s unstable (delta = %Ld ns)\n",
-	       cs->name, delta);
+	printk(KERN_WARNING
+	       "Clocksource %s unstable (delta = %Ld ns) E:%lu J:%lu\n",
+	       cs->name, delta, watchdog_timer.expires, jiffies);
+	sysrq_timer_list_show();
 	cs->flags &= ~(CLOCK_SOURCE_VALID_FOR_HRES | CLOCK_SOURCE_WATCHDOG);
 	clocksource_change_rating(cs, 0);
 	list_del(&cs->wd_list);


* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-24 10:24                                               ` Thomas Gleixner
@ 2008-03-24 22:33                                                 ` Gabriel C
  2008-03-25  8:06                                                   ` Thomas Gleixner
  0 siblings, 1 reply; 41+ messages in thread
From: Gabriel C @ 2008-03-24 22:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Gabriel C, Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, andi-bz, Ingo Molnar

Thomas Gleixner wrote:
> On Mon, 24 Mar 2008, Gabriel C wrote:
>>>>> Can you please give it a test ride ?
>>>> Done , git head + Andi's patch + this version of your patch does work here.
>>>>
>>>> Also time-warp-test is just fine and everything else seems to work.
>>> Also I've tested with my other motherboard and it is fine too :)
>>>
>>> Feel free to add my Tested-by when you push this patch.
>> Heh :/
>>
>> ...
>>
>> [ 5902.632878] Clocksource tsc unstable (delta = 4686687272 ns)
>> [ 5920.650516] Time: acpi_pm clocksource has been installed.
>>
>> ...
>>
>> Seems like something still triggers that :/ 
> 
> Hmm. Can you please apply the patch below? It adds some more info and
> triggers the sysrq-q timer list printout when the watchdog
> triggers. That might give us some insight into this.

Sorry for the lag , I was out the whole day.

Here is what I've found in dmesg ( the box was idling at that time , as said I was not around ):

...

[34528.893366] Clocksource tsc unstable (delta = 4686697613 ns) E:34204592 J:34210723
[34528.893380] Timer List Version: v0.3
[34528.893386] HRTIMER_MAX_CLOCK_BASES: 2
[34528.893392] now at 34510722407314 nsecs
[34528.893396]
[34528.893399] cpu: 0
[34528.893402]  clock 0:
[34528.893404]   .index:      0
[34528.893407]   .resolution: 1 nsecs
[34528.893409]   .get_time:   ktime_get_real
[34528.893422]   .offset:     1206358214734619011 nsecs
[34528.893425] active timers:
[34528.893428]  clock 1:
[34528.893430]   .index:      1
[34528.893433]   .resolution: 1 nsecs
[34528.893435]   .get_time:   ktime_get
[34528.893440]   .offset:     0 nsecs
[34528.893443] active timers:
[34528.893445]  #0: <e26a7d68>, tick_sched_timer, S:01
[34528.893467]  # expires at 34510723000000 nsecs [in 592686 nsecs]
[34528.893470]  #1: <e26a7d68>, it_real_fn, S:01
[34528.893481]  # expires at 34510724648354 nsecs [in 2241040 nsecs]
[34528.893485]  #2: <e26a7d68>, hrtimer_wakeup, S:01
[34528.893495]  # expires at 34510997616597 nsecs [in 275209283 nsecs]
[34528.893498]  #3: <e26a7d68>, hrtimer_wakeup, S:01
[34528.893508]  # expires at 34511115498292 nsecs [in 393090978 nsecs]
[34528.893512]  #4: <e26a7d68>, hrtimer_wakeup, S:01
[34528.893521]  # expires at 34511328809630 nsecs [in 606402316 nsecs]
[34528.893525]  #5: <e26a7d68>, it_real_fn, S:01
[34528.893534]  # expires at 34511515619673 nsecs [in 793212359 nsecs]
[34528.893537]  #6: <e26a7d68>, hrtimer_wakeup, S:01
[34528.893547]  # expires at 34512265383335 nsecs [in 1542976021 nsecs]
[34528.893551]  #7: <e26a7d68>, hrtimer_wakeup, S:01
[34528.893561]  # expires at 34518835323224 nsecs [in 8112915910 nsecs]
[34528.893564]  #8: <e26a7d68>, hrtimer_wakeup, S:01
[34528.893574]  # expires at 34546891223588 nsecs [in 36168816274 nsecs]
[34528.893578]  #9: <e26a7d68>, hrtimer_wakeup, S:01
[34528.893588]  # expires at 36035545999324 nsecs [in 1524823592010 nsecs]
[34528.893592]  #10: <e26a7d68>, hrtimer_wakeup, S:01
[34528.893601]  # expires at 36035980869577 nsecs [in 1525258462263 nsecs]
[34528.893606]   .expires_next   : 34510723000000 nsecs
[34528.893609]   .hres_active    : 1
[34528.893612]   .nr_events      : 3447408
[34528.893615]   .nohz_mode      : 2
[34528.893618]   .idle_tick      : 34510712000000 nsecs
[34528.893621]   .tick_stopped   : 0
[34528.893624]   .idle_jiffies   : 34210712
[34528.893627]   .idle_calls     : 3267634
[34528.893630]   .idle_sleeps    : 1588325
[34528.893633]   .idle_entrytime : 34510722486118 nsecs
[34528.893636]   .idle_waketime  : 34510722348607 nsecs
[34528.893640]   .idle_exittime  : 34510722383780 nsecs
[34528.893643]   .idle_sleeptime : 33379861006002 nsecs
[34528.893646]   .last_jiffies   : 34210723
[34528.893649]   .next_jiffies   : 34210725
[34528.893652]   .idle_expires   : 34510724000000 nsecs
[34528.893655] jiffies: 34210723
[34528.893657]
[34528.893660] cpu: 1
[34528.893662]  clock 0:
[34528.893664]   .index:      0
[34528.893666]   .resolution: 1 nsecs
[34528.893669]   .get_time:   ktime_get_real
[34528.893675]   .offset:     1206358214734619011 nsecs
[34528.893677] active timers:
[34528.893680]  clock 1:
[34528.893682]   .index:      1
[34528.893685]   .resolution: 1 nsecs
[34528.893687]   .get_time:   ktime_get
[34528.893692]   .offset:     0 nsecs
[34528.893694] active timers:
[34528.893697]  #0: <e26a7d68>, tick_sched_timer, S:01
[34528.893706]  # expires at 34510996000000 nsecs [in 273592686 nsecs]
[34528.893710]   .expires_next   : 34510996000000 nsecs
[34528.893713]   .hres_active    : 1
[34528.893716]   .nr_events      : 3081558
[34528.893719]   .nohz_mode      : 2
[34528.893722]   .idle_tick      : 34510713125000 nsecs
[34528.893725]   .tick_stopped   : 1
[34528.893727]   .idle_jiffies   : 34210713
[34528.893730]   .idle_calls     : 2673472
[34528.893733]   .idle_sleeps    : 1233326
[34528.893736]   .idle_entrytime : 34510712135468 nsecs
[34528.893740]   .idle_waketime  : 34507995998292 nsecs
[34528.893743]   .idle_exittime  : 34510711012024 nsecs
[34528.893746]   .idle_sleeptime : 33654735968486 nsecs
[34528.893749]   .last_jiffies   : 34210713
[34528.893752]   .next_jiffies   : 34210997
[34528.893755]   .idle_expires   : 34510996000000 nsecs
[34528.893758] jiffies: 34210723
[34528.893760]
[34528.893763] cpu: 2
[34528.893765]  clock 0:
[34528.893767]   .index:      0
[34528.893769]   .resolution: 1 nsecs
[34528.893772]   .get_time:   ktime_get_real
[34528.893778]   .offset:     1206358214734619011 nsecs
[34528.893780] active timers:
[34528.893783]  clock 1:
[34528.893785]   .index:      1
[34528.893787]   .resolution: 1 nsecs
[34528.893790]   .get_time:   ktime_get
[34528.893795]   .offset:     0 nsecs
[34528.893797] active timers:
[34528.893799]  #0: <e26a7d68>, tick_sched_timer, S:01
[34528.893809]  # expires at 34511541000000 nsecs [in 818592686 nsecs]
[34528.893813]   .expires_next   : 34511541000000 nsecs
[34528.893815]   .hres_active    : 1
[34528.893818]   .nr_events      : 2005329
[34528.893821]   .nohz_mode      : 2
[34528.893824]   .idle_tick      : 34510562250000 nsecs
[34528.893827]   .tick_stopped   : 1
[34528.893830]   .idle_jiffies   : 34210562
[34528.893833]   .idle_calls     : 1749202
[34528.893836]   .idle_sleeps    : 898585
[34528.893839]   .idle_entrytime : 34510561258541 nsecs
[34528.893842]   .idle_waketime  : 34509285251187 nsecs
[34528.893845]   .idle_exittime  : 34510176022616 nsecs
[34528.893848]   .idle_sleeptime : 33931425421772 nsecs
[34528.893851]   .last_jiffies   : 34210562
[34528.893854]   .next_jiffies   : 34211542
[34528.893858]   .idle_expires   : 34511541000000 nsecs
[34528.893860] jiffies: 34210723
[34528.893863]
[34528.893865] cpu: 3
[34528.893867]  clock 0:
[34528.893869]   .index:      0
[34528.893872]   .resolution: 1 nsecs
[34528.893874]   .get_time:   ktime_get_real
[34528.893880]   .offset:     1206358214734619011 nsecs
[34528.893883] active timers:
[34528.893885]  clock 1:
[34528.893887]   .index:      1
[34528.893890]   .resolution: 1 nsecs
[34528.893892]   .get_time:   ktime_get
[34528.893897]   .offset:     0 nsecs
[34528.893899] active timers:
[34528.893902]  #0: <e26a7d68>, tick_sched_timer, S:01
[34528.893911]  # expires at 34510723375000 nsecs [in 967686 nsecs]
[34528.893915]   .expires_next   : 34510723375000 nsecs
[34528.893918]   .hres_active    : 1
[34528.893921]   .nr_events      : 1532911
[34528.893923]   .nohz_mode      : 2
[34528.893926]   .idle_tick      : 34510713375000 nsecs
[34528.893929]   .tick_stopped   : 0
[34528.893932]   .idle_jiffies   : 34210714
[34528.893935]   .idle_calls     : 1350449
[34528.893938]   .idle_sleeps    : 896094
[34528.893941]   .idle_entrytime : 34510713334805 nsecs
[34528.893944]   .idle_waketime  : 34509973216268 nsecs
[34528.893947]   .idle_exittime  : 34510722367621 nsecs
[34528.893951]   .idle_sleeptime : 34031256949569 nsecs
[34528.893954]   .last_jiffies   : 34210714
[34528.893957]   .next_jiffies   : 34240714
[34528.893960]   .idle_expires   : 34540713000000 nsecs
[34528.893963] jiffies: 34210723
[34528.893965]
[34528.893967]
[34528.893969] Tick Device: mode:     1
[34528.893972] Clock Event Device: pit
[34528.893976]  max_delta_ns:   27461866
[34528.893979]  min_delta_ns:   12571
[34528.893982]  mult:           5124677
[34528.893984]  shift:          32
[34528.893987]  mode:           1
[34528.893990]  next_event:     9223372036854775807 nsecs
[34528.893992]  set_next_event: pit_next_event
[34528.894000]  set_mode:       init_pit_timer
[34528.894005]  event_handler:  tick_handle_oneshot_broadcast
[34528.894013] tick_broadcast_mask: 00000000
[34528.894016] tick_broadcast_oneshot_mask: 00000000
[34528.894019]
[34528.894021]
[34528.894023] Tick Device: mode:     1
[34528.894026] Clock Event Device: lapic
[34528.894030]  max_delta_ns:   1346255303
[34528.894033]  min_delta_ns:   2407
[34528.894035]  mult:           26762229
[34528.894038]  shift:          32
[34528.894041]  mode:           3
[34528.894044]  next_event:     34510724000000 nsecs
[34528.894046]  set_next_event: lapic_next_event
[34528.894054]  set_mode:       lapic_timer_setup
[34528.894059]  event_handler:  hrtimer_interrupt
[34528.894064]
[34528.894066] Tick Device: mode:     1
[34528.894069] Clock Event Device: lapic
[34528.894073]  max_delta_ns:   1346255303
[34528.894075]  min_delta_ns:   2407
[34528.894078]  mult:           26762229
[34528.894081]  shift:          32
[34528.894083]  mode:           3
[34528.894086]  next_event:     34510996000000 nsecs
[34528.894089]  set_next_event: lapic_next_event
[34528.894094]  set_mode:       lapic_timer_setup
[34528.894099]  event_handler:  hrtimer_interrupt
[34528.894104]
[34528.894107] Tick Device: mode:     1
[34528.894109] Clock Event Device: lapic
[34528.894113]  max_delta_ns:   1346255303
[34528.894115]  min_delta_ns:   2407
[34528.894118]  mult:           26762229
[34528.894121]  shift:          32
[34528.894123]  mode:           3
[34528.894126]  next_event:     34511541000000 nsecs
[34528.894129]  set_next_event: lapic_next_event
[34528.894134]  set_mode:       lapic_timer_setup
[34528.894139]  event_handler:  hrtimer_interrupt
[34528.894144]
[34528.894146] Tick Device: mode:     1
[34528.894149] Clock Event Device: lapic
[34528.894153]  max_delta_ns:   1346255303
[34528.894155]  min_delta_ns:   2407
[34528.894158]  mult:           26762229
[34528.894161]  shift:          32
[34528.894163]  mode:           3
[34528.894166]  next_event:     34510723375000 nsecs
[34528.894169]  set_next_event: lapic_next_event
[34528.894174]  set_mode:       lapic_timer_setup
[34528.894179]  event_handler:  hrtimer_interrupt
[34528.894184]
[34528.894350] Time: acpi_pm clocksource has been installed.

...

And that made irqbalance go mad, and it then got killed by OOM, very strange.

                                                                                                 
> 
> Thanks,
> 	tglx


Gabriel


* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-24 22:33                                                 ` Gabriel C
@ 2008-03-25  8:06                                                   ` Thomas Gleixner
  2008-03-26 12:43                                                     ` Gabriel C
  0 siblings, 1 reply; 41+ messages in thread
From: Thomas Gleixner @ 2008-03-25  8:06 UTC (permalink / raw)
  To: Gabriel C
  Cc: Gabriel C, Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, andi-bz, Ingo Molnar

On Mon, 24 Mar 2008, Gabriel C wrote:
> > Hmm. Can you please apply the patch below? It adds some more info and
> > triggers the sysrq-q timer list printout when the watchdog
> > triggers. That might give us some insight into this.
> 
> Sorry for the lag , I was out the whole day.
> 
> Here is what I've found in dmesg ( the box was idling at that time , as said I was not around ):
> 
> ...
> 
> [34528.893366] Clocksource tsc unstable (delta = 4686697613 ns) E:34204592 J:34210723

Ok. The timer got delayed. It got delayed because it is initialized as
a deferrable timer, which is obviously wrong. Sigh, I signed off on
that commit myself without thinking about the consequences.
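
As a rough illustration of how the E: and J: fields show the delay, the
lateness of the watchdog timer can be computed as below. The helper names
are hypothetical, and ASSUMED_HZ = 1000 is an assumption: the 1 ms spacing
of the tick values in the log suggests it, but the thread never states HZ.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed tick rate; not stated anywhere in the thread. */
#define ASSUMED_HZ 1000UL

/*
 * Hypothetical helper: number of jiffies by which a timer ran after its
 * programmed expiry. Unsigned subtraction keeps the result correct
 * across a jiffies wraparound, provided the timer really is late.
 */
static unsigned long timer_lateness_jiffies(unsigned long expires,
					    unsigned long now)
{
	return now - expires;
}

/* Hypothetical helper: convert jiffies to milliseconds at ASSUMED_HZ. */
static uint64_t jiffies_to_ms_assumed(unsigned long j)
{
	assert(ASSUMED_HZ > 0);
	return (uint64_t)j * 1000u / ASSUMED_HZ;
}
```

For the report above, E:34204592 and J:34210723 give a lateness of 6131
jiffies, or about 6.1 s under the assumed HZ.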

Can you please apply the patch below on top of the others?

> ...
> 
> And that made irqbalance go mad, and it then got killed by OOM, very strange.

Ouch.

revert: 1077f5a917b7c630231037826b344b2f7f5b903f

---
 kernel/time/clocksource.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/kernel/time/clocksource.c
===================================================================
--- linux-2.6.orig/kernel/time/clocksource.c
+++ linux-2.6/kernel/time/clocksource.c
@@ -176,7 +176,7 @@ static void clocksource_check_watchdog(s
 			if (watchdog)
 				del_timer(&watchdog_timer);
 			watchdog = cs;
-			init_timer_deferrable(&watchdog_timer);
+			init_timer(&watchdog_timer);
 			watchdog_timer.function = clocksource_watchdog;
 
 			/* Reset watchdog cycles */




* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-25  8:06                                                   ` Thomas Gleixner
@ 2008-03-26 12:43                                                     ` Gabriel C
  2008-03-26 14:51                                                       ` Thomas Gleixner
  0 siblings, 1 reply; 41+ messages in thread
From: Gabriel C @ 2008-03-26 12:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Gabriel C, Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, andi-bz, Ingo Molnar

Thomas Gleixner wrote:
> On Mon, 24 Mar 2008, Gabriel C wrote:
>>> Hmm. Can you please apply the patch below? It adds some more info and
>>> triggers the sysrq-q timer list printout when the watchdog
>>> triggers. That might give us some insight into this.
>> Sorry for the lag , I was out the whole day.
>>
>> Here is what I've found in dmesg ( the box was idling at that time , as said I was not around ):
>>
>> ...
>>
>> [34528.893366] Clocksource tsc unstable (delta = 4686697613 ns) E:34204592 J:34210723
> 
> Ok. The timer got delayed. It got delayed because it is initialized as
> a deferrable timer, which is obviously wrong. Sigh, I signed off on
> that commit myself without thinking about the consequences.
> 
> Can you please apply the patch below on top of the others?

The box has been up for almost one day with that patch on top of the other ones and everything is fine so far.


> revert: 1077f5a917b7c630231037826b344b2f7f5b903f
> 
> ---
>  kernel/time/clocksource.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: linux-2.6/kernel/time/clocksource.c
> ===================================================================
> --- linux-2.6.orig/kernel/time/clocksource.c
> +++ linux-2.6/kernel/time/clocksource.c
> @@ -176,7 +176,7 @@ static void clocksource_check_watchdog(s
>  			if (watchdog)
>  				del_timer(&watchdog_timer);
>  			watchdog = cs;
> -			init_timer_deferrable(&watchdog_timer);
> +			init_timer(&watchdog_timer);
>  			watchdog_timer.function = clocksource_watchdog;
>  
>  			/* Reset watchdog cycles */
> 
> 


Gabriel


* Re: 2.6.25-rc5-git6: Reported regressions from 2.6.24
  2008-03-26 12:43                                                     ` Gabriel C
@ 2008-03-26 14:51                                                       ` Thomas Gleixner
  0 siblings, 0 replies; 41+ messages in thread
From: Thomas Gleixner @ 2008-03-26 14:51 UTC (permalink / raw)
  To: Gabriel C
  Cc: Gabriel C, Rafael J. Wysocki, LKML, Adrian Bunk, Andrew Morton,
	Linus Torvalds, Natalie Protasevich, andi-bz, Ingo Molnar

On Wed, 26 Mar 2008, Gabriel C wrote:
> Thomas Gleixner wrote:
> > On Mon, 24 Mar 2008, Gabriel C wrote:
> >>> Hmm. Can you please apply the patch below? It adds some more info and
> >>> triggers the sysrq-q timer list printout when the watchdog
> >>> triggers. That might give us some insight into this.
> >> Sorry for the lag , I was out the whole day.
> >>
> >> Here is what I've found in dmesg ( the box was idling at that time , as said I was not around ):
> >>
> >> ...
> >>
> >> [34528.893366] Clocksource tsc unstable (delta = 4686697613 ns) E:34204592 J:34210723
> > 
> > Ok. The timer got delayed. It got delayed because it is initialized as
> > a deferrable timer, which is obviously wrong. Sigh, I signed off on
> > that commit myself without thinking about the consequences.
> > 
> > Can you please apply the patch below on top of the others?
> 
> The box has been up for almost one day with that patch on top of the other ones and everything is fine so far.

Thanks for testing. I'll push the patches Linuswards.

@Andi: The revert of the reverted clocksource watchdog is staged for .26

Thanks,

	tglx


Thread overview: 41+ messages
2008-03-16 23:18 2.6.25-rc5-git6: Reported regressions from 2.6.24 Rafael J. Wysocki
2008-03-16 23:33 ` Linus Torvalds
2008-03-16 23:38   ` Rafael J. Wysocki
2008-03-17  0:20 ` Gabriel C
2008-03-17 16:17   ` Thomas Gleixner
2008-03-17 18:20     ` Gabriel C
2008-03-18  4:01       ` Gabriel C
2008-03-18  4:24         ` Gabriel C
2008-03-21 15:24         ` Gabriel C
2008-03-21 16:26           ` Thomas Gleixner
2008-03-21 16:46             ` Gabriel C
2008-03-21 18:11               ` Gabriel C
2008-03-21 18:49                 ` Thomas Gleixner
2008-03-21 19:23                   ` Gabriel C
2008-03-21 20:55                     ` Gabriel C
2008-03-21 21:15                       ` Thomas Gleixner
2008-03-21 21:59                         ` Gabriel C
2008-03-21 22:09                           ` Thomas Gleixner
2008-03-22 11:21                             ` Thomas Gleixner
2008-03-22 13:34                               ` Gabriel C
2008-03-22 14:30                                 ` Thomas Gleixner
2008-03-22 15:13                                   ` Gabriel C
2008-03-22 16:32                                     ` Thomas Gleixner
2008-03-22 21:55                                       ` Thomas Gleixner
2008-03-22 22:41                                         ` Gabriel C
2008-03-23 11:00                                           ` Gabriel C
2008-03-23 23:31                                             ` Gabriel C
2008-03-24 10:24                                               ` Thomas Gleixner
2008-03-24 22:33                                                 ` Gabriel C
2008-03-25  8:06                                                   ` Thomas Gleixner
2008-03-26 12:43                                                     ` Gabriel C
2008-03-26 14:51                                                       ` Thomas Gleixner
2008-03-22 14:25                               ` Andi Kleen
2008-03-22 14:41                                 ` Thomas Gleixner
2008-03-17  6:47 ` Jason Wu
2008-03-17 21:36   ` Rafael J. Wysocki
2008-03-23 19:01 ` Christian Kujau
2008-03-23 19:06   ` Rafael J. Wysocki
2008-03-23 19:40     ` Chr
2008-03-23 21:17 ` Christian Kujau
2008-03-23 21:29   ` Rafael J. Wysocki
