* XFS related Oops
@ 2007-11-12  6:47 Tino Keitel
  2007-11-12 22:27 ` David Chinner
  0 siblings, 1 reply; 16+ messages in thread
From: Tino Keitel @ 2007-11-12  6:47 UTC (permalink / raw)
  To: linux-kernel

Hi,

after resume from suspend with 2.6.23.1, I got the following Oops:

BUG: unable to handle kernel paging request at virtual address 3e0d204c
 printing eip:
c022807f
*pde = 00000000
Oops: 0000 [#1]
SMP 
Modules linked in: dvb_usb_cinergyT2 i915 drm cpufreq_stats usblp
firewire_ohci firewire_core dvb_usb crc_itu_t evdev snd_hda_intel
rtc_cmos sky2 dvb_core applesmc led_class coretemp hwmon
CPU:    0
EIP:    0060:[<c022807f>]    Not tainted VLI
EFLAGS: 00010202   (2.6.23.1 #4)
EIP is at xfs_iget_core+0x6f/0x690
eax: 0033f000   ebx: 3e0d2010   ecx: f7826000   edx: 0033f000
esi: 001346d7   edi: 00000000   ebp: f7525214   esp: f68b7cb4
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process squid (pid: 2967, ti=f68b6000 task=f6833030 task.ti=f68b6000)
Stack: c024f1b1 00000001 f6a3d6c0 001346d7 f789ce00 c18993d8 c04fb280
f7525218 
       00000000 00000000 f7826000 e652c240 00000000 3e0d2010 e652c25c
e652c240 
       001346d7 f7826000 c0228774 001346d7 00000000 00000000 00000000
f68b7d80 
Call Trace:
 [<c024f1b1>] kmem_zone_alloc+0x51/0xb0
 [<c0228774>] xfs_iget+0xd4/0x160
 [<c024474b>] xfs_dir_lookup_int+0x9b/0x100
 [<c0247ec5>] xfs_lookup+0x75/0xa0
 [<c0256874>] xfs_vn_lookup+0x54/0x90
 [<c0177a32>] do_lookup+0x122/0x1a0
 [<c01794d4>] __link_path_walk+0x784/0xd80
 [<c0273072>] __next_cpu+0x12/0x30
 [<c0121dbd>] find_busiest_group+0x19d/0x6a0
 [<c0179b15>] link_path_walk+0x45/0xc0
 [<c0131350>] process_timeout+0x0/0x10
 [<c016eb62>] get_unused_fd_flags+0x52/0xc0
 [<c0179db3>] do_path_lookup+0x73/0x1b0
 [<c01719b8>] get_empty_filp+0x58/0x120
 [<c017a9d1>] __path_lookup_intent_open+0x51/0xa0
 [<c017aab0>] path_lookup_open+0x20/0x30
 [<c017abb6>] open_namei+0x66/0x640
 [<c0131697>] lock_timer_base+0x27/0x60
 [<c0131715>] try_to_del_timer_sync+0x45/0x50
 [<c016ee7e>] do_filp_open+0x2e/0x60
 [<c0131350>] process_timeout+0x0/0x10
 [<c016eb62>] get_unused_fd_flags+0x52/0xc0
 [<c016eefc>] do_sys_open+0x4c/0xe0
 [<c016efcc>] sys_open+0x1c/0x20
 [<c010428e>] sysenter_past_esp+0x5f/0x85
 =======================
Code: 44 24 1c 8b 44 24 1c e8 60 92 1b 00 8b 5d 00 85 db 89 5c 24 34 75
14 e9 90 00 00 00 8b 5b 04 85 db 89 5c 24 34 0f 84 81 00 00 00 <8b> 53
3c 8b 43 38 31 fa 31 f0 09 c2 75 e3 8d 83 dc 00 00 00 e8 
EIP: [<c022807f>] xfs_iget_core+0x6f/0x690 SS:ESP 0068:f68b7cb4

Does this ring any bells?

Regards,
Tino
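
The faulting address in the Oops above can be cross-checked against the
register dump. The marked bytes in the Code: line, <8b> 53 3c, decode on
i386 as "mov 0x3c(%ebx),%edx", and ebx holds 3e0d2010, so the load touches
3e0d2010 + 0x3c = 3e0d204c, which matches the address in the BUG line. The
earlier bytes 8b 5b 04 reload ebx from 0x4(%ebx), which would be consistent
with a loop following a "next" pointer; that reading is an interpretation
and is not confirmed anywhere in this thread. The sketch below only redoes
the arithmetic. The struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of an x86 ModRM byte: mod (2 bits), reg (3 bits), rm (3 bits). */
struct modrm {
    unsigned mod;
    unsigned reg;
    unsigned rm;
};

/* Split a ModRM byte into its three fields. */
struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m;

    m.mod = (byte >> 6) & 0x3;
    m.reg = (byte >> 3) & 0x7;
    m.rm  = byte & 0x7;
    return m;
}

/*
 * Effective address for the mod == 1 form: base register plus a
 * sign-extended 8-bit displacement, wrapping at 32 bits.
 */
uint32_t effective_address_disp8(uint32_t base, uint8_t disp8)
{
    return base + (uint32_t)(int32_t)(int8_t)disp8;
}

/*
 * Recompute the address touched by a 3-byte "mov disp8(%ebx),reg"
 * instruction, given the value of ebx from the register dump.
 * Returns 0 if the bytes are not of that form.
 */
uint32_t fault_address_from_oops(const uint8_t insn[3], uint32_t ebx)
{
    struct modrm m;

    assert(insn != NULL);
    m = decode_modrm(insn[1]);
    if (insn[0] != 0x8b || m.mod != 1 || m.rm != 3) /* rm 3 is ebx */
        return 0;
    return effective_address_disp8(ebx, insn[2]);
}
```

Calling fault_address_from_oops() with the bytes 8b 53 3c and ebx = 0x3e0d2010
gives 0x3e0d204c, so the register dump and the reported fault address agree.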

* Re: XFS related Oops
  2007-11-12  6:47 XFS related Oops Tino Keitel
@ 2007-11-12 22:27 ` David Chinner
  2007-11-13 10:51   ` Tino Keitel
  0 siblings, 1 reply; 16+ messages in thread
From: David Chinner @ 2007-11-12 22:27 UTC (permalink / raw)
  To: linux-kernel

On Mon, Nov 12, 2007 at 07:47:06AM +0100, Tino Keitel wrote:
> Hi,
> 
> after resume from suspend with 2.6.23.1, I got the following Oops:
> 
> BUG: unable to handle kernel paging request at virtual address 3e0d204c
>  printing eip:
> c022807f
> *pde = 00000000
> Oops: 0000 [#1]
> SMP 
> Modules linked in: dvb_usb_cinergyT2 i915 drm cpufreq_stats usblp
> firewire_ohci firewire_core dvb_usb crc_itu_t evdev snd_hda_intel
> rtc_cmos sky2 dvb_core applesmc led_class coretemp hwmon
> CPU:    0
> EIP:    0060:[<c022807f>]    Not tainted VLI
> EFLAGS: 00010202   (2.6.23.1 #4)
> EIP is at xfs_iget_core+0x6f/0x690
> eax: 0033f000   ebx: 3e0d2010   ecx: f7826000   edx: 0033f000
> esi: 001346d7   edi: 00000000   ebp: f7525214   esp: f68b7cb4
> ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
> Process squid (pid: 2967, ti=f68b6000 task=f6833030 task.ti=f68b6000)
> Stack: c024f1b1 00000001 f6a3d6c0 001346d7 f789ce00 c18993d8 c04fb280
> f7525218 
>        00000000 00000000 f7826000 e652c240 00000000 3e0d2010 e652c25c
> e652c240 
>        001346d7 f7826000 c0228774 001346d7 00000000 00000000 00000000
> f68b7d80 
> Call Trace:
>  [<c024f1b1>] kmem_zone_alloc+0x51/0xb0
>  [<c0228774>] xfs_iget+0xd4/0x160
>  [<c024474b>] xfs_dir_lookup_int+0x9b/0x100
>  [<c0247ec5>] xfs_lookup+0x75/0xa0
>  [<c0256874>] xfs_vn_lookup+0x54/0x90
>  [<c0177a32>] do_lookup+0x122/0x1a0
>  [<c01794d4>] __link_path_walk+0x784/0xd80
>  [<c0273072>] __next_cpu+0x12/0x30
>  [<c0121dbd>] find_busiest_group+0x19d/0x6a0
>  [<c0179b15>] link_path_walk+0x45/0xc0
>  [<c0131350>] process_timeout+0x0/0x10
>  [<c016eb62>] get_unused_fd_flags+0x52/0xc0
>  [<c0179db3>] do_path_lookup+0x73/0x1b0
>  [<c01719b8>] get_empty_filp+0x58/0x120
>  [<c017a9d1>] __path_lookup_intent_open+0x51/0xa0
>  [<c017aab0>] path_lookup_open+0x20/0x30
>  [<c017abb6>] open_namei+0x66/0x640
>  [<c0131697>] lock_timer_base+0x27/0x60
>  [<c0131715>] try_to_del_timer_sync+0x45/0x50
>  [<c016ee7e>] do_filp_open+0x2e/0x60
>  [<c0131350>] process_timeout+0x0/0x10
>  [<c016eb62>] get_unused_fd_flags+0x52/0xc0
>  [<c016eefc>] do_sys_open+0x4c/0xe0
>  [<c016efcc>] sys_open+0x1c/0x20
>  [<c010428e>] sysenter_past_esp+0x5f/0x85
>  =======================
> Code: 44 24 1c 8b 44 24 1c e8 60 92 1b 00 8b 5d 00 85 db 89 5c 24 34 75
> 14 e9 90 00 00 00 8b 5b 04 85 db 89 5c 24 34 0f 84 81 00 00 00 <8b> 53
> 3c 8b 43 38 31 fa 31 f0 09 c2 75 e3 8d 83 dc 00 00 00 e8 
> EIP: [<c022807f>] xfs_iget_core+0x6f/0x690 SS:ESP 0068:f68b7cb4
> 
> Does this ring any bells?

No. I'd say something got screwed up during suspend/resume. Is it
reproducible?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

* Re: XFS related Oops
  2007-11-12 22:27 ` David Chinner
@ 2007-11-13 10:51   ` Tino Keitel
  2007-11-13 23:04     ` David Chinner
  0 siblings, 1 reply; 16+ messages in thread
From: Tino Keitel @ 2007-11-13 10:51 UTC (permalink / raw)
  To: linux-kernel

On Tue, Nov 13, 2007 at 09:27:20 +1100, David Chinner wrote:

[...]

> No. I'd say something got screwed up during suspend/resume. Is it
> reproducible?

No. I often use suspend to RAM, and usually it works without such
failures. I restart squid during the resume procedure, and the above
Oops led to a squid process in D state.

Regards,
Tino

* Re: XFS related Oops
  2007-11-13 10:51   ` Tino Keitel
@ 2007-11-13 23:04     ` David Chinner
  2007-11-26 13:07       ` Tino Keitel
  2007-11-26 13:12       ` Tino Keitel
  0 siblings, 2 replies; 16+ messages in thread
From: David Chinner @ 2007-11-13 23:04 UTC (permalink / raw)
  To: linux-kernel

On Tue, Nov 13, 2007 at 11:51:19AM +0100, Tino Keitel wrote:
> On Tue, Nov 13, 2007 at 09:27:20 +1100, David Chinner wrote:
> 
> [...]
> 
> > No. I'd say something got screwed up during suspend/resume. Is it
> > reproducible?
> 
> No. I often use suspend to RAM, and usually it works without such
> failures. I restart squid during the resume procedure, and the above
> Oops led to a squid process in D state.

Ok. Sounds like there's not much we can debug at this point. Thanks
for the report, though.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

* Re: XFS related Oops
  2007-11-13 23:04     ` David Chinner
  2007-11-26 13:07       ` Tino Keitel
@ 2007-11-26 13:12       ` Tino Keitel
  2007-11-26 21:08         ` XFS related Oops (suspend/resume related) David Chinner
  1 sibling, 1 reply; 16+ messages in thread
From: Tino Keitel @ 2007-11-26 13:12 UTC (permalink / raw)
  To: David Chinner; +Cc: linux-kernel

On Wed, Nov 14, 2007 at 10:04:45 +1100, David Chinner wrote:
> On Tue, Nov 13, 2007 at 11:51:19AM +0100, Tino Keitel wrote:
> > On Tue, Nov 13, 2007 at 09:27:20 +1100, David Chinner wrote:
> > 
> > [...]
> > 
> > > No. I'd say something got screwed up during suspend/resume. Is it
> > > reproducible?
> > 
> > No. I often use suspend to RAM, and usually it works without such
> > failures. I restart squid during the resume procedure, and the above
> > Oops led to a squid process in D state.
> 
> Ok. Sounds like there's not much we can debug at this point. Thanks
> for the report, though.

I got a similar Oops again:

xfs_iget_core: ambiguous vns: vp/0xc00700c0, invp/0xcb5a1680
------------[ cut here ]------------
kernel BUG at fs/xfs/support/debug.c:55!
invalid opcode: 0000 [#2]
SMP 
Modules linked in: dvb_usb_cinergyT2 rfcomm l2cap bluetooth i915 drm
cpufreq_stats firewire_ohci firewire_core dvb_usb crc_itu_t usblp evdev
snd_hda_intel rtc_cmos sky2 dvb_core applesmc led_class coretemp hwmon
CPU:    0
EIP:    0060:[<c025a45a>]    Tainted: G      D VLI
EFLAGS: 00010246   (2.6.23.1 #4)
EIP is at cmn_err+0x9a/0xa0
eax: c049efb0   ebx: c044ffac   ecx: 00000001   edx: 00000286
esi: 00000000   edi: 00000286   ebp: f75f86f0   esp: eab9fccc
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process gc_approx (pid: 29386, ti=eab9e000 task=e409b030
task.ti=eab9e000)
Stack: c04531ce c043a07d c05544c0 eab9fcf4 c0071500 00002094 00000000
c022855b 
       00000000 c044ffac c00700c0 cb5a1680 f796ac00 c18828c4 c04fb280
f75f86f4 
       00000000 00000000 f748e800 cb5a1680 00000000 c0071500 cb5a169c
cb5a1680 
Call Trace:
 [<c022855b>] xfs_iget_core+0x54b/0x690
 [<c0228774>] xfs_iget+0xd4/0x160
 [<c024474b>] xfs_dir_lookup_int+0x9b/0x100
 [<c0247ec5>] xfs_lookup+0x75/0xa0
 [<c0256874>] xfs_vn_lookup+0x54/0x90
 [<c0177a32>] do_lookup+0x122/0x1a0
 [<c01794d4>] __link_path_walk+0x784/0xd80
 [<c0179b35>] link_path_walk+0x65/0xc0
 [<c0179b15>] link_path_walk+0x45/0xc0
 [<c0179db3>] do_path_lookup+0x73/0x1b0
 [<c0178a55>] getname+0xa5/0xc0
 [<c017a7bb>] __user_walk_fd+0x3b/0x60
 [<c0173992>] vfs_stat_fd+0x22/0x60
 [<c0173a7f>] sys_stat64+0xf/0x30
 [<c0181885>] dput+0x85/0x100
 [<c0171744>] __fput+0x114/0x160
 [<c01856eb>] mntput_no_expire+0x1b/0x80
 [<c016ead7>] filp_close+0x47/0x80
 [<c016ff93>] sys_close+0x63/0xc0
 [<c010428e>] sysenter_past_esp+0x5f/0x85
 [<c03e0000>] __mutex_lock_interruptible_slowpath+0xb0/0x290
 =======================
Code: 45 c0 89 44 24 04 e8 e6 ec ec ff 89 fa b8 b0 ef 49 c0 e8 5a 6e 18
00 85 f6 74 10 83 c4 10 5b 5e 5f c3 c6 81 c0 44 55 c0 00 eb c1 <0f> 0b
eb fe 90 90 83 ec 10 85 d2 89 1c 24 89 cb 89 74 24 04 89 
EIP: [<c025a45a>] cmn_err+0x9a/0xa0 SS:ESP 0068:eab9fccc

Regards,
Tino

* Re: XFS related Oops (suspend/resume related)
  2007-11-26 13:12       ` Tino Keitel
@ 2007-11-26 21:08         ` David Chinner
  2007-11-26 22:07           ` Rafael J. Wysocki
  0 siblings, 1 reply; 16+ messages in thread
From: David Chinner @ 2007-11-26 21:08 UTC (permalink / raw)
  To: David Chinner, linux-kernel; +Cc: rjw, xfs

On Mon, Nov 26, 2007 at 02:12:10PM +0100, Tino Keitel wrote:
> On Wed, Nov 14, 2007 at 10:04:45 +1100, David Chinner wrote:
> > On Tue, Nov 13, 2007 at 11:51:19AM +0100, Tino Keitel wrote:
> > > On Tue, Nov 13, 2007 at 09:27:20 +1100, David Chinner wrote:
> > > 
> > > [...]
> > > 
> > > > No. I'd say something got screwed up during suspend/resume. Is it
> > > > reproducible?
> > > 
> > > No. I often use suspend to RAM, and usually it works without such
> > > failures. I restart squid during the resume procedure, and the above
> > > Oops led to a squid process in D state.
> > 
> > Ok. Sounds like there's not much we can debug at this point. Thanks
> > for the report, though.
> 
> I got a similar Oops again:
> 
> xfs_iget_core: ambiguous vns: vp/0xc00700c0, invp/0xcb5a1680

Now there's a message that I haven't seen in about 3 years.

It indicates that the linux inode connected to the xfs_inode is not
the correct one. i.e. that the linux inode cache is out of step with
the XFS inode cache.

Basically, that is not supposed to happen. I suspect that the way
threads are frozen is resulting in an inode lookup racing with
a reclaim. The reclaim thread gets stopped after any user threads,
and so we could have the situation that a process blocked in lookup
has the XFS inode reclaimed and reused before it gets unblocked.

The question is why is it happening now when none of that code in
XFS has changed?

Rafael, when are threads frozen? Only when they schedule or call
try_to_freeze()? Did the freezer mechanism change in 2.6.23 (this is
on 2.6.23.1)?  Is there some way of getting a stack trace of all the
processes in the system once the machine is frozen and about to
suspend so we can see if we blocked in a lookup?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
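
The condition behind the "ambiguous vns" message described above can be
pictured with a small user-space model. It assumes, as the message says,
that each XFS inode carries a link to one Linux inode (vnode) and that a
lookup arriving with a different vnode means the two caches disagree. All
names here are hypothetical and the real xfs_iget_core() handles more cases;
this is a sketch of the check, not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a Linux inode/vnode; only its identity matters here. */
struct vnode_model {
    int id;
};

/* Stand-in for an XFS inode holding a back-pointer to its vnode. */
struct xfs_inode_model {
    unsigned long long ino;
    struct vnode_model *vnode; /* NULL when no vnode is attached */
};

enum lookup_result {
    LOOKUP_ATTACHED,       /* inode had no vnode, so vp was hooked up */
    LOOKUP_ALREADY_LINKED, /* inode already points at vp */
    LOOKUP_AMBIGUOUS       /* inode points at a different vnode */
};

/*
 * Model of a cache hit: the lookup found ip and wants it tied to vp.
 * The ambiguous case is the one that should never happen while both
 * caches are in step; the link is left untouched in that case.
 */
enum lookup_result attach_vnode_model(struct xfs_inode_model *ip,
                                      struct vnode_model *vp)
{
    assert(ip != NULL && vp != NULL);

    if (ip->vnode == NULL) {
        ip->vnode = vp;
        return LOOKUP_ATTACHED;
    }
    if (ip->vnode == vp)
        return LOOKUP_ALREADY_LINKED;
    return LOOKUP_AMBIGUOUS;
}
```

In this model, an inode that was reclaimed and reused while a lookup was
blocked would show up as LOOKUP_AMBIGUOUS once the blocked lookup resumes
with its old vnode, which is the scenario suspected above.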

* Re: XFS related Oops (suspend/resume related)
  2007-11-26 21:08         ` XFS related Oops (suspend/resume related) David Chinner
@ 2007-11-26 22:07           ` Rafael J. Wysocki
  2007-11-27 13:20             ` Tino Keitel
  2007-11-27 15:51             ` Rafael J. Wysocki
  0 siblings, 2 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2007-11-26 22:07 UTC (permalink / raw)
  To: David Chinner; +Cc: linux-kernel, xfs

On Monday, 26 of November 2007, David Chinner wrote:
> On Mon, Nov 26, 2007 at 02:12:10PM +0100, Tino Keitel wrote:
> > On Wed, Nov 14, 2007 at 10:04:45 +1100, David Chinner wrote:
> > > On Tue, Nov 13, 2007 at 11:51:19AM +0100, Tino Keitel wrote:
> > > > On Tue, Nov 13, 2007 at 09:27:20 +1100, David Chinner wrote:
> > > > 
> > > > [...]
> > > > 
> > > > > No. I'd say something got screwed up during suspend/resume. Is it
> > > > > reproducible?
> > > > 
> > > > No. I often use suspend to RAM, and usually it works without such
> > > > failures. I restart squid during the resume procedure, and the above
> > > > Oops led to a squid process in D state.
> > > 
> > > Ok. Sounds like there's not much we can debug at this point. Thanks
> > > for the report, though.
> > 
> > I got a similar Oops again:
> > 
> > xfs_iget_core: ambiguous vns: vp/0xc00700c0, invp/0xcb5a1680
> 
> Now there's a message that I haven't seen in about 3 years.
> 
> It indicates that the linux inode connected to the xfs_inode is not
> the correct one. i.e. that the linux inode cache is out of step with
> the XFS inode cache.
> 
> Basically, that is not supposed to happen. I suspect that the way
> threads are frozen is resulting in an inode lookup racing with
> > a reclaim. The reclaim thread gets stopped after any user threads,
> and so we could have the situation that a process blocked in lookup
> has the XFS inode reclaimed and reused before it gets unblocked.
> 
> The question is why is it happening now when none of that code in
> XFS has changed?
> 
> Rafael, when are threads frozen? Only when they schedule or call
> try_to_freeze()?

Kernel threads freeze only when they call try_to_freeze().  User space tasks
freeze while executing the signals handling code.

> Did the freezer mechanism change in 2.6.23 (this is on 2.6.23.1)?

Yes.  Kernel threads are not sent fake signals by the freezer any more.

> Is there some way of getting a stack trace of all the 
> processes in the system once the machine is frozen and about to
> suspend so we can see if we blocked in a lookup?

Yes.  Please add show_state() before the last "return" in freeze_processes().

On 2.6.23.1 you can test the freezer alone by doing

# echo testproc > /sys/power/disk
# echo disk > /sys/power/state

Greetings,
Rafael

* Re: XFS related Oops (suspend/resume related)
  2007-11-26 22:07           ` Rafael J. Wysocki
@ 2007-11-27 13:20             ` Tino Keitel
  2007-11-27 15:46               ` Rafael J. Wysocki
  2007-11-27 15:51             ` Rafael J. Wysocki
  1 sibling, 1 reply; 16+ messages in thread
From: Tino Keitel @ 2007-11-27 13:20 UTC (permalink / raw)
  To: linux-kernel

On Mon, Nov 26, 2007 at 23:07:56 +0100, Rafael J. Wysocki wrote:

[...]

> On 2.6.23.1 you can test the freezer alone by doing
> 
> # echo testproc > /sys/power/disk
> # echo disk > /sys/power/state

This is suspend to RAM, not to disk.

Regards,
Tino

* Re: XFS related Oops (suspend/resume related)
  2007-11-27 13:20             ` Tino Keitel
@ 2007-11-27 15:46               ` Rafael J. Wysocki
  0 siblings, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2007-11-27 15:46 UTC (permalink / raw)
  To: Tino Keitel; +Cc: linux-kernel, David Chinner, xfs

On Tuesday, 27 of November 2007, Tino Keitel wrote:
> On Mon, Nov 26, 2007 at 23:07:56 +0100, Rafael J. Wysocki wrote:
> 
> [...]
> 
> > On 2.6.23.1 you can test the freezer alone by doing
> > 
> > # echo testproc > /sys/power/disk
> > # echo disk > /sys/power/state
> 
> This is suspend to RAM, not to disk.

I know. :-)

Nevertheless, this is how you can test the tasks freezer _without_ actually
doing a suspend of any kind.

Greetings,
Rafael

* Re: XFS related Oops (suspend/resume related)
  2007-11-26 22:07           ` Rafael J. Wysocki
  2007-11-27 13:20             ` Tino Keitel
@ 2007-11-27 15:51             ` Rafael J. Wysocki
  2007-11-27 21:11               ` David Chinner
  1 sibling, 1 reply; 16+ messages in thread
From: Rafael J. Wysocki @ 2007-11-27 15:51 UTC (permalink / raw)
  To: David Chinner; +Cc: linux-kernel, xfs

On Monday, 26 of November 2007, Rafael J. Wysocki wrote:
> On Monday, 26 of November 2007, David Chinner wrote:
> > On Mon, Nov 26, 2007 at 02:12:10PM +0100, Tino Keitel wrote:
> > > On Wed, Nov 14, 2007 at 10:04:45 +1100, David Chinner wrote:
> > > > On Tue, Nov 13, 2007 at 11:51:19AM +0100, Tino Keitel wrote:
> > > > > On Tue, Nov 13, 2007 at 09:27:20 +1100, David Chinner wrote:
> > > > > 
> > > > > [...]
> > > > > 
> > > > > > No. I'd say something got screwed up during suspend/resume. Is it
> > > > > > reproducible?
> > > > > 
> > > > > No. I often use suspend to RAM, and usually it works without such
> > > > > failures. I restart squid during the resume procedure, and the above
> > > > > Oops led to a squid process in D state.
> > > > 
> > > > Ok. Sounds like there's not much we can debug at this point. Thanks
> > > > for the report, though.
> > > 
> > > I got a similar Oops again:
> > > 
> > > xfs_iget_core: ambiguous vns: vp/0xc00700c0, invp/0xcb5a1680
> > 
> > Now there's a message that I haven't seen in about 3 years.
> > 
> > It indicates that the linux inode connected to the xfs_inode is not
> > the correct one. i.e. that the linux inode cache is out of step with
> > the XFS inode cache.
> > 
> > Basically, that is not supposed to happen. I suspect that the way
> > threads are frozen is resulting in an inode lookup racing with
> > > a reclaim. The reclaim thread gets stopped after any user threads,
> > and so we could have the situation that a process blocked in lookup
> > has the XFS inode reclaimed and reused before it gets unblocked.
> > 
> > The question is why is it happening now when none of that code in
> > XFS has changed?
> > 
> > Rafael, when are threads frozen? Only when they schedule or call
> > try_to_freeze()?
> 
> Kernel threads freeze only when they call try_to_freeze().  User space tasks
> freeze while executing the signals handling code.
> 
> > Did the freezer mechanism change in 2.6.23 (this is on 2.6.23.1)?
> 
> Yes.  Kernel threads are not sent fake signals by the freezer any more.

Ah, sorry, this change has been merged after 2.6.23.  However, before 2.6.23
we had another important change that caused all kernel threads to have
PF_NOFREEZE set by default, unless they call set_freezable() explicitly.

Greetings,
Rafael
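
The opt-in behaviour described above (kernel threads start with PF_NOFREEZE
set and are only frozen after calling set_freezable()) can be illustrated
with a small user-space model. This is not kernel code: the flag value, the
struct and the *_model function names are invented for the sketch, and the
freezer is reduced to "skip exempt tasks, ask the rest to freeze".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag value for this model only, not the kernel's definition. */
#define MODEL_PF_NOFREEZE 0x1u

struct task_model {
    unsigned int flags;  /* MODEL_PF_NOFREEZE or 0 */
    bool freeze_pending; /* the freezer has asked this task to freeze */
    bool frozen;         /* the task has parked itself */
};

/* New kernel threads start out exempt from freezing. */
void kthread_start_model(struct task_model *t)
{
    assert(t != NULL);
    t->flags = MODEL_PF_NOFREEZE;
    t->freeze_pending = false;
    t->frozen = false;
}

/* Opt in to freezing, as set_freezable() is described as doing. */
void set_freezable_model(struct task_model *t)
{
    t->flags &= ~MODEL_PF_NOFREEZE;
}

/* The freezer skips exempt tasks; returns true if a freeze was requested. */
bool freezer_request_model(struct task_model *t)
{
    if (t->flags & MODEL_PF_NOFREEZE)
        return false;
    t->freeze_pending = true;
    return true;
}

/* Called by the thread from its main loop; parks only if a freeze is pending. */
bool try_to_freeze_model(struct task_model *t)
{
    if (!t->freeze_pending)
        return false;
    t->frozen = true;
    return true;
}
```

A thread that never calls set_freezable_model() is skipped by
freezer_request_model(), so its later try_to_freeze_model() calls do nothing
and it keeps running while everything else is frozen.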

* Re: XFS related Oops (suspend/resume related)
  2007-11-27 15:51             ` Rafael J. Wysocki
@ 2007-11-27 21:11               ` David Chinner
  2007-11-27 21:53                 ` Rafael J. Wysocki
  0 siblings, 1 reply; 16+ messages in thread
From: David Chinner @ 2007-11-27 21:11 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: David Chinner, linux-kernel, xfs

On Tue, Nov 27, 2007 at 04:51:38PM +0100, Rafael J. Wysocki wrote:
> On Monday, 26 of November 2007, Rafael J. Wysocki wrote:
> > On Monday, 26 of November 2007, David Chinner wrote:
> > > Now there's a message that I haven't seen in about 3 years.
> > > 
> > > It indicates that the linux inode connected to the xfs_inode is not
> > > the correct one. i.e. that the linux inode cache is out of step with
> > > the XFS inode cache.
> > > 
> > > Basically, that is not supposed to happen. I suspect that the way
> > > threads are frozen is resulting in an inode lookup racing with
> > > > a reclaim. The reclaim thread gets stopped after any user threads,
> > > and so we could have the situation that a process blocked in lookup
> > > has the XFS inode reclaimed and reused before it gets unblocked.
> > > 
> > > The question is why is it happening now when none of that code in
> > > XFS has changed?
> > > 
> > > Rafael, when are threads frozen? Only when they schedule or call
> > > try_to_freeze()?
> > 
> > Kernel threads freeze only when they call try_to_freeze().  User space tasks
> > freeze while executing the signals handling code.
> > 
> > > Did the freezer mechanism change in 2.6.23 (this is on 2.6.23.1)?
> > 
> > Yes.  Kernel threads are not sent fake signals by the freezer any more.
> 
> Ah, sorry, this change has been merged after 2.6.23.  However, before 2.6.23
> we had another important change that caused all kernel threads to have
> PF_NOFREEZE set by default, unless they call set_freezable() explicitly.

So try_to_freeze() will never freeze a thread if it has not been
set_freezable()? And xfsbufd will never be frozen?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

* Re: XFS related Oops (suspend/resume related)
  2007-11-27 21:11               ` David Chinner
@ 2007-11-27 21:53                 ` Rafael J. Wysocki
  2007-11-29 21:05                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 16+ messages in thread
From: Rafael J. Wysocki @ 2007-11-27 21:53 UTC (permalink / raw)
  To: David Chinner; +Cc: linux-kernel, xfs, Tino Keitel

On Tuesday, 27 of November 2007, David Chinner wrote:
> On Tue, Nov 27, 2007 at 04:51:38PM +0100, Rafael J. Wysocki wrote:
> > On Monday, 26 of November 2007, Rafael J. Wysocki wrote:
> > > On Monday, 26 of November 2007, David Chinner wrote:
> > > > Now there's a message that I haven't seen in about 3 years.
> > > > 
> > > > It indicates that the linux inode connected to the xfs_inode is not
> > > > the correct one. i.e. that the linux inode cache is out of step with
> > > > the XFS inode cache.
> > > > 
> > > > Basically, that is not supposed to happen. I suspect that the way
> > > > threads are frozen is resulting in an inode lookup racing with
> > > > > a reclaim. The reclaim thread gets stopped after any user threads,
> > > > and so we could have the situation that a process blocked in lookup
> > > > has the XFS inode reclaimed and reused before it gets unblocked.
> > > > 
> > > > The question is why is it happening now when none of that code in
> > > > XFS has changed?
> > > > 
> > > > Rafael, when are threads frozen? Only when they schedule or call
> > > > try_to_freeze()?
> > > 
> > > Kernel threads freeze only when they call try_to_freeze().  User space tasks
> > > freeze while executing the signals handling code.
> > > 
> > > > Did the freezer mechanism change in 2.6.23 (this is on 2.6.23.1)?
> > > 
> > > Yes.  Kernel threads are not sent fake signals by the freezer any more.
> > 
> > Ah, sorry, this change has been merged after 2.6.23.  However, before 2.6.23
> > we had another important change that caused all kernel threads to have
> > PF_NOFREEZE set by default, unless they call set_freezable() explicitly.
> 
> So try_to_freeze() will never freeze a thread if it has not been
> set_freezable()? And xfsbufd will never be frozen?

No, it won't.

I must have overlooked it, probably because it calls refrigerator() directly
and not try_to_freeze() ...

I think something like the appended patch will help, then.

Greetings,
Rafael


---
Fix breakage caused by commit 831441862956fffa17b9801db37e6ea1650b0f69
that did not introduce the necessary call to set_freezable() in
fs/xfs/linux-2.6/xfs_buf.c.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 fs/xfs/linux-2.6/xfs_buf.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6/fs/xfs/linux-2.6/xfs_buf.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_buf.c
+++ linux-2.6/fs/xfs/linux-2.6/xfs_buf.c
@@ -1750,6 +1750,8 @@ xfsbufd(
 
 	current->flags |= PF_MEMALLOC;
 
+	set_freezable();
+
 	do {
 		if (unlikely(freezing(current))) {
 			set_bit(XBT_FORCE_SLEEP, &target->bt_flags);

* Re: XFS related Oops (suspend/resume related)
  2007-11-29 21:05                   ` Rafael J. Wysocki
@ 2007-11-29 20:49                     ` Tino Keitel
  2007-11-29 22:13                       ` Rafael J. Wysocki
  0 siblings, 1 reply; 16+ messages in thread
From: Tino Keitel @ 2007-11-29 20:49 UTC (permalink / raw)
  To: linux-kernel

On Thu, Nov 29, 2007 at 22:05:24 +0100, Rafael J. Wysocki wrote:

[...]

> Tino, can you check if this patch helps, please?

Not really. I suspend one to several times a day, and in most cases
resume works. I think checking the patch is hard when I have no real
procedure to reproduce the resume failure.

Regards,
Tino

* Re: XFS related Oops (suspend/resume related)
  2007-11-27 21:53                 ` Rafael J. Wysocki
@ 2007-11-29 21:05                   ` Rafael J. Wysocki
  2007-11-29 20:49                     ` Tino Keitel
  0 siblings, 1 reply; 16+ messages in thread
From: Rafael J. Wysocki @ 2007-11-29 21:05 UTC (permalink / raw)
  To: Tino Keitel; +Cc: David Chinner, linux-kernel, xfs

On Tuesday, 27 of November 2007, Rafael J. Wysocki wrote:
> On Tuesday, 27 of November 2007, David Chinner wrote:
> > On Tue, Nov 27, 2007 at 04:51:38PM +0100, Rafael J. Wysocki wrote:
> > > On Monday, 26 of November 2007, Rafael J. Wysocki wrote:
> > > > On Monday, 26 of November 2007, David Chinner wrote:
> > > > > Now there's a message that I haven't seen in about 3 years.
> > > > > 
> > > > > It indicates that the linux inode connected to the xfs_inode is not
> > > > > the correct one. i.e. that the linux inode cache is out of step with
> > > > > the XFS inode cache.
> > > > > 
> > > > > Basically, that is not supposed to happen. I suspect that the way
> > > > > threads are frozen is resulting in an inode lookup racing with
> > > > > > a reclaim. The reclaim thread gets stopped after any user threads,
> > > > > and so we could have the situation that a process blocked in lookup
> > > > > has the XFS inode reclaimed and reused before it gets unblocked.
> > > > > 
> > > > > The question is why is it happening now when none of that code in
> > > > > XFS has changed?
> > > > > 
> > > > > Rafael, when are threads frozen? Only when they schedule or call
> > > > > try_to_freeze()?
> > > > 
> > > > Kernel threads freeze only when they call try_to_freeze().  User space tasks
> > > > freeze while executing the signals handling code.
> > > > 
> > > > > Did the freezer mechanism change in 2.6.23 (this is on 2.6.23.1)?
> > > > 
> > > > Yes.  Kernel threads are not sent fake signals by the freezer any more.
> > > 
> > > Ah, sorry, this change has been merged after 2.6.23.  However, before 2.6.23
> > > we had another important change that caused all kernel threads to have
> > > PF_NOFREEZE set by default, unless they call set_freezable() explicitly.
> > 
> > So try_to_freeze() will never freeze a thread if it has not been
> > set_freezable()? And xfsbufd will never be frozen?
> 
> No, it won't.
> 
> I must have overlooked it, probably because it calls refrigerator() directly
> and not try_to_freeze() ...
> 
> I think something like the appended patch will help, then.

Tino, can you check if this patch helps, please?

Greetings,
Rafael


> ---
> Fix breakage caused by commit 831441862956fffa17b9801db37e6ea1650b0f69
> that did not introduce the necessary call to set_freezable() in
> fs/xfs/linux-2.6/xfs_buf.c.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
> ---
>  fs/xfs/linux-2.6/xfs_buf.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> Index: linux-2.6/fs/xfs/linux-2.6/xfs_buf.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_buf.c
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_buf.c
> @@ -1750,6 +1750,8 @@ xfsbufd(
>  
>  	current->flags |= PF_MEMALLOC;
>  
> +	set_freezable();
> +
>  	do {
>  		if (unlikely(freezing(current))) {
>  			set_bit(XBT_FORCE_SLEEP, &target->bt_flags);



-- 
"Premature optimization is the root of all evil." - Donald Knuth

* Re: XFS related Oops (suspend/resume related)
  2007-11-29 20:49                     ` Tino Keitel
@ 2007-11-29 22:13                       ` Rafael J. Wysocki
  0 siblings, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2007-11-29 22:13 UTC (permalink / raw)
  To: Tino Keitel, David Chinner; +Cc: linux-kernel

On Thursday, 29 of November 2007, Tino Keitel wrote:
> On Thu, Nov 29, 2007 at 22:05:24 +0100, Rafael J. Wysocki wrote:
> 
> [...]
> 
> > Tino, can you check if this patch helps, please?
> 
> Not really. I suspend one to several times a day, and in most cases
> resume works. I think checking the patch is hard when I have no real
> procedure to reproduce the resume failure.

I see.

Still, I think that the patch is needed.

David, are you going to take it or do you want me to push it upstream?

Rafael
