LKML Archive on lore.kernel.org
* XFS or Kernel Problem / Bug
       [not found]                   ` <20060802201805.A2360409@wobbly.melbourne.sgi.com>
@ 2007-01-21 12:30                     ` Stefan Priebe - FH
  2007-01-22  6:18                       ` David Chinner
  2007-01-23 19:49                       ` Chuck Ebbert
  0 siblings, 2 replies; 21+ messages in thread
From: Stefan Priebe - FH @ 2007-01-21 12:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: stefan

Hello!

I have 3 servers which work wonderfully with 2.6.16.X (I also tested the
latest, 2.6.16.37),

but with 2.6.18.6 I get these errors:

"general protection fault: 0000 [#1]"
"Modules linked in:"
"CPU:    0"
"EIP:    0060:[<c01c8fd2>]    Not tainted VLI"
"EFLAGS: 00010246   (2.6.18.6 #1) "
"EIP is at xfs_bmap_add_extent_hole_delay+0x58d/0x59b"
"eax: 00000000   ebx: fffe0007   ecx: 0071a4cd   edx: 00000000"
"esi: 00000000   edi: 00000000   ebp: 00000015   esp: ce35f8f0"
"ds: 0000   es: 007b   ss: 0068"
"Process mysqld (pid: 1836, ti=ce35e000 task=ee618550 task.ti=ce35e000)"
"Stack: 00000232 00000000 00000233 00000000 00000000 00000000 0000000c
00000000 "
"       00000007 00000000 eca90250 eca90278 00000001 eca90200 00000000
000003c3 "
"       00000000 010003c3 ffffffc0 ce35fa58 ce35fa58 00000001 00000000
00000000 "
"Call Trace:"
" [<c01b6c58>] xfs_trans_dqresv+0x3f9/0x405"
" [<c01c6485>] xfs_bmap_add_extent+0x163/0x377"
" [<c01cd2c3>] xfs_bmapi+0xa4e/0x1109"
" [<c01ebbe3>] xfs_iomap_write_delay+0x233/0x2fa"
" [<c01eaa31>] xfs_imap_to_bmap+0x29/0x1d6"
" [<c01eae1a>] xfs_iomap+0x23c/0x3e1"
" [<c01eaebe>] xfs_iomap+0x2e0/0x3e1"
" [<c020a71a>] xfs_bmap+0x1a/0x1e"
" [<c020471e>] __xfs_get_blocks+0x5d/0x195"


and sometimes this one:

"BUG: unable to handle kernel NULL pointer dereference at virtual
address 00000288"
" printing eip:"
"c0142ff7"
"*pde = 00000000"
"Oops: 0000 [#1]"
"SMP "
"Modules linked in: iptable_filter ip_tables x_tables"
"CPU:    0"
"EIP:    0060:[<c0142ff7>]    Not tainted VLI"
"EFLAGS: 00010246   (2.6.18.6 #1) "
"EIP is at generic_file_buffered_write+0x390/0x6cf"
"eax: 00000000   ebx: 000001ec   ecx: ea029a40   edx: 00008002"
"esi: 00000000   edi: e3b28c9c   ebp: 000001ec   esp: dd04bd18"
"ds: 007b   es: 007b   ss: 0068"
"Process proftpd (pid: 3615, ti=dd04a000 task=eba88a70 task.ti=dd04a000)"
"Stack: e3b28d44 00000001 00000010 000001fc c036d793 000001fc c14765c0
00000010 "
"       080d404c 000001ec e3b28c9c c03e78c0 e3b28d44 ea029a40 000001fc
00000000 "
"       00000000 000001ec dd04beac 00d420b1 00000000 00000000 dd04bd80
45b1fa67 "
"Call Trace:"
" [<c036d793>] sock_def_readable+0x7f/0x81"
" [<c017a03a>] file_update_time+0xad/0xcb"
" [<c0232015>] xfs_iunlock+0x55/0x9f"
" [<c0262eeb>] xfs_write+0xa74/0xc61"
" [<c036a253>] sock_aio_read+0x95/0x99"
" [<c025d9fb>] xfs_file_aio_write+0x8f/0xa0"
" [<c015fb94>] do_sync_write+0xc9/0x10f"
" [<c0133ad6>] autoremove_wake_function+0x0/0x57"
" [<c015f3d5>] generic_file_llseek+0x95/0xbc"
" [<c015facb>] do_sync_write+0x0/0x10f"
" [<c015fc80>] vfs_write+0xa6/0x179"
" [<c015fe24>] sys_write+0x51/0x80"
" [<c0102d3f>] syscall_call+0x7/0xb"

"Code: 04 89 10 8b 44 24 40 85 c0 0f 85 db 00 00 00 8b 5c 24 24 85 db 0f
88 c3 00 00 00 8b 4c 24 34 8b 51 18 f6 c6 10 75 73 8b 7c 24 28 <8b> 85
9c 00 00 00 f6 40 30 10 75 63 f6 87 48 01 00 00 01 75 5a "

"EIP: [<c0142ff7>] generic_file_buffered_write+0x390/0x6cf SS:ESP
0068:dd04bd18"

Stefan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: XFS or Kernel Problem / Bug
  2007-01-21 12:30                     ` XFS or Kernel Problem / Bug Stefan Priebe - FH
@ 2007-01-22  6:18                       ` David Chinner
  2007-01-22  7:51                         ` Stefan Priebe - FH
  2007-01-23 19:49                       ` Chuck Ebbert
  1 sibling, 1 reply; 21+ messages in thread
From: David Chinner @ 2007-01-22  6:18 UTC (permalink / raw)
  To: Stefan Priebe - FH; +Cc: linux-kernel, stefan

On Sun, Jan 21, 2007 at 01:30:15PM +0100, Stefan Priebe - FH wrote:
> Hello!
>
> I have 3 servers which work wonderfully with 2.6.16.X (I also tested the
> latest, 2.6.16.37),
>
> but with 2.6.18.6 I get these errors:

[ EIP is at xfs_bmap_add_extent_hole_delay+0x58d/0x59b ]
[ EIP is at generic_file_buffered_write+0x390/0x6cf ]

Do you have a reproducible test case for these? If not,
do you have any idea what is going on in the system at the time
of the failure?

Can you describe the storage subsystem you are using and post the
output of xfs_growfs -n <mntpt> on the filesystem that is causing
problems?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: XFS or Kernel Problem / Bug
  2007-01-22  6:18                       ` David Chinner
@ 2007-01-22  7:51                         ` Stefan Priebe - FH
  2007-01-22  8:03                           ` David Chinner
  0 siblings, 1 reply; 21+ messages in thread
From: Stefan Priebe - FH @ 2007-01-22  7:51 UTC (permalink / raw)
  To: David Chinner; +Cc: linux-kernel

Hi!

I'm not sure, but perhaps it isn't an XFS bug.

Here is what I found out:

We have about 300 servers at the moment, and 5 of them are "old" Intel
Pentium 4 machines with a DFI PM-12 mainboard with a VIA chipset. It only
happens on THESE machines. Other P4 machines with a Tyan or Gigabyte
mainboard are not affected. All 300 machines run the same Debian 3.0
with a self-built kernel. Some of these 5 use a 3ware controller and
some use the onboard controller. All systems are using IDE.

But I cannot say what is happening on these machines at the time of
failure. Sometimes these servers crash after just a few minutes. Sometimes
they run for about 2-3 days... I've now downgraded all servers to
2.6.16.37, because they are production machines... but I have one machine
where we can test, if you need something.

Here is the output running 2.6.16.37 at the moment:
xfs_growfs -n /

meta-data=/dev/root              isize=256    agcount=16, agsize=603855 blks
          =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=9661680, imaxpct=25
          =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=4717, version=1
          =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0
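
The geometry figures above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the original report; the constant names and the helper `geometry_is_consistent` are made up for illustration, and it assumes that every allocation group except possibly the last is exactly `agsize` blocks long.

```python
# Values copied from the xfs_growfs -n output above.
AGCOUNT = 16            # agcount=16
AGSIZE_BLOCKS = 603855  # agsize=603855 blks
DATA_BLOCKS = 9661680   # data blocks=9661680
BLOCK_SIZE = 4096       # data bsize=4096
LOG_BLOCKS = 4717       # internal log blocks=4717


def geometry_is_consistent(agcount, agsize, data_blocks):
    """True if `agcount` allocation groups can exactly cover the data section.

    Assumption: all AGs but the last are `agsize` blocks, so the total must
    lie in the range ((agcount - 1) * agsize, agcount * agsize].
    """
    return (agcount - 1) * agsize < data_blocks <= agcount * agsize


data_bytes = DATA_BLOCKS * BLOCK_SIZE
print("consistent:", geometry_is_consistent(AGCOUNT, AGSIZE_BLOCKS, DATA_BLOCKS))
print(f"data section: {data_bytes} bytes ({data_bytes / 2**30:.1f} GiB)")
print(f"internal log: {LOG_BLOCKS * BLOCK_SIZE / 2**20:.1f} MiB")
```

Here 16 * 603855 equals 9661680 exactly, so the reported block count and the allocation-group layout agree with each other.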

Stefan

David Chinner wrote:
> On Sun, Jan 21, 2007 at 01:30:15PM +0100, Stefan Priebe - FH wrote:
>> Hello!
>>
>> I've 3 Servers which works wonderful with 2.6.16.X (also testet the
>> latest 2.6.16.37)
>>
>> but with 2.6.18.6 i get these errors:
> 
> [ EIP is at xfs_bmap_add_extent_hole_delay+0x58d/0x59b ]
> [ EIP is at generic_file_buffered_write+0x390/0x6cf ]
> 
> Do you have a reproducable test case for these? if not,
> do you have any idea what is going on in the system at the time
> of the failure?
> 
> Can you describe the storage subsystem you are using and post the
> output of xfs_growfs -n <mntpt> on the filesystem that is causing
> problems?
> 
> Cheers,
> 
> Dave.



* Re: XFS or Kernel Problem / Bug
  2007-01-22  7:51                         ` Stefan Priebe - FH
@ 2007-01-22  8:03                           ` David Chinner
  2007-01-22  8:07                             ` Stefan Priebe - FH
  2007-01-22  9:42                             ` Stefan Priebe - FH
  0 siblings, 2 replies; 21+ messages in thread
From: David Chinner @ 2007-01-22  8:03 UTC (permalink / raw)
  To: Stefan Priebe - FH; +Cc: David Chinner, linux-kernel

On Mon, Jan 22, 2007 at 08:51:10AM +0100, Stefan Priebe - FH wrote:
> Hi!
>
> I'm not sure, but perhaps it isn't an XFS bug.
>
> Here is what I found out:
>
> We have about 300 servers at the moment, and 5 of them are "old" Intel
> Pentium 4 machines with a DFI PM-12 mainboard with a VIA chipset. It only
> happens on THESE machines.

Hmmm - that points more to a hardware problem than a software problem;
crashes in generic_file_buffered_write() are relatively uncommon, and
to have them all isolated to a specific type of hardware is suspicious....

Wasn't there a major update of the IDE layer in 2.6.18? Or was it
2.6.19 that I'm thinking of? BTW, have you run memtest86 on these
boxes to rule out dodgy memory?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: XFS or Kernel Problem / Bug
  2007-01-22  8:03                           ` David Chinner
@ 2007-01-22  8:07                             ` Stefan Priebe - FH
  2007-01-22  9:42                             ` Stefan Priebe - FH
  1 sibling, 0 replies; 21+ messages in thread
From: Stefan Priebe - FH @ 2007-01-22  8:07 UTC (permalink / raw)
  To: David Chinner; +Cc: linux-kernel

Hi!

The update of the IDE layer was in 2.6.19. I don't think it is a
hardware bug, because all 5 of these machines have run fine for a few
years with 2.6.16.X and earlier. We switched to 2.6.18.6 on Monday last
week and all machines began to crash periodically. On Friday last week we
downgraded them all to 2.6.16.37 and all 5 machines run fine again. So I
don't believe it is a hardware problem. Do you really think that could be it?

Stefan

David Chinner wrote:
> On Mon, Jan 22, 2007 at 08:51:10AM +0100, Stefan Priebe - FH wrote:
>> Hi!
>>
>> I'm  not shure but perhaps it isn't an XFS Bug.
>>
>> Here is what i find out:
>>
>> We've about 300 servers at the momentan and 5 of them are "old" Intel 
>> Pentium 4 Machines with a DFI PM-12 Mainboard with VIA chipset. It only 
>> happens on THESE Machines.
> 
> Hmmm - that points more to a hardware problem than a software problem;
> crashes in generic_file_buffered_write() are relatively uncommon, and
> to have them all isolated to a specific type of hardware is suspicious....
> 
> Wasn't there a major update of the IDE layer in 2.6.18? or was that
> 2.6.19 that I'm thinking of? BTW, have you run memtest86 on these
> boxes to rule out dodgy memory?
> 
> Cheers,
> 
> Dave.



* Re: XFS or Kernel Problem / Bug
  2007-01-22  8:03                           ` David Chinner
  2007-01-22  8:07                             ` Stefan Priebe - FH
@ 2007-01-22  9:42                             ` Stefan Priebe - FH
  2007-01-23  1:10                               ` David Chinner
  1 sibling, 1 reply; 21+ messages in thread
From: Stefan Priebe - FH @ 2007-01-22  9:42 UTC (permalink / raw)
  To: David Chinner; +Cc: linux-kernel

Hi!

I have another idea... could it be a barrier problem? Barriers have
been enabled by default since 2.6.17...

Stefan

David Chinner wrote:
> On Mon, Jan 22, 2007 at 08:51:10AM +0100, Stefan Priebe - FH wrote:
>> Hi!
>>
>> I'm  not shure but perhaps it isn't an XFS Bug.
>>
>> Here is what i find out:
>>
>> We've about 300 servers at the momentan and 5 of them are "old" Intel 
>> Pentium 4 Machines with a DFI PM-12 Mainboard with VIA chipset. It only 
>> happens on THESE Machines.
> 
> Hmmm - that points more to a hardware problem than a software problem;
> crashes in generic_file_buffered_write() are relatively uncommon, and
> to have them all isolated to a specific type of hardware is suspicious....
> 
> Wasn't there a major update of the IDE layer in 2.6.18? or was that
> 2.6.19 that I'm thinking of? BTW, have you run memtest86 on these
> boxes to rule out dodgy memory?
> 
> Cheers,
> 
> Dave.




* Re: XFS or Kernel Problem / Bug
  2007-01-22  9:42                             ` Stefan Priebe - FH
@ 2007-01-23  1:10                               ` David Chinner
  2007-01-23  8:31                                 ` Stefan Priebe - FH
  0 siblings, 1 reply; 21+ messages in thread
From: David Chinner @ 2007-01-23  1:10 UTC (permalink / raw)
  To: Stefan Priebe - FH; +Cc: David Chinner, linux-kernel

On Mon, Jan 22, 2007 at 09:07:23AM +0100, Stefan Priebe - FH wrote:
> Hi!
>
> The update of the IDE layer was in 2.6.19. I don't think it is a
> hardware bug, because all 5 of these machines have run fine for a few
> years with 2.6.16.X and earlier. We switched to 2.6.18.6 on Monday last
> week and all machines began to crash periodically. On Friday last week we
> downgraded them all to 2.6.16.37 and all 5 machines run fine again. So I
> don't believe it is a hardware problem. Do you really think that could be it?

I was thinking more of a driver change that is being triggered on
that particular hardware. FWIW, did you test 2.6.19?

I really need a better idea of the workload these servers are running
and, ideally, a reproducible test case to track something like
this down. At the moment I have no idea what is going on and no
real information on which to even base a guess.

Were there any other messages in the log?

On Mon, Jan 22, 2007 at 10:42:36AM +0100, Stefan Priebe - FH wrote:
> Hi!
>
> I have another idea... could it be a barrier problem? Barriers have
> been enabled by default since 2.6.17...

You could try turning it off. If it does fix the problem, then I'd be
pointing once again at hardware ;)

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: XFS or Kernel Problem / Bug
  2007-01-23  1:10                               ` David Chinner
@ 2007-01-23  8:31                                 ` Stefan Priebe - FH
  0 siblings, 0 replies; 21+ messages in thread
From: Stefan Priebe - FH @ 2007-01-23  8:31 UTC (permalink / raw)
  To: David Chinner; +Cc: linux-kernel

Hi!

I can give you an idea of the workload :-) I have the same problem on a
nearly idle server. Only a few cron jobs run there (the normal Debian
system crons).

The load was not higher than 0.01 on this system over the last 3 days, and
this morning it crashed with the same error.

I've not tested 2.6.19.x because it has some problems with the SATA AHCI
driver, which we need. But I can manually update just this system to
2.6.19.x and wait a few days.

There were no other messages in the log.

Cheers,
    Stefan

David Chinner wrote:
> On Mon, Jan 22, 2007 at 09:07:23AM +0100, Stefan Priebe - FH wrote:
>> Hi!
>>
>> The update of the IDE layer was in 2.6.19. I don't think it is a 
>> hardware bug cause all these 5 machines runs fine since a few years with 
>> 2.6.16.X and before. We switch to 2.6.18.6 on monday last week and all 
>> machines began to crash periodically. On friday last week we downgraded 
>> them all to 2.6.16.37 and all 5 machines runs fine again. So i don't 
>> believe it is a hardware problem. Do you really think that could be?
> 
> I was thinking more of a driver change that is being triggered on
> that particular hardware. FWIW, did you test 2.6.19?
> 
> I really need a better idea of the workload these servers are running
> and, ideally, a reproducable test case to track something like
> this down. At the moment I have no idea what is going on and no
> real information on which to even base a guess.
> 
> Were there any other messages in the log?
> 
> On Mon, Jan 22, 2007 at 10:42:36AM +0100, Stefan Priebe - FH wrote:
>> Hi!
>>
>> I've another idea... could it be, that it is a barrier problem? Since 
>> barriers are enabled by default from 2.6.17 on ...
> 
> You could try turning it off. If it does fix the problem, then I'd be
> pointing once again at hardware ;)
> 
> Cheers,
> 
> Dave.



* Re: XFS or Kernel Problem / Bug
  2007-01-21 12:30                     ` XFS or Kernel Problem / Bug Stefan Priebe - FH
  2007-01-22  6:18                       ` David Chinner
@ 2007-01-23 19:49                       ` Chuck Ebbert
  2007-01-24  7:40                         ` Stefan Priebe - FH
  1 sibling, 1 reply; 21+ messages in thread
From: Chuck Ebbert @ 2007-01-23 19:49 UTC (permalink / raw)
  To: Stefan Priebe - FH; +Cc: linux-kernel, stefan, David Chinner, Chuck Ebbert

Stefan Priebe - FH wrote:
> I have 3 servers which work wonderfully with 2.6.16.X (I also tested the
> latest, 2.6.16.37),
>
> but with 2.6.18.6 I get these errors:
>
> "general protection fault: 0000 [#1]"
> "Modules linked in:"
> "CPU:    0"
> "EIP:    0060:[<c01c8fd2>]    Not tainted VLI"
> "EFLAGS: 00010246   (2.6.18.6 #1) "
> "EIP is at xfs_bmap_add_extent_hole_delay+0x58d/0x59b"
> "eax: 00000000   ebx: fffe0007   ecx: 0071a4cd   edx: 00000000"
> "esi: 00000000   edi: 00000000   ebp: 00000015   esp: ce35f8f0"
> "ds: 0000   es: 007b   ss: 0068"
> "Process mysqld (pid: 1836, ti=ce35e000 task=ee618550 task.ti=ce35e000)"
> "Stack: 00000232 00000000 00000233 00000000 00000000 00000000 0000000c
> 00000000 "
> "       00000007 00000000 eca90250 eca90278 00000001 eca90200 00000000
> 000003c3 "
> "       00000000 010003c3 ffffffc0 ce35fa58 ce35fa58 00000001 00000000
> 00000000 "
> "Call Trace:"
> " [<c01b6c58>] xfs_trans_dqresv+0x3f9/0x405"
> " [<c01c6485>] xfs_bmap_add_extent+0x163/0x377"
> " [<c01cd2c3>] xfs_bmapi+0xa4e/0x1109"
> " [<c01ebbe3>] xfs_iomap_write_delay+0x233/0x2fa"
> " [<c01eaa31>] xfs_imap_to_bmap+0x29/0x1d6"
> " [<c01eae1a>] xfs_iomap+0x23c/0x3e1"
> " [<c01eaebe>] xfs_iomap+0x2e0/0x3e1"
> " [<c020a71a>] xfs_bmap+0x1a/0x1e"
> " [<c020471e>] __xfs_get_blocks+0x5d/0x195"
Without the "Code:" line it's hard to tell what happened...
>
>
> and sometimes this one:
>
> "BUG: unable to handle kernel NULL pointer dereference at virtual
> address 00000288"
> " printing eip:"
> "c0142ff7"
> "*pde = 00000000"
> "Oops: 0000 [#1]"
> "SMP "
> "Modules linked in: iptable_filter ip_tables x_tables"
> "CPU:    0"
> "EIP:    0060:[<c0142ff7>]    Not tainted VLI"
> "EFLAGS: 00010246   (2.6.18.6 #1) "
> "EIP is at generic_file_buffered_write+0x390/0x6cf"
> "eax: 00000000   ebx: 000001ec   ecx: ea029a40   edx: 00008002"
> "esi: 00000000   edi: e3b28c9c   ebp: 000001ec   esp: dd04bd18"
> "ds: 007b   es: 007b   ss: 0068"
> "Process proftpd (pid: 3615, ti=dd04a000 task=eba88a70 task.ti=dd04a000)"
> "Stack: e3b28d44 00000001 00000010 000001fc c036d793 000001fc c14765c0
> 00000010 "
> "       080d404c 000001ec e3b28c9c c03e78c0 e3b28d44 ea029a40 000001fc
> 00000000 "
> "       00000000 000001ec dd04beac 00d420b1 00000000 00000000 dd04bd80
> 45b1fa67 "
> "Call Trace:"
> " [<c036d793>] sock_def_readable+0x7f/0x81"
> " [<c017a03a>] file_update_time+0xad/0xcb"
> " [<c0232015>] xfs_iunlock+0x55/0x9f"
> " [<c0262eeb>] xfs_write+0xa74/0xc61"
> " [<c036a253>] sock_aio_read+0x95/0x99"
> " [<c025d9fb>] xfs_file_aio_write+0x8f/0xa0"
> " [<c015fb94>] do_sync_write+0xc9/0x10f"
> " [<c0133ad6>] autoremove_wake_function+0x0/0x57"
> " [<c015f3d5>] generic_file_llseek+0x95/0xbc"
> " [<c015facb>] do_sync_write+0x0/0x10f"
> " [<c015fc80>] vfs_write+0xa6/0x179"
> " [<c015fe24>] sys_write+0x51/0x80"
> " [<c0102d3f>] syscall_call+0x7/0xb"
>
> "Code: 04 89 10 8b 44 24 40 85 c0 0f 85 db 00 00 00 8b 5c 24 24 85 db 0f
> 88 c3 00 00 00 8b 4c 24 34 8b 51 18 f6 c6 10 75 73 8b 7c 24 28 <8b> 85
> 9c 00 00 00 f6 40 30 10 75 63 f6 87 48 01 00 00 01 75 5a "
>
> "EIP: [<c0142ff7>] generic_file_buffered_write+0x390/0x6cf SS:ESP
> 0068:dd04bd18"
>
Well that's strange. It's here in mm/filemap.c line 2201:

        /*
         * For now, when the user asks for O_SYNC, we'll actually give O_DSYNC
         */
        if (likely(status >= 0)) {
                if (unlikely((file->f_flags & O_SYNC) || IS_SYNC(inode))) { <===
                        if (!a_ops->writepage || !is_sync_kiocb(iocb))
                                status = generic_osync_inode(inode, mapping,
                                                OSYNC_METADATA|OSYNC_DATA);
                }
        }

ebp holds the value of 'inode' and it's obviously wrong (it's also the same
as 'written', which is in ebx.) So when it tries to read inode->i_sb, it
dies.
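
The arithmetic behind that reading can be reproduced from the oops itself. The sketch below is an illustration only, not a general disassembler: the helper `decode_mov_disp32` is hypothetical and handles just the one instruction form marked by `<8b>` in the "Code:" line. It decodes that instruction and adds its displacement to the reported ebp value.

```python
# Values copied from the second oops above.
EBP = 0x000001EC            # "ebp: 000001ec"
FAULT_ADDRESS = 0x00000288  # "virtual address 00000288"

# Bytes from the "Code:" line, starting at the marked byte <8b>.
faulting_insn = bytes.fromhex("8b 85 9c 00 00 00")

# 32-bit register numbering used by the ModRM byte.
REG_NAMES = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_disp32(insn: bytes):
    """Decode `mov disp32(%base), %dest`: opcode 0x8b, ModRM mod=10, no SIB."""
    opcode, modrm = insn[0], insn[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0x8B or mod != 0b10 or rm == 0b100:
        raise ValueError("not the simple mov disp32(%reg), %reg form")
    # The 32-bit displacement follows the ModRM byte, little-endian.
    disp = int.from_bytes(insn[2:6], "little", signed=True)
    return REG_NAMES[reg], REG_NAMES[rm], disp


dest, base, disp = decode_mov_disp32(faulting_insn)
print(f"movl {disp}(%{base}), %{dest}")
print(f"effective address = {EBP:#x} + {disp:#x} = {EBP + disp:#x}")
assert EBP + disp == FAULT_ADDRESS
```

The displacement is 0x9c (156), and 0x1ec + 0x9c = 0x288, which matches the faulting address in the "BUG:" line.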

If you can, post the file mm/filemap.o from your build directory to some
website. And do 'make mm/filemap.s' and post that file too.



* Re: XFS or Kernel Problem / Bug
  2007-01-23 19:49                       ` Chuck Ebbert
@ 2007-01-24  7:40                         ` Stefan Priebe - FH
  2007-01-24 14:57                           ` Chuck Ebbert
  0 siblings, 1 reply; 21+ messages in thread
From: Stefan Priebe - FH @ 2007-01-24  7:40 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: linux-kernel, stefan, David Chinner

Hi!

I'll do whatever you like :-) if it helps us find the bug.

So here are the files (2.6.18.6):
http://server055.de-nserver.de/filemap.o
http://server055.de-nserver.de/filemap.s

Stefan


Chuck Ebbert wrote:
> Stefan Priebe - FH wrote:
> 
>>I've 3 Servers which works wonderful with 2.6.16.X (also testet the
>>latest 2.6.16.37)
>>
>>but with 2.6.18.6 i get these errors:
>>
>>"general protection fault: 0000 [#1]"
>>"Modules linked in:"
>>"CPU:    0"
>>"EIP:    0060:[<c01c8fd2>]    Not tainted VLI"
>>"EFLAGS: 00010246   (2.6.18.6 #1) "
>>"EIP is at xfs_bmap_add_extent_hole_delay+0x58d/0x59b"
>>"eax: 00000000   ebx: fffe0007   ecx: 0071a4cd   edx: 00000000"
>>"esi: 00000000   edi: 00000000   ebp: 00000015   esp: ce35f8f0"
>>"ds: 0000   es: 007b   ss: 0068"
>>"Process mysqld (pid: 1836, ti=ce35e000 task=ee618550 task.ti=ce35e000)"
>>"Stack: 00000232 00000000 00000233 00000000 00000000 00000000 0000000c
>>00000000 "
>>"       00000007 00000000 eca90250 eca90278 00000001 eca90200 00000000
>>000003c3 "
>>"       00000000 010003c3 ffffffc0 ce35fa58 ce35fa58 00000001 00000000
>>00000000 "
>>"Call Trace:"
>>" [<c01b6c58>] xfs_trans_dqresv+0x3f9/0x405"
>>" [<c01c6485>] xfs_bmap_add_extent+0x163/0x377"
>>" [<c01cd2c3>] xfs_bmapi+0xa4e/0x1109"
>>" [<c01ebbe3>] xfs_iomap_write_delay+0x233/0x2fa"
>>" [<c01eaa31>] xfs_imap_to_bmap+0x29/0x1d6"
>>" [<c01eae1a>] xfs_iomap+0x23c/0x3e1"
>>" [<c01eaebe>] xfs_iomap+0x2e0/0x3e1"
>>" [<c020a71a>] xfs_bmap+0x1a/0x1e"
>>" [<c020471e>] __xfs_get_blocks+0x5d/0x195"
> 
> Without the "Code:" line it's hard to tell what happened...
> 
>>
>>and sometimes this one:
>>
>>"BUG: unable to handle kernel NULL pointer dereference at virtual
>>address 00000288"
>>" printing eip:"
>>"c0142ff7"
>>"*pde = 00000000"
>>"Oops: 0000 [#1]"
>>"SMP "
>>"Modules linked in: iptable_filter ip_tables x_tables"
>>"CPU:    0"
>>"EIP:    0060:[<c0142ff7>]    Not tainted VLI"
>>"EFLAGS: 00010246   (2.6.18.6 #1) "
>>"EIP is at generic_file_buffered_write+0x390/0x6cf"
>>"eax: 00000000   ebx: 000001ec   ecx: ea029a40   edx: 00008002"
>>"esi: 00000000   edi: e3b28c9c   ebp: 000001ec   esp: dd04bd18"
>>"ds: 007b   es: 007b   ss: 0068"
>>"Process proftpd (pid: 3615, ti=dd04a000 task=eba88a70 task.ti=dd04a000)"
>>"Stack: e3b28d44 00000001 00000010 000001fc c036d793 000001fc c14765c0
>>00000010 "
>>"       080d404c 000001ec e3b28c9c c03e78c0 e3b28d44 ea029a40 000001fc
>>00000000 "
>>"       00000000 000001ec dd04beac 00d420b1 00000000 00000000 dd04bd80
>>45b1fa67 "
>>"Call Trace:"
>>" [<c036d793>] sock_def_readable+0x7f/0x81"
>>" [<c017a03a>] file_update_time+0xad/0xcb"
>>" [<c0232015>] xfs_iunlock+0x55/0x9f"
>>" [<c0262eeb>] xfs_write+0xa74/0xc61"
>>" [<c036a253>] sock_aio_read+0x95/0x99"
>>" [<c025d9fb>] xfs_file_aio_write+0x8f/0xa0"
>>" [<c015fb94>] do_sync_write+0xc9/0x10f"
>>" [<c0133ad6>] autoremove_wake_function+0x0/0x57"
>>" [<c015f3d5>] generic_file_llseek+0x95/0xbc"
>>" [<c015facb>] do_sync_write+0x0/0x10f"
>>" [<c015fc80>] vfs_write+0xa6/0x179"
>>" [<c015fe24>] sys_write+0x51/0x80"
>>" [<c0102d3f>] syscall_call+0x7/0xb"
>>
>>"Code: 04 89 10 8b 44 24 40 85 c0 0f 85 db 00 00 00 8b 5c 24 24 85 db 0f
>>88 c3 00 00 00 8b 4c 24 34 8b 51 18 f6 c6 10 75 73 8b 7c 24 28 <8b> 85
>>9c 00 00 00 f6 40 30 10 75 63 f6 87 48 01 00 00 01 75 5a "
>>
>>"EIP: [<c0142ff7>] generic_file_buffered_write+0x390/0x6cf SS:ESP
>>0068:dd04bd18"
>>
> 
> Well that's strange. It's here in mm/filemap.c line 2201:
> 
>         /*
>          * For now, when the user asks for O_SYNC, we'll actually give
> O_DSYNC
>          */
>         if (likely(status >= 0)) {
>                 if (unlikely((file->f_flags & O_SYNC) ||
> IS_SYNC(inode))) { <===
>                         if (!a_ops->writepage || !is_sync_kiocb(iocb))
>                                 status = generic_osync_inode(inode, mapping,
>                                                 OSYNC_METADATA|OSYNC_DATA);
>                 }
>         }
> 
> ebp holds the value of 'inode' and it's obviously wrong (it's also the same
> as 'written', which is in ebx.) So when it tries to read inode->i_sb, it
> dies.
> 
> If you can, post the file mm/filemap.o from your build directory to some
> website.
> And do 'make mm/filemap.s' and post that file too.
> 



* Re: XFS or Kernel Problem / Bug
  2007-01-24  7:40                         ` Stefan Priebe - FH
@ 2007-01-24 14:57                           ` Chuck Ebbert
  2007-01-24 15:03                             ` Stefan Priebe - FH
  0 siblings, 1 reply; 21+ messages in thread
From: Chuck Ebbert @ 2007-01-24 14:57 UTC (permalink / raw)
  To: Stefan Priebe - FH; +Cc: linux-kernel, stefan, David Chinner

Stefan Priebe - FH wrote:
> Hi!
>
> I do everything you like :-) if we can find the bug.
>
> So here are the files (2.6.18.6):
> http://server055.de-nserver.de/filemap.o
> http://server055.de-nserver.de/filemap.s
>
>>
>> If you can, post the file mm/filemap.o from your build directory to some
>> website.
>> And do 'make mm/filemap.s' and post that file too.
>>
>
That doesn't match your oops at all.  Did you use a different compiler
and/or different kernel build options?




* Re: XFS or Kernel Problem / Bug
  2007-01-24 14:57                           ` Chuck Ebbert
@ 2007-01-24 15:03                             ` Stefan Priebe - FH
  2007-01-24 15:13                               ` Chuck Ebbert
  0 siblings, 1 reply; 21+ messages in thread
From: Stefan Priebe - FH @ 2007-01-24 15:03 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: linux-kernel, stefan, David Chinner

Hi!

It could be that the options are different now, because my first attempt
was to change the kernel options; when that did not help, I switched back
to 2.6.16.37.

Any idea what I can do?

Stefan

Chuck Ebbert wrote:
> Stefan Priebe - FH wrote:
>> Hi!
>>
>> I do everything you like :-) if we can find the bug.
>>
>> So here are the files (2.6.18.6):
>> http://server055.de-nserver.de/filemap.o
>> http://server055.de-nserver.de/filemap.s
>>
>>> If you can, post the file mm/filemap.o from your build directory to some
>>> website.
>>> And do 'make mm/filemap.s' and post that file too.
>>>
> That doesn't match your oops at all.  Did you use a different compiler
> and/or
> different kernel build options?
> 
> 



* Re: XFS or Kernel Problem / Bug
  2007-01-24 15:03                             ` Stefan Priebe - FH
@ 2007-01-24 15:13                               ` Chuck Ebbert
  2007-01-24 15:34                                 ` Stefan Priebe - FH
  0 siblings, 1 reply; 21+ messages in thread
From: Chuck Ebbert @ 2007-01-24 15:13 UTC (permalink / raw)
  To: Stefan Priebe - FH; +Cc: linux-kernel, stefan, David Chinner

Stefan Priebe - FH wrote:
> It could be that the options are different now, because my first attempt
> was to change the kernel options; when that did not help, I switched
> back to 2.6.16.37.
>
> Any idea what I can do?
>
> Chuck Ebbert wrote:
>> That doesn't match your oops at all.  Did you use a different compiler
>> and/or different kernel build options?
>>
>>
>
If you don't know what changed you can try different options until the
filemap.s is the same.  You should see

        movl   156(%ebp),%eax
        testb  $16, 48(%eax)


in generic_file_buffered_write.  And you need to regenerate filemap.s
manually each time.
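
Checking each regenerated filemap.s by eye gets tedious, so the comparison can be scripted. This is a rough sketch under stated assumptions: the helper names `normalise` and `has_expected_pair` are made up, the `testb` immediate is written with the `$` prefix that AT&T syntax uses (matching the `f6 40 30 10` bytes in the oops), and the two instructions are assumed to sit on consecutive lines, which a different compiler or option set need not guarantee. It also does not restrict the search to generic_file_buffered_write.

```python
import re


def normalise(line: str) -> str:
    """Drop '#' comments and all whitespace so spacing differences don't matter."""
    return re.sub(r"\s+", "", line.split("#", 1)[0])


EXPECTED_PAIR = [normalise("movl 156(%ebp),%eax"),
                 normalise("testb $16, 48(%eax)")]


def has_expected_pair(asm_text: str) -> bool:
    """True if the two instructions appear on consecutive non-empty lines."""
    lines = [n for n in map(normalise, asm_text.splitlines()) if n]
    return any(lines[i:i + 2] == EXPECTED_PAIR for i in range(len(lines) - 1))


# Made-up fragment for illustration; for a real check, pass the contents of
# the generated file instead: has_expected_pair(open("mm/filemap.s").read())
sample = """
        movl    40(%esp), %edi
        movl    156(%ebp),%eax
        testb   $16, 48(%eax)
        jne     .L42
"""
print(has_expected_pair(sample))
```

A match only says the build is plausibly the one that produced the oops; the surrounding instructions still need to be compared against the "Code:" bytes.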


(Did you test the kernel that you posted these pieces from? If you can
get it to oops the same way, just post that instead.)



* Re: XFS or Kernel Problem / Bug
  2007-01-24 15:13                               ` Chuck Ebbert
@ 2007-01-24 15:34                                 ` Stefan Priebe - FH
  2007-01-24 16:51                                   ` Chuck Ebbert
  0 siblings, 1 reply; 21+ messages in thread
From: Stefan Priebe - FH @ 2007-01-24 15:34 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: linux-kernel, stefan, David Chinner

Hi!

Sorry, that is not possible, because it is a production machine.

But I've captured the error and the files from another machine; perhaps
this helps.

"BUG: unable to handle kernel NULL pointer dereference at virtual 
address 00000288"
" printing eip:"
"c0142ff7"
"*pde = 00000000"
"Oops: 0000 [#1]"
"SMP "
"Modules linked in: iptable_filter ip_tables x_tables"
"CPU:    0"
"EIP:    0060:[<c0142ff7>]    Not tainted VLI"
"EFLAGS: 00010246   (2.6.18.6 #1) "
"EIP is at generic_file_buffered_write+0x390/0x6cf"
"eax: 00000000   ebx: 000001ec   ecx: ea029a40   edx: 00008002"
"esi: 00000000   edi: e3b28c9c   ebp: 000001ec   esp: dd04bd18"
"ds: 007b   es: 007b   ss: 0068"
"Process proftpd (pid: 3615, ti=dd04a000 task=eba88a70 task.ti=dd04a000)"
"Stack: e3b28d44 00000001 00000010 000001fc c036d793 000001fc c14765c0 
00000010 "
"       080d404c 000001ec e3b28c9c c03e78c0 e3b28d44 ea029a40 000001fc 
00000000 "
"       00000000 000001ec dd04beac 00d420b1 00000000 00000000 dd04bd80 
45b1fa67 "
"Call Trace:"
" [<c036d793>] sock_def_readable+0x7f/0x81"
" [<c017a03a>] file_update_time+0xad/0xcb"
" [<c0232015>] xfs_iunlock+0x55/0x9f"
" [<c0262eeb>] xfs_write+0xa74/0xc61"
" [<c036a253>] sock_aio_read+0x95/0x99"
" [<c025d9fb>] xfs_file_aio_write+0x8f/0xa0"
" [<c015fb94>] do_sync_write+0xc9/0x10f"
" [<c0133ad6>] autoremove_wake_function+0x0/0x57"
" [<c015f3d5>] generic_file_llseek+0x95/0xbc"
" [<c015facb>] do_sync_write+0x0/0x10f"
" [<c015fc80>] vfs_write+0xa6/0x179"
" [<c015fe24>] sys_write+0x51/0x80"
" [<c0102d3f>] syscall_call+0x7/0xb"
"Code: 04 89 10 8b 44 24 40 85 c0 0f 85 db 00 00 00 8b 5c 24 24 85 db 0f 
88 c3 00 00 00 8b 4c 24 34 8b 51 18 f6 c6 10 75 73 8b 7c 24 28 <8b> 85 
9c 00 00 00 f6 40 30 10 75 63 f6 87 48 01 00 00 01 75 5a "
"EIP: [<c0142ff7>] generic_file_buffered_write+0x390/0x6cf SS:ESP 
0068:dd04bd18"

Files:
http://server113-han.de-nserver.de/filemap.s
http://server113-han.de-nserver.de/filemap.o

Stefan

Chuck Ebbert wrote:
> Stefan Priebe - FH wrote:
>> It could be that the options are different now, because my first attempt
>> was to change the kernel options; when that did not help, I switched
>> back to 2.6.16.37.
>>
>> Any idea what I can do?
>>
>> Chuck Ebbert wrote:
>>> That doesn't match your oops at all.  Did you use a different compiler
>>> and/or
>>> different kernel build options?
>>>
>>>
> If you don't know what changed you can try different options until the
> filemap.s
> is the same.  You should see
> 
>         movl   156(%ebp),%eax
>         testb   16, 48(%eax)
> 
> 
> in generic_file_buffered_write.  And you need to regenerate filemap.s
> manually
> each time.
> 
> 
> (Did you test the kernel that you posted these pieces from? If you can
> get it to oops
> the same way, just post that instead.)
> 



* Re: XFS or Kernel Problem / Bug
  2007-01-24 15:34                                 ` Stefan Priebe - FH
@ 2007-01-24 16:51                                   ` Chuck Ebbert
  2007-01-24 17:16                                     ` Stefan Priebe - FH
  0 siblings, 1 reply; 21+ messages in thread
From: Chuck Ebbert @ 2007-01-24 16:51 UTC (permalink / raw)
  To: Stefan Priebe - FH; +Cc: linux-kernel, stefan, David Chinner

Stefan Priebe - FH wrote:
> Sorry, that is not possible, because it is a production machine.
>
> But I've captured the error and the files from another machine;
> perhaps this helps.
>
> "BUG: unable to handle kernel NULL pointer dereference at virtual
> address 00000288"
> " printing eip:"
> "c0142ff7"
> "*pde = 00000000"
> "Oops: 0000 [#1]"
> "SMP "
> "Modules linked in: iptable_filter ip_tables x_tables"
> "CPU:    0"
> "EIP:    0060:[<c0142ff7>]    Not tainted VLI"
> "EFLAGS: 00010246   (2.6.18.6 #1) "
> "EIP is at generic_file_buffered_write+0x390/0x6cf"
> "eax: 00000000   ebx: 000001ec   ecx: ea029a40   edx: 00008002"
> "esi: 00000000   edi: e3b28c9c   ebp: 000001ec   esp: dd04bd18"
> "ds: 007b   es: 007b   ss: 0068"
> "Process proftpd (pid: 3615, ti=dd04a000 task=eba88a70 task.ti=dd04a000)"
> "Stack: e3b28d44 00000001 00000010 000001fc c036d793 000001fc c14765c0
> 00000010 "
> "       080d404c 000001ec e3b28c9c c03e78c0 e3b28d44 ea029a40 000001fc
> 00000000 "
> "       00000000 000001ec dd04beac 00d420b1 00000000 00000000 dd04bd80
> 45b1fa67 "
> "Call Trace:"
> " [<c036d793>] sock_def_readable+0x7f/0x81"
> " [<c017a03a>] file_update_time+0xad/0xcb"
> " [<c0232015>] xfs_iunlock+0x55/0x9f"
> " [<c0262eeb>] xfs_write+0xa74/0xc61"
> " [<c036a253>] sock_aio_read+0x95/0x99"
> " [<c025d9fb>] xfs_file_aio_write+0x8f/0xa0"
> " [<c015fb94>] do_sync_write+0xc9/0x10f"
> " [<c0133ad6>] autoremove_wake_function+0x0/0x57"
> " [<c015f3d5>] generic_file_llseek+0x95/0xbc"
> " [<c015facb>] do_sync_write+0x0/0x10f"
> " [<c015fc80>] vfs_write+0xa6/0x179"
> " [<c015fe24>] sys_write+0x51/0x80"
> " [<c0102d3f>] syscall_call+0x7/0xb"
> "Code: 04 89 10 8b 44 24 40 85 c0 0f 85 db 00 00 00 8b 5c 24 24 85 db
> 0f 88 c3 00 00 00 8b 4c 24 34 8b 51 18 f6 c6 10 75 73 8b 7c 24 28 <8b>
> 85 9c 00 00 00 f6 40 30 10 75 63 f6 87 48 01 00 00 01 75 5a "
> "EIP: [<c0142ff7>] generic_file_buffered_write+0x390/0x6cf SS:ESP
> 0068:dd04bd18"
>
> Files:
> http://server113-han.de-nserver.de/filemap.s
> http://server113-han.de-nserver.de/filemap.o
>
You seem to have some kind of hardware/memory problem.

Disassembly of the failing instruction from the oops:

     8b 7c 24 28               mov    0x28(%esp),%edi
     8b 85 9c 00 00 00         mov    0x9c(%ebp),%eax   <=====

Dump of the object code:

           8b 7c 24 28             mov    0x28(%esp),%edi
           8b 87 9c 00 00 00       mov    0x9c(%edi),%eax

Looks like a bit is flipped.
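
The one-bit difference can be checked by decoding the ModRM byte of each
instruction. A minimal Python sketch, assuming the standard 32-bit x86
ModRM layout (mod/reg/rm); the helper names are illustrative and not
from the thread:

```python
# 32-bit x86 general-purpose registers in ModRM encoding order.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) fields."""
    return (byte >> 6) & 0x3, (byte >> 3) & 0x7, byte & 0x7

def differing_bits(a, b):
    """Return the bit positions at which two bytes differ."""
    x = a ^ b
    return [i for i in range(8) if x & (1 << i)]

oops_modrm = 0x85    # from the oops "Code:" line: 8b 85 9c 00 00 00
object_modrm = 0x87  # from the object file:       8b 87 9c 00 00 00

for label, byte in (("oops", oops_modrm), ("object", object_modrm)):
    mod, reg, rm = decode_modrm(byte)
    # For opcode 8b (mov r/m32 -> r32), reg is the destination and
    # rm is the base register of the memory operand.
    print(f"{label}: mod={mod:02b} dest=%{REGS[reg]} base=%{REGS[rm]}")

print("differing bits:", differing_bits(oops_modrm, object_modrm))

# Cross-check against the oops registers: ebp = 0x1ec, displacement = 0x9c.
fault_address = 0x1ec + 0x9c
print(f"fault address: {fault_address:#010x}")
```

The two bytes differ only in bit 1, which turns the base register from
%edi into %ebp, and 0x1ec + 0x9c = 0x288 matches the faulting address
reported in the oops.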



* Re: XFS or Kernel Problem / Bug
  2007-01-24 16:51                                   ` Chuck Ebbert
@ 2007-01-24 17:16                                     ` Stefan Priebe - FH
  2007-01-24 17:56                                       ` Chuck Ebbert
  0 siblings, 1 reply; 21+ messages in thread
From: Stefan Priebe - FH @ 2007-01-24 17:16 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: linux-kernel, David Chinner

Hi!

Hmm, are you sure? I have this problem on 5 servers, all with the same
mainboard. I cannot believe that all 5 servers developed a hardware
problem on the same day.

The other thing is that they all work fine with 2.6.16.x and all
earlier kernels. Some of them have been running 2.6.x for two years
without any problem...

Stefan





* Re: XFS or Kernel Problem / Bug
  2007-01-24 17:16                                     ` Stefan Priebe - FH
@ 2007-01-24 17:56                                       ` Chuck Ebbert
  2007-01-24 19:27                                         ` Stefan Priebe - FH
  0 siblings, 1 reply; 21+ messages in thread
From: Chuck Ebbert @ 2007-01-24 17:56 UTC (permalink / raw)
  To: Stefan Priebe - FH; +Cc: linux-kernel

Stefan Priebe - FH wrote:
> Hi!
>
> Hmm, are you sure? I have this problem on 5 servers, all with the same
> mainboard. I cannot believe that all 5 servers developed a hardware
> problem on the same day.
>
> The other thing is that they all work fine with 2.6.16.x and all
> earlier kernels. Some of them have been running 2.6.x for two years
> without any problem...
>
OK it's probably not hardware, but a bit is flipped somehow.

What is different about these servers?

Are you building different kernels for them, or is it just different
drivers loaded?




* Re: XFS or Kernel Problem / Bug
  2007-01-24 17:56                                       ` Chuck Ebbert
@ 2007-01-24 19:27                                         ` Stefan Priebe - FH
  2007-01-25 20:52                                           ` Chuck Ebbert
  0 siblings, 1 reply; 21+ messages in thread
From: Stefan Priebe - FH @ 2007-01-24 19:27 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: linux-kernel, edmudama

Hi Chuck,
    hi Eric,

Because you both asked nearly the same thing, I will answer you both in
one mail.


 > What is different about these servers?
All 300 machines are mostly different. We have Dual Opteron, single P4 
with HT, single P4 without HT, Dual Xeon, Athlon 64 X2, and many more... 
different mainboards etc.

The only thing I found is that all the servers where the problem exists
use a DFI PM-12 mainboard with a VIA chipset.

 > Are you building different kernels for them, or is it just different
 > drivers loaded?
No, every machine builds its own kernel.

Stefan




* Re: XFS or Kernel Problem / Bug
  2007-01-24 19:27                                         ` Stefan Priebe - FH
@ 2007-01-25 20:52                                           ` Chuck Ebbert
  2007-01-25 21:29                                             ` Stefan Priebe - FH
  0 siblings, 1 reply; 21+ messages in thread
From: Chuck Ebbert @ 2007-01-25 20:52 UTC (permalink / raw)
  To: Stefan Priebe - FH; +Cc: linux-kernel, edmudama

Stefan Priebe - FH wrote:
> > What is different about these servers?
> All 300 machines are mostly different. We have Dual Opteron, single P4
> with HT, single P4 without HT, Dual Xeon, Athlon 64 X2, and many
> more... different mainboards etc.
>
> The only thing I found is that all the servers where the problem
> exists use a DFI PM-12 mainboard with a VIA chipset.
Any others with VIA chipsets?
>
> > Are you building different kernels for them, or is it just different
> > drivers loaded?
> No, every machine builds its own kernel.
>
OK, can you post configs for one that works and one that doesn't?

And which C compiler(s) do you use? The same for all, I hope...



* Re: XFS or Kernel Problem / Bug
  2007-01-25 20:52                                           ` Chuck Ebbert
@ 2007-01-25 21:29                                             ` Stefan Priebe - FH
  2007-01-30 10:44                                               ` Stefan Priebe - FH
  0 siblings, 1 reply; 21+ messages in thread
From: Stefan Priebe - FH @ 2007-01-25 21:29 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: linux-kernel, edmudama

Hi!

OK, I rechecked everything. We have 22 servers with the DFI PM-12
mainboard with VIA chipset.

But only the 5 oldest of them (from before 2004/01/20; we bought all of
them over a span of 10 months) have this problem.

So I think it is a mixture of a software and a hardware problem. Perhaps
DFI changed something on the mainboard (e.g. a new revision) or there
was a new BIOS version on it.

But something must also have changed in the kernel.

 > OK, can you post configs for one that works and one that doesn't?
You mean the kernel .configs?

 > And which C compiler(s) do you use? The same for all, I hope...
On all 32-bit machines:

gcc -v
Reading specs from /usr/lib/gcc-lib/i486-linux/3.3.5/specs
Configured with: ../src/configure -v 
--enable-languages=c,c++,java,f77,pascal,objc,ada,treelang --prefix=/usr 
--mandir=/usr/share/man --infodir=/usr/share/info 
--with-gxx-include-dir=/usr/include/c++/3.3 --enable-shared 
--enable-__cxa_atexit --with-system-zlib --enable-nls 
--without-included-gettext --enable-clocale=gnu --enable-debug 
--enable-java-gc=boehm --enable-java-awt=xlib --enable-objc-gc i486-linux
Thread model: posix
gcc version 3.3.5 (Debian 1:3.3.5-13)

Stefan





* Re: XFS or Kernel Problem / Bug
  2007-01-25 21:29                                             ` Stefan Priebe - FH
@ 2007-01-30 10:44                                               ` Stefan Priebe - FH
  0 siblings, 0 replies; 21+ messages in thread
From: Stefan Priebe - FH @ 2007-01-30 10:44 UTC (permalink / raw)
  To: Stefan Priebe - FH; +Cc: Chuck Ebbert, linux-kernel, edmudama

Hi!

Any news?

Stefan




