LKML Archive on lore.kernel.org
From: "Haar János" <djani22@netcenter.hu>
To: "David Chinner" <dgc@sgi.com>
Cc: <dgc@sgi.com>, <linux-xfs@oss.sgi.com>, <linux-kernel@vger.kernel.org>
Subject: Re: xfslogd-spinlock bug?
Date: Mon, 18 Dec 2006 09:17:50 +0100
Message-ID: <027b01c7227d$0e26d1f0$0400a8c0@dcccs>
In-Reply-To: 20061218062444.GH44411608@melbourne.sgi.com

----- Original Message -----
From: "David Chinner" <dgc@sgi.com>
To: "Haar János" <djani22@netcenter.hu>
Cc: "David Chinner" <dgc@sgi.com>; <linux-xfs@oss.sgi.com>; <linux-kernel@vger.kernel.org>
Sent: Monday, December 18, 2006 7:24 AM
Subject: Re: xfslogd-spinlock bug?

> On Mon, Dec 18, 2006 at 12:56:41AM +0100, Haar János wrote:
> > > On Sat, Dec 16, 2006 at 12:19:45PM +0100, Haar János wrote:
> > > > I don't know whether the two messages are related, but I can see
> > > > that the spinlock bug always occurs on CPU #3.
> > > >
> > > > Does anybody have any idea?
> > >
> > > Your disk interrupts are directed to CPU 3, and so log I/O completion
> > > occurs on that CPU.
> >
> >            CPU0       CPU1       CPU2       CPU3
> >   0:        100          0          0    4583704   IO-APIC-edge     timer
> >   1:          0          0          0          2   IO-APIC-edge     i8042
> >   4:          0          0          0    3878668   IO-APIC-edge     serial
> .....
> >  14:    3072118          0          0        181   IO-APIC-edge     ide0
> .....
> >  52:          0          0          0  213052723   IO-APIC-fasteoi  eth1
> >  53:          0          0          0   91913759   IO-APIC-fasteoi  eth2
> > 100:          0          0          0   16776910   IO-APIC-fasteoi  eth0
> ....
> >
> > Maybe....
> > I have 3 XFS filesystems on this system, on 3 different devices:
> >
> > 1. 200G on one IDE HDD.
> > 2. 2x200G mirror on 1 IDE + 1 SATA HDD.
> > 3. 4x3.3TB stripe on NBD.
> >
> > The NBD is served through eth1, which is on CPU3, but ide0 is on
> > CPU0.
>
> I'd say your NBD based XFS filesystem is having trouble.
>
> > > Are you using XFS on a NBD?
> >
> > Yes, on the third device.
>
> Ok, I've never heard of a problem like this before and you are doing
> something that very few ppl are doing (i.e. XFS on NBD).
> Hence I'd start by suspecting a bug in the NBD driver.
OK, if you are right, this may also be related to the following issue:

http://download.netcenter.hu/bughunt/20061217/messages.txt (10KB)

> > > > Dec 16 12:08:36 dy-base RSP: 0018:ffff81011fdedbc0 EFLAGS: 00010002
> > > > Dec 16 12:08:36 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX:
> > >                                                      ^^^^^^^^^^^^^^^^
> > > Anyone recognise that pattern?
> >
> > I think I have one idea.
> > This issue sometimes prevents the 5-second automatic restart on crash,
> > which suggests possible memory corruption, and if the bug occurs in
> > the IRQ handling.... :-)
> > I have a lot of logs about this issue, and RAX and RBX are always the
> > same.
>
> And is this the only place where you see the problem? Or are there
> other stack traces that you see this in as well?

I used 2.6.16.18 for a long time, and it was stable except for this issue
(~20 dumps involving xfslogd). Now I am trying the newer releases, and now
I have more. :-)

What exactly do you have in mind? I can look through the logs, but what
should I search for? The RAX/RBX pattern, the xfslogd-spinlock problem, or
the old nbd-deadlock + memory corruption?
[root@NetCenter netlog]# grep "0000000000000033" messages*
messages.1:Dec 11 22:47:21 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.1:Dec 12 18:16:35 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.1:Dec 13 11:40:05 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.1:Dec 14 22:25:32 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.1:Dec 15 06:24:44 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.1:Dec 16 12:08:36 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.11:Oct 3 19:49:44 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.11:Oct 7 01:11:17 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.13:Sep 21 15:35:31 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.15:Sep 3 16:13:35 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.15:Sep 5 21:00:38 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.2:Dec 9 00:10:47 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.2:Dec 9 14:07:01 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.2:Dec 10 04:44:48 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.3:Nov 30 10:59:21 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.3:Dec 2 00:54:23 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.5:Nov 13 10:44:49 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.5:Nov 14 03:14:14 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.5:Nov 14 03:37:07 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.5:Nov 15 01:39:54 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.6:Nov 6 14:48:54 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.6:Nov 7 04:36:13 dy-base RAX: 0000000000000033 RBX: ffff8100057d2080 RCX: ffff810050d638f8
messages.6:Nov 7 04:36:13 dy-base RDX: 0000000000000008 RSI: 0000000000012cff RDI: 0000000000000033
messages.6:Nov 7 11:12:06 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.6:Nov 8 03:20:38 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.6:Nov 8 15:02:16 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.6:Nov 8 15:27:12 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.6:Nov 10 15:29:43 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.6:Nov 11 20:44:14 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.9:Oct 18 15:31:02 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
messages.9:Oct 19 13:53:24 dy-base RAX: 0000000000000033 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000

> > > > This implies a spinlock inside a wait_queue_head_t is corrupt.
> > >
> > > What type of system do you have, and what sort of
> > > workload are you running?
> >
> > OS: Fedora 5, 64bit.
> > HW: dual Xeon, with HT, 4GB RAM.
> > (The min_free_kbytes limit is set to 128000, because the e1000 driver
> > sometimes runs out of reserved memory during IRQ handling.)
>
> That does not sound good. What happens when it does run out of memory?

This is an old problem, on 2.6.16.18. The default min_free_kbytes is 38xx,
and the gigabit Ethernet controller can easily exhaust this small reserve.
If this happens, the system freezes, and I can only use the serial console
+ SysRq to dump the stack:

download.netcenter.hu/bughunt/20060530/261618-good.txt
download.netcenter.hu/bughunt/20060530/dmesg.txt
download.netcenter.hu/bughunt/20060530/dump.txt

This problem has already been fixed by setting min_free_kbytes to 128M.

> Is that when you start to see the above corruptions?

I don't think so, but I am not 100% sure.

Cheers,
Janos

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group
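
[Editorial sketch, not part of the original message.] On the "Anyone recognise that pattern?" question: 0x6b is the byte that the kernel's slab debugging writes into freed objects (POISON_FREE), so a register full of 0x6b bytes suggests, but does not prove, that a pointer was loaded from memory that had already been freed, and only on kernels built with slab poisoning enabled. A minimal Python sketch, assuming log lines in the same format as the grep output above, that counts which registers hold that value; the function names are hypothetical and chosen only for illustration:

```python
import re

# 0x6b is the byte the kernel's slab debugging writes into freed objects
# (POISON_FREE). A 64-bit register filled with it reads 6b6b6b6b6b6b6b6b.
POISON_FREE_BYTE = "6b"

# Matches register dumps such as "RBX: 6b6b6b6b6b6b6b6b" in an oops line.
# Assumption: registers are printed as NAME: followed by 16 hex digits.
REGISTER_RE = re.compile(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b")


def poisoned_registers(line):
    """Return the names of registers on this line whose value is all 0x6b."""
    return [
        name
        for name, value in REGISTER_RE.findall(line)
        if value == POISON_FREE_BYTE * 8
    ]


def summarize(lines):
    """Count, per register, how many lines show it holding the poison value."""
    counts = {}
    for line in lines:
        for name in poisoned_registers(line):
            counts[name] = counts.get(name, 0) + 1
    return counts
```

To run it over a real log, pass an open file to `summarize`, for example `summarize(open("messages.1"))`. A register that is poisoned in nearly every dump, as RBX is in the grep output above, points at the same stale pointer being dereferenced each time.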