LKML Archive on lore.kernel.org
* Re: 2.6.18-stable release plans?
@ 2007-01-24 13:30 Chris Rankin
  2007-01-24 14:37 ` Hugh Dickins
  0 siblings, 1 reply; 29+ messages in thread
From: Chris Rankin @ 2007-01-24 13:30 UTC (permalink / raw)
  To: linux-kernel

> But 2.6.18.x must be over now, because the -stable team didn't release a 2.6.18.7 to match
> 2.6.19.2, and all of 2.6.x except for 2.6.19.2 has that weird file corruption bug.

Personally, I dumped 2.6.19.x like a hot coal as soon as I tripped over this bug:

http://bugzilla.kernel.org/show_bug.cgi?id=7707

It didn't take much to trigger it, either. But the silence has been deafening.

Cheers,
Chris



	
	
		

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 2.6.18-stable release plans?
  2007-01-24 13:30 2.6.18-stable release plans? Chris Rankin
@ 2007-01-24 14:37 ` Hugh Dickins
  0 siblings, 0 replies; 29+ messages in thread
From: Hugh Dickins @ 2007-01-24 14:37 UTC (permalink / raw)
  To: Chris Rankin; +Cc: linux-kernel

On Wed, 24 Jan 2007, Chris Rankin wrote:
> 
> Personally, I dumped 2.6.19.x like a hot coal as soon as I tripped over this bug:
> 
> http://bugzilla.kernel.org/show_bug.cgi?id=7707
> 
> It didn't take much to trigger it, either. But the silence has been deafening.

Oh, the page_remove_rmap BUG, page_mapcount negative.

Sorry for the deafening silence; I see I was CC'ed but dropped the ball.

That's surely no reason to dump 2.6.19.x, you'll find the occasional
such report on every(?) release since page mapcount went into 2.6.7.

Oftentimes it's bad RAM (try memtest86), sometimes it's a bad driver
(probably the case for the tainted P report appended to your untainted
one), sometimes it's unidentified memory corruption.  Not once (except
during experimental patch testing) has it been proved due to an actual
VM problem.

Hugh

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 2.6.18-stable release plans?
  2007-02-02  6:47       ` Jon Masters
@ 2007-02-02  8:17         ` Valdis.Kletnieks
  0 siblings, 0 replies; 29+ messages in thread
From: Valdis.Kletnieks @ 2007-02-02  8:17 UTC (permalink / raw)
  To: Jon Masters; +Cc: Chris Rankin, Mark Rustad, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 413 bytes --]

On Fri, 02 Feb 2007 01:47:38 EST, Jon Masters said:

> I must be weird or something, but I often think about this and the sheer 
> number of clock cycles executing at any one time around the world. Have 
> you ever stopped to think how many copies of schedule() (or whatever) 
> are currently running somewhere in the world? It's just nuts :-)

That's why we count single cycles in fast-path sections of code. :)

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 2.6.18-stable release plans?
  2007-02-02  4:02     ` Valdis.Kletnieks
@ 2007-02-02  6:47       ` Jon Masters
  2007-02-02  8:17         ` Valdis.Kletnieks
  0 siblings, 1 reply; 29+ messages in thread
From: Jon Masters @ 2007-02-02  6:47 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Chris Rankin, Mark Rustad, linux-kernel

Valdis.Kletnieks@vt.edu wrote:

> "With 100 million computers in use today, we should expect roughly 6 million
> single bit errors per year. Computer hardware and software companies must
> receive thousands of "side effect" bug reports and support calls due to memory
> errors alone. The costs of NOT including parity memory must be huge!"

I must be weird or something, but I often think about this and the sheer 
number of clock cycles executing at any one time around the world. Have 
you ever stopped to think how many copies of schedule() (or whatever) 
are currently running somewhere in the world? It's just nuts :-)

More seriously, if nobody cared about this stuff then we wouldn't have 
all this MCE reporting and tools to handle differentiating between 
actual failing DRAMs and temporary bit transitions in ECC memory.

Jon.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 2.6.18-stable release plans?
  2007-01-24 22:37   ` Chris Rankin
  2007-01-24 23:11     ` Alan
@ 2007-02-02  4:02     ` Valdis.Kletnieks
  2007-02-02  6:47       ` Jon Masters
  1 sibling, 1 reply; 29+ messages in thread
From: Valdis.Kletnieks @ 2007-02-02  4:02 UTC (permalink / raw)
  To: Chris Rankin; +Cc: Mark Rustad, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1076 bytes --]

On Wed, 24 Jan 2007 22:37:20 GMT, Chris Rankin said:
> --- Mark Rustad <mrustad@gmail.com> wrote:
> > Well, do you have ECC memory? If not, it is at least possible
> > that the solar flares that occurred last month may have affected your
> > system.
> 
> I am going to assume that you are being facetious, because it would be the rarified pinnacle of
> supreme arrogance to suggest that a cosmic ray event is a more likely explanation than a bug in
> the kernel.

Sorry for the late reply, but cosmic ray events (actually, self-induced alpha
particle events from decays within the chipset itself) *are* a likely
explanation:

http://stason.org/TULARC/pc/pc_hardware_faq/2_20_What_does_parity_ECC_memory_protect_the_system_from.html

Most important take-away here:

"With 100 million computers in use today, we should expect roughly 6 million
single bit errors per year. Computer hardware and software companies must
receive thousands of "side effect" bug reports and support calls due to memory
errors alone. The costs of NOT including parity memory must be huge!"
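
As a rough check on what the quoted figures imply per machine, here is a minimal sketch. The function name is made up for illustration, and the inputs are only the two numbers from the quote above, not measured data.

```python
# Per-machine rate implied by the quoted FAQ figures (illustrative only).
def errors_per_machine_per_year(total_errors_per_year, machines):
    """Average single-bit errors per machine per year."""
    return total_errors_per_year / machines

rate = errors_per_machine_per_year(6_000_000, 100_000_000)
years_between = 1 / rate
print(f"{rate:.2f} errors per machine per year, "
      f"about one every {years_between:.1f} years")
```

On these numbers an individual machine sees an error only rarely, while the population as a whole sees millions per year.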


[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 2.6.18-stable release plans?
  2007-01-25 19:36               ` Ken Moffat
@ 2007-01-26 13:02                 ` Chris Rankin
  0 siblings, 0 replies; 29+ messages in thread
From: Chris Rankin @ 2007-01-26 13:02 UTC (permalink / raw)
  To: Ken Moffat; +Cc: Alan, linux-kernel

--- Ken Moffat <zarniwhoop@ntlworld.com> wrote:
>  I can't, but Dave Jones had a similar problem earlier this month,
> archived at http://uwsg.iu.edu/hypermail/linux/kernel/0701.0/1822.html
> which I think is a followup from
> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg105370.html
>  - and seems to be a possible hardware failure (bulging capacitors)
> becoming apparent under load.

Interesting, although I don't believe I have a hardware fault. The box is perfectly stable under
2.6.18.x. Anyway, my particular problem happened very quickly last time so I am hoping that
recompiling xine from scratch will trigger something again. This time, however, I have built a
2.6.19.x kernel with a few memory debugging options turned on.

Cheers,
Chris


		

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 2.6.18-stable release plans?
  2007-01-25  9:16             ` Chris Rankin
  2007-01-25 19:36               ` Ken Moffat
@ 2007-01-25 23:26               ` Alistair John Strachan
  1 sibling, 0 replies; 29+ messages in thread
From: Alistair John Strachan @ 2007-01-25 23:26 UTC (permalink / raw)
  To: Chris Rankin; +Cc: Ken Moffat, Alan, linux-kernel

On Thursday 25 January 2007 09:16, Chris Rankin wrote:
> But anyway - can someone please tell me what "Eeek! page_mapcount(page)
> went negative! (-1)" is *really* saying/implying? Because I am currently
> translating this as "I WANT TO EAT YOUR FILESYSTEMS".

Hugh already did, multiple times. If there's an external hardware event that 
corrupts memory, code executing on your CPU is no longer going to behave 
deterministically. So cases that are typically "impossible" in the design of 
the code have a chance to trigger.

You can continue to flame 2.6.19, but you're in an extreme minority when it comes 
to this kind of bug and, as Hugh already said, almost all of the 
reports of this and similar bugs have been traced to hardware problems that 
were either unchecked or difficult to detect.

Imagine this scenario. It might seem unrealistic to you, but it's not 
impossible!

First Use of Linux -> Upgrading to 2.6.19
	Undetected hardware error never triggered.

Running 2.6.19
	Hardware error triggers. Linux crashes.

Going back to 2.6.18
	Hardware error has not yet triggered again.

Will it eat your filesystem? Maybe. But it probably won't. Even if the 
memory is tested, as you say, it could have been a single-bit error, a cosmic ray 
event, a brownout, or anything similar. It's much more likely to simply 
crash your machine, as it did.

Not running the affected kernel again is a sure way to have _nobody_ listen to 
your complaints about 2.6.19 having a real software bug, because you're 
totally unwilling to test the kernel again and see if it triggers. A single 
report is simply not enough evidence. 

Additionally, reports from other users (who may have a million different 
experimental variables involved) are also insufficient, for reasons which 
have already been explained (drivers, proprietary code, et cetera).

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 2.6.18-stable release plans?
  2007-01-24 23:11     ` Alan
  2007-01-24 23:05       ` Chris Rankin
  2007-01-24 23:32       ` Mark Rustad
@ 2007-01-25 21:04       ` Matt Mackall
  2 siblings, 0 replies; 29+ messages in thread
From: Matt Mackall @ 2007-01-25 21:04 UTC (permalink / raw)
  To: Chris Rankin; +Cc: Mark Rustad, Alan, linux-kernel

On Wed, Jan 24, 2007 at 11:11:53PM +0000, Alan wrote:
> > I am going to assume that you are being facetious, because it would be the rarified pinnacle of
> > supreme arrogance to suggest that a cosmic ray event is a more likely explanation than a bug in
> > the kernel.
> 
> A one off non repeatable error experienced by two people out of the
> millions using it does fit the cosmic ray description quite well. That's
> not to say there isn't a bug, but you don't have enough data to even
> begin debugging it unless it's rather more reproducible.

The soft error rate (cosmic rays, alpha decay, etc.) for modern memory
at sea level is estimated to be somewhere around 1000 - 5000
FIT/Mbit[1]. FIT is Failures in Time - errors per billion hours of
use. If you've got 1GB of memory, you've got 8000Mbits. So you'd
expect 8M - 40M errors per billion hours on your machine. Or 8 to 40
errors per 1000 hours. That's about one single-bit error per week to
one per day.
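
The arithmetic above can be reproduced with a short sketch. The function names are hypothetical; the inputs are just the figures in the paragraph (1000 to 5000 FIT/Mbit, 1 GB rounded to 8000 Mbit), taken as given rather than verified.

```python
# Back-of-the-envelope soft error estimate from the figures in the text.
HOURS_PER_DAY = 24
MEMORY_MBIT = 8000  # 1 GB, rounded as in the text

def errors_per_1000_hours(fit_per_mbit, memory_mbit):
    """FIT is failures per 10**9 hours; scale the total to 1000 hours."""
    errors_per_billion_hours = fit_per_mbit * memory_mbit
    return errors_per_billion_hours * 1000 / 1e9

def days_between_errors(fit_per_mbit, memory_mbit):
    """Mean time between single-bit errors, in days."""
    hours = 1000 / errors_per_1000_hours(fit_per_mbit, memory_mbit)
    return hours / HOURS_PER_DAY

for fit in (1000, 5000):
    rate = errors_per_1000_hours(fit, MEMORY_MBIT)
    days = days_between_errors(fit, MEMORY_MBIT)
    print(f"{fit} FIT/Mbit: {rate:.0f} errors per 1000 h, "
          f"one every {days:.1f} days")
```

The low end works out to one error every 125 hours (a little over five days) and the high end to one every 25 hours, which matches the "per week to per day" range.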

Yes, that's a lot. Can it really be that high? Big supercomputer
installations actually measure it in errors per day or hour.

Most of these errors will go completely unnoticed because they happen
in data structures that aren't revisited (stale cache, unused code,
empty memory). The remainder will often look like random disk read or
write errors or random application bugs/crashes. Sound familiar? That's why
people buy ECC memory.

Now if we say that 10% of that 1GB of RAM (~100MB) is kernel code/data
(not including page cache) and that, say, 1-10% of errors trigger
BUG/WARN code, we'll see these bug messages once every 100 days to
once every 1000 weeks (per GB per user).
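
A minimal sketch of that estimate follows, assuming the 8 to 40 errors per 1000 hours from earlier, a 10% kernel share and a 1% to 10% trigger rate. The helper name is made up for illustration.

```python
# How often a random bit error would surface as a BUG/WARN message,
# under the assumptions stated in the text (illustrative only).
def bug_interval_hours(errors_per_1000h, kernel_fraction, trigger_fraction):
    """Mean hours between errors that land in kernel memory AND trip a check."""
    visible_per_1000h = errors_per_1000h * kernel_fraction * trigger_fraction
    return 1000 / visible_per_1000h

most_frequent = bug_interval_hours(40, 0.10, 0.10)   # high error rate, 10% trigger
least_frequent = bug_interval_hours(8, 0.10, 0.01)   # low error rate, 1% trigger

print(f"most frequent:  one every {most_frequent / 24:.0f} days")
print(f"least frequent: one every {least_frequent / (24 * 7):.0f} weeks")
```

With these inputs the range is about 104 days to about 744 weeks, the same order of magnitude as the rounded "100 days to 1000 weeks" above.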

As for the relative error rate vs kernel bugs - there is no shortage
of Linux boxes with trouble-free uptimes much longer than the 100 days
above.

So yes, if a user reports a bug that's attributable to a single bit
memory error that's otherwise unreproduced and unexplained, it's
totally reasonable to chalk it up to cosmic rays until some sort of
pattern of reports emerges.

As for your particular bug:

 Eeek! page_mapcount(page) went negative! (-1)
  page->flags = 14
  page->count = 0
  page->mapping = 00000000

This check occurs whenever the last mapping is removed from a page.
It's a very heavily used piece of code. The check is there as
sanity-checking from when this logic was introduced. If there were a
new bug here that could be triggered by gcc or telnet, odds are very
good that it would trigger for TONS of people.
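
To illustrate why a check like this fires on corrupted state rather than in normal operation, here is a toy model. It is not the kernel code: the class and method names are invented, and for simplicity it checks on every removal rather than only the last one.

```python
# Toy model of a per-page mapping counter with a sanity check.
# Hypothetical names; this only mimics the idea, not the kernel's code.
class MapcountUnderflow(Exception):
    pass

class Page:
    def __init__(self):
        self.mapcount = 0  # number of mappings currently pointing at the page

    def add_rmap(self):
        self.mapcount += 1

    def remove_rmap(self):
        self.mapcount -= 1
        if self.mapcount < 0:
            # With balanced add/remove calls this branch is unreachable.
            raise MapcountUnderflow(
                f"page_mapcount(page) went negative! ({self.mapcount})")

page = Page()
page.add_rmap()
page.remove_rmap()      # balanced: counter returns to 0, no complaint

page.add_rmap()
page.mapcount = 0       # simulate outside corruption (stray write, bit flip)
try:
    page.remove_rmap()
    tripped = False
except MapcountUnderflow as exc:
    tripped = True
    print("Eeek!", exc)
```

With correct bookkeeping the counter never goes below zero, so a report usually means something else modified the state, or the bookkeeping itself has a bug that would then be expected to hit many users.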

So more likely theories are: a) pointer scribble from something
completely unrelated or b) cosmic rays. As the nearby data (flags,
count, mapping) doesn't appear to be scribbled on, (a) looks less
promising. 

[1] http://www.tezzaron.com/about/papers/soft_errors_1_1_secure.pdf

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 2.6.18-stable release plans?
  2007-01-25  9:16             ` Chris Rankin
@ 2007-01-25 19:36               ` Ken Moffat
  2007-01-26 13:02                 ` Chris Rankin
  2007-01-25 23:26               ` Alistair John Strachan
  1 sibling, 1 reply; 29+ messages in thread
From: Ken Moffat @ 2007-01-25 19:36 UTC (permalink / raw)
  To: Chris Rankin; +Cc: Alan, linux-kernel

On Thu, Jan 25, 2007 at 09:16:04AM +0000, Chris Rankin wrote:
> 
> But anyway - can someone please tell me what "Eeek! page_mapcount(page) went negative! (-1)" is
> *really* saying/implying? Because I am currently translating this as "I WANT TO EAT YOUR
> FILESYSTEMS".
> 
 I can't, but Dave Jones had a similar problem earlier this month,
archived at http://uwsg.iu.edu/hypermail/linux/kernel/0701.0/1822.html
which I think is a followup from
http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg105370.html
 - and seems to be a possible hardware failure (bulging capacitors)
becoming apparent under load.

Ken
-- 
the one time as tragedy, the other time as farce

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 2.6.18-stable release plans?
  2007-01-25  1:00           ` Ken Moffat
@ 2007-01-25  9:16             ` Chris Rankin
  2007-01-25 19:36               ` Ken Moffat
  2007-01-25 23:26               ` Alistair John Strachan
  0 siblings, 2 replies; 29+ messages in thread
From: Chris Rankin @ 2007-01-25  9:16 UTC (permalink / raw)
  To: Ken Moffat; +Cc: Alan, linux-kernel

--- Ken Moffat <zarniwhoop@ntlworld.com> wrote:
> At the moment, you have a problem that nobody recognises.  If you're not
> willing to test if the problem happens repeatably, (you appear to
> have had one failure and immediately reverted to an old kernel), who
> do you think will be able to fix it?

This bug seems to be in the kernel's "memory management", and the last memory-related bug I had
(caused by a bad DIMM on another machine) caused creeping filesystem corruption. However, this
machine is my main desktop, and so I am keen to keep the filesystems intact. So yes, that involves
not running a kernel that has shown itself to be unreliable. 

I was hoping that someone with a deeper knowledge of the differences between 2.6.18 and 2.6.19
would have an idea of what might have triggered this problem, and yes, I was also thinking that
some more people would trip over it and help debug it.

But anyway - can someone please tell me what "Eeek! page_mapcount(page) went negative! (-1)" is
*really* saying/implying? Because I am currently translating this as "I WANT TO EAT YOUR
FILESYSTEMS".

Cheers,
Chris



	
	
		

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 2.6.18-stable release plans?
       [not found] <BC3E207A-1A56-4032-9619-910E80281E9C@gmail.com>
@ 2007-01-25  8:51 ` Chris Rankin
  0 siblings, 0 replies; 29+ messages in thread
From: Chris Rankin @ 2007-01-25  8:51 UTC (permalink / raw)
  To: Mark Rustad; +Cc: Alan, linux-kernel

--- Mark Rustad <mrustad@gmail.com> wrote:
> We'll never know if any of these things were correlated with the  
> solar flares because they all seem to be one-off failures. I do find  
> it interesting though. Our systems seem to be doing statistically  
> better this month. What do you think?

Personally, I think it's all a bit moot unless you also have particle detectors above and below
all your machines so that you can interpolate particle tracks. I certainly see no reason why a
random high-energy particle passing through two different machines is ever likely to cause the
same error, unless that error is an outright systems crash. (Although that second report was from
someone with a tainted kernel, which makes it suspect.)

Cheers,
Chris



		

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 2.6.18-stable release plans?
  2007-01-24 23:45         ` Chris Rankin
  2007-01-25  1:00           ` Ken Moffat
@ 2007-01-25  3:05           ` Mark Rustad
  1 sibling, 0 replies; 29+ messages in thread
From: Mark Rustad @ 2007-01-25  3:05 UTC (permalink / raw)
  To: Chris Rankin; +Cc: Alan, linux-kernel

On Jan 24, 2007, at 5:45 PM, Chris Rankin wrote:
>> --- Mark Rustad <mrustad@gmail.com> wrote:
>> Exactly. Halting use of a version of the kernel based on a single
>> incident provides no insight to the source of the problem. It could
>> be anything...
>
> There is a world of difference between a polite request for more  
> information (although I gave you
> everything I had), and fobbing someone off with a story about  
> cosmic rays.

I'm sorry. I didn't mean to imply anything like that. I just happened  
to notice that the date of the bug report appeared to correlate  
pretty well with one of the solar flare events last month. I was  
really trying to share some information that just conceivably might  
have been related, based on the earlier messages in this thread  
regarding memory errors.

I don't normally follow solar activity. I have been looking into some  
system failures that happened last month. The systems had been  
running with all bus error detection enabled – the hardware set to  
spontaneously reboot on any uncorrectable error. Since our systems  
are redundant, performing a reset simply means that the redundant  
partner will take over, so the reset is the best way to be certain  
that there is no data corruption. I eventually recalled a radio  
report last month about a coronal mass ejection on the sun and how  
things might be disrupted here. I checked out www.spaceweather.com  
and found that December was a very active month, with three separate  
X-class flares. I have no way to conclude that the failures that I  
have seen were influenced by events on the sun, but it seems  
possible. Compared to our systems, most PCs and even much server-class
hardware are likely to corrupt a bit and just keep on going.

We'll never know if any of these things were correlated with the  
solar flares because they all seem to be one-off failures. I do find  
it interesting though. Our systems seem to be doing statistically  
better this month. What do you think?

-- 
Mark Rustad, MRustad@mac.com




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 2.6.18-stable release plans?
  2007-01-24 23:45         ` Chris Rankin
@ 2007-01-25  1:00           ` Ken Moffat
  2007-01-25  9:16             ` Chris Rankin
  2007-01-25  3:05           ` Mark Rustad
  1 sibling, 1 reply; 29+ messages in thread
From: Ken Moffat @ 2007-01-25  1:00 UTC (permalink / raw)
  To: Chris Rankin; +Cc: Mark Rustad, Alan, linux-kernel

On Wed, Jan 24, 2007 at 11:45:57PM +0000, Chris Rankin wrote:
> 
> There is a world of difference between a polite request for more information (although I gave you
> everything I had), and fobbing someone off with a story about cosmic rays.
> 
 Chris,

 I doubt there was a single version of the kernel which ever worked
well for all its users.  In a production environment, reverting to an
older version may be the best short-term answer, but if nobody
recognises the problem you won't get any closer to a proper fix.  At
the moment, you have a problem that nobody recognises.  If you're not
willing to test if the problem happens repeatably, (you appear to
have had one failure and immediately reverted to an old kernel), who
do you think will be able to fix it?  And if it turns out it doesn't
fail repeatably, maybe the responses you've received could be
correct.

 The stable team are only there to maintain the current release of
the kernel.  There is no maintenance of earlier releases (except
Adrian's work on 2.6.16), other than what a distro chooses to do to
backport fixes.  Of course, if the problem _is_ reproducible on your
machine and config, you might get asked to try to identify when it
got introduced (e.g. was 2.6.19 itself, or an arbitrary 2.6.19-rc,
ok?).

Ken
-- 
the one time as tragedy, the other time as farce

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 2.6.18-stable release plans?
  2007-01-24 23:32       ` Mark Rustad
@ 2007-01-24 23:45         ` Chris Rankin
  2007-01-25  1:00           ` Ken Moffat
  2007-01-25  3:05           ` Mark Rustad
  0 siblings, 2 replies; 29+ messages in thread
From: Chris Rankin @ 2007-01-24 23:45 UTC (permalink / raw)
  To: Mark Rustad, Alan; +Cc: Chris Rankin, linux-kernel

--- Mark Rustad <mrustad@gmail.com> wrote:
> Exactly. Halting use of a version of the kernel based on a single  
> incident provides no insight to the source of the problem. It could  
> be anything...

There is a world of difference between a polite request for more information (although I gave you
everything I had), and fobbing someone off with a story about cosmic rays.

Cheers,
Chris



		

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 2.6.18-stable release plans?
  2007-01-24 23:11     ` Alan
  2007-01-24 23:05       ` Chris Rankin
@ 2007-01-24 23:32       ` Mark Rustad
  2007-01-24 23:45         ` Chris Rankin
  2007-01-25 21:04       ` Matt Mackall
  2 siblings, 1 reply; 29+ messages in thread
From: Mark Rustad @ 2007-01-24 23:32 UTC (permalink / raw)
  To: Alan; +Cc: Chris Rankin, linux-kernel

On Jan 24, 2007, at 5:11 PM, Alan wrote:

>> I am going to assume that you are being facetious, because it  
>> would be the rarified pinnacle of
>> supreme arrogance to suggest that a cosmic ray event is a more  
>> likely explanation than a bug in
>> the kernel.
>
> A one off non repeatable error experienced by two people out of the
> millions using it does fit the cosmic ray description quite well.  
> That's
> not to say there isn't a bug, but you don't have enough data to even
> begin debugging it unless it's rather more reproducible.

Exactly. Halting use of a version of the kernel based on a single  
incident provides no insight to the source of the problem. It could  
be anything...

-- 
Mark Rustad, MRustad@mac.com



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 2.6.18-stable release plans?
  2007-01-24 22:37   ` Chris Rankin
@ 2007-01-24 23:11     ` Alan
  2007-01-24 23:05       ` Chris Rankin
                         ` (2 more replies)
  2007-02-02  4:02     ` Valdis.Kletnieks
  1 sibling, 3 replies; 29+ messages in thread
From: Alan @ 2007-01-24 23:11 UTC (permalink / raw)
  To: Chris Rankin; +Cc: Mark Rustad, linux-kernel

> I am going to assume that you are being facetious, because it would be the rarified pinnacle of
> supreme arrogance to suggest that a cosmic ray event is a more likely explanation than a bug in
> the kernel.

A one off non repeatable error experienced by two people out of the
millions using it does fit the cosmic ray description quite well. That's
not to say there isn't a bug, but you don't have enough data to even
begin debugging it unless it's rather more reproducible.

Alan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 2.6.18-stable release plans?
  2007-01-24 23:11     ` Alan
@ 2007-01-24 23:05       ` Chris Rankin
  2007-01-24 23:32       ` Mark Rustad
  2007-01-25 21:04       ` Matt Mackall
  2 siblings, 0 replies; 29+ messages in thread
From: Chris Rankin @ 2007-01-24 23:05 UTC (permalink / raw)
  To: Alan; +Cc: Mark Rustad, linux-kernel

--- Alan <alan@lxorguk.ukuu.org.uk> wrote:
> A one off non repeatable error experienced by two people out of the
> millions using it does fit the cosmic ray description quite well.

Actually it's "unrepeated", not "non repeatable". And that's because I switched back to 2.6.18.x
immediately since I no longer trusted 2.6.19.x.

Cheers,
Chris





		

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 2.6.18-stable release plans?
  2007-01-24 16:28 ` Mark Rustad
@ 2007-01-24 22:37   ` Chris Rankin
  2007-01-24 23:11     ` Alan
  2007-02-02  4:02     ` Valdis.Kletnieks
  0 siblings, 2 replies; 29+ messages in thread
From: Chris Rankin @ 2007-01-24 22:37 UTC (permalink / raw)
  To: Mark Rustad; +Cc: linux-kernel

--- Mark Rustad <mrustad@gmail.com> wrote:
> Well, do you have ECC memory? If not, it is at least possible
> that the solar flares that occurred last month may have affected your
> system.

I am going to assume that you are being facetious, because it would be the rarified pinnacle of
supreme arrogance to suggest that a cosmic ray event is a more likely explanation than a bug in
the kernel.

Cheers,
Chris



		

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 2.6.18-stable release plans?
  2007-01-24 16:12 ` Hugh Dickins
@ 2007-01-24 17:33   ` Chris Rankin
  0 siblings, 0 replies; 29+ messages in thread
From: Chris Rankin @ 2007-01-24 17:33 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linux-kernel

--- Hugh Dickins <hugh@veritas.com> wrote:
> All I'm claiming is that it's no more a reason to avoid 2.6.19*
> than to avoid any other release (the kernels before 2.6.7 happened
> to have no such check, but that doesn't imply they were any safer).

There is *one* reason to avoid 2.6.19.x: it has actually bitten me, while none of the others has.
And if I can't trust a kernel to compile something then what good is it?

Cheers,
Chris



	
	
		

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 2.6.18-stable release plans?
  2007-01-24 15:53 Chris Rankin
  2007-01-24 16:12 ` Hugh Dickins
@ 2007-01-24 16:28 ` Mark Rustad
  2007-01-24 22:37   ` Chris Rankin
  1 sibling, 1 reply; 29+ messages in thread
From: Mark Rustad @ 2007-01-24 16:28 UTC (permalink / raw)
  To: Chris Rankin; +Cc: linux-kernel

On Jan 24, 2007, at 9:53 AM, Chris Rankin wrote:

>>> But MY kernel is clearly untainted.
>>> So what other explanation is there apart from a kernel bug?
>
>> If it's me you're asking: I don't know (overheating, cosmic  
>> rays, ...)
>
> I suppose what I'm *really* asking is what the basis is for  
> assuming that this *isn't* a kernel
> bug and can therefore be safely ignored, seeing as the oops is  
> real, the hardware is fine and the
> kernel is untainted? That seems to cover the bases from where I'm  
> sitting.
>
> Cheers,
> Chris
>
> P.S. No micro-heatwaves have occurred here, either. Do we all need  
> to install muon detectors in
> our homes before reporting bugs now, so that we can exclude cosmic  
> ray events too?

Well, do you have ECC memory? If not, it is at least possible
that the solar flares that occurred last month may have affected your
system. There were three X-class solar flares in the month of  
December. Even if you have ECC memory, it is still possible to suffer  
data corruption because many BIOSes do not turn on bus parity error  
detection. And are the memories in your disk drive controllers ECC or  
parity protected? I wouldn't bet on it...

All too often, modern PCs are data corruption accidents waiting to  
happen.

-- 
Mark Rustad, MRustad@mac.com



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 2.6.18-stable release plans?
  2007-01-24 15:53 Chris Rankin
@ 2007-01-24 16:12 ` Hugh Dickins
  2007-01-24 17:33   ` Chris Rankin
  2007-01-24 16:28 ` Mark Rustad
  1 sibling, 1 reply; 29+ messages in thread
From: Hugh Dickins @ 2007-01-24 16:12 UTC (permalink / raw)
  To: Chris Rankin; +Cc: linux-kernel

On Wed, 24 Jan 2007, Chris Rankin wrote:
> 
> I suppose what I'm *really* asking is what the basis is for assuming that this *isn't* a kernel
> bug and can therefore be safely ignored, seeing as the oops is real, the hardware is fine and the
> kernel is untainted? That seems to cover the bases from where I'm sitting.

All I'm claiming is that it's no more a reason to avoid 2.6.19*
than to avoid any other release (the kernels before 2.6.7 happened
to have no such check, but that doesn't imply they were any safer).

It may indeed be due to a kernel bug, but I can't tell you where: sorry.

Hugh

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 2.6.18-stable release plans?
@ 2007-01-24 15:53 Chris Rankin
  2007-01-24 16:12 ` Hugh Dickins
  2007-01-24 16:28 ` Mark Rustad
  0 siblings, 2 replies; 29+ messages in thread
From: Chris Rankin @ 2007-01-24 15:53 UTC (permalink / raw)
  To: linux-kernel

> > But MY kernel is clearly untainted.
> > So what other explanation is there apart from a kernel bug?

> If it's me you're asking: I don't know (overheating, cosmic rays, ...)

I suppose what I'm *really* asking is what the basis is for assuming that this *isn't* a kernel
bug and can therefore be safely ignored, seeing as the oops is real, the hardware is fine and the
kernel is untainted? That seems to cover the bases from where I'm sitting.

Cheers,
Chris

P.S. No micro-heatwaves have occurred here, either. Do we all need to install muon detectors in
our homes before reporting bugs now, so that we can exclude cosmic ray events too?


		

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 2.6.18-stable release plans?
  2007-01-24 15:06 Chris Rankin
@ 2007-01-24 15:40 ` Hugh Dickins
  0 siblings, 0 replies; 29+ messages in thread
From: Hugh Dickins @ 2007-01-24 15:40 UTC (permalink / raw)
  To: Chris Rankin; +Cc: linux-kernel

On Wed, 24 Jan 2007, Chris Rankin wrote:
> 
> But MY kernel is clearly untainted.
> So what other explanation is there apart from a kernel bug?

If it's me you're asking: I don't know (overheating, cosmic rays, ...)

Hugh

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 2.6.18-stable release plans?
@ 2007-01-24 15:06 Chris Rankin
  2007-01-24 15:40 ` Hugh Dickins
  0 siblings, 1 reply; 29+ messages in thread
From: Chris Rankin @ 2007-01-24 15:06 UTC (permalink / raw)
  To: linux-kernel

> That's surely no reason to dump 2.6.19.x, you'll find the occasional
> such report on every(?) release since page mapcount went into 2.6.7.

This was the only time I've seen it, before or since.

> Oftentimes it's bad RAM (try memtest86)

There is nothing wrong with my RAM. I tested it quite extensively when I upgraded to 2 GB.

> sometimes it's a bad driver (probably the case for the tainted P report appended
> to your untainted one), sometimes it's unidentified memory corruption.

But MY kernel is clearly untainted. So what other explanation is there apart from a kernel bug?

Cheers,
Chris



		

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 2.6.18-stable release plans?
  2007-01-23  0:23 ` Jesper Juhl
  2007-01-23 20:33   ` Chuck Ebbert
@ 2007-01-24  4:50   ` Daniel Barkalow
  1 sibling, 0 replies; 29+ messages in thread
From: Daniel Barkalow @ 2007-01-24  4:50 UTC (permalink / raw)
  To: Jesper Juhl; +Cc: Chuck Ebbert, Linux Kernel Mailing List, Chris Wright

On Tue, 23 Jan 2007, Jesper Juhl wrote:

> Now that 2.6.19 is out, most likely not.  -stable releases are made
> for the latest stable 2.6.x kernel, once 2.6.x+1 is out that's the one
> -stable patches are made for (2.6.16 is an exception).

There's generally a bit of overlap. 2.6.17.14 was about the same time as 
2.6.18.1, and 2.6.18.6 was after 2.6.19.1. But 2.6.18.x must be over now, 
because the -stable team didn't release a 2.6.18.7 to match 2.6.19.2, and 
all of 2.6.x except for 2.6.19.2 has that weird file corruption bug 
(although rarely triggered).

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 2.6.18-stable release plans?
  2007-01-23 20:33   ` Chuck Ebbert
@ 2007-01-23 20:56     ` Adrian Bunk
  0 siblings, 0 replies; 29+ messages in thread
From: Adrian Bunk @ 2007-01-23 20:56 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: Jesper Juhl, Linux Kernel Mailing List, Chris Wright

On Tue, Jan 23, 2007 at 03:33:48PM -0500, Chuck Ebbert wrote:
> Jesper Juhl wrote:
> > On 22/01/07, Chuck Ebbert <cebbert@redhat.com> wrote:
> >> Is there going to be another 2.6.18-stable release?
> >>
> >
> > Now that 2.6.19 is out, most likely not.  -stable releases are made
> > for the latest stable 2.6.x kernel, once 2.6.x+1 is out that's the one
> > -stable patches are made for (2.6.16 is an exception).
> >
> Great... just as 2.6.18 approaches actual stability.
> 
> Adrian, how much longer are you going to support 2.6.16? Would you consider
> moving to 2.6.18 any time soon?

I'll continue to maintain 2.6.16 - moving to 2.6.18 or any other kernel 
would defeat the purpose of what I am doing.

I'm not yet decided whether I'll create other stable kernel branches, 
but if this happens it won't be 2.6.18, more like 2.6.25 or 2.6.30.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 2.6.18-stable release plans?
  2007-01-23  0:23 ` Jesper Juhl
@ 2007-01-23 20:33   ` Chuck Ebbert
  2007-01-23 20:56     ` Adrian Bunk
  2007-01-24  4:50   ` Daniel Barkalow
  1 sibling, 1 reply; 29+ messages in thread
From: Chuck Ebbert @ 2007-01-23 20:33 UTC (permalink / raw)
  To: Jesper Juhl; +Cc: Linux Kernel Mailing List, Chris Wright

Jesper Juhl wrote:
> On 22/01/07, Chuck Ebbert <cebbert@redhat.com> wrote:
>> Is there going to be another 2.6.18-stable release?
>>
>
> Now that 2.6.19 is out, most likely not.  -stable releases are made
> for the latest stable 2.6.x kernel, once 2.6.x+1 is out that's the one
> -stable patches are made for (2.6.16 is an exception).
>
Great... just as 2.6.18 approaches actual stability.

Adrian, how much longer are you going to support 2.6.16? Would you consider
moving to 2.6.18 any time soon?


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: 2.6.18-stable release plans?
  2007-01-22 22:13 Chuck Ebbert
@ 2007-01-23  0:23 ` Jesper Juhl
  2007-01-23 20:33   ` Chuck Ebbert
  2007-01-24  4:50   ` Daniel Barkalow
  0 siblings, 2 replies; 29+ messages in thread
From: Jesper Juhl @ 2007-01-23  0:23 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: Linux Kernel Mailing List, Chris Wright

On 22/01/07, Chuck Ebbert <cebbert@redhat.com> wrote:
> Is there going to be another 2.6.18-stable release?
>

Now that 2.6.19 is out, most likely not.  -stable releases are made
for the latest stable 2.6.x kernel, once 2.6.x+1 is out that's the one
-stable patches are made for (2.6.16 is an exception).


-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 29+ messages in thread

* 2.6.18-stable release plans?
@ 2007-01-22 22:13 Chuck Ebbert
  2007-01-23  0:23 ` Jesper Juhl
  0 siblings, 1 reply; 29+ messages in thread
From: Chuck Ebbert @ 2007-01-22 22:13 UTC (permalink / raw)
  To: Linux Kernel Mailing List, Chris Wright

Is there going to be another 2.6.18-stable release?


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2007-02-02  8:18 UTC | newest]

Thread overview: 29+ messages
2007-01-24 13:30 2.6.18-stable release plans? Chris Rankin
2007-01-24 14:37 ` Hugh Dickins
     [not found] <BC3E207A-1A56-4032-9619-910E80281E9C@gmail.com>
2007-01-25  8:51 ` Chris Rankin
  -- strict thread matches above, loose matches on Subject: below --
2007-01-24 15:53 Chris Rankin
2007-01-24 16:12 ` Hugh Dickins
2007-01-24 17:33   ` Chris Rankin
2007-01-24 16:28 ` Mark Rustad
2007-01-24 22:37   ` Chris Rankin
2007-01-24 23:11     ` Alan
2007-01-24 23:05       ` Chris Rankin
2007-01-24 23:32       ` Mark Rustad
2007-01-24 23:45         ` Chris Rankin
2007-01-25  1:00           ` Ken Moffat
2007-01-25  9:16             ` Chris Rankin
2007-01-25 19:36               ` Ken Moffat
2007-01-26 13:02                 ` Chris Rankin
2007-01-25 23:26               ` Alistair John Strachan
2007-01-25  3:05           ` Mark Rustad
2007-01-25 21:04       ` Matt Mackall
2007-02-02  4:02     ` Valdis.Kletnieks
2007-02-02  6:47       ` Jon Masters
2007-02-02  8:17         ` Valdis.Kletnieks
2007-01-24 15:06 Chris Rankin
2007-01-24 15:40 ` Hugh Dickins
2007-01-22 22:13 Chuck Ebbert
2007-01-23  0:23 ` Jesper Juhl
2007-01-23 20:33   ` Chuck Ebbert
2007-01-23 20:56     ` Adrian Bunk
2007-01-24  4:50   ` Daniel Barkalow
